ASR demo using onnx-asr

onnx-asr is a Python package for Automatic Speech Recognition with ONNX models. It is written in pure Python and has minimal dependencies (no PyTorch or Transformers).

It supports the Parakeet TDT 0.6B V2 (En) and GigaAM v2 (Ru) models, among many other modern models. You can also use it with your own model if it has a supported architecture.
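
As a rough illustration of how a supported model might be used from Python, here is a minimal sketch. It assumes the onnx-asr package is installed and that it exposes `onnx_asr.load_model(name)` and a `recognize(path)` method returning the transcript for a single file. The `transcribe` helper and the file name in the usage note are hypothetical and not part of the package.

```python
from pathlib import Path


def transcribe(wav_path: str, model_name: str = "nemo-parakeet-tdt-0.6b-v2") -> str:
    """Recognize speech in a WAV file with a named model (hypothetical helper)."""
    # Fail early with a clear error before any model is loaded.
    if not Path(wav_path).is_file():
        raise FileNotFoundError(wav_path)

    # Imported lazily so this module can be loaded without onnx-asr installed.
    import onnx_asr  # assumption: provided by the onnx-asr package

    # Assumption: load_model resolves the name to ONNX files,
    # possibly downloading them on first use.
    model = onnx_asr.load_model(model_name)
    return model.recognize(wav_path)
```

Calling `transcribe("sample.wav")` would use the English Parakeet TDT model by default. Passing `model_name="gigaam-v2-ctc"` would select one of the Russian models instead.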

Russian ASR models

  • gigaam-v2-ctc - Sber GigaAM v2 CTC (origin, onnx)
  • gigaam-v2-rnnt - Sber GigaAM v2 RNN-T (origin, onnx)
  • nemo-fastconformer-ru-ctc - Nvidia FastConformer-Hybrid Large (ru) with CTC decoder (origin, onnx)
  • nemo-fastconformer-ru-rnnt - Nvidia FastConformer-Hybrid Large (ru) with RNN-T decoder (origin, onnx)
  • whisper-base - OpenAI Whisper Base exported with onnxruntime (origin, onnx)
  • alphacep/vosk-model-ru - Alpha Cephei Vosk 0.54-ru (origin)
  • alphacep/vosk-model-small-ru - Alpha Cephei Vosk 0.52-small-ru (origin)

English ASR models

  • nemo-parakeet-ctc-0.6b - Nvidia Parakeet CTC 0.6B (en) (origin, onnx)
  • nemo-parakeet-tdt-0.6b-v2 - Nvidia Parakeet TDT 0.6B V2 (en) (origin, onnx)
  • whisper-base - OpenAI Whisper Base exported with onnxruntime (origin, onnx)

VAD models