---
license: cc-by-4.0
track_downloads: true
language:
- en
- es
- fr
- de
- bg
- hr
- cs
- da
- nl
- et
- fi
- el
- hu
- it
- lv
- lt
- mt
- pl
- pt
- ro
- sk
- sl
- sv
- ru
- uk

pipeline_tag: automatic-speech-recognition
library_name: nemo
datasets:
- nvidia/Granary
- nemo/asr-set-3.0
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- Transducer
- TDT
- FastConformer
- Conformer
- pytorch
- NeMo
- hf-asr-leaderboard
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
model-index:
- name: parakeet-tdt-0.6b-v3
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: AMI (Meetings test)
      type: edinburghcstr/ami
      config: ihm
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 11.31
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Earnings-22
      type: revdotcom/earnings22
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 11.42
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: GigaSpeech
      type: speechcolab/gigaspeech
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 9.59
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: LibriSpeech (clean)
      type: librispeech_asr
      config: other
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 1.93
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: LibriSpeech (other)
      type: librispeech_asr
      config: other
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 3.59
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: SPGI Speech
      type: kensho/spgispeech
      config: test
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 3.97
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: tedlium-v3
      type: LIUM/tedlium
      config: release1
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 2.75
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Vox Populi
      type: facebook/voxpopuli
      config: en
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 6.14
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: bg_bg
      split: test
      args:
        language: bg
    metrics:
    - name: Test WER (Bg)
      type: wer
      value: 12.64
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: cs_cz
      split: test
      args:
        language: cs
    metrics:
    - name: Test WER (Cs)
      type: wer
      value: 11.01
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: da_dk
      split: test
      args:
        language: da
    metrics:
    - name: Test WER (Da)
      type: wer
      value: 18.41
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: de_de
      split: test
      args:
        language: de
    metrics:
    - name: Test WER (De)
      type: wer
      value: 5.04
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: el_gr
      split: test
      args:
        language: el
    metrics:
    - name: Test WER (El)
      type: wer
      value: 20.70
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: en_us
      split: test
      args:
        language: en
    metrics:
    - name: Test WER (En)
      type: wer
      value: 4.85
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: es_419
      split: test
      args:
        language: es
    metrics:
    - name: Test WER (Es)
      type: wer
      value: 3.45
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: et_ee
      split: test
      args:
        language: et
    metrics:
    - name: Test WER (Et)
      type: wer
      value: 17.73
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: fi_fi
      split: test
      args:
        language: fi
    metrics:
    - name: Test WER (Fi)
      type: wer
      value: 13.21
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: fr_fr
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER (Fr)
      type: wer
      value: 5.15
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: hr_hr
      split: test
      args:
        language: hr
    metrics:
    - name: Test WER (Hr)
      type: wer
      value: 12.46
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: hu_hu
      split: test
      args:
        language: hu
    metrics:
    - name: Test WER (Hu)
      type: wer
      value: 15.72
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: it_it
      split: test
      args:
        language: it
    metrics:
    - name: Test WER (It)
      type: wer
      value: 3.00
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: lt_lt
      split: test
      args:
        language: lt
    metrics:
    - name: Test WER (Lt)
      type: wer
      value: 20.35
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: lv_lv
      split: test
      args:
        language: lv
    metrics:
    - name: Test WER (Lv)
      type: wer
      value: 22.84
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: mt_mt
      split: test
      args:
        language: mt
    metrics:
    - name: Test WER (Mt)
      type: wer
      value: 20.46
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: nl_nl
      split: test
      args:
        language: nl
    metrics:
    - name: Test WER (Nl)
      type: wer
      value: 7.48
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: pl_pl
      split: test
      args:
        language: pl
    metrics:
    - name: Test WER (Pl)
      type: wer
      value: 7.31
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: pt_br
      split: test
      args:
        language: pt
    metrics:
    - name: Test WER (Pt)
      type: wer
      value: 4.76
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: ro_ro
      split: test
      args:
        language: ro
    metrics:
    - name: Test WER (Ro)
      type: wer
      value: 12.44
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: ru_ru
      split: test
      args:
        language: ru
    metrics:
    - name: Test WER (Ru)
      type: wer
      value: 5.51
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: sk_sk
      split: test
      args:
        language: sk
    metrics:
    - name: Test WER (Sk)
      type: wer
      value: 8.82
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: sl_si
      split: test
      args:
        language: sl
    metrics:
    - name: Test WER (Sl)
      type: wer
      value: 24.03
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: sv_se
      split: test
      args:
        language: sv
    metrics:
    - name: Test WER (Sv)
      type: wer
      value: 15.08
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: uk_ua
      split: test
      args:
        language: uk
    metrics:
    - name: Test WER (Uk)
      type: wer
      value: 6.79
  # Multilingual LibriSpeech ASR Results
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: Multilingual LibriSpeech
      type: facebook/multilingual_librispeech
      config: spanish
      split: test
      args:
        language: es
    metrics:
    - name: Test WER (Es)
      type: wer
      value: 4.39
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: Multilingual LibriSpeech
      type: facebook/multilingual_librispeech
      config: french
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER (Fr)
      type: wer
      value: 4.97
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: Multilingual LibriSpeech
      type: facebook/multilingual_librispeech
      config: italian
      split: test
      args:
        language: it
    metrics:
    - name: Test WER (It)
      type: wer
      value: 10.08
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: Multilingual LibriSpeech
      type: facebook/multilingual_librispeech
      config: dutch
      split: test
      args:
        language: nl
    metrics:
    - name: Test WER (Nl)
      type: wer
      value: 12.78
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: Multilingual LibriSpeech
      type: facebook/multilingual_librispeech
      config: polish
      split: test
      args:
        language: pl
    metrics:
    - name: Test WER (Pl)
      type: wer
      value: 7.28
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: Multilingual LibriSpeech
      type: facebook/multilingual_librispeech
      config: portuguese
      split: test
      args:
        language: pt
    metrics:
    - name: Test WER (Pt)
      type: wer
      value: 7.50
  # CoVoST2 ASR Results
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: de
      split: test
      args:
        language: de
    metrics:
    - name: Test WER (De)
      type: wer
      value: 4.84
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: en
      split: test
      args:
        language: en
    metrics:
    - name: Test WER (En)
      type: wer
      value: 6.80
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: es
      split: test
      args:
        language: es
    metrics:
    - name: Test WER (Es)
      type: wer
      value: 3.41
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: et
      split: test
      args:
        language: et
    metrics:
    - name: Test WER (Et)
      type: wer
      value: 22.04
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: fr
      split: test
      args:
        language: fr
    metrics:
    - name: Test WER (Fr)
      type: wer
      value: 6.05
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: it
      split: test
      args:
        language: it
    metrics:
    - name: Test WER (It)
      type: wer
      value: 3.69
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: lv
      split: test
      args:
        language: lv
    metrics:
    - name: Test WER (Lv)
      type: wer
      value: 38.36
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: nl
      split: test
      args:
        language: nl
    metrics:
    - name: Test WER (Nl)
      type: wer
      value: 6.50
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: pt
      split: test
      args:
        language: pt
    metrics:
    - name: Test WER (Pt)
      type: wer
      value: 3.96
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: ru
      split: test
      args:
        language: ru
    metrics:
    - name: Test WER (Ru)
      type: wer
      value: 3.00
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: sl
      split: test
      args:
        language: sl
    metrics:
    - name: Test WER (Sl)
      type: wer
      value: 31.80
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: sv
      split: test
      args:
        language: sv
    metrics:
    - name: Test WER (Sv)
      type: wer
      value: 20.16
  - task:
      type: Automatic Speech Recognition
      name: automatic-speech-recognition
    dataset:
      name: CoVoST2
      type: covost2
      config: uk
      split: test
      args:
        language: uk
    metrics:
    - name: Test WER (Uk)
      type: wer
      value: 5.10
metrics:
- wer
---

# **<span style="color:#76b900;">🦜 parakeet-tdt-0.6b-v3: Multilingual Speech-to-Text Model</span>**

<style>
img {
 display: inline;
}
</style>

[![Model architecture](https://img.shields.io/badge/Model_Arch-FastConformer--TDT-blue#model-badge)](#model-architecture)
| [![Model size](https://img.shields.io/badge/Params-0.6B-green#model-badge)](#model-architecture)
| [![Language](https://img.shields.io/badge/Language-EU_Languages-blue#model-badge)](#datasets)

## <span style="color:#466f00;">Description:</span>

`parakeet-tdt-0.6b-v3` is a 600-million-parameter multilingual automatic speech recognition (ASR) model designed for high-throughput speech-to-text transcription. It extends the [parakeet-tdt-0.6b-v2](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) model by expanding language support from English to 25 European languages. The model automatically detects the language of the audio and transcribes it without requiring additional prompting. It is part of a series of models that leverage the [Granary](https://huggingface.co/datasets/nvidia/Granary) [1, 2] multilingual corpus as their primary training dataset.

🗣️ Try Demo here: https://huggingface.co/spaces/nvidia/parakeet-tdt-0.6b-v3 

**Supported Languages:**  
Bulgarian (**bg**), Croatian (**hr**), Czech (**cs**), Danish (**da**), Dutch (**nl**), English (**en**), Estonian (**et**), Finnish (**fi**), French (**fr**), German (**de**), Greek (**el**), Hungarian (**hu**), Italian (**it**), Latvian (**lv**), Lithuanian (**lt**), Maltese (**mt**), Polish (**pl**), Portuguese (**pt**), Romanian (**ro**), Slovak (**sk**), Slovenian (**sl**), Spanish (**es**), Swedish (**sv**), Russian (**ru**), Ukrainian (**uk**)

This model is ready for commercial/non-commercial use.