SOTAVerified

Automatic Speech Recognition

Papers

Showing 26–50 of 3174 papers

Title | Status | Hype
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning | Code | 2
4-bit Conformer with Native Quantization Aware Training for Speech Recognition | Code | 2
Robust Self-Supervised Audio-Visual Speech Recognition | Code | 2
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Code | 2
Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units | Code | 2
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | Code | 2
NusaCrowd: Open Source Initiative for Indonesian NLP Resources | Code | 2
PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings | Code | 2
PromptASR for contextualized ASR with controllable style | Code | 2
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition | Code | 2
Large Language Models are Strong Audio-Visual Speech Recognition Learners | Code | 2
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models | Code | 2
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec | Code | 2
Fast Transformers with Clustered Attention | Code | 2
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation | Code | 2
Dialectal Coverage And Generalization in Arabic Speech Recognition | Code | 2
DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition | Code | 2
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions | Code | 2
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition | Code | 2
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Code | 2
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Code | 2
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity | Code | 2
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement | Code | 2
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric | Code | 2
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | Code | 2
Page 2 of 127

No leaderboard results yet.