Audio-Visual Speech Recognition
Audio-visual speech recognition is the task of transcribing speech into text from paired audio and visual streams, typically the speech audio together with video of the speaker.
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ES³ Base* | Word Error Rate (WER) | 11 | — | Unverified |
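The metric in the table, Word Error Rate, is conventionally the word-level edit distance between the reference transcript and the model's hypothesis (substitutions, deletions, and insertions), divided by the number of reference words. The following is a minimal sketch of that computation. It assumes whitespace tokenization and no text normalization, and the function name `word_error_rate` is illustrative. It is not the scoring script behind any result listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (standard Levenshtein DP).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.2%}")
```

Leaderboards usually report this ratio as a percentage, and because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.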