Audio-Visual Speech Recognition
Audio-visual speech recognition (AVSR) is the task of transcribing paired audio and visual streams, typically a speaker's voice and video of their lip movements, into text.
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AVCRFormer | Top-1 Accuracy | 98.81 | — | Unverified |
| 2 | 2DCNN + BiLSTM + ResNet + MLF | Top-1 Accuracy | 98.76 | — | Unverified |
| 3 | PBL | Top-1 Accuracy | 98.3 | — | Unverified |
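The table reports Top-1 accuracy without stating how it is computed. As a minimal sketch, assuming the scores are percentages over a word-level classification test set (the page does not confirm this), Top-1 accuracy is the share of samples whose single highest-scoring prediction equals the reference label. The function name `top1_accuracy` and the sample labels below are hypothetical and for illustration only; they do not come from any of the listed models.

```python
from typing import Sequence


def top1_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Return the percentage of samples whose top prediction equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(reference) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between each top prediction and its reference label.
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


# Hypothetical labels, not taken from any benchmark.
preds = ["about", "hello", "world", "music"]
refs = ["about", "hello", "would", "music"]
print(f"{top1_accuracy(preds, refs):.2f}")  # prints 75.00
```

Under this reading, a claimed score of 98.81 would mean roughly 9,881 correct predictions per 10,000 test samples.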