Audio Generation
Audio generation (synthesis) is the task of generating raw audio such as speech.
(Image credit: MelNet)
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AudioGen | FD_openl3 | 185.53 | — | Unverified |
| 2 | AudioLDM2-large | FD_openl3 | 158.04 | — | Unverified |
| 3 | Stable Audio 2.0 | FD_openl3 | 110.62 | — | Unverified |
| 4 | Stable Audio | FD_openl3 | 103.66 | — | Unverified |
| 5 | ETTA | FD_openl3 | 80.13 | — | Unverified |
| 6 | TangoFlux-base | FD_openl3 | 79.7 | — | Unverified |
| 7 | Stable Audio Open | FD_openl3 | 78.24 | — | Unverified |
| 8 | TangoFlux | FD_openl3 | 75.1 | — | Unverified |
| 9 | ETTA-FT-AC-100k | FD_openl3 | 61.79 | — | Unverified |
| 10 | Diffsound | FAD | 7.75 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VAB-Encodec | Bits per byte | 40 | — | Unverified |
| 2 | Sparse Transformer 152M (strided) | Bits per byte | 1.97 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SymphonyNet | Human listening average results | 3.5 | — | Unverified |