SOTAVerified

Audio Synthesis

Papers

Showing 1-50 of 127 papers

| Title | Status | Hype |
| --- | --- | --- |
| MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling | | 0 |
| Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance | | 0 |
| Diffusion-Based Symbolic Regression | | 0 |
| SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet | | 0 |
| Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism | | 0 |
| DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis | | 0 |
| Fast Differentiable Modal Simulation of Non-linear Strings, Membranes, and Plates | Code | 1 |
| Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis | | 0 |
| TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis | | 0 |
| Designing Neural Synthesizers for Low-Latency Interaction | | 0 |
| Long-Video Audio Synthesis with Multi-Agent Collaboration | | 0 |
| Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision | | 0 |
| XAttnMark: Learning Robust Audio Watermarking with Cross-Attention | | 0 |
| Generative diffusion model with inverse renormalization group flows | Code | 1 |
| Customized Condition Controllable Generation for Video Soundtrack | | 0 |
| Tri-Ergon: Fine-grained Video-to-Audio Generation with Multi-modal Conditions and LUFS Control | | 0 |
| MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | Code | 7 |
| CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder | | 0 |
| Zero-Shot Mono-to-Binaural Speech Synthesis | | 0 |
| Generalized Diffusion Model with Adjusted Offset Noise | | 0 |
| OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows | Code | 2 |
| VQalAttent: a Transparent Speech Generation Pipeline based on Transformer-learned VQ-VAE Latent Space | | 0 |
| Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement | | 0 |
| Array2BR: An End-to-End Noise-immune Binaural Audio Synthesis from Microphone-array Signals | | 0 |
| Where are we in audio deepfake detection? A systematic analysis over generative and detection models | Code | 1 |
| PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models | | 0 |
| ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning | Code | 1 |
| D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack | | 0 |
| Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis | | 0 |
| Fast, High-Quality and Parameter-Efficient Articulatory Synthesis using Differentiable DSP | | 0 |
| Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music | | 0 |
| Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound | | 0 |
| EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos | | 0 |
| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | | 0 |
| GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis | | 0 |
| LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis | Code | 1 |
| Taming Data and Transformers for Audio Generation | Code | 2 |
| AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis | | 0 |
| CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems | | 0 |
| Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis | Code | 2 |
| AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Code | 5 |
| Creative Text-to-Audio Generation via Synthesizer Programming | | 0 |
| Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting | | 0 |
| Differentiable All-pole Filters for Time-varying Audio Systems | Code | 2 |
| Diffusion-TS: Interpretable Diffusion for General Time Series Generation | Code | 3 |
| Text2Data: Low-Resource Data Generation with Textual Control | | 0 |
| DiffMoog: a Differentiable Modular Synthesizer for Sound Matching | Code | 2 |
| T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis | Code | 1 |
| FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models | Code | 2 |
| Fast Diffusion GAN Model for Symbolic Music Generation Controlled by Emotions | | 0 |
Page 1 of 3

No leaderboard results yet.