SOTAVerified

FAD

Papers

Showing 1-50 of 62 papers

| Title | Status | Hype |
| --- | --- | --- |
| Addressing Emotion Bias in Music Emotion Recognition and Generation with Frechet Audio Distance | Code | 3 |
| MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | Code | 2 |
| L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | Code | 2 |
| Adapting Frechet Audio Distance for Generative Music Evaluation | Code | 2 |
| MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models | Code | 2 |
| Taming Data and Transformers for Audio Generation | Code | 2 |
| FlowDec: A flow-based full-band general audio codec with high perceptual quality | Code | 2 |
| Efficient Autoregressive Audio Modeling via Next-Scale Prediction | Code | 2 |
| KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation | Code | 2 |
| Frechet Music Distance: A Metric For Generative Symbolic Music Evaluation | Code | 1 |
| DOSE : Drum One-Shot Extraction from Music Mixture | Code | 1 |
| Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks | Code | 1 |
| Multi-Source Music Generation with Latent Diffusion | Code | 1 |
| BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation | Code | 1 |
| AMSP-UOD: When Vortex Convolution and Stochastic Perturbation Meet Underwater Object Detection | Code | 1 |
| Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization | Code | 1 |
| Representation Sharing for Fast Object Detector Search and Beyond | Code | 1 |
| Aligning Text-to-Music Evaluation with Human Preferences | Code | 1 |
| Twitch Plays Pokemon, Machine Learns Twitch: Unsupervised Context-Aware Anomaly Detection for Identifying Trolls in Streaming Data | Code | 0 |
| Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference | Code | 0 |
| Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning | Code | 0 |
| AnoPLe: Few-Shot Anomaly Detection via Bi-directional Prompt Learning with Only Normal Samples | Code | 0 |
| Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI | Code | 0 |
| Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms | Code | 0 |
| CLOTH4D: A Dataset for Clothed Human Reconstruction | Code | 0 |
| Latent CLAP Loss for Better Foley Sound Synthesis | Code | 0 |
| Phase asymmetry guided adaptive fractional-order total variation and diffusion for feature-preserving ultrasound despeckling | | 0 |
| Predicting Personal Traits from Facial Images using Convolutional Neural Networks Augmented with Facial Landmark Information | | 0 |
| Quantum Machine Learning: Fad or Future? | | 0 |
| RenderBox: Expressive Performance Rendering with Text Control | | 0 |
| Responding to Illegal Activities Along the Canadian Coastlines Using Reinforcement Learning | | 0 |
| Retrieval-Augmented Text-to-Audio Generation | | 0 |
| Sensing Performance of Multi-Channel RFID-based Finger Augmentation Devices for Tactile Internet | | 0 |
| Sound Scene Synthesis at the DCASE 2024 Challenge | | 0 |
| TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis | | 0 |
| Tuna-AI: tuna biomass estimation with Machine Learning models trained on oceanography and echosounder FAD data | | 0 |
| Market Making with Fads, Informed, and Uninformed Traders | | 0 |
| A Fast Automatic Method for Deconvoluting Macro X-ray Fluorescence Data Collected from Easel Paintings | | 0 |
| A General Framework for Learning Procedural Audio Models of Environmental Sounds | | 0 |
| A Study on Robustness to Perturbations for Representations of Environmental Sound | | 0 |
| Audiobox: Unified Audio Generation with Natural Language Prompts | | 0 |
| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | | 0 |
| Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings | | 0 |
| Detecting immune cells with label-free two-photon autofluorescence and deep learning | | 0 |
| Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning | | 0 |
| DRAGON: Distributional Rewards Optimize Diffusion Generative Models | | 0 |
| Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion | | 0 |
| Enhancing U.S. swine farm preparedness for infectious foreign animal diseases with rapid access to biosecurity information | | 0 |
| Exploring compressibility of transformer based text-to-music (TTM) models | | 0 |
| FaceCat: Enhancing Face Recognition Security with a Unified Diffusion Model | | 0 |

No leaderboard results yet.