| Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics | Apr 25, 2024 | Audio Classification, Transfer Learning | Code Available | 3 |
| COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations | Apr 25, 2024 | Contrastive Learning, Music Generation | Code Available | 3 |
| Evolve Cost-aware Acquisition Functions Using Large Language Models | Apr 25, 2024 | Bayesian Optimization, Decision Making | Code Available | 3 |
| SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Apr 25, 2024 | Benchmarking, Multiple-choice | Code Available | 3 |
| GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting | Apr 24, 2024 | 3DGS, Attribute | Code Available | 3 |
| Improving Dictionary Learning with Gated Sparse Autoencoders | Apr 24, 2024 | Dictionary Learning | Code Available | 3 |
| Retrieval Head Mechanistically Explains Long-Context Factuality | Apr 24, 2024 | Continual Pretraining, Hallucination | Code Available | 3 |
| CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models | Apr 24, 2024 | Consistent Character Generation, Word Embeddings | Code Available | 3 |
| Taming Diffusion Probabilistic Models for Character Control | Apr 23, 2024 | Computational Efficiency, Diversity | Code Available | 3 |
| TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting | Apr 23, 2024 | | Code Available | 3 |
| SST: Multi-Scale Hybrid Mamba-Transformer Experts for Long-Short Range Time Series Forecasting | Apr 23, 2024 | Mamba, Time Series | Code Available | 3 |
| FlashSpeech: Efficient Zero-Shot Speech Synthesis | Apr 23, 2024 | Rhythm, Speech Synthesis | Code Available | 3 |
| UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition | Apr 23, 2024 | Decoder, Diversity | Code Available | 3 |
| ID-Animator: Zero-Shot Identity-Preserving Human Video Generation | Apr 23, 2024 | Attribute, Video Generation | Code Available | 3 |
| From Matching to Generation: A Survey on Generative Information Retrieval | Apr 23, 2024 | Incremental Learning, Information Retrieval | Code Available | 3 |
| MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making | Apr 22, 2024 | Decision Making, Medical Diagnosis | Code Available | 3 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | Apr 22, 2024 | Common Sense Reasoning, GPU | Code Available | 3 |
| SnapKV: LLM Knows What You are Looking for Before Generation | Apr 22, 2024 | 16k, GPU | Code Available | 3 |
| SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion | Apr 22, 2024 | Multivariate Time Series Forecasting, Time Series | Code Available | 3 |
| Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer | Apr 21, 2024 | Face Parsing, Semantic Parsing | Code Available | 3 |
| A Survey on the Memory Mechanism of Large Language Model based Agents | Apr 21, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| DMesh: A Differentiable Mesh Representation | Apr 20, 2024 | | Code Available | 3 |
| STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases | Apr 19, 2024 | Benchmarking, Retrieval | Code Available | 3 |
| On-Demand Earth System Data Cubes | Apr 19, 2024 | | Code Available | 3 |
| AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation | Apr 19, 2024 | Action Generation | Code Available | 3 |
| DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection | Apr 19, 2024 | Benchmarking, DeepFake Detection | Code Available | 3 |
| TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding | Apr 18, 2024 | GPU | Code Available | 3 |
| When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes | Apr 18, 2024 | Contrastive Learning, Few-Shot Learning | Code Available | 3 |
| On the Content Bias in Fréchet Video Distance | Apr 18, 2024 | Video Generation | Code Available | 3 |
| Moving Object Segmentation: All You Need Is SAM (and Flow) | Apr 18, 2024 | All, Motion Segmentation | Code Available | 3 |
| PureForest: A Large-Scale Aerial Lidar and Aerial Imagery Dataset for Tree Species Classification in Monospecific Forests | Apr 18, 2024 | Deep Learning | Code Available | 3 |
| Learning with 3D rotations, a hitchhiker's guide to SO(3) | Apr 17, 2024 | | Code Available | 3 |
| SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap | Apr 17, 2024 | Camera Calibration, Game State Reconstruction | Code Available | 3 |
| MobileNetV4 -- Universal Models for the Mobile Ecosystem | Apr 16, 2024 | Image Classification, Neural Architecture Search | Code Available | 3 |
| Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Apr 16, 2024 | Feature Engineering, Language Modeling | Code Available | 3 |
| The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report | Apr 16, 2024 | Image Super-Resolution, Super-Resolution | Code Available | 3 |
| Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation | Apr 15, 2024 | Contrastive Learning, Descriptive | Code Available | 3 |
| Scoring Time Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription | Apr 15, 2024 | Music Transcription | Code Available | 3 |
| How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model | Apr 15, 2024 | Decoder, Image Segmentation | Code Available | 3 |
| OneChart: Purify the Chart Structural Extraction via One Auxiliary Token | Apr 15, 2024 | Decoder | Code Available | 3 |
| A Survey on Deep Learning for Theorem Proving | Apr 15, 2024 | Automated Theorem Proving, Deep Learning | Code Available | 3 |
| FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba | Apr 15, 2024 | Infrared And Visible Image Fusion, Mamba | Code Available | 3 |
| SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation | Apr 15, 2024 | Brain Tumor Segmentation, Decoder | Code Available | 3 |
| RF-Diffusion: Radio Signal Generation via Time-Frequency Diffusion | Apr 14, 2024 | Time Series | Code Available | 3 |
| DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector | Apr 13, 2024 | Data Augmentation, Key Point Matching | Code Available | 3 |
| TSLANet: Rethinking Transformers for Time Series Representation Learning | Apr 12, 2024 | Anomaly Detection, Computational Efficiency | Code Available | 3 |
| Probing the 3D Awareness of Visual Foundation Models | Apr 12, 2024 | | Code Available | 3 |
| BoostTrack: boosting the similarity measure and detection confidence for improved multiple object tracking | Apr 12, 2024 | Motion Compensation, Multi-Object Tracking | Code Available | 3 |
| Taming Stable Diffusion for Text to 360° Panorama Image Generation | Apr 11, 2024 | Denoising, Image Generation | Code Available | 3 |
| View Selection for 3D Captioning via Diffusion Ranking | Apr 11, 2024 | 3D Object Captioning, Hallucination | Code Available | 3 |