| Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | Jun 14, 2024 | Memorization | Code Available | 2 |
| Consistency-diversity-realism Pareto fronts of conditional image generative models | Jun 14, 2024 | Diversity | Code Available | 2 |
| Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection | Jun 14, 2024 | Decoder, speech-recognition | Code Available | 2 |
| ControlVAR: Exploring Controllable Visual Autoregressive Modeling | Jun 14, 2024 | Image Generation | Code Available | 2 |
| CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models | Jun 14, 2024 | Multiple-choice, Question Answering | Code Available | 2 |
| DurLAR: A High-fidelity 128-channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-modal Autonomous Driving Applications | Jun 14, 2024 | Autonomous Driving, Depth Estimation | Code Available | 2 |
| Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation | Jun 14, 2024 | Navigate, Vision and Language Navigation | Code Available | 2 |
| EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models | Jun 14, 2024 | 3D Object Detection, 3D Reconstruction | Code Available | 2 |
| BEACON: Benchmark for Comprehensive RNA Tasks and Language Models | Jun 14, 2024 | Language Modelling | Code Available | 2 |
| ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation | Jun 14, 2024 | Code Generation | Code Available | 2 |
| QQQ: Quality Quattuor-Bit Quantization for Large Language Models | Jun 14, 2024 | Quantization | Code Available | 2 |
| PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting | Jun 14, 2024 | NeRF, Novel View Synthesis | Code Available | 2 |
| An Unsupervised Approach to Achieve Supervised-Level Explainability in Healthcare Records | Jun 13, 2024 | Adversarial Robustness, Explainable Artificial Intelligence (XAI) | Code Available | 2 |
| Dynamic Asset Allocation with Asset-Specific Regime Forecasts | Jun 13, 2024 | | Code Available | 2 |
| Interpreting the Weight Space of Customized Diffusion Models | Jun 13, 2024 | | Code Available | 2 |
| Yo'LLaVA: Your Personalized Language and Vision Assistant | Jun 13, 2024 | Image Captioning, Question Answering | Code Available | 2 |
| Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Jun 13, 2024 | Benchmarking, GPU | Code Available | 2 |
| Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors | Jun 13, 2024 | Data Augmentation, Text Detection | Code Available | 2 |
| Understanding Hallucinations in Diffusion Models through Mode Interpolation | Jun 13, 2024 | Hallucination, Image Generation | Code Available | 2 |
| Fredformer: Frequency Debiased Transformer for Time Series Forecasting | Jun 13, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| On Softmax Direct Preference Optimization for Recommendation | Jun 13, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Classic GNNs are Strong Baselines: Reassessing GNNs for Node Classification | Jun 13, 2024 | Node Classification, Node Property Prediction | Code Available | 2 |
| Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs | Jun 13, 2024 | Arithmetic Reasoning, Fact Verification | Code Available | 2 |
| DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer | Jun 13, 2024 | Face Image Quality, Face Image Quality Assessment | Code Available | 2 |
| LRM-Zero: Training Large Reconstruction Models with Synthesized Data | Jun 13, 2024 | 3D Reconstruction | Code Available | 2 |
| Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models | Jun 13, 2024 | Math, Quantization | Code Available | 2 |
| STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery | Jun 13, 2024 | Graph Generation, Object | Code Available | 2 |
| StreamBench: Towards Benchmarking Continuous Improvement of Language Agents | Jun 13, 2024 | Benchmarking, Language Modeling | Code Available | 2 |
| Towards Vision-Language Geo-Foundation Model: A Survey | Jun 13, 2024 | Earth Observation, Image Captioning | Code Available | 2 |
| BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection | Jun 13, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance | Jun 13, 2024 | Motion Generation, Position | Code Available | 2 |
| An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios | Jun 13, 2024 | Language Identification, Self-Supervised Learning | Code Available | 2 |
| S^3 -- Semantic Signal Separation | Jun 13, 2024 | blind source separation, Topic Models | Code Available | 2 |
| CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making | Jun 13, 2024 | Decision Making | Code Available | 2 |
| Explore the Limits of Omni-modal Pretraining at Scale | Jun 13, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models | Jun 13, 2024 | | Code Available | 2 |
| An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval | Jun 13, 2024 | Contrastive Learning, Image Retrieval | Code Available | 2 |
| Enhancing Diagnostic Accuracy in Rare and Common Fundus Diseases with a Knowledge-Rich Vision-Language Model | Jun 13, 2024 | Diagnostic, Image Retrieval | Code Available | 2 |
| Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions | Jun 13, 2024 | Philosophy | Code Available | 2 |
| CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models | Jun 13, 2024 | Object | Code Available | 2 |
| Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs | Jun 13, 2024 | Benchmarking, Question Answering | Code Available | 2 |
| Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges | Jun 13, 2024 | Emotion Recognition, Music Emotion Recognition | Code Available | 2 |
| BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics | Jun 13, 2024 | Benchmarking | Code Available | 2 |
| LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning | Jun 12, 2024 | text-to-speech, Text to Speech | Code Available | 2 |
| Real-world Image Dehazing with Coherence-based Pseudo Labeling and Cooperative Unfolding Network | Jun 12, 2024 | Image Dehazing | Code Available | 2 |
| Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models | Jun 12, 2024 | Image Compression | Code Available | 2 |
| LVBench: An Extreme Long Video Understanding Benchmark | Jun 12, 2024 | Decision Making, Video Understanding | Code Available | 2 |
| DehazeDCT: Towards Effective Non-Homogeneous Dehazing via Deformable Convolutional Transformer | Jun 12, 2024 | Image Dehazing, Nonhomogeneous Image Dehazing | Code Available | 2 |
| Time-MMD: Multi-Domain Multimodal Dataset for Time Series Analysis | Jun 12, 2024 | Time Series, Time Series Analysis | Code Available | 2 |
| Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio | Jun 12, 2024 | Clustering | Code Available | 2 |