| Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation | Oct 7, 2024 | Prompt Engineering, Video Generation | Code Available | 2 |
| ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | Oct 7, 2024 | scientific discovery | Code Available | 2 |
| Ensured: Explanations for Decreasing the Epistemic Uncertainty in Predictions | Oct 7, 2024 | | Code Available | 2 |
| SecAlign: Defending Against Prompt Injection with Preference Optimization | Oct 7, 2024 | | Code Available | 2 |
| A Simple Image Segmentation Framework via In-Context Examples | Oct 7, 2024 | Decoder, Image Segmentation | Code Available | 2 |
| TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention | Oct 7, 2024 | Position | Code Available | 2 |
| Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration | Oct 7, 2024 | Image Restoration, Navigate | Code Available | 2 |
| Causal Context Adjustment Loss for Learned Image Compression | Oct 7, 2024 | Image Compression | Code Available | 2 |
| TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles | Oct 7, 2024 | Logical Reasoning | Code Available | 2 |
| Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting | Oct 7, 2024 | 3DGS | Code Available | 2 |
| Towards Ultra-Low-Power Neuromorphic Speech Enhancement with Spiking-FullSubNet | Oct 7, 2024 | Denoising, Speech Denoising | Code Available | 2 |
| TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens | Oct 7, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Oct 7, 2024 | Causal Inference, counterfactual | Code Available | 2 |
| Differential Transformer | Oct 7, 2024 | Hallucination, In-Context Learning | Code Available | 2 |
| Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis | Oct 6, 2024 | Multimodal Sentiment Analysis, Sentiment Analysis | Code Available | 2 |
| Generative Flows on Synthetic Pathway for Drug Design | Oct 6, 2024 | Drug Design, Drug Discovery | Code Available | 2 |
| dattri: A Library for Efficient Data Attribution | Oct 6, 2024 | Benchmarking | Code Available | 2 |
| Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval | Oct 6, 2024 | Community Detection, Information Retrieval | Code Available | 2 |
| GenSim: A General Social Simulation Platform with Large Language Model based Agents | Oct 6, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights | Oct 6, 2024 | | Code Available | 2 |
| LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation | Oct 6, 2024 | Pose Estimation, Visual Localization | Code Available | 2 |
| DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion | Oct 6, 2024 | DeepFake Detection, Domain Generalization | Code Available | 2 |
| TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting | Oct 6, 2024 | Multivariate Time Series Forecasting, Time Series | Code Available | 2 |
| UniMuMo: Unified Text, Music and Motion Generation | Oct 6, 2024 | Decoder, Motion Generation | Code Available | 2 |
| Hammer: Robust Function-Calling for On-Device Language Models via Function Masking | Oct 6, 2024 | | Code Available | 2 |
| Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement | Oct 6, 2024 | Mathematical Reasoning, Meta-Learning | Code Available | 2 |
| Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution | Oct 5, 2024 | Image Super-Resolution, Knowledge Distillation | Code Available | 2 |
| A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models | Oct 5, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| An Electrocardiogram Foundation Model Built on over 10 Million Recordings with External Evaluation across Multiple Domains | Oct 5, 2024 | Diagnostic, Event Detection | Code Available | 2 |
| DeFoG: Discrete Flow Matching for Graph Generation | Oct 5, 2024 | Denoising, Graph Generation | Code Available | 2 |
| SyllableLM: Learning Coarse Semantic Units for Speech Language Models | Oct 5, 2024 | Clustering, Language Modeling | Code Available | 2 |
| Learning Truncated Causal History Model for Video Restoration | Oct 4, 2024 | Deblurring, Denoising | Code Available | 2 |
| Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models | Oct 4, 2024 | Decoder, Hallucination | Code Available | 2 |
| Oscillatory State-Space Models | Oct 4, 2024 | Mamba, State Space Models | Code Available | 2 |
| Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering | Oct 4, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 |
| Mamba in Vision: A Comprehensive Survey of Techniques and Applications | Oct 4, 2024 | Mamba, State Space Models | Code Available | 2 |
| Multi-Robot Motion Planning with Diffusion Models | Oct 4, 2024 | Motion Planning | Code Available | 2 |
| Dynamic Diffusion Transformer | Oct 4, 2024 | Image Generation | Code Available | 2 |
| Exploring the Benefit of Activation Sparsity in Pre-training | Oct 4, 2024 | | Code Available | 2 |
| ToolGen: Unified Tool Retrieval and Calling via Generation | Oct 4, 2024 | Retrieval, Text Generation | Code Available | 2 |
| Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review | Oct 4, 2024 | Knowledge Distillation, Logical Reasoning | Code Available | 2 |
| Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models | Oct 4, 2024 | Dense Video Captioning, Sentence | Code Available | 2 |
| Scaling Large Motion Models with Million-Level Human Motions | Oct 4, 2024 | Motion Generation | Code Available | 2 |
| Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models | Oct 4, 2024 | | Code Available | 2 |
| Steering Large Language Models between Code Execution and Textual Reasoning | Oct 4, 2024 | Code Generation, Math | Code Available | 2 |
| Autoregressive Action Sequence Learning for Robotic Manipulation | Oct 4, 2024 | Chunking, Language Modeling | Code Available | 2 |
| MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task | Oct 4, 2024 | Translation | Code Available | 2 |
| Generative Artificial Intelligence for Navigating Synthesizable Chemical Space | Oct 4, 2024 | Drug Discovery, Navigate | Code Available | 2 |
| GraphRouter: A Graph-based Router for LLM Selections | Oct 4, 2024 | Transductive Learning | Code Available | 2 |
| AutoPenBench: Benchmarking Generative Agents for Penetration Testing | Oct 4, 2024 | Benchmarking | Code Available | 2 |