| Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues | Oct 14, 2024 | LLM Jailbreak, Safety Alignment | Code Available | 2 |
| MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding | Oct 15, 2024 | Visual Question Answering | Code Available | 2 |
| Evaluating Morphological Compositional Generalization in Large Language Models | Oct 16, 2024 | Text Generation | Code Available | 2 |
| IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning | Oct 19, 2024 | Benchmarking, Multi-agent Reinforcement Learning | Code Available | 2 |
| DM-Codec: Distilling Multimodal Representations for Speech Tokenization | Oct 19, 2024 | Self-Supervised Learning, Speech Tokenization | Code Available | 2 |
| GPT or BERT: why not both? | Oct 31, 2024 | Causal Language Modeling, Language Modeling | Code Available | 2 |
| Model merging with SVD to tie the Knots | Oct 25, 2024 | model | Code Available | 2 |
| SciPIP: An LLM-based Scientific Paper Idea Proposer | Oct 30, 2024 | Retrieval | Code Available | 2 |
| Ada-MSHyper: Adaptive Multi-Scale Hypergraph Transformer for Time Series Forecasting | Oct 31, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection | Nov 12, 2024 | Optical Flow Estimation, Out-of-Distribution Detection | Code Available | 2 |
| MetaOpenFOAM: an LLM-based multi-agent framework for CFD | Jul 31, 2024 | RAG, Retrieval-augmented Generation | Code Available | 2 |
| PyGen: A Collaborative Human-AI Approach to Python Package Creation | Nov 13, 2024 | Code Generation | Code Available | 2 |
| Disentangling Memory and Reasoning Ability in Large Language Models | Nov 20, 2024 | Decision Making, Retrieval | Code Available | 2 |
| MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective | Nov 21, 2024 | Image Comprehension, Image Generation | Code Available | 2 |
| vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation | Nov 26, 2024 | Image Segmentation, Medical Image Analysis | Code Available | 2 |
| TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models | Nov 27, 2024 | Garment Reconstruction, Image Generation | Code Available | 2 |
| TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting | Nov 29, 2024 | Denoising, Image Generation | Code Available | 2 |
| Lost & Found: Tracking Changes from Egocentric Observations in 3D Dynamic Scene Graphs | Nov 28, 2024 | Object | Code Available | 2 |
| X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models | Dec 2, 2024 | Image Generation, In-Context Learning | Code Available | 2 |
| CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking | Dec 1, 2024 | Bug fixing, Code Generation | Code Available | 2 |
| FLAIR: VLM with Fine-grained Language-informed Image Representations | Dec 4, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario | Jan 17, 2025 | | Code Available | 2 |
| SoRA: Singular Value Decomposed Low-Rank Adaptation for Domain Generalizable Representation Learning | Dec 5, 2024 | Domain Adaptation, Domain Generalization | Code Available | 2 |
| Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Dec 5, 2024 | Image Comprehension, Representation Learning | Code Available | 2 |
| JPC: Flexible Inference for Predictive Coding Networks in JAX | Dec 4, 2024 | | Code Available | 2 |
| MESA: Effective Matching Redundancy Reduction by Semantic Area Segmentation | Aug 1, 2024 | Patch Matching | Code Available | 2 |
| DriveMM: All-in-One Large Multimodal Model for Autonomous Driving | Dec 10, 2024 | All, Autonomous Driving | Code Available | 2 |
| MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction | Dec 12, 2024 | 3D Reconstruction, Motion Estimation | Code Available | 2 |
| MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Dec 19, 2024 | MMLU, Multiple-choice | Code Available | 2 |
| MR-GDINO: Efficient Open-World Continual Object Detection | Dec 20, 2024 | Continual Learning, object-detection | Code Available | 2 |
| Scenario-Wise Rec: A Multi-Scenario Recommendation Benchmark | Dec 23, 2024 | | Code Available | 2 |
| EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation | Dec 24, 2024 | Image Captioning, Image Generation | Code Available | 2 |
| Test-time Computing: from System-1 Thinking to System-2 Thinking | Jan 5, 2025 | | Code Available | 2 |
| TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios | Jan 10, 2025 | Aerial Scene Classification, CPU | Code Available | 2 |
| Russian Financial Statements Database: A firm-level collection of the universe of financial statements | Jan 10, 2025 | Imputation | Code Available | 2 |
| ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Jan 11, 2025 | Chart Understanding, Code Generation | Code Available | 2 |
| Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models | May 25, 2023 | Conditional Text-to-Image Synthesis, Image Generation | Code Available | 2 |
| ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | Feb 6, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| SalM2: An Extremely Lightweight Saliency Mamba Model for Real-Time Cognitive Awareness of Driver Attention | Feb 22, 2025 | Mamba | Code Available | 2 |
| TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators | Feb 20, 2025 | Benchmarking, Code Generation | Code Available | 2 |
| A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations | Feb 14, 2025 | Survey | Code Available | 2 |
| Sanity Checking Causal Representation Learning on a Simple Real-World System | Feb 27, 2025 | Representation Learning | Code Available | 2 |
| Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation | Feb 27, 2025 | Contrastive Learning, Diagnostic | Code Available | 2 |
| A Training-free LLM-based Approach to General Chinese Character Error Correction | Feb 21, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models | Feb 28, 2025 | Image Segmentation, Medical Image Segmentation | Code Available | 2 |
| Neural Posterior Estimation for Cataloging Astronomical Images with Spatially Varying Backgrounds and Point Spread Functions | Feb 28, 2025 | Variational Inference | Code Available | 2 |
| AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies | Feb 28, 2025 | | Code Available | 2 |
| Patch-wise Structural Loss for Time Series Forecasting | Mar 2, 2025 | Time Series, Time Series Forecasting | Code Available | 2 |
| Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | Mar 5, 2025 | Object, Referring Video Object Segmentation | Code Available | 2 |
| MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments | Mar 4, 2025 | 2D Panoptic Segmentation, Graph Generation | Code Available | 2 |