| Text2midi: Generating Symbolic Music from Captions | Dec 21, 2024 | Decoder | Code Available | 2 |
| Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement | Dec 21, 2024 | Mamba | Code Available | 2 |
| PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Dec 20, 2024 | Video Understanding | Code Available | 2 |
| Personalized Representation from Personalized Generation | Dec 20, 2024 | Contrastive Learning, Image Generation | Code Available | 2 |
| Offline Reinforcement Learning for LLM Multi-Step Reasoning | Dec 20, 2024 | GSM8K, Math | Code Available | 2 |
| MR-GDINO: Efficient Open-World Continual Object Detection | Dec 20, 2024 | Continual Learning, object-detection | Code Available | 2 |
| FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF | Dec 20, 2024 | Privacy Preserving, reinforcement-learning | Code Available | 2 |
| fluke: Federated Learning Utility frameworK for Experimentation and research | Dec 20, 2024 | Federated Learning | Code Available | 2 |
| PyBOP: A Python package for battery model optimisation and parameterisation | Dec 20, 2024 | | Code Available | 2 |
| Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration | Dec 20, 2024 | Human Agent Collaboration | Code Available | 2 |
| XRAG: eXamining the Core -- Benchmarking Foundational Components in Advanced Retrieval-Augmented Generation | Dec 20, 2024 | Benchmarking, Diagnostic | Code Available | 2 |
| ChangeDiff: A Multi-Temporal Change Detection Data Generator with Flexible Text Prompts via Diffusion Model | Dec 20, 2024 | Change Detection | Code Available | 2 |
| Mapping the Mind of an Instruction-based Image Editing using SMILE | Dec 20, 2024 | Autonomous Driving | Code Available | 2 |
| Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking | Dec 20, 2024 | Mamba, Object Tracking | Code Available | 2 |
| Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer | Dec 19, 2024 | Image Manipulation, Image Manipulation Localization | Code Available | 2 |
| Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties | Dec 19, 2024 | Anomaly Detection, Object | Code Available | 2 |
| PsyDraw: A Multi-Agent Multimodal System for Mental Health Screening in Left-Behind Children | Dec 19, 2024 | | Code Available | 2 |
| MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Dec 19, 2024 | MMLU, Multiple-choice | Code Available | 2 |
| LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis | Dec 19, 2024 | Object | Code Available | 2 |
| A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space | Dec 19, 2024 | Computational Efficiency, object-detection | Code Available | 2 |
| Preventing Local Pitfalls in Vector Quantization via Optimal Transport | Dec 19, 2024 | Image Reconstruction, Quantization | Code Available | 2 |
| Tests for model misspecification in simulation-based inference: from local distortions to global model checks | Dec 19, 2024 | Anomaly Detection, model | Code Available | 2 |
| ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing | Dec 19, 2024 | Mixture-of-Experts | Code Available | 2 |
| Next Patch Prediction for Autoregressive Visual Generation | Dec 19, 2024 | Image Generation, Prediction | Code Available | 2 |
| Learning charges and long-range interactions from energies and forces | Dec 19, 2024 | | Code Available | 2 |
| FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching | Dec 19, 2024 | Image Generation, Prediction | Code Available | 2 |
| AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Dec 19, 2024 | Autonomous Driving, Benchmarking | Code Available | 2 |
| Agent-SafetyBench: Evaluating the Safety of LLM Agents | Dec 19, 2024 | | Code Available | 2 |
| Fietje: An open, efficient LLM for Dutch | Dec 19, 2024 | Linguistic Acceptability, Sentiment Analysis | Code Available | 2 |
| DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space | Dec 19, 2024 | | Code Available | 2 |
| Mesoscopic Insights: Orchestrating Multi-scale & Hybrid Architecture for Image Manipulation Localization | Dec 18, 2024 | Image Manipulation | Code Available | 2 |
| Joint Perception and Prediction for Autonomous Driving: A Survey | Dec 18, 2024 | Autonomous Driving, motion prediction | Code Available | 2 |
| ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning | Dec 18, 2024 | | Code Available | 2 |
| Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection | Dec 18, 2024 | | Code Available | 2 |
| Open Universal Arabic ASR Leaderboard | Dec 18, 2024 | Benchmarking | Code Available | 2 |
| Large Language Model Enhanced Recommender Systems: A Survey | Dec 18, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Dec 18, 2024 | Reasoning Segmentation, Segmentation | Code Available | 2 |
| Alignment faking in large language models | Dec 18, 2024 | Large Language Model | Code Available | 2 |
| RelationField: Relate Anything in Radiance Fields | Dec 18, 2024 | 3d scene graph generation, Graph Generation | Code Available | 2 |
| A Survey on LLM Inference-Time Self-Improvement | Dec 18, 2024 | Survey | Code Available | 2 |
| Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation | Dec 18, 2024 | Image Segmentation, Knowledge Distillation | Code Available | 2 |
| Modality-Independent Graph Neural Networks with Global Transformers for Multimodal Recommendation | Dec 18, 2024 | Graph Learning, Multi-modal Recommendation | Code Available | 2 |
| AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities | Dec 18, 2024 | Change Detection, Diversity | Code Available | 2 |
| ArchesWeather & ArchesWeatherGen: a deterministic and generative model for efficient ML weather forecasting | Dec 17, 2024 | GPU, Weather Forecasting | Code Available | 2 |
| SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation | Dec 17, 2024 | Fact Verification, Knowledge Graphs | Code Available | 2 |
| Guiding Generative Protein Language Models with Reinforcement Learning | Dec 17, 2024 | Diversity, reinforcement-learning | Code Available | 2 |
| SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Dec 17, 2024 | Task Planning | Code Available | 2 |
| OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain | Dec 17, 2024 | RAG, Retrieval | Code Available | 2 |
| Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency | Dec 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models | Dec 17, 2024 | | Code Available | 2 |