| DeepInteraction++: Multi-Modality Interaction for Autonomous Driving | Aug 9, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 3 |
| ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities | Aug 8, 2024 | | Code Available | 3 |
| 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Aug 7, 2024 | 16k, 2k | Code Available | 3 |
| Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields | Aug 7, 2024 | 3DGS, Model Compression | Code Available | 3 |
| Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws | Aug 6, 2024 | Data Poisoning | Code Available | 3 |
| MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine | Aug 6, 2024 | Medical Visual Question Answering, Organ Detection | Code Available | 3 |
| Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection | Aug 6, 2024 | audio moment retrieval, Highlight Detection | Code Available | 3 |
| Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2 | Aug 3, 2024 | Diversity, Segmentation | Code Available | 3 |
| RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Aug 2, 2024 | Benchmarking, Dataset Generation | Code Available | 3 |
| MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Aug 2, 2024 | Multimodal Reasoning, Multiple-choice | Code Available | 3 |
| multiGradICON: A Foundation Model for Multimodal Medical Image Registration | Aug 1, 2024 | Anatomy, Deep Learning | Code Available | 3 |
| Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names | Aug 1, 2024 | | Code Available | 3 |
| MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Aug 1, 2024 | Math, MM-Vet | Code Available | 3 |
| DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving | Aug 1, 2024 | | Code Available | 3 |
| UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model | Aug 1, 2024 | | Code Available | 3 |
| Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Jul 31, 2024 | GSM8K, Math | Code Available | 3 |
| Beat this! Accurate beat tracking without DBN postprocessing | Jul 31, 2024 | Beat Tracking, Downbeat Tracking | Code Available | 3 |
| Hyper-parameter tuning for text guided image editing | Jul 31, 2024 | text-guided-image-editing | Code Available | 3 |
| ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget | Jul 31, 2024 | Document-level Closed Information Extraction, Entity Linking | Code Available | 3 |
| ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models | Jul 31, 2024 | Domain Generalization, Prompt Learning | Code Available | 3 |
| Comgra: A Tool for Analyzing and Debugging Neural Networks | Jul 31, 2024 | | Code Available | 3 |
| Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection | Jul 30, 2024 | object-detection, Object Detection | Code Available | 3 |
| Diffusion Feedback Helps CLIP See Better | Jul 29, 2024 | image-classification, Image Classification | Code Available | 3 |
| rLLM: Relational Table Learning with LLMs | Jul 29, 2024 | Classification, Node Classification | Code Available | 3 |
| RelBench: A Benchmark for Deep Learning on Relational Databases | Jul 29, 2024 | Deep Learning, Feature Engineering | Code Available | 3 |
| Practical Video Object Detection via Feature Selection and Aggregation | Jul 29, 2024 | feature selection, GPU | Code Available | 3 |
| Theia: Distilling Diverse Vision Foundation Models for Robot Learning | Jul 29, 2024 | | Code Available | 3 |
| OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale | Jul 29, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | Jul 26, 2024 | Benchmarking, Code Generation | Code Available | 3 |
| Keypoint Promptable Re-Identification | Jul 25, 2024 | Metric Learning, Occluded Person Re-Identification | Code Available | 3 |
| Harnessing Temporal Causality for Advanced Temporal Action Detection | Jul 25, 2024 | Action Detection, Action Recognition | Code Available | 3 |
| LION: Linear Group RNN for 3D Object Detection in Point Clouds | Jul 25, 2024 | 3D Object Detection, Long-range modeling | Code Available | 3 |
| EAFormer: Scene Text Segmentation with Edge-Aware Transformers | Jul 24, 2024 | Decoder, Segmentation | Code Available | 3 |
| Sentiment Reasoning for Healthcare | Jul 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 |
| HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Jul 24, 2024 | Benchmarking, Human Animation | Code Available | 3 |
| Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model | Jul 24, 2024 | Image Inpainting, Object | Code Available | 3 |
| 3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities | Jul 24, 2024 | 3DGS, Survey | Code Available | 3 |
| AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking | Jul 23, 2024 | Benchmarking, Transfer Learning | Code Available | 3 |
| Pareto Front Approximation for Multi-Objective Session-Based Recommender Systems | Jul 23, 2024 | Recommendation Systems | Code Available | 3 |
| Reinforcement Learning Meets Visual Odometry | Jul 22, 2024 | Decision Making, reinforcement-learning | Code Available | 3 |
| AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection | Jul 22, 2024 | Anomaly Detection, Language Modeling | Code Available | 3 |
| Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models | Jul 22, 2024 | Image Animation | Code Available | 3 |
| Odyssey: Empowering Minecraft Agents with Open-World Skills | Jul 22, 2024 | Language Modelling, Large Language Model | Code Available | 3 |
| SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Jul 22, 2024 | Language Modeling | Code Available | 3 |
| PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements | Jul 22, 2024 | Chatbot | Code Available | 3 |
| TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | Jul 22, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| LLMmap: Fingerprinting For Large Language Models | Jul 22, 2024 | RAG | Code Available | 3 |
| vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving | Jul 22, 2024 | CPU, GPU | Code Available | 3 |
| Local All-Pair Correspondence for Point Tracking | Jul 22, 2024 | All, Point Tracking | Code Available | 3 |
| Compact Language Models via Pruning and Knowledge Distillation | Jul 19, 2024 | Knowledge Distillation, Language Modeling | Code Available | 3 |