| OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning | Dec 31, 2024 | Benchmarking, Logical Reasoning | Code Available | 4 |
| VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling | Dec 31, 2024 | Memorization | Code Available | 4 |
| Training Software Engineering Agents and Verifiers with SWE-Gym | Dec 30, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | Dec 30, 2024 | Audio Generation, GPU | Code Available | 4 |
| MINIMA: Modality Invariant Image Matching | Dec 27, 2024 | | Code Available | 4 |
| The Thousand Brains Project: A New Paradigm for Sensorimotor Intelligence | Dec 24, 2024 | Continual Learning | Code Available | 4 |
| Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders | Dec 23, 2024 | 3D Shape Modeling, Benchmarking | Code Available | 4 |
| LLM4AD: A Platform for Algorithm Design with Large Language Model | Dec 23, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving | Dec 19, 2024 | Autonomous Driving | Code Available | 4 |
| Dimension Reduction with Locally Adjusted Graphs | Dec 19, 2024 | Dimensionality Reduction | Code Available | 4 |
| Human-Humanoid Robots Cross-Embodiment Behavior-Skill Transfer Using Decomposed Adversarial Learning from Demonstration | Dec 19, 2024 | Human-Object Interaction Detection, motion retargeting | Code Available | 4 |
| Autoregressive Video Generation without Vector Quantization | Dec 18, 2024 | Image Generation, Prediction | Code Available | 4 |
| Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | Dec 18, 2024 | Question Answering, Spatial Reasoning | Code Available | 4 |
| SocialED: A Python Library for Social Event Detection | Dec 18, 2024 | CPU, Event Detection | Code Available | 4 |
| Neural general circulation models optimized to predict satellite-based precipitation observations | Dec 16, 2024 | | Code Available | 4 |
| SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator | Dec 16, 2024 | GSM8K, Language Modeling | Code Available | 4 |
| DisCo-DSO: Coupling Discrete and Continuous Optimization for Efficient Generative Design in Hybrid Spaces | Dec 15, 2024 | Symbolic Regression | Code Available | 4 |
| Towards Effective, Efficient and Unsupervised Social Event Detection in the Hyperbolic Space | Dec 14, 2024 | Event Detection | Code Available | 4 |
| Hidden Biases of End-to-End Driving Datasets | Dec 12, 2024 | Bench2Drive, CARLA Leaderboard 2.0 | Code Available | 4 |
| Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders | Dec 12, 2024 | Gaze Target Estimation | Code Available | 4 |
| MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning | Dec 12, 2024 | class-incremental learning, Class Incremental Learning | Code Available | 4 |
| Video Seal: Open and Efficient Video Watermarking | Dec 12, 2024 | Video Compression, Video Editing | Code Available | 4 |
| FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models | Dec 11, 2024 | | Code Available | 4 |
| SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints | Dec 10, 2024 | 4D reconstruction, Video Generation | Code Available | 4 |
| SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models | Dec 10, 2024 | Action Recognition, Spatial Reasoning | Code Available | 4 |
| MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds | Dec 9, 2024 | Camera Calibration, Camera Pose Estimation | Code Available | 4 |
| Gated Delta Networks: Improving Mamba2 with Delta Rule | Dec 9, 2024 | Common Sense Reasoning, Language Modeling | Code Available | 4 |
| You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale | Dec 9, 2024 | 3D Generation, 3D geometry | Code Available | 4 |
| Fully Open Source Moxin-7B Technical Report | Dec 8, 2024 | | Code Available | 4 |
| LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods | Dec 7, 2024 | | Code Available | 4 |
| UniScene: Unified Occupancy-centric Driving Scene Generation | Dec 6, 2024 | Autonomous Driving, Scene Generation | Code Available | 4 |
| Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering | Dec 5, 2024 | Novel View Synthesis | Code Available | 4 |
| Liquid: Language Models are Scalable Multi-modal Generators | Dec 5, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction | Dec 5, 2024 | 3D Semantic Occupancy Prediction, Autonomous Driving | Code Available | 4 |
| Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise | Dec 5, 2024 | Denoising, Image Restoration | Code Available | 4 |
| Best-of-N Jailbreaking | Dec 4, 2024 | | Code Available | 4 |
| Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach | Dec 4, 2024 | Image Super-Resolution, Super-Resolution | Code Available | 4 |
| Weighted-Reward Preference Optimization for Implicit Model Fusion | Dec 4, 2024 | model | Code Available | 4 |
| Navigation World Models | Dec 4, 2024 | Robot Navigation, Video Generation | Code Available | 4 |
| Taming Scalable Visual Tokenizer for Autoregressive Image Generation | Dec 3, 2024 | Image Generation, Image Reconstruction | Code Available | 4 |
| HaGRIDv2: 1M Images for Static and Dynamic Hand Gesture Recognition | Dec 2, 2024 | Gesture Recognition, Hand Detection | Code Available | 4 |
| FullStack Bench: Evaluating LLMs as Full Stack Coders | Nov 30, 2024 | | Code Available | 4 |
| FLARE: Toward Universal Dataset Purification against Backdoor Attacks | Nov 29, 2024 | All | Code Available | 4 |
| Multimodal Whole Slide Foundation Model for Pathology | Nov 29, 2024 | Cross-Modal Retrieval, model | Code Available | 4 |
| AGS-Mesh: Adaptive Gaussian Splatting and Meshing with Geometric Priors for Indoor Room Reconstruction Using Smartphones | Nov 28, 2024 | 3D Reconstruction, Novel View Synthesis | Code Available | 4 |
| sbi reloaded: a toolkit for simulation-based inference workflows | Nov 26, 2024 | Bayesian Inference, Diagnostic | Code Available | 4 |
| Identity-Preserving Text-to-Video Generation by Frequency Decomposition | Nov 26, 2024 | Human-Domain Subject-to-Video, Image to Video Generation | Code Available | 4 |
| One Diffusion to Generate Them All | Nov 25, 2024 | All, Camera Pose Estimation | Code Available | 4 |
| Parameter Efficient Instruction Tuning: An Empirical Study | Nov 25, 2024 | Instruction Following, Memorization | Code Available | 4 |
| From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge | Nov 25, 2024 | | Code Available | 4 |