| Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model | Apr 23, 2024 | 3D Point Cloud Classification, Mamba | Code Available | 2 |
| A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Jun 11, 2024 | Decoder, Simultaneous Speech-to-Speech Translation | Code Available | 2 |
| Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Jun 17, 2024 | Benchmarking | Code Available | 2 |
| VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation | Jul 4, 2024 | | Code Available | 2 |
| HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance | Jul 9, 2024 | Benchmarking, Conditional Image Generation | Code Available | 2 |
| Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models | Aug 4, 2024 | | Code Available | 2 |
| RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images | Aug 27, 2024 | | Code Available | 2 |
| MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale | Aug 29, 2024 | Deep Reinforcement Learning, Imitation Learning | Code Available | 2 |
| VAE Explainer: Supplement Learning Variational Autoencoders with Interactive Visualization | Sep 13, 2024 | Math | Code Available | 2 |
| Compositional Video Generation as Flow Equalization | Jun 10, 2024 | Video Editing, Video Generation | Code Available | 2 |
| Stage-Wise Reward Shaping for Acrobatic Robots: A Constrained Multi-Objective Reinforcement Learning Approach | Sep 24, 2024 | Multi-Objective Reinforcement Learning, Reinforcement Learning (RL) | Code Available | 2 |
| Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Sep 19, 2024 | Computational Efficiency, GSM8K | Code Available | 2 |
| PGN: The RNN's New Successor is Effective for Long-Range Time Series Forecasting | Sep 26, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation | Sep 29, 2024 | Image Enhancement | Code Available | 2 |
| GraphRouter: A Graph-based Router for LLM Selections | Oct 4, 2024 | Transductive Learning | Code Available | 2 |
| Learning to Optimize for Mixed-Integer Non-linear Programming with Feasibility Guarantees | Oct 14, 2024 | | Code Available | 2 |
| Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting | Oct 9, 2024 | Surface Reconstruction | Code Available | 2 |
| PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles | Oct 22, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Extended Mind Transformers | Jun 4, 2024 | Common Sense Reasoning, counterfactual | Code Available | 2 |
| Combining Induction and Transduction for Abstract Reasoning | Nov 4, 2024 | ARC, Program Synthesis | Code Available | 2 |
| Adaptive Length Image Tokenization via Recurrent Allocation | Nov 4, 2024 | Decoder | Code Available | 2 |
| MC-LLaVA: Multi-Concept Personalized Vision-Language Model | Nov 18, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Are large language models superhuman chemists? | Apr 1, 2024 | Benchmarking | Code Available | 2 |
| OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection | Nov 26, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| Monet: Mixture of Monosemantic Experts for Transformers | Dec 5, 2024 | Dictionary Learning, Mixture-of-Experts | Code Available | 2 |
| MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization | Dec 9, 2024 | Visual Question Answering (VQA) | Code Available | 2 |
| Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity | Dec 9, 2024 | Anomaly Detection, text annotation | Code Available | 2 |
| FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning | Dec 16, 2024 | DeepFake Detection, diffusion-generated faces detection | Code Available | 2 |
| OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System | Dec 28, 2024 | | Code Available | 2 |
| R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization | Jan 2, 2025 | Data Augmentation, Visual Localization | Code Available | 2 |
| DiffGraph: Heterogeneous Graph Diffusion Model | Jan 4, 2025 | Denoising, Graph Generation | Code Available | 2 |
| FlexCloud: Direct, Modular Georeferencing and Drift-Correction of Point Cloud Maps | Feb 1, 2025 | Autonomous Driving, motion prediction | Code Available | 2 |
| The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering | Feb 5, 2025 | Hallucination | Code Available | 2 |
| SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL | Feb 17, 2025 | Few-Shot Learning, Heuristic Search | Code Available | 2 |
| VaViM and VaVAM: Autonomous Driving through Video Generative Modeling | Feb 21, 2025 | Autonomous Driving, Imitation Learning | Code Available | 2 |
| Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection | Apr 9, 2025 | Contrastive Learning, counterfactual | Code Available | 2 |
| A Survey on Industrial Anomalies Synthesis | Feb 23, 2025 | Survey | Code Available | 2 |
| Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think | Feb 27, 2025 | Image Generation, Text to Image Generation | Code Available | 2 |
| InsTaG: Learning Personalized 3D Talking Head from Few-Second Video | Feb 27, 2025 | 3DGS, Talking Head Generation | Code Available | 2 |
| AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning | Mar 3, 2025 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| WritingBench: A Comprehensive Benchmark for Generative Writing | Mar 7, 2025 | Text Generation | Code Available | 2 |
| SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing | Mar 18, 2025 | Denoising, Motion Generation | Code Available | 2 |
| Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling | Jan 20, 2025 | Imitation Learning, Language Modeling | Code Available | 2 |
| MegaMath: Pushing the Limits of Open Math Corpora | Apr 3, 2025 | Diversity, Math | Code Available | 2 |
| POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction | Apr 8, 2025 | 3D Reconstruction, Depth Estimation | Code Available | 2 |
| Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation | May 16, 2025 | 3D geometry, Navigate | Code Available | 2 |
| GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning | May 16, 2025 | Data Augmentation | Code Available | 2 |
| μPC: Scaling Predictive Coding to 100+ Layer Networks | May 19, 2025 | | Code Available | 2 |
| VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank | May 20, 2025 | Image Generation, Image Quality Assessment | Code Available | 2 |
| CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features | May 26, 2025 | | Code Available | 2 |