| Mixed-Curvature Decision Trees and Random Forests | Jun 7, 2024 | | Code Available | 2 |
| SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion | Sep 26, 2024 | Descriptive, Generalized Referring Expression Comprehension | Code Available | 2 |
| GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Nov 16, 2024 | Instruction Following, Language Modeling | Code Available | 2 |
| RecFlow: An Industrial Full Flow Recommendation Dataset | Oct 28, 2024 | Recommendation Systems, Selection bias | Code Available | 2 |
| LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Mar 11, 2025 | GPU, Image Generation | Code Available | 2 |
| ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware | Dec 2, 2018 | GPU, Image Classification | Code Available | 2 |
| PerAct2: Benchmarking and Learning for Robotic Bimanual Manipulation Tasks | Jun 29, 2024 | Diversity | Code Available | 2 |
| GPQA: A Graduate-Level Google-Proof Q&A Benchmark | Nov 20, 2023 | Multiple-choice | Code Available | 2 |
| PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Dec 20, 2024 | Video Understanding | Code Available | 2 |
| Voice Conversion With Just Nearest Neighbors | May 30, 2023 | Voice Conversion | Code Available | 2 |
| Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers | Mar 5, 2022 | Semantic Segmentation, Weakly supervised Semantic Segmentation | Code Available | 2 |
| DreamLLM: Synergistic Multimodal Comprehension and Creation | Sep 20, 2023 | multimodal generation, Visual Question Answering | Code Available | 2 |
| On-Device Domain Generalization | Sep 15, 2022 | Data Augmentation, Domain Generalization | Code Available | 2 |
| Dynamic Early Exit in Reasoning Models | Apr 22, 2025 | GSM8K, Math | Code Available | 2 |
| Medical Vision Generalist: Unifying Medical Imaging Tasks in Context | Jun 8, 2024 | Conditional Image Generation, Denoising | Code Available | 2 |
| AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark | Dec 17, 2024 | Information Retrieval, Retrieval | Code Available | 2 |
| Revisiting Adversarial Training under Long-Tailed Distributions | Mar 15, 2024 | Adversarial Defense, Data Augmentation | Code Available | 2 |
| Many-Shot In-Context Learning in Multimodal Foundation Models | May 16, 2024 | image-classification, Image Classification | Code Available | 2 |
| Towards Unified Keyframe Propagation Models | May 19, 2022 | Image Inpainting, Video Editing | Code Available | 2 |
| Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks | Mar 27, 2025 | Imitation Learning, Mathematical Reasoning | Code Available | 2 |
| OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents | Jun 17, 2025 | | Code Available | 2 |
| A Versatile Framework for Multi-scene Person Re-identification | Mar 17, 2024 | Data Augmentation, Person Re-Identification | Code Available | 2 |
| Measuring Massive Multitask Language Understanding | Sep 7, 2020 | Elementary Mathematics, Multi-task Language Understanding | Code Available | 2 |
| CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games | Mar 12, 2025 | Decision Making, Vision-Language-Action | Code Available | 2 |
| Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning | Mar 20, 2025 | Classification, Few-Shot Learning | Code Available | 2 |
| Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer | Dec 1, 2021 | | Code Available | 2 |
| YOLO-UniOW: Efficient Universal Open-World Object Detection | Dec 30, 2024 | Incremental Learning, Object | Code Available | 2 |
| Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction | Aug 26, 2022 | Surface Reconstruction | Code Available | 2 |
| DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems | Feb 6, 2024 | | Code Available | 2 |
| CLRerNet: Improving Confidence of Lane Detection with LaneIoU | May 15, 2023 | Autonomous Driving, Lane Detection | Code Available | 2 |
| Do we actually understand the impact of renewables on electricity prices? A causal inference approach | Jan 10, 2025 | Causal Inference | Code Available | 2 |
| Transformer Circuit Faithfulness Metrics are not Robust | Jul 11, 2024 | | Code Available | 2 |
| Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement | May 6, 2024 | Computational Efficiency, Deep Learning | Code Available | 2 |
| COVID-19 Image Data Collection: Prospective Predictions Are the Future | Jun 22, 2020 | Management | Code Available | 2 |
| BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages | Feb 17, 2025 | Emotion Recognition | Code Available | 2 |
| Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Jan 25, 2025 | Attribute, Contrastive Learning | Code Available | 2 |
| Source-free Subject Adaptation for EEG-based Visual Recognition | Jan 20, 2023 | EEG, Electroencephalogram (EEG) | Code Available | 2 |
| HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States | Feb 20, 2025 | | Code Available | 2 |
| Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy | Oct 13, 2024 | Denoising, Prediction | Code Available | 2 |
| LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation | Mar 30, 2023 | Image Generation, Layout-to-Image Generation | Code Available | 2 |
| CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Oct 2, 2023 | image-classification, Image Classification | Code Available | 2 |
| Order Constraints in Optimal Transport | Oct 14, 2021 | Natural Language Inference | Code Available | 2 |
| Real-time Scene Text Detection with Differentiable Binarization | Nov 20, 2019 | Binarization, Optical Character Recognition (OCR) | Code Available | 2 |
| An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Oct 22, 2020 | image-classification, Semantic Segmentation | Code Available | 2 |
| VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment | Jan 3, 2025 | Computational Efficiency, Scene Understanding | Code Available | 2 |
| Hopular: Modern Hopfield Networks for Tabular Data | Jun 1, 2022 | Deep Learning, General Classification | Code Available | 2 |
| TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes | Mar 28, 2024 | 3D dense captioning, Dense Captioning | Code Available | 2 |
| Improving the Training of Rectified Flows | May 30, 2024 | Image Generation, Knowledge Distillation | Code Available | 2 |
| A Systematic Study of Joint Representation Learning on Protein Sequences and Structures | Mar 11, 2023 | Contrastive Learning, Protein Function Prediction | Code Available | 2 |
| Evaluating the Performance of Large Language Models on GAOKAO Benchmark | May 21, 2023 | | Code Available | 2 |