| Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme | Apr 3, 2025 | Reinforcement Learning (RL), Visual Reasoning | Code Available | 2 | 5 |
| Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets | Jun 5, 2025 | | Code Available | 2 | 5 |
| HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding | Apr 20, 2024 | cross-modal alignment, Visual Grounding | Code Available | 2 | 5 |
| D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security | Feb 15, 2025 | Task Planning | Code Available | 2 | 5 |
| Learning Diffusion Priors from Observations by Expectation Maximization | May 22, 2024 | | Code Available | 2 | 5 |
| Emotionally Enhanced Talking Face Generation | Mar 21, 2023 | Face Generation, Talking Face Generation | Code Available | 2 | 5 |
| Vision Transformer with Quadrangle Attention | Mar 27, 2023 | object-detection, Object Detection | Code Available | 2 | 5 |
| DM-NeRF: 3D Scene Geometry Decomposition and Manipulation from 2D Images | Aug 15, 2022 | NeRF, Object | Code Available | 2 | 5 |
| FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation | May 23, 2023 | Form, Language Modelling | Code Available | 2 | 5 |
| AbdomenAtlas-8K: Annotating 8,000 CT Volumes for Multi-Organ Segmentation in Three Weeks | May 16, 2023 | 8k, Active Learning | Code Available | 2 | 5 |
| SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning | Nov 15, 2024 | Image Quality Assessment, Language Modeling | Code Available | 2 | 5 |
| Towards Metrical Reconstruction of Human Faces | Apr 13, 2022 | 2k, 3D Face Reconstruction | Code Available | 2 | 5 |
| Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Nov 19, 2024 | Formal Logic, Logical Reasoning | Code Available | 2 | 5 |
| TorchAudio: Building Blocks for Audio and Speech Processing | Oct 28, 2021 | BIG-bench Machine Learning, GPU | Code Available | 2 | 5 |
| Deep Learning Accelerated Quantum Transport Simulations in Nanoelectronics: From Break Junctions to Field-Effect Transistors | Nov 13, 2024 | Computational Efficiency | Code Available | 2 | 5 |
| Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification | Sep 1, 2024 | Scene Classification, Transductive Zero-Shot Classification | Code Available | 2 | 5 |
| Software package for simulations using the coarse-grained CALVADOS model | Apr 14, 2025 | | Code Available | 2 | 5 |
| Multi-CPR: A Multi Domain Chinese Dataset for Passage Retrieval | Mar 7, 2022 | Information Retrieval, Passage Retrieval | Code Available | 2 | 5 |
| FourierGNN: Rethinking Multivariate Time Series Forecasting from a Pure Graph Perspective | Nov 10, 2023 | Graph Neural Network, Multivariate Time Series Forecasting | Code Available | 2 | 5 |
| Interactive4D: Interactive 4D LiDAR Segmentation | Oct 10, 2024 | Interactive Segmentation, Segmentation | Code Available | 2 | 5 |
| Prototypical Networks for Few-shot Learning | Mar 15, 2017 | Category-Agnostic Pose Estimation, Few-Shot Image Classification | Code Available | 2 | 5 |
| BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo | Sep 21, 2022 | 3D Object Detection, Depth Estimation | Code Available | 2 | 5 |
| Language is All a Graph Needs | Aug 14, 2023 | All, Graph Learning | Code Available | 2 | 5 |
| Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark | Apr 23, 2025 | | Code Available | 2 | 5 |
| GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians | Dec 4, 2023 | Motion Estimation | Code Available | 2 | 5 |