| How secure is AI-generated Code: A Large-Scale Comparison of Large Language Models | Apr 29, 2024 | Code Generation | Code Available | 2 |
| InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions | Apr 12, 2023 | Denoising, Motion Generation | Code Available | 2 |
| Where a Strong Backbone Meets Strong Features -- ActionFormer for Ego4D Moment Queries Challenge | Nov 16, 2022 | Action Localization, Moment Queries | Code Available | 2 |
| Reconstructing People, Places, and Cameras | Dec 23, 2024 | Camera Pose Estimation, Pose Estimation | Code Available | 2 |
| HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes | Sep 30, 2024 | Object, object-detection | Code Available | 2 |
| Deep Unrestricted Document Image Rectification | Apr 18, 2023 | Local Distortion | Code Available | 2 |
| Compositional Flows for 3D Molecule and Synthesis Pathway Co-design | Apr 10, 2025 | Drug Design | Code Available | 2 |
| One Thousand and One Pairs: A "novel" challenge for long-context language models | Jun 24, 2024 | Retrieval, Sentence | Code Available | 2 |
| Detecting CSV File Dialects by Table Uniformity Measurement and Data Type Inference | Feb 15, 2024 | CSV dialect detection | Code Available | 2 |
| Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response | Dec 18, 2023 | Contrastive Learning | Code Available | 2 |
| EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance | Sep 2, 2024 | AudioCaps, Audio captioning | Code Available | 2 |
| GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots | Sep 12, 2022 | | Code Available | 2 |
| Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks | Oct 2, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective | Feb 22, 2024 | Hallucination, Sentence | Code Available | 2 |
| Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text | Apr 14, 2023 | Few-Shot Learning | Code Available | 2 |
| ProbPose: A Probabilistic Approach to 2D Human Pose Estimation | Dec 3, 2024 | 2D Human Pose Estimation, Data Augmentation | Code Available | 2 |
| CARD: Classification and Regression Diffusion Models | Jun 15, 2022 | Classification, Denoising | Code Available | 2 |
| DeTPP: Leveraging Object Detection for Robust Long-Horizon Event Prediction | Aug 23, 2024 | Diversity, Point Processes | Code Available | 2 |
| TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning | Jun 21, 2024 | Fairness, Geographic Question Answering | Code Available | 2 |
| OrthoPlanes: A Novel Representation for Better 3D-Awareness of GANs | Sep 27, 2023 | | Code Available | 2 |
| I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | May 25, 2025 | Mixture-of-Experts, multimodal interaction | Code Available | 2 |
| Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution | Dec 3, 2022 | Box-supervised Instance Segmentation, Decoder | Code Available | 2 |
| GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents | May 29, 2025 | | Code Available | 2 |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Jul 28, 2023 | Object, Question Answering | Code Available | 2 |
| OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction | Mar 15, 2022 | 3D Reconstruction, Graph Neural Network | Code Available | 2 |
| Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset | Jul 3, 2023 | Human Mesh Recovery, Motion Generation | Code Available | 2 |
| Routoo: Learning to Route to Large Language Models Effectively | Jan 25, 2024 | MMLU, Multi-task Language Understanding | Code Available | 2 |
| Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization | Aug 18, 2023 | | Code Available | 2 |
| Objects as Points | Apr 16, 2019 | 3D Object Detection, Keypoint Detection | Code Available | 2 |
| Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | Apr 11, 2024 | Decoder, Dense Video Captioning | Code Available | 2 |
| CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities | Mar 21, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation | Mar 17, 2025 | Data Interaction, Scene Understanding | Code Available | 2 |
| Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models | Apr 6, 2024 | Image Generation, Unconditional Image Generation | Code Available | 2 |
| Dataset Regeneration for Sequential Recommendation | May 28, 2024 | Recommendation Systems, Sequential Recommendation | Code Available | 2 |
| CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models | Jun 10, 2024 | Fairness | Code Available | 2 |
| M^2SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation | Mar 20, 2023 | Computed Tomography (CT), Decoder | Code Available | 2 |
| VQF: Highly Accurate IMU Orientation Estimation with Bias Estimation and Magnetic Disturbance Rejection | Mar 31, 2022 | | Code Available | 2 |
| REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | Mar 4, 2024 | Benchmarking | Code Available | 2 |
| SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization | Dec 20, 2022 | Dialogue Generation, Language Modeling | Code Available | 2 |
| Context-Aware Video Instance Segmentation | Jul 3, 2024 | Instance Segmentation, Panoptic Segmentation | Code Available | 2 |
| Benchmarking Graph Neural Networks | Mar 2, 2020 | Benchmarking, Graph Classification | Code Available | 2 |
| PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images | Jul 13, 2022 | 3D human pose and shape estimation, 3D Human Pose Estimation | Code Available | 2 |
| Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering | Nov 26, 2024 | Prognosis, Question Answering | Code Available | 2 |
| MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control | Mar 18, 2024 | Instruction Following, Minecraft | Code Available | 2 |
| DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | May 31, 2024 | cross-modal alignment, Visual Localization | Code Available | 2 |
| PerCo (SD): Open Perceptual Compression | Sep 30, 2024 | Attribute, Image Compression | Code Available | 2 |
| Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture | Mar 12, 2024 | Motion Magnification, Representation Learning | Code Available | 2 |
| MINERVA: Evaluating Complex Video Reasoning | May 1, 2025 | Benchmarking, Temporal Localization | Code Available | 2 |
| RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning | Apr 9, 2023 | Benchmarking, Deep Reinforcement Learning | Code Available | 2 |
| Universal Guidance for Diffusion Models | Feb 14, 2023 | Face Recognition, object-detection | Code Available | 2 |