| GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting | Dec 28, 2024 | Camera Localization, Pose Estimation | Code Available | 2 |
| Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion | Jun 5, 2024 | 3D Generation, 3D Reconstruction | Code Available | 2 |
| SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models | Aug 31, 2023 | Decoder, Language Modeling | Code Available | 2 |
| Video Polyp Segmentation: A Deep Learning Perspective | Mar 27, 2022 | Attribute, Deep Learning | Code Available | 2 |
| P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds | Jul 7, 2024 | 3D Single Object Tracking, GPU | Code Available | 2 |
| PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models | May 15, 2024 | Benchmarking | Code Available | 2 |
| mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics | Jul 20, 2024 | | Code Available | 2 |
| BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning | Feb 27, 2024 | Drug Discovery, Forward reaction prediction | Code Available | 2 |
| Melting Pot 2.0 | Nov 24, 2022 | Artificial Life, Navigate | Code Available | 2 |
| A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation | Feb 29, 2024 | Anomaly Detection, Decoder | Code Available | 2 |
| SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation | Feb 12, 2025 | Earth Observation, object-detection | Code Available | 2 |
| SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama | Aug 18, 2024 | Script Generation, Video Captioning | Code Available | 2 |
| GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization | Sep 24, 2024 | 3D geometry, 3DGS | Code Available | 2 |
| TraDiffusion: Trajectory-Based Training-Free Image Generation | Aug 19, 2024 | Image Generation | Code Available | 2 |
| Mephisto: A Framework for Portable, Reproducible, and Iterative Crowdsourcing | Jan 12, 2023 | | Code Available | 2 |
| Attention-based Deep Multiple Instance Learning | Feb 13, 2018 | Aerial Scene Classification, Multiple Instance Learning | Code Available | 2 |
| Interacting Attention Graph for Single Image Two-Hand Reconstruction | Mar 17, 2022 | 3D Interacting Hand Pose Estimation, Vocal Bursts Valence Prediction | Code Available | 2 |
| Frequency-domain MLPs are More Effective Learners in Time Series Forecasting | Nov 10, 2023 | Time Series, Time Series Forecasting | Code Available | 2 |
| REALY: Rethinking the Evaluation of 3D Face Reconstruction | Mar 18, 2022 | 3D Face Reconstruction, Face Reconstruction | Code Available | 2 |
| SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training | Jan 25, 2022 | Denoising, Representation Learning | Code Available | 2 |
| Does Image Anonymization Impact Computer Vision Training? | Jun 8, 2023 | Face Anonymization, Instance Segmentation | Code Available | 2 |
| NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario | May 24, 2023 | Autonomous Driving, Question Answering | Code Available | 2 |
| In-Context Imitation Learning via Next-Token Prediction | Aug 28, 2024 | Imitation Learning, Prediction | Code Available | 2 |
| A Hybrid Transformer-Mamba Network for Single Image Deraining | Aug 31, 2024 | Mamba, Rain Removal | Code Available | 2 |
| Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | Apr 8, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| LViT: Language meets Vision Transformer in Medical Image Segmentation | Jun 29, 2022 | Image Segmentation, Medical Image Segmentation | Code Available | 2 |
| gRNAde: Geometric Deep Learning for 3D RNA inverse design | May 24, 2023 | 3D geometry, Deep Learning | Code Available | 2 |
| Uni3D: Exploring Unified 3D Representation at Scale | Oct 10, 2023 | 3D Object Classification, Retrieval | Code Available | 2 |
| OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting | Jan 23, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation | Jul 17, 2024 | Anomaly Detection, Anomaly Segmentation | Code Available | 2 |
| Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Apr 21, 2025 | All, Form | Code Available | 2 |
| Cross-Prediction-Powered Inference | Sep 28, 2023 | Decision Making, Missing Labels | Code Available | 2 |
| LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation | Apr 12, 2024 | | Code Available | 2 |
| FedCLIP: Fast Generalization and Personalization for CLIP in Federated Learning | Feb 27, 2023 | Federated Learning, Privacy Preserving | Code Available | 2 |
| Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis | Jul 24, 2022 | 3D geometry, NeRF | Code Available | 2 |
| VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors | Jul 3, 2024 | Neural Rendering | Code Available | 2 |
| Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Oct 10, 2024 | GSM8K, Math | Code Available | 2 |
| Exploring CLIP for Assessing the Look and Feel of Images | Jul 25, 2022 | Image Quality Assessment, No-Reference Image Quality Assessment | Code Available | 2 |
| Visual Perception by Large Language Model's Weights | May 30, 2024 | | Code Available | 2 |
| MCP-Solver: Integrating Language Models with Constraint Programming Systems | Dec 31, 2024 | Natural Language Understanding | Code Available | 2 |
| SegNet4D: Efficient Instance-Aware 4D Semantic Segmentation for LiDAR Point Cloud | Jun 24, 2024 | Autonomous Driving, Autonomous Navigation | Code Available | 2 |
| Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation | Nov 20, 2023 | 3D Human Pose Estimation, Pose Estimation | Code Available | 2 |
| Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing | Apr 3, 2025 | Benchmarking, Logical Reasoning | Code Available | 2 |
| Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | Oct 10, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| CMB: A Comprehensive Medical Benchmark in Chinese | Aug 17, 2023 | | Code Available | 2 |
| Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Oct 2, 2024 | Motion Planning, Robot Manipulation | Code Available | 2 |
| StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding | Sep 20, 2023 | Chart Question Answering, Chart Understanding | Code Available | 2 |
| CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making | Jun 13, 2024 | Decision Making | Code Available | 2 |
| The P^3 dataset: Pixels, Points and Polygons for Multimodal Building Vectorization | May 21, 2025 | | Code Available | 2 |
| Protein Representation Learning by Geometric Structure Pretraining | Mar 11, 2022 | Contrastive Learning, Prediction | Code Available | 2 |