| TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Oct 25, 2024 | EgoSchema, Hallucination | Code Available | 2 | 5 |
| Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting | Dec 14, 2023 | | Code Available | 2 | 5 |
| HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition | May 16, 2024 | Contrastive Learning, Surgical phase recognition | Code Available | 2 | 5 |
| System 2 Attention (is something you might need too) | Nov 20, 2023 | Math | Code Available | 2 | 5 |
| BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions | Dec 16, 2024 | Knowledge Distillation, Motion Estimation | Code Available | 2 | 5 |
| ChangeCLIP: Remote sensing change detection with multimodal vision-language representation learning | Jan 4, 2024 | Change Detection, Decoder | Code Available | 2 | 5 |
| A Simple and Model-Free Path Filtering Algorithm for Smoothing and Accuracy | Jul 23, 2023 | Autonomous Driving, Denoising | Code Available | 2 | 5 |
| CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Oct 4, 2023 | 3D Object Detection, cross-modal alignment | Code Available | 2 | 5 |
| Uni-SMART: Universal Science Multimodal Analysis and Research Transformer | Mar 15, 2024 | Articles | Code Available | 2 | 5 |
| RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images and A Benchmark | Mar 21, 2025 | Data Augmentation | Code Available | 2 | 5 |
| FLOWR: Flow Matching for Structure-Aware De Novo, Interaction- and Fragment-Based Ligand Generation | Apr 14, 2025 | | Code Available | 2 | 5 |
| Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows | Mar 3, 2022 | Speech Synthesis, text-to-speech | Code Available | 2 | 5 |
| Refine3DNet: Scaling Precision in 3D Object Reconstruction from Multi-View RGB Images using Attention | Dec 1, 2024 | 3D Object Reconstruction, 3D Reconstruction | Code Available | 2 | 5 |
| Few-Shot Bearing Fault Diagnosis Via Ensembling Transformer-Based Model With Mahalanobis Distance Metric Learning From Multiscale Features | Mar 25, 2024 | Classification, Fault Diagnosis | Code Available | 2 | 5 |
| DGFont++: Robust Deformable Generative Networks for Unsupervised Font Generation | Dec 30, 2022 | Font Generation, Image-to-Image Translation | Code Available | 2 | 5 |
| YOLOv5-6D: Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries | Mar 22, 2024 | 6D Pose Estimation using RGB, GPU | Code Available | 2 | 5 |
| Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs | Feb 4, 2025 | Code Generation, Language Modeling | Code Available | 2 | 5 |
| Analysing the Residual Stream of Language Models Under Knowledge Conflicts | Oct 21, 2024 | | Code Available | 2 | 5 |
| JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework | Oct 11, 2024 | | Code Available | 2 | 5 |
| Hypergraph Neural Networks | Sep 25, 2018 | Object Recognition, Representation Learning | Code Available | 2 | 5 |
| Peeling Back the Layers: An In-Depth Evaluation of Encoder Architectures in Neural News Recommenders | Oct 2, 2024 | Model Selection, News Recommendation | Code Available | 2 | 5 |
| Efficient Non-stationary Online Learning by Wavelets with Applications to Online Distribution Shift Adaptation | Jul 21, 2024 | | Code Available | 2 | 5 |
| ViSpeak: Visual Instruction Feedback in Streaming Videos | Mar 17, 2025 | Streaming video understanding, Video Understanding | Code Available | 2 | 5 |
| SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery | Jul 17, 2022 | Land Cover Classification, Semantic Segmentation | Code Available | 2 | 5 |
| Self-Prompting Polyp Segmentation in Colonoscopy using Hybrid Yolo-SAM 2 Model | Sep 14, 2024 | Medical Image Segmentation, Polyp Segmentation | Code Available | 2 | 5 |
| Detection Transformer with Stable Matching | Apr 10, 2023 | Decoder, Position | Code Available | 2 | 5 |
| Chain-of-Thought Reasoning Without Prompting | Feb 15, 2024 | Prompt Engineering | Code Available | 2 | 5 |
| Domain Adaptation with a Single Vision-Language Embedding | Oct 28, 2024 | Domain Adaptation, One-shot Unsupervised Domain Adaptation | Code Available | 2 | 5 |
| An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval | Jun 13, 2024 | Contrastive Learning, Image Retrieval | Code Available | 2 | 5 |
| HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation | Apr 15, 2025 | Benchmarking, scientific discovery | Code Available | 2 | 5 |
| Prototype-based Cross-Modal Object Tracking | Dec 22, 2023 | Object, Object Tracking | Code Available | 2 | 5 |
| BatGPT: A Bidirectional Autoregessive Talker from Generative Pre-trained Transformer | Jul 1, 2023 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models | Aug 1, 2024 | | Code Available | 2 | 5 |
| 1st Place Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024 | Sep 28, 2024 | Position | Code Available | 2 | 5 |
| C^2LEVA: Toward Comprehensive and Contamination-Free Language Model Evaluation | Dec 6, 2024 | Language Model Evaluation, Language Modeling | Code Available | 2 | 5 |
| Region Rebalance for Long-Tailed Semantic Segmentation | Apr 5, 2022 | Segmentation, Semantic Segmentation | Code Available | 2 | 5 |
| NLLB-CLIP -- train performant multilingual image retrieval model on a budget | Sep 4, 2023 | Image Retrieval, Retrieval | Code Available | 2 | 5 |
| TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis | May 2, 2023 | Moment Retrieval, Motion Generation | Code Available | 2 | 5 |
| Gaussian Processes for Big Data | Sep 26, 2013 | Gaussian Processes, Variational Inference | Code Available | 2 | 5 |
| DetGPT: Detect What You Need via Reasoning | May 23, 2023 | Autonomous Driving, Object | Code Available | 2 | 5 |
| HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling | Aug 27, 2024 | Domain Generalization, Prompt Engineering | Code Available | 2 | 5 |
| GAIA: a benchmark for General AI Assistants | Nov 21, 2023 | Philosophy | Code Available | 2 | 5 |
| WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects | Feb 18, 2025 | Machine Translation | Code Available | 2 | 5 |
| Seeing through Satellite Images at Street Views | May 22, 2025 | | Code Available | 2 | 5 |
| Large Language Models are In-Context Molecule Learners | Mar 7, 2024 | Cross-Modal Retrieval, In-Context Learning | Code Available | 2 | 5 |
| Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models | Dec 19, 2023 | Denoising, Neural Architecture Search | Code Available | 2 | 5 |
| Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent | May 12, 2025 | RAG, Reinforcement Learning (RL) | Code Available | 2 | 5 |
| Deduplicating Training Data Mitigates Privacy Risks in Language Models | Feb 14, 2022 | | Code Available | 2 | 5 |
| RandAugment: Practical automated data augmentation with a reduced search space | Sep 30, 2019 | Data Augmentation, Domain Generalization | Code Available | 2 | 5 |
| Mamba-R: Vision Mamba ALSO Needs Registers | May 23, 2024 | Mamba, Semantic Segmentation | Code Available | 2 | 5 |