| Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting | Apr 7, 2025 | Boundary Detection, Object | Code Available | 2 |
| VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation | Apr 5, 2025 | | Code Available | 2 |
| Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Mar 21, 2025 | GSM8K, Question Answering | Code Available | 2 |
| LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models | Apr 14, 2025 | Equation Discovery, Memorization | Code Available | 2 |
| Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images | Apr 13, 2025 | GPU | Code Available | 2 |
| GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing | Mar 13, 2024 | 3DGS | Code Available | 2 |
| An All-Atom Generative Model for Designing Protein Complexes | Apr 17, 2025 | All | Code Available | 2 |
| No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves | May 5, 2025 | Image Generation, Representation Learning | Code Available | 2 |
| EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones | Nov 17, 2022 | Data Augmentation, Self-Supervised Learning | Code Available | 2 |
| Multimodal Automated Fact-Checking: A Survey | May 22, 2023 | Fact Checking, Misinformation | Code Available | 2 |
| HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention | Feb 22, 2024 | Image Inpainting, speech-recognition | Code Available | 2 |
| Synthetic Tumors Make AI Segment Tumors Better | Oct 26, 2022 | Tumor Segmentation | Code Available | 2 |
| Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models | Nov 20, 2022 | Story Continuation, Story Visualization | Code Available | 2 |
| Generative Diffusion Models on Graphs: Methods and Applications | Feb 6, 2023 | Denoising, Graph Generation | Code Available | 2 |
| MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections | Mar 16, 2024 | 3D Reconstruction, Denoising | Code Available | 2 |
| BigSmall: Efficient Multi-Task Learning for Disparate Spatial and Temporal Physiological Measurements | Mar 21, 2023 | Multi-Task Learning | Code Available | 2 |
| ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation | Apr 7, 2023 | 3D Reconstruction, Homography Estimation | Code Available | 2 |
| UCTB: An Urban Computing Tool Box for Building Spatiotemporal Prediction Services | Jun 7, 2023 | Diversity | Code Available | 2 |
| Break-A-Scene: Extracting Multiple Concepts from a Single Image | May 25, 2023 | Complex Scene Breaking and Synthesis | Code Available | 2 |
| Spectrum: Targeted Training on Signal to Noise Ratio | Jun 7, 2024 | GPU | Code Available | 2 |
| Exploiting Scale-Variant Attention for Segmenting Small Medical Objects | Jul 10, 2024 | Cell Segmentation, MRI segmentation | Code Available | 2 |
| FB-BEV: BEV Representation from Forward-Backward View Transformations | Aug 4, 2023 | | Code Available | 2 |
| GPT-Driver: Learning to Drive with GPT | Oct 2, 2023 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 |
| Personalizing Text-to-Image Generation via Aesthetic Gradients | Sep 25, 2022 | Image Generation, Text to Image Generation | Code Available | 2 |
| LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents | Nov 9, 2023 | Instruction Following, LLM real-life tasks | Code Available | 2 |
| Evolving Reservoirs for Meta Reinforcement Learning | Dec 9, 2023 | Meta Reinforcement Learning, reinforcement-learning | Code Available | 2 |
| Transformers are Multi-State RNNs | Jan 11, 2024 | Decoder | Code Available | 2 |
| PointOBB-v3: Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection | Jan 23, 2025 | object-detection, Object Detection | Code Available | 2 |
| TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Feb 27, 2024 | Contrastive Learning, Hallucination | Code Available | 2 |
| In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation | Mar 3, 2024 | Hallucination, TruthfulQA | Code Available | 2 |
| LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement | Mar 22, 2024 | Data Augmentation, GSM8K | Code Available | 2 |
| Transfer CLIP for Generalizable Image Denoising | Mar 22, 2024 | Decoder, Denoising | Code Available | 2 |
| Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization | Apr 23, 2024 | global-optimization, Optical Flow Estimation | Code Available | 2 |
| Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields | Dec 6, 2023 | 3DGS, 3D scene Editing | Code Available | 2 |
| CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | Jul 5, 2022 | Code Generation, Decoder | Code Available | 2 |
| H-Watch: An Open, Connected Platform for AI-Enhanced COVID19 Infection Symptoms Monitoring and Contact Tracing | Jul 31, 2024 | | Code Available | 2 |
| Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos | Aug 26, 2024 | Large Language Model, MVBench | Code Available | 2 |
| From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection | Sep 26, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models | Oct 4, 2024 | Dense Video Captioning, Sentence | Code Available | 2 |
| Trajectory Flow Matching with Applications to Clinical Time Series Modeling | Oct 28, 2024 | Time Series | Code Available | 2 |
| StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Nov 11, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 2 |
| Semantic-Conditional Diffusion Networks for Image Captioning | Dec 6, 2022 | Cross-Modal Retrieval, Decoder | Code Available | 2 |
| LiSenNet: Lightweight Sub-band and Dual-Path Modeling for Real-Time Speech Enhancement | Sep 20, 2024 | Speech Enhancement | Code Available | 2 |
| LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation | Feb 27, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| An Introduction to Neural Data Compression | Feb 14, 2022 | BIG-bench Machine Learning, Data Compression | Code Available | 2 |
| Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment | Feb 17, 2025 | Hallucination, Logical Reasoning | Code Available | 2 |
| Real Time Speech Enhancement in the Waveform Domain | Jun 23, 2020 | CPU, Data Augmentation | Code Available | 2 |
| RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction | Jul 7, 2025 | | Code Available | 2 |
| Mechanistic Design and Scaling of Hybrid Architectures | Mar 26, 2024 | Mamba | Code Available | 2 |
| CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments | Nov 4, 2024 | | Code Available | 2 |