| SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation | Oct 17, 2024 | | Code Available | 2 |
| A Comparative Study on Reasoning Patterns of OpenAI's o1 Model | Oct 17, 2024 | Math | Code Available | 2 |
| RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards | Oct 17, 2024 | RAG, Retrieval | Code Available | 2 |
| LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch | Oct 17, 2024 | Code Generation, Combinatorial Optimization | Code Available | 2 |
| ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding | Oct 17, 2024 | 3D Semantic Segmentation, Image Generation | Code Available | 2 |
| PUMA: Empowering Unified MLLM with Multi-granular Visual Generation | Oct 17, 2024 | Diversity, Image Generation | Code Available | 2 |
| VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding | Oct 17, 2024 | 3D geometry, 3D visual grounding | Code Available | 2 |
| UniDrive: Towards Universal Driving Perception Across Camera Configurations | Oct 17, 2024 | Autonomous Driving | Code Available | 2 |
| CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models | Oct 17, 2024 | Contrastive Learning, Diversity | Code Available | 2 |
| Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation | Oct 17, 2024 | | Code Available | 2 |
| SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs | Oct 17, 2024 | | Code Available | 2 |
| On the Role of Attention Heads in Large Language Model Safety | Oct 17, 2024 | Attribute, Language Modeling | Code Available | 2 |
| SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction | Oct 17, 2024 | Quantization | Code Available | 2 |
| Local Off-Grid Weather Forecasting with Multi-Modal Earth Observation Data | Oct 16, 2024 | Decision Making, Earth Observation | Code Available | 2 |
| CATCH: Channel-Aware multivariate Time Series Anomaly Detection via Frequency Patching | Oct 16, 2024 | Anomaly Detection, Time Series | Code Available | 2 |
| Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective | Oct 16, 2024 | Conditional Image Generation, Image Generation | Code Available | 2 |
| Explanation-Preserving Augmentation for Semi-Supervised Graph Representation Learning | Oct 16, 2024 | Graph Classification, Graph Representation Learning | Code Available | 2 |
| JudgeBench: A Benchmark for Evaluating LLM-based Judges | Oct 16, 2024 | Math | Code Available | 2 |
| LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment | Oct 16, 2024 | Visual Localization | Code Available | 2 |
| Evaluating Morphological Compositional Generalization in Large Language Models | Oct 16, 2024 | Text Generation | Code Available | 2 |
| A Prompt-Based Knowledge Graph Foundation Model for Universal In-Context Reasoning | Oct 16, 2024 | In-Context Learning, Knowledge Graphs | Code Available | 2 |
| SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation | Oct 16, 2024 | Denoising, Video Generation | Code Available | 2 |
| GS^3: Efficient Relighting with Triple Gaussian Splatting | Oct 15, 2024 | GPU | Code Available | 2 |
| WeatherDG: LLM-assisted Diffusion Model for Procedural Weather Generation in Domain-Generalized Semantic Segmentation | Oct 15, 2024 | Autonomous Driving, Language Modeling | Code Available | 2 |
| Improving Long-Text Alignment for Text-to-Image Diffusion Models | Oct 15, 2024 | | Code Available | 2 |
| VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI | Oct 15, 2024 | Question Answering, Video Question Answering | Code Available | 2 |
| nvTorchCam: An Open-source Library for Camera-Agnostic Differentiable Geometric Vision | Oct 15, 2024 | Deep Learning, GPU | Code Available | 2 |
| Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement | Oct 15, 2024 | Disentanglement, Inductive Bias | Code Available | 2 |
| Process Reward Model with Q-Value Rankings | Oct 15, 2024 | Decision Making, Language Modeling | Code Available | 2 |
| MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation | Oct 15, 2024 | Hallucination, Language Modeling | Code Available | 2 |
| Open World Object Detection: A Survey | Oct 15, 2024 | Incremental Learning, Object | Code Available | 2 |
| It Takes Two to Tango: Directly Optimizing for Constrained Synthesizability in Generative Molecular Design | Oct 15, 2024 | Drug Discovery, reinforcement-learning | Code Available | 2 |
| Contrastive learning of cell state dynamics in response to perturbations | Oct 15, 2024 | Cell Tracking, Contrastive Learning | Code Available | 2 |
| MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding | Oct 15, 2024 | Visual Question Answering | Code Available | 2 |
| Multiview Scene Graph | Oct 15, 2024 | Decoder, Object | Code Available | 2 |
| MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models | Oct 15, 2024 | | Code Available | 2 |
| When Attention Sink Emerges in Language Models: An Empirical View | Oct 14, 2024 | Quantization | Code Available | 2 |
| A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration | Oct 14, 2024 | Point Cloud Registration | Code Available | 2 |
| Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models | Oct 14, 2024 | 3D geometry, Denoising | Code Available | 2 |
| Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes | Oct 14, 2024 | Motion Generation, Motion Synthesis | Code Available | 2 |
| GraphCLIP: Enhancing Transferability in Graph Foundation Models for Text-Attributed Graphs | Oct 14, 2024 | Few-Shot Learning, TAG | Code Available | 2 |
| A Scalable Communication Protocol for Networks of Large Language Models | Oct 14, 2024 | | Code Available | 2 |
| Locality Alignment Improves Vision-Language Models | Oct 14, 2024 | Semantic Segmentation, Spatial Reasoning | Code Available | 2 |
| High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity | Oct 14, 2024 | Denoising, Dichotomous Image Segmentation | Code Available | 2 |
| Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts | Oct 14, 2024 | Mixture-of-Experts | Code Available | 2 |
| Adaptive Probabilistic ODE Solvers Without Adaptive Memory Requirements | Oct 14, 2024 | State Estimation, Time Series | Code Available | 2 |
| LVD-2M: A Long-take Video Dataset with Temporally Dense Captions | Oct 14, 2024 | Video Captioning, Video Generation | Code Available | 2 |
| Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs | Oct 14, 2024 | Computational Efficiency, Question Answering | Code Available | 2 |
| Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models | Oct 14, 2024 | | Code Available | 2 |
| Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads | Oct 14, 2024 | Talking Head Generation | Code Available | 2 |