| DPLM-2: A Multimodal Diffusion Protein Language Model | Oct 17, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Automatically Interpreting Millions of Features in Large Language Models | Oct 17, 2024 | Semantic Similarity, Semantic Textual Similarity | Code Available | 3 |
| Movie Gen: A Cast of Media Foundation Models | Oct 17, 2024 | Audio Generation, Video Editing | Code Available | 3 |
| MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models | Oct 16, 2024 | Diagnostic, Hallucination | Code Available | 3 |
| The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | Oct 16, 2024 | Hallucination | Code Available | 3 |
| Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models | Oct 16, 2024 | Hallucination, Knowledge Graphs | Code Available | 3 |
| Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception | Oct 16, 2024 | Binary Classification, Chunking | Code Available | 3 |
| 3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation | Oct 16, 2024 | Attribute, Image Generation | Code Available | 3 |
| PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | Oct 16, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies | Oct 15, 2024 | | Code Available | 3 |
| Latent Action Pretraining from Videos | Oct 15, 2024 | Quantization, Robot Manipulation | Code Available | 3 |
| GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation | Oct 14, 2024 | Time Series, Time Series Forecasting | Code Available | 3 |
| LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | Oct 14, 2024 | Benchmarking, Large Language Model | Code Available | 3 |
| Predicting from Strings: Language Model Embeddings for Bayesian Optimization | Oct 14, 2024 | Bayesian Optimization, Experimental Design | Code Available | 3 |
| LoLCATs: On Low-Rank Linearizing of Large Language Models | Oct 14, 2024 | MMLU | Code Available | 3 |
| UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation | Oct 14, 2024 | Semantic Segmentation, Semi-supervised Change Detection | Code Available | 3 |
| Large-Scale 3D Medical Image Pre-training with Geometric Context Priors | Oct 13, 2024 | Contrastive Learning, Medical Image Analysis | Code Available | 3 |
| FlatQuant: Flatness Matters for LLM Quantization | Oct 12, 2024 | Quantization | Code Available | 3 |
| MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection | Oct 12, 2024 | Anomaly Detection | Code Available | 3 |
| C-Adapter: Adapting Deep Classifiers for Efficient Conformal Prediction Sets | Oct 12, 2024 | Conformal Prediction, Prediction | Code Available | 3 |
| CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation | Oct 12, 2024 | Conditional Image Generation, GPU | Code Available | 3 |
| SceneCraft: Layout-Guided 3D Scene Generation | Oct 11, 2024 | 3D Generation, Image Generation | Code Available | 3 |
| Baichuan-Omni Technical Report | Oct 11, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning | Oct 10, 2024 | 3D Parameter-Efficient Fine-Tuning for Classification, 3D Point Cloud Classification | Code Available | 3 |
| Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis | Oct 10, 2024 | Feature Compression, Image Generation | Code Available | 3 |
| Towards Next-Generation LLM-based Recommender Systems: A Survey and Beyond | Oct 10, 2024 | Large Language Model, Recommendation Systems | Code Available | 3 |
| Fast Feedforward 3D Gaussian Splatting Compression | Oct 10, 2024 | 3DGS, Novel View Synthesis | Code Available | 3 |
| Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow | Oct 9, 2024 | | Code Available | 3 |
| Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Oct 9, 2024 | Benchmarking, Decision Making | Code Available | 3 |
| TopoTune : A Framework for Generalized Combinatorial Complex Neural Networks | Oct 9, 2024 | Graph Neural Network | Code Available | 3 |
| Rethinking the Evaluation of Visible and Infrared Image Fusion | Oct 9, 2024 | object-detection, Object Detection | Code Available | 3 |
| AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation | Oct 8, 2024 | Denoising, Image Generation | Code Available | 3 |
| T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design | Oct 8, 2024 | Video Alignment, Video Generation | Code Available | 3 |
| AgentSquare: Automatic LLM Agent Search in Modular Design Space | Oct 8, 2024 | | Code Available | 3 |
| Residual Kolmogorov-Arnold Network for Enhanced Deep Learning | Oct 7, 2024 | Computational Efficiency, Deep Learning | Code Available | 3 |
| Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Oct 7, 2024 | Natural Language Visual Grounding, Navigate | Code Available | 3 |
| SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference | Oct 6, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| High-Speed Stereo Visual SLAM for Low-Powered Computing Devices | Oct 5, 2024 | GPU | Code Available | 3 |
| Accelerating Diffusion Transformers with Token-wise Feature Caching | Oct 5, 2024 | Video Generation | Code Available | 3 |
| Neuron-Level Sequential Editing for Large Language Models | Oct 5, 2024 | Model Editing | Code Available | 3 |
| MELODI: Exploring Memory Compression for Long Contexts | Oct 4, 2024 | | Code Available | 3 |
| CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control | Oct 4, 2024 | Motion Generation, Reinforcement Learning (RL) | Code Available | 3 |
| SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | Oct 4, 2024 | 16k, Code Generation | Code Available | 3 |
| AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models | Oct 3, 2024 | knowledge editing, Model Editing | Code Available | 3 |
| HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly | Oct 3, 2024 | RAG | Code Available | 3 |
| AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs | Oct 3, 2024 | Red Teaming | Code Available | 3 |
| RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph | Oct 3, 2024 | Code Generation | Code Available | 3 |
| Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models | Oct 3, 2024 | | Code Available | 3 |
| Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1 | Oct 3, 2024 | Scheduling | Code Available | 3 |
| ControlAR: Controllable Image Generation with Autoregressive Models | Oct 3, 2024 | Image Generation | Code Available | 3 |