| ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers | Dec 17, 2024 | Articles, Form | Code Available | 2 |
| AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark | Dec 17, 2024 | Information Retrieval, Retrieval | Code Available | 2 |
| Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Dec 17, 2024 | Denoising | Code Available | 2 |
| DINO-Foresight: Looking into the Future with DINO | Dec 16, 2024 | Autonomous Driving, Scene Understanding | Code Available | 2 |
| Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection | Dec 16, 2024 | LLM-generated Text Detection, Text Detection | Code Available | 2 |
| ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data | Dec 16, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis | Dec 16, 2024 | Disentanglement, Multimodal Sentiment Analysis | Code Available | 2 |
| Generative Inbetweening through Frame-wise Conditions-Driven Video Generation | Dec 16, 2024 | Video Generation | Code Available | 2 |
| Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach | Dec 16, 2024 | Representation Learning, Retrieval | Code Available | 2 |
| FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning | Dec 16, 2024 | DeepFake Detection, diffusion-generated faces detection | Code Available | 2 |
| Predicting the Original Appearance of Damaged Historical Documents | Dec 16, 2024 | Binarization | Code Available | 2 |
| HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection | Dec 16, 2024 | 3D Object Detection, 3D Object Detection on View-of-Delft (val) | Code Available | 2 |
| BiM-VFI: directional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions | Dec 16, 2024 | Knowledge Distillation, Motion Estimation | Code Available | 2 |
| RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation | Dec 16, 2024 | RAG, Retrieval | Code Available | 2 |
| The dark side of the forces: assessing non-conservative force models for atomistic machine learning | Dec 16, 2024 | Computational chemistry, Computational Efficiency | Code Available | 2 |
| SCoralDet: Efficient real-time underwater soft coral detection with YOLO | Dec 16, 2024 | 2D Object Detection, object-detection | Code Available | 2 |
| Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Dec 16, 2024 | Hallucination, Robot Manipulation | Code Available | 2 |
| Gramian Multimodal Representation Learning and Alignment | Dec 16, 2024 | Contrastive Learning, Representation Learning | Code Available | 2 |
| Causal Diffusion Transformers for Generative Modeling | Dec 16, 2024 | Decoder, Image Generation | Code Available | 2 |
| LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts | Dec 16, 2024 | General Knowledge, Instruction Following | Code Available | 2 |
| No More Adam: Learning Rate Scaling at Initialization is All You Need | Dec 16, 2024 | All | Code Available | 2 |
| FSTA-SNN: Frequency-based Spatial-Temporal Attention Module for Spiking Neural Networks | Dec 15, 2024 | | Code Available | 2 |
| Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | Dec 15, 2024 | Image Retrieval, Retrieval | Code Available | 2 |
| SHMT: Self-supervised Hierarchical Makeup Transfer via Latent Diffusion Models | Dec 15, 2024 | | Code Available | 2 |
| Reliable, Reproducible, and Really Fast Leaderboards with Evalica | Dec 15, 2024 | | Code Available | 2 |
| Exploring Enhanced Contextual Information for Video-Level Object Tracking | Dec 15, 2024 | Object, Object Tracking | Code Available | 2 |
| Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition | Dec 15, 2024 | Computational Efficiency, Video Recognition | Code Available | 2 |
| AirMorph: Topology-Preserving Deep Learning for Pulmonary Airway Analysis | Dec 15, 2024 | Anatomy, Deep Learning | Code Available | 2 |
| GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control | Dec 15, 2024 | Autonomous Driving | Code Available | 2 |
| Physics-based battery model parametrisation from impedance data | Dec 14, 2024 | | Code Available | 2 |
| NeuralPLexer3: Accurate Biomolecular Complex Structure Prediction with Flow Models | Dec 14, 2024 | Benchmarking, Drug Design | Code Available | 2 |
| Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection | Dec 14, 2024 | Denoising | Code Available | 2 |
| DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification | Dec 14, 2024 | Mixture-of-Experts, Object | Code Available | 2 |
| MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt | Dec 14, 2024 | Mamba, Object | Code Available | 2 |
| Memory Efficient Matting with Adaptive Token Routing | Dec 14, 2024 | Image Matting | Code Available | 2 |
| Mr. DETR: Instructive Multi-Route Training for Detection Transformers | Dec 13, 2024 | Decoder, Object Detection | Code Available | 2 |
| EvalGIM: A Library for Evaluating Generative Image Models | Dec 13, 2024 | Benchmarking, Diversity | Code Available | 2 |
| Financial Fine-tuning a Large Time Series Model | Dec 13, 2024 | Image Generation, Prediction | Code Available | 2 |
| UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities | Dec 13, 2024 | Contrastive Learning | Code Available | 2 |
| Simple Guidance Mechanisms for Discrete Diffusion Models | Dec 13, 2024 | Image Generation | Code Available | 2 |
| Efficient Large-Scale Traffic Forecasting with Transformers: A Spatial Data Management Perspective | Dec 13, 2024 | Management, Traffic Prediction | Code Available | 2 |
| GAOKAO-Eval: Does high scores truly reflect strong capabilities in LLMs? | Dec 13, 2024 | | Code Available | 2 |
| You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects | Dec 13, 2024 | Large Language Model | Code Available | 2 |
| GaussianAD: Gaussian-Centric End-to-End Autonomous Driving | Dec 13, 2024 | Autonomous Driving, Decision Making | Code Available | 2 |
| GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction | Dec 13, 2024 | Autonomous Driving, Prediction | Code Available | 2 |
| AutoPatent: A Multi-Agent Framework for Automatic Patent Generation | Dec 13, 2024 | Text Generation | Code Available | 2 |
| RemDet: Rethinking Efficient Model Design for UAV Object Detection | Dec 13, 2024 | Object, object-detection | Code Available | 2 |
| V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding | Dec 12, 2024 | Position | Code Available | 2 |
| Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning | Dec 12, 2024 | Decision Making | Code Available | 2 |
| Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation | Dec 12, 2024 | Image Augmentation, Image Generation | Code Available | 2 |