| VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models | May 23, 2025 | Question Answering, Visual Question Answering | Code Available | 1 |
| Generative Distribution Embeddings | May 23, 2025 | | Code Available | 1 |
| HRSim: An agent-based simulation platform for high-capacity ride-sharing services | May 23, 2025 | | Code Available | 1 |
| Taming Diffusion for Dataset Distillation with High Representativeness | May 23, 2025 | Dataset Distillation, Image Generation | Code Available | 1 |
| CENet: Context Enhancement Network for Medical Image Segmentation | May 23, 2025 | Decoder, Image Segmentation | Code Available | 1 |
| CausalDynamics: A large-scale benchmark for structural discovery of dynamical causal models | May 22, 2025 | Causal Discovery, Graph Reconstruction | Code Available | 1 |
| Background Matters: A Cross-view Bidirectional Modeling Framework for Semi-supervised Medical Image Segmentation | May 22, 2025 | Image Segmentation, Medical Image Segmentation | Code Available | 1 |
| Deep Learning-Driven Ultra-High-Definition Image Restoration: A Survey | May 22, 2025 | Deblurring, Deep Learning | Code Available | 1 |
| A Square Peg in a Square Hole: Meta-Expert for Long-Tailed Semi-Supervised Learning | May 22, 2025 | Pseudo Label | Code Available | 1 |
| R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search | May 22, 2025 | | Code Available | 1 |
| ChemMLLM: Chemical Multimodal Large Language Model | May 22, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models | May 22, 2025 | | Code Available | 1 |
| From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | May 22, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios | May 22, 2025 | Benchmarking | Code Available | 1 |
| Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering | May 22, 2025 | Benchmarking, Evidence Selection | Code Available | 1 |
| PICT -- A Differentiable, GPU-Accelerated Multi-Block PISO Solver for Simulation-Coupled Learning Tasks in Fluid Dynamics | May 22, 2025 | GPU | Code Available | 1 |
| CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms | May 22, 2025 | Token Reduction | Code Available | 1 |
| Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning | May 22, 2025 | Misinformation, reinforcement-learning | Code Available | 1 |
| Efficient Motion Prompt Learning for Robust Visual Tracking | May 22, 2025 | Decoder, Prompt Learning | Code Available | 1 |
| Style Transfer with Diffusion Models for Synthetic-to-Real Domain Adaptation | May 22, 2025 | Domain Adaptation, Semantic Segmentation | Code Available | 1 |
| Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models | May 22, 2025 | Reinforcement Learning (RL) | Code Available | 1 |
| MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming | May 22, 2025 | Red Teaming, Safety Alignment | Code Available | 1 |
| RealEngine: Simulating Autonomous Driving in Realistic Context | May 22, 2025 | 3D Scene Reconstruction, Autonomous Driving | Code Available | 1 |
| Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs | May 22, 2025 | Diagnostic, Machine Unlearning | Code Available | 1 |
| REOBench: Benchmarking Robustness of Earth Observation Foundation Models | May 22, 2025 | Benchmarking, Contrastive Learning | Code Available | 1 |
| Forward-only Diffusion Probabilistic Models | May 22, 2025 | Conditional Image Generation, Image Dehazing | Code Available | 1 |
| Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection | May 22, 2025 | Decoder | Code Available | 1 |
| LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding | May 22, 2025 | Position | Code Available | 1 |
| ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects | May 22, 2025 | Text to SQL, Text-To-SQL | Code Available | 1 |
| Flow Matching based Sequential Recommender Model | May 22, 2025 | model, Sequential Recommendation | Code Available | 1 |
| REPA Works Until It Doesn't: Early-Stopped, Holistic Alignment Supercharges Diffusion Training | May 22, 2025 | Denoising | Code Available | 1 |
| O^2-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering | May 22, 2025 | Answer Generation, Open-Ended Question Answering | Code Available | 1 |
| FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail | May 22, 2025 | Demand Forecasting, Imputation | Code Available | 1 |
| AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios | May 22, 2025 | Benchmarking, Instruction Following | Code Available | 1 |
| DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos | May 22, 2025 | Natural Language Moment Retrieval, Natural Language Queries | Code Available | 1 |
| R^2ec: Towards Large Recommender Models with Reasoning | May 22, 2025 | | Code Available | 1 |
| Guided Diffusion Sampling on Function Spaces with Applications to PDEs | May 22, 2025 | Denoising | Code Available | 1 |
| Sketchy Bounding-box Supervision for 3D Instance Segmentation | May 22, 2025 | 3D Instance Segmentation, Instance Segmentation | Code Available | 1 |
| JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | May 22, 2025 | GPU, Long-range modeling | Code Available | 1 |
| Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On | May 22, 2025 | Image Generation, Virtual Try-on | Code Available | 1 |
| Chirp Delay-Doppler Domain Modulation: A New Paradigm of Integrated Sensing and Communication for Autonomous Vehicles | May 22, 2025 | Autonomous Driving, Autonomous Vehicles | Code Available | 1 |
| ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation | May 22, 2025 | Chunking | Code Available | 1 |
| LINEA: Fast and Accurate Line Detection Using Scalable Transformers | May 22, 2025 | Line Detection, Line Segment Detection | Code Available | 1 |
| Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey | May 22, 2025 | | Code Available | 1 |
| Transformer brain encoders explain human high-level visual responses | May 22, 2025 | | Code Available | 1 |
| OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning | May 22, 2025 | Open Vocabulary Panoptic Segmentation, Open Vocabulary Semantic Segmentation | Code Available | 1 |
| CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | May 22, 2025 | Scene Understanding, Spatial Reasoning | Code Available | 1 |
| OSCAR: One-Step Diffusion Codec for Image Compression Across Multiple Bit-rates | May 22, 2025 | Denoising, Image Compression | Code Available | 1 |
| FoMoH: A clinically meaningful foundation model evaluation for structured electronic health records | May 22, 2025 | | Code Available | 1 |
| Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression | May 22, 2025 | Hallucination, Image Description | Code Available | 1 |