| VTBench: Comprehensive Benchmark Suite Towards Real-World Virtual Try-on Models | May 26, 2025 | Occlusion Handling, Virtual Try-on | Code Available | 1 |
| One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP | May 26, 2025 | All, Image Retrieval | Code Available | 1 |
| MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness | May 26, 2025 | | Code Available | 1 |
| Large Language Models for Planning: A Comprehensive and Systematic Survey | May 26, 2025 | Logical Reasoning, Navigate | Code Available | 1 |
| Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging | May 26, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Prot2Token: A Unified Framework for Protein Modeling via Next-Token Prediction | May 26, 2025 | Decoder, Multi-Task Learning | Code Available | 1 |
| KnowTrace: Bootstrapping Iterative Retrieval-Augmented Generation with Structured Knowledge Tracing | May 26, 2025 | Knowledge Tracing, Multi-hop Question Answering | Code Available | 1 |
| PolyPose: Localizing Deformable Anatomy in 3D from Sparse 2D X-ray Images using Polyrigid Transforms | May 25, 2025 | Anatomy, Hyperparameter Optimization | Code Available | 1 |
| PATS: Process-Level Adaptive Thinking Mode Switching | May 25, 2025 | Computational Efficiency | Code Available | 1 |
| ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World | May 25, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| CMoS: Rethinking Time Series Prediction Through the Lens of Chunk-wise Spatial Correlations | May 25, 2025 | Time Series, Time Series Forecasting | Code Available | 1 |
| CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models | May 25, 2025 | | Code Available | 1 |
| On the Role of Label Noise in the Feature Learning Process | May 25, 2025 | Learning with noisy labels | Code Available | 1 |
| SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards | May 25, 2025 | Image Captioning, Multimodal Reasoning | Code Available | 1 |
| Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval | May 25, 2025 | Passage Retrieval, Retrieval | Code Available | 1 |
| MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation | May 25, 2025 | Image Generation, Image Reconstruction | Code Available | 1 |
| ADGSyn: Dual-Stream Learning for Efficient Anticancer Drug Synergy Prediction | May 25, 2025 | GPU | Code Available | 1 |
| MMP-2K: A Benchmark Multi-Labeled Macro Photography Image Quality Assessment Database | May 25, 2025 | 2k, Diversity | Code Available | 1 |
| ReadBench: Measuring the Dense Text Visual Reading Ability of Vision-Language Models | May 25, 2025 | Optical Character Recognition (OCR), Reading Comprehension | Code Available | 1 |
| Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs | May 25, 2025 | Machine Translation, Mathematical Reasoning | Code Available | 1 |
| SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data | May 25, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 1 |
| How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation | May 25, 2025 | 3D Panoptic Segmentation, Data Augmentation | Code Available | 1 |
| Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering | May 25, 2025 | Anatomy, Benchmarking | Code Available | 1 |
| Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models | May 25, 2025 | Instruction Following | Code Available | 1 |
| BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change | May 25, 2025 | Domain Adaptation, Unsupervised Domain Adaptation | Code Available | 1 |
| POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval | May 25, 2025 | Information Retrieval, RAG | Code Available | 1 |
| Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition | May 25, 2025 | Image Restoration | Code Available | 1 |
| Structured Reinforcement Learning for Combinatorial Decision-Making | May 25, 2025 | Combinatorial Optimization, Decision Making | Code Available | 1 |
| Step-level Reward for Free in RL-based T2I Diffusion Model Fine-tuning | May 25, 2025 | Denoising, Reinforcement Learning (RL) | Code Available | 1 |
| Behavior Injection: Preparing Language Models for Reinforcement Learning | May 25, 2025 | Data Augmentation, reinforcement-learning | Code Available | 1 |
| SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning | May 25, 2025 | Benchmarking, Visual Reasoning | Code Available | 1 |
| FlashMD: long-stride, universal prediction of molecular dynamics | May 25, 2025 | Prediction | Code Available | 1 |
| FP4 All the Way: Fully Quantized Training of LLMs | May 25, 2025 | All, Quantization | Code Available | 1 |
| DISTA-Net: Dynamic Closely-Spaced Infrared Small Target Unmixing | May 25, 2025 | | Code Available | 1 |
| Can Multimodal Large Language Models Understand Spatial Relations? | May 25, 2025 | Relation | Code Available | 1 |
| LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling | May 25, 2025 | Computational Efficiency, Mathematical Reasoning | Code Available | 1 |
| STRICT: Stress Test of Rendering Images Containing Text | May 25, 2025 | Image Generation, Instruction Following | Code Available | 1 |
| Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation | May 24, 2025 | Semantic Similarity, Semantic Textual Similarity | Code Available | 1 |
| VORTA: Efficient Video Diffusion via Routing Sparse Attention | May 24, 2025 | Video Generation | Code Available | 1 |
| Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework | May 24, 2025 | | Code Available | 1 |
| PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs | May 24, 2025 | Quantization | Code Available | 1 |
| GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains | May 24, 2025 | geo-localization, Visual Reasoning | Code Available | 1 |
| Learning Fluid-Structure Interaction Dynamics with Physics-Informed Neural Networks and Immersed Boundary Methods | May 24, 2025 | | Code Available | 1 |
| Removal of Hallucination on Hallucination: Debate-Augmented RAG | May 24, 2025 | Hallucination, RAG | Code Available | 1 |
| Enhancing Training Data Attribution with Representational Optimization | May 24, 2025 | | Code Available | 1 |
| GainRAG: Preference Alignment in Retrieval-Augmented Generation through Gain Signal Synthesis | May 24, 2025 | RAG, Retrieval | Code Available | 1 |
| Mind the Gap: A Practical Attack on GGUF Quantization | May 24, 2025 | Code Generation, Quantization | Code Available | 1 |
| LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis | May 24, 2025 | Malware Detection | Code Available | 1 |
| Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | May 24, 2025 | Adversarial Attack, Speech Tokenization | Code Available | 1 |
| DVD-Quant: Data-free Video Diffusion Transformers Quantization | May 24, 2025 | Data Free Quantization, Quantization | Code Available | 1 |