| EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering | May 30, 2025 | Denoising | Code Available | 2 |
| FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation | May 30, 2025 | Hallucination | Code Available | 2 |
| Optimal Density Functions for Weighted Convolution in Learning Models | May 30, 2025 | Denoising, Image Denoising | Code Available | 2 |
| Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents | May 30, 2025 | Benchmarking, Blocking | Code Available | 2 |
| PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations | May 30, 2025 | | Code Available | 2 |
| ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL | May 30, 2025 | Image Generation, Language Modeling | Code Available | 2 |
| ViStoryBench: Comprehensive Benchmark Suite for Story Visualization | May 30, 2025 | Story Visualization | Code Available | 2 |
| Tackling View-Dependent Semantics in 3D Language Gaussian Splatting | May 30, 2025 | 3D Scene Reconstruction, Scene Understanding | Code Available | 2 |
| GeoVision Labeler: Zero-Shot Geospatial Classification with Vision and Language Models | May 30, 2025 | Classification, Disaster Response | Code Available | 2 |
| Logits-Based Finetuning | May 30, 2025 | Out of Distribution (OOD) Detection | Code Available | 2 |
| TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores | May 30, 2025 | 3DGS | Code Available | 2 |
| Optimal Weighted Convolution for Classification and Denosing | May 30, 2025 | Classification, Denoising | Code Available | 2 |
| When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways | May 30, 2025 | Continual Learning, Image Augmentation | Code Available | 2 |
| One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | May 29, 2025 | Contrastive Learning, Text Retrieval | Code Available | 2 |
| Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation | May 29, 2025 | Portrait Animation, Video Alignment | Code Available | 2 |
| SWE-bench Goes Live! | May 29, 2025 | | Code Available | 2 |
| ZeroGUI: Automating Online GUI Learning at Zero Human Cost | May 29, 2025 | | Code Available | 2 |
| Diffusion Guidance Is a Controllable Policy Improvement Operator | May 29, 2025 | Offline RL | Code Available | 2 |
| D-AR: Diffusion via Autoregressive Models | May 29, 2025 | Denoising | Code Available | 2 |
| VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models | May 29, 2025 | Self-Supervised Learning, Video Generation | Code Available | 2 |
| ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks | May 29, 2025 | Spatial Reasoning | Code Available | 2 |
| HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions | May 29, 2025 | Image Animation, Video Generation | Code Available | 2 |
| MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming | May 29, 2025 | Diversity, Efficient Exploration | Code Available | 2 |
| Vision Language Models are Biased | May 29, 2025 | Board Games, counterfactual | Code Available | 2 |
| UniTEX: Universal High Fidelity Generative Texturing for 3D Shapes | May 29, 2025 | Texture Synthesis | Code Available | 2 |
| VERINA: Benchmarking Verifiable Code Generation | May 29, 2025 | Benchmarking, Code Generation | Code Available | 2 |
| UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning | May 29, 2025 | | Code Available | 2 |
| Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model | May 29, 2025 | Decoder, Image Generation | Code Available | 2 |
| OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation | May 29, 2025 | | Code Available | 2 |
| GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents | May 29, 2025 | | Code Available | 2 |
| ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering | May 29, 2025 | Large Language Model, Prompt Engineering | Code Available | 2 |
| TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models | May 29, 2025 | Referring Expression, Referring Expression Comprehension | Code Available | 2 |
| Securing AI Agents with Information-Flow Control | May 29, 2025 | | Code Available | 2 |
| VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning | May 29, 2025 | Anomaly Detection, Descriptive | Code Available | 2 |
| Model-Preserving Adaptive Rounding | May 29, 2025 | model, Quantization | Code Available | 2 |
| ZIPA: A family of efficient models for multilingual phone recognition | May 29, 2025 | Diversity | Code Available | 2 |
| DRO: A Python Library for Distributionally Robust Optimization in Machine Learning | May 29, 2025 | | Code Available | 2 |
| ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS | May 29, 2025 | 3DGS, GPU | Code Available | 2 |
| Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | May 28, 2025 | Math, Reinforcement Learning (RL) | Code Available | 2 |
| Zero-Shot Vision Encoder Grafting via LLM Surrogates | May 28, 2025 | Decoder, Language Modeling | Code Available | 2 |
| GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control | May 28, 2025 | 3D geometry, Autonomous Driving | Code Available | 2 |
| cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning | May 28, 2025 | CAD Reconstruction, Large Language Model | Code Available | 2 |
| DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials | May 28, 2025 | Drug Discovery, graph partitioning | Code Available | 2 |
| DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction | May 27, 2025 | Image Generation | Code Available | 2 |
| Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment | May 27, 2025 | Adversarial Attack, Clustering | Code Available | 2 |
| Improved Representation Steering for Language Models | May 27, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution | May 27, 2025 | Reinforcement Learning (RL) | Code Available | 2 |
| Reinforcing General Reasoning without Verifiers | May 27, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state | May 27, 2025 | Mamba, Time Series | Code Available | 2 |
| UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents | May 27, 2025 | 16k | Code Available | 2 |