| GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models | Jun 11, 2025 | Large Language Model, Red Teaming | Unverified | 0 |
| Disclosure Audits for LLM Agents | Jun 11, 2025 | Diagnostic, Language Modeling | Unverified | 0 |
| GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments | Jun 11, 2025 | Active Learning, Benchmarking | Unverified | 0 |
| DynaSubVAE: Adaptive Subgrouping for Scalable and Robust OOD Detection | Jun 11, 2025 | Clustering, Representation Learning | Unverified | 0 |
| AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale | Jun 11, 2025 | GPU, Weather Forecasting | Code Available | 0 |
| TaskCraft: Automated Generation of Agentic Tasks | Jun 11, 2025 | | Code Available | 2 |
| Learning to Collaborate Over Graphs: A Selective Federated Multi-Task Learning Approach | Jun 11, 2025 | Community Detection, Fairness | Code Available | 0 |
| The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset | Jun 11, 2025 | Brain Computer Interface | Unverified | 0 |
| Data-Driven Modeling of IRCU Patient Flow in the COVID-19 Pandemic | Jun 11, 2025 | Respiratory Failure | Code Available | 0 |
| TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding | Jun 11, 2025 | 4k, Language Modeling | Unverified | 0 |
| NnD: Diffusion-based Generation of Physically-Nonnegative Objects | Jun 11, 2025 | Scene Generation | Unverified | 0 |
| Textual Bayes: Quantifying Uncertainty in LLM-Based Systems | Jun 11, 2025 | Bayesian Inference, Prompt Engineering | Unverified | 0 |
| What is the Cost of Differential Privacy for Deep Learning-Based Trajectory Generation? | Jun 11, 2025 | | Code Available | 0 |
| Chat-of-Thought: Collaborative Multi-Agent System for Generating Domain Specific Information | Jun 11, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Jun 11, 2025 | Spatial Reasoning | Code Available | 1 |
| Revisit What You See: Disclose Language Prior in Vision Tokens for Efficient Guided Decoding of LVLMs | Jun 11, 2025 | Hallucination, Object Hallucination | Code Available | 1 |
| Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval | Jun 11, 2025 | Retrieval, Text to Video Retrieval | Unverified | 0 |
| Efficient kernelized bandit algorithms via exploration distributions | Jun 11, 2025 | Thompson Sampling | Unverified | 0 |
| Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban | Jun 11, 2025 | Sokoban | Code Available | 1 |
| A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy | Jun 11, 2025 | | Code Available | 2 |
| A quantum semantic framework for natural language processing | Jun 11, 2025 | | Code Available | 5 |
| Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs | Jun 11, 2025 | Mathematical Reasoning | Code Available | 0 |
| Exposure-slot: Exposure-centric representations learning with Slot-in-Slot Attention for Region-aware Exposure Correction | Jun 11, 2025 | Exposure Correction, Image Enhancement | Code Available | 1 |
| VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | Jun 10, 2025 | Multiple-choice, Open-Ended Question Answering | Unverified | 0 |
| Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement (ICCV-25 🥳) | Jun 10, 2025 | Disentanglement, Person Re-Identification | Unverified | 0 |
| Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search | Jun 10, 2025 | In-Context Learning | Unverified | 0 |
| GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions | Jun 10, 2025 | text-to-speech, Text to Speech | Code Available | 1 |
| ContextLoss: Context Information for Topology-Preserving Segmentation | Jun 10, 2025 | Image Segmentation, Semantic Segmentation | Unverified | 0 |
| Sparse Autoencoders Bridge The Deep Learning Model and The Brain | Jun 10, 2025 | Deep Learning | Unverified | 0 |
| Grids Often Outperform Implicit Neural Representations | Jun 10, 2025 | Denoising, Super-Resolution | Code Available | 0 |
| GPU-accelerated Modeling of Biological Regulatory Networks | Jun 10, 2025 | CPU, global-optimization | Unverified | 0 |
| JAFAR: Jack up Any Feature at Any Resolution | Jun 10, 2025 | Feature Upsampling | Code Available | 3 |
| Technical Report for Argoverse2 Scenario Mining Challenges on Iterative Error Correction and Spatially-Aware Prompting | Jun 10, 2025 | Autonomous Driving, Code Generation | Unverified | 0 |
| Optimal Operating Strategy for PV-BESS Households: Balancing Self-Consumption and Self-Sufficiency | Jun 10, 2025 | Model Predictive Control, Reinforcement Learning (RL) | Unverified | 0 |
| Navigating High-Dimensional Backstage: A Guide for Exploring Literature for the Reliable Use of Dimensionality Reduction | Jun 10, 2025 | Dimensionality Reduction, Diversity | Unverified | 0 |
| Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models | Jun 10, 2025 | | Unverified | 0 |
| A Multi-Modal Spatial Risk Framework for EV Charging Infrastructure Using Remote Sensing | Jun 10, 2025 | Spatial Reasoning | Unverified | 0 |
| An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models | Jun 10, 2025 | Action Generation, Image Captioning | Unverified | 0 |
| Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | Jun 10, 2025 | Experimental Design | Unverified | 0 |
| FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Jun 10, 2025 | Image-text Retrieval, Question Answering | Code Available | 2 |
| Exploring the Capabilities of the Frontier Large Language Models for Nuclear Energy Research | Jun 10, 2025 | Code Generation, Prompt Engineering | Unverified | 0 |
| DualEquiNet: A Dual-Space Hierarchical Equivariant Network for Large Biomolecules | Jun 10, 2025 | Property Prediction | Unverified | 0 |
| Scalable and Cost-Efficient de Novo Template-Based Molecular Generation | Jun 10, 2025 | Diversity, Drug Design | Code Available | 1 |
| SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models | Jun 10, 2025 | | Code Available | 1 |
| Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation | Jun 10, 2025 | Foveation, Image Segmentation | Code Available | 2 |
| Solving the Job Shop Scheduling Problem with Graph Neural Networks: A Customizable Reinforcement Learning Environment | Jun 10, 2025 | Combinatorial Optimization, Imitation Learning | Code Available | 2 |
| Monocular 3D Hand Pose Estimation with Implicit Camera Alignment | Jun 10, 2025 | 3D Hand Pose Estimation, Hand Pose Estimation | Code Available | 1 |
| XGraphRAG: Interactive Visual Analysis for Graph-based Retrieval-Augmented Generation | Jun 10, 2025 | graph construction, Language Modeling | Code Available | 0 |
| SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR | Jun 10, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data | Jun 10, 2025 | text-to-speech, Text to Speech | Unverified | 0 |