| GTA1: GUI Test-time Scaling Agent | Jul 8, 2025 | Reinforcement Learning (RL), Task Planning | Code Available | 2 |
| TextPixs: Glyph-Conditioned Diffusion with Character-Aware Attention and OCR-Guided Supervision | Jul 8, 2025 | Image Generation, Optical Character Recognition (OCR) | Unverified | 0 |
| Predicting Graph Structure via Adapted Flux Balance Analysis | Jul 8, 2025 | Prediction, Time Series | Code Available | 0 |
| What You Have is What You Track: Adaptive and Robust Multimodal Tracking | Jul 8, 2025 | Mixture-of-Experts, Visual Tracking | Code Available | 0 |
| Unconditional Diffusion for Generative Sequential Recommendation | Jul 8, 2025 | Denoising, Sequential Recommendation | Code Available | 0 |
| CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations | Jul 8, 2025 | Generative Adversarial Network, Large Language Model | Code Available | 0 |
| Chat-Ghosting: A Comparative Study of Methods for Auto-Completion in Dialog Systems | Jul 8, 2025 | Deep Learning | Unverified | 0 |
| ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models | Jul 8, 2025 | Layout Generation | Unverified | 0 |
| Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | Jul 8, 2025 | Articles, Multimodal Reasoning | Unverified | 0 |
| Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation | Jul 8, 2025 | Depth Estimation, Depth Prediction | Unverified | 0 |
| RecRankerEval: A Flexible and Extensible Framework for Top-k LLM-based Recommendation | Jul 8, 2025 | Large Language Model | Unverified | 0 |
| Automated Neuron Labelling Enables Generative Steering and Interpretability in Protein Language Models | Jul 8, 2025 | | Code Available | 0 |
| Robust One-step Speech Enhancement via Consistency Distillation | Jul 8, 2025 | Speech Enhancement | Code Available | 1 |
| PrefixAgent: An LLM-Powered Design Framework for Efficient Prefix Adder Optimization | Jul 8, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow | Jul 8, 2025 | Deformable Object Manipulation, Imitation Learning | Unverified | 0 |
| Contrastive and Transfer Learning for Effective Audio Fingerprinting through a Real-World Evaluation Protocol | Jul 8, 2025 | Transfer Learning | Unverified | 0 |
| Skywork-R1V3 Technical Report | Jul 8, 2025 | cross-modal alignment, Mathematical Reasoning | Code Available | 7 |
| eegFloss: A Python package for refining sleep EEG recordings using machine learning models | Jul 8, 2025 | EEG, Sleep Staging | Code Available | 1 |
| Accelerating GenAI Workloads by Enabling RISC-V Microkernel Support in IREE | Jul 7, 2025 | | Unverified | 0 |
| Text Detoxification: Data Efficiency, Semantic Preservation and Model Generalization | Jul 7, 2025 | | Code Available | 0 |
| VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents | Jul 7, 2025 | | Unverified | 0 |
| MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding | Jul 7, 2025 | | Unverified | 0 |
| LAID: Lightweight AI-Generated Image Detection in Spatial and Spectral Domains | Jul 7, 2025 | | Code Available | 0 |
| StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling | Jul 7, 2025 | | Unverified | 0 |
| DeepCS-TRD, a Deep Learning-based Cross-Section Tree Ring Detector | Jul 7, 2025 | | Code Available | 0 |
| TeethGenerator: A two-stage framework for paired pre- and post-orthodontic 3D dental data generation | Jul 7, 2025 | | Code Available | 0 |
| Efficient Unlearning with Privacy Guarantees | Jul 7, 2025 | | Code Available | 0 |
| LCDS: A Logic-Controlled Discharge Summary Generation System Supporting Source Attribution and Expert Review | Jul 7, 2025 | | Code Available | 0 |
| Geometric-Guided Few-Shot Dental Landmark Detection with Human-Centric Foundation Model | Jul 7, 2025 | | Code Available | 0 |
| Parameterized Diffusion Optimization enabled Autoregressive Ordinal Regression for Diabetic Retinopathy Grading | Jul 7, 2025 | | Code Available | 0 |
| Robust Incomplete-Modality Alignment for Ophthalmic Disease Grading and Diagnosis via Labeled Optimal Transport | Jul 7, 2025 | | Code Available | 0 |
| Going Beyond Heuristics by Imposing Policy Improvement as a Constraint | Jul 7, 2025 | | Code Available | 0 |
| LumiCRS: Asymmetric Contrastive Prototype Learning for Long-Tail Conversational Movie Recommendation | Jul 7, 2025 | Diversity, Fairness | Unverified | 0 |
| GIST: Cross-Domain Click-Through Rate Prediction via Guided Content-Behavior Distillation | Jul 7, 2025 | Click-Through Rate Prediction, Transfer Learning | Unverified | 0 |
| TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation | Jul 7, 2025 | Optical Flow Estimation, Video Frame Interpolation | Unverified | 0 |
| Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Jul 7, 2025 | Reinforcement Learning (RL), Visual Reasoning | Unverified | 0 |
| PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes | Jul 7, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer | Jul 7, 2025 | Computational Efficiency, Image Generation | Unverified | 0 |
| ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation | Jul 7, 2025 | Code Generation | Unverified | 0 |
| Real-Time Graph-based Point Cloud Networks on FPGAs via Stall-Free Deep Pipelining | Jul 7, 2025 | GPU | Code Available | 0 |
| Disappearing Ink: Obfuscation Breaks N-gram Code Watermarks in Theory and Practice | Jul 7, 2025 | Authorship Attribution | Unverified | 0 |
| MindFlow: Revolutionizing E-commerce Customer Support with Multimodal LLM Agents | Jul 7, 2025 | Decision Making | Unverified | 0 |
| Hierarchical Intent-guided Optimization with Pluggable LLM-Driven Semantics for Session-based Recommendation | Jul 7, 2025 | Contrastive Learning, Denoising | Code Available | 0 |
| Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions | Jul 7, 2025 | Large Language Model, RAG | Unverified | 0 |
| Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration | Jul 7, 2025 | Optical Character Recognition (OCR) | Code Available | 2 |
| Activation Steering for Chain-of-Thought Compression | Jul 7, 2025 | GSM8K, Math | Code Available | 0 |
| Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning | Jul 7, 2025 | Code Generation, reinforcement-learning | Unverified | 0 |
| 2048: Reinforcement Learning in a Delayed Reward Environment | Jul 7, 2025 | quantile regression, reinforcement-learning | Unverified | 0 |
| pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models | Jul 7, 2025 | Federated Learning, Personalized Federated Learning | Code Available | 0 |
| FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation | Jul 7, 2025 | Mamba, Recommendation Systems | Code Available | 1 |