| STELLA: Self-Evolving LLM Agent for Biomedical Research | Jul 1, 2025 | AI Agent, Humanity's Last Exam | Unverified | 0 |
| Geometry-aware 4D Video Generation for Robot Manipulation | Jul 1, 2025 | Robot Manipulation, Video Generation | Unverified | 0 |
| TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation | Jul 1, 2025 | Benchmarking, Machine Translation | Unverified | 0 |
| Real-Time Inverse Kinematics for Generating Multi-Constrained Movements of Virtual Human Characters | Jul 1, 2025 | | Code Available | 1 |
| GAF-Guard: An Agentic Framework for Risk Management and Governance in Large Language Models | Jul 1, 2025 | Hallucination, Management | Code Available | 0 |
| GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning | Jul 1, 2025 | document understanding, Multimodal Reasoning | Code Available | 7 |
| CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs | Jul 1, 2025 | Text Generation, Video Understanding | Unverified | 0 |
| HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning | Jul 1, 2025 | | Unverified | 0 |
| Prompt2SegCXR:Prompt to Segment All Organs and Diseases in Chest X-rays | Jul 1, 2025 | All, Image Segmentation | Unverified | 0 |
| Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations | Jul 1, 2025 | Point Tracking, Pose Tracking | Unverified | 0 |
| RaGNNarok: A Light-Weight Graph Neural Network for Enhancing Radar Point Clouds on Unmanned Ground Vehicles | Jul 1, 2025 | Autonomous Navigation, Graph Neural Network | Unverified | 0 |
| Out-of-distribution detection in 3D applications: a review | Jul 1, 2025 | Autonomous Driving, Navigate | Unverified | 0 |
| LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling | Jul 1, 2025 | Image Restoration, Unified Image Restoration | Code Available | 1 |
| ShapeEmbed: a self-supervised learning framework for 2D contour quantification | Jul 1, 2025 | Representation Learning, Self-Supervised Learning | Unverified | 0 |
| UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions | Jul 1, 2025 | Domain Adaptation, Object Tracking | Code Available | 1 |
| Why Multi-Interest Fairness Matters: Hypergraph Contrastive Multi-Interest Learning for Fair Conversational Recommender System | Jul 1, 2025 | Contrastive Learning, Fairness | Code Available | 0 |
| Empirical Analysis Of Heuristic and Approximation Algorithms for the The Mutual-Visibility Problem | Jul 1, 2025 | | Code Available | 0 |
| UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis | Jul 1, 2025 | Image Generation, Text to Image Generation | Unverified | 0 |
| Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment | Jul 1, 2025 | Action Recognition, One-Shot 3D Action Recognition | Code Available | 1 |
| Enhancing LLM Agent Safety via Causal Influence Prompting | Jul 1, 2025 | Decision Making | Code Available | 0 |
| Process-aware and high-fidelity microstructure generation using stable diffusion | Jul 1, 2025 | Image Generation, Semantic Segmentation | Unverified | 0 |
| Understanding Generalization in Node and Link Prediction | Jul 1, 2025 | Link Prediction, Prediction | Unverified | 0 |
| Instant Particle Size Distribution Measurement Using CNNs Trained on Synthetic Data | Jul 1, 2025 | Computational Efficiency | Code Available | 0 |
| A Unified Transformer-Based Framework with Pretraining For Whole Body Grasping Motion Generation | Jul 1, 2025 | Grasp Generation, Motion Generation | Code Available | 0 |
| TABASCO: A Fast, Simplified Model for Molecular Generation with Improved Physical Quality | Jul 1, 2025 | Drug Design | Code Available | 1 |
| MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement | Jul 1, 2025 | Automatic Speech Recognition, Mamba | Code Available | 2 |
| LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs | Jul 1, 2025 | Large Language Model | Code Available | 1 |
| Imbalance Prime Sieving: Every Prime Gap Is a Result of a Möbius Imbalance Obstruction | Jul 1, 2025 | | Code Available | 0 |
| World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model | Jul 1, 2025 | Autonomous Driving, NavSim | Code Available | 0 |
| Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data | Jun 30, 2025 | | Unverified | 0 |
| LLMs are Capable of Misaligned Behavior Under Explicit Prohibition and Surveillance | Jun 30, 2025 | | Code Available | 0 |
| AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval | Jun 30, 2025 | | Unverified | 0 |
| Unified Multimodal Understanding via Byte-Pair Visual Encoding | Jun 30, 2025 | | Unverified | 0 |
| VMoBA: Mixture-of-Block Attention for Video Diffusion Models | Jun 30, 2025 | | Unverified | 0 |
| EfficientXLang: Towards Improving Token Efficiency Through Cross-Lingual Reasoning | Jun 30, 2025 | | Code Available | 0 |
| BIMgent: Towards Autonomous Building Modeling via Computer-use Agents | Jun 30, 2025 | | Unverified | 0 |
| AQUA20: A Benchmark Dataset for Underwater Species Classification under Challenging Conditions | Jun 30, 2025 | | Unverified | 0 |
| Single Image Test-Time Adaptation via Multi-View Co-Training | Jun 30, 2025 | | Code Available | 0 |
| Spatially Gene Expression Prediction using Dual-Scale Contrastive Learning | Jun 30, 2025 | | Code Available | 0 |
| A Closer Look at Conditional Prompt Tuning for Vision-Language Models | Jun 30, 2025 | | Code Available | 0 |
| Calligrapher: Freestyle Text Image Customization | Jun 30, 2025 | | Unverified | 0 |
| FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion | Jun 30, 2025 | | Unverified | 0 |
| MGPRL: Distributed Multi-Gaussian Processes for Wi-Fi-based Multi-Robot Relative Localization in Large Indoor Environments | Jun 30, 2025 | | Code Available | 0 |
| Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy Risks | Jun 30, 2025 | | Code Available | 0 |
| Mono-Modalizing Extremely Heterogeneous Multi-Modal Medical Image Registration | Jun 30, 2025 | | Code Available | 0 |
| Revisiting Audio-Visual Segmentation with Vision-Centric Transformer | Jun 30, 2025 | | Code Available | 0 |
| Visual Textualization for Image Prompted Object Detection | Jun 30, 2025 | | Code Available | 0 |
| EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations | Jun 30, 2025 | | Code Available | 0 |
| Beyond Low-Rank Tuning: Model Prior-Guided Rank Allocation for Effective Transfer in Low-Data and Large-Gap Regimes | Jun 30, 2025 | | Code Available | 0 |
| When Will It Fail?: Anomaly to Prompt for Forecasting Future Anomalies in Time Series | Jun 30, 2025 | | Code Available | 0 |