| Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning | Apr 2, 2025 | Attribute, Image Quality Assessment | Code Available | 1 |
| Urban Computing in the Era of Large Language Models | Apr 2, 2025 | Decision Making, Survey | Code Available | 1 |
| Efficient Constant-Space Multi-Vector Retrieval | Apr 2, 2025 | Management, Retrieval | Code Available | 1 |
| DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image | Apr 2, 2025 | Depth Completion, Depth Estimation | Code Available | 1 |
| TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining | Apr 2, 2025 | Continual Learning, Continual Pretraining | Code Available | 1 |
| Slow-Fast Architecture for Video Multi-Modal Large Language Models | Apr 2, 2025 | Video Understanding | Code Available | 1 |
| ProtoGCD: Unified and Unbiased Prototype Learning for Generalized Category Discovery | Apr 2, 2025 | Contrastive Learning | Code Available | 1 |
| Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks | Apr 2, 2025 | | Code Available | 1 |
| GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | Apr 2, 2025 | Decision Making, Diagnostic | Code Available | 1 |
| BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Apr 2, 2025 | 3D Reconstruction, Benchmarking | Code Available | 1 |
| Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure | Apr 2, 2025 | Arithmetic Reasoning, Data Augmentation | Code Available | 1 |
| Memory-efficient Low-latency Remote Photoplethysmography through Temporal-Spatial State Space Duality | Apr 2, 2025 | | Code Available | 1 |
| ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning | Apr 2, 2025 | Reinforcement Learning (RL) | Code Available | 1 |
| From Shadows to Safety: Occlusion Tracking and Risk Mitigation for Urban Autonomous Driving | Apr 2, 2025 | Autonomous Driving, Autonomous Vehicles | Code Available | 1 |
| Decoding Covert Speech from EEG Using a Functional Areas Spatio-Temporal Transformer | Apr 2, 2025 | EEG, Electroencephalogram (EEG) | Code Available | 1 |
| InvFussion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems | Apr 2, 2025 | | Code Available | 1 |
| Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes | Apr 2, 2025 | Mamba, Saliency Prediction | Code Available | 1 |
| Quattro: Transformer-Accelerated Iterative Linear Quadratic Regulator Framework for Fast Trajectory Optimization | Apr 2, 2025 | GPU, Model Predictive Control | Code Available | 1 |
| Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models? | Apr 2, 2025 | Attribute, Reinforcement Learning (RL) | Code Available | 1 |
| STPNet: Scale-aware Text Prompt Network for Medical Image Segmentation | Apr 2, 2025 | Image Segmentation, Language Modeling | Code Available | 1 |
| GSR4B: Biomass Map Super-Resolution with Sentinel-1/2 Guidance | Apr 2, 2025 | regression, Super-Resolution | Code Available | 1 |
| v-CLR: View-Consistent Learning for Open-World Instance Segmentation | Apr 2, 2025 | Instance Segmentation, Object | Code Available | 1 |
| FeatInsight: An Online ML Feature Management System on 4Paradigm Sage-Studio Platform | Apr 1, 2025 | Fraud Detection, Management | Code Available | 1 |
| Probabilistically safe and efficient model-based Reinforcement Learning | Apr 1, 2025 | Model-based Reinforcement Learning, Model Predictive Control | Code Available | 1 |
| Robust LiDAR-Camera Calibration with 2D Gaussian Splatting | Apr 1, 2025 | Camera Calibration | Code Available | 1 |
| MPCritic: A plug-and-play MPC architecture for reinforcement learning | Apr 1, 2025 | Model Predictive Control, Reinforcement Learning (RL) | Code Available | 1 |
| Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation | Apr 1, 2025 | Image Segmentation, Semantic Segmentation | Code Available | 1 |
| Near Field Localization via AI-Aided Subspace Methods | Apr 1, 2025 | subspace methods | Code Available | 1 |
| SeizureTransformer: Scaling U-Net with Transformer for Simultaneous Time-Step Level Seizure Detection from Long EEG Recordings | Apr 1, 2025 | Decoder, EEG | Code Available | 1 |
| Automated Explanation of Machine Learning Models of Footballing Actions in Words | Apr 1, 2025 | regression | Code Available | 1 |
| Effect-driven interpretation: Functors for natural language composition | Apr 1, 2025 | | Code Available | 1 |
| A Doubly Decoupled Network for edge detection | Apr 1, 2025 | Diversity, Edge Detection | Code Available | 1 |
| Learning to Normalize on the SPD Manifold under Bures-Wasserstein Geometry | Apr 1, 2025 | Representation Learning | Code Available | 1 |
| LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models | Apr 1, 2025 | | Code Available | 1 |
| GLiNER-BioMed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition | Apr 1, 2025 | Computational Efficiency, named-entity-recognition | Code Available | 1 |
| Flow Matching on Lie Groups | Apr 1, 2025 | | Code Available | 1 |
| Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents | Apr 1, 2025 | named-entity-recognition, Named Entity Recognition | Code Available | 1 |
| Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute | Apr 1, 2025 | | Code Available | 1 |
| Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Apr 1, 2025 | GPU, Spatial Reasoning | Code Available | 1 |
| CellVTA: Enhancing Vision Foundation Models for Accurate Cell Segmentation and Classification | Apr 1, 2025 | Cell Segmentation, Instance Segmentation | Code Available | 1 |
| SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning | Apr 1, 2025 | Representation Learning, Self-Supervised Learning | Code Available | 1 |
| MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization | Apr 1, 2025 | Image Generation, Image Reconstruction | Code Available | 1 |
| WikiVideo: Article Generation from Multiple Videos | Apr 1, 2025 | Articles, RAG | Code Available | 1 |
| IMPACT: A Generic Semantic Loss for Multimodal Medical Image Registration | Mar 31, 2025 | Deformable Medical Image Registration, Image Registration | Code Available | 1 |
| It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data | Mar 31, 2025 | text annotation | Code Available | 1 |
| MaintainCoder: Maintainable Code Generation Under Dynamic Requirements | Mar 31, 2025 | Code Generation | Code Available | 1 |
| AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization | Mar 31, 2025 | | Code Available | 1 |
| Towards Understanding How Knowledge Evolves in Large Vision-Language Models | Mar 31, 2025 | | Code Available | 1 |
| Times2D: Multi-Period Decomposition and Derivative Mapping for General Time Series Forecasting | Mar 31, 2025 | energy management, Time Series | Code Available | 1 |
| GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models | Mar 31, 2025 | Zero-Shot Learning | Code Available | 1 |