| SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation | Jul 16, 2025 | Boundary Detection, Pseudo Label | Unverified | 0 |
| Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation | Jul 15, 2025 | 3D Reconstruction, Autonomous Driving | Unverified | 0 |
| PoseLLM: Enhancing Language-Guided Human Pose Estimation with MLP Alignment | Jul 12, 2025 | Large Language Model, Pose Estimation | Code Available | 0 |
| Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data | Jul 9, 2025 | Motion Generation, Zero-shot Generalization | Code Available | 0 |
| Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models | Jul 8, 2025 | Future prediction, Large Language Model | Unverified | 0 |
| Helping CLIP See Both the Forest and the Trees: A Decomposition and Description Approach | Jul 4, 2025 | Attribute, Contrastive Learning | Unverified | 0 |
| DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Jul 3, 2025 | cross-modal alignment, Instruction Following | Code Available | 2 |
| RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather | Jul 2, 2025 | Denoising, Depth Estimation | Unverified | 0 |
| WAFT: Warping-Alone Field Transforms for Optical Flow | Jun 26, 2025 | Optical Flow Estimation, Zero-shot Generalization | Code Available | 2 |
| IRanker: Towards Ranking Foundation Model | Jun 25, 2025 | GSM8K, model | Code Available | 1 |