SOTAVerified

Zero-shot Generalization

Papers

Showing 1–25 of 572 papers

Title | Status | Hype
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation |  | 0
Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation |  | 0
PoseLLM: Enhancing Language-Guided Human Pose Estimation with MLP Alignment | Code | 0
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data | Code | 0
Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models |  | 0
Helping CLIP See Both the Forest and the Trees: A Decomposition and Description Approach |  | 0
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Code | 2
RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather |  | 0
WAFT: Warping-Alone Field Transforms for Optical Flow | Code | 2
IRanker: Towards Ranking Foundation Model | Code | 1
TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design | Code | 0
VisLanding: Monocular 3D Perception for UAV Safe Landing via Depth-Normal Synergy |  | 0
LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction |  | 0
Prohibited Items Segmentation via Occlusion-aware Bilayer Modeling | Code | 0
DEAL: Disentangling Transformer Head Activations for LLM Steering |  | 0
Deep Equivariant Multi-Agent Control Barrier Functions |  | 0
CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray |  | 0
ZeroVO: Visual Odometry with Minimal Assumptions |  | 0
Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application |  | 0
RecGPT: A Foundation Model for Sequential Recommendation | Code | 2
Towards Vision-Language-Garment Models For Web Knowledge Garment Understanding and Generation |  | 0
Generating Synthetic Stereo Datasets using 3D Gaussian Splatting and Expert Knowledge Transfer |  | 0
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis | Code | 1
Language-Guided Multi-Agent Learning in Simulations: A Unified Framework and Evaluation |  | 0
DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis? | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GR-MG | Avg. sequence length | 4.04 |  | Unverified
2 | MoDE | Avg. sequence length | 4.01 |  | Unverified
3 | RoboUniView | Avg. sequence length | 3.65 |  | Unverified
4 | 3D Diffuser Actor | Avg. sequence length | 3.27 |  | Unverified
5 | GR-1 | Avg. sequence length | 3.06 |  | Unverified