SOTAVerified

Zero-shot Generalization

Papers

Showing 1-50 of 572 papers

| Title | Status | Hype |
|---|---|---|
| SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation |  | 0 |
| Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation |  | 0 |
| PoseLLM: Enhancing Language-Guided Human Pose Estimation with MLP Alignment | Code | 0 |
| Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data | Code | 0 |
| Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models |  | 0 |
| Helping CLIP See Both the Forest and the Trees: A Decomposition and Description Approach |  | 0 |
| DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Code | 2 |
| RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather |  | 0 |
| WAFT: Warping-Alone Field Transforms for Optical Flow | Code | 2 |
| IRanker: Towards Ranking Foundation Model | Code | 1 |
| TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design | Code | 0 |
| VisLanding: Monocular 3D Perception for UAV Safe Landing via Depth-Normal Synergy |  | 0 |
| LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction |  | 0 |
| Prohibited Items Segmentation via Occlusion-aware Bilayer Modeling | Code | 0 |
| DEAL: Disentangling Transformer Head Activations for LLM Steering |  | 0 |
| ZeroVO: Visual Odometry with Minimal Assumptions |  | 0 |
| CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray |  | 0 |
| Deep Equivariant Multi-Agent Control Barrier Functions |  | 0 |
| Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application |  | 0 |
| RecGPT: A Foundation Model for Sequential Recommendation | Code | 2 |
| Towards Vision-Language-Garment Models For Web Knowledge Garment Understanding and Generation |  | 0 |
| Generating Synthetic Stereo Datasets using 3D Gaussian Splatting and Expert Knowledge Transfer |  | 0 |
| OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis | Code | 1 |
| Language-Guided Multi-Agent Learning in Simulations: A Unified Framework and Evaluation |  | 0 |
| DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis? | Code | 1 |
| Beyond the LUMIR challenge: The pathway to foundational registration models | Code | 1 |
| Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | Code | 2 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers |  | 0 |
| ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving | Code | 1 |
| WHISTRESS: Enriching Transcriptions with Sentence Stress Detection |  | 0 |
| G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning |  | 0 |
| Anchored Diffusion Language Model |  | 0 |
| Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing | Code | 1 |
| EasyInsert: A Data-Efficient and Generalizable Insertion Policy |  | 0 |
| CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning |  | 0 |
| AnyBody: A Benchmark Suite for Cross-Embodiment Manipulation |  | 0 |
| Prompt Tuning Vision Language Models with Margin Regularizer for Few-Shot Learning under Distribution Shifts | Code | 0 |
| Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | Code | 2 |
| gen2seg: Generative Models Enable Generalizable Instance Segmentation |  | 0 |
| EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy |  | 0 |
| A Case Study of Cross-Lingual Zero-Shot Generalization for Classical Languages in LLMs | Code | 0 |
| ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling |  | 0 |
| AoP-SAM: Automation of Prompts for Efficient Segmentation |  | 0 |
| RVTBench: A Benchmark for Visual Reasoning Tasks | Code | 0 |
| GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction | Code | 0 |
| Depth Anything with Any Prior |  | 0 |
| NVSPolicy: Adaptive Novel-View Synthesis for Generalizable Language-Conditioned Policy Learning |  | 0 |
| Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Code | 7 |
| Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing |  | 0 |
| Foundation Models Knowledge Distillation For Battery Capacity Degradation Forecast | Code | 1 |

Benchmark Results

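| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GR-MG | Avg. sequence length | 4.04 |  | Unverified |
| 2 | MoDE | Avg. sequence length | 4.01 |  | Unverified |
| 3 | RoboUniView | Avg. sequence length | 3.65 |  | Unverified |
| 4 | 3D Diffuser Actor | Avg. sequence length | 3.27 |  | Unverified |
| 5 | GR-1 | Avg. sequence length | 3.06 |  | Unverified |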