SOTAVerified

Zero-shot Generalization

Papers

Showing 151–200 of 572 papers

Title | Status | Hype
S^3: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models | | 0
Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail | Code | 3
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | | 0
UTSD: Unified Time Series Diffusion Model | | 0
The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control | | 0
COMPrompter: reconceptualized segment anything model with multiprompt network for camouflaged object detection | Code | 1
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Code | 2
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation | Code | 2
Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | | 0
Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models | | 0
Generating Out-Of-Distribution Scenarios Using Language Models | | 0
Context-Aware Multimodal Pretraining | | 0
SAM Carries the Burden: A Semi-Supervised Approach Refining Pseudo Labels for Medical Segmentation | Code | 0
HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments | | 0
Scalable Autoregressive Monocular Depth Estimation | | 0
MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models | Code | 0
Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos | | 0
Mono2Stereo: Monocular Knowledge Transfer for Enhanced Stereo Matching | | 0
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Code | 2
In the Era of Prompt Learning with Vision-Language Models | | 0
Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity | Code | 0
Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli | Code | 0
ZIM: Zero-Shot Image Matting for Anything | Code | 3
JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking | | 0
Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning | | 0
Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction | Code | 1
GHIL-Glue: Hierarchical Control with Filtered Subgoal Images | | 0
Random Policy Enables In-Context Reinforcement Learning within Trust Horizons | | 0
Adversarial Environment Design via Regret-Guided Diffusion Models | | 0
BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning | | 0
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias | | 0
DEL-Ranking: Ranking-Correction Denoising Framework for Elucidating Molecular Affinities in DNA-Encoded Libraries | | 0
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities | Code | 2
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement | Code | 2
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer | Code | 0
Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels | | 0
On the Evaluation of Generative Robotic Simulations | | 0
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | Code | 5
Zero-Shot Generalization of Vision-Based RL Without Data Augmentation | | 0
Zero-Shot Fact Verification via Natural Logic and Large Language Models | Code | 0
What Matters for Model Merging at Scale? | | 0
Cross-Embodiment Dexterous Grasping with Reinforcement Learning | | 0
Learning Diverse Bimanual Dexterous Manipulation Skills from Human Demonstrations | | 0
MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation | Code | 0
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Code | 4
A novel open-source ultrasound dataset with deep learning benchmarks for spinal cord injury localization and anatomical segmentation | Code | 0
From Goal-Conditioned to Language-Conditioned Agents via Vision-Language Models | | 0
M^2PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning | Code | 1
Deep Generative Adversarial Network for Occlusion Removal from a Single Image | | 0
Deep Learning based Optical Image Super-Resolution via Generative Diffusion Models for Layerwise in-situ LPBF Monitoring | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GR-MG | Avg. sequence length | 4.04 | | Unverified
2 | MoDE | Avg. sequence length | 4.01 | | Unverified
3 | RoboUniView | Avg. sequence length | 3.65 | | Unverified
4 | 3D Diffuser Actor | Avg. sequence length | 3.27 | | Unverified
5 | GR-1 | Avg. sequence length | 3.06 | | Unverified