SOTAVerified

Zero-shot Generalization

Papers

Showing 201–250 of 572 papers

| Title | Status | Hype |
|---|---|---|
| IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition | | 0 |
| ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video | Code | 1 |
| Benchmarking VLMs' Reasoning About Persuasive Atypical Images | | 0 |
| PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | Code | 2 |
| AnySkin: Plug-and-play Skin Sensing for Robotic Touch | | 0 |
| IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS | Code | 2 |
| TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs | | 0 |
| Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance | Code | 1 |
| GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal-Conditioned Policy | Code | 2 |
| Segment Anything Model for Grain Characterization in Hard Drive Design | | 0 |
| Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment | | 0 |
| Generalizable Facial Expression Recognition | Code | 1 |
| Zero-Shot Object-Centric Representation Learning | | 0 |
| OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction | Code | 2 |
| One Shot is Enough for Sequential Infrared Small Target Segmentation | Code | 0 |
| Performance and Non-adversarial Robustness of the Segment Anything Model 2 in Surgical Video Segmentation | | 0 |
| Visual Grounding for Object-Level Generalization in Reinforcement Learning | Code | 1 |
| HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling | Code | 0 |
| Segment Anything for Videos: A Systematic Survey | Code | 5 |
| HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Code | 2 |
| HDL-GPT: High-Quality HDL is All You Need | | 0 |
| SSTD: Stripe-Like Space Target Detection Using Single-Point Weak Supervision | | 0 |
| Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models | Code | 1 |
| OpenSU3D: Open World 3D Scene Understanding using Foundation Models | | 0 |
| BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models | | 0 |
| Disentangling Representations through Multi-task Learning | | 0 |
| ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video | Code | 1 |
| Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting | | 0 |
| Real-Time Anomaly Detection and Reactive Planning with Large Language Models | | 0 |
| Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization | | 0 |
| Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search | Code | 0 |
| Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation | Code | 1 |
| Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation | Code | 2 |
| Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval | | 0 |
| A Two-stage Reinforcement Learning-based Approach for Multi-entity Task Allocation | Code | 1 |
| RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation | Code | 2 |
| NeuralSCF: Neural network self-consistent fields for density functional theory | | 0 |
| GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Code | 2 |
| Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers | | 0 |
| Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity | Code | 0 |
| RobustSAM: Segment Anything Robustly on Degraded Images | Code | 3 |
| Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning | Code | 0 |
| Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models | Code | 2 |
| Prompt-based Visual Alignment for Zero-shot Policy Transfer | | 0 |
| GOMAA-Geo: GOal Modality Agnostic Active Geo-localization | Code | 1 |
| OLIVE: Object Level In-Context Visual Embeddings | Code | 0 |
| μLO: Compute-Efficient Meta-Generalization of Learned Optimizers | Code | 1 |
| Text-only Synthesis for Image Captioning | | 0 |
| TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability | | 0 |
| Benchmarking General-Purpose In-Context Learning | | 0 |
Page 5 of 12

Benchmark Results

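| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GR-MG | Avg. sequence length | 4.04 | | Unverified |
| 2 | MoDE | Avg. sequence length | 4.01 | | Unverified |
| 3 | RoboUniView | Avg. sequence length | 3.65 | | Unverified |
| 4 | 3D Diffuser Actor | Avg. sequence length | 3.27 | | Unverified |
| 5 | GR-1 | Avg. sequence length | 3.06 | | Unverified |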