
Zero-shot Generalization

Papers

Showing 51-100 of 572 papers

Title | Status | Hype
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Code | 2
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Code | 2
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities | Code | 2
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement | Code | 2
PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | Code | 2
IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS | Code | 2
GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal-Conditioned Policy | Code | 2
OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction | Code | 2
HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Code | 2
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation | Code | 2
RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation | Code | 2
GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Code | 2
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models | Code | 2
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? | Code | 2
GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis | Code | 2
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning | Code | 2
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Code | 2
Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model | Code | 2
RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model | Code | 2
Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV | Code | 2
Learning to Route Among Specialized Experts for Zero-Shot Generalization | Code | 2
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning | Code | 2
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions | Code | 2
Semantic Guidance Tuning for Text-To-Image Diffusion Models | Code | 2
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation | Code | 2
Matryoshka Diffusion Models | Code | 2
EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce | Code | 2
2nd Place Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection | Code | 2
Segment Any Anomaly without Training via Hybrid Prompt Regularization | Code | 2
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | Code | 2
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents | Code | 2
NeRF-Supervised Deep Stereo | Code | 2
Detecting Everything in the Open World: Towards Universal Object Detection | Code | 2
Crosslingual Generalization through Multitask Finetuning | Code | 2
VIMA: General Robot Manipulation with Multimodal Prompts | Code | 2
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models | Code | 2
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing | Code | 2
Multitask Prompted Training Enables Zero-Shot Task Generalization | Code | 2
IRanker: Towards Ranking Foundation Model | Code | 1
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis | Code | 1
Beyond the LUMIR challenge: The pathway to foundational registration models | Code | 1
DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis? | Code | 1
ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving | Code | 1
Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing | Code | 1
Foundation Models Knowledge Distillation For Battery Capacity Degradation Forecast | Code | 1
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1
Towards Ball Spin and Trajectory Analysis in Table Tennis Broadcast Videos via Physically Grounded Synthetic-to-Real Transfer | Code | 1
Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections | Code | 1
PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation | Code | 1
FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GR-MG | Avg. sequence length | 4.04 | - | Unverified
2 | MoDE | Avg. sequence length | 4.01 | - | Unverified
3 | RoboUniView | Avg. sequence length | 3.65 | - | Unverified
4 | 3D Diffuser Actor | Avg. sequence length | 3.27 | - | Unverified
5 | GR-1 | Avg. sequence length | 3.06 | - | Unverified