Zero-shot Generalization

Papers

Showing 51–100 of 572 papers

Title	Date	Tasks	Status	Hype
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient	Nov 26, 2024	GPU, Image Generation	Code Available	2
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models	Nov 8, 2024	Task Planning, Zero-shot Generalization	Code Available	2
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities	Oct 18, 2024	Conditional Image Generation, Image Generation	Code Available	2
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement	Oct 15, 2024	Disentanglement, Inductive Bias	Code Available	2
PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage	Sep 13, 2024	Depth Estimation, Monocular Depth Estimation	Code Available	2
IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS	Sep 9, 2024	Denoising, Speech Enhancement	Code Available	2
GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal-Conditioned Policy	Aug 26, 2024	Few-Shot Learning, Image Generation	Code Available	2
OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction	Aug 16, 2024	Prediction, Traffic Prediction	Code Available	2
HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors	Jul 26, 2024	Depth Estimation, GPU	Code Available	2
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation	Jul 3, 2024	Domain Generalization, Knowledge Distillation	Code Available	2
RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation	Jun 27, 2024	Language Modeling, Language Modelling	Code Available	2
GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models	Jun 18, 2024	Benchmarking, Depth Estimation	Code Available	2
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models	Jun 5, 2024	Few-Shot Learning, Language Modeling	Code Available	2
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?	May 3, 2024	Computational Efficiency, Prompt Learning	Code Available	2
GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis	Apr 9, 2024	Image Generation, Zero-shot Generalization	Code Available	2
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning	Apr 4, 2024	3D Scene Reconstruction, Depth Estimation	Code Available	2
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance	Apr 4, 2024	Benchmarking, Image Generation	Code Available	2
Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model	Mar 17, 2024	Image Restoration, Zero-shot Generalization	Code Available	2
RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model	Mar 12, 2024	Change Detection, Zero-shot Generalization	Code Available	2
Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV	Mar 3, 2024	Depth Estimation, Monocular Depth Estimation	Code Available	2
Learning to Route Among Specialized Experts for Zero-Shot Generalization	Feb 8, 2024	parameter-efficient fine-tuning, Zero-shot Generalization	Code Available	2
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning	Feb 4, 2024	Contact-rich Manipulation, Zero-shot Generalization	Code Available	2
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions	Jan 24, 2024	document understanding, Question Answering	Code Available	2
Semantic Guidance Tuning for Text-To-Image Diffusion Models	Dec 26, 2023	Zero-shot Generalization	Code Available	2
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation	Dec 20, 2023	Robot Manipulation, Zero-shot Generalization	Code Available	2
Matryoshka Diffusion Models	Oct 23, 2023	Image Generation, Zero-shot Generalization	Code Available	2
EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce	Aug 14, 2023	Diversity, Instruction Following	Code Available	2
2nd Place Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection	Jun 15, 2023	Anomaly Detection, Anomaly Localization	Code Available	2
Segment Any Anomaly without Training via Hybrid Prompt Regularization	May 18, 2023	Anomaly Detection, Anomaly Localization	Code Available	2
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency	Apr 22, 2023	Zero-shot Generalization	Code Available	2
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents	Apr 19, 2023	Information Retrieval, Passage Ranking	Code Available	2
NeRF-Supervised Deep Stereo	Mar 30, 2023	NeRF, Neural Rendering	Code Available	2
Detecting Everything in the Open World: Towards Universal Object Detection	Mar 21, 2023	object-detection, Object Detection	Code Available	2
Crosslingual Generalization through Multitask Finetuning	Nov 3, 2022	Coreference Resolution, Cross-Lingual Transfer	Code Available	2
VIMA: General Robot Manipulation with Multimodal Prompts	Oct 6, 2022	Imitation Learning, Language Modelling	Code Available	2
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models	Sep 15, 2022	image-classification, Image Classification	Code Available	2
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing	Jun 30, 2022	Diversity, Language Model Evaluation	Code Available	2
Multitask Prompted Training Enables Zero-Shot Task Generalization	Oct 15, 2021	Benchmarking, Decoder	Code Available	2
IRanker: Towards Ranking Foundation Model	Jun 25, 2025	GSM8K, model	Code Available	1
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis	Jun 4, 2025	Action Generation, Decision Making	Code Available	1
Beyond the LUMIR challenge: The pathway to foundational registration models	May 30, 2025	Image Registration, Zero-shot Generalization	Code Available	1
DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?	May 30, 2025	Diagnostic, Medical Image Analysis	Code Available	1
ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving	May 26, 2025	Autonomous Driving, Bench2Drive	Code Available	1
Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing	May 23, 2025	de novo peptide sequencing, Reranking	Code Available	1
Foundation Models Knowledge Distillation For Battery Capacity Degradation Forecast	May 13, 2025	Knowledge Distillation, Time Series	Code Available	1
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments	May 8, 2025	Benchmarking, Prompt Engineering	Code Available	1
Towards Ball Spin and Trajectory Analysis in Table Tennis Broadcast Videos via Physically Grounded Synthetic-to-Real Transfer	Apr 28, 2025	Monocular 3D Object Localization, Sports Analytics	Code Available	1
Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections	Apr 15, 2025	Anomaly Detection, Anomaly Localization	Code Available	1
PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation	Apr 3, 2025	Object, Pose Estimation	Code Available	1
FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images	Mar 24, 2025	3D Canonicalization, Zero-shot Generalization	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GR-MG	Avg. sequence length	4.04	—	Unverified
2	MoDE	Avg. sequence length	4.01	—	Unverified
3	RoboUniView	Avg. sequence length	3.65	—	Unverified
4	3D Diffuser Actor	Avg. sequence length	3.27	—	Unverified
5	GR-1	Avg. sequence length	3.06	—	Unverified