Zero-shot Generalization

Papers


Showing 1–50 of 572 papers

Title	Date	Tasks	Status	Hype
Visually Descriptive Language Model for Vector Graphics Reasoning	Apr 9, 2024	Descriptive, Language Modeling	Code Available	9
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction	Apr 3, 2024	Image Generation, Image Reconstruction	Code Available	9
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis	May 14, 2025	Denoising, Depth Estimation	Code Available	7
FoundationStereo: Zero-Shot Stereo Matching	Jan 17, 2025	Depth Estimation, Diversity	Code Available	7
Large Concept Models: Language Modeling in a Sentence Representation Space	Dec 11, 2024	Language Modeling, Language Modelling	Code Available	7
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation	Mar 22, 2024	Depth Estimation, Surface Normal Estimation	Code Available	7
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation	Oct 10, 2024	Zero-shot Generalization	Code Available	5
Segment Anything for Videos: A Systematic Survey	Jul 31, 2024	Image Segmentation, Robot Manipulation Generalization	Code Available	5
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs	Jul 31, 2023	Trajectory Planning, Zero-shot Generalization	Code Available	5
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth	Feb 23, 2023	Depth Estimation, Monocular Depth Estimation	Code Available	5
Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models	Apr 15, 2025	Humanoid Control, Reinforcement Learning (RL)	Code Available	4
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement	Mar 9, 2025	Domain Generalization, Object Detection	Code Available	4
MonSter: Marry Monodepth to Stereo Unleashes Power	Jan 15, 2025	Depth Estimation, Monocular Depth Estimation	Code Available	4
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction	Sep 26, 2024	3D Reconstruction, Denoising	Code Available	4
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation	Dec 4, 2023	Depth Estimation, GPU	Code Available	4
Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image	Jul 20, 2023	Depth Estimation, Image Reconstruction	Code Available	4
Zero-1-to-3: Zero-shot One Image to 3D Object	Mar 20, 2023	3D Reconstruction, Image to 3D	Code Available	4
Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers	Jul 14, 2022	Retrieval, Text Retrieval	Code Available	4
Detect Anything 3D in the Wild	Apr 10, 2025	3D Object Detection, Autonomous Driving	Code Available	3
PE3R: Perception-Efficient 3D Reconstruction	Mar 10, 2025	3D Reconstruction, Zero-shot Generalization	Code Available	3
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching	Jan 16, 2025	Depth Estimation, Disparity Estimation	Code Available	3
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera	Jan 5, 2025	Data Augmentation, Depth Estimation	Code Available	3
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up	Dec 20, 2024	8k, GPU	Code Available	3
Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail	Dec 5, 2024	Stereo Matching, Zero-shot Generalization	Code Available	3
ZIM: Zero-Shot Image Matting for Anything	Nov 1, 2024	Image Inpainting, Image Matting	Code Available	3
RobustSAM: Segment Anything Robustly on Degraded Images	Jun 13, 2024	Deblurring, Image Dehazing	Code Available	3
SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction	May 24, 2024	Autonomous Driving, Motion Generation	Code Available	3
MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts	May 2, 2024	Combinatorial Optimization, Mixture-of-Experts	Code Available	3
IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus	Feb 22, 2024	Zero-shot Generalization	Code Available	3
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations	Feb 18, 2024	Denoising, Robot Manipulation	Code Available	3
General Object Foundation Model for Images and Videos at Scale	Dec 14, 2023	Instance Segmentation, Long-tail Video Object Segmentation	Code Available	3
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting	Oct 12, 2023	Decoder, Probabilistic Time Series Forecasting	Code Available	3
Separate Anything You Describe	Aug 9, 2023	Audio Source Separation, Natural Language Queries	Code Available	3
Objaverse-XL: A Universe of 10M+ 3D Objects	Jul 11, 2023	Diversity, Novel View Synthesis	Code Available	3
What Language Model to Train if You Have One Million GPU Hours?	Oct 27, 2022	GPU, Language Modeling	Code Available	3
Expanding Language-Image Pretrained Models for General Video Recognition	Aug 4, 2022	Action Classification, Action Recognition	Code Available	3
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment	Jul 3, 2025	cross-modal alignment, Instruction Following	Code Available	2
WAFT: Warping-Alone Field Transforms for Optical Flow	Jun 26, 2025	Optical Flow Estimation, Zero-shot Generalization	Code Available	2
RecGPT: A Foundation Model for Sequential Recommendation	Jun 6, 2025	Decoder, model	Code Available	2
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression	May 26, 2025	Zero-shot Generalization	Code Available	2
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization	May 21, 2025	Vision-Language-Action, Zero-shot Generalization	Code Available	2
SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation	Apr 6, 2025	Multi-Object Tracking, Object	Code Available	2
Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery	Apr 3, 2025	Field Boundary Delineation, Instance Segmentation	Code Available	2
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning	Mar 28, 2025	Descriptive, Image Quality Assessment	Code Available	2
Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures	Mar 20, 2025	Deblurring, Zero-shot Generalization	Code Available	2
Autoregressive Image Generation with Randomized Parallel Decoding	Mar 13, 2025	Conditional Image Generation, Image Generation	Code Available	2
Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter	Mar 12, 2025	Zero-shot Generalization	Code Available	2
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model	Mar 8, 2025	Image Quality Assessment, Language Modeling	Code Available	2
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning	Dec 17, 2024	Denoising	Code Available	2
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient	Nov 26, 2024	GPU, Image Generation	Code Available	2



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GR-MG	Avg. sequence length	4.04	—	Unverified
2	MoDE	Avg. sequence length	4.01	—	Unverified
3	RoboUniView	Avg. sequence length	3.65	—	Unverified
4	3D Diffuser Actor	Avg. sequence length	3.27	—	Unverified
5	GR-1	Avg. sequence length	3.06	—	Unverified