SOTAVerified

Zero-shot Generalization

Papers

Showing 51-75 of 572 papers

| Title | Status | Hype |
|---|---|---|
| Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures | Code | 2 |
| Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning | Code | 2 |
| SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation | Code | 2 |
| Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | Code | 2 |
| On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? | Code | 2 |
| HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Code | 2 |
| Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Code | 2 |
| Segment Any Anomaly without Training via Hybrid Prompt Regularization | Code | 2 |
| No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Code | 2 |
| OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction | Code | 2 |
| Multitask Prompted Training Enables Zero-Shot Task Generalization | Code | 2 |
| Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter | Code | 2 |
| Autoregressive Image Generation with Randomized Parallel Decoding | Code | 2 |
| Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Code | 2 |
| NeRF-Supervised Deep Stereo | Code | 2 |
| Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement | Code | 2 |
| Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | Code | 2 |
| Detecting Everything in the Open World: Towards Universal Object Detection | Code | 2 |
| LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | Code | 2 |
| Learning to Route Among Specialized Experts for Zero-Shot Generalization | Code | 2 |
| Matryoshka Diffusion Models | Code | 2 |
| Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV | Code | 2 |
| Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents | Code | 2 |
| Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning | Code | 2 |
| Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery | Code | 2 |

Benchmark Results

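| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GR-MG | Avg. sequence length | 4.04 | | Unverified |
| 2 | MoDE | Avg. sequence length | 4.01 | | Unverified |
| 3 | RoboUniView | Avg. sequence length | 3.65 | | Unverified |
| 4 | 3D Diffuser Actor | Avg. sequence length | 3.27 | | Unverified |
| 5 | GR-1 | Avg. sequence length | 3.06 | | Unverified |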