SOTAVerified

Zero-shot Generalization

Papers

Showing 51–100 of 572 papers

Title | Status | Hype
Visual Image Reconstruction from Brain Activity via Latent Representation | - | 0
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence | - | 0
Learning Graph Representation of Agent Diffusers | Code | 0
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1
Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization | - | 0
TeDA: Boosting Vision-Lanuage Models for Zero-Shot 3D Object Retrieval via Testing-time Distribution Alignment | Code | 0
Towards Ball Spin and Trajectory Analysis in Table Tennis Broadcast Videos via Physically Grounded Synthetic-to-Real Transfer | Code | 1
A Review of 3D Object Detection with Vision-Language Models | - | 0
Text-to-Decision Agent: Learning Generalist Policies from Natural Language Supervision | - | 0
Dysarthria Normalization via Local Lie Group Transformations for Robust ASR | Code | 0
Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections | Code | 1
Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models | Code | 4
Detect Anything 3D in the Wild | Code | 3
SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation | Code | 2
Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery | Code | 2
PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation | Code | 1
Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models | - | 0
Zero-shot Domain Generalization of Foundational Models for 3D Medical Image Segmentation: An Experimental Study | - | 0
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Code | 2
Thinking agents for zero-shot generalization to qualitatively novel tasks | - | 0
Unpaired Object-Level SAR-to-Optical Image Translation for Aircraft with Keypoints-Guided Diffusion Models | - | 0
FRESA:Feedforward Reconstruction of Personalized Skinned Avatars from Few Images | Code | 1
Aether: Geometric-Aware Unified World Modeling | - | 0
Equivariant Image Modeling | Code | 1
Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures | Code | 2
Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation | - | 0
Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance | - | 0
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding | Code | 1
GenM^3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation | - | 0
Learning with Expert Abstractions for Efficient Multi-Task Continuous Control | Code | 0
Good Actions Succeed, Bad Actions Generalize: A Case Study on Why RL Generalizes Better | - | 0
Foundation Feature-Driven Online End-Effector Pose Estimation: A Marker-Free and Learning-Free Approach | - | 0
Compound Expression Recognition via Large Vision-Language Models | - | 0
Autoregressive Image Generation with Randomized Parallel Decoding | Code | 2
Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter | Code | 2
Prompt-OT: An Optimal Transport Regularization Paradigm for Knowledge Preservation in Vision-Language Model Adaptation | Code | 0
A Recipe for Improving Remote Sensing VLM Zero Shot Generalization | - | 0
PE3R: Perception-Efficient 3D Reconstruction | Code | 3
PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM | - | 0
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | Code | 4
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Code | 2
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction | - | 0
RAILGUN: A Unified Convolutional Policy for Multi-Agent Path Finding Across Different Environments and Tasks | - | 0
Nature-Inspired Population-Based Evolution of Large Language Models | Code | 1
Re-Imagining Multimodal Instruction Tuning: A Representation View | Code | 0
Delving into Out-of-Distribution Detection with Medical Vision-Language Models | Code | 1
Contrastive Learning of English Language and Crystal Graphs for Multimodal Representation of Materials Knowledge | - | 0
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models | - | 0
GeLLMO: Generalizing Large Language Models for Multi-property Molecule Optimization | Code | 0
WRT-SAM: Foundation Model-Driven Segmentation for Generalized Weld Radiographic Testing | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GR-MG | Avg. sequence length | 4.04 | - | Unverified
2 | MoDE | Avg. sequence length | 4.01 | - | Unverified
3 | RoboUniView | Avg. sequence length | 3.65 | - | Unverified
4 | 3D Diffuser Actor | Avg. sequence length | 3.27 | - | Unverified
5 | GR-1 | Avg. sequence length | 3.06 | - | Unverified