Diversity

Diversity in data sampling matters in many applications, including search and recommendation systems. A diverse sample covers a wide range of variations and perspectives, which tends to make models more robust, less biased, and more comprehensive. In search, for instance, diversity reduces redundancy: users see a broader set of relevant results instead of many near-identical ones.
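The page does not name a specific method. One widely used way to trade relevance against redundancy when ranking search results is Maximal Marginal Relevance (MMR), which picks results one at a time and penalizes each candidate by its similarity to the results already chosen. The sketch below assumes that queries and documents are dense vectors compared by cosine similarity. The function names `cosine` and `mmr`, the weight `lam`, and the toy vectors are hypothetical and serve only as illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr(query, docs, k, lam=0.5):
    """Greedy Maximal Marginal Relevance selection (illustrative sketch).

    lam=1.0 ranks purely by relevance to the query; smaller values
    penalize documents that resemble ones already selected.
    Returns the indices of the selected documents, in pick order.
    """
    selected = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, docs[i])
            # Redundancy = highest similarity to any already-selected doc.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: docs 0 and 1 are near-duplicates, doc 2 differs.
query = [1.0, 0.2]
docs = [[1.0, 0.0], [1.0, 0.05], [0.6, 0.8]]

print(mmr(query, docs, k=2, lam=1.0))  # pure relevance: [1, 0]
print(mmr(query, docs, k=2, lam=0.5))  # diversified: [1, 2]
```

With `lam=1.0`, the two near-duplicates fill the top two slots. With `lam=0.5`, the second slot goes to the dissimilar document, which is the redundancy reduction described above.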

Papers

Showing 1–25 of 9051 papers

Title | Status | Hype
MinerU: An Open-Source Solution for Precise Document Content Extraction | Code | 16
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models | Code | 11
Depth Anything V2 | Code | 9
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation | Code | 9
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | Code | 9
Is Diversity All You Need for Scalable Robotic Manipulation? | Code | 7
FoundationStereo: Zero-Shot Stereo Matching | Code | 7
Flow-GRPO: Training Flow Matching Models via Online RL | Code | 7
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset | Code | 7
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models | Code | 7
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations | Code | 7
Improving Sample Quality of Diffusion Models Using Self-Attention Guidance | Code | 7
PromptWizard: Task-Aware Prompt Optimization Framework | Code | 7
Adaptive In-conversation Team Building for Language Model Agents | Code | 7
MaskSketch: Unpaired Structure-guided Masked Image Generation | Code | 7
Better Synthetic Data by Retrieving and Transforming Existing Datasets | Code | 7
Automatic Chain of Thought Prompting in Large Language Models | Code | 6
MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts | Code | 5
GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation | Code | 5
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation | Code | 5
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning | Code | 5
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations | Code | 5
Fake News Detection: It's All in the Data! | Code | 5
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving | Code | 5
BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models | Code | 5

No leaderboard results yet.