Diversity

Diversity in data sampling matters across many use cases, including search and recommendation systems. A diverse sample captures a wide range of variations and perspectives, which tends to yield more robust, less biased, and more comprehensive models. In search, for instance, diversity reduces redundancy, so users see a broader set of relevant results rather than repeated near-duplicates.
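
As an illustration of how redundancy can be reduced in a ranked result list, the sketch below uses maximal marginal relevance (MMR), a common re-ranking heuristic. The description above does not name a method, so this is only one possible approach. The function name `mmr_rerank`, the `trade_off` parameter, and the toy relevance scores and embeddings are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank(relevance, embeddings, k, trade_off=0.5):
    """Greedily pick k item indices, balancing relevance against redundancy.

    trade_off=1.0 ranks purely by relevance; lower values penalize items
    that are similar to ones already selected. (Hypothetical helper.)
    """
    candidates = list(range(len(relevance)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            return trade_off * relevance[i] - (1 - trade_off) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (assumed): items 0 and 1 are duplicates, item 2 is different.
relevance = [0.9, 0.85, 0.8]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

diverse_top2 = mmr_rerank(relevance, embeddings, k=2)
relevance_only_top2 = mmr_rerank(relevance, embeddings, k=2, trade_off=1.0)
```

With these toy inputs, the diversified ranking skips the duplicate of the top result in favor of the dissimilar item, while the relevance-only ranking returns both duplicates. In practice the similarity function and the trade-off value would need tuning for the data at hand.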

Papers

Showing 1–50 of 9051 papers

Title	Date	Tasks	Status	Hype
MinerU: An Open-Source Solution for Precise Document Content Extraction	Sep 27, 2024	Diversity, Optical Character Recognition (OCR)	Code Available	16
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models	Feb 25, 2025	Diversity, Language Modeling	Code Available	11
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation	Mar 26, 2024	Diversity, Face Reenactment	Code Available	9
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation	Jun 13, 2024	Diversity, Image Animation	Code Available	9
Depth Anything V2	Jun 13, 2024	Depth Estimation, Diversity	Code Available	9
Is Diversity All You Need for Scalable Robotic Manipulation?	Jul 8, 2025	All, Diversity	Code Available	7
Adaptive In-conversation Team Building for Language Model Agents	May 29, 2024	Diversity, Language Modeling	Code Available	7
PromptWizard: Task-Aware Prompt Optimization Framework	May 28, 2024	Computational Efficiency, Diversity	Code Available	7
Flow-GRPO: Training Flow Matching Models via Online RL	May 8, 2025	Denoising, Diversity	Code Available	7
Improving Sample Quality of Diffusion Models Using Self-Attention Guidance	Oct 3, 2022	Denoising, Diversity	Code Available	7
MaskSketch: Unpaired Structure-guided Masked Image Generation	Feb 10, 2023	Conditional Image Generation, Diversity	Code Available	7
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models	Feb 8, 2024	Benchmarking, Diversity	Code Available	7
Better Synthetic Data by Retrieving and Transforming Existing Datasets	Apr 22, 2024	Dataset Generation, Diversity	Code Available	7
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset	Sep 21, 2023	Chatbot, Diversity	Code Available	7
FoundationStereo: Zero-Shot Stereo Matching	Jan 17, 2025	Depth Estimation, Diversity	Code Available	7
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations	Jan 3, 2024	Diversity, Quantization	Code Available	7
Automatic Chain of Thought Prompting in Large Language Models	Oct 7, 2022	Diversity, Question Answering	Code Available	6
BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models	May 23, 2025	Diversity, Time Series	Code Available	5
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving	Apr 25, 2024	Diversity	Code Available	5
Fake News Detection: It's All in the Data!	Jul 2, 2024	All, Diversity	Code Available	5
R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models	Oct 23, 2024	Diversity	Code Available	5
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos	Sep 3, 2024	Depth Estimation, Diversity	Code Available	5
MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts	Apr 13, 2024	Diversity, Language Modeling	Code Available	5
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation	Jun 25, 2024	Diversity, Natural Language Understanding	Code Available	5
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations	Dec 10, 2024	Attribute, Benchmarking	Code Available	5
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning	Jun 2, 2025	AI Agent, Diversity	Code Available	5
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark	Jul 16, 2024	Diversity, Speaker Identification	Code Available	5
GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation	Nov 27, 2024	Depth Estimation, Diversity	Code Available	5
Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration	Oct 3, 2024	Diversity, Language Modeling	Code Available	4
A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL	Nov 13, 2024	Diversity, In-Context Learning	Code Available	4
GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images	Sep 22, 2022	Diversity	Code Available	4
GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image	Mar 18, 2024	3D geometry, 3D Reconstruction	Code Available	4
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction	May 27, 2024	3D Semantic Occupancy Prediction, Autonomous Driving	Code Available	4
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models	Mar 28, 2025	Diversity	Code Available	4
SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis	May 22, 2025	Diversity, Information Retrieval	Code Available	4
Quality-aware Masked Diffusion Transformer for Enhanced Music Generation	May 24, 2024	Diversity, Music Generation	Code Available	4
Efficient Part-level 3D Object Generation via Dual Volume Packing	Jun 11, 2025	Diversity, Object	Code Available	4
Expressive Whole-Body 3D Gaussian Avatar	Jul 31, 2024	3DGS, Diversity	Code Available	4
AlphaFold Meets Flow Matching for Generating Protein Ensembles	Feb 7, 2024	Diversity	Code Available	4
A New Formulation of Lipschitz Constrained With Functional Gradient Learning for GANs	Jan 20, 2025	Diversity, Image Generation	Code Available	4
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations	May 23, 2023	Diversity	Code Available	4
3D Scene Generation: A Survey	May 8, 2025	Autonomous Driving, Diversity	Code Available	4
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator	Feb 26, 2025	Depth Estimation, Diversity	Code Available	4
Improving Text Embeddings with Large Language Models	Dec 31, 2023	Decoder, Diversity	Code Available	3
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning	Jan 12, 2024	Diversity, document understanding	Code Available	3
LongAlign: A Recipe for Long Context Alignment of Large Language Models	Jan 31, 2024	Diversity, Instruction Following	Code Available	3
Improved motif-scaffolding with SE(3) flow matching	Jan 8, 2024	Data Augmentation, Diversity	Code Available	3
Hierarchical Text-Conditional Image Generation with CLIP Latents	Apr 13, 2022	Conditional Image Generation, Decoder	Code Available	3
Improving Model Evaluation using SMART Filtering of Benchmark Datasets	Oct 26, 2024	Chatbot, Diversity	Code Available	3
GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping	May 27, 2024	Depth Estimation, Diversity	Code Available	3

