SOTAVerified

Benchmarking

Papers

Showing 526-550 of 5548 papers

Title | Status | Hype
Massive-STEPS: Massive Semantic Trajectories for Understanding POI Check-ins -- Dataset and Benchmarks | Code | 1
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling | Code | 1
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation | Code | 1
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning | Code | 1
An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders | Code | 1
A BFS-Tree of Ranking References for Unsupervised Manifold Learning | Code | 1
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Code | 1
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Code | 1
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Code | 1
Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms | Code | 1
Replication in Visual Diffusion Models: A Survey and Outlook | Code | 1
Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks | Code | 1
dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing | Code | 1
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks | Code | 1
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies | Code | 1
Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding | Code | 1
Comprehensive benchmarking of large language models for RNA secondary structure prediction | Code | 1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Code | 1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Code | 1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | Code | 1
A Benchmarking Study of Kolmogorov-Arnold Networks on Tabular Data | Code | 1
Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Code | 1
Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection | Code | 1
Page 22 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified