SOTAVerified

Benchmarking

Papers

Showing 1951–1975 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning | | 0 |
| Benchmarking Ethical and Safety Risks of Healthcare LLMs in China-Toward Systemic Governance under Healthy China 2030 | | 0 |
| Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese | | 0 |
| Emerging Approaches for THz Array Imaging: A Tutorial Review and Software Tool | | 0 |
| CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset | | 0 |
| C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System | | 0 |
| CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx | | 0 |
| CETBench: A Novel Dataset constructed via Transformations over Programs for Benchmarking LLMs for Code-Equivalence Checking | | 0 |
| Benchmarking and Improving Generator-Validator Consistency of Language Models | | 0 |
| DACOS-A Manually Annotated Dataset of Code Smells | | 0 |
| DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles | | 0 |
| DailyQA: A Benchmark to Evaluate Web Retrieval Augmented LLMs Based on Capturing Real-World Changes | | 0 |
| Certifying almost all quantum states with few single-qubit measurements | | 0 |
| Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization | | 0 |
| DarkBench: Benchmarking Dark Patterns in Large Language Models | | 0 |
| DASB -- Discrete Audio and Speech Benchmark | | 0 |
| Data Analysis in the Era of Generative AI | | 0 |
| Data and its (dis)contents: A survey of dataset development and use in machine learning research | | 0 |
| Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory | | 0 |
| Data Augmentation for Traffic Classification | | 0 |
| Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset | | 0 |
| Data-driven Approach for Static Hedging of Exchange Traded Options | | 0 |
| Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines | | 0 |
| Data-driven inventory management for new products: An adjusted Dyna-Q approach with transfer learning | | 0 |
| An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks | | 0 |
Page 79 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |