SOTAVerified

Benchmarking

Papers

Showing 1951-2000 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Challenges in Benchmarking Stream Learning Algorithms with Real-world Data | | 0 |
| Challenges and Pitfalls of Machine Learning Evaluation and Benchmarking | | 0 |
| Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition | | 0 |
| Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation | | 0 |
| CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset | | 0 |
| Challenges and perspectives in computational deconvolution of genomics data | | 0 |
| CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx | | 0 |
| Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors | | 0 |
| AN ELIXIR FOR BLOCKCHAIN SCALABILITY WITH CHANNEL BASED CLUSTERED SHARDING | | 0 |
| DACOS-A Manually Annotated Dataset of Code Smells | | 0 |
| DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles | | 0 |
| DailyQA: A Benchmark to Evaluate Web Retrieval Augmented LLMs Based on Capturing Real-World Changes | | 0 |
| Challenges and Advancements in Modeling Shock Fronts with Physics-Informed Neural Networks: A Review and Benchmarking Study | | 0 |
| Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization | | 0 |
| DarkBench: Benchmarking Dark Patterns in Large Language Models | | 0 |
| DASB -- Discrete Audio and Speech Benchmark | | 0 |
| Data Analysis in the Era of Generative AI | | 0 |
| Data and its (dis)contents: A survey of dataset development and use in machine learning research | | 0 |
| Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory | | 0 |
| Data Augmentation for Traffic Classification | | 0 |
| Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset | | 0 |
| Data-driven Approach for Static Hedging of Exchange Traded Options | | 0 |
| Challenge Results Are Not Reproducible | | 0 |
| Data-driven inventory management for new products: An adjusted Dyna-Q approach with transfer learning | | 0 |
| A Dataset Similarity Evaluation Framework for Wireless Communications and Sensing | | 0 |
| Data-driven surrogate modelling and benchmarking for process equipment | | 0 |
| Data-Driven Target Localization: Benchmarking Gradient Descent Using the Cramer-Rao Bound | | 0 |
| Benchmarking Federated Machine Unlearning methods for Tabular Data | | 0 |
| ChakmaNMT: A Low-resource Machine Translation On Chakma Language | | 0 |
| Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning | | 0 |
| Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese | | 0 |
| End-to-End Neural Ranking for eCommerce Product Search: an application of task models and textual embeddings | | 0 |
| C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System | | 0 |
| CETBench: A Novel Dataset constructed via Transformations over Programs for Benchmarking LLMs for Code-Equivalence Checking | | 0 |
| Benchmarking and Improving Generator-Validator Consistency of Language Models | | 0 |
| Certifying almost all quantum states with few single-qubit measurements | | 0 |
| A Platform for Event Extraction in Hindi | | 0 |
| DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition | | 0 |
| DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient Estimation | | 0 |
| Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines | | 0 |
| An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks | | 0 |
| DDR-ID: Dual Deep Reconstruction Networks Based Image Decomposition for Anomaly Detection | | 0 |
| CellCycleGAN: Spatiotemporal Microscopy Image Synthesis of Cell Populations using Statistical Shape Models and Conditional GANs | | 0 |
| DeAR: Debiasing Vision-Language Models with Additive Residuals | | 0 |
| CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark | | 0 |
| DECASTE: Unveiling Caste Stereotypes in Large Language Models through Multi-Dimensional Bias Analysis | | 0 |
| An efficiency analysis of Spanish airports | | 0 |
| Decentralized Federated Learning on the Edge over Wireless Mesh Networks | | 0 |
| 1-D Convlutional Neural Networks for the Analysis of Pupil Size Variations in Scotopic Conditions | | 0 |
| Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |