SOTAVerified

Benchmarking

Papers

Showing 3001-3050 of 5548 papers

Title | Status | Hype
Benchmarking the Gerchberg-Saxton Algorithm | | 0
Benchmarking the Fidelity and Utility of Synthetic Relational Data | | 0
Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web | | 0
ImageNet performance correlates with pose estimation robustness and generalization on out-of-domain data | | 0
ImagePairs: Realistic Super Resolution Dataset via Beam Splitter Camera Rig | | 0
Imagining and building wise machines: The centrality of AI metacognition | | 0
Benchmarking the Effectiveness of Classification Algorithms and SVM Kernels for Dry Beans | | 0
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World | | 0
Imitation Learning Datasets: A Toolkit For Creating Datasets, Training Agents and Benchmarking | | 0
Imitation Learning from Pixel Observations for Continuous Control | | 0
Practical Guidelines for Cell Segmentation Models Under Optical Aberrations in Microscopy | | 0
A Functional Analysis Approach to Symbolic Regression | | 0
Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors | | 0
A Framework for Large Scale Synthetic Graph Dataset Generation | | 0
Dataset Properties Shape the Success of Neuroimaging-Based Patient Stratification: A Benchmarking Analysis Across Clustering Algorithms | | 0
A Framework for Evaluating Predictive Models Using Synthetic Image Covariates and Longitudinal Data | | 0
Impact of spatial transformations on landscape features of CEC2022 basic benchmark problems | | 0
Implementing and Benchmarking the Locally Competitive Algorithm on the Loihi 2 Neuromorphic Processor | | 0
Implementing hosting capacity analysis in distribution networks: Practical considerations, advancements and future directions | | 0
Benchmarking the Benchmark -- Analysis of Synthetic NIDS Datasets | | 0
Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities | | 0
Benchmarking the Accuracy and Robustness of Feedback Alignment Algorithms | | 0
Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels | | 0
The Moral Mind(s) of Large Language Models | | 0
Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on Edge Devices | | 0
Ward: Provable RAG Dataset Inference via LLM Watermarks | | 0
The Multi-speaker Multi-style Voice Cloning Challenge 2021 | | 0
PAWS-VMK: A Unified Approach To Semi-Supervised Learning And Out-of-Distribution Detection | | 0
Improved statistical benchmarking of digital pathology models using pairwise frames evaluation | | 0
The Neural Painter: Multi-Turn Image Generation | | 0
Improved YOLOv12 with LLM-Generated Synthetic Data for Enhanced Apple Detection and Benchmarking Against YOLOv11 and YOLOv10 | | 0
The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects | | 0
A 28-nm Convolutional Neuromorphic Processor Enabling Online Learning with Spike-Based Retinas | | 0
Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests | | 0
Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation | | 0
Benchmarking terminology building capabilities of ChatGPT on an English-Russian Fashion Corpus | | 0
Improving Augmentation and Evaluation Schemes for Semantic Image Synthesis | | 0
Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary | | 0
Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model | | 0
The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods | | 0
Improving Items and Contexts Understanding with Descriptive Graph for Conversational Recommendation | | 0
Improving Medical Image Classification with Label Noise Using Dual-uncertainty Estimation | | 0
Improving Model Generalization: A Chinese Named Entity Recognition Case Study | | 0
Improving Named Entity Linking Corpora Quality | | 0
Improving plant disease classification by adaptive minimal ensembling | | 0
The Paradox of Success in Evolutionary and Bioinspired Optimization: Revisiting Critical Issues, Key Studies, and Methodological Pathways | | 0
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards | | 0
Improving seasonal forecast using probabilistic deep learning | | 0
The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering | | 0
Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified