SOTAVerified

Benchmarking

Papers

Showing 3051–3100 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Evaluating Music Recommender Systems for Groups | | 0 |
| Evaluating Nuanced Bias in Large Language Model Free Response Answers | | 0 |
| Evaluating Robustness of LLMs on Crisis-Related Microblogs across Events, Information Types, and Linguistic Features | | 0 |
| Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning | | 0 |
| Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance | | 0 |
| Evaluating the Generation of Spatial Relations in Text and Image Generative Models | | 0 |
| Evaluating the Performance of Large Language Models via Debates | | 0 |
| Evaluating Visual Conversational Agents via Cooperative Human-AI Games | | 0 |
| Evaluation and Ensembling of Methods for Reverse Engineering of Brain Connectivity from Imaging Data | | 0 |
| Evaluation Methodology for Attacks Against Confidence Thresholding Models | | 0 |
| Evaluation Methods and Measures for Causal Learning Algorithms | | 0 |
| Evaluation of Algorithms for Multi-Modality Whole Heart Segmentation: An Open-Access Grand Challenge | | 0 |
| Evaluation of Architectural Synthesis Using Generative AI | | 0 |
| Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi | | 0 |
| Evaluation of Popular XAI Applied to Clinical Prediction Models: Can They be Trusted? | | 0 |
| Evaluation of simulation methods for tumor subclonal reconstruction | | 0 |
| Evaluation of Three Welsh Language POS Taggers | | 0 |
| EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | | 0 |
| EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset | | 0 |
| Event-based Continuous Color Video Decompression from Single Frames | | 0 |
| Event-based Feature Extraction Using Adaptive Selection Thresholds | | 0 |
| Event Camera Simulator Design for Modeling Attention-based Inference Architectures | | 0 |
| Eventprop training for efficient neuromorphic applications | | 0 |
| EvEntS ReaLM: Event Reasoning of Entity States via Language Models | | 0 |
| Evetac: An Event-based Optical Tactile Sensor for Robotic Manipulation | | 0 |
| Ev-Layout: A Large-scale Event-based Multi-modal Dataset for Indoor Layout Estimation and Tracking | | 0 |
| EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages | | 0 |
| Evolutionary Multimodal Optimization: A Short Survey | | 0 |
| Evolving Evolutionary Algorithms using Linear Genetic Programming | | 0 |
| Evolving Hard Maximum Cut Instances for Quantum Approximate Optimization Algorithms | | 0 |
| EVOPS Benchmark: Evaluation of Plane Segmentation from RGBD and LiDAR Data | | 0 |
| Exact lattice-based stochastic cell culture simulation algorithms incorporating spontaneous and contact-dependent reactions | | 0 |
| Exact Mean Computation in Dynamic Time Warping Spaces | | 0 |
| EXACT: Towards a platform for empirically benchmarking Machine Learning model explanation methods | | 0 |
| Examining convolutional feature extraction using Maximum Entropy (ME) and Signal-to-Noise Ratio (SNR) for image classification | | 0 |
| Experimental Benchmarking of Energy-saving Sub-Optimal Sliding Mode Control | | 0 |
| Experimental robustness benchmark of quantum neural network on a superconducting quantum processor | | 0 |
| Experimenting with robotic intra-logistics domains | | 0 |
| ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists | | 0 |
| Explainable AI using expressive Boolean formulas | | 0 |
| Explainable Rumor Detection using Inter and Intra-feature Attention Networks | | 0 |
| Explaining Unreliable Perception in Automated Driving: A Fuzzy-based Monitoring Approach | | 0 |
| Explicitly Multi-Modal Benchmarks for Multi-Objective Optimization | | 0 |
| Exploitation-Guided Exploration for Semantic Embodied Navigation | | 0 |
| Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks | | 0 |
| Exploiting Database Management Systems and Treewidth for Counting | | 0 |
| Exploration of TPUs for AI Applications | | 0 |
| Exploring and Benchmarking the Planning Capabilities of Large Language Models | | 0 |
| Exploring Capabilities of Time Series Foundation Models in Building Analytics | | 0 |
| Exploring Continual Learning of Diffusion Models | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |