SOTAVerified

Benchmarking

Papers

Showing 1851-1900 of 5548 papers

Title | Status | Hype
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance | | 0
Consolidating and Developing Benchmarking Datasets for the Nepali Natural Language Understanding Tasks | | 0
ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task | | 0
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | | 0
EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods | | 0
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | | 0
ChatGPT Alternative Solutions: Large Language Models Survey | | 0
An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets | | 0
Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts | | 0
Benchmarking Deep Learning Architectures for Urban Vegetation Point Cloud Semantic Segmentation from MLS | | 0
Context-guided Triple Matching for Multiple Choice Question Answering | | 0
Context-guided Triple Matching for Multiple Choice Question Answering | | 0
EdgeMark: An Automation and Benchmarking System for Embedded Artificial Intelligence Tools | | 0
Effective Transfer of Pretrained Large Visual Model for Fabric Defect Segmentation via Specifc Knowledge Injection | | 0
Exploring the Practicality of Generative Retrieval on Dynamic Corpora | | 0
Continuous Function Structured in Multilayer Perceptron for Global Optimization | | 0
Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation | | 0
Continuous-Time Gaussian Process Motion-Compensation for Event-vision Pattern Tracking with Distance Fields | | 0
Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack | | 0
Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation | | 0
CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization | | 0
Characterizing Transactional Databases for Frequent Itemset Mining | | 0
Benchmarking and Validation of Sub-mW 30GHz VG-LNAs in 22nm FDSOI CMOS for 5G/6G Phased-Array Receivers | | 0
Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices | | 0
Characterizing the adversarial vulnerability of speech self-supervised learning | | 0
EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments | | 0
Characterizing Missing Information in Deep Networks Using Backpropagated Gradients | | 0
Convolutional and Deep Learning based techniques for Time Series Ordinal Classification | | 0
COPA: Comparing the Incomparable to Explore the Pareto Front | | 0
An Empirical Investigation into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification | | 0
Characterization of Multiple 3D LiDARs for Localization and Mapping using Normal Distributions Transform | | 0
Characterization of Constrained Continuous Multiobjective Optimization Problems: A Performance Space Perspective | | 0
EconGym: A Scalable AI Testbed with Diverse Economic Tasks | | 0
Characterization of Constrained Continuous Multiobjective Optimization Problems: A Feature Space Perspective | | 0
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models | | 0
Cornac: A Comparative Framework for Multimodal Recommender Systems | | 0
Benchmarking and Pushing the Multi-Bias Elimination Boundary of LLMs via Causal Effect Estimation-guided Debiasing | | 0
An Optical Frontend for a Convolutional Neural Network | | 0
Benchmarking and Performance Modelling of MapReduce Communication Pattern | | 0
CoSy: Evaluating Textual Explanations of Neurons | | 0
ADCB: An Alzheimer's disease benchmark for evaluating observational estimators of causal effects | | 0
Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies | | 0
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts | | 0
Coupling volume-excluding compartment-based models of diffusion at different scales: Voronoi and pseudo-compartment approaches | | 0
Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution | | 0
Benchmarking and Optimization of Gradient Boosting Decision Tree Algorithms | | 0
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity | | 0
ECG-Adv-GAN: Detecting ECG Adversarial Examples with Conditional Generative Adversarial Networks | | 0
ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph | | 0
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey | | 0
Page 38 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified