SOTAVerified

Benchmarking

Papers

Showing 3251-3300 of 5548 papers

Title | Status | Hype
LAMBDA: Covering the Solution Set of Black-Box Inequality by Search Space Quantization | | 0
Landscape-Aware Automated Algorithm Configuration using Multi-output Mixed Regression and Classification | | 0
LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions | | 0
Time Sensitive Knowledge Editing through Efficient Finetuning | | 0
Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance | | 0
Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance | | 0
Benchmarking of Transformer-Based Pre-Trained Models on Social Media Text Classification Datasets | | 0
Language Models for Automated Classification of Brain MRI Reports and Growth Chart Generation | | 0
Can LLMs Capture Human Preferences? | | 0
Adversarial Reinforcement Learning Framework for Benchmarking Collision Avoidance Mechanisms in Autonomous Vehicles | | 0
TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs | | 0
Time to Embrace Natural Language Processing (NLP)-based Digital Pathology: Benchmarking NLP- and Convolutional Neural Network-based Deep Learning Pipelines | | 0
Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning | | 0
Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices | | 0
Benchmarking of LLM Detection: Comparing Two Competing Approaches | | 0
Large Language Models are Null-Shot Learners | | 0
Large Language Models are Few-Shot Clinical Information Extractors | | 0
Large Language Models as Automated Aligners for benchmarking Vision-Language Models | | 0
Benchmarking of Lightweight Deep Learning Architectures for Skin Cancer Classification using ISIC 2017 Dataset | | 0
Adversarially Training for Audio Classifiers | | 0
Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens | | 0
Benchmarking of GPU-optimized Quantum-Inspired Evolutionary Optimization Algorithm using Functional Analysis | | 0
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level | | 0
Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding | | 0
Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models | | 0
Large-scale Benchmarking of Metaphor-based Optimization Heuristics | | 0
Benchmarking off-the-shelf statistical shape modeling tools in clinical applications | | 0
Benchmarking Off-The-Shelf Solutions to Robotic Assembly Tasks | | 0
Large-Scale Quantum Separability Through a Reproducible Machine Learning Lens | | 0
Timing Excess Returns A cross-universe approach to alpha | | 0
Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics | | 0
Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation | | 0
Latent Variable Models for Visual Question Answering | | 0
TinyML Platforms Benchmarking | | 0
LAVIS: A Library for Language-Vision Intelligence | | 0
Benchmarking of English-Hindi parallel corpora | | 0
Benchmarking of eight recurrent neural network variants for breath phase and adventitious sound detection on a self-developed open-access lung sound database-HF_Lung_V1 | | 0
LayoutXLM vs. GNN: An Empirical Evaluation of Relation Extraction for Documents | | 0
Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification | | 0
LCFO: Long Context and Long Form Output Dataset and Benchmarking | | 0
Benchmarking of Deep Learning models on 2D Laminar Flow behind Cylinder | | 0
LEAF: A Benchmark for Federated Settings | | 0
Leaf Segmentation and Counting with Deep Learning: on Model Certainty, Test-Time Augmentation, Trade-Offs | | 0
Labelling Vertebrae with 2D Reformations of Multidetector CT Images: An Adversarial Approach for Incorporating Prior Knowledge of Spine Anatomy | | 0
Adversarial Learning for Supervised and Semi-supervised Relation Extraction in Biomedical Literature | | 0
Title2Event: Benchmarking Open Event Extraction with a Large-scale Chinese Title Dataset | | 0
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking | | 0
Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning | | 0
Learning a CNN-based End-to-End Controller for a Formula SAE Racecar | | 0
tmVar 3.0: an improved variant concept recognition and normalization tool | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified