SOTAVerified

Benchmarking

Papers

Showing 3151–3200 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| FER-C: Benchmarking Out-of-Distribution Soft Calibration for Facial Expression Recognition | | 0 |
| FETCH: A Memory-Efficient Replay Approach for Continual Learning in Image Classification | | 0 |
| FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | | 0 |
| Few-Shot Defect Segmentation Leveraging Abundant Normal Training Samples Through Normal Background Regularization and Crop-and-Paste Operation | | 0 |
| Few-Shot Learning for Industrial Time Series: A Comparative Analysis Using the Example of Screw-Fastening Process Monitoring | | 0 |
| Fiber Bundle Morphisms as a Framework for Modeling Many-to-Many Maps | | 0 |
| E(3)-equivariant models cannot learn chirality: Field-based molecular generation | | 0 |
| Filter Methods for Feature Selection in Supervised Machine Learning Applications -- Review and Benchmark | | 0 |
| Finance Language Model Evaluation (FLaME) | | 0 |
| Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging | | 0 |
| Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada | | 0 |
| Fine-Grained Classification of Pedestrians in Video: Benchmark and State of the Art | | 0 |
| FineText: Text Classification via Attention-based Language Model Fine-tuning | | 0 |
| Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency | | 0 |
| FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | | 0 |
| FinLoRA: Benchmarking LoRA Methods for Fine-Tuning LLMs on Financial Datasets | | 0 |
| FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance | | 0 |
| FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking | | 0 |
| FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures | | 0 |
| FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization | | 0 |
| FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems | | 0 |
| FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning | | 0 |
| FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents | | 0 |
| FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models | | 0 |
| FlowMind: Automatic Workflow Generation with LLMs | | 0 |
| Fluorescent Neuronal Cells v2: Multi-Task, Multi-Format Annotations for Deep Learning in Microscopy | | 0 |
| FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks | | 0 |
| ∀uto∃∨∧L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks | | 0 |
| ForamViT-GAN: Exploring New Paradigms in Deep Learning for Micropaleontological Image Analysis | | 0 |
| Forecasting Lithium-Ion Battery Longevity with Limited Data Availability: Benchmarking Different Machine Learning Algorithms | | 0 |
| Forecasting NIFTY 50 benchmark Index using Seasonal ARIMA time series models | | 0 |
| FOR-instance: a UAV laser scanning benchmark dataset for semantic and instance segmentation of individual trees | | 0 |
| FORLAPS: An Innovative Data-Driven Reinforcement Learning Approach for Prescriptive Process Monitoring | | 0 |
| Formal Covariate Benchmarking to Bound Omitted Variable Bias | | 0 |
| FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents | | 0 |
| Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | | 0 |
| Foundations for learning from noisy quantum experiments | | 0 |
| Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate | | 0 |
| FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting | | 0 |
| Framework and Benchmarks for Combinatorial and Mixed-variable Bayesian Optimization | | 0 |
| FRED: The Florence RGB-Event Drone Dataset | | 0 |
| Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification | | 0 |
| From 2D to 3D: Re-thinking Benchmarking of Monocular Depth Prediction | | 0 |
| From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano | | 0 |
| From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | | 0 |
| From Code to Play: Benchmarking Program Search for Games Using Large Language Models | | 0 |
| From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks | | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | | 0 |
| From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation | | 0 |
| From Grounding to Planning: Benchmarking Bottlenecks in Web Agents | | 0 |
Page 64 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |