SOTAVerified

Benchmarking

Papers

Showing 2521–2530 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| DFEE: Interactive DataFlow Execution and Evaluation Kit | Code | 0 |
| A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild | Code | 0 |
| From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning | Code | 0 |
| GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations | Code | 0 |
| Flexible Generation of Preference Data for Recommendation Analysis | Code | 0 |
| FALCON: Feature-Label Constrained Graph Net Collapse for Memory Efficient GNNs | Code | 0 |
| Benchmarking Commercial Intent Detection Services with Practice-Driven Evaluations | Code | 0 |
| FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | Code | 0 |
| FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters | Code | 0 |
| FR-MRInet: A Deep Convolutional Encoder-Decoder for Brain Tumor Segmentation with Relu-RGB and Sliding-window | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |