SOTAVerified

Benchmarking

Papers

Showing 651–675 of 5548 papers

Title | Status | Hype
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Code | 1
RobFR: Benchmarking Adversarial Robustness on Face Recognition | Code | 1
Benchmarking Large Language Models for Automated Verilog RTL Code Generation | Code | 1
Deep learning model solves change point detection for multiple change types | Code | 1
Deep Learning-Based Synchronization for Uplink NB-IoT | Code | 1
Deep Learning for ECG Analysis: Benchmarks and Insights from PTB-XL | Code | 1
Benchmarking Language Model Creativity: A Case Study on Code Generation | Code | 1
A Computed Tomography Vertebral Segmentation Dataset with Anatomical Variations and Multi-Vendor Scanner Data | Code | 1
Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory | Code | 1
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Code | 1
Benchmarking Language Models for Code Syntax Understanding | Code | 1
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1
Decoding the Underlying Meaning of Multimodal Hateful Memes | Code | 1
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers | Code | 1
Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception | Code | 1
AudioMarkBench: Benchmarking Robustness of Audio Watermarking | Code | 1
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Code | 1
DataRec: A Python Library for Standardized and Reproducible Data Management in Recommender Systems | Code | 1
Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets | Code | 1
A Large-Scale Dataset for Benchmarking Elevator Button Segmentation and Character Recognition | Code | 1
Data-Driven Denoising of Stationary Accelerometer Signals | Code | 1
A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection | Code | 1
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation | Code | 1
DACBench: A Benchmark Library for Dynamic Algorithm Configuration | Code | 1
Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data | Code | 1
Page 27 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified