SOTAVerified

Benchmarking

Papers

Showing 4176-4200 of 5548 papers

Title | Status | Hype
Towards an AI Accountability Policy | | 0
Towards an Automated SOAP Note: Classifying Utterances from Medical Conversations | | 0
Towards a Taxonomy of Graph Learning Datasets | | 0
Towards a Theory-Guided Benchmarking Suite for Discrete Black-Box Optimization Heuristics: Profiling (1+λ) EA Variants on OneMax and LeadingOnes | | 0
Towards a Unified Framework for Determining Conformational Ensembles of Disordered Proteins | | 0
Towards Benchmarking and Assessing the Safety and Robustness of Autonomous Driving on Safety-critical Scenarios | | 0
Towards Benchmarking and Evaluating Deepfake Detection | | 0
Towards Benchmarking Explainable Artificial Intelligence Methods | | 0
Towards Benchmarking Scene Background Initialization | | 0
Towards Benchmarking the Utility of Explanations for Model Debugging | | 0
Towards Class-agnostic Tracking Using Feature Decorrelation in Point Clouds | | 0
Towards Effective Disambiguation for Machine Translation with Large Language Models | | 0
Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques | | 0
Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset | | 0
Towards Explainable Network Intrusion Detection using Large Language Models | | 0
Towards Fair Machine Learning Software: Understanding and Addressing Model Bias Through Counterfactual Thinking | | 0
Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings | | 0
Towards Ideal Temporal Graph Neural Networks: Evaluations and Conclusions after 10,000 GPU Hours | | 0
Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models | | 0
Towards Large-Scale Small Object Detection: Survey and Benchmarks | | 0
Towards Long-Term predictions of Turbulence using Neural Operators | | 0
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks | | 0
Towards Personalized Federated Learning | | 0
Towards Private Learning on Decentralized Graphs with Local Differential Privacy | | 0
Towards Productionizing Subjective Search Systems | | 0
Page 168 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified