Benchmarking

Papers

Showing 3851–3875 of 5548 papers

Title	Date	Tasks	Status
Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction	May 23, 2023	Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA)	Code Available
Benchmarking Machine Translation with Cultural Awareness	May 23, 2023	Benchmarking, In-Context Learning	Code Available
Multilingual Large Language Models Are Not (Yet) Code-Switchers	May 23, 2023	Benchmarking, Language Identification	Unverified
Robust Model-Based Optimization for Challenging Fitness Landscapes	May 23, 2023	Benchmarking, model	Code Available
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate	May 22, 2023	Benchmarking, Math	Unverified
How Fragile is Relation Extraction under Entity Replacements?	May 22, 2023	Benchmarking, Causal Inference	Code Available
A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches	May 22, 2023	Benchmarking, Classification	Code Available
Value-at-Risk-Based Portfolio Insurance: Performance Evaluation and Benchmarking Against CPPI in a Markov-Modulated Regime-Switching Market	May 21, 2023	Benchmarking, Financial Analysis	Unverified
Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite	May 20, 2023	Benchmarking	Unverified
TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks	May 19, 2023	Benchmarking	Unverified
Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses	May 19, 2023	Benchmarking, Form	Code Available
Ahead-of-Time P-Tuning	May 18, 2023	Benchmarking, parameter-efficient fine-tuning	Unverified
Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation	May 18, 2023	Benchmarking, Diagnostic	Unverified
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization	May 18, 2023	Benchmarking, GPU	Unverified
Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models	May 18, 2023	Benchmarking	Unverified
Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI	May 17, 2023	Benchmarking	Unverified
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks	May 17, 2023	Benchmarking	Unverified
Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go	May 17, 2023	Benchmarking, Image Restoration	Unverified
DLUE: Benchmarking Document Language Understanding	May 16, 2023	Benchmarking, Document Classification	Unverified
OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking	May 15, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Predictive Models from Quantum Computer Benchmarks	May 15, 2023	Benchmarking, image-classification	Unverified
Benchmarking the human brain against computational architectures	May 15, 2023	Benchmarking, Computational Efficiency	Unverified
A Strong Sustainability Paradigm Based Analytical Hierarchy Process (SSP-AHP) Method to Evaluate Sustainable Healthcare Systems	May 13, 2023	Benchmarking	Unverified
MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine	May 12, 2023	Benchmarking	Unverified
Uncertainty in GNN Learning Evaluations: The Importance of a Consistent Benchmark for Community Detection	May 10, 2023	Benchmarking, Community Detection	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified