Benchmarking

Papers

Showing 3851–3900 of 5548 papers

Title	Date	Tasks	Status
Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction	May 23, 2023	Aspect-Based Sentiment Analysis (ABSA)	Code Available
Benchmarking Machine Translation with Cultural Awareness	May 23, 2023	Benchmarking, In-Context Learning	Code Available
Multilingual Large Language Models Are Not (Yet) Code-Switchers	May 23, 2023	Benchmarking, Language Identification	Unverified
Robust Model-Based Optimization for Challenging Fitness Landscapes	May 23, 2023	Benchmarking, model	Code Available
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate	May 22, 2023	Benchmarking, Math	Unverified
How Fragile is Relation Extraction under Entity Replacements?	May 22, 2023	Benchmarking, Causal Inference	Code Available
A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches	May 22, 2023	Benchmarking, Classification	Code Available
Value-at-Risk-Based Portfolio Insurance: Performance Evaluation and Benchmarking Against CPPI in a Markov-Modulated Regime-Switching Market	May 21, 2023	Benchmarking, Financial Analysis	Unverified
Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite	May 20, 2023	Benchmarking	Unverified
TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks	May 19, 2023	Benchmarking	Unverified
Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses	May 19, 2023	Benchmarking, Form	Code Available
Ahead-of-Time P-Tuning	May 18, 2023	Benchmarking, parameter-efficient fine-tuning	Unverified
Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation	May 18, 2023	Benchmarking, Diagnostic	Unverified
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization	May 18, 2023	Benchmarking, GPU	Unverified
Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models	May 18, 2023	Benchmarking	Unverified
Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI	May 17, 2023	Benchmarking	Unverified
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks	May 17, 2023	Benchmarking	Unverified
Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go	May 17, 2023	Benchmarking, Image Restoration	Unverified
DLUE: Benchmarking Document Language Understanding	May 16, 2023	Benchmarking, Document Classification	Unverified
OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking	May 15, 2023	Automatic Speech Recognition (ASR)	Unverified
Predictive Models from Quantum Computer Benchmarks	May 15, 2023	Benchmarking, image-classification	Unverified
Benchmarking the human brain against computational architectures	May 15, 2023	Benchmarking, Computational Efficiency	Unverified
A Strong Sustainability Paradigm Based Analytical Hierarchy Process (SSP-AHP) Method to Evaluate Sustainable Healthcare Systems	May 13, 2023	Benchmarking	Unverified
MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine	May 12, 2023	Benchmarking	Unverified
Uncertainty in GNN Learning Evaluations: The Importance of a Consistent Benchmark for Community Detection	May 10, 2023	Benchmarking, Community Detection	Unverified
Comparing Foundation Models using Data Kernels	May 9, 2023	Benchmarking, Self-Supervised Learning	Unverified
Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey	May 5, 2023	Benchmarking, Image Generation	Code Available
A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness	May 5, 2023	Benchmarking, Dataset Distillation	Unverified
Semantic Segmentation using Vision Transformers: A survey	May 5, 2023	Autonomous Driving, Benchmarking	Unverified
Can LLMs Capture Human Preferences?	May 4, 2023	Benchmarking	Unverified
Analyzing Hong Kong's Legal Judgments from a Computational Linguistics point-of-view	May 4, 2023	Benchmarking, Graph Generation	Unverified
A Simulation-Augmented Benchmarking Framework for Automatic RSO Streak Detection in Single-Frame Space Images	Apr 30, 2023	Benchmarking, object-detection	Unverified
Benchmarking Automated Machine Learning Methods for Price Forecasting Applications	Apr 28, 2023	AutoML, Benchmarking	Unverified
ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task	Apr 27, 2023	Articles, Benchmarking	Unverified
On Pitfalls of RemOve-And-Retrain: Data Processing Inequality Perspective	Apr 26, 2023	Benchmarking, Feature Importance	Code Available
Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency	Apr 26, 2023	Benchmarking, Cloud Computing	Unverified
CIMLA: Interpretable AI for inference of differential causal networks	Apr 25, 2023	Benchmarking	Unverified
Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints	Apr 25, 2023	Benchmarking, Contrastive Learning	Unverified
Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology	Apr 24, 2023	Benchmarking, Decision Making	Code Available
A Framework for Benchmarking Real-Time Embedded Object Detection	Apr 23, 2023	Benchmarking, Object	Unverified
Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification	Apr 23, 2023	Benchmarking, Data Augmentation	Unverified
Learning a quantum computer's capability	Apr 20, 2023	Benchmarking	Unverified
Towards a Benchmark for Scientific Understanding in Humans and Machines	Apr 20, 2023	Benchmarking, Information Retrieval	Unverified
Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms	Apr 19, 2023	Benchmarking, Descriptive	Code Available
The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages	Apr 19, 2023	Benchmarking, Machine Translation	Code Available
UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite	Apr 18, 2023	Benchmarking, Instance Segmentation	Unverified
Computational and Exploratory Landscape Analysis of the GKLS Generator	Apr 18, 2023	Benchmarking, global-optimization	Unverified
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images	Apr 17, 2023	3D Pose Estimation, Benchmarking	Unverified
Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis	Apr 17, 2023	Benchmarking, Drift Detection	Code Available
Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy	Apr 14, 2023	Benchmarking	Unverified


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified