Benchmarking

Papers

Showing 3151–3200 of 5548 papers

Title	Date	Tasks	Status
NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition	May 13, 2024	Benchmarking, named-entity-recognition	Code Available
Comparative analysis of neural network architectures for short-term FOREX forecasting	May 13, 2024	Benchmarking	Unverified
UCCIX: Irish-eXcellence Large Language Model	May 13, 2024	Benchmarking, Language Modeling	Unverified
Divergent Creativity in Humans and Large Language Models	May 13, 2024	Benchmarking	Code Available
oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving	May 13, 2024	Attribute, Autonomous Driving	Unverified
Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness	May 13, 2024	Benchmarking, counterfactual	Unverified
Benchmarking Cross-Domain Audio-Visual Deception Detection	May 11, 2024	Benchmarking, Deception Detection	Unverified
Replication Study and Benchmarking of Real-Time Object Detection Models	May 11, 2024	Benchmarking, object-detection	Code Available
Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs	May 10, 2024	Benchmarking, Hyperparameter Optimization	Unverified
Agent-oriented Joint Decision Support for Data Owners in Auction-based Federated Learning	May 9, 2024	Benchmarking, Federated Learning	Unverified
Benchmarking Educational Program Repair	May 8, 2024	Benchmarking, Program Repair	Code Available
Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking	May 7, 2024	Benchmarking, Model Selection	Unverified
Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning	May 7, 2024	Benchmarking, Contrastive Learning	Code Available
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images	May 6, 2024	Benchmarking	Unverified
Performance Evaluation of Real-Time Object Detection for Electric Scooters	May 5, 2024	Autonomous Vehicles, Benchmarking	Code Available
ATG: Benchmarking Automated Theorem Generation for Generative Language Models	May 5, 2024	Automated Theorem Proving, Benchmarking	Unverified
Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models	May 5, 2024	Benchmarking	Code Available
Systematic Review: Anomaly Detection in Connected and Autonomous Vehicles	May 4, 2024	Anomaly Detection, Articles	Unverified
PhilHumans: Benchmarking Machine Learning for Personal Health	May 4, 2024	Action Anticipation, Benchmarking	Unverified
A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System	May 3, 2024	Benchmarking, Collaborative Filtering	Unverified
Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo	May 3, 2024	Benchmarking, Multi-hop Question Answering	Code Available
Toward end-to-end interpretable convolutional neural networks for waveform signals	May 3, 2024	Benchmarking, Emotion Recognition	Unverified
CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities	May 2, 2024	Benchmarking, Management	Unverified
A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News	May 2, 2024	Benchmarking, Sign Language Recognition	Unverified
Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods	May 2, 2024	Benchmarking	Unverified
The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA	May 2, 2024	Benchmarking, Drug Discovery	Code Available
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting	Apr 30, 2024	Benchmarking, Depth Completion	Unverified
Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models	Apr 29, 2024	Benchmarking, Clustering	Unverified
On the Impact of Data Heterogeneity in Federated Learning Environments with Application to Healthcare Networks	Apr 29, 2024	Benchmarking, Federated Learning	Unverified
MileBench: Benchmarking MLLMs in Long Context	Apr 29, 2024	Benchmarking, Diagnostic	Unverified
Detecting critical treatment effect bias in small subgroups	Apr 29, 2024	Benchmarking, Decision Making	Code Available
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods	Apr 29, 2024	Benchmarking, Drug Discovery	Code Available
Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models	Apr 26, 2024	Attribute, Bayesian Optimization	Unverified
Stochastic Spiking Neural Networks with First-to-Spike Coding	Apr 26, 2024	Benchmarking	Unverified
CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching	Apr 25, 2024	Benchmarking, Data Augmentation	Code Available
Benchmarking Mobile Device Control Agents across Diverse Configurations	Apr 25, 2024	Benchmarking, Imitation Learning	Unverified
DPO: A Differential and Pointwise Control Approach to Reinforcement Learning	Apr 24, 2024	Benchmarking, reinforcement-learning	Unverified
ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees	Apr 24, 2024	Benchmarking, Molecular Property Prediction	Code Available
Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler	Apr 24, 2024	Benchmarking	Unverified
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification	Apr 23, 2024	Benchmarking, Hyperspectral Image Classification	Code Available
The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking	Apr 22, 2024	Benchmarking, Misinformation	Unverified
Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches	Apr 22, 2024	Benchmarking, Diversity	Unverified
Open Datasets for Satellite Radio Resource Control	Apr 22, 2024	Benchmarking, Decision Making	Unverified
TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos	Apr 22, 2024	Benchmarking, Multi-Object Tracking	Unverified
EnzChemRED, a rich enzyme chemistry relation extraction dataset	Apr 22, 2024	Benchmarking, named-entity-recognition	Unverified
In-situ process monitoring and adaptive quality enhancement in laser additive manufacturing: a critical review	Apr 21, 2024	Benchmarking, Decision Making	Unverified
Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News	Apr 21, 2024	Benchmarking, Emotion Recognition	Code Available
Bridging the Gap Between Theory and Practice: Benchmarking Transfer Evolutionary Optimization	Apr 20, 2024	Benchmarking	Unverified
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning	Apr 19, 2024	Benchmarking, counterfactual	Unverified
Integrated Sensing and Communication enabled Multiple Base Stations Cooperative UAV Detection	Apr 19, 2024	Benchmarking, Integrated sensing and communication	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified