Benchmarking

Papers


Showing 2951–3000 of 5548 papers

Title	Date	Tasks	Status	Hype
Demographic Parity: Mitigating Biases in Real-World Data	Sep 27, 2023	Benchmarking	Unverified	0
NLPBench: Evaluating Large Language Models on Solving NLP Problems	Sep 27, 2023	Benchmarking, Math	Code Available	1
A Content-Driven Micro-Video Recommendation Dataset at Scale	Sep 27, 2023	Benchmarking, Recommendation Systems	Code Available	2
Unified Long-Term Time-Series Forecasting Benchmark	Sep 27, 2023	Benchmarking, Time Series	Code Available	1
Node-Aligned Graph-to-Graph (NAG2G): Elevating Template-Free Deep Learning Approaches in Single-Step Retrosynthesis	Sep 27, 2023	Benchmarking, Graph Generation	Code Available	1
Advancing The Rate-Distortion-Computation Frontier For Neural Image Compression	Sep 26, 2023	Benchmarking, Image Compression	Unverified	0
A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning	Sep 26, 2023	Benchmarking, Multi-Objective Reinforcement Learning	Code Available	2
Thalamic nuclei segmentation from T1-weighted MRI: unifying and benchmarking state-of-the-art methods with young and old cohorts	Sep 26, 2023	Benchmarking, Segmentation	Unverified	0
On quantifying and improving realism of images generated with diffusion	Sep 26, 2023	Attribute, Benchmarking	Unverified	0
Optimization Techniques for a Physical Model of Human Vocalisation	Sep 26, 2023	Benchmarking	Unverified	0
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition	Sep 25, 2023	Autonomous Driving, Benchmarking	Code Available	1
Efficient Pauli channel estimation with logarithmic quantum memory	Sep 25, 2023	Benchmarking	Unverified	0
Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence	Sep 24, 2023	Benchmarking, Change Detection	Code Available	0
Categorization and analysis of 14 computational methods for estimating cell potency from single-cell RNA-seq data	Sep 24, 2023	Benchmarking	Unverified	0
Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Shape Reconstruction	Sep 24, 2023	3D Shape Reconstruction, Anatomy	Code Available	1
VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph	Sep 24, 2023	Benchmarking, Knowledge Graphs	Unverified	0
Grad DFT: a software library for machine learning enhanced density functional theory	Sep 23, 2023	Benchmarking	Code Available	1
Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data	Sep 23, 2023	Benchmarking, Super-Resolution	Unverified	0
Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts	Sep 22, 2023	Articles, Benchmarking	Unverified	0
Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam	Sep 21, 2023	Benchmarking, Computational Efficiency	Unverified	0
Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation	Sep 21, 2023	Benchmarking, Classification	Code Available	1
Multimodal Deep Learning for Scientific Imaging Interpretation	Sep 21, 2023	Articles, Benchmarking	Unverified	0
On the relationship between Benchmarking, Standards and Certification in Robotics and AI	Sep 21, 2023	Benchmarking	Unverified	0
Towards Effective Disambiguation for Machine Translation with Large Language Models	Sep 20, 2023	Benchmarking, In-Context Learning	Unverified	0
An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder	Sep 20, 2023	Benchmarking, Clustering	Code Available	0
Training neural mapping schemes for satellite altimetry with simulation data	Sep 19, 2023	Benchmarking	Unverified	0
SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction	Sep 19, 2023	3D Reconstruction, Benchmarking	Unverified	0
The Protein Engineering Tournament: An Open Science Benchmark for Protein Modeling and Design	Sep 18, 2023	Benchmarking	Unverified	0
Emerging Approaches for THz Array Imaging: A Tutorial Review and Software Tool	Sep 16, 2023	Benchmarking, Image Super-Resolution	Unverified	0
Exploration of TPUs for AI Applications	Sep 16, 2023	Benchmarking, Edge-computing	Unverified	0
Anchor Points: Benchmarking Models with Much Fewer Examples	Sep 14, 2023	Benchmarking, Language Modeling	Code Available	0
M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations	Sep 14, 2023	Benchmarking, Computed Tomography (CT)	Code Available	0
Leveraging Contextual Information for Effective Entity Salience Detection	Sep 14, 2023	Articles, Benchmarking	Unverified	0
Benchmarking machine learning models for quantum state classification	Sep 14, 2023	Benchmarking, Classification	Unverified	0
VerilogEval: Evaluating Large Language Models for Verilog Code Generation	Sep 14, 2023	Benchmarking, Code Generation	Code Available	2
So you think you can track?	Sep 13, 2023	Benchmarking, Object	Unverified	0
Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish	Sep 13, 2023	Benchmarking, Translation	Code Available	0
An Image Dataset for Benchmarking Recommender Systems with Raw Pixels	Sep 13, 2023	Benchmarking, Recommendation Systems	Code Available	1
AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving	Sep 12, 2023	Autonomous Driving, Benchmarking	Unverified	0
Unveiling the potential of large language models in generating semantic and cross-language clones	Sep 12, 2023	Benchmarking, Code Generation	Unverified	0
Formalizing Multimedia Recommendation through Multimodal Deep Learning	Sep 11, 2023	Benchmarking, Deep Learning	Code Available	1
FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions	Sep 10, 2023	3D Human Pose Estimation, 3D Pose Estimation	Code Available	1
RecAD: Towards A Unified Library for Recommender Attack and Defense	Sep 9, 2023	Benchmarking, Recommendation Systems	Code Available	1
Navigating Out-of-Distribution Electricity Load Forecasting during COVID-19: Benchmarking energy load forecasting models without and with continual learning	Sep 8, 2023	Benchmarking, Continual Learning	Code Available	0
DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient Estimation	Sep 7, 2023	Benchmarking, Neural Architecture Search	Unverified	0
PyGraft: Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips	Sep 7, 2023	Benchmarking, Knowledge Graphs	Code Available	2
Using representation balancing to learn conditional-average dose responses from clustered data	Sep 7, 2023	Benchmarking, Causal Inference	Code Available	0
Better Practices for Domain Adaptation	Sep 7, 2023	Benchmarking, Domain Adaptation	Unverified	0
Evaluation of large language models for discovery of gene set function	Sep 7, 2023	Benchmarking, Language Modelling	Code Available	1
Neural Networks for Fast Optimisation in Model Predictive Control: A Review	Sep 6, 2023	Benchmarking, Model Predictive Control	Unverified	0



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified