Benchmarking

Papers

Showing 2151–2175 of 5548 papers

Title	Date	Tasks	Status	Hype
Categorization of 33 computational methods to detect spatially variable genes from spatially resolved transcriptomics data	May 29, 2024	Benchmarking, Specificity	Unverified	0
MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification	May 29, 2024	Benchmarking	Unverified	0
Benchmarking and Improving Detail Image Caption	May 29, 2024	Benchmarking, Image Captioning	Code Available	2
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions	May 29, 2024	Benchmarking, Dialogue Understanding	Code Available	1
Quantitative Certification of Bias in Large Language Models	May 29, 2024	Benchmarking	Code Available	1
Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion	May 28, 2024	Benchmarking, Emotion Recognition	Unverified	0
Risk-Neutral Generative Networks	May 28, 2024	Benchmarking	Unverified	0
DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime	May 28, 2024	Benchmarking, Reinforcement Learning (RL)	Code Available	1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences	May 28, 2024	Benchmarking, Feature Engineering	Code Available	1
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters	May 27, 2024	Benchmarking, GSM8K	Code Available	2
Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving	May 27, 2024	Autonomous Driving, Benchmarking	Code Available	3
A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis	May 27, 2024	Benchmarking	Unverified	0
Benchmarking General-Purpose In-Context Learning	May 27, 2024	Benchmarking, Decision Making	Unverified	0
BOLD: Boolean Logic Deep Learning	May 25, 2024	Benchmarking, Deep Learning	Unverified	0
GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases	May 25, 2024	Benchmarking, Hallucination	Unverified	0
Application based Evaluation of an Efficient Spike-Encoder, "Spiketrum"	May 24, 2024	Benchmarking, Classification	Unverified	0
Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification	May 24, 2024	Benchmarking, Data Augmentation	Unverified	0
NuwaTS: a Foundation Model Mending Every Incomplete Time Series	May 24, 2024	Benchmarking, Contrastive Learning	Unverified	0
Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images	May 24, 2024	Benchmarking, Classification	Unverified	0
Full-stack evaluation of Machine Learning inference workloads for RISC-V systems	May 24, 2024	Benchmarking, Deep Learning	Unverified	0
MCDFN: Supply Chain Demand Forecasting via an Explainable Multi-Channel Data Fusion Network Model	May 24, 2024	Benchmarking, Demand Forecasting	Unverified	0
Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study	May 24, 2024	Benchmarking, Vulnerability Detection	Unverified	0
Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks	May 24, 2024	Benchmarking, Decoder	Unverified	0
Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling	May 23, 2024	Benchmarking	Code Available	1
S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models	May 23, 2024	Benchmarking	Code Available	2

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified