Benchmarking

Papers

Showing 2576–2600 of 5548 papers

Title	Date	Tasks	Status	Hype
What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition	Jan 23, 2024	Benchmarking	Code Available	0
LLpowershap: Logistic Loss-based Automated Shapley Values Feature Selection Method	Jan 23, 2024	Benchmarking, Fairness	Code Available	0
Benchmarking LLMs via Uncertainty Quantification	Jan 23, 2024	Benchmarking, Uncertainty Quantification	Code Available	3
Deep Neural Network Benchmarks for Selective Classification	Jan 23, 2024	Benchmarking, Classification	Code Available	0
Subgroup analysis methods for time-to-event outcomes in heterogeneous randomized controlled trials	Jan 22, 2024	Benchmarking, Synthetic Data Generation	Code Available	0
A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation	Jan 22, 2024	Benchmarking, Diagnostic	Code Available	3
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	Benchmarking, Image to text	Code Available	1
CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling	Jan 21, 2024	Benchmarking	Code Available	1
Data-Driven Target Localization: Benchmarking Gradient Descent Using the Cramer-Rao Bound	Jan 20, 2024	Benchmarking	Unverified	0
Data Augmentation for Traffic Classification	Jan 19, 2024	Benchmarking, Classification	Unverified	0
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents	Jan 18, 2024	Benchmarking	Code Available	2
WAVES: Benchmarking the Robustness of Image Watermarks	Jan 16, 2024	Benchmarking	Code Available	2
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription	Jan 16, 2024	Automatic Speech Recognition, Benchmarking	Unverified	0
Harnessing Orthogonality to Train Low-Rank Neural Networks	Jan 16, 2024	Benchmarking	Code Available	0
Large Language Models are Null-Shot Learners	Jan 16, 2024	Arithmetic Reasoning, Benchmarking	Unverified	0
TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding	Jan 16, 2024	Action Recognition, Benchmarking	Unverified	0
OpenDPD: An Open-Source End-to-End Learning & Benchmarking Framework for Wideband Power Amplifier Modeling and Digital Pre-Distortion	Jan 16, 2024	Benchmarking	Unverified	0
Authorship Obfuscation in Multilingual Machine-Generated Text Detection	Jan 15, 2024	Adversarial Robustness, Benchmarking	Code Available	2
RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving	Jan 14, 2024	Autonomous Driving, Benchmarking	Code Available	1
A Reinforcement Learning Environment for Directed Quantum Circuit Synthesis	Jan 13, 2024	Benchmarking, reinforcement-learning	Unverified	0
Lifelogging As An Extreme Form of Personal Information Management -- What Lessons To Learn	Jan 11, 2024	Benchmarking, Form	Unverified	0
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks	Jan 10, 2024	Benchmarking	Code Available	2
Knowledge Sharing in Manufacturing using Large Language Models: User Evaluation and Model Benchmarking	Jan 10, 2024	Benchmarking, Information Retrieval	Unverified	0
Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics	Jan 10, 2024	Anomaly Segmentation, Autonomous Driving	Unverified	0
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference	Jan 9, 2024	Benchmarking, Text Generation	Code Available	7

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified