Benchmarking

Papers

Title	Date	Tasks	Status
Benchmarking Adaptive Intelligence and Computer Vision on Human-Robot Collaboration	Sep 30, 2024	Benchmarking, Intent Detection	Unverified
ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning	Sep 30, 2024	Benchmarking, Disparity Estimation	Code Available
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs	Sep 30, 2024	Benchmarking, Multiple-choice	Unverified
Constrained Reinforcement Learning for Safe Heat Pump Control	Sep 29, 2024	Benchmarking, reinforcement-learning	Code Available
Tracking Everything in Robotic-Assisted Surgery	Sep 29, 2024	Benchmarking	Unverified
GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks	Sep 29, 2024	Benchmarking	Unverified
AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy	Sep 29, 2024	Astronomy, Benchmarking	Unverified
SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement	Sep 28, 2024	Benchmarking, Code Generation	Unverified
Data Analysis in the Era of Generative AI	Sep 27, 2024	Benchmarking	Unverified
Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study	Sep 27, 2024	Benchmarking, tabular-regression	Code Available
CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting	Sep 27, 2024	Articles, Benchmarking	Unverified
bnRep: A repository of Bayesian networks from the academic literature	Sep 27, 2024	Benchmarking	Unverified
MCUBench: A Benchmark of Tiny Object Detectors on MCUs	Sep 27, 2024	Benchmarking, Model Selection	Unverified
EarthquakeNPP: Benchmark Datasets for Earthquake Forecasting with Neural Point Processes	Sep 27, 2024	Benchmarking, Dataset Generation	Unverified
Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs	Sep 26, 2024	Benchmarking, Conformal Prediction	Code Available
Benchmarking Domain Generalization Algorithms in Computational Pathology	Sep 25, 2024	Benchmarking, Data Augmentation	Code Available
Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices	Sep 25, 2024	Autonomous Vehicles, Benchmarking	Unverified
Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning	Sep 25, 2024	Benchmarking, Formal Logic	Unverified
Omnibenchmark (alpha) for continuous and open benchmarking in bioinformatics	Sep 25, 2024	Benchmarking	Unverified
SEN12-WATER: A New Dataset for Hydrological Applications and its Benchmarking	Sep 25, 2024	Benchmarking, Management	Unverified
Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework	Sep 24, 2024	Benchmarking, counterfactual	Code Available
HLB: Benchmarking LLMs' Humanlikeness in Language Use	Sep 24, 2024	Benchmarking	Unverified
Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data	Sep 24, 2024	Benchmarking, Depth Estimation	Code Available
Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling	Sep 24, 2024	Articles, Benchmarking	Unverified
Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation	Sep 24, 2024	Benchmarking, Movie Recommendation	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified