
Benchmarking

Papers


Showing 831–840 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Distribution Shift in Tabular Data with TableShift	Dec 10, 2023	Benchmarking, Binary Classification	Code Available	1
STREAMLINE: An Automated Machine Learning Pipeline for Biomedicine Applied to Examine the Utility of Photography-Based Phenotypes for OSA Prediction Across International Sleep Centers	Dec 9, 2023	Anatomy, AutoML	Code Available	1
Benchmarking and Analysis of Unsupervised Object Segmentation from Real-world Single Images	Dec 8, 2023	Benchmarking, Object	Code Available	1
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym	Dec 6, 2023	Benchmarking, Decision Making	Code Available	1
Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions	Dec 5, 2023	Benchmarking, Conversational Question Answering	Code Available	1
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models	Dec 5, 2023	Benchmarking, Visual Question Answering	Code Available	1
BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks	Dec 5, 2023	Benchmarking, Minecraft	Code Available	1
Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms	Nov 30, 2023	Benchmarking, OpenAI Gym	Code Available	1
Enhancing Ligand Pose Sampling for Molecular Docking	Nov 30, 2023	Benchmarking, Molecular Docking	Code Available	1
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation	Nov 30, 2023	Benchmarking, counterfactual	Code Available	1



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified