Benchmarking

Papers

Showing 2801–2825 of 5548 papers

Title	Date	Tasks	Status
GPTs and Language Barrier: A Cross-Lingual Legal QA Examination	Mar 26, 2024	Articles, Benchmarking	Unverified
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models	Apr 14, 2025	Benchmarking, Descriptive	Unverified
Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems	Mar 9, 2025	Benchmarking	Unverified
Variational Laplace for Bayesian neural networks	Nov 20, 2020	Benchmarking, Variational Inference	Unverified
Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities	May 13, 2025	automatic-speech-translation, Benchmarking	Unverified
Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking	Mar 17, 2024	Benchmarking, Dialogue State Tracking	Unverified
Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings	May 19, 2025	Benchmarking, Combinatorial Optimization	Unverified
Beyond Benchmarks: On The False Promise of AI Regulation	Jan 26, 2025	Benchmarking	Unverified
Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms	Jun 10, 2025	Benchmarking, Graph Attention	Unverified
Graph-based Deep-Tree Recursive Neural Network (DTRNN) for Text Classification	Sep 4, 2018	Benchmarking, General Classification	Unverified
Graph-based Prediction and Planning Policy Network (GP3Net) for scalable self-driving in dynamic environments using Deep Reinforcement Learning	Dec 10, 2023	Autonomous Vehicles, Benchmarking	Unverified
Graph clustering with Boltzmann machines	Mar 4, 2022	Benchmarking, Clustering	Unverified
A Benchmark Dataset and Saliency-guided Stacked Autoencoders for Video-based Salient Object Detection	Nov 1, 2016	Benchmarking, Object	Unverified
GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets	Jun 23, 2024	Benchmarking	Unverified
Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models	Jul 10, 2024	Benchmarking	Unverified
Label Efficient Regularization and Propagation for Graph Node Classification	Apr 19, 2022	Attribute, Benchmarking	Unverified
Graph Joint Attention Networks	Sep 28, 2020	Benchmarking, Graph Attention	Unverified
A Bayesian Committee Machine Potential for Oxygen-containing Organic Compounds	Mar 2, 2024	Benchmarking, Position	Unverified
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra	Mar 5, 2021	Benchmarking, Graph Mining	Unverified
Better Practices for Domain Adaptation	Sep 7, 2023	Benchmarking, Domain Adaptation	Unverified
1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation	Jun 8, 2024	Benchmarking, Instance Segmentation	Unverified
Better Bill GPT: Comparing Large Language Models against Legal Invoice Reviewers	Apr 2, 2025	Benchmarking, Management	Unverified
BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures	Jun 6, 2025	Benchmarking, CPU	Unverified
The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal	Sep 12, 2024	Benchmarking, Language Modeling	Unverified
Best Practices in Pool-based Active Learning for Image Classification	Sep 29, 2021	Active Learning, Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified