Benchmarking

Papers


Showing 4201–4250 of 5548 papers

Title	Date	Tasks	Status
Towards responsible AI for education: Hybrid human-AI to confront the Elephant in the room	Apr 22, 2025	Benchmarking, Fairness	Unverified
Towards Robust and Generalizable Gerchberg Saxton based Physics Inspired Neural Networks for Computer Generated Holography: A Sensitivity Analysis Framework	Apr 30, 2025	Benchmarking, Learning Theory	Unverified
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models	Jun 19, 2024	Benchmarking, Open-Domain Question Answering	Unverified
Towards Sentiment Analysis of Tobacco Products’ Usage in Social Media	Sep 1, 2021	Benchmarking, Sentiment Analysis	Unverified
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems	May 21, 2025	Benchmarking, Math	Unverified
Towards Stable 3D Object Detection	Jul 5, 2024	3D Object Detection, Autonomous Driving	Unverified
Towards Toxic Positivity Detection	Jul 1, 2022	Benchmarking, Classification	Unverified
Towards Trustworthy Deception Detection: Benchmarking Model Robustness across Domains, Modalities, and Languages	Apr 23, 2021	Benchmarking, Deception Detection	Unverified
Towards Universal Learning-based Model for Cardiac Image Reconstruction: Summary of the CMRxRecon2024 Challenge	Mar 5, 2025	Benchmarking, Image Reconstruction	Unverified
Towards Visual Text Grounding of Multimodal Large Language Model	Apr 7, 2025	Benchmarking, Language Modeling	Unverified
Towards Zero-Shot Differential Morphing Attack Detection with Multimodal Large Language Models	May 21, 2025	Benchmarking, Prompt Engineering	Unverified
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks	Jul 27, 2022	Adversarial Robustness, Benchmarking	Unverified
TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning	Apr 11, 2025	Benchmarking, Language Modeling	Unverified
Tracking Everything in Robotic-Assisted Surgery	Sep 29, 2024	Benchmarking	Unverified
Training Mixed-Domain Translation Models via Federated Learning	May 3, 2022	Benchmarking, Federated Learning	Unverified
Training neural mapping schemes for satellite altimetry with simulation data	Sep 19, 2023	Benchmarking	Unverified
Training Transformers with Enforced Lipschitz Constants	Jul 17, 2025	Benchmarking	Unverified
Trajectory Normalized Gradients for Distributed Optimization	Jan 24, 2019	Benchmarking, Distributed Optimization	Unverified
TRAM: Benchmarking Temporal Reasoning for Large Language Models	Oct 2, 2023	Benchmarking, Few-Shot Learning	Unverified
Transactive Local Energy Markets Enable Community-Level Resource Coordination Using Individual Rewards	Mar 22, 2024	Benchmarking, energy management	Unverified
TransBench: Benchmarking Machine Translation for Industrial-Scale Applications	May 20, 2025	Benchmarking, Machine Translation	Unverified
Transfer of Knowledge through Reverse Annealing: A Preliminary Analysis of the Benefits and What to Share	Jan 27, 2025	Benchmarking, Transfer Learning	Unverified
Transformed Subspace Clustering	Dec 10, 2019	Benchmarking, Clustering	Unverified
Transformers in Protein: A Survey	May 26, 2025	Benchmarking, Drug Discovery	Unverified
Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends	Oct 5, 2024	Benchmarking, Chart Understanding	Unverified
Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning	Oct 14, 2024	Atari Games, Benchmarking	Unverified
Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems	Oct 7, 2024	Benchmarking, Machine Translation	Unverified
TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation	Jul 1, 2025	Benchmarking, Machine Translation	Unverified
TransOpt: Transformer-based Representation Learning for Optimization Problem Classification	Nov 29, 2023	Benchmarking, Classification	Unverified
TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models	Jan 9, 2024	Benchmarking	Unverified
Treatment Learning Causal Transformer for Noisy Image Classification	Mar 29, 2022	Benchmarking, Classification	Unverified
Tree Instance Segmentation With Temporal Contour Graph	Jan 1, 2023	Benchmarking, Instance Segmentation	Unverified
Trial-Based Dominance Enables Non-Parametric Tests to Compare both the Speed and Accuracy of Stochastic Optimizers	Dec 19, 2022	Benchmarking, Stochastic Optimization	Unverified
Trident: Efficient 4PC Framework for Privacy Preserving Machine Learning	Dec 5, 2019	Benchmarking, BIG-bench Machine Learning	Unverified
TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images	Jan 25, 2024	Benchmarking, Segmentation	Unverified
Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms	May 22, 2025	Adversarial Attack, Benchmarking	Unverified
True Online TD-Replan(lambda) Achieving Planning through Replaying	Jan 31, 2025	Benchmarking	Unverified
Trust but Verify: Programmatic VLM Evaluation in the Wild	Oct 17, 2024	Benchmarking, Language Modelling	Unverified
TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations	Jul 2, 2024	Benchmarking, text-to-speech	Unverified
Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data	Sep 23, 2023	Benchmarking, Super-Resolution	Unverified
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding	May 23, 2025	Benchmarking, Spatial Reasoning	Unverified
UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning	May 21, 2025	Benchmarking, Imitation Learning	Unverified
UAV Immersive Video Streaming: A Comprehensive Survey, Benchmarking, and Open Challenges	Oct 31, 2023	Benchmarking	Unverified
UCCIX: Irish-eXcellence Large Language Model	May 13, 2024	Benchmarking, Language Modeling	Unverified
UCLID-Net: Single View Reconstruction in Object Space	Jun 6, 2020	Benchmarking, Decoder	Unverified
UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite	Apr 18, 2023	Benchmarking, Instance Segmentation	Unverified
UGSL: A Unified Framework for Benchmarking Graph Structure Learning	Aug 21, 2023	Benchmarking, Graph structure learning	Unverified
UKAN: Unbound Kolmogorov-Arnold Network Accompanied with Accelerated Library	Aug 20, 2024	Benchmarking, Computational Efficiency	Unverified
Unbounded Bayesian Optimization via Regularization	Aug 14, 2015	Bayesian Optimization, Benchmarking	Unverified
Uncertainty estimation for Cross-dataset performance in Trajectory prediction	May 15, 2022	Benchmarking, Prediction	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified