SOTAVerified

Benchmarking

Papers

Showing 4201–4250 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Towards responsible AI for education: Hybrid human-AI to confront the Elephant in the room | | 0 |
| Towards Robust and Generalizable Gerchberg Saxton based Physics Inspired Neural Networks for Computer Generated Holography: A Sensitivity Analysis Framework | | 0 |
| Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models | | 0 |
| Towards Sentiment Analysis of Tobacco Products’ Usage in Social Media | | 0 |
| Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | | 0 |
| Towards Stable 3D Object Detection | | 0 |
| Towards Toxic Positivity Detection | | 0 |
| Towards Trustworthy Deception Detection: Benchmarking Model Robustness across Domains, Modalities, and Languages | | 0 |
| Towards Universal Learning-based Model for Cardiac Image Reconstruction: Summary of the CMRxRecon2024 Challenge | | 0 |
| Towards Visual Text Grounding of Multimodal Large Language Model | | 0 |
| Towards Zero-Shot Differential Morphing Attack Detection with Multimodal Large Language Models | | 0 |
| Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks | | 0 |
| TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning | | 0 |
| Tracking Everything in Robotic-Assisted Surgery | | 0 |
| Training Mixed-Domain Translation Models via Federated Learning | | 0 |
| Training neural mapping schemes for satellite altimetry with simulation data | | 0 |
| Training Transformers with Enforced Lipschitz Constants | | 0 |
| Trajectory Normalized Gradients for Distributed Optimization | | 0 |
| TRAM: Benchmarking Temporal Reasoning for Large Language Models | | 0 |
| Transactive Local Energy Markets Enable Community-Level Resource Coordination Using Individual Rewards | | 0 |
| TransBench: Benchmarking Machine Translation for Industrial-Scale Applications | | 0 |
| Transfer of Knowledge through Reverse Annealing: A Preliminary Analysis of the Benefits and What to Share | | 0 |
| Transformed Subspace Clustering | | 0 |
| Transformers in Protein: A Survey | | 0 |
| Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends | | 0 |
| Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning | | 0 |
| Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems | | 0 |
| TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation | | 0 |
| TransOpt: Transformer-based Representation Learning for Optimization Problem Classification | | 0 |
| TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models | | 0 |
| Treatment Learning Causal Transformer for Noisy Image Classification | | 0 |
| Tree Instance Segmentation With Temporal Contour Graph | | 0 |
| Trial-Based Dominance Enables Non-Parametric Tests to Compare both the Speed and Accuracy of Stochastic Optimizers | | 0 |
| Trident: Efficient 4PC Framework for Privacy Preserving Machine Learning | | 0 |
| TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images | | 0 |
| Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms | | 0 |
| True Online TD-Replan(lambda) Achieving Planning through Replaying | | 0 |
| Trust but Verify: Programmatic VLM Evaluation in the Wild | | 0 |
| TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations | | 0 |
| Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data | | 0 |
| U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding | | 0 |
| UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning | | 0 |
| UAV Immersive Video Streaming: A Comprehensive Survey, Benchmarking, and Open Challenges | | 0 |
| UCCIX: Irish-eXcellence Large Language Model | | 0 |
| UCLID-Net: Single View Reconstruction in Object Space | | 0 |
| UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite | | 0 |
| UGSL: A Unified Framework for Benchmarking Graph Structure Learning | | 0 |
| UKAN: Unbound Kolmogorov-Arnold Network Accompanied with Accelerated Library | | 0 |
| Unbounded Bayesian Optimization via Regularization | | 0 |
| Uncertainty estimation for Cross-dataset performance in Trajectory prediction | | 0 |
Page 85 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |