SOTAVerified

Benchmarking

Papers

Showing 2901-2950 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking unsupervised near-duplicate image detection | | 0 |
| HoliSafe: Holistic Safety Benchmarking and Modeling with Safety Meta Token for Vision-Language Model | | 0 |
| Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction | | 0 |
| Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning | | 0 |
| Holistic Multi-View Building Analysis in the Wild with Projection Pooling | | 0 |
| Hollywood 3D: Recognizing Actions in 3D Natural Scenes | | 0 |
| HoloGen: An open source toolbox for high-speed hologram generation | | 0 |
| The Extractive-Abstractive Axis: Measuring Content "Borrowing" in Generative Language Models | | 0 |
| Benchmarking Unsupervised Anomaly Detection and Localization | | 0 |
| Horizon Scans can be accelerated using novel information retrieval and artificial intelligence tools | | 0 |
| HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | | 0 |
| Hotel Recognition via Latent Image Embedding | | 0 |
| Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning | | 0 |
| Benchmarking Uncertainty Quantification on Biosignal Classification Tasks under Dataset Shift | | 0 |
| Household Electricity Demand Forecasting -- Benchmarking State-of-the-Art Methods | | 0 |
| How Aligned are Different Alignment Metrics? | | 0 |
| How Certain are Uncertainty Estimates? Three Novel Earth Observation Datasets for Benchmarking Uncertainty Quantification in Machine Learning | | 0 |
| How Different AI Chatbots Behave? Benchmarking Large Language Models in Behavioral Economics Games | | 0 |
| How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension | | 0 |
| The FaceChannelS: Strike of the Sequences for the AffWild 2 Challenge | | 0 |
| Benchmarking Ultra-Low-Power μNPUs | | 0 |
| How Good is a Video Summary? A New Benchmarking Dataset and Evaluation Framework Towards Realistic Video Summarization | | 0 |
| Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making | | 0 |
| How Good Is Neural Combinatorial Optimization? A Systematic Evaluation on the Traveling Salesman Problem | | 0 |
| How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference | | 0 |
| How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers | | 0 |
| How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study | | 0 |
| Benchmarking Ultra-High-Definition Image Super-Resolution | | 0 |
| The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input | | 0 |
| Benchmarking Twitter Sentiment Analysis Tools | | 0 |
| The Forchheim Image Database for Camera Identification in the Wild | | 0 |
| MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models | | 0 |
| How Universal are Universal Dependencies? Exploiting Syntax for Multilingual Clause-level Sentiment Detection | | 0 |
| Benchmarking Transformers-based models on French Spoken Language Understanding tasks | | 0 |
| How well it works: Benchmarking performance of GPT models on medical natural language processing tasks | | 0 |
| You Only Crash Once v2: Perceptually Consistent Strong Features for One-Stage Domain Adaptive Detection of Space Terrain | | 0 |
| The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech | | 0 |
| The Impact of Genomic Variation on Function (IGVF) Consortium | | 0 |
| A General Taylor Framework for Unifying and Revisiting Attribution Methods | | 0 |
| HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing | | 0 |
| Benchmarking Transformer-based Language Models for Arabic Sentiment and Sarcasm Detection | | 0 |
| Benchmarking Toxic Molecule Classification using Graph Neural Networks and Few Shot Learning | | 0 |
| Human Body Shape Classification Based on a Single Image | | 0 |
| Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications | | 0 |
| Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation | | 0 |
| A generalized kinetic framework applied to whole-cell catalysis in biofilm flow reactors clarifies performance enhancements | | 0 |
| HyBiomass: Global Hyperspectral Imagery Benchmark Dataset for Evaluating Geospatial Foundation Models in Forest Aboveground Biomass Estimation | | 0 |
| Hybrid data driven/thermal simulation model for comfort assessment | | 0 |
| Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study | | 0 |
| The iNaturalist Sounds Dataset | | 0 |
Page 59 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |