SOTAVerified

Benchmarking

Papers

Showing 3251-3300 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Learning to Fold Real Garments with One Arm: A Case Study in Cloud-Based Robotics Research | | 0 |
| Learning to Mix n-Step Returns: Generalizing lambda-Returns for Deep Reinforcement Learning | | 0 |
| Learning to Plan via Deep Optimistic Value Exploration | | 0 |
| Learning to recognize Abnormalities in Chest X-Rays with Location-Aware Dense Networks | | 0 |
| Learning to Schedule Learning rate with Graph Neural Networks | | 0 |
| Learn-to-Race Challenge 2022: Benchmarking Safe Learning and Cross-domain Generalisation in Autonomous Racing | | 0 |
| Learn to Solve Vehicle Routing Problems ASAP: A Neural Optimization Approach for Time-Constrained Vehicle Routing Problems with Finite Vehicle Fleet | | 0 |
| Le benchmarking de la reconnaissance d'entités nommées pour le français (Benchmarking for French NER) | | 0 |
| Less is more: Selecting the right benchmarking set of data for time series classification | | 0 |
| Lessons From Red Teaming 100 Generative AI Products | | 0 |
| Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking | | 0 |
| Leveraging Benchmarking Data for Informed One-Shot Dynamic Algorithm Selection | | 0 |
| Leveraging Contextual Information for Effective Entity Salience Detection | | 0 |
| Leveraging LLMs to Create a Haptic Devices' Recommendation System | | 0 |
| Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study | | 0 |
| Leveraging Spatial and Semantic Feature Extraction for Skin Cancer Diagnosis with Capsule Networks and Graph Neural Networks | | 0 |
| Leveraging State Space Models in Long Range Genomics | | 0 |
| Break a Lag: Triple Exponential Moving Average for Enhanced Optimization | | 0 |
| LEXam: Benchmarking Legal Reasoning on 340 Law Exams | | 0 |
| LIBRE: The Multiple 3D LiDAR Dataset | | 0 |
| LidarGait: Benchmarking 3D Gait Recognition with Point Clouds | | 0 |
| Lifelogging As An Extreme Form of Personal Information Management -- What Lessons To Learn | | 0 |
| Light Field Image Quality Assessment With Auxiliary Learning Based on Depthwise and Anglewise Separable Convolutions | | 0 |
| Lightly Weighted Automatic Audio Parameter Extraction for the Quality Assessment of Consensus Auditory-Perceptual Evaluation of Voice | | 0 |
| Lightning UQ Box: A Comprehensive Framework for Uncertainty Quantification in Deep Learning | | 0 |
| Lightweight Jet Reconstruction and Identification as an Object Detection Task | | 0 |
| LIM: Large Interpolator Model for Dynamic Reconstruction | | 0 |
| Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models | | 0 |
| Liquid State Genetic Programming | | 0 |
| Livestock Monitoring with Transformer | | 0 |
| LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education | | 0 |
| LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living | | 0 |
| LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation | | 0 |
| LLM-based Evaluation Policy Extraction for Ecological Modeling | | 0 |
| LLM Evaluators Recognize and Favor Their Own Generations | | 0 |
| LLM-initialized Differentiable Causal Discovery | | 0 |
| LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation | | 0 |
| LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study | | 0 |
| LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection | | 0 |
| LMFormer: Lane based Motion Prediction Transformer | | 0 |
| LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs | | 0 |
| LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | | 0 |
| Load-independent Metrics for Benchmarking Force Controllers | | 0 |
| Local Data Quantity-Aware Weighted Averaging for Federated Learning with Dishonest Clients | | 0 |
| Logically at Factify 2: A Multi-Modal Fact Checking System Based on Evidence Retrieval techniques and Transformer Encoder Architecture | | 0 |
| Logically at Factify 2022: Multimodal Fact Verification | | 0 |
| Benchmarking Continuous Time Models for Predicting Multiple Sclerosis Progression | | 0 |
| LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation | | 0 |
| Long Range Arena: A Benchmark for Efficient Transformers | | 0 |
| Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |