SOTAVerified

Benchmarking

Papers

Showing 4551-4600 of 5548 papers

Title | Status | Hype
Improvements & Evaluations on the MLCommons CloudMask Benchmark | Code | 0
The current state of single-cell proteomics data analysis | Code | 0
Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation | Code | 0
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation | Code | 0
Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II | Code | 0
BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts | Code | 0
LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages | Code | 0
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I | Code | 0
LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts | Code | 0
Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation Threads | Code | 0
BLESS: Benchmarking Large Language Models on Sentence Simplification | Code | 0
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction | Code | 0
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Code | 0
BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts | Code | 0
LLM Performance for Code Generation on Noisy Tasks | Code | 0
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0
A Dataset for Web-Scale Knowledge Base Population | Code | 0
The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation | Code | 0
Impact of ImageNet Model Selection on Domain Adaptation | Code | 0
Immunofluorescence Capillary Imaging Segmentation: Cases Study | Code | 0
Analyzing the Feature Extractor Networks for Face Image Synthesis | Code | 0
ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | Code | 0
LLpowershap: Logistic Loss-based Automated Shapley Values Feature Selection Method | Code | 0
Revisiting and Benchmarking Graph Autoencoders: A Contrastive Learning Perspective | Code | 0
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions | Code | 0
Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models | Code | 0
AI-enabled Sound Pattern Recognition on Asthma Medication Adherence: Evaluation with the RDA Benchmark Suite | Code | 0
BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis | Code | 0
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization | Code | 0
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Code | 0
Local manifold learning and its link to domain-based physics knowledge | Code | 0
LOCO-EPI: Leave-one-chromosome-out (LOCO) as a benchmarking paradigm for deep learning based prediction of enhancer-promoter interactions | Code | 0
IJCB 2022 Mobile Behavioral Biometrics Competition (MobileB2C) | Code | 0
Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors | Code | 0
BioSentVec: creating sentence embeddings for biomedical texts | Code | 0
LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Multi-Domain Reasoning Challenges | Code | 0
IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems | Code | 0
Identifying the Smallest Adversarial Load Perturbations that Render DC-OPF Infeasible | Code | 0
LogoNet: a fine-grained network for instance-level logo sketch retrieval | Code | 0
Identifying Money Laundering Subgraphs on the Blockchain | Code | 0
Identifying and Benchmarking Natural Out-of-Context Prediction Problems | Code | 0
Multitask learning and benchmarking with clinical time series data | Code | 0
IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Code | 0
IceBench: A Benchmark for Deep Learning based Sea Ice Type Classification | Code | 0
BioFors: A Large Biomedical Image Forensics Dataset | Code | 0
Benchmarking Attribution Methods with Relative Feature Importance | Code | 0
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Code | 0
Hyperspectral Image Dataset for Benchmarking on Salient Object Detection | Code | 0
Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning | Code | 0
Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified