Benchmarking

Papers

Showing 3276–3300 of 5548 papers

Title	Date	Tasks	Status
Lightweight Jet Reconstruction and Identification as an Object Detection Task	Feb 9, 2022	Benchmarking, object-detection	Unverified
LIM: Large Interpolator Model for Dynamic Reconstruction	Mar 28, 2025	4D reconstruction, Benchmarking	Unverified
Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models	Feb 20, 2025	Benchmarking	Unverified
Liquid State Genetic Programming	Dec 5, 2023	Benchmarking	Unverified
Livestock Monitoring with Transformer	Nov 1, 2021	Action Recognition, Benchmarking	Unverified
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education	Feb 9, 2024	Benchmarking, Chatbot	Unverified
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living	Jun 13, 2024	Benchmarking, Human-Object Interaction Detection	Unverified
LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation	Oct 6, 2023	Benchmarking, Mathematical Reasoning	Unverified
LLM-based Evaluation Policy Extraction for Ecological Modeling	May 20, 2025	Benchmarking, Large Language Model	Unverified
LLM Evaluators Recognize and Favor Their Own Generations	Apr 15, 2024	Benchmarking	Unverified
LLM-initialized Differentiable Causal Discovery	Oct 28, 2024	Benchmarking, Causal Discovery	Unverified
LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation	Feb 18, 2025	Benchmarking, Text Generation	Unverified
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study	Sep 13, 2024	Benchmarking, Grapheme-to-Phoneme Conversion	Unverified
LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection	Oct 29, 2023	Benchmarking, Diversity	Unverified
LMFormer: Lane based Motion Prediction Transformer	Apr 14, 2025	Autonomous Driving, Benchmarking	Unverified
LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs	Apr 29, 2025	Benchmarking, Face Generation	Unverified
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models	Jul 17, 2024	Benchmarking, Language Modelling	Unverified
Load-independent Metrics for Benchmarking Force Controllers	May 13, 2025	Benchmarking	Unverified
Local Data Quantity-Aware Weighted Averaging for Federated Learning with Dishonest Clients	Apr 17, 2025	Benchmarking, Federated Learning	Unverified
Logically at Factify 2: A Multi-Modal Fact Checking System Based on Evidence Retrieval techniques and Transformer Encoder Architecture	Jan 9, 2023	Avg, Benchmarking	Unverified
Logically at Factify 2022: Multimodal Fact Verification	Dec 16, 2021	Benchmarking, Fact Checking	Unverified
Benchmarking Continuous Time Models for Predicting Multiple Sclerosis Progression	Feb 15, 2023	Benchmarking	Unverified
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation	Jan 9, 2025	2k, 8k	Unverified
Long Range Arena: A Benchmark for Efficient Transformers	Jan 1, 2021	16k, Benchmarking	Unverified
Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning	Dec 21, 2019	Benchmarking, Prediction	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified