Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 3601–3625 of 5548 papers

Title	Date	Tasks	Status
Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data	Oct 9, 2023	BenchmarkingLanguage Modeling	CodeCode Available
Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus	Oct 8, 2023	BenchmarkingMachine Translation	CodeCode Available
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems	Oct 8, 2023	Benchmarking	CodeCode Available
Simple GNNs with Low Rank Non-parametric Aggregators	Oct 8, 2023	BenchmarkingNode Classification	CodeCode Available
Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction	Oct 8, 2023	BenchmarkingDecoder	—Unverified
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets	Oct 7, 2023	Benchmarkingnamed-entity-recognition	—Unverified
Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data	Oct 7, 2023	Benchmarking	—Unverified
Full-scale modal testing of a Hawk T1A aircraft for benchmarking vibration-based methods	Oct 6, 2023	BenchmarkingExperimental Design	—Unverified
CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis	Oct 6, 2023	BenchmarkingDomain Generalization	—Unverified
LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation	Oct 6, 2023	BenchmarkingMathematical Reasoning	—Unverified
AKFruitYield: Modular benchmarking and video analysis software for Azure Kinect cameras for fruit size and fruit yield estimation in apple orchards	Oct 6, 2023	Benchmarking	CodeCode Available
Bringing Quantum Algorithms to Automated Machine Learning: A Systematic Review of AutoML Frameworks Regarding Extensibility for QML Algorithms	Oct 6, 2023	AutoMLBenchmarking	—Unverified
Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning	Oct 6, 2023	BenchmarkingFederated Learning	—Unverified
Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report	Oct 5, 2023	Benchmarking	—Unverified
A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling	Oct 5, 2023	BenchmarkingDeep Reinforcement Learning	—Unverified
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference	Oct 4, 2023	BenchmarkingGPU	—Unverified
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma	Oct 4, 2023	BenchmarkingSegmentation	CodeCode Available
Deep Reinforcement Learning Algorithms for Hybrid V2X Communication: A Benchmarking Study	Oct 4, 2023	Autonomous VehiclesBenchmarking	—Unverified
On the Performance of Multimodal Language Models	Oct 4, 2023	BenchmarkingBinary Classification	—Unverified
EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations	Oct 3, 2023	Atomic ForcesBenchmarking	—Unverified
Learning Quantum Processes with Quantum Statistical Queries	Oct 3, 2023	BenchmarkingCryptanalysis	CodeCode Available
EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods	Oct 3, 2023	Benchmarkingtext-guided-image-editing	—Unverified
Benchmarking and Improving Generator-Validator Consistency of Language Models	Oct 3, 2023	BenchmarkingInstruction Following	—Unverified
CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems	Oct 2, 2023	BenchmarkingComputational Efficiency	—Unverified
A New Real-World Video Dataset for the Comparison of Defogging Algorithms	Oct 2, 2023	BenchmarkingDeblurring	—Unverified

Show:10 25 50

← PrevPage 145 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified