SOTAVerified

Automated Essay Scoring

Automated Essay Scoring is the task of assigning a score to an essay, usually to assess the language ability of a language learner. Essay quality is commonly judged along four primary dimensions: topic relevance, organization and coherence, word usage and sentence complexity, and grammar and mechanics.

Source: A Joint Model for Multimodal Document Quality Assessment

Papers

Showing 25 of 104 papers

Title | Status | Hype
Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs | Code | 1
Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring | Code | 1
Automated Essay Scoring via Pairwise Contrastive Regression | Code | 1
On the Use of BERT for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation | Code | 1
Countering the Influence of Essay Length in Neural Essay Scoring | Code | 1
Automated Essay Scoring Using Transformer Models | Code | 1
A Prompt-independent and Interpretable Automated Essay Scoring Method for Chinese Second Language Writing | Code | 1
EXPATS: A Toolkit for Explainable Automated Text Scoring | Code | 1
Many Hands Make Light Work: Using Essay Traits to Automatically Score Essays | Code | 1
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems | Code | 1
Automated Essay Scoring based on Two-Stage Learning | Code | 1
Enhancing Marker Scoring Accuracy through Ordinal Confidence Modelling in Educational Assessments | - | 0
Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems | - | 0
Composable Cross-prompt Essay Scoring by Merging Models | - | 0
CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring | - | 0
TRATES: Trait-Specific Rubric-Assisted Cross-Prompt Essay Scoring | - | 0
LCES: Zero-shot Automated Essay Scoring via Pairwise Comparisons Using Large Language Models | - | 0
Do We Need a Detailed Rubric for Automated Essay Scoring using Large Language Models? | - | 0
Does the Prompt-based Large Language Model Recognize Students' Demographics and Introduce Bias in Essay Scoring? | - | 0
Evolution of AI in Education: Agentic Workflows | - | 0
ARWI: Arabic Write and Improve | - | 0
Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection | - | 0
EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models | - | 0
How well can LLMs Grade Essays in Arabic? | - | 0
On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Neural Pairwise Contrastive Regression (NPCR) | Quadratic Weighted Kappa | 0.82 | - | Unverified
2 | Tran-BERT-MS-ML-R | Quadratic Weighted Kappa | 0.79 | - | Unverified
3 | Considering-Content-XLNet | Quadratic Weighted Kappa | 0.79 | - | Unverified
4 | HISK+BOSWE | Quadratic Weighted Kappa | 0.79 | - | Unverified
5 | SkipFlow | Quadratic Weighted Kappa | 0.76 | - | Unverified
6 | MHMLW | Quadratic Weighted Kappa | 0.76 | - | Unverified
7 | AF | Quadratic Weighted Kappa | 0.73 | - | Unverified
8 | FDA | Quadratic Weighted Kappa | 0.71 | - | Unverified
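All of the results above report Quadratic Weighted Kappa (QWK), the standard agreement metric for essay scoring: it measures how well the model's scores match human scores, penalizing large disagreements quadratically, and it corrects for chance agreement. A minimal NumPy sketch (the function name and class-count parameter are illustrative, not from any of the listed papers):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """QWK between human scores (y_true) and model scores (y_pred),
    both given as integer labels in [0, n_classes)."""
    # Observed confusion matrix O[i, j]: human said i, model said j.
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Expected matrix under chance: outer product of the two marginal
    # score distributions, scaled to the same total count.
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    # Quadratic disagreement weights: zero on the diagonal, growing
    # with the squared distance between the two scores.
    idx = np.arange(n_classes)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    return 1.0 - (W * O).sum() / (W * E).sum()
```

QWK is 1.0 for perfect agreement, 0.0 for chance-level agreement, and negative when scores disagree more than chance; the 0.71-0.82 claimed values above indicate strong but imperfect agreement with human raters. The same quantity is available as `sklearn.metrics.cohen_kappa_score(y_true, y_pred, weights='quadratic')`.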