SOTAVerified

Automated Essay Scoring

Automated Essay Scoring (AES) is the task of assigning a score to an essay, usually in the context of assessing a language learner's writing ability. Essay quality is typically judged along four primary dimensions: topic relevance, organization and coherence, word usage and sentence complexity, and grammar and mechanics.
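One common way to operationalize these four dimensions is to score each one separately and combine the subscores into an overall score. A minimal sketch, assuming equal weights and a 0–1 scale for each subscore (the weights and function names here are illustrative, not taken from any listed system):

```python
# Hypothetical sketch: combining per-dimension subscores into an overall score.
# The dimension names follow the task description; the equal weights are
# illustrative only -- real systems learn or tune these.
WEIGHTS = {
    "topic_relevance": 0.25,
    "organization_coherence": 0.25,
    "word_usage_sentence_complexity": 0.25,
    "grammar_mechanics": 0.25,
}

def overall_score(subscores: dict) -> float:
    """Weighted average of per-dimension subscores, each on a 0-1 scale."""
    return sum(WEIGHTS[dim] * subscores[dim] for dim in WEIGHTS)
```

For example, an essay strong on word usage but weak on mechanics would receive an overall score between its best and worst subscores, pulled toward each according to the weights.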

Source: A Joint Model for Multimodal Document Quality Assessment

Papers

Showing 1–25 of 104 papers

Enhancing Marker Scoring Accuracy through Ordinal Confidence Modelling in Educational Assessments
Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems
Composable Cross-prompt Essay Scoring by Merging Models
TRATES: Trait-Specific Rubric-Assisted Cross-Prompt Essay Scoring
CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring
LCES: Zero-shot Automated Essay Scoring via Pairwise Comparisons Using Large Language Models
Do We Need a Detailed Rubric for Automated Essay Scoring using Large Language Models?
Does the Prompt-based Large Language Model Recognize Students' Demographics and Introduce Bias in Essay Scoring?
Evolution of AI in Education: Agentic Workflows
ARWI: Arabic Write and Improve
Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection
EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models
How well can LLMs Grade Essays in Arabic?
On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education
The Impact of Example Selection in Few-Shot Prompting on Automated Essay Scoring Using GPT Models
Rationale Behind Essay Scores: Enhancing S-LLM's Multi-Trait Essay Scoring with Rationale Generated by LLMs
Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards
Are Large Language Models Good Essay Graders?
Automated essay scoring in Arabic: a dataset and analysis of a BERT-based system
Is GPT-4 Alone Sufficient for Automated Essay Scoring?: A Comparative Judgment Approach Based on Rater Cognition
Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory
Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression
Beyond Agreement: Diagnosing the Rationale Alignment of Automated Essay Scoring Methods based on Linguistically-informed Counterfactuals (code available)
Graded Relevance Scoring of Written Essays with Dense Retrieval
Can GPT-4 do L2 analytic assessment?

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Neural Pairwise Contrastive Regression (NPCR) | Quadratic Weighted Kappa | 0.82 | — | Unverified
2 | Tran-BERT-MS-ML-R | Quadratic Weighted Kappa | 0.79 | — | Unverified
3 | Considering-Content-XLNet | Quadratic Weighted Kappa | 0.79 | — | Unverified
4 | HISK+BOSWE | Quadratic Weighted Kappa | 0.79 | — | Unverified
5 | SkipFlow | Quadratic Weighted Kappa | 0.76 | — | Unverified
6 | MHMLW | Quadratic Weighted Kappa | 0.76 | — | Unverified
7 | AF | Quadratic Weighted Kappa | 0.73 | — | Unverified
8 | FDA | Quadratic Weighted Kappa | 0.71 | — | Unverified
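Every benchmark entry above reports Quadratic Weighted Kappa (QWK), the standard AES metric: it measures agreement between predicted and human scores on an ordinal scale, penalizing disagreements quadratically in their distance, so being off by two score points costs four times as much as being off by one. A minimal, dependency-free sketch of the computation (the function name and integer score range are illustrative):

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """QWK between two lists of integer scores in [min_rating, max_rating].

    Returns 1.0 for perfect agreement and 0.0 for chance-level agreement.
    """
    n = max_rating - min_rating + 1
    # Observed agreement matrix: O[i][j] counts items scored i by A and j by B.
    O = [[0.0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        O[a - min_rating][b - min_rating] += 1
    num_items = len(rater_a)
    # Marginal score histograms for each rater.
    hist_a = [sum(row) for row in O]
    hist_b = [sum(O[i][j] for i in range(n)) for j in range(n)]
    numer = 0.0  # weighted observed disagreement
    denom = 0.0  # weighted disagreement expected by chance
    for i in range(n):
        for j in range(n):
            w = ((i - j) ** 2) / ((n - 1) ** 2)  # quadratic distance penalty
            expected = hist_a[i] * hist_b[j] / num_items
            numer += w * O[i][j]
            denom += w * expected
    return 1.0 - numer / denom
```

The same quantity is available in scikit-learn as `cohen_kappa_score(y1, y2, weights="quadratic")`; the hand-rolled version above just makes the observed-versus-expected structure explicit.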