SOTAVerified

Automated Essay Scoring

Automated Essay Scoring is the task of assigning a score to an essay, usually to assess the language ability of a language learner. Essay quality depends on four primary dimensions: topic relevance, organization and coherence, word usage and sentence complexity, and grammar and mechanics.

Source: A Joint Model for Multimodal Document Quality Assessment

Papers

Showing 1-50 of 104 papers

Title | Status | Hype
Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs | Code | 1
Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring | Code | 1
Automated Essay Scoring via Pairwise Contrastive Regression | Code | 1
On the Use of BERT for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation | Code | 1
Countering the Influence of Essay Length in Neural Essay Scoring | Code | 1
Automated Essay Scoring Using Transformer Models | Code | 1
A Prompt-independent and Interpretable Automated Essay Scoring Method for Chinese Second Language Writing | Code | 1
EXPATS: A Toolkit for Explainable Automated Text Scoring | Code | 1
Many Hands Make Light Work: Using Essay Traits to Automatically Score Essays | Code | 1
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems | Code | 1
Automated Essay Scoring based on Two-Stage Learning | Code | 1
Enhancing Marker Scoring Accuracy through Ordinal Confidence Modelling in Educational Assessments | | 0
Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems | | 0
Composable Cross-prompt Essay Scoring by Merging Models | | 0
CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring | | 0
TRATES: Trait-Specific Rubric-Assisted Cross-Prompt Essay Scoring | | 0
LCES: Zero-shot Automated Essay Scoring via Pairwise Comparisons Using Large Language Models | | 0
Do We Need a Detailed Rubric for Automated Essay Scoring using Large Language Models? | | 0
Does the Prompt-based Large Language Model Recognize Students' Demographics and Introduce Bias in Essay Scoring? | | 0
Evolution of AI in Education: Agentic Workflows | | 0
ARWI: Arabic Write and Improve | | 0
Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection | | 0
EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models | | 0
How well can LLMs Grade Essays in Arabic? | | 0
On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education | | 0
The Impact of Example Selection in Few-Shot Prompting on Automated Essay Scoring Using GPT Models | | 0
Rationale Behind Essay Scores: Enhancing S-LLM's Multi-Trait Essay Scoring with Rationale Generated by LLMs | | 0
Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards | | 0
Are Large Language Models Good Essay Graders? | | 0
Automated essay scoring in Arabic: a dataset and analysis of a BERT-based system | | 0
Is GPT-4 Alone Sufficient for Automated Essay Scoring?: A Comparative Judgment Approach Based on Rater Cognition | | 0
Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory | | 0
Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression | | 0
Beyond Agreement: Diagnosing the Rationale Alignment of Automated Essay Scoring Methods based on Linguistically-informed Counterfactuals | Code | 0
Graded Relevance Scoring of Written Essays with Dense Retrieval | | 0
Can GPT-4 do L2 analytic assessment? | | 0
Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation | Code | 0
Unleashing Large Language Models' Proficiency in Zero-shot Essay Scoring | | 0
Transformer-based Joint Modelling for Automatic Essay Scoring and Off-Topic Detection | | 0
Autoregressive Score Generation for Multi-trait Essay Scoring | Code | 0
Can Large Language Models Automatically Score Proficiency of Written Essays? | Code | 0
Frustratingly Simple Prompting-based Text Denoising | | 0
DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing | | 0
VerAs: Verify then Assess STEM Lab Reports | Code | 0
Unveiling the Tapestry of Automated Essay Scoring: A Comprehensive Investigation of Accuracy, Fairness, and Generalizability | Code | 0
Empirical Study of Large Language Models as Automated Essay Scoring Tools in English Composition__Taking TOEFL Independent Writing Task for Example | | 0
Enhancing Essay Scoring with Adversarial Weights Perturbation and Metric-specific AttentionPooling | | 0
Learning to love diligent trolls: Accounting for rater effects in the dialogue safety task | Code | 0
LLM-as-a-tutor in EFL Writing Education: Focusing on Evaluation of Student-LLM Interaction | | 0
Rubric-Specific Approach to Automated Essay Scoring with Augmentation Training | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Neural Pairwise Contrastive Regression (NPCR) | Quadratic Weighted Kappa | 0.82 | | Unverified
2 | Tran-BERT-MS-ML-R | Quadratic Weighted Kappa | 0.79 | | Unverified
3 | Considering-Content-XLNet | Quadratic Weighted Kappa | 0.79 | | Unverified
4 | HISK+BOSWE | Quadratic Weighted Kappa | 0.79 | | Unverified
5 | SkipFlow | Quadratic Weighted Kappa | 0.76 | | Unverified
6 | MHMLW | Quadratic Weighted Kappa | 0.76 | | Unverified
7 | AF | Quadratic Weighted Kappa | 0.73 | | Unverified
8 | FDA | Quadratic Weighted Kappa | 0.71 | | Unverified
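
Every result above is reported as Quadratic Weighted Kappa (QWK), which measures agreement between two sets of integer scores and penalizes each disagreement by the squared distance between the two scores. The page does not say how each paper computed it, so the following is only a minimal sketch of the standard definition. The function name, argument names, and the example scores are illustrative assumptions, not taken from any listed paper.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Standard QWK between two equal-length lists of integer scores.

    Hypothetical helper for illustration; names and signature are assumptions.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_score - min_score + 1  # number of distinct score levels
    if n < 2:
        raise ValueError("need at least two score levels")

    # Observed confusion matrix: observed[i][j] counts essays scored
    # (min_score + i) by rater A and (min_score + j) by rater B.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0:
        # Both raters gave the same single score to every essay.
        return 1.0
    return 1.0 - numerator / denominator


# Made-up scores on a 1-4 scale, for illustration only.
human = [2, 3, 4, 4, 1]
model = [2, 3, 3, 4, 1]
kappa = quadratic_weighted_kappa(human, model, min_score=1, max_score=4)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. In practice, scikit-learn's `cohen_kappa_score` with `weights="quadratic"` computes the same metric.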