SOTAVerified

Automated Essay Scoring

Automated Essay Scoring is the task of assigning a score to an essay, usually to assess the language ability of a language learner. Essay quality depends on four primary dimensions: topic relevance, organization and coherence, word usage and sentence complexity, and grammar and mechanics.

Source: A Joint Model for Multimodal Document Quality Assessment

Papers

Showing 1-50 of 104 papers

Title | Status | Hype
Enhancing Marker Scoring Accuracy through Ordinal Confidence Modelling in Educational Assessments | - | 0
Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems | - | 0
Composable Cross-prompt Essay Scoring by Merging Models | - | 0
CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring | - | 0
TRATES: Trait-Specific Rubric-Assisted Cross-Prompt Essay Scoring | - | 0
LCES: Zero-shot Automated Essay Scoring via Pairwise Comparisons Using Large Language Models | - | 0
Do We Need a Detailed Rubric for Automated Essay Scoring using Large Language Models? | - | 0
Does the Prompt-based Large Language Model Recognize Students' Demographics and Introduce Bias in Essay Scoring? | - | 0
Evolution of AI in Education: Agentic Workflows | - | 0
ARWI: Arabic Write and Improve | - | 0
Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection | - | 0
EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models | - | 0
How well can LLMs Grade Essays in Arabic? | - | 0
On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education | - | 0
The Impact of Example Selection in Few-Shot Prompting on Automated Essay Scoring Using GPT Models | - | 0
Rationale Behind Essay Scores: Enhancing S-LLM's Multi-Trait Essay Scoring with Rationale Generated by LLMs | - | 0
Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards | - | 0
Are Large Language Models Good Essay Graders? | - | 0
Automated essay scoring in Arabic: a dataset and analysis of a BERT-based system | - | 0
Is GPT-4 Alone Sufficient for Automated Essay Scoring?: A Comparative Judgment Approach Based on Rater Cognition | - | 0
Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory | - | 0
Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression | - | 0
Beyond Agreement: Diagnosing the Rationale Alignment of Automated Essay Scoring Methods based on Linguistically-informed Counterfactuals | Code | 0
Graded Relevance Scoring of Written Essays with Dense Retrieval | - | 0
Can GPT-4 do L2 analytic assessment? | - | 0
Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation | Code | 0
Unleashing Large Language Models' Proficiency in Zero-shot Essay Scoring | - | 0
Transformer-based Joint Modelling for Automatic Essay Scoring and Off-Topic Detection | - | 0
Autoregressive Score Generation for Multi-trait Essay Scoring | Code | 0
Can Large Language Models Automatically Score Proficiency of Written Essays? | Code | 0
Frustratingly Simple Prompting-based Text Denoising | - | 0
DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing | - | 0
VerAs: Verify then Assess STEM Lab Reports | Code | 0
Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs | Code | 1
Unveiling the Tapestry of Automated Essay Scoring: A Comprehensive Investigation of Accuracy, Fairness, and Generalizability | Code | 0
Empirical Study of Large Language Models as Automated Essay Scoring Tools in English Composition__Taking TOEFL Independent Writing Task for Example | - | 0
Enhancing Essay Scoring with Adversarial Weights Perturbation and Metric-specific AttentionPooling | - | 0
Learning to love diligent trolls: Accounting for rater effects in the dialogue safety task | Code | 0
LLM-as-a-tutor in EFL Writing Education: Focusing on Evaluation of Student-LLM Interaction | - | 0
Rubric-Specific Approach to Automated Essay Scoring with Augmentation Training | - | 0
Review of feedback in Automated Essay Scoring | - | 0
Automated Essay Scoring in Argumentative Writing: DeBERTeachingAssistant | - | 0
Modeling Structural Similarities between Documents for Coherence Assessment with Graph Convolutional Networks | Code | 0
Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring | Code | 1
The Effectiveness of a Dynamic Loss Function in Neural Network Based Automated Essay Scoring | - | 0
WikiSQE: A Large-Scale Dataset for Sentence Quality Estimation in Wikipedia | Code | 0
Can ChatGPT and Bard Generate Aligned Assessment Items? A Reliability Analysis against Human Performance | - | 0
H-AES: Towards Automated Essay Scoring for Hindi | Code | 0
Using Active Learning Methods to Strategically Select Essays for Automated Scoring | - | 0
Data Augmentation for Automated Essay Scoring using Transformer Models | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Neural Pairwise Contrastive Regression (NPCR) | Quadratic Weighted Kappa | 0.82 | - | Unverified
2 | Tran-BERT-MS-ML-R | Quadratic Weighted Kappa | 0.79 | - | Unverified
3 | Considering-Content-XLNet | Quadratic Weighted Kappa | 0.79 | - | Unverified
4 | HISK+BOSWE | Quadratic Weighted Kappa | 0.79 | - | Unverified
5 | SkipFlow | Quadratic Weighted Kappa | 0.76 | - | Unverified
6 | MHMLW | Quadratic Weighted Kappa | 0.76 | - | Unverified
7 | AF | Quadratic Weighted Kappa | 0.73 | - | Unverified
8 | FDA | Quadratic Weighted Kappa | 0.71 | - | Unverified
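
Every result above is reported as Quadratic Weighted Kappa (QWK), which measures agreement between predicted and human scores and penalizes each disagreement by the square of its distance on the score scale. The following is a minimal sketch of the standard QWK formula in plain Python, not the evaluation code used by any listed model or by this page. The function name `quadratic_weighted_kappa` and its parameters are hypothetical, chosen for illustration, and scores are assumed to be integers within a known range.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic Weighted Kappa between two equal-length lists of integer scores.

    Hypothetical helper for illustration. Returns 1.0 for perfect agreement,
    about 0.0 for chance-level agreement, and negative values for agreement
    worse than chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("the score range must contain at least two levels")

    n_items = len(rater_a)
    n_levels = max_score - min_score + 1

    observed = Counter(zip(rater_a, rater_b))  # counts of (score_a, score_b) pairs
    hist_a = Counter(rater_a)                  # marginal score counts for rater A
    hist_b = Counter(rater_b)                  # marginal score counts for rater B

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n_items
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters gave the same single score to every essay.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    human = [1, 2, 3, 4, 2, 3]
    model = [1, 2, 3, 3, 2, 4]
    print(round(quadratic_weighted_kappa(human, model, 1, 4), 3))
```

Under this definition a claimed value such as 0.82 indicates substantially higher agreement with human raters than 0.71, but values are only comparable when computed on the same dataset and score range.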