
Do Schwartz Higher-Order Values Help Sentence-Level Human Value Detection? A Study of Hierarchical Gating and Calibration

2026-03-09

Víctor Yeste, Paolo Rosso


Abstract

Human value detection from single sentences is a sparse, imbalanced multi-label task. We study whether Schwartz higher-order (HO) categories help this setting on ValueEval'24 / ValuesML (74K English sentences) under a compute-frugal budget. Rather than proposing a new architecture, we compare direct supervised transformers, hard HO→values pipelines, Presence→HO→values cascades, compact instruction-tuned large language models (LLMs), QLoRA, and low-cost upgrades such as threshold tuning and small ensembles. HO categories are learnable: the easiest bipolar pair, Growth vs. Self-Protection, reaches Macro-F1 = 0.58. The most reliable gains come from calibration and ensembling: threshold tuning improves Social Focus vs. Personal Focus from 0.41 to 0.57 (+0.16), transformer soft voting lifts Growth from 0.286 to 0.303, and a Transformer+LLM hybrid reaches 0.353 on Self-Protection. In contrast, hard hierarchical gating does not consistently improve the end task. Compact LLMs also underperform supervised encoders as stand-alone systems, although they sometimes add useful diversity in hybrid ensembles. Under this benchmark, the HO structure is more useful as an inductive bias than as a rigid routing rule.
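The two low-cost upgrades the abstract credits with the most reliable gains, per-label threshold tuning and soft voting, can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names, the threshold grid, and the toy data are assumptions for demonstration.

```python
# Illustrative sketch (not the authors' implementation) of the two
# calibration/ensembling upgrades described in the abstract:
#   1) tuning a per-label decision threshold on held-out scores, and
#   2) soft voting: averaging probabilities across models.

def f1(y_true, y_pred):
    """Binary F1 for one label (booleans or 0/1)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def tune_threshold(y_true, scores, grid=None):
    """Pick the decision threshold that maximizes F1 on validation scores
    (done independently per label in a multi-label setting)."""
    grid = grid or [i / 100 for i in range(5, 96)]  # assumed grid: 0.05..0.95
    return max(grid, key=lambda t: f1(y_true, [s >= t for s in scores]))

def soft_vote(score_lists):
    """Average per-example probabilities from several models
    (transformer soft voting)."""
    return [sum(s) / len(s) for s in zip(*score_lists)]
```

In a multi-label task like this one, the tuning step would be run once per value label on validation data, with the chosen thresholds then frozen for test-time prediction; macro-F1 is the unweighted mean of the per-label F1 scores.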
