Adaptively profiling models with task elicitation

2025-03-03

Davis Brown, Prithvi Balehannina, Helen Jin, Shreya Havaldar, Hamed Hassani, Eric Wong

Abstract

Language model evaluations often fail to characterize consequential failure modes, forcing experts to inspect outputs and build new benchmarks. We introduce task elicitation, a method that automatically builds new evaluations to profile model behavior. Task elicitation finds hundreds of natural-language tasks (an order of magnitude more than prior work) on which frontier models exhibit systematic failures, in domains ranging from forecasting to online harassment. For example, we find that Sonnet 3.5 over-associates quantum computing and AGI, and that o3-mini is prone to hallucination when fabrications are repeated in-context.

Tasks

Hallucination, Language Modeling, Language Modelling, Legal Reasoning