
Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format

2018-11-01 · WS 2018

Alina Wróblewska


Abstract

The paper presents PDBUD, the largest Polish Dependency Bank in Universal Dependencies format, with 22K trees and 352K tokens. PDBUD builds on its previous version, the Polish UD treebank (PL-SZ), and contains all 8K PL-SZ trees, which are checked and, where necessary, corrected in the current edition. A further 14K trees are automatically converted from a new version of the Polish Dependency Bank. The PDBUD trees are extended with enhanced edges that encode the shared dependents and shared governors of coordinated conjuncts, and with the semantic roles of some dependents. The evaluation experiments show that PDBUD is large enough to train a high-quality graph-based dependency parser for Polish.
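
As an illustration of what enhanced edges for shared dependents look like in practice, the sketch below reads the DEPS column (the ninth field) of a CoNLL-U fragment and lists tokens that have more than one enhanced governor. It is a minimal sketch under stated assumptions: the sentence is a made-up English example, not PDBUD data, and the names `SAMPLE`, `parse_deps`, and `shared_dependents` are hypothetical helpers introduced here, not part of any PDBUD tooling.

```python
# Hypothetical CoNLL-U fragment (NOT taken from PDBUD): "Mary reads and writes ."
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# "Mary" is a dependent shared by both coordinated verbs, so its DEPS
# field lists two enhanced governors.
SAMPLE = """\
1\tMary\tMary\tPROPN\t_\t_\t2\tnsubj\t2:nsubj|4:nsubj\t_
2\treads\tread\tVERB\t_\t_\t0\troot\t0:root\t_
3\tand\tand\tCCONJ\t_\t_\t4\tcc\t4:cc\t_
4\twrites\twrite\tVERB\t_\t_\t2\tconj\t2:conj\t_
5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t2:punct\t_
"""


def parse_deps(field):
    """Split a DEPS value such as '2:nsubj|4:nsubj' into (head, relation) pairs.

    Heads are kept as strings because enhanced graphs may refer to
    empty nodes with decimal IDs such as '8.1'.
    """
    if field == "_":
        return []
    pairs = []
    for item in field.split("|"):
        # Split on the first colon only: relations may contain colons.
        head, rel = item.split(":", 1)
        pairs.append((head, rel))
    return pairs


def shared_dependents(conllu):
    """Map each token form with several enhanced governors to its DEPS pairs."""
    result = {}
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        deps = parse_deps(cols[8])
        if len(deps) > 1:
            result[cols[1]] = deps
    return result


print(shared_dependents(SAMPLE))
```

Here only `Mary` is reported, because it is the one token attached to both conjuncts. Keying the result by word form is a simplification that works for a single short sentence; a real treebank reader would key by sentence and token ID.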
