Exploiting Cross-Dialectal Gold Syntax for Low-Resource Historical Languages: Towards a Generic Parser for Pre-Modern Slavic
- Author(s):
- Nilo Pedrazzini
- Date:
- 2020
- Item Type:
- Conference proceeding
- Conf. Title:
- CHR 2020: Workshop on Computational Humanities Research
- Conf. Loc.:
- Amsterdam, The Netherlands
- Conf. Date:
- November 18–20, 2020
- Tag(s):
- low-resource languages, dependency parsing, neural networks, Early Slavic
- Permanent URL:
- http://dx.doi.org/10.17613/5t12-hk29
- Abstract:
- This paper explores the possibility of improving the performance of specialized parsers for pre-modern Slavic by training them on data from different related varieties. Because of their linguistic heterogeneity, pre-modern Slavic varieties are treated as low-resource historical languages, whereby cross-dialectal treebank data may be exploited to overcome data scarcity and attempt the training of a variety-agnostic parser. Previous experiments on early Slavic dependency parsing are discussed, particularly with regard to their ability to tackle different orthographic, regional and stylistic features. A generic pre-modern Slavic parser and two specialized parsers – one for East Slavic and one for South Slavic – are trained using jPTDP [8], a neural network model for joint part-of-speech (POS) tagging and dependency parsing which had shown promising results on a number of Universal Dependencies (UD) treebanks, including Old Church Slavonic (OCS). With these experiments, a new state of the art is obtained for both OCS (83.79% unlabelled attachment score (UAS) and 78.43% labelled attachment score (LAS)) and Old East Slavic (OES) (85.7% UAS and 80.16% LAS).
- Published as:
- Conference proceeding
- Pub. DOI:
- http://ceur-ws.org/Vol-2723/short48.pdf
- Publisher:
- CEUR Workshop Proceedings
- Pub. Date:
- November 2020
- Proceeding:
- Proceedings of the Workshop on Computational Humanities Research (CHR 2020), CEUR Workshop Proceedings
- Page Range:
- 237–247
- Status:
- Published
- Last Updated:
- 2 years ago
- License:
- All Rights Reserved