The DiGreC Treebank

In 2017, the project “Investigating Variation and Change: Case in Diachrony”, funded by the Arts & Humanities Research Council, began as a collaboration between researchers at Ulster University and the University of Crete. Details of the project’s history and aims can be found on its website.

The DiGreC (Diachrony of Greek Case) treebank contains a parsed and annotated version of the data used in this research to investigate changes in the syntax of Greek. The data were selected using a ‘verb-sensitive’ approach, which aimed to identify passages containing the verbs under study and to provide illustrative examples of the range of constructions in which they occur. The corpus is therefore intended for the study of phenomena that can be described in binary terms, such as grammaticality: the presence of a construction in the treebank is evidence that it was attested in natural language. Although the corpus was developed as part of a specific project, we hope that its morphosyntactically and semantically annotated data, drawn from throughout the history of Greek from Homer to early modern authors, will be useful for research on a wide range of topics. The DiGreC resource is still evolving, and data and features will continue to be added and updated on an ongoing basis.

To cite the corpus, please use the following article, which describes it in full:

Macleod, Morgan, Elena Anagnostopoulou, Dionysios Mertyris, and Christina Sevdali. 2021. “The DiGreC Treebank”. Research Data Journal for the Humanities and Social Sciences 6.1: 1-12. https://doi.org/10.1163/24523666-06010004.

The source code for the corpus interface is available at https://github.com/mdm33/digrec. The repository also contains the raw corpus data in XML and CSV formats, together with a list, described in the article, of the lemmata for which corpus coverage is most nearly exhaustive. The permanent link for the corpus dataset is https://doi.org/10.21251/59fd3210-83fe-4d1c-8d18-f2cd1168ccd6.

Update: On 28/03/2022 the DiGreC treebank was updated, providing

Data from the previous version of the corpus are still available here.

