Polina Solonets and Maxim Kupreyev, members of the project team, participated in the poster presentation as a part of the DARIAH Annual Event 2023 taking place on June 6th to June 9th in Budapest. This year the conference topic was ‘Cultural Heritage Data as Humanities Research Data?’. Polina and Maxim introduced their approach to sustainable workflow organisation when working on a large scale edition and presented a poster entitled: ‘Sustainable Practices for the Large-Scale TEI Editions at the School of Salamanca Text Collection’.
(Talk given at TEI 2022 conference in Newcastle University, https://zenodo.org/record/7101456)
The topic of this year’s TEI conference and members’ meeting — “text as data” — addressed a growing amount and diversity of textual data produced by the humanities projects. With the increase of data there is also an expanding need for its quality assurance. Several research data projects have already assigned specific teams to tackle the task of standardizing the continuous quality management. I refer, for example, to the task area “Standards, Data Quality and Curation” within the NFDI4Culture consortium, or the KONDA project at the Göttingen State and University Library. The XML data production is in fact a process of a continuous validation, correction, and improvement, involving, inter alia, ODD, RelaxNG, and XML Schemata; custom Python and R scripts; the XSLT, XQuery and Schematron routines integrated into a test-driven development frameworks such as XSpec.
In my talk I addressed a rather unconventional way of testing the TEI data, namely printing it. TEI production workflows frequently presuppose HTML and PDF export, the issue I focused on is the diagnostic value of such prints for the quality control.