(Talk given at the TEI 2022 conference at Newcastle University, https://zenodo.org/record/7101456)
The topic of this year’s TEI conference and members’ meeting, “text as data”, addressed the growing amount and diversity of textual data produced by humanities projects. As the data grows, so does the need for its quality assurance. Several research data projects have already assigned dedicated teams to the task of standardizing continuous quality management. I refer, for example, to the task area “Standards, Data Quality and Curation” within the NFDI4Culture consortium, or the KONDA project at the Göttingen State and University Library. XML data production is in fact a process of continuous validation, correction, and improvement, involving, inter alia, ODD, RELAX NG, and XML Schemata; custom Python and R scripts; and XSLT, XQuery, and Schematron routines integrated into test-driven development frameworks such as XSpec.
In my talk I addressed a rather unconventional way of testing TEI data, namely printing it. TEI production workflows frequently presuppose HTML and PDF export; the issue I focused on is the diagnostic value of such prints for quality control.