The School of Salamanca: Blog (https://blog.salamanca.school/en)

We Need Support and Are Looking for a Student Assistant
https://blog.salamanca.school/en/2024/02/29/wir-brauchen-unterstutzung-und-suchen-eine-studentische-hilfskraft/
Thu, 29 Feb 2024

Our project team in Frankfurt am Main is looking for a student assistant, starting immediately. Above all, we need support with the quality control of the scans of the early modern prints, so that the transcription teams can work with a complete and clean data basis. The position also involves scanning and OCR of modern prints, searching for and obtaining international research literature and, if you are interested, contributing to the XML encoding of our source and dictionary texts.

We offer a good, productive working atmosphere in an interdisciplinary and international team, as well as support in pursuing your own academic interests if they fall within the thematic scope of the project or within the Digital Humanities.

All the details can be found here:

The School of Salamanca Takes Part in the Kick-off Meeting of the New Max Planck Partner Group in Trento
https://blog.salamanca.school/en/2023/12/14/the-school-of-salamanca-takes-part-in-the-kick-off-meeting-of-the-new-max-planck-partner-group-in-trento/
Thu, 14 Dec 2023

On November 28, 2023, the Max Planck Partner Group ‘The Production of Knowledge of Normativity and the Early Modern Book Trade’, led by Dr. Manuela Bragagnolo (Max Planck Institute for Legal History and Legal Theory, University of Trento), held its kick-off meeting. The meeting took place at the School of International Studies of the University of Trento and was organised in cooperation with the permanent seminar ‘Legal History Meets Digital Humanities’, hosted by the Department ‘Historical Regimes of Normativities’ (mpilhlt), as part of the joint workshop ‘From the Age of the Printing Press to the Digital Age: How Knowledge of Normativity is Produced in Books’.

Project team members Christiane Birr and Polina Solonets presented their joint digitization work in the Salamanca project to the partner group members in a talk entitled ‘From Paper to Screen. Luis de Molina’s “De Justitia et Jure” between Printing Press and Digital Edition’.

The kick-off meeting was followed on 29 November by an away session of the seminar ‘Legal History Meets Digital Humanities’, at which Natalia Maillard Álvarez (Pablo de Olavide University, Seville) gave a very interesting, practice-oriented presentation on ‘Social Network Analysis for Historians: Origins, Uses and Challenges’.

Digital Longevity: Learnings from the (Digital) History Project Stadt.Geschichte.Basel
https://blog.salamanca.school/en/2023/12/07/digital-longevity-learnings-from-the-digital-history-project-stadt-geschichte-basel/
Thu, 07 Dec 2023

Event Details:
Date: 12 December 2023
Time: 15:00–17:00 (CET)
Speaker: Dr. Moritz Mähr (University of Basel)
Location: mpilhlt or online
Room: Z01
Host: Polina Solonets
Contact: dhseminar@lhlt.mpg.de

In this session of the seminar series ‘Legal History Meets Digital Humanities’ we will discuss the topic of digital longevity. Our guest speaker Dr. Moritz Mähr will present the experiences of the project Stadt.Geschichte.Basel in applying sustainability principles to the research practice of a digital history project. The talk will highlight strategies for achieving long-term preservation and openness, as well as the challenges involved.
More information about the event and registration here.

Social Network Analysis for Historians: Origins, Uses and Challenges
https://blog.salamanca.school/en/2023/11/21/social-network-analysis-for-historians-origins-uses-and-challenges/
Tue, 21 Nov 2023

Event Details:
Date: 29 November 2023
Time: 15:00–17:00 (CET)
Speaker: Natalia Maillard Álvarez (Pablo de Olavide University)
Location: University of Trento (Palazzo Paolo Prodi) and online
Room: aula Targetti
Host: Polina Solonets, Manuela Bragagnolo
Contact: dhseminar@lhlt.mpg.de

In this away session of the seminar series ‘Legal History meets Digital Humanities’, which will take place at the University of Trento, we will discuss the use of social network analysis for historical research with Dr. Natalia Maillard Álvarez, lecturer in Early Modern History at the Department of Geography, History and Philosophy of Pablo de Olavide University (Seville, Spain). In her research, Maillard Álvarez focuses on the book trade and book circulation in the Early Modern period.

The away session of the seminar is organised as part of the international workshop ‘From the Age of the Printing Press to the Digital Age: How Knowledge of Normativity is Produced in Books’ (28–29 November 2023), a joint event of the Max Planck Partner Group ‘The Production of Knowledge of Normativity and the Early Modern Book Trade’ and the permanent seminar ‘Legal History meets Digital Humanities’.

The seminar will take place in a hybrid format. Registration: https://www.lhlt.mpg.de/events/36094/2077701.

The School of Salamanca presents at DARIAH Annual Event 2023 in Budapest
https://blog.salamanca.school/en/2023/07/13/the-school-of-salamanca-presents-at-dariah-annual-event-2023-in-budapest/
Thu, 13 Jul 2023

Polina Solonets and Maxim Kupreyev, members of the project team, took part in the poster session of the DARIAH Annual Event 2023, held in Budapest from 6 to 9 June. This year’s conference theme was ‘Cultural Heritage Data as Humanities Research Data?’. Polina and Maxim introduced their approach to organising a sustainable workflow for a large-scale edition and presented a poster entitled ‘Sustainable Practices for the Large-Scale TEI Editions at the School of Salamanca Text Collection’.

The project “The School of Salamanca” is creating a freely accessible online collection of texts produced in the intellectual centre of the Spanish monarchy during the 16th and 17th centuries. Currently, 33 of the 116 works have completed the production cycle, which includes TEI XML encoding, HTML export for online access and full-text search, IIIF Presentation API manifests, and PDF and RDF export. The development of a sustainable workflow for the project has been shaped by the sheer size of our textual collection and its particular features. In preparing editions of Early Modern Latin and Spanish texts, it is crucial to take into account their inherent instability, i.e. heterogeneous structures and variations in orthography, typography and punctuation. Our editorial principles were therefore shaped by the need to trace and reproduce the development steps at any given moment; to reuse tools independently of context and individual texts; to scale complex processing tasks; to perform constant data quality checks; and to document both the requirements and the results.

Salamanca’s workflow shares common ground with both Waterfall and Agile development techniques. The concept of the pipeline, inherent to Waterfall, is at the centre of Salamanca’s editorial technique: the production of the edition consists of a number of steps executed sequentially, where the output of each stage serves as the input for the next one.

Digitization → Transcription in TEI Tite → Structural annotation → TEI transformation → Manual and automatic corrections → HTML, PDF, IIIF, and RDF generation.
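The pipeline principle can be sketched in a few lines of Python. This is an illustration only, not the project's code: the stage functions below are hypothetical placeholders that merely record what each step would contribute. Each stage receives the output of the previous one, and every intermediate state is kept, which is one simple way to make each step traceable and restorable.

```python
from typing import Callable

# A "work" is represented here as a plain dict. Each (hypothetical) stage returns
# a new dict with its own contribution added and leaves its input untouched.
Stage = Callable[[dict], dict]


def digitize(work: dict) -> dict:
    return {**work, "facsimiles": f"images/{work['id']}/"}


def transcribe(work: dict) -> dict:
    return {**work, "tite": f"{work['id']}_tite.xml"}


def annotate(work: dict) -> dict:
    return {**work, "annotated": True}


def transform_to_tei(work: dict) -> dict:
    return {**work, "tei": f"{work['id']}.xml"}


def correct(work: dict) -> dict:
    return {**work, "corrected": True}


def generate_outputs(work: dict) -> dict:
    return {**work, "outputs": ["html", "pdf", "iiif", "rdf"]}


PIPELINE: list[Stage] = [digitize, transcribe, annotate,
                         transform_to_tei, correct, generate_outputs]


def run_pipeline(work: dict, stages: list[Stage]) -> list[dict]:
    """Run the stages in order, feeding each one the previous output.

    Returns every intermediate state, so any step can be inspected or restored.
    """
    history = [work]
    for stage in stages:
        history.append(stage(history[-1]))
    return history


history = run_pipeline({"id": "W0015"}, PIPELINE)
print(history[-1]["outputs"])  # ['html', 'pdf', 'iiif', 'rdf']
```

Because no stage mutates its input, history[3], for instance, still shows the work as it was after structural annotation and before the TEI transformation.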

The advantage of such a predefined sequence is its reproducibility: each part of the editorial process can be restored at any time. It makes individual work steps traceable and enables comprehensive documentation in the form of the program code and editorial guidelines.

In Agile, the software product is built in small chunks, and each development cycle includes feature clarification, design, coding, testing and deployment. For this purpose, Agile integrates the software development, testing and operations teams in a single collaborative, iterative process. In Salamanca’s adoption of Agile practices, each of the above-mentioned production stages includes the definition of requirements, development and quality assurance. This also means that each stage of the production of the digital edition delivers a part of the overall product features, which can be accessed and disseminated.

For example, the QA routines after step 1 – digitization of the print originals – allow us to publish the IIIF presentation manifests even before the TEI transcription starts. IIIF manifests are later enriched with additional data, pertaining to chapters and pagination. The same applies to PDF generation – it was initially intended to be one of the export methods, located at the end of the workflow. Yet, when implemented, it exposed a number of semantic and structural inconsistencies of the source XML. We therefore decided to use PDF earlier in the pipeline as a diagnostic tool and data quality service. In our talk we will show how the sustainable editorial workflow, adapted for processing of large-scale textual sources, translates into the delivery of high-quality data.
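Because manifests only need page images and their dimensions, they can be produced before any transcription exists. The following sketch builds a minimal manifest in Python. It assumes version 3.0 of the IIIF Presentation API (the post does not say which version the project uses), and all URLs are hypothetical. Chapter and pagination data of the kind mentioned above would be added later, for example as ranges in the manifest's structures; the sketch leaves that out.

```python
import json


def build_manifest(base_url: str, label: str, pages: list[dict]) -> dict:
    """Return a minimal IIIF Presentation 3.0 manifest with one canvas per page.

    Each page is a dict with the (hypothetical) keys "image", "width" and "height".
    """
    canvases = []
    for n, page in enumerate(pages, start=1):
        canvas_id = f"{base_url}/canvas/p{n}"
        image = {
            "id": page["image"],
            "type": "Image",
            "format": "image/jpeg",
            "width": page["width"],
            "height": page["height"],
        }
        # A "painting" annotation places the page image on its canvas
        annotation = {
            "id": f"{canvas_id}/annotation",
            "type": "Annotation",
            "motivation": "painting",
            "body": image,
            "target": canvas_id,
        }
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "label": {"none": [str(n)]},
            "width": page["width"],
            "height": page["height"],
            "items": [{"id": f"{canvas_id}/page", "type": "AnnotationPage",
                       "items": [annotation]}],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base_url}/manifest",
        "type": "Manifest",
        "label": {"none": [label]},
        "items": canvases,
    }


pages = [
    {"image": "https://example.org/images/W0015/0001.jpg", "width": 2000, "height": 3000},
    {"image": "https://example.org/images/W0015/0002.jpg", "width": 2000, "height": 3000},
]
manifest = build_manifest("https://example.org/iiif/W0015", "Vitoria, Confessionario", pages)
print(json.dumps(manifest, indent=2))
```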

Poster DARIAH_2023

The ‘Assertive Edition’. Seminar with Georg Vogeler, 25.7.2023
https://blog.salamanca.school/en/2023/06/27/the-assertive-edition-seminar-with-georg-vogeler-25-7-2023/
Tue, 27 Jun 2023

In recent months, the Salamanca team has joined forces with colleagues from the Max Planck Institute for Legal History and Legal Theory in Frankfurt am Main to set up the permanent seminar ‘Legal History Meets Digital Humanities’. As our own work centers on creating digital editions of the Salamancan authors, we are especially happy that Georg Vogeler from the Centre for Digital Humanities at the University of Graz (Austria) has accepted our invitation and will be our guest on July 25, 2023, 15:00–17:00.

Academic disciplines such as philosophy, theology, and jurisprudence tend to regard the mediality of texts as a matter of secondary importance, because they understand texts primarily as a means of discussing concepts and the relations between them, using established terminologies in the debate. For these purposes, philological editing methods appear to be relevant only when there is “substantial” variance, that is, textual variance that produces different concepts or changes their relationships.
Historians go even further when they want to critically compare the facts reported in the texts. In this case, linguistic variance becomes even less significant. Therefore, Vogeler would like to discuss with the participants of the seminar: a) whether it is also possible to investigate the factual referents behind the linguistic expression in legal history and b) whether the methods he has proposed to capture the level of meaning in texts seem feasible in editing practice.

Georg Vogeler is a historian with an interest in the Late Middle Ages, particularly medieval administrative documents and diplomatics. His research encompasses Digital Scholarly Editing, Semantic Web technologies, Data Modelling, and application of Data Science to the Humanities.

The event is organised in a hybrid mode. Please register here: https://www.eventbrite.de/e/the-assertive-edition-hybrid-event-tickets-667358044877?aff=oddtdtcreator

WE ARE HIRING: Researcher in Political Philosophy, Moral Theology or Similar
https://blog.salamanca.school/en/2023/05/17/we-are-hiring-researcher-in-political-philosophy-moral-theology-or-similar/
Wed, 17 May 2023

The Salamanca Team is looking for a new researcher to work with us on the forthcoming Dictionary of the Juridical-Political Language of the School of Salamanca. If your expertise is in political philosophy, history of political ideas, philosophy of law, moral theology or similar of the early modern period, we would love to hear from you!

All the details of the position and the application are here:

https://www.uni-frankfurt.de/48794849/FB08___Philosophie_und_Geschichtswissenschaften

(text in German and English, scroll down for the English version).

If you have any questions concerning the project, the team, the position or the application process, please don’t hesitate to drop a line to Christiane Birr, the project’s coordinator: birr@lhlt.mpg.de. Looking forward to hearing from you!

Word frequencies in the Digital Collection of Sources in the Works of the School of Salamanca.
https://blog.salamanca.school/en/2023/01/26/word-frequency/
Thu, 26 Jan 2023

The fundamental importance of the School of Salamanca for the early modern discourse about law, politics, religion, and ethics is widely recognized among philosophers and legal historians. The texts in our collection extend beyond the works of the core authors; they serve to analyze the history of the School’s origins and influence, as well as its internal discursive contexts, as part of the work on the future dictionary entries.
With the dictionary in mind, the idea of testing and exploring our corpus with modern NLP applications came up in the project a long time ago. We often asked ourselves which lemmas or information we could find with the help of text analysis and, above all, how complex this would be to realize with our data. Thus, in 2021 we started a natural language processing task (word frequency distribution) using the Python programming language to explore our corpus and lay the groundwork for further text mining.

Corpus description

The digital collection of the School of Salamanca consists of 116 works in early modern Latin and Spanish, available in 16th- and 17th-century prints.
At the time, 79 works of the corpus were in the process of digitization and text capture; 13 works had been partly corrected automatically and were awaiting manual scholarly correction; and 24 works had been published online as full text. These 24 works had gone through manual scholarly correction and editing: erroneous transcriptions and original printing errors were corrected and, above all, brevigraphs and word separations that had not been corrected automatically were resolved. Although this last group makes up only about one-fifth of the whole corpus, its content (e.g., author selection, topics, and structures) served as a solid data representation for our analysis.
As a result, we compiled 24 works (14 in Latin and 10 in Spanish) as our input dataset for the NLP task.

The School of Salamanca, Corpus 2021.

We also classified the works by discipline and, for technical reasons, by language, since the tools are language-dependent. This process yielded the following three groups: Civil Law, Canon Law and Theology.

List with the thematic and technical classification of works. See List of Works with links to the full text version.

These published texts are available in two reading views: a diplomatic form (close to the original source text) and a constituted (normalized) form, which also presents the expansions made by the scholarly editors (see Vitoria, Confessionario, in both diplomatic and constituted view). We chose the constituted view because it simplifies text processing with current libraries. Lastly, we collected the selected texts in plain text (TXT) format directly from our website.

NLP applications

Steps of our NLP analysis

As mentioned above, our dataset comprised 14 Latin and 10 Spanish works, which were processed with two different Python libraries: CLTK (Classical Language Toolkit) for Latin and spaCy for Spanish. CLTK does not yet provide a module for processing early modern Spanish, which is why spaCy was selected as the best alternative for this part of the analysis. Even though spaCy is designed for modern languages, its modules have proven very helpful for diverse NLP (Natural Language Processing) tasks.

Three important steps compose our analysis approach: normalization, lemmatization, and conditional frequency distribution (see image Steps of our NLP analysis).
In normalization, the dataset is cleaned of stop words and stripped of unnecessary characters (e.g., “\n”, empty spaces, or irrelevant numbers). This process depends on the text structure and the language. Accordingly, the cleaning process was adapted separately for our Latin and our Spanish collections. For instance, we created our own stop word lists for both Latin and Spanish, taking into account the specific phenomena of our texts: numerals, one-letter references, paragraph marks (pilcrow ¶), etc.

In the Latin dataset, it was necessary to replace the letters J/j with I/i and V/v with U/u and to remove accents and macrons, which improves the reduction of words to their base forms during lemmatization. Filtering words was likewise an important normalization step in both datasets. Ruling out words that consist of just one letter enabled the program to focus on meaningful lexemes of two or more letters and to ignore irrelevant words and characters. The following function describes the process.

Normalize function for Latin in Python
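As an illustration of the steps just described, here is a minimal, dependency-free sketch of a Latin normalization function. It is not a reproduction of the project's function: the stop-word list is a small hypothetical sample, and the J/V replacement that CLTK's JVReplacer performs is done here with str.translate.

```python
import re
import unicodedata

# Hypothetical sample; the project compiled its own, much longer stop-word lists.
LATIN_STOPWORDS = {"et", "in", "est", "non", "ad", "quod", "cum", "ut", "de"}

# j -> i and v -> u, plus the ligatures æ and œ, which have no Unicode
# decomposition and would otherwise be split apart by the regex below.
LATIN_TABLE = str.maketrans({"j": "i", "v": "u", "æ": "ae", "œ": "oe"})


def strip_diacritics(text: str) -> str:
    """Drop accents and macrons; NFKD also maps compatibility forms such as long s (ſ) to s."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_latin(text: str, stopwords: set[str] = LATIN_STOPWORDS) -> list[str]:
    text = strip_diacritics(text).lower().translate(LATIN_TABLE)
    # Keep alphabetic runs only: this removes "\n", extra spaces, digits,
    # punctuation and paragraph marks (¶)
    tokens = re.findall(r"[a-z]+", text)
    # Discard one-letter words and stop words
    return [t for t in tokens if len(t) > 1 and t not in stopwords]


print(normalize_latin("Jūris & justitiæ ¶ 12 a vsura et lex"))
# ['iuris', 'iustitiae', 'usura', 'lex']
```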

After normalizing both datasets, the second step was lemmatization, in which each lexeme is reduced to its base form (e.g., obligado → obligar, iuris → ius). As mentioned above, two different lemmatizers were used for this step: CLTK's LatinBackoffLemmatizer for the Latin texts and the spaCy lemmatizer for the Spanish texts, which we adapted to handle some word variations of early modern Spanish.

Lastly, we reached the most essential step of our approach: the quantification phase, namely the conditional frequency distribution. After extracting the lemma of each token, we counted the occurrences of each lemma in the texts; for example, the lemma ius appears 22,217 times across the whole Latin dataset. We focused on the 100 most frequent lemmas, taking frequency as an indicator of a lemma's importance within the collection.
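The counting step can be sketched with Python's standard library. The sketch assumes the lemmas have already been produced by the lemmatizers; each condition is a (discipline, language) pair that maps to a Counter of lemma frequencies, and NLTK's ConditionalFreqDist implements the same idea. The function names and the sample data are invented for illustration.

```python
from collections import Counter, defaultdict


def conditional_freq(docs):
    """Build a conditional frequency distribution.

    docs: iterable of (condition, lemmas) pairs, where condition is e.g. a
    (discipline, language) tuple and lemmas is a list of lemmatized tokens.
    Returns a mapping from each condition to a Counter of lemma frequencies.
    """
    cfd = defaultdict(Counter)
    for condition, lemmas in docs:
        cfd[condition].update(lemmas)
    return cfd


def per_language(cfd, language):
    """Merge all disciplines of one language, e.g. for a Top-100 list per language."""
    total = Counter()
    for (_, lang), counts in cfd.items():
        if lang == language:
            total.update(counts)
    return total


# Invented sample data, for illustration only
docs = [
    (("Theology", "la"), ["ius", "lex", "ius"]),
    (("Canon Law", "la"), ["ius", "canon"]),
    (("Civil Law", "es"), ["obligar", "ley"]),
]
cfd = conditional_freq(docs)
print(cfd[("Theology", "la")].most_common(100))  # [('ius', 2), ('lex', 1)]
print(per_language(cfd, "la").most_common(1))    # [('ius', 3)]
```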

We stored the results in an Excel spreadsheet, a well-known format that allows our interdisciplinary team to work with them in case we want to conduct further analysis. Technical details of the Python code, as well as the results, are available on GitHub.

Results

We obtained six lists (one per discipline and language), each containing the 100 most frequent lemmas. Each list has an ID column indicating the lemma's rank, the lemma itself, its frequency (the number of occurrences of the lemma in the texts), and a lexeme column. For better readability here, the lexemes were excluded, and only the first five lemmas and their frequencies are shown. See the full results on GitHub. In addition, we produced two more lists, a Top-100 list per language, to give a general view of the corpus. Last but not least, we used word clouds to visualize these word frequencies.
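A ranked list with the columns described above can be written with the standard csv module. The project stored its results in Excel; this sketch writes CSV instead, which spreadsheet programs open directly (writing .xlsx would require a third-party library). The function name write_top_lemmas and the lemma-to-lexeme mapping it expects are hypothetical.

```python
import csv
import io
from collections import Counter


def write_top_lemmas(counts, lexemes, out, n=100):
    """Write the n most frequent lemmas as rows of ID (rank), lemma, frequency, lexemes.

    counts:  Counter mapping lemma -> frequency
    lexemes: dict mapping lemma -> set of word forms reduced to that lemma
    out:     any writable text file-like object
    """
    writer = csv.writer(out)
    writer.writerow(["ID", "lemma", "frequency", "lexemes"])
    for rank, (lemma, freq) in enumerate(counts.most_common(n), start=1):
        writer.writerow([rank, lemma, freq, ", ".join(sorted(lexemes.get(lemma, ())))])


# Invented sample data. To write a real file, pass
# open("top100_la.csv", "w", newline="", encoding="utf-8") instead of the buffer.
buf = io.StringIO()
write_top_lemmas(Counter({"ius": 3, "lex": 1}), {"ius": {"ius", "iuris", "iure"}}, buf)
print(buf.getvalue())
```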

Challenges and Outlook

Working with early modern works brought up several challenges. Regarding lemmatization in Spanish, since spaCy’s model is based on modern Spanish, it was necessary to group lexemes with spelling variations under a single lemma (iglesia/yglesia → iglesia), so that the program calculated the frequencies correctly. Similarly, in Latin, the JVReplacer normalized certain lemmas, since the Latin alphabet does not distinguish between J/j and I/i or between V/v and U/u (usura/vsura). Fortunately, this tool was already available in CLTK.
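The grouping of Spanish spelling variants can be expressed as a lookup applied after lemmatization and before counting. The variant table below is a hypothetical one-entry sample, and the function names are illustrative; with spaCy, the input lemmas would come from each token's lemma_ attribute.

```python
from collections import Counter

# Hypothetical variant table: early modern spelling -> lemma expected by a modern model.
# A real table would be compiled from the corpus; this one has a single sample entry.
SPELLING_VARIANTS = {"yglesia": "iglesia"}


def group_variants(lemmas, variants=SPELLING_VARIANTS):
    """Map spelling variants to one lemma so that their occurrences are counted together."""
    return [variants.get(lemma, lemma) for lemma in lemmas]


def count_lemmas(lemmas, variants=SPELLING_VARIANTS):
    return Counter(group_variants(lemmas, variants))


# With spaCy, the input would be e.g. [token.lemma_ for token in nlp(text)]
print(count_lemmas(["yglesia", "iglesia", "obligar"]))
# Counter({'iglesia': 2, 'obligar': 1})
```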

Another challenge is that our corpus contains mixed-language texts: in works whose main language is Latin, there are quotations or marginal notes in Spanish, and vice versa. In the selected corpus this did not make a great difference, since these passages were short. Nonetheless, the problem remains to be solved, as we will edit texts containing a significant mixture of both languages (e.g., long glosses written in Latin on a main text written in Spanish; see Alfonso de León & Gregorio López: Las Siete Partidas (multi-volume work)).

Another factor that may influence the overall quantification is the difference in size between the texts of our heterogeneous corpus. Two of our works illustrate this: Vitoria’s Confessionario is 47 pages long, whereas Solórzano Pereira’s Politica Indiana has 1,190 pages.

A further challenge may arise as our digital collection expands. The more works are added to our data collection (which is highly expected), the longer it will take to process them. That is why we are currently developing and updating our NLP methodology to process larger amounts of text faster and more efficiently, so as not to slow down the code's run time.

In the future, we would like to conduct further analyses with diverse NLP applications, such as NER (Named Entity Recognition) or POS (Part-of-Speech) tagging, or to apply our methodology using dedicated text analysis software.

List of Works

  1. Vitoria, Confessionario (2018 [1562]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0015>
  2. Castillo, Tratado de Cuentas (2018 [1522]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0004>
  3. Vitoria, Relectiones Theologicae XII (2018 [1557]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0013>
  4. Las Casas, Treinta Proposiciones (2018 [1552]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0034>
  5. Vitoria, Summa Sacramentorum (2018 [1561]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0014>
  6. Azpilcueta, Manual de Confessores y Penitentes (2019 [1556]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0002>
  7. Mercado, Tratos y Contratos (2019 [1569]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0007>
  8. Solórzano Pereira, Politica Indiana (2019 [1648]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0010>
  9. Báñez, De Iure et Iustitia Decisiones (2019 [1594]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0003>
  10. Cano, Relectio de Poenitentia (2019 [1558]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0030>
  11. Avendaño, Thesaurus Indicus (2019 [1668]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0001>
  12. Sepúlveda, Apologia pro libro de iustis belli causis (2020 [1550]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0095>
  13. Nebrija, Lexicon Iuris Civilis (2020 [1537]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0078>
  14. Soto, De Iustitia et Iure (2020 [1553]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0011>
  15. Villalón, Provechoso tratado de cambios y contrataciones de mercaderes y reprovación de usura (2020 [1541]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0113>
  16. Vacca, Expositiones locorum obscuriorum et Paratitulorum in Pandectas (2020 [1554]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0103>
  17. Albornoz, Arte de los contractos (2020 [1573]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0017>
  18. Carrasco del Saz, Tractatus de casibus curiae (2020 [1630]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0033>
  19. Freitas, De Iusto Imperio Lusitanorum Asiatico (2020 [1625]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0046>
  20. Covarrubias y Leyva, Opera Omnia (2021 [1571]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0006>
  21. León Pinelo, Confirmaciones Reales de Encomiendas (2021 [1630]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0061>
  22. Solórzano Pereira, De Indiarum Iure ( [1629]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0096>
  23. Pedraza, Summa de casos de consciencia (2021 [1568]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0083>
  24. Díaz de Luco, Practica criminalis canonica (2021 [1554]), in: The School of Salamanca. A Digital Collection of Sources <https://id.salamanca.school/texts/W0041>

Back to analog: the added value of printing TEI editions
https://blog.salamanca.school/en/2022/12/06/back-to-analog-the-added-value-of-printing-tei-editions/
Tue, 06 Dec 2022

(Talk given at the TEI 2022 conference at Newcastle University, https://zenodo.org/record/7101456)

The topic of this year’s TEI conference and members’ meeting, “text as data”, addressed the growing amount and diversity of textual data produced by humanities projects. As the volume of data increases, so does the need for quality assurance. Several research data projects have already assigned dedicated teams to the task of standardizing continuous quality management; I refer, for example, to the task area “Standards, Data Quality and Curation” within the NFDI4Culture consortium, or to the KONDA project at the Göttingen State and University Library. XML data production is in fact a process of continuous validation, correction, and improvement, involving, inter alia, ODD, RelaxNG, and XML Schema; custom Python and R scripts; and XSLT, XQuery, and Schematron routines integrated into test-driven development frameworks such as XSpec.
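As an example of what a small custom Python check might look like, the following sketch parses a TEI document with the standard library and reports choice elements in which an abbr has no matching expan, or vice versa. The rule is a hypothetical illustration in the spirit of a Schematron assertion, not one of the project's actual tests.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def check_choices(xml_text: str) -> list[str]:
    """Report <choice> elements whose <abbr> lacks an <expan>, or vice versa.

    Raises xml.etree.ElementTree.ParseError if the document is not well-formed.
    Other pairings inside <choice> (e.g. <sic>/<corr>) are ignored.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for i, choice in enumerate(root.iter(f"{TEI}choice"), start=1):
        has_abbr = choice.find(f"{TEI}abbr") is not None
        has_expan = choice.find(f"{TEI}expan") is not None
        if has_abbr and not has_expan:
            problems.append(f"choice #{i}: <abbr> without <expan>")
        elif has_expan and not has_abbr:
            problems.append(f"choice #{i}: <expan> without <abbr>")
    return problems


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
<choice><abbr>q̃</abbr><expan>que</expan></choice>
<choice><abbr>dñs</abbr></choice>
<choice><sic>iglesa</sic><corr>iglesia</corr></choice>
</p></body></text></TEI>"""
print(check_choices(sample))  # ['choice #2: <abbr> without <expan>']
```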

In my talk I addressed a rather unconventional way of testing TEI data, namely printing it. While TEI production workflows frequently include HTML and PDF export, I focused on the diagnostic value of such prints for quality control.

The School of Salamanca project, its production workflow and printing practice

The project “The School of Salamanca. A Digital Collection of Sources and a Dictionary of its Juridical-Political Language” is jointly sponsored by the Academy of Sciences and Literature Mainz, the Max Planck Institute for Legal History and Legal Theory and Goethe University Frankfurt am Main. It aims to create an online collection of important texts produced by the philosophers, jurists and theologians related to the University of Salamanca, the intellectual center of the Spanish monarchy during the 16th and 17th centuries.

The edition will contain 116 works, comprising more than 108,000 printed pages of Early Modern Latin and Spanish texts encoded in TEI XML. In addition, we are compiling a historical dictionary of approximately 300 essential terms, which reflect the importance of the School of Salamanca for the early modern discourse about law, politics, religion, and ethics.

Currently, 36 works have completed the production cycle, which includes HTML export for online access and full-text search, IIIF Image and Presentation APIs, and RDF and TXT export. Recently, a PDF output option was also added, which had a direct impact on our workflow and data quality control. It is now used early in TEI production as a diagnostic tool that exposes semantic and structural inconsistencies in the data.

Production workflow

Salamanca’s production workflow consists of a number of steps familiar to anyone creating Scholarly Digital Editions (SDEs). They were described in detail in a recent blog post; here is a summary:

  1. The first step is the digitization of the print originals held by libraries worldwide. Already at this stage, the XML header with bibliographic metadata is compiled for each work, and the facsimiles are published on the IIIF server.
  2. This is followed by transcription by external providers in the TEI Tite format, which has a reduced, compact vocabulary compared to the generic TEI All schema.
  3. Upon arrival, the TEI Tite files undergo manual structural annotation, cross-referencing, and resolution of unclear marks.
  4. Only then is the text automatically transformed into project-specific TEI. The resulting document contains both the metadata in <teiHeader> and all elements and attributes adapted to TEI All.
  5. The TEI transformation is followed by a sequence of automatic correction and enrichment routines driven by XSLT templates. They handle the annotation of hyphenated words, the resolution of abbreviations and special characters, and xml:id tagging.
  6. The texts then undergo manual correction and post-correction, in which the remaining transcription and typographical errors are resolved.
  7. The final step, the production and delivery of the derived data formats, takes place in eXist-db. These formats are HTML pages, IIIF Presentation manifests, the search index and crumb trails, plain text and RDF data, and PDF files.
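To make the xml:id tagging of step 5 more concrete, here is a minimal Python sketch. It is not the project’s XSLT routine: the id scheme (work id, element name, running number), the selection of elements, and the work id “W0001” are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def tag_ids(root, work_id, tags=("p", "note", "milestone")):
    """Add a hypothetical xml:id "<work_id>-<tag>-<n>" to selected TEI elements.

    n is the element's position among elements of the same name in document
    order. Elements that already carry an xml:id keep it, but still count
    toward the running number.
    """
    counters = {}
    for el in root.iter():
        if not (isinstance(el.tag, str) and el.tag.startswith("{%s}" % TEI)):
            continue
        local = el.tag.split("}", 1)[1]
        if local not in tags:
            continue
        counters[local] = counters.get(local, 0) + 1
        if XML_ID not in el.attrib:
            el.set(XML_ID, f"{work_id}-{local}-{counters[local]}")
    return root


doc = ET.fromstring(
    f'<TEI xmlns="{TEI}"><text><body>'
    '<p>Prima</p><p xml:id="keep">Secunda</p><p>Tertia</p>'
    "</body></text></TEI>"
)
tag_ids(doc, "W0001")
print([p.get(XML_ID) for p in doc.iter(f"{{{TEI}}}p")])
# ['W0001-p-1', 'keep', 'W0001-p-3']
```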

XSL Formatting Objects

Our printing process works through XSL Formatting Objects (FO) technology. Although the latest specification, XSL-FO 1.1, dates back to 2006, the format is still widely used. One reason may be the free Apache FO Processor (FOP), which is integrated into the Oxygen XML Editor, whereas most processors for its present-day alternative, CSS Paged Media, are proprietary. The latest release, Apache FOP 2.8, came out in November 2022, and its commercial counterparts that support XSL-FO, such as the RenderX XEP engine and Antenna House Formatter, are regularly updated. At Salamanca we made some preliminary tests with XSL-FO in 2018, and when production started in 2021, we adopted and upgraded the existing templates. The functionality is not yet fully implemented on Salamanca’s production server; the preliminary PDF output, along with the XSL template, can be found in our GitHub repository.
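For readers unfamiliar with the format, the following Python sketch shows what an XSL-FO document minimally consists of: a layout master set with an A4 page master, and a page sequence whose flow holds the text blocks. Our production routine is an XSLT stylesheet; here, purely as an illustrative assumption, each TEI <p> becomes one fo:block, and the master name “A4”, the 2cm margin, and the 6pt spacing are arbitrary choices.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)  # serialize as fo:root, fo:block, ...


def fo(tag, parent=None, **attrs):
    """Create an XSL-FO element; underscores in keyword names become hyphens."""
    attrs = {k.replace("_", "-"): v for k, v in attrs.items()}
    name = f"{{{FO}}}{tag}"
    if parent is None:
        return ET.Element(name, attrs)
    return ET.SubElement(parent, name, attrs)


def tei_to_fo(tei_root):
    """Map every TEI <p> to an <fo:block> on A4 pages (illustrative only)."""
    root = fo("root")
    masters = fo("layout-master-set", root)
    page = fo("simple-page-master", masters, master_name="A4",
              page_height="29.7cm", page_width="21cm", margin="2cm")
    fo("region-body", page)
    sequence = fo("page-sequence", root, master_reference="A4")
    flow = fo("flow", sequence, flow_name="xsl-region-body")
    for p in tei_root.iter(f"{{{TEI}}}p"):
        block = fo("block", flow, space_after="6pt")
        # collapse whitespace from the source encoding
        block.text = " ".join("".join(p.itertext()).split())
    return root


tei = ET.fromstring(
    f'<TEI xmlns="{TEI}"><text><body>'
    "<p>Quaestio prima.</p><p>Quaestio secunda.</p>"
    "</body></text></TEI>"
)
fo_doc = tei_to_fo(tei)
fo_xml = ET.tostring(fo_doc, encoding="unicode")
print(fo_xml)
```

Saved as a file, for example work.fo, the result could be rendered with the Apache FOP command-line tool: fop -fo work.fo -pdf work.pdf.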

Book components

In the FO routine, the original XML is transformed into an XSL-FO document, which is then picked up by the Apache FO Processor and converted to PDF. Our task was to create generic transformation rules that would apply to all 116 works and render them in print as close to the original as possible. Our template defines nine book components, which correspond to the abstract “page masters” in XSL-FO terminology:

  • The first component is the so-called “Half title” or “Schmutztitel”, containing the short title of the work and the name of its author. The second is the “Frontispiece”, which renders the scan of the title page of the original.
  • These are followed by the title page of the digital edition and the edition notice.
  • Only then is the book content proper delivered, beginning with the title page of the original, rendered from <tei:titlePage>.
  • The introduction comes next, generated from the <tei:div> elements of <tei:front>, followed by the contents of <tei:body>.
  • The eighth section delivers the contents of <tei:back>, and the publication concludes with endnotes.

The generic template allows components to be duplicated or removed dynamically, for example if the original contains two title pages (one for the volume series and one for the current volume), or if it lacks an introduction and the pagination should start from the body.
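The dynamic assembly of components can be sketched as a function that returns the ordered page sequences for a given work. The component names, the parameters, and the assumption that Arabic page numbering restarts in the first content section (the introduction if present, otherwise the body) are hypothetical and not taken from the Salamanca template.

```python
def book_components(title_pages=1, has_front=True, has_back=True):
    """Return (component, initial_page_number) pairs for one work.

    Component names are hypothetical labels for the nine parts listed above;
    initial_page_number is "1" where Arabic numbering is assumed to restart.
    """
    preliminaries = ["half-title", "frontispiece", "edition-title", "edition-notice"]
    # one original title page per occurrence, e.g. series + volume
    preliminaries += ["original-title"] * title_pages
    content = ["introduction"] if has_front else []
    content.append("body")
    if has_back:
        content.append("back")
    content.append("endnotes")
    result = [(name, None) for name in preliminaries]
    result += [(name, "1" if i == 0 else None) for i, name in enumerate(content)]
    return result


for name, start in book_components(title_pages=2, has_front=False):
    print(name, start)
```

In XSL-FO, each pair would become an fo:page-sequence, and the number, where present, would go into its initial-page-number property.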

We do not follow the canons of Western book page design, which place the center of the text block above the center of the page and traditionally make the gutter margin narrower than the fore-edge margin. This is because the publication is meant to be printed or viewed as A4 pages (page-height="29.7cm" page-width="21cm"). The XSL-FO “page masters” prescribe three types of margin on each side of the page: two of them are not printable, and the third holds the header (with the work title and the author) or the footer (with the page numbers). Here is an approximate layout of the page, with the dimensions of the top margins:

It has proved quite useful to define the print area as a one-cell table whose borders can be switched on during debugging sessions to check the margins and other layout features.
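A minimal sketch of such a debug frame, again in Python with ElementTree: the function wraps the print area in a one-cell fo:table and draws the cell border only when a debug flag is set. The function names, the flag, and the red 0.5pt border are our own illustrative choices, not part of the Salamanca template.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"


def q(tag):
    """Qualify a tag name with the XSL-FO namespace."""
    return f"{{{FO}}}{tag}"


def print_area(parent, debug=False):
    """Append a one-cell fo:table spanning the print area and return the
    fo:block inside it. With debug=True the cell gets a visible border,
    so margins and other layout features can be checked in the PDF."""
    table = ET.SubElement(parent, q("table"),
                          {"table-layout": "fixed", "width": "100%"})
    ET.SubElement(table, q("table-column"),
                  {"column-width": "proportional-column-width(1)"})
    row = ET.SubElement(ET.SubElement(table, q("table-body")), q("table-row"))
    cell_attrs = {"border": "0.5pt solid red"} if debug else {}
    cell = ET.SubElement(row, q("table-cell"), cell_attrs)
    return ET.SubElement(cell, q("block"))


flow = ET.Element(q("flow"), {"flow-name": "xsl-region-body"})
print_area(flow, debug=True).text = "Textus operis."
print(ET.tostring(flow, encoding="unicode"))
```

Keeping the border behind a single flag means the layout itself is identical in debug and production runs; only the visibility of the frame changes.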

PDF export as a diagnostic tool

PDF production helped us diagnose a number of XML problems. The issues we encountered were of different types and concern a) the order and position of elements, b) cross-referencing, c) character encoding, and d) text mark-up.

Order and position of XML elements

Marginal notes and milestones

At Salamanca, marginal notes usually consist of two elements: the anchor in the text and the note body itself. The anchors can be alphabetical, numerical, or symbolic.

Sometimes the anchor is missing from the main text, and the note is simply placed next to the line it refers to. In XML, this corresponds to a position before the <lb> element of that line. A single document can contain up to 8,000 marginal notes.

Displaying non-anchored marginal notes in PDF is substantially complicated by the fact that in the original they “float” alongside the main text. In XML they are encoded before the line beginning (<lb>), which does not correspond to a line beginning in the PDF, where the text is broken into lines differently. Apache FOP has insufficient support for floats, and the formatters that fully support them, such as Antenna House or RenderX, are proprietary. We therefore decided to unify the representation of anchored and unanchored notes: in the PDF they all receive a numerical mark and an anchor in the main text. In addition, we place all notes as endnotes in a separate section, with cross-references between them and their anchors.
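The unification of anchored and unanchored notes can be sketched as follows. This Python version assumes that the original mark of an anchored note is stored in the note’s n attribute and that unanchored notes have none; it replaces every note with a running number in brackets and collects the note texts as endnotes. It illustrates the approach and is not the project’s XSLT.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
NOTE = f"{{{TEI}}}note"


def extract_endnotes(p, start=1):
    """Split one TEI paragraph into running text and endnotes.

    Every note, anchored (original mark assumed in @n) or not, receives a
    running number starting at `start`; "[k]" marks its place in the text,
    and (k, original_mark, note_text) is appended to the endnote list.
    """
    pieces, notes = [p.text or ""], []
    for child in p:
        if child.tag == NOTE:
            k = start + len(notes)
            body = " ".join("".join(child.itertext()).split())
            notes.append((k, child.get("n"), body))
            pieces.append(f"[{k}]")
        else:
            # other inline elements (e.g. <lb/>) contribute only their text
            pieces.append("".join(child.itertext()))
        pieces.append(child.tail or "")
    return "".join(pieces), notes


p = ET.fromstring(
    f'<p xmlns="{TEI}">Dominium est<note place="margin" n="a">Cf. Vitoria.</note>'
    ' ius<note place="margin">Nota sine signo.</note> <lb/>in re.</p>'
)
text, notes = extract_endnotes(p)
print(text)   # Dominium est[1] ius[2] in re.
print(notes)  # [(1, 'a', 'Cf. Vitoria.'), (2, None, 'Nota sine signo.')]
```

For a whole document, `start` would be carried over from one paragraph to the next. In the FO output, each [k] would become an fo:basic-link whose internal-destination is the id of the corresponding endnote block, with a second link pointing back to the anchor.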

Milestones have a structure similar to that of marginal notes: they have an anchor in the text and a related mark in the margin, usually an Arabic numeral. In addition, they frequently have another essential component, an item in the summary to which they are linked.

PDF export highlighted a particular aspect of the encoding of non-anchored marginal notes and milestones situated at a line end that contains a word break. In this case the anchor appears in the middle of the word.

This case demonstrated the usefulness of data visualization even at the data model design stage. From a print perspective, it would be practical to encode marginal notes and milestones not at the word break but before or after the word they refer to.
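Such cases can also be listed before printing. The Python sketch below flags notes and milestones encoded immediately before a line break inside a word. It assumes the common TEI convention <lb break="no"/> for a line break within a word; the project’s actual encoding of hyphenation may differ.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
TARGETS = {f"{{{TEI}}}note", f"{{{TEI}}}milestone"}
LB = f"{{{TEI}}}lb"


def anchors_in_word_breaks(root):
    """Return (element name, @n) for every note or milestone encoded
    directly before <lb break="no"/>, i.e. inside a word split across lines."""
    hits = []
    for parent in root.iter():
        children = list(parent)
        for el, nxt in zip(children, children[1:]):
            if (el.tag in TARGETS and nxt.tag == LB
                    and nxt.get("break") == "no"
                    and not (el.tail or "").strip()):  # nothing in between
                hits.append((el.tag.split("}")[1], el.get("n")))
    return hits


p = ET.fromstring(
    f'<p xmlns="{TEI}">prin<milestone unit="article" n="3"/><lb break="no"/>cipium'
    ' et<note place="margin" n="b">Nota.</note> <lb/>finis</p>'
)
print(anchors_in_word_breaks(p))  # [('milestone', '3')]
```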

Title page and its constituents

The PDF printout can be used to test rules regarding the order of XML elements. In this title page, for example, the first paragraph was encoded as <byline>, an element that in all other works follows the title. As a result, it was printed in the wrong position.

In other cases, an item is not rendered in the PDF at all if it occupies a different position in the hierarchy than the one prescribed by the template.
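Once such an order rule is known, it can also be checked mechanically. In the following Python sketch, the expected order of <titlePage> children (docTitle, byline, docImprint) is an assumed rule for illustration, not Salamanca’s schema; the function reports each element that appears after one that should follow it.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
# Hypothetical order rule for <titlePage> children.
EXPECTED = ["docTitle", "byline", "docImprint"]


def order_violations(title_page, expected=EXPECTED):
    """Return (element, preceding_element) pairs that break the expected
    order, e.g. ("docTitle", "byline") when a byline precedes the title."""
    rank = {name: i for i, name in enumerate(expected)}
    problems, last = [], None  # last = highest-ranked element seen so far
    for child in title_page:
        name = child.tag.split("}")[-1]
        if name not in rank:
            continue  # elements the rule does not mention are ignored
        if last is not None and rank[name] < rank[last]:
            problems.append((name, last))
        else:
            last = name
    return problems


tp = ET.fromstring(
    f'<titlePage xmlns="{TEI}"><byline>Auctore N.N.</byline>'
    "<docTitle><titlePart>Relectiones</titlePart></docTitle>"
    "<docImprint>Salmanticae</docImprint></titlePage>"
)
print(order_violations(tp))  # [('docTitle', 'byline')]
```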

Cross-referencing

The PDF layout is a useful tool for checking cross-references within the same document. In one example, an item in the “Summary” lacked a cross-reference; the reason turned out to be a duplicated number of the milestone it refers to. This anomaly was a side effect of another improvement and had not been caught by Schematron.

The same effect occurs if the reference points to a separate document that is not included in the current publication. Such links can be implemented in HTML, but in PDF they result in a missing cross-reference.
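Both problems, duplicated targets and references pointing outside the current publication, can be listed with a short check. This Python sketch assumes that references are TEI <ref> elements whose target attribute has the form “#id” and that milestones are identified by xml:id; anything that does not resolve to an id in the same document is reported as dangling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_refs(root):
    """Return (duplicate_ids, dangling_targets) for one document."""
    counts = Counter(el.get(XML_ID) for el in root.iter() if el.get(XML_ID))
    duplicates = sorted(i for i, c in counts.items() if c > 1)
    dangling = []
    for ref in root.iter(f"{{{TEI}}}ref"):
        # @target may hold several space-separated pointers
        for target in (ref.get("target") or "").split():
            if not (target.startswith("#") and target[1:] in counts):
                dangling.append(target)
    return duplicates, dangling


doc = ET.fromstring(
    f'<TEI xmlns="{TEI}"><text><front><list>'
    '<item><ref target="#m1">1</ref></item>'
    '<item><ref target="#m2">2</ref></item>'
    '<item><ref target="other.xml#m9">9</ref></item>'
    "</list></front><body>"
    '<milestone xml:id="m1" n="1"/><milestone xml:id="m1" n="2"/>'
    "</body></text></TEI>"
)
print(check_refs(doc))  # (['m1'], ['#m2', 'other.xml#m9'])
```

In this toy document, the second milestone wrongly repeats the id m1, so the summary item pointing to #m2 has nothing to link to, while other.xml#m9 points outside the current publication.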

The actual design of a reference is another feature that can be tested: which part of the string should serve as the anchor? In the example below, two different conventions were used for referencing a section of the book from its summary.

Special characters and abbreviations

The PDF renders a constituted version of the Salamanca TEI encoding, meaning that selected special characters and abbreviations are expanded. We display the standardized versions of characters such as the Tironian ⁊ (et) and the long ſ (s). In addition, Latin, Greek, Hebrew, and Arabic characters occur in the texts and need special fonts to be rendered properly. PDF can thus serve as a handy copy-editing and proofreading tool.
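A minimal Python sketch of the two operations involved: replacing special characters with their standardized forms, and detecting which scripts occur so that suitable fonts can be assigned. The two-entry table is only an illustrative subset, and detecting a script from the first word of the Unicode character name is a simple heuristic, not the project’s method.

```python
import unicodedata

# Illustrative subset only; the project's full list of normalized
# characters and expanded abbreviations is not reproduced here.
EXPANSIONS = {
    "\u017f": "s",   # ſ LATIN SMALL LETTER LONG S
    "\u204a": "et",  # ⁊ TIRONIAN SIGN ET
}
FONT_SCRIPTS = ("GREEK", "HEBREW", "ARABIC")


def constitute(text):
    """Replace listed special characters with their standardized forms."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text)


def scripts_needing_fonts(text):
    """Return the scripts from FONT_SCRIPTS that occur in the text,
    judged by the first word of each character's Unicode name."""
    found = set()
    for ch in text:
        script = unicodedata.name(ch, "").split(" ", 1)[0]
        if script in FONT_SCRIPTS:
            found.add(script)
    return found


print(constitute("ſed ⁊ alia"))                       # sed et alia
print(sorted(scripts_needing_fonts("λόγος et אמת")))  # ['GREEK', 'HEBREW']
```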

Text mark-up

PDF helps to check the uniformity of text mark-up, for example in capitalization, italics, superscripts, initials, and bold passages. In the example below, two functionally identical passages had different encodings, the second one being marked up as a heading.

PDF export thus helps to visualize the semantics of the encoding, an aspect that cannot be checked by XML schemas and Schematron.
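Once a pattern for functionally identical passages is known, inconsistencies like this one can also be searched for. The Python sketch below collects block-level elements whose text matches a regular expression and groups them by element name; more than one name means that equivalent passages are encoded differently. The set of block tags and the sample pattern for “Quaestio” headings are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "http://www.tei-c.org/ns/1.0"
BLOCK_TAGS = {"p", "head", "label", "item"}  # assumed block-level elements


def encodings_by_pattern(root, pattern):
    """Group the texts matching `pattern` by the element that encodes them."""
    rx = re.compile(pattern)
    found = defaultdict(list)
    for el in root.iter():
        name = el.tag.split("}")[-1]
        if name in BLOCK_TAGS:
            text = " ".join("".join(el.itertext()).split())
            if rx.match(text):
                found[name].append(text)
    return dict(found)


body = ET.fromstring(
    f'<body xmlns="{TEI}">'
    "<div><p>Quaestio I. De dominio.</p></div>"
    "<div><head>Quaestio II. De iustitia.</head></div>"
    "<div><p>Alia materia.</p></div></body>"
)
print(encodings_by_pattern(body, r"Quaestio [IVXLC]+\."))
# {'p': ['Quaestio I. De dominio.'], 'head': ['Quaestio II. De iustitia.']}
```

Such a search only finds the cases one already suspects; the PDF remains the tool that makes unexpected inconsistencies visible in the first place.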

PDF impact on data quality assurance

PDF creation thus supports data quality assurance in two ways: on the one hand, it exposes formal errors in the code; on the other, it facilitates manual correction by providing a new “optical impression” of the data, in which certain kinds of inconsistencies stand out more clearly to the human eye. Some of the issues raised depend on the definition of an “error”, which in the context of scholarly digital editions is a rather challenging task. In the classical software world, a “bug” is a deviation from the desired behavior defined in the software specification (Crispin 2008: 416). The descriptive nature of XML-based digital editions, by contrast, involves an element of ambiguity and interpretation (Wenz, Kesper, and Taentzer 2022). Identifying an error thus requires a skill set different from that of a classical software tester.

I have mentioned above that at Salamanca the steps of the production pipeline are executed sequentially, in a “waterfall” model, where the output of each stage is validated against the respective Schematron file. The final XML is then validated against the RELAX NG schema. PDF generation was initially intended as one of the export methods for the TEI data, located at the very end of the production cycle. As soon as it was implemented, we realized that this type of data visualization can be used to expose semantic and structural discrepancies in the source data. We therefore moved PDF export further up the TEI workflow.
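The difference between the original and the revised placement of PDF export can be sketched as a small pipeline runner in Python. The stages, transforms, and validators below are toy stand-ins working on strings (real validators would run Schematron or RELAX NG); the optional preview callback, called after every stage rather than only at the end, plays the role of the PDF diagnostic moved up the workflow.

```python
from typing import Callable, List, Optional, Tuple

Transform = Callable[[str], str]
Validator = Callable[[str], List[str]]
Stage = Tuple[str, Transform, Validator]


def run_pipeline(doc: str, stages: List[Stage],
                 preview: Optional[Callable[[str, str], str]] = None
                 ) -> Tuple[str, List[str]]:
    """Run stages one after another ("waterfall"), stopping at the first
    stage whose validator reports errors. If `preview` is given, it runs
    after every stage instead of only at the end of the pipeline."""
    previews = []
    for name, transform, validate in stages:
        doc = transform(doc)
        errors = validate(doc)
        if errors:
            raise ValueError(f"{name}: {'; '.join(errors)}")
        if preview is not None:
            previews.append(preview(name, doc))
    return doc, previews


# Toy stages standing in for the Tite-to-TEI conversion and an XSLT correction.
stages: List[Stage] = [
    ("tite-to-tei",
     lambda d: d.replace("<tite>", "<TEI>").replace("</tite>", "</TEI>"),
     lambda d: [] if d.startswith("<TEI>") else ["root element is not TEI"]),
    ("expand-long-s",
     lambda d: d.replace("\u017f", "s"),
     lambda d: ["long s left in text"] if "\u017f" in d else []),
]

result, previews = run_pipeline("<tite>\u017fed</tite>", stages,
                                preview=lambda name, d: f"{name}: {d}")
print(result)    # <TEI>sed</TEI>
print(previews)  # ['tite-to-tei: <TEI>ſed</TEI>', 'expand-long-s: <TEI>sed</TEI>']
```

The preview after the first stage already shows the unexpanded long s, which is the kind of early visual feedback the revised workflow aims at.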

This use of PDF printing resembles the so-called Agile development method. In Agile, the software product is built in small incremental chunks, and each development cycle includes feature clarification, design, coding, and testing. The work is carried out by cross-functional teams whose members bring a range of expertise, including programming, testing, analysis, database administration, user experience, and infrastructure (Black 2017: 7).

(Image source: http://crmsearch.com/images/agileandwaterfall.gif)

PDF creation at Salamanca follows the principles of Agile software testing. It not only starts early in data development and is repeated at every subsequent step; it also breaks down the traditional boundary between software developers and researchers. In such a QA4DH model, researchers act as testers, providing direct feedback to the encoding team and thus actively participating in the development process. Researchers are used to dealing with a high degree of uncertainty, where not a software specification but deep knowledge of the subject and intuition are needed to distinguish a “bug” from a “feature”. These two aspects, the “fuzziness” of an error and the researcher’s skill set required of a tester, are what differentiate conventional software quality assurance from quality assurance in the digital humanities.

Bibliography

  • Black, Rex, ed. 2017. Agile Testing Foundations: An ISTQB Foundation Level Agile Tester Guide. Swindon, UK: BCS, The Chartered Institute for IT.
  • Crispin, Lisa. 2008. Agile Testing: A Practical Guide for Testers and Agile Teams. 1st ed. Upper Saddle River, NJ: Addison-Wesley Professional.
  • Wenz, Viola, Arno Kesper, and Gabriele Taentzer. 2022. “Classification of Uncertainty in Descriptive Data Representing Scientific Knowledge,” March. https://doi.org/10.5281/zenodo.6327011.
]]>
https://blog.salamanca.school/en/2022/12/06/back-to-analog-the-added-value-of-printing-tei-editions/feed/ 0
Salamanca Colloquium: Luis de Molina on African Slavery https://blog.salamanca.school/en/2022/05/02/salamanca-colloquium-luis-de-molina-on-african-slavery/ https://blog.salamanca.school/en/2022/05/02/salamanca-colloquium-luis-de-molina-on-african-slavery/#respond Mon, 02 May 2022 09:44:56 +0000 https://blog.salamanca.school/?p=1228 Continue reading "Salamanca Colloquium: Luis de Molina on African Slavery"]]> On Wednesday, May 11, at 14.30, we invite all interested researchers to the next Salamanca Colloquium, which will focus on “Luis de Molina on African Slavery”.

Jörg Tellkamp (UAM, Mexico City), together with Daniel Schwartz (The Hebrew University of Jerusalem): “Luis de Molina on slaves as subjects of rights”

Anne-Charlotte Martineau (CNRS): “Reading Molina’s Disputationes on slavery through an international legal lens”

Luis de Molina (1535-1600) is one of the most prominent authors of the School of Salamanca. Especially remarkable are his extensive writings on slavery and the European trade in African slaves. Historians’ interest in Molina has led to his writings being translated from the original Latin into modern languages. Currently, Anne-Charlotte Martineau and Jörg A. Tellkamp (together with Daniel Schwartz of The Hebrew University of Jerusalem) are working on translations of Molina’s Disputations on the slavery of Africans into French and English, respectively.

The colloquium will be held in a hybrid format; for participants interested in joining it virtually, please contact Christiane Birr (birr[at]lhlt.mpg.de) for access details.

]]>
https://blog.salamanca.school/en/2022/05/02/salamanca-colloquium-luis-de-molina-on-african-slavery/feed/ 0