Polina Solonets and Maxim Kupreyev, members of the project team, presented a poster at the DARIAH Annual Event 2023, held from June 6th to 9th in Budapest. This year's conference topic was ‘Cultural Heritage Data as Humanities Research Data?’. Polina and Maxim introduced their approach to organising a sustainable workflow for a large-scale edition and presented a poster entitled ‘Sustainable Practices for the Large-Scale TEI Editions at the School of Salamanca Text Collection’.
The fundamental importance of the School of Salamanca for early modern discourse about law, politics, religion, and ethics is widely acknowledged among philosophers and legal historians. The early modern texts in the collection extend beyond the core authors and serve to analyze the history of the Salamanca School’s origins and influence, as well as its internal discourse contexts, also with a view to the future dictionary entries.
Especially with regard to the dictionary, the idea of testing and exploring our corpus with modern NLP applications came up in the project a long time ago. We often asked ourselves which lemmas or information we could find with the help of text analysis, and above all how complex such an undertaking would be with our data. Thus, in 2021 we started with a natural language processing task (word frequency distribution), using the Python programming language to explore our corpus and establish groundwork for further text mining. Continue reading “Word frequencies in the Digital Collection of Sources in the Works of the School of Salamanca.”
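A minimal sketch of such a word frequency count in Python might look like the following; it uses a toy Latin sentence for illustration and is not the project's actual pipeline:

```python
import re
from collections import Counter

def word_frequencies(text, top_n=10):
    """Return the top_n most frequent word forms in a text (case-folded)."""
    tokens = re.findall(r"\w+", text.lower(), flags=re.UNICODE)
    return Counter(tokens).most_common(top_n)

sample = "ius naturale est quod natura omnia animalia docuit ius enim istud"
print(word_frequencies(sample, top_n=3))  # 'ius' occurs twice and comes first
```

For real corpus texts, the tokenization would of course have to deal with early modern spelling variation and with extracting plain text from the TEI markup first.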
Since its beginning in 2013, the Salamanca Project has been developing a text editing workflow based on methods and practices for sustainable and scalable text processing. Sustainability in text processing encompasses not only reusability of the tools and methods developed and applied, but also long-term documentation and traceability of the development of the text data. This documentation will provide an important starting point for future research work. Moreover, the text preparation must be scalable, since the Digital Source Collection comprises a relatively large mass of texts for a full-text digital edition project: in total, it will involve more than 108,000 printed pages from early modern prints in Latin and Spanish, which must be edited in an efficient and at the same time quality-assured manner.
In the following, I will introduce the sequence of stages that each work in the Digital Source Collection goes through, from locating a suitable copy for digitization in a public library, through metadata creation, to the completion of a full text in TEI All format enriched with the project’s specifications. Continue reading “The School of Salamanca Text Workflow: From the early modern print to TEI-All.”
Textual sources from the early modern period pose specific challenges with regard to their digital acquisition, analysis, and representation. First, the full-text digitization of sources is elaborate and time-consuming, and it is still far from being automatable to a degree sufficient for the needs of scholarly researchers. Second, the digital analysis of such data relies on specific historical and textual data models and ontologies and needs to be conducted, on the one hand, through time-consuming scholarly study and manual annotation of the texts; automatic applications, on the other hand, face a significant shortage of natural language processing tools applicable to “low-resourced” languages such as early modern Spanish or Latin, and of linguistic and semantic resources appropriate for these specific periods and languages. Finally, while discussions about (textual) data representations and visualizations lie at the very heart of current digital humanities endeavours, there is still only rudimentary consensus about best practices in representing early modern texts and data in all their (multimedial/multimodal) variety in digital forms, as well as about their versioning, forms of readerly participation, underlying software architectures, etc.
Some methods and infrastructures that address these challenges are well known: for instance, the acquisition of historical (text) data may be enhanced through forms of crowdsourcing and collaborative digital editing, and technically facilitated by collaborative annotation tools; linked open data models and related infrastructures aim to provide means for annotating data in conformance with interoperable semantic web standards. Moreover, there is an ever-increasing number of specific linguistic and geographic resources, as well as tools, tailored by and for individual early modern research projects – projects that could benefit from communicating and exchanging these resources and tools. In order to strengthen the community-driven use of already existing frameworks and tools, and to communicate methods and tools hitherto unknown outside specific use cases, an exchange of theoretical, methodological, and practical knowledge about digital approaches to early modern historical questions seems crucial.
With the aim of enabling such an exchange on an interdisciplinary and, in a way, also intercultural scale, the Max Planck Institute for European Legal History and the project “The School of Salamanca: A Digital Collection of Sources and a Dictionary of its Juridical-Political Language” organize a workshop with digital humanities experts from Latin America and Europe. The workshop will comprise a small number of individual project presentations as a common ground for in-depth, hands-on discussions about technical tools, methods and frameworks as well as about (linguistic, geographical, ontological, etc.) resources for the study of early modern sources and contexts.
Date: 23 October 2018
Place: Instituto Max Planck Argentina – Instituto de Investigación en Biomedicina de Buenos Aires, C1425FQD, Godoy Cruz 2390, Buenos Aires
The workshop will be held in Spanish, although contributions may be in English as well.
For registration, please contact us by 15 October 2018 at: email@example.com.
14:00 Thomas Duve: Welcome
14:05 Andreas Wagner / David Glück: Introduction
14:15 Andreas Wagner: Digital Humanities Research Related to Ibero-America at the Max Planck Institute for European Legal History
14:45 David Glück: Methods, Frameworks, and Linguistic Resources in the Digital Edition of “The School of Salamanca”
15:30 Hands-On Session: Frameworks, Methods and Tools for Digital Early Modern History
16:30-17:00 Coffee Break
17:00 Gimena del Rio Riande, Romina De León, Nidia Hernández (HD CAICYT Lab, CONICET): Integrating annotation, digital edition tools and GIS resources: Practical experiences from the LatAm project (project presentation and hands on session)
18:30 Closing of the Workshop and Farewell
The IIIF Consortium and the wider IIIF community have defined standard protocols for the presentation of image resources. These protocols are formulated as descriptions of “interfaces”, i.e. they specify at which address, and by means of which parameters, a service should offer a particular function. In this way, the IIIF Image API, currently available in version 2.1.1, describes zoom, rotation, cropping, format conversion, and similar services. There are also descriptions of access management and authentication services as well as of search functions, in the Authentication API and the Search API respectively; both are of more recent date and so far available only in version 1.0. Descriptions for video and audio data (e.g. covering the time indices that such resources additionally require) are in preparation.
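As an illustration of how such an interface description translates into practice, the following Python sketch assembles a IIIF Image API 2.1 request URL from the standard path segments {region}/{size}/{rotation}/{quality}.{format}; the server address and image identifier are hypothetical placeholders:

```python
def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API 2.1 request URL of the form
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}"""
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# e.g. a 300-pixel-wide rendering of a page facsimile, rotated 90 degrees:
url = iiif_image_url("https://images.example.org/iiif", "page-0001",
                     size="300,", rotation="90")
print(url)
```

Any IIIF-conformant image server is expected to answer such a URL, which is precisely what makes viewers and editions interoperable across institutions.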
On 1 March 2018, we released the code of the web application of the ‘School of Salamanca’ project’s digital edition (https://salamanca.school) as free and open source software (under the MIT license) on GitHub: https://github.com/digicademy/svsal, where the development and versioning of our web application take place exclusively from this date onward. This is the project’s first major code release; other parts of our digital infrastructure and the research data will be published separately.
The web application, which has now reached version 1.0, has been under development since 2014 and consists, more precisely, of an eXist-db application package. While this package can be downloaded and deployed in any eXist-db (version 3.6+) instance, it must be mentioned that, in order to function correctly, the web application relies on the integration of further, external services: for example, a IIIF-conformant image server (Image and Presentation APIs) allowing for the incorporation of facsimile images in the reading views of our works, or a SphinxSearch server providing lemmatized and cross-language search results for the texts. Notwithstanding these current limitations to the portability of the application package, and although much remains to be achieved with regard to the functionality of our web application, the code underlying its central features (such as the endless-scrolling segmentation of texts in the reading view, or the content-negotiation-based URI linking of texts and text segments) is now fully available and can be utilized by, or serve as an example for, similar projects. For a more extensive and detailed description of current features, caveats, and provisos of the application, please refer to: https://github.com/digicademy/svsal.
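To illustrate the principle behind content-negotiation-based URI linking (the project's actual implementation lives in the XQuery application, not in Python), the following sketch picks a representation for a text segment based on a simplified reading of an HTTP Accept header; the mapping and URLs are hypothetical:

```python
def negotiate(accept_header, available):
    """Pick the first available representation matching the Accept header
    (simplified: ignores q-values, honours the header's left-to-right order)."""
    for entry in accept_header.split(","):
        mime = entry.split(";")[0].strip()
        if mime in available:
            return available[mime]
        if mime == "*/*":
            return next(iter(available.values()))
    return None

# Hypothetical representations of one text segment of a digital edition:
representations = {
    "text/html": "https://edition.example.org/html/W0013/sec1",
    "application/tei+xml": "https://edition.example.org/tei/W0013/sec1",
}
print(negotiate("application/tei+xml,text/html;q=0.9", representations))
```

The same stable URI can thus serve an HTML reading view to a browser and TEI XML to a harvesting script, depending solely on the Accept header of the request.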
The software is tagged with a DOI so that it can be cited: https://doi.org/10.5281/zenodo.1186521
Starting from experiences of the philosophical and legal-historical project “The School of Salamanca. A digital collection of sources and a dictionary of its juridical-political language”, this article discusses an experimental approach to the Semantic Web.1 It lists both affirmative reasons and skeptical doubts related to this field in general and to its relevance for the project in particular. While for us the general question has not been settled yet, we decided early on to discuss it in terms of a concrete implementation, and hence the article will also describe preliminary goals and their implementation, along with practical and technical issues that we have had to deal with.
In the process, we have encountered a few difficult questions that — as far as we could determine — involve (arguably) systematic tensions between key technologies and traditional scholarly customs. The most important one concerns referencing and citation. In the following, I will describe a referencing scheme that we have implemented. It attempts to combine a canonical citation scheme, some technologies known primarily from semantic web contexts, and a permalink system. Besides the details of our particular technical approach and the very abstract considerations about risks and benefits of the semantic web, I will point out some considerable advantages of our approach that are worth pursuing independently of a full-blown semantic web offering.
We cordially invite submissions for a discussion forum of the journal RG RECHTSGESCHICHTE – LEGAL HISTORY entitled “Die geisteswissenschaftliche Perspektive: Welche Forschungsergebnisse lassen Digital Humanities erwarten?/With the Eyes of a Humanities Scholar: What Results Can We Expect from Digital Humanities?” This discussion forum is being organized by the project for the journal RG RECHTSGESCHICHTE – LEGAL HISTORY.
Contributions should explicitly adopt the perspective of the humanities and address the experience already gained in practical research work with the instruments of DH. The focus is on the interaction between humanities research interests and digital tools: How do digital possibilities change research interests and methods? Continue reading “(Deutsch) CfP: Forum RG RECHTSGESCHICHTE – LEGAL HISTORY 24 (2016)”