The implementation of geo-historical gazetteers increasingly depends upon the development of Natural Language Processing (NLP) and Corpus Linguistics as well as geographical analysis in disciplines such as History, Archaeology and Literary Studies. Applying these methods usually relies on appropriately modelled databases for the semantic enrichment of documents, including geoparsing tasks. At the same time, even when place mentions in texts or in library or museum catalogues are enriched and referenced manually (e.g. using CIDOC CRM), an adequate source of external information is crucial.
Today, geo-historical data are frequently published following the Linked Data (LD) principles: i.e. using URIs and standard data formats (RDF), and linking to other data sets to enable information discovery. Moreover, an implicit driving principle of LD, widespread in the Semantic Web community, is the reuse of vocabularies and ontologies already defined by others to avoid duplication. Pleiades is currently one of the best examples, but general-purpose sources such as DBpedia, Wikidata or GeoNames also provide interesting, albeit partial, geo-historical information and have proved useful in Digital Humanities (DH) projects. Linking texts to external sources using URIs enables the retrieval of additional information about the referenced places. Once this has been achieved, the information in the sources can readily be used to produce different views and aggregate analyses of corpora, such as visualizations; these in turn are meant to help scholars capture perceptions of place and analyse the spatio-temporal phenomena described in corpora.
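As a minimal sketch of what linking a place mention to external sources can look like in practice, the following Python fragment serializes one mention as N-Triples. The `PlaceMention` class, the helper functions and the `example.org` mention URI are hypothetical names introduced here for illustration only. The use of `rdfs:label` and `skos:closeMatch` is likewise an assumption rather than a recommended model, and the gazetteer identifiers for Rome are illustrative and should be checked against the live gazetteers.

```python
from dataclasses import dataclass, field

# Widely used vocabulary terms; choosing them here is an assumption of this sketch.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"


@dataclass
class PlaceMention:
    """A place mention in a text, identified by its own URI (hypothetical model)."""
    uri: str
    surface_form: str
    gazetteer_uris: list[str] = field(default_factory=list)


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(mention: PlaceMention) -> str:
    """Serialize the mention and its gazetteer links, one triple per line."""
    lines = [
        f'<{mention.uri}> <{RDFS_LABEL}> "{escape_literal(mention.surface_form)}" .'
    ]
    for target in mention.gazetteer_uris:
        lines.append(f"<{mention.uri}> <{SKOS_CLOSE_MATCH}> <{target}> .")
    return "\n".join(lines)


# Illustrative mention of "Roma" linked to two gazetteers.
mention = PlaceMention(
    uri="https://example.org/corpus/doc1#mention-1",  # hypothetical project URI
    surface_form="Roma",
    gazetteer_uris=[
        "https://pleiades.stoa.org/places/423025",  # Pleiades (illustrative)
        "http://www.wikidata.org/entity/Q220",      # Wikidata (illustrative)
    ],
)
print(to_ntriples(mention))
```

Because the mention carries only URIs, any further information about the place (coordinates, name variants, periods) stays in the gazetteers and can be retrieved from them when needed, which is what makes the availability of those sources critical.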
The choice of geo-historical datasets to be used as gazetteers depends on the domain of the texts under consideration. Pleiades is specifically suited to places in Mediterranean Ancient History texts. However, tasks such as referencing places from historical periods other than Antiquity, or identifying geographically vague or imaginary places in literary texts, if possible at all, might require a different methodological approach, which would include the construction of conceptual mapping models and the creation of a completely different kind of gazetteer. Existing gazetteers vary widely in how they abstract the world. Important aspects – such as scale, the representation of time (and change over time), complex geometries, uncertainty and vagueness as to location and/or date, multiple points of view, the representation of hierarchies of political-administrative units, their boundaries and their change over time, alternative names, and the representation of fantastic places – are modelled in different ways, or are missing altogether. This limits their applicability in the Humanities. Finally, the ontologies used to link toponyms in texts to spatial references need to be further developed, especially when it comes to dealing with fuzziness and uncertainty in mentions.
Clearly, new models should conform to LD principles and privilege the reuse of consolidated ontologies, vocabularies and datasets. Long-term preservation and maintenance are crucial problems in this respect, because texts enriched with references to sources that have become obsolete or unavailable may be unusable for the task for which they were tagged. Finally, geo-historical projects should also promote the harmonization of their data with the standards and practices of the broader DH community and with current research trends, in particular as regards the interoperability of resources within the framework of larger research infrastructures such as CLARIN or DARIAH.
The full-day workshop will focus on geo-historical gazetteers, and we will discuss their limits in supporting the needs of the Spatial Humanities community. The workshop will comprise nine presentations by experts on the production of geo-historical gazetteers as LD and the handling of place references in texts, library and museum catalogues, digitized maps, etc. It targets an audience of scholars, data designers, and software developers. Most importantly, it will include a speed-presentation session for participants, topic-based breakout discussions between experts and attendees, and a panel to highlight research priorities and to summarize the main contributions of the workshop and research directions.