A place for places: current trends and challenges in the development and use of geo-historical gazetteers
Digital Humanities 2016 pre-conference Workshop
Published on Friday 15 April 2016 by Elsa Zotian
The first edition of the workshop “A place for places” will be held in conjunction with the Digital Humanities 2016 conference in Kraków, Poland. The workshop aims to investigate the latest developments in geo-historical gazetteers and their impact on natural language processing and digital humanities studies. In particular, the workshop will deal with crucial problems concerning geo-spatial models for representing ancient places, and the management of temporal information for geographic features in general. Current projects concerning the publication of geo-historical data as Linked Open Data, as well as their exploitation for annotating and enriching texts, will also be discussed, alongside more theoretical issues concerning vocabularies and ontologies.
The implementation of geo-historical gazetteers increasingly depends upon the development of Natural Language Processing and Corpus Linguistics as well as geographical analysis in disciplines such as History, Archaeology and Literary Studies. The application of these methods usually relies on the appropriate modelling of databases for performing the semantic enrichment of documents including geoparsing tasks. At the same time, even when performing a manual enrichment and referencing of place mentions in texts or in library or museum catalogues (for instance, when applying the CIDOC CRM model and its spatio-temporal extension), an adequate source of external information is crucial.
Today, geo-historical data are increasingly published following the Linked Data principles, i.e. using URIs and standard data formats (RDF) and linking to other data sets to enable information discovery. Moreover, an implicit driving principle of Linked Data, widespread in the Semantic Web community, is the reuse of vocabularies and ontologies already defined by others to avoid duplication. Keeping track of provenance is crucial. Pleiades is currently one of the best examples, but general-purpose sources such as DBpedia, Wikidata or GeoNames also provide interesting (albeit partial) geo-historical information and have proved useful in Digital Humanities projects. Linking texts to external sources using URIs enables the retrieval of additional information about the referenced places. Once this has been achieved, the information in the sources can easily be used to produce different views and aggregated analyses of corpora, e.g. visualizations (Jessop, 2008); this in turn is meant to help scholars capture place perceptions and analyse the spatio-temporal phenomena described in corpora.
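As a minimal sketch of the linking step described above, the following Python fragment attaches gazetteer URIs to place mentions and then aggregates the mentions per place, which is the kind of table a map view could be built from. The gazetteer entries, the example.org URIs, the document identifiers and the function name are illustrative assumptions, not records or APIs from Pleiades, GeoNames or Wikidata; a real pipeline would typically dereference the URIs or query an endpoint instead of using an in-memory dictionary.

```python
from collections import Counter

# Hypothetical in-memory gazetteer: URI -> description of the place.
# The example.org URIs are placeholders, not real gazetteer identifiers.
GAZETTEER = {
    "http://example.org/places/roma": {"label": "Roma", "lat": 41.9, "lon": 12.5},
    "http://example.org/places/athenae": {"label": "Athenae", "lat": 37.97, "lon": 23.72},
}

# Place mentions already annotated with a URI (e.g. after manual checking).
# Different surface forms ("Romam", "Roma") point to the same URI.
ANNOTATIONS = [
    {"doc": "text-1", "mention": "Romam", "uri": "http://example.org/places/roma"},
    {"doc": "text-1", "mention": "Athenis", "uri": "http://example.org/places/athenae"},
    {"doc": "text-2", "mention": "Roma", "uri": "http://example.org/places/roma"},
]


def mentions_per_place(annotations, gazetteer):
    """Count mentions per URI and attach label and coordinates for mapping."""
    counts = Counter(a["uri"] for a in annotations)
    rows = []
    for uri, n in counts.most_common():
        place = gazetteer[uri]
        rows.append({
            "uri": uri,
            "label": place["label"],
            "lat": place["lat"],
            "lon": place["lon"],
            "mentions": n,
        })
    return rows


for row in mentions_per_place(ANNOTATIONS, GAZETTEER):
    print(row["label"], row["mentions"], row["lat"], row["lon"])
```

Because the aggregation is keyed on the URI rather than on the surface form, inflected or variant spellings of the same place are counted together.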
The choice of geo-historical datasets to be used as gazetteers depends on the domain of the texts under consideration. Pleiades is specifically suited to places in texts on the ancient history of the Mediterranean. However, tasks such as referencing places from historical periods other than Antiquity, or identifying geographically vague or imaginary places in literary texts, if possible at all, might require a different methodological approach, which would include the construction of conceptual mapping models and the creation of a completely different kind of gazetteer. In any case, the choice will have an important influence on the resulting visualizations as well as on the pertinence of their interpretation. Existing gazetteers vary widely in how they abstract the world. Important aspects – such as scale, the representation of time (and change over time), complex geometries, uncertainty and vagueness as to location and/or date, multiple points of view, representation of hierarchies of political-administrative units, their boundaries and their change over time, alternative names (rejected and standard forms, vernacular and multilingual), representation of fantastic places – are modelled in different ways, or are missing altogether. This limits their applicability in the Humanities. Moreover, interlinking between corresponding entities in different gazetteers is often lacking, although progress has been made in this regard through community initiatives or by using GeoNames or Wikidata as backbones (Simon et al., 2015). Finally, the ontologies used to link toponyms in texts to spatial references need to be further developed, especially when it comes to dealing with fuzziness and uncertainty in mentions (Reuschel & Hurni 2011).
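To make a few of these modelling aspects concrete, here is a minimal Python sketch of a gazetteer record that carries alternative names, an optional location, and uncertain start and end dates. All class names, field names and the sample record are hypothetical and chosen for illustration only; they do not reproduce the data model of any existing gazetteer. Years are plain integers, with negative values standing for years BCE, which is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Name:
    form: str
    language: str            # language tag, e.g. "la" or "en"
    preferred: bool = False  # standard form vs. alternative form


@dataclass
class FuzzyYear:
    """An uncertain date, known only to lie between two years (inclusive)."""
    earliest: int
    latest: int


@dataclass
class PlaceRecord:
    uri: str
    names: List[Name]
    start: FuzzyYear  # when the place came into existence
    end: FuzzyYear    # when it ceased to exist
    # (lon, lat); None for imaginary or unlocated places
    point: Optional[Tuple[float, float]] = None

    def possibly_existed_in(self, year: int) -> bool:
        """True if the year falls within the widest plausible time span."""
        return self.start.earliest <= year <= self.end.latest

    def certainly_existed_in(self, year: int) -> bool:
        """True only if the year falls within the narrowest time span."""
        return self.start.latest <= year <= self.end.earliest


# Hypothetical record: founded between 50 and 20 BCE, abandoned between 400 and 450 CE.
place = PlaceRecord(
    uri="http://example.org/places/exemplum",
    names=[Name("Exemplum", "la", preferred=True), Name("Exampleton", "en")],
    start=FuzzyYear(-50, -20),
    end=FuzzyYear(400, 450),
    point=None,
)

print(place.possibly_existed_in(-30), place.certainly_existed_in(-30))
print(place.possibly_existed_in(100), place.certainly_existed_in(100))
```

Keeping the "possible" and "certain" spans separate lets a query state explicitly how much temporal uncertainty it is willing to accept, instead of silently collapsing each date to a single year.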
Clearly, new models should conform to Linked Data principles, and they should favour the reuse of existing and consolidated ontologies, vocabularies and datasets whenever possible. Long-term preservation and maintenance are also crucial problems here, because texts enriched with references to sources that have become obsolete or unavailable may no longer be usable for the task for which they were tagged (Janowicz et al., 2012). Specialisation of efforts on the one hand (to avoid redoing things already done by others) and coordination on the other are therefore crucial for such projects. Finally, geo-historical projects should also promote the harmonization of their data with the standards and practices of the broader DH community and with current research trends, in particular regarding the interoperability of resources within the framework of larger research infrastructures such as CLARIN or DARIAH.
In this workshop we will focus on geo-historical gazetteers, and we will discuss their limits in supporting the needs of the Spatial Humanities community. The proposed workshop will be composed of presentations concerning the production of geo-historical gazetteers as Linked Data as well as the annotation, recognition and geoparsing of place names referenced in texts, library and museum catalogs, digitized maps, and other cultural heritage artifacts. The presenters are involved in a variety of Spatial Humanities projects, and they possess valuable experience to share with the wider DH research community. Information about the presenters and their presentations is listed below.
- Christopher Donaldson (University of Birmingham), Extracting and visualizing the geographies in historical travel writing: This presentation will introduce a procedure for the automated extraction and resolution of geographical information from a corpus of historical writings about the English Lake District. The research on which the presentation is based is using the spatial analysis of geo-historical linked data sets to achieve a more comprehensive and refined understanding of how the landscape of the Lake District was perceived, represented, and experienced in the past.
- Karl Grossner (Stanford University), Joining Place and Period in Historical Gazetteers: Places referred to in historical documents and gazetteers have temporal as well as spatial extents. Likewise, historical periods have spatial extents. However, existing data models and format standards, and the mapping and timeline software that uses them, do not reflect this. I will discuss recent work on Topotime, an extension to the GeoJSON format that adds temporal expressions and allows for some types of uncertainty encountered in historical data.
- Katherine Hart Weimer (Rice University): A wealth of geographic information is included in library catalogs, with existing structures for name disambiguation, cross-referencing and inclusion of geographic coordinates. Efforts are now underway in libraries to convert this data into Linked Data, allowing for cross-platform applications. The presentation describes an experiment in this direction.
- Maurizio Lana (Università del Piemonte Orientale): Annotation of place mentions in Latin Literature. The annotation pipeline uses parsing and named entity recognition (NER); mentions are then manually checked and linked to external gazetteers such as Pleiades. The novelty of the project is the GeoLat GO! ontology, which allows for more complex annotation.
- Bruno Martins (University of Lisbon): NLP and IR methods for handling geospatial information in textual documents. In my talk, I will present a brief survey of techniques for handling geospatial information within textual documents, including work by our team at the University of Lisbon and other methods proposed within the Computational Linguistics and Information Retrieval communities. I will discuss methods to address the problems of (i) document geocoding, (ii) toponym resolution, and (iii) selecting geographically relevant key-phrases. Applications within the broad field of Digital Humanities, and Spatial Humanities in particular, will also be outlined.
- Patricia Murrieta-Flores (University of Chester): So far, research in the spatial humanities has been mainly concerned with geographically precise information, or what could be considered ‘real’ places in historical and literary sources. Nevertheless, non-locational places play an important role in narratives from all sorts of sources, from the fantastic to geographically vague travel accounts. This is an important limitation in the analysis of place in the Digital Humanities. Using Medieval Romances as an example, this presentation will discuss the challenges posed by literary narratives of place in terms not only of disambiguation, but also of reference to fantastic and non-locational places.
- Michael Page (Emory University): Atlanta Explorer: Historical Geocoding & the City: Atlanta Explorer focuses on building datasets and geospatial tools to explore the history of the city. A geodatabase and geocoder for circa 1930 and a pilot 3D virtual environment have been completed. The next phase includes producing geocoders for the remaining years (1868-1930); strategies and methods for developing historical geocoding datasets and tools for place discovery will therefore be discussed. Our goal is also to share the underlying data with the community, with CityGML as the likely format for sharing and archiving the model.
- Rainer Simon (Austrian Institute of Technology), Pelagios project: Pelagios is an international community initiative concerned with the development of Linked Open Data methods, tools and services to better interconnect geo-historical datasets. In its most recent project phase (“Pelagios 3 - Early Geospatial Documents”), Pelagios has developed Recogito, a semi-automatic geo-annotation tool, and Peripleo, a geotemporal search engine. Furthermore, Pelagios has annotated more than 300 historical sources from different cartographic traditions, collecting more than 120,000 place references in literary texts and early maps (approximately half of them hand-verified so far).
- Humphrey Southall (University of Portsmouth): Engaging the wider public with historical gazetteers. Gazetteers are a powerful tool for humanities researchers, but they are also of great fascination and utility for the general public. That interest (1) enables academic projects to achieve wider “impact”, (2) enables popular web sites to be sustained by advertising income, and (3) enables expansion through crowd-sourcing. This presentation covers experience with three related projects: the established Vision of Britain site, with 1.6m annual visitors generating c. £20,000 per annum; PastPlace, our new global Linked Data gazetteer, which uses Wikidata as a spine to which we are adding historical toponym attestations by various routes; and GB1900, a crowd-sourced gazetteer-building project developed in collaboration with the National Libraries of Wales and of Scotland, extracting toponyms from a complete set of 1:10,560 maps of Great Britain.
After the workshop, we aim to prepare a best-practices document (i.e. a white paper) and to make it available online, for instance via the GeoHumanities SIG site, specialized mailing lists and professional network sites, in order to communicate the results to the DH community. Furthermore, presenters will be asked to prepare papers describing the content of their presentations in more detail, so that we can prepare a special issue of the Journal of Map & Geography Libraries, an international, peer-reviewed journal from Taylor & Francis (Routledge), co-edited by one of the presenters, Katherine Hart Weimer.
- Berman, Merrick, Ruth Mostern and Humphrey Southall. 2016. Placing Names: Enriching and Integrating Gazetteers. Bloomington, IN: Indiana University Press.
- Elliott, Tom, and Sean Gillies. 2009. “Digital Geography and Classics.” Digital Humanities Quarterly 3 (1).
- Evans, Courtney, and Ben Jasnow. 2014. “Mapping Homer’s Catalogue of Ships.” Literary and Linguistic Computing 29 (3): 317–25. doi:10.1093/llc/fqu031.
- Gregory, Ian, Christopher Donaldson, Patricia Murrieta-Flores, and Paul Rayson. 2015. “GIS, Geoparsing, and Text Analysis: New Trends in Spatial Humanities Research.” International Journal of Humanities and Arts Computing 9 (1): 1–94.
- Grossner, Karl, Krzysztof Janowicz, and Carsten Keßler. 2016. “Place, Period, and Setting for Linked Data Gazetteers.” In Placing Names: Enriching and Integrating Gazetteers, edited by Mostern, Ruth, Berman, Lex, and Humphrey Southall. Bloomington, IN: Indiana University Press. http://geog.ucsb.edu/~jano/GrossnerJanowiczKessler_submitted_draft.pdf.
- Janowicz, Krzysztof, Simon Scheider, Todd Pehle, and Glen Hart. 2012. “Geospatial Semantics and Linked Spatiotemporal Data-Past, Present, and Future.” Semantic Web 3 (4): 321–32.
- Jessop, Martyn. 2008. “Digital Visualization as a Scholarly Activity.” Literary and Linguistic Computing 23 (3): 281–93. doi:10.1093/llc/fqn016.
- Murrieta-Flores, Patricia, and Ian Gregory. 2015. “Further Frontiers in GIS: Extending Spatial Analysis to Textual Sources in Archaeology.” Open Archaeology 1 (1). http://www.degruyter.com/view/j/opar.2014.1.issue-1/opar-2015-0010/opar-2015-0010.xml
- Reuschel, Anne-Kathrin, and Lorenz Hurni. 2011. “Mapping Literature: Visualisation of Spatial Uncertainty in Fiction.” The Cartographic Journal 48 (4): 293–308.
- Simon, Rainer, Leif Isaksen, Elton Barker, and Pau de Soto Cañamares. 2015. “The Pleiades Gazetteer and the Pelagios Project.” In Placing Names: Enriching and Integrating Gazetteers, edited by M. L. Berman, R. Mostern, and H. Southall. Indiana University Press (in press).
- Southall, Humphrey, Alexander von Lunen, and Paula Aucott. 2009. “On the organization of geographical knowledge: Data models for gazetteers and historical GIS.” E-Science Workshops, 2009 5th IEEE International Conference on (Oxford: IEEE), 162–166.
- Southall, Humphrey, Ruth Mostern, and Merrick Berman. 2011. “On historical gazetteers.” International Journal of Humanities and Arts Computing 5 (2): 127–145.
- Tomasi, Francesca, Fabio Ciotti, Marilena Daquino, and Maurizio Lana. 2015. “Using Ontologies as a Faceted Browsing for Heterogeneous Cultural Heritage Collections.” (accepted), 1st Workshop on Intelligent Techniques At LIbraries and Archives (IT@LIA 2015), accessed November 5. http://italia2015.dei.unipd.it/papers/ITALIA_2015_submission_5.pdf.
- Van Hooland, Seth, Max De Wilde, Ruben Verborgh, Thomas Steiner, and Rik Van de Walle. 2013. “Exploring Entity Recognition and Disambiguation for Cultural Heritage Collections.” Literary and Linguistic Computing. http://www.researchgate.net/profile/Ruben_Verborgh/publication/255883214_Exploring_Entity_Recognition_and_Disambiguation_for_Cultural_Heritage_Collections/links/550f24bd0cf21287416b02e4.pdf.
Schedule of the day
We propose a full-day workshop in which the aforementioned presentations, 20 minutes each, will be organized in thematic sessions followed by questions from the audience. The intended audience comprises scholars, data designers, and software developers who are or will be involved in research projects concerning the spatial humanities. The workshop will also include a speed-presenting session in which attendees describe, in three minutes, any concrete theoretical or technical issue related to the workshop topics, for instance by explaining their needs and how they could use the presented data sets or ontologies. Attendees will be encouraged (but not required) to prepare their subject in advance. Presenters can then interact and engage with the attendees in breakout discussions organized by sub-topic. At the end of the workshop, we intend to hold a short panel to highlight research priorities. After this, the panelists and workshop leaders will summarize the main contributions of the workshop and the research directions it has identified. The tentative schedule of the day is presented below.
9:30 First presentation session including discussions and short break
12:30 Lunch break
13:30 Second presentation session including discussions and short break
15:00 Speed presenting and breakout discussions
17:00 Wrap up session including an expert panel discussion
18:00 End of the workshop
- Carmen Brando, PhD, Research engineer, Ecole des Hautes Etudes en Sciences Sociales (EHESS), Centre de recherches historiques (CRH - UMR 8558)
- Francesca Frontini, PhD Researcher, Istituto di Linguistica Computazionale "A. Zampolli", Consiglio Nazionale delle Ricerche (CNR)
- Representations (Main category)
- Mind and Language > Epistemology and methods > Mapping, imaging, GIS
- Mind and Language > Epistemology and methods > Methods of processing and representation
- Mind and Language > Epistemology and methods > Corpus research, surveys, archives
- Mind and Language > Epistemology and methods > Digital humanities
- Jagiellonian University (UJ), Auditorium Maximum, ul. Krupnicza 33 | Pedagogical University (UP), Department of Pedagogy, ul. Oleandry 6.
- Monday 11 July 2016
- geo-historical gazetteers, toponym annotation, spatio-temporal ontologies, entity recognition and resolution, geoparsing, linked data publication and consumption, distant reading, visualisation, spatial-temporal analysis, place perception
- Carmen Brando
email: carmen [dot] brando [at] gmail [dot] com
Information source
- Carmen Brando
email: carmen [dot] brando [at] gmail [dot] com
To cite this announcement
“A place for places: current trends and challenges in the development and use of geo-historical gazetteers”, Conference, Calenda, published on Friday 15 April 2016, http://calenda.org/363962