Published on Monday, July 01, 2019
Abstract
You are invited to join the DARIAH Code Sprint 2019! It is an opportunity to bring together interested developers and DH-affiliated people from the wider DARIAH community and beyond. To this end, we cordially invite you to spend three days in Berlin working on topics related to bibliographical metadata.
Announcement
Presentation
You are invited to join the DARIAH Code Sprint 2019! It is an opportunity to bring together interested developers and DH-affiliated people from the wider DARIAH community and beyond. To this end, we cordially invite you to spend three days in Berlin working on topics related to bibliographical metadata. Registration is now open at: https://desircodesprint.sciencesconf.org/registration
Although this is our second DARIAH code sprint, it is not addressed exclusively to participants of the first one. Everyone is welcome! Some background in coding in the Digital Humanities, or in technological discussions more generally, would nevertheless be helpful.
We will have three tracks approaching the wider topic of bibliographical metadata from three angles: the extraction of data from PDFs (GROBID), the import and processing of data using BibSonomy, and the visualisation of this data.
The code sprint will take place from September 24th to September 26th 2019 in Berlin at Forum Factory in a relaxed and productive environment.
The code sprint is organised by the DESIR project (DARIAH ERIC Sustainability Refined), an offspring of DARIAH-EU. DESIR aims to bring together DH-affiliated developers, to spread competencies in the community, and to help participants deepen their own knowledge and learn about new approaches and technologies. In doing so, DESIR addresses the sustainability question for several kinds of activities, infrastructures and services originating from the DARIAH context. Rather than developing new resources or infrastructural components, DESIR explores opportunities to employ existing resources (independent of DARIAH) as a means of sustaining certain infrastructure components and services.
Track descriptions
Track A: Extraction of bibliographical data and citations from PDF applying GROBID
As a result of the first Code Sprint, organised in 2018 by the DESIR project, this track built a tool covering the following functionalities:
1. Citation extraction from PDF files using GROBID;
2. Visualisation of the extracted information directly on the PDF files, intended to highlight important information in scientific articles (e.g., authors, title, tables, figures, keywords);
3. Inclusion of additional information from external services (e.g., affiliation disambiguation, named entity recognition);
4. Integration of all extracted data on the PDF files as usable viewers.
Browsing the tool URL gives users an idea of how the tool works. First, users upload a scientific article in PDF format. They then click the service buttons as needed to see highlighted results showing:
- bibliographical extraction results;
- affiliation processing results;
- named-entity recognition results.
For the second Code Sprint, our focus will be on adding features and capabilities to the demonstrator. For example, article authors returned by the GROBID extraction process could be linked to a digital researcher identifier (e.g., an ORCID identifier). Track A invites participants to contribute creative ideas and to be part of our project.
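To give prospective participants a feel for the kind of data involved, here is a minimal sketch of reading the title and author names from a TEI XML header of the kind GROBID produces. It is not the Track A tool's code. The sample document is a hand-written, heavily simplified fragment with placeholder names, and the function name `extract_header` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by TEI XML documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written, simplified TEI fragment (assumption: illustration only;
# real extraction output contains many more elements and attributes).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>An Example Article</title></titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author><persName><forename>Jane</forename><surname>Doe</surname></persName></author>
            <author><persName><forename>John</forename><surname>Smith</surname></persName></author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def extract_header(tei_xml: str) -> dict:
    """Return the title and a list of author names found in a TEI header."""
    root = ET.fromstring(tei_xml)

    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title = (title_el.text or "").strip() if title_el is not None else ""

    authors = []
    for pers in root.findall(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        # Collect forenames in document order, then the surname.
        parts = [(el.text or "").strip() for el in pers.findall("tei:forename", TEI_NS)]
        surname = pers.find("tei:surname", TEI_NS)
        if surname is not None and surname.text:
            parts.append(surname.text.strip())
        name = " ".join(p for p in parts if p)
        if name:
            authors.append(name)

    return {"title": title, "authors": authors}


if __name__ == "__main__":
    print(extract_header(SAMPLE_TEI))
```

A dictionary of this shape could then serve as a starting point for enrichment steps such as attaching a researcher identifier to each author name.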
Track B: Automatic Import of Bibliographic Data into BibSonomy
In this track we aim to extend the tool for automatic import of bibliographic metadata into BibSonomy. The first version of the tool was created at the DESIR workshop 2018. Currently, users can upload a PDF file and have its metadata automatically extracted using GROBID. In a further step, users can correct the metadata and save it to BibSonomy. We want to extend the tool by adding further features:
- Metadata extraction from text files,
- Individual user login for BibSonomy,
- Improved user interface,
- API.
Feel free to come up with your own ideas for improvement. We look forward to actively discussing all ideas at the beginning of the code sprint.
Track C: Visualisation of time dependent graphs of relations
One of the major outcomes of Track C of the previous DESIR Code Sprint was the novel generic concept of time-dependent graphs of relations and their visual presentation. Examples of such graphs include co-authorship and citation graphs, genealogy trees, and character interaction graphs. From the visual perspective, both the structure and the time characteristics of such graphs play a significant analytical role. Our web-based tool, developed throughout the DESIR project, can now visualize bibliographical datasets (e.g., imported via the BibSonomy API or loaded from a file) on top of the generic data model. Within this Code Sprint we will focus on extending our tool towards new data formats and use cases, as well as new visual forms. Participants will have the opportunity to work on mapping different data to the generic model of our graphs and/or on translating data formats into an intermediate RDF description (subject-predicate-object). Bringing your own data is encouraged. New visual forms will involve modifying the web application's user interface to include additional visualizations of metadata or aggregated information. Experience in Java and/or JavaScript programming is recommended.
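As an illustration of what a time-dependent graph of relations and a subject-predicate-object description can look like, here is a minimal sketch that derives per-year co-authorship edges and plain triples from a few bibliographic records. It does not reproduce the Track C tool or its generic data model. The record layout, the placeholder data, the function names and the predicate names `publishedIn` and `hasAuthor` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bibliographic records (placeholder data, not a real dataset).
RECORDS = [
    {"id": "pub1", "year": 2017, "authors": ["A. Author", "B. Author"]},
    {"id": "pub2", "year": 2018, "authors": ["A. Author", "B. Author", "C. Author"]},
    {"id": "pub3", "year": 2018, "authors": ["C. Author"]},
]


def coauthor_edges_by_year(records):
    """Map each year to {(author_a, author_b): number of joint publications}."""
    edges = defaultdict(lambda: defaultdict(int))
    for rec in records:
        # Every unordered pair of distinct authors on a record is one edge.
        for pair in combinations(sorted(set(rec["authors"])), 2):
            edges[rec["year"]][pair] += 1
    return {year: dict(pairs) for year, pairs in edges.items()}


def to_triples(records):
    """Flatten records into (subject, predicate, object) tuples."""
    triples = []
    for rec in records:
        triples.append((rec["id"], "publishedIn", str(rec["year"])))
        for author in rec["authors"]:
            triples.append((rec["id"], "hasAuthor", author))
    return triples


if __name__ == "__main__":
    for year, pairs in sorted(coauthor_edges_by_year(RECORDS).items()):
        print(year, pairs)
    for triple in to_triples(RECORDS):
        print(triple)
```

Keeping the year as the outer key makes it straightforward to step through the graph one time slice at a time, which is the kind of view a time-aware visualization needs.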
Program
Tuesday, September 24, 2019
13:00 - 14:00 Welcome and Registration - The location will be announced soon on this website
14:00 - 14:30 Welcome and Agenda Setting - Agenda Setting for the Code Sprint
14:30 - 16:00 Opening - N.N.
16:00 - 18:00 Workshop - Parallel Track A: Extraction of bibliographical data and citations from PDF applying GROBID
16:00 - 18:00 Workshop - Parallel Track B: Automatic Import of Bibliographic Data into BibSonomy
16:00 - 18:00 Workshop - Parallel Track C: Visualisation of time dependent graphs of relations
Wednesday, September 25, 2019
08:30 - 09:00 Welcome and Coffee
09:00 - 18:00 Workshop - Parallel Track A: Extraction of bibliographical data and citations from PDF applying GROBID
09:00 - 18:00 Workshop - Parallel Track B: Automatic Import of Bibliographic Data into BibSonomy
09:00 - 18:00 Workshop - Parallel Track C: Visualisation of time dependent graphs of relations
Thursday, September 26, 2019
09:00 - 12:00 Workshop - Parallel Track A: Extraction of bibliographical data and citations from PDF applying GROBID
09:00 - 12:00 Workshop - Parallel Track B: Automatic Import of Bibliographic Data into BibSonomy
09:00 - 12:00 Workshop - Parallel Track C: Visualisation of time dependent graphs of relations
12:00 - 13:00 Wrap up of the Code Sprint - Talk
Subjects
- Information (Main category)
- Mind and language > Information > Information sciences
Places
- Forum Factory Hector Space - Charlottenstraße 2
Berlin, Federal Republic of Germany (10969)
Date(s)
- Tuesday, September 24, 2019
- Wednesday, September 25, 2019
- Thursday, September 26, 2019
Keywords
- DARIAH, DESIR, Code Sprint, Bibliographical Metadata, Digital Humanities, GROBID, BibSonomy
Contact(s)
- Stefan Buddenbohm
email: buddenbohm [at] sub [dot] uni-goettingen [dot] de
Information source
- Barthauer Raisa
email: barthauer [at] sub [dot] uni-goettingen [dot] de
License
This announcement is licensed under the terms of Creative Commons CC0 1.0 Universal.
To cite this announcement
« DARIAH Code Sprint 2019 », Study days, Calenda, Published on Monday, July 01, 2019, https://doi.org/10.58079/133d