AFLiCo JET 2018: corpora and representativeness

*  *  *

Published on Thursday 30 November 2017 by Anastasia Giardinelli

Summary

With the advent of corpus linguistics, the use of corpora has become central in linguistics. One underlying assumption is that the corpus is representative of the linguistic phenomenon under scrutiny. Of course, corpus representativeness itself is a methodological construct (Leech 2006, Habert 2010): language corpora are tools constructed by linguists, and their structural limitations constrain and condition the validity of linguistic findings.

Announcement

Presentation

Here is an open-ended list of issues that we wish to address in the workshop:

  • What does it mean for a corpus to represent language use, and what are the relevant criteria?
  • To what extent does representativeness rely on intuition, since it cannot be fully gauged empirically?
  • Because a corpus cannot be representative of all features of language use, how can we address bias in sampling?
  • Does representativeness necessarily entail balance?
  • Can the design of a corpus be totally free from any form of theorization?

Solutions to these complex issues may be reflected in the development and use of different types of corpora.

The representativeness of written corpora may rely on a variety of features. According to Biber (1993: 244), “[r]epresentativeness refers to the extent to which a sample includes the full range of variability in a population.” Variability can be defined as the interaction between situational (e.g. format, setting, author, addressee, purposes, topics) and linguistic, distributional parameters (e.g. frequencies of word classes). Sampling can be based on extralinguistic (sociological, demographic) criteria (Crowdy 1993). Balance, i.e. a proportion of sampled elements that reflects their frequency in the targeted language, is claimed to characterize some corpora (e.g. the Brown Corpus (Francis & Kucera 1979) and the Lancaster-Oslo-Bergen corpus (Johansson et al. 1978)), though it is not a prerequisite.

Although increasingly large corpora, including monitor corpora, can be compiled from the Web (Baroni et al. 2009), large size is not necessarily a priority. “Big is beautiful” in the realm of corpora is, perhaps, a “delusion” (Svartvik 1992: 10). Large corpora are often presented as an ideal but, in practice, “small” corpora can go a long way in such domains as English language teaching (Ghadessy, Henry, and Roseberry 2001), the study of metaphors (Cameron and Deignan 2003), and dialectology (Hollmann and Siewierska 2007; Boas and Schuchard 2012). Parallel corpora, i.e. collections of original texts and their translations in one or more languages, are particularly useful in areas of research such as contrastive linguistics, translation studies and computational linguistics (Kenning 2010), but their alleged lack of representativeness has called for inventive ways of using them (Nádvorníková 2017).

In the area of spoken corpora, collecting data that represents the variability of the multiple dimensions of speech (phonology and phonetics, prosody, gesture) remains a challenge today. Collecting, transcribing, annotating and analysing data is a slow and sometimes complicated task. Although phonological and prosodic annotations can be partially systematized (Bertrand et al. 2008), technological advances are yet to be made in the automatic recognition of speech and gesture in interactional contexts. Automatic motion capture technologies for gesture research are promising (Priesters & Mittelberg 2013, Guez et al. 2013), but still at an early stage. As part of initiatives such as the TGIR Huma-Num Multi-Com – CORLI Consortium, multimodality researchers collaborate to develop collective, harmonised practices for the collection, transcription and archiving of spoken corpora.

AFLiCo JETs provide a forum for high-quality research in cognitive linguistics and, more generally, usage-based approaches to language. The topic of this year’s workshop is “corpora and representativeness”.

AFLiCo JET 2018 invites linguists, including junior researchers, to submit proposals that address the following topics (this is an open-ended list):

  • Bias in corpora
  • Material issues in corpus building
  • Theoretical issues in corpus building
  • The use of different types of corpora in a complementary fashion in linguistic analysis
  • Balance, size and distribution in corpus representativeness
  • Spoken/multimodal vs written/textual corpora
  • Automatization in corpus building

Submission Guidelines

Anonymous abstracts for 20-minute presentations (+ 8 minutes for questions) should include a title and a short bibliography. They should not exceed 500 words (exclusive of references, tables, and figures). They can be in English or in French.

Abstracts should clearly state the following:

  • research question(s)
  • approach(es)
  • subfield (e.g. semantics, pragmatics, gesture studies, corpus linguistics, NLP, etc.)
  • method(s)
  • data
  • expected or confirmed results

Include three to five keywords specifying the (sub)field, the topic, and the approach.

Submit your abstract via the "Submissions" module on the conference website: https://aflicojet2018.sciencesconf.org/ (MY SPACE > SUBMISSIONS > MY SUBMISSIONS). First you will need to create an account on sciencesconf.org, if you do not already have one, then click on “Submissions” then “Submit an abstract”. If you need help, let us know via the contact form. Each abstract will be double-blind peer reviewed.

The deadline for all abstracts is December 8th, 2017.

Notification of acceptance will be sent around January 10th, 2018.

Invited speakers

  • Dawn Knight, School of English, Communication and Philosophy, Cardiff University
  • Thomas Egan, Inland Norway University of Applied Sciences

Organizing committee

Scientific Committee

  • Olivier Baude - Université Paris Nanterre
  • Caroline Bogliotti - Université Paris Nanterre
  • Agnès Celle - Université Paris Diderot
  • Hugo Chatellier - Université Paris Nanterre
  • Gilles Col - Université de Poitiers
  • Charlotte Danino - Université Sorbonne Nouvelle
  • Sascha Diwersy - Université Paul Valéry Montpellier
  • Emmanuel Ferragne - Université Paris Diderot
  • Dylan Glynn - Université Paris 8
  • Lucie Gournay - Université Paris Est Créteil Val de Marne
  • Philippe Gréa - Université Paris Nanterre
  • Karolina Krawczak - Adam Mickiewicz University, Poznan
  • Natalie Kübler - Université Paris Diderot
  • Anne Lacheret - Université Paris Nanterre
  • Bernard Laks - Université Paris Nanterre
  • Dominique Legallois - Université Sorbonne Nouvelle
  • Maarten Lemmens - Université de Lille 3
  • Diana Lewis - Aix Marseille Université
  • Sylvain Loiseau - Université Paris 13
  • Julien Longhi - Université de Cergy-Pontoise
  • Rudy Loock - Université de Lille 3
  • Aliyah Morgenstern - Université Sorbonne Nouvelle
  • Florent Perek - University of Birmingham
  • Julien Perrez - Université de Liège
  • Graham Ranger - Université d'Avignon
  • Caroline Rossi - Université Grenoble Alpes
  • Cécile Viollain - Université Paris Nanterre

Venue

  • Université Paris Nanterre, amphithéâtre Max Weber - 200 Avenue de la République, 92000 Nanterre
    Nanterre, France (92)

Dates

  • Friday 8 December 2017

Keywords

  • corpus linguistics, representativeness, written corpora, spoken corpora, multimodal corpora, NLP

Contacts

  • Guillaume Desagulier
    email: gdesagulier [at] univ-paris8 [dot] fr
  • Camille Debras
    email: camilledebras [at] yahoo [dot] fr
  • Sophie Raineri
    email: sophieraineri [at] gmail [dot] com

Source of information

  • Guillaume Desagulier
    email: gdesagulier [at] univ-paris8 [dot] fr

To cite this announcement

“AFLiCo JET 2018 : corpus and représentativité”, Call for papers, Calenda, published on Thursday 30 November 2017, http://calenda.org/423742