The PARSEME-FR project
offers a 1.5-year post-doc position in Natural Language Processing,
starting in April 2018. Candidates should send their application before
February 1st, 2018 (see contact information below).
* Duration: 18 months, starting in April 2018 (open until filled)
* Location: to be discussed with the members of the PARSEME-FR
consortium (Nancy, Orléans or Paris)
* Employer: University of Orléans
* Contract: fixed-term position
* Remuneration: approx. €2,300 net per month (the contract also
includes health benefits)

## Topic:

**French MultiWord Expressions representation and parsing**

Many NLP applications require a fine-grained representation of the
syntactic (and sometimes semantic) structure of texts. The process of
building such a representation is called deep parsing. Recent work
combining symbolic and data-driven techniques has led to significant
advances in this field, notably in terms of robustness and
efficiency. However, multiword expressions (MWEs), that is, groups of (not
always continuous) words that exhibit some idiosyncratic properties,
such as “hot dog”, “hard disk”, “kick the bucket” or “pay attention”,
remain a major bottleneck for deep parsing (Sag et al. 2001,
Baldwin and Kim 2010). This is due, among other things, to their
unpredictable behavior at several levels (irregular morpho-syntax,
non-compositional semantics, …) and to the lack of annotated training

One of the goals of the PARSEME-FR project is to enhance the support of
MWEs in French parsing. To do so, four work packages have been defined,
dealing respectively with (i) MWE annotation in texts or treebanks, (ii)
MWE lexicons, (iii) statistical MWE parsing and (iv) symbolic MWE
parsing. The recruited post-doc will work in the last work package. Two
complementary aspects
will be considered:

– the representation of MWEs in linguistic resources (including
electronic grammars, see e.g. (Abeillé, 2002)),
– the use of these MWE-aware resources in deep (symbolic and hybrid)
parsing (see e.g. (Foth and Menzel, 2006)).

Among existing resources for French, one may cite the FRMG (FRench
MetaGrammar) resource which corresponds to a linguistically motivated
abstract and modular description of the syntax of French (De La
Clergerie, 2010). FRMG has been successfully used to compute deep
representations of French texts. The first phase of the postdoc project
will consist in extending the expressive power of metagrammars to
provide compact representations of MWEs. A second step will consist in
extending FRMG with information about MWEs automatically extracted from
treebanks (e.g. syntactic or lexical constraints, distribution
information, etc.) and from external resources (e.g. lexicon and

This extension of the linguistic description fed to the parser may raise
some efficiency issues. Indeed, the larger the size of the input
grammar, the larger the size of the parsing search space (due to
syntactic and/or lexical ambiguities). To control the exploration of
this search space, several techniques have been proposed including A*
algorithms for MWEs (Waszczuk et al., 2017). The second phase of the
postdoctoral project will focus on the extension of existing algorithms
dedicated to MWE parsing and their application to the DyALog engine used
to run FRMG (De La Clergerie, 2013).

## Profile:

* PhD in computer science or computational linguistics
* Good knowledge of French and English (not necessarily native)
* Interest in linguistics and familiarity with language technology
* Capacity to work independently and as part of a team

## Important dates:

* Application deadline: February 1, 2018 (or until filled)
* Position starts: April 2018
* Duration: 18 months

## Contact information:

Enquiries and/or applications should be sent to Yannick Parmentier
and Eric de la Clergerie.

Applications should contain an extended CV (mentioning the PhD defense
date and the names and contact information of 2 to 3 references) and a
cover letter.