Credits and references

Once upon a time

The development of Frog’s modules started in the nineties at the ILK Research Group (Tilburg University, the Netherlands) and the CLiPS Research Centre (University of Antwerp, Belgium). Most modules rely on Timbl, the Tilburg memory-based learning software package :raw-latex:`\cite{timbl}` or MBT the memory-based tagger-generator :raw-latex:`\cite{mbt}`. These modules were integrated into an NLP pipeline that was first named MB-TALPA and later Tadpole :raw-latex:`\cite{Tadpole}`. Over the years, the modules were refined and retrained on larger data sets and the latest versions of each module are discussed in this chapter. We thank all programmers who worked on Frog and its predecessors in chapter [ch-credit].

The CliPS Research Centre also developed an English counterpart of Frog, a python module called MBSP (MBSP website: http://www.clips.ua.ac.be/pages/MBSP).

Credits

If you use Frog for your own work, please cite this reference manual

Frog, A Natural Language Processing Suite for Dutch, Reference guide, Iris Hendrickx, Antal van den Bosch, Maarten van Gompel en Ko van der Sloot, Language and Speech Technology Technical Report Series 16-02, Radboud University Nijmegen, Draft 0.13.1 - June 2016

The following paper describes Tadpole, the predecessor of Frog. It contains a subset of the components described in this paper:

Van den Bosch, A., Busser, G.J., Daelemans, W., and Canisius, S. (2007). An efficient memory-based morphosyntactic tagger and parser for Dutch, In F. van Eynde, P. Dirix, I. Schuurman, and V. Vandeghinste (Eds.), Selected Papers of the 17th Computational Linguistics in the Netherlands Meeting, Leuven, Belgium, pp. 99-114

We would like to thank everybody who worked on Frog and its predecessors. Frog, formerly known as Tadpole and before that as MB-TALPA, was coded by Bertjan Busser, Ko van der Sloot, Maarten van Gompel, and Peter Berck, subsuming code by Sander Canisius (constraint satisfaction inference-based dependency parser), Antal van den Bosch (MBMA, MBLEM, tagger-lemmatizer integration), Jakub Zavrel (MBT), and Maarten van Gompel (Ucto). In the context of the CLARIN-NL infrastructure project TTNWW, Frederik Vaassen (CLiPS, Antwerp) created the base phrase chunking module, and Bart Desmet (LT3, Ghent) provided the data for the named-entity module.

Maarten van Gompel designed the FoLiA XML output format that Frog produces, and also wrote a Frog binding for Python [17], as well as a separate Frog client in Python [18]. Wouter van Atteveldt wrote a Frog client in R [19], and Machiel Molenaar wrote a Frog client for Go [20].

The development of Frog relies on earlier work and ideas from Ko van der Sloot (lead programmer of MBT and TiMBL and the TiMBL API), Walter Daelemans, Jakub Zavrel, Peter Berck, Gert Durieux, and Ton Weijters.

The development and improvement of Frog also relies on your bug reports, suggestions, and comments. Use the github issue tracker at https://github.com/LanguageMachines/frog/issues/ or mail lamasoftware @science.ru.nl.

Alpino syntactic dependency labels

This table is taken from Alpino annotation reference manual :raw-latex:`\cite{lassy2011}` :

dependentielabel OMSCHRIJVING
APP appositie, bijstelling
BODY romp (bij complementizer))
CMP complementizer
CNJ lid van nevenschikking
CRD nevenschikker (als hoofd van conjunctie)
DET determinator
DLINK discourse-link
DP discourse-part
HD hoofd
HDF afsluitend element van circumpositie
LD locatief of directioneel complement
ME maat (duur, gewicht, … ) complement
MOD bijwoordelijke bepaling
MWP deel van een multi-word-unit
NUCL kernzin
OBCOMP vergelijkingscomplement
OBJ1 direct object, lijdend voorwerp
OBJ2 secundair object (meewerkend, belanghebbend, ondervindend)
PC voorzetselvoorwerp
POBJ1 voorlopig direct object
PREDC predicatief complement
PREDM bepaling van gesteldheid ‘tijdens de handeling’
RHD hoofd van een relatieve zin
SAT satelliet; aan- of uitloop
SE verplicht reflexief object
SU subject, onderwerp
SUP voorlopig subject
SVP scheidbaar deel van werkwoord
TAG aanhangsel, tussenvoegsel
VC verbaal complement
WHD hoofd van een vraagzin
[1]The source code repository points to the latest development version by default, which may contain experimental features. Stable releases are deliberate snapshots of the source code. It is recommended to grab the latest stable release.
[2]https://github.com/LanguageMachines/ticcutils
[3]https://github.com/LanguageMachines/libfolia
[4]https://languagemachines.github.io/ucto
[5]https://languagemachines.github.io/timbl
[6]https://github.com/LanguageMachines/timblserver
[7]https://languagemachines.github.io/mbt
[8]B (begin) indicates the begin of the named entity, I (inside) indicates the continuation of a named entity, and O (outside) indicates that something is not a named entity
[9]https://github.com/proycon/pynlpl, supports both Python 2 and Python 3
[10]https://github.com/vanatteveldt/frogr/
[11]https://github.com/Machiel/gorf
[12]In the current Frog version UTF-16 is not accepted as input in Frog.
[13]In fact the tokenizer still is used, but in PassThru mode. This allows for conversion to FoLiA XML and sentence detection.
[14]Versions for Python 3 may be called cython3 on distributions such as Debian or Ubuntu
[15]More about the INI file format:https://en.wikipedia.org/wiki/INI_file)
[16]MBT available at http://languagemachines.github.io/mbt/
[17]https://github.com/proycon/python-frog
[18]Part of PyNLPL: https://github.com/proycon/pynlpl
[19]https://github.com/vanatteveldt/frogr/
[20]https://github.com/Machiel/gorf