Recent Posts

Break Free! Don’t be a prisoner of your software platform! “An opinion piece on software, social media, and ethics”, …

My PhD thesis Context as Linguistic Bridges has been released and is to be defended soon: Context as Linguistic Bridges is a study that …

大家好! I’ve been trying to learn Mandarin Chinese for quite a while, following the vocabulary compiled for the HSK (汉语水平考试) tests. …

I bought the GPS tracker that is sold as, among other names, the Reptrek Minitrek so that we can track our dog. Instead of the …

I recently switched to the gruvbox colours and polybar. Distribution: Arch Linux WM: bspwm Bar: polybar Launcher: rofi Multiplexer: …


I’m an avid supporter of open-source software, also known as free software (free as in free speech), and have published a lot of it. Most of my software is hosted on GitHub. I also maintain packages for Arch Linux and Debian, and actively promote good research software quality and sustainability.


Labirinto is a virtual laboratory portal that makes a collection of software browsable and searchable for the end user. It presents each tool’s metadata, following the CodeMeta specification, in an intuitive way and lets the user filter and perform a limited search. The portal gives direct access to software that offers web-based interfaces. The system is specifically geared towards research software and, for instance, allows linking each tool to relevant scientific publications.



A Python package for generating and working with CodeMeta



LaMachine is a unified software distribution for Natural Language Processing. We integrate numerous open-source NLP tools, programming libraries, web-services, and web-applications in a single Virtual Research Environment that can be installed on a wide variety of machines.



FoLiA Document Server - HTTP webservice backend for serving and annotating FoLiA documents using the FoLiA Query Language (FQL). Used by FLAT.



Gecco is a generic, modular and distributed framework for spelling correction. It aims to let you build a complete context-aware spelling correction system from your own data set.


Home Automation

My elaborate Home Automation configuration, powered by Home Assistant



FLAT is a web-based linguistic annotation environment built around the FoLiA format, a rich XML-based format for linguistic annotation. FLAT allows users to view annotated FoLiA documents and enrich them with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm. It is a document-centric tool that fully preserves and visualises document structure.


Colibri Core

Colibri Core is software to quickly and efficiently count and extract patterns from large corpus data, to extract various statistics on the extracted patterns, and to compute relations between the extracted patterns.



My dotfiles: configuration files for various applications on my Linux system



FoLiA is an XML-based annotation format, suitable for the representation of linguistically annotated language resources. FoLiA’s intended use is as a format for storing and/or exchanging language resources, including corpora. Our aim is to introduce a single rich format that can accommodate a wide variety of linguistic annotation types through a single generalised paradigm. We do not commit to any label set, language or linguistic theory.



PyNLPl, pronounced as ‘pineapple’, is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL), as well as clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
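To make the "n-grams and frequency lists" task concrete, here is a minimal sketch using only the Python standard library. It does not use PyNLPl's own API; the function names `ngrams` and `frequency_list` are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, Iterator, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous n-gram of a token sequence as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L contains L - n + 1 n-grams (none if L < n).
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def frequency_list(items: Iterable[Hashable]) -> Counter:
    """Count how often each item occurs (hypothetical helper)."""
    return Counter(items)


# Illustrative input: whitespace-separated tokens.
tokens = "to be or not to be".split()
bigram_counts = frequency_list(ngrams(tokens, 2))
print(bigram_counts.most_common(1))  # the bigram ('to', 'be') occurs twice
```

A `Counter` keeps the frequency list sortable by count through `most_common()`, which is usually all a simple count-based language model needs as input.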



Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.

2010-03 is an automatic spelling corrector for Dutch that detects ordinary typos as well as grammatical errors and confusions between existing words.



Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation.
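The kind of preprocessing described here can be sketched in a few lines of Python. This is a naive, regex-based illustration, not Ucto's implementation or API: the `tokenize` function, the pattern, and the set of sentence-final marks are all assumptions, and real tokenizers handle abbreviations, contractions and language-specific rules that this sketch ignores.

```python
import re
from typing import List

# Assumption: a token is either a run of word characters or a single
# punctuation character; whitespace is discarded.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

# Assumption: these marks always end a sentence (wrong for "e.g." etc.).
SENTENCE_END = {".", "!", "?"}


def tokenize(text: str, lowercase: bool = False) -> List[List[str]]:
    """Split text into sentences, each a list of word/punctuation tokens."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for token in TOKEN_PATTERN.findall(text):
        current.append(token.lower() if lowercase else token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing text without sentence-final punctuation
        sentences.append(current)
    return sentences


result = tokenize("Hello, world! This is a test.", lowercase=True)
for sentence in result:
    print(" ".join(sentence))
```

Printing tokens separated by spaces, one sentence per line, is a common plain-text layout for handing tokenized text to later steps such as indexing or tagging.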


Recent Publications

Multiword expressions (MWEs) are known as a ‘pain in the neck’ due to their idiosyncratic behaviour. While some categories …

We present an overview of the software and data infrastructure for FoLiA, a Format for Linguistic Annotation developed within the scope …

Scientific research relies on computer software, yet software is not always developed following practices that ensure its quality and …

The idea behind the Flemish/Dutch CLARIN project TTNWW (‘TST Tools voor het Nederlands als Webservices in een Workflow’, or ‘NLP Tools …



Through Nederlab, researchers and students can jointly search and analyse large numbers of digitised Dutch-language texts, dating from around 800 to the present, using user-friendly text analysis software developed within Nederlab. Nederlab thus offers a laboratory for research into patterns of change in the Dutch language and culture.



Spreek2Schrijf is a feasibility study into automatically converting the spoken language of the plenary sessions of the Tweede Kamer (the Dutch House of Representatives) into the written language currently used in the Handelingen (the official proceedings). It was commissioned by the Dienst Verslag en Redactie (DVR) of the Dutch Parliament.



The successor of CLARIN-NL



A Dutch-Frisian machine translation project in collaboration with the Fryske Akademy



Colibri is my PhD project; it investigates the role of source-language context information in machine translation.



The goal of DutchSemCor was to deliver a one-million-word Dutch corpus that is fully tagged with senses and domain tags.



The CLARIN infrastructure is a research infrastructure intended for humanities researchers who work with language data and tools.



UniLang is a Language Community on the internet for people interested in learning languages. I co-founded UniLang in October 2000 and it has been around ever since. UniLang has a wide variety of language resources and a community forum to meet like-minded people and learn together.


Talks & Posters

Presentation slides, screencasts …

Language Resources

Grammar Constructs

A phrasebook of various grammar constructions, intended for translation into many different languages as a parallel learning resource.


Dutch for Beginners

A 10-lesson Dutch course for absolute beginners


Esperanto Course

This is an English translation and adaptation of an Esperanto course by Wil van Ganswijk that was originally written in Dutch. It contains 20 lessons.


French for Beginners

A five-lesson French course for absolute beginners


Russian for Beginners

A 5-lesson Russian course for absolute beginners


Spanish for Beginners

A 10-lesson Spanish course for absolute beginners


Where is James?

Short story using basic vocabulary, intended for translation into many different languages as a parallel learning resource.


Vocabulairetrainer Arabisch-Nederlands

This vocabulary trainer covers the vocabulary of the course Taalverwerving Arabisch (Arabic Language Acquisition), lessons 1 through 29, at Utrecht University.


Woordenlijst Arabisch-Nederlands

This word list enumerates all the vocabulary of the course Taalverwerving Arabisch (Arabic Language Acquisition), lessons 1 through 29, at Utrecht University.


UniLang Basic Phrasebook

A basic phrasebook, containing commonly used phrases, intended for translation into many different languages as a parallel learning resource.


Dutch Pronunciation Guide

A pronunciation guide for Dutch.