Recent Posts

I bought the GPS tracker that is sold under names such as Reptrek Minitrek so that we can track our dog. Instead of the …

I recently switched to the gruvbox colours and polybar. Distribution: Arch Linux WM: bspwm Bar: polybar Launcher: rofi Multiplexer: …

A piano music video

Software

I’m an avid supporter of open-source software, also known as free software (free as in free speech), and have published a lot of it. Most of my software is hosted on GitHub. I also maintain packages for Arch Linux and Debian, and am active in promoting good research software quality and sustainability.

Labirinto

Labirinto is a virtual laboratory portal that makes a collection of software browsable and searchable for the end user. Labirinto presents the software’s metadata, following the CodeMeta specification, in an intuitive way and allows the user to filter and perform a limited search. The portal gives access to the software itself where it offers web-based interfaces. The system is specifically geared towards research software and, for instance, allows linking to relevant scientific publications for each tool.

2018-04

CodeMetaPy

A Python package for generating and working with CodeMeta metadata.

2018-04

LaMachine

LaMachine is a unified software distribution for Natural Language Processing. We integrate numerous open-source NLP tools, programming libraries, web-services, and web-applications in a single Virtual Research Environment that can be installed on a wide variety of machines.

2015-05

foliadocserve

FoLiA Document Server - HTTP webservice backend for serving and annotating FoLiA documents using the FoLiA Query Language (FQL). Used by FLAT.

2015-02

Gecco

Gecco is a generic, modular and distributed framework for spelling correction. It aims to let you build a complete context-aware spelling correction system from your own data set.

2015-01

Home Automation

My elaborate Home Automation configuration, powered by Home Assistant

2014-01

FLAT

FLAT is a web-based linguistic annotation environment based around the FoLiA format, a rich XML-based format for linguistic annotation. FLAT allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm. It is a document-centric tool that fully preserves and visualises document structure.

2013-12

Colibri Core

Colibri Core is software to quickly and efficiently count and extract patterns from large corpus data, to extract various statistics on the extracted patterns, and to compute relations between the extracted patterns.

2013-09

Dotfiles

My dotfiles: configuration files for various applications on my Linux system

2013-05

FoLiA

FoLiA is an XML-based annotation format, suitable for the representation of linguistically annotated language resources. FoLiA’s intended use is as a format for storing and/or exchanging language resources, including corpora. Our aim is to introduce a single rich format that can accommodate a wide variety of linguistic annotation types through a single generalised paradigm. We do not commit to any label set, language or linguistic theory.

2011-01

PyNLPl

PyNLPl, pronounced as ‘pineapple’, is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL), as well as clients to interface with various NLP-specific servers. Most notably, PyNLPl features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

2010-05
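
As an illustration of the basic tasks mentioned for PyNLPl (n-gram extraction and frequency lists), here is a minimal sketch in plain Python using only the standard library. It does not use PyNLPl’s own API; the function names `ngrams` and `frequency_list` are hypothetical and chosen for this example only.

```python
from collections import Counter
from typing import Iterator, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every run of n consecutive tokens (hypothetical helper)."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def frequency_list(tokens: Sequence[str], n: int) -> Counter:
    """Count how often each n-gram occurs in the token sequence."""
    return Counter(ngrams(tokens, n))


# Whitespace splitting stands in for a real tokenizer here.
tokens = "to be or not to be".split()
bigram_counts = frequency_list(tokens, 2)
print(bigram_counts.most_common(1))  # [(('to', 'be'), 2)]
```

A `Counter` keeps the sketch short; for large corpora a dedicated tool is likely the better choice, since holding every n-gram as a tuple of strings in memory becomes expensive.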

CLAM

Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.

2010-03

Valkuil.net

Valkuil.net is an automatic spelling corrector for Dutch that detects ordinary typos as well as grammatical errors and confusions between existing words.

2010-01

Ucto

Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation.

2009-12
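
To make the idea behind Ucto concrete, the following is a rough sketch of rule-based tokenization in plain Python: separating words from punctuation, then splitting the token stream into sentences. It is not Ucto’s implementation or API; the regular expression, the set of sentence-final marks, and the function names are all assumptions made for this example.

```python
import re
from typing import List

# Assumed rule: a token is either a word (optionally with internal
# hyphens or apostrophes) or a single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

# Assumed rule: these tokens always end a sentence.
SENTENCE_END = {".", "!", "?"}


def tokenize(text: str) -> List[str]:
    """Separate words from punctuation."""
    return TOKEN_PATTERN.findall(text)


def split_sentences(tokens: List[str]) -> List[List[str]]:
    """Group tokens into sentences, closing one at each end mark."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack an end mark
        sentences.append(current)
    return sentences


sentences = split_sentences(tokenize("Hello, world! Is this well-formed?"))
for sentence in sentences:
    print(" ".join(sentence))
```

This naive rule set splits abbreviations such as "e.g." at every period, which is one reason a dedicated tokenizer with language-specific rules is preferable for real text.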

Recent Publications

Multiword expressions (MWEs) are known as a ‘pain in the neck’ due to their idiosyncratic behaviour. While some categories …

We present an overview of the software and data infrastructure for FoLiA, a Format for Linguistic Annotation developed within the scope …

Scientific research relies on computer software, yet software is not always developed following practices that ensure its quality and …

The idea behind the Flemish/Dutch CLARIN project TTNWW (‘TST Tools voor het Nederlands als Webservices in een Workflow’, or ‘NLP Tools …

Counting n-grams lies at the core of any frequentist corpus analysis and is often considered a trivial matter. Going beyond consecutive …

Projects

Nederlab

Through Nederlab, researchers and students can search and analyse large numbers of digitised Dutch-language texts, dating from around 800 to the present, as a single collection, using user-friendly text analysis software developed within Nederlab. Nederlab thus offers a laboratory for research into patterns of change in Dutch language and culture.

2018-01

Spreek2Schrijf

Spreek2Schrijf is a feasibility study into automatically converting the spoken language of the plenary sessions of the Dutch House of Representatives (Tweede Kamer) into the written language currently used in the official proceedings (Handelingen). It was commissioned by the Dienst Verslag en Redactie (DVR) of the Dutch Parliament.

2018-01

CLARIAH

The successor of CLARIN-NL

2015-01

Oersetter

A Dutch-Frisian machine translation project in collaboration with the Fryske Akademy

2012-01

Colibri

Colibri is my PhD project; it investigates the role of source-language context information in machine translation

2011-08

DutchSemCor

The goal of DutchSemCor was to deliver a one-million-word Dutch corpus that is fully annotated with word senses and domain tags.

2009-08

CLARIN-NL

The CLARIN infrastructure is a research infrastructure intended for humanities researchers who work with language data and tools.

2009-01

UniLang

UniLang is a Language Community on the internet for people interested in learning languages. I co-founded UniLang in October 2000 and it has been around ever since. UniLang has a wide variety of language resources and a community forum to meet like-minded people and learn together.

2000-10

Talks & Posters

.. presentation slides, screencasts …

Language Resources

Grammar Constructs

A phrasebook of various grammar constructions, intended for translation into many different languages as a parallel learning resource.

2010-01

Dutch for Beginners

A 10-lesson Dutch course for absolute beginners

2008-07

Esperanto Course

This is a translation and English adaptation of an Esperanto course by Wil van Ganswijk that was originally in Dutch. It contains 20 lessons.

2008-07

French for Beginners

A 5-lesson French course for absolute beginners

2008-07

Russian for Beginners

A 5-lesson Russian course for absolute beginners

2008-07

Spanish for Beginners

A 10-lesson Spanish course for absolute beginners

2008-07

Where is James?

A short story using basic vocabulary, intended for translation into many different languages as a parallel learning resource.

2008-07

Vocabulairetrainer Arabisch-Nederlands

This vocabulary trainer covers the vocabulary of the course Taalverwerving Arabisch (Arabic Language Acquisition), lessons 1 through 29, at Utrecht University.

2005-03

Woordenlijst Arabisch-Nederlands

This word list contains all the vocabulary of the course Taalverwerving Arabisch (Arabic Language Acquisition), lessons 1 through 29, at Utrecht University.

2005-03

UniLang Basic Phrasebook

A basic phrasebook, containing commonly used phrases, intended for translation into many different languages as a parallel learning resource.

2003-01

Dutch Pronunciation Guide

A pronunciation guide for Dutch.

2001-12

Contact