Software

All of my software is open source, and most of it is available through GitHub. Most of it is software for Natural Language Processing and runs only on Unix-like systems. In my blog I will talk more about my software and research. Some of the projects mentioned here are not exclusively mine but were done with colleagues or collaborators:

  • CoLiBri - CoLiBri (Constructions as Linguistic Bridges) is my PhD project at Radboud University Nijmegen. The project started in September 2011 and aims to research methods for finding constructions in parallel corpora, finding alignments between these constructions, and employing these in a Machine Translation system. The project is under heavy development. Written in C++.
  • FoLiA - FoLiA is an XML-based format for Linguistic Annotation suitable for representing written language resources such as corpora. Its goal is to unify a variety of linguistic annotations in one single rich format, without committing to any particular standard annotation set. Instead, it seeks to accommodate any desired system or tagset, and so offer maximum flexibility. This makes FoLiA language independent. Due to its generalised setup, it is easy to extend the FoLiA format to suit your custom needs for linguistic annotation.
  • CLAM - CLAM allows you to quickly and transparently transform your Natural Language Processing application into a RESTful webservice, with which both human end-users and automated clients can interact. CLAM takes a description of your system and wraps itself around the system, allowing end-users or automated clients to upload input files to your application, start your application with specific parameters of their choice, and download and view the output of the application once it is completed.
  • PyNLPl - PyNLPl stands for Python Natural Language Processing Library, and is pronounced as 'pineapple'. It is an extensive set of Python modules for a wide variety of NLP tasks. PyNLPl can be used, for example, for the computation of n-grams, frequency lists and distributions, and language models. There are also implementations of various data types, search algorithms, and parsers for formats found in the NLP field.
  • ucto - Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto is extensible and comes with tokenisation rules for several languages. Written in C++.
  • Frog - Frog is a suite for part-of-speech tagging, lemmatisation, morphological analysis and dependency parsing for Dutch. I participated in certain aspects of this software, developed at Tilburg University.
  • PBMBMT - PBMBMT (Phrase Based Memory-based Machine Translation) is a machine translation system built upon machine learning classifiers. It was my master's project at Tilburg University. Written in Python.
  • UniLang - Not really a software package, but rather a website, an online language community, which I have built in collaboration with a few other developers since 2001. It contains lots of custom software for the representation and viewing of language resources for language learning, as well as many custom enhancements to the forum software. Written in PHP.
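To give an idea of the kind of computation mentioned for PyNLPl above (n-grams and frequency lists), here is a minimal sketch in plain Python. It deliberately does not use PyNLPl itself: the function names `ngrams` and `frequency_list` and the sample sentence are made up for illustration and should not be read as the library's actual API.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def frequency_list(items):
    """Count occurrences and return (item, count) pairs, most frequent first."""
    return Counter(items).most_common()


# Hypothetical sample input; a real pipeline would tokenise the text first.
tokens = "the cat sat on the mat and the cat slept".split()

bigrams = list(ngrams(tokens, 2))
bigram_freq = frequency_list(bigrams)
unigram_freq = frequency_list(tokens)
```

In this sketch the n-grams are plain tuples, so they can be counted directly with `collections.Counter`; splitting on whitespace stands in for proper tokenisation.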
E-mail: proycon
IRC: proycon (on freenode and others)
XMPP: proycon
Telegram: proycon
TextSecure: (ask me)
Skype: proycon_linux (usually offline; I prefer WebRTC through talky.io instead!)
SIP: sip:proycon@sip.linphone.org
GnuPG Public Key: 0x1A31555C
Bitcoin donations: 1BRptZsKQtqRGSZ5qKbX2azbfiygHxJPsd