Proycon's Homepage


I’m an avid supporter of free open-source software (FOSS, free as in free speech), and have published a lot of it. Most of my software is hosted on GitHub and Sourcehut. I also maintain packages for Arch Linux, Debian and Alpine Linux. I am active in promoting good research software quality and sustainability. In recent years, I have tended to favour more minimalistic software, as there is a lot of bloated, unmaintainable software and needless complexity around. It is of paramount importance to me that you are in control of your own software, and that it doesn't compromise your privacy or security. I wrote some posts on that subject in my blog section as well.

My main programming languages are Python, Rust, C, C++, shell scripting, and JavaScript.

Stand-off Text Annotation Model (STAM) is a data model for stand-off text annotation, where any information on a text is represented as an annotation.
Extensions for todo.txt: interactive rofi/fzf control, syncing GitHub issues, better colors, time tracking... and more!
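To illustrate the general idea of stand-off annotation (the text stays untouched and annotations point at it by offset), here is a minimal sketch. It is not STAM's actual API or data model: the names `TextSelector`, `Annotation` and `text_of` are hypothetical and chosen for this example only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TextSelector:
    """Hypothetical selector: a span given by character offsets (end is exclusive)."""
    begin: int
    end: int


@dataclass
class Annotation:
    """Hypothetical annotation: information about a span, stored apart from the text."""
    target: TextSelector
    data: dict = field(default_factory=dict)


def text_of(text: str, annotation: Annotation) -> str:
    """Resolve an annotation back to the part of the text it refers to."""
    return text[annotation.target.begin:annotation.target.end]


text = "Hello world"
annotations = [
    Annotation(TextSelector(0, 5), {"pos": "interjection"}),
    Annotation(TextSelector(6, 11), {"pos": "noun"}),
]

for annotation in annotations:
    print(text_of(text, annotation), annotation.data)
```

Because the annotations only hold offsets, the text itself never needs to be modified or marked up inline.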
Harvest and aggregate codemeta/ software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process
An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction
This is a virtual keyboard for Wayland (wlroots) compositors, intended to be used in environments where no physical keyboard is available, such as on a smartphone. It is used in Sxmo.
A vibration/audio feedback tool to be used with virtual keyboards
This is a simple virtual keyboard, intended to be used in environments where no physical keyboard is available, such as on a smartphone.
Sxmo, or Simple X Mobile, is a collection of simple and suckless X programs and scripts used together to create a fully functional mobile UI adhering to the Unix philosophy for the Pinephone.
Vocage is a minimalistic terminal-based vocabulary-learning tool.
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein).
Labirinto is a virtual laboratory portal that makes a collection of software browsable and searchable for the end-user. Labirinto presents the software's metadata following the CodeMeta specification in an intuitive way and allows the user to filter and perform a limited search. The portal gives access to software if it offers web-based interfaces. This system is specifically geared towards research software, and for instance allows linking to relevant scientific publications for each tool.
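As a rough illustration of what an edit distance measures, the sketch below computes the Levenshtein distance with the classic dynamic-programming approach. This is not the tool's own code and it is not Myers' algorithm; the function name `levenshtein` is simply chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.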
A Python package for generating and working with codemeta
LaMachine is a unified software distribution for Natural Language Processing. We integrate numerous open-source NLP tools, programming libraries, web-services, and web-applications in a single Virtual Research Environment that can be installed on a wide variety of machines.
FoLiA Document Server - HTTP webservice backend for serving and annotating FoLiA documents using the FoLiA Query Language (FQL). Used by FLAT.
Gecco is a generic, modular and distributed framework for spelling correction. It aims to build a complete context-aware spelling correction system from your own data set.
My elaborate Home Automation configuration, powered by Home Assistant
FLAT is a web-based linguistic annotation environment based around the FoLiA format, a rich XML-based format for linguistic annotation. FLAT allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm. It is a document-centric tool that fully preserves and visualises document structure.
Colibri Core is software to quickly and efficiently count and extract patterns from large corpus data, to extract various statistics on the extracted patterns, and to compute relations between the extracted patterns.
My dotfiles: configuration files for various applications on my Linux system
FoLiA is an XML-based annotation format, suitable for the representation of linguistically annotated language resources. FoLiA’s intended use is as a format for storing and/or exchanging language resources, including corpora. Our aim is to introduce a single rich format that can accommodate a wide variety of linguistic annotation types through a single generalised paradigm. We do not commit to any label set, language or linguistic theory.
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
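For readers unfamiliar with n-grams and frequency lists, here is a minimal sketch using only the Python standard library. It does not use PyNLPl's API; the helper `ngrams` is a hypothetical name written for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


tokens = "to be or not to be".split()

# A frequency list maps each n-gram to the number of times it occurs.
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(1))  # [(('to', 'be'), 2)]
```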
Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.
2010-01-01 is an automatic spelling corrector for Dutch that detects ordinary typos as well as grammatical errors and confusions between existing words.
Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation.