Here you will find all tools (i.e. software and software services) developed in the CLARIAH project, as well as some tools from predecessor and sister projects. This list is automatically harvested from the tool producers and providers themselves and is updated daily. Our tools are designed for researchers and developers in the Humanities and Social Sciences. Not all tools are suitable for all audiences, and not all tools are mature and stable. This information should be clearly indicated for each tool, so you can make an informed judgement about whether a tool might be suitable for you.

Are you a developer whose tool is not yet included in the index, or do you have questions or comments on the metadata? Please read our contribution guidelines.

Alpino

  • Command-line Application
  • Complete: The technology is complete, stable and deployed in production scenarios for end-users
  • Active: The project has reached a stable, usable state and is being actively developed.

Alpino 0.0.0

Alpino parser and related tools for Dutch [view more]
  • Linguistics
  • nwo:ComputationalLinguisticsandPhilology
  • Software for humanities
  • Structural Analysis
  • Docker
  • Linux
  • Web Application
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

Alpino-Webservice 2.4

  •   KNAW Humanities Cluster & CLST, Radboud University
Alpino is a dependency parser for Dutch, developed by Gertjan van Noord at the University of Groningen in the context of the PIONIER project Algorithms for Linguistic Processing. This is the webservice for it. You can upload either tokenised or untokenised files (the latter will be tokenised automatically using ucto). The output is a zip file containing XML files, one for each sentence in the input document. [view more]
  • Internet > WWW/HTTP > WSGI > Application
  • Text Processing > Linguistic
  • dependency parsing
  • folia
  • linguistics
  • nlp
  • syntax
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2015-09-08
Modified: 2023-11-01
  • Web Application
  • Complete: The technology is complete, stable and deployed in production scenarios for end-users
  • Active: The project has reached a stable, usable state and is being actively developed.

AlpinoGraph 1.0.5

AlpinoGraph is a tool for searching syntactically annotated corpora. The tool uses AgensGraph, which combines database technology (PostgreSQL) with Cypher, the standard query language for graphs. The search queries you can use in AlpinoGraph are therefore a mix of SQL and Cypher. AlpinoGraph adds a few extensions of its own, such as a simple but handy macro system and visualisation of the results. [view more]
  • Linguistics
  • nwo:ComputationalLinguisticsandPhilology
  • Software for humanities
  • Structural Analysis
  • Alpino
  • Cypher
  • Dependency parsing
  • SPOD: Syntactic profiler of Dutch
  • UD: Universal Dependencies
  • Docker
  • Linux
Created: 2020-03-25
Modified: 2024-04-24
  • Command-line Application
  • Software Library
  • Complete: The technology is complete, stable and deployed in production scenarios for end-users
  • Active: The project has reached a stable, usable state and is being actively developed.

alud 2.14.0

A Go package for deriving Universal Dependencies from Dutch sentences parsed with Alpino [view more]
  • Linguistics
  • nwo:ComputationalLinguisticsandPhilology
  • Software for humanities
  • Structural Analysis
  • Alpino
  • UD: Universal Dependencies
  • Aix
  • Android
  • Darwin
  • Dragonfly
  • Freebsd
  • Illumos
  • Ios
  • Js
  • Linux
  • Netbsd
  • Openbsd
  • Plan9
  • Solaris
  • Windows
Created: 2019-06-30
Modified: 2024-04-24
  • Active: The project has reached a stable, usable state and is being actively developed.

analiticcl 0.4.6

  •   KNAW Humanities Cluster & CLST, Radboud University
Analiticcl is an approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation [view more]
  • linguistics
  • nlp
  • spellcheck
  • spelling-correction
  • text-processing
Created: 2021-04-13
Modified: 2024-04-22

AnnoRepo

  • Web API
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Active: The project has reached a stable, usable state and is being actively developed.

AnnoRepo 0.6.3

Implementation of W3C Web Annotation Protocol (root project) [view more]
  • web-annotation
  • web-annotation-protocol
Created: 2022-03-24
Modified: 2024-04-03
  • Command-line Application
  • Planning: The technology is in an initial planning stage (pre-alpha), no implementation is available yet
  • Active: The project has reached a stable, usable state and is being actively developed.

annorepo-client 0.1.3

A Python client for accessing an AnnoRepo server [view more]
  • Os
  • Python
Created: 2022-04-07
Modified: 2023-11-29
  • Web Application
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

asrservice 0.3

An Automatic Speech Recognition Service for a variety of languages, powered by WhisperX [view more]
  • Internet > WWW/HTTP > WSGI > Application
  • Text Processing > Linguistic
  • clam webservice rest nlp computational_linguistics rest
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2024-02-16
Modified: 2024-04-12
  • Command-line Application
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

auchann 0.2.0

The AuChAnn (Automatic CHAT Annotation) package can generate CHAT annotations based on transcript-correction pairs of utterances. [view more]
Created: 2022-01-12
Modified: 2023-08-21
  • Web Application
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

Automatic Speech Recognition for Dutch 0.6.2

This is a web-based automatic speech recogniser for Dutch, capable of transcribing Dutch speech recordings using multiple models. [view more]
  • Software for humanities
  • Speech Recognizing
  • dutch
  • nlp
  • speech recognition
  • Linux
Created: 2017-04-02

Blacklab & Corpus Search

  • Web Application
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

A Blacklab Server CLARIN FCS 2.0 endpoint 0.1

CLARIAH Federated content search corpora, developed by the Dutch Language Institute (INT), is a service to enable searching in multiple Dutch corpora at the same time. This application implements the CLARIN FCS 2.0 specification on top of Dutch language corpora. This repository hosts the source code. [view more]
  • BlackLab
  • CLARIN
  • corpus search
  • FCS 2.0
  • Federated Content Search
  • Nederlab
Created: 2016-09-11
Modified: 2023-05-10
  • Active: The project has reached a stable, usable state and is being actively developed.
Created: 2012-10-04
Modified: 2022-10-06
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

INT Corpus Frontend 3.1.1

A web application to search corpora through the BlackLab Server web service. [view more]
  • corpus
Created: 2014-03-19
Modified: 2024-02-02
  • Active: The project has reached a stable, usable state and is being actively developed.
Created: 2022-06-29
Modified: 2024-09-02
  • Command-line Application
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

burgerLinker 0.0.1-SNAPSHOT

Command line tool for linking civil registries [view more]
Created: 2021-02-16
Modified: 2022-09-21
  • Command-line Application
  • Active: The project has reached a stable, usable state and is being actively developed.

CHAMD 0.5.12

  •   Jan Odijk
  •   Sheean Spoel
  •   Jelte van Boheemen
Conversion and cleaning of CHILDES CHA files into PaQu Plaintext Metadata Format (to convert to Alpino). [view more]
Created: 2017-03-15
Modified: 2024-03-13
  • Command-line Application
  • Server Application
  • Software Library
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

CLAM 3.2.10

Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice. [view more]
  • natural language processing
  • nlp
  • rest
  • webservice
  • Linux
Created: 2010-03-21
Modified: 2024-03-14
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

CLARIAH LD Proxy 1.0-SNAPSHOT

Keep your LD URIs resolvable [view more]
Created: 2021-04-18
Modified: 2021-06-28

CLARIAH Tool Discovery

  • Web Application
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

CLARIAH Tool Discovery 1.6.4

This is the over-arching project for CLARIAH Tool Discovery. Its components harvest and aggregate codemeta from source repositories and service endpoints, automatically converting known metadata schemes in the process. This project holds the Tool Source Registry, pointing to all the tools that are to be harvested. It also holds the validation schema. [view more]
  • Browsing
  • Databases for humanities
  • Discovering
  • Exploration
  • Gathering
  • Software for humanities
  • codemeta
  • harvester
  • linked data
  • metadata
  • rdf
  • schema.org
  • software metadata
Created: 2022-01-05
Modified: 2024-06-04
  • Command-line Application
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

codemeta-harvester 0.4.0

Harvest and aggregate codemeta from source repositories and service endpoints, automatically converting known metadata schemes in the process [view more]
  • codemeta
  • harvester
  • linked data
  • metadata
  • rdf
  • schema.org
  • software metadata
Created: 2022-01-05
Modified: 2024-06-03
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

codemeta-lod-to-cmdi 1.0-SNAPSHOT

CLARIAH Tool Discovery output (LOD -> CMDI conversion) [view more]
Created: 2023-02-01
Modified: 2023-05-15
  • Server Application
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

codemeta-server 0.4.1

Web API serving software metadata using codemeta and schema.org. It provides a SPARQL endpoint and also offers a web interface for human users [view more]
  • Software Development
  • codemeta
  • linked data
  • metadata
  • rdf
  • schema.org
  • scientific
  • software metadata
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2022-03-22
Modified: 2023-11-24
  • Command-line Application
  • Software Library
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

codemeta2html 0.1.0

Convert software metadata in codemeta to html for visualisation, can generate fully-fledged static sites that serve well as a portal for a collection of software [view more]
  • Software Development
  • codemeta
  • linked data
  • metadata
  • rdf
  • schema.org
  • scientific
  • software metadata
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2023-05-06
Modified: 2023-05-15
  • Command-line Application
  • Software Library
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

CodeMetaPy 2.5.3

Codemetapy is a command-line tool and Python library to work with the codemeta software metadata standard. Codemeta builds upon schema.org and defines a vocabulary for describing software source code. It maps various existing metadata standards to a unified vocabulary. Codemetapy allows you to generate codemeta from various sources. [view more]
  • Computer science
  • Converting
  • Software Development
  • codemeta
  • linked data
  • metadata
  • metadata-extractor
  • rdf
  • schema.org
  • scientific
  • software metadata
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2018-04-16
Modified: 2024-06-14
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

CMD2RDF 1.0.1

No description provided
  • cmdi
  • linked-data
  • metadata-conversion
  • rdf
Created: 2014-08-07
Modified: 2021-03-07
  • Suspended: Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.

COBALT

Corpus annotation tool [view more]
  • corpus
Created: 2014-04-14
Modified: 2020-07-17
  • Command-line Application
  • Active: The project has reached a stable, usable state and is being actively developed.

Colibri Core 2.5.9

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. [view more]
  • language modelling
  • natural language processing
  • ngrams
  • nlp
  • pattern recognition
  • skipgrams
  • Bsd
  • Linux
  • Macos
Created: 2013-09-15
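
To illustrate the n-gram and skipgram patterns mentioned in the description above, here is a minimal plain-Python sketch. It does not use Colibri Core or reproduce its API. The function names and the `{*}` gap marker are hypothetical choices for illustration, and only fixed-size gaps (one gap token per skipped word) are shown.

```python
from itertools import combinations

GAP = "{*}"  # hypothetical gap marker, chosen for this illustration only


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n):
    """Return n-token patterns with one or more inner tokens replaced by a gap.

    The first and last token of each pattern are always kept, so a pattern
    never starts or ends with a gap.
    """
    results = []
    inner = range(1, n - 1)  # positions that may be turned into gaps
    for gram in ngrams(tokens, n):
        for k in range(1, len(inner) + 1):
            for gapped in combinations(inner, k):
                results.append(
                    tuple(GAP if i in gapped else tok for i, tok in enumerate(gram))
                )
    return results


if __name__ == "__main__":
    words = "to be or not to be".split()
    print(ngrams(words, 3))
    print(skipgrams(words, 3))
```

This naive version materialises every pattern as a tuple of strings, which is easy to read but not memory-efficient on large corpora.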
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

Corpus Editor for Syntactically Annotated Resources (Cesar) unknown

  •   Erwin Komen
Django web application that communicates with the CorpusStudioWeb back-end 'Crpp'. Two main purposes: (1) browse texts, (2) conduct syntactic searches with definable output per hit. Searches are translated to XQuery 'under the hood'. [view more]
  • syntax
  • xquery
  • Posix
Created: 2018
  • Command-line Application
  • Active: The project has reached a stable, usable state and is being actively developed.

cow_csvw 1.21

Integrated CSV to RDF converter, using CSVW and nanopublications [view more]
  • csv
  • csvw
  • rdf
Created: 2023-11-09
Modified: 2024-03-08

DANE

  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

DANE 0.4.3

Utils for working with the Distributed Annotation and Enrichment system [view more]
  • Multimedia > Video
  • Scientific/Engineering > Artificial Intelligence
  • Software Development > Libraries > Python Modules
Created: 2019-11-25
Modified: 2024-05-13
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

dane-asr-worker 0.1.0

  •   Nanne van Noord
  •   Jaap Blom
Automatic speech recognition through an external service. Depends on DANE-server [view more]
  • Multimedia processing
Created: 2022-02-15
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

dane-download-worker 0.9.0

  •   Nanne van Noord
  •   Jaap Blom
Basic "DANE worker" that downloads input data via HTTP(s) URLs for further processing by other DANE workers. Depends on DANE-server [view more]
  • Multimedia processing
Created: 2022-02-08
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

DANE-server 0.3.1

Back-end for the Distributed Annotation 'n' Enrichment (DANE) system [view more]
Created: 2020-01-22
Modified: 2023-06-19
  • Software Library
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

dane-workflows 0.9.0

  •   Jaap Blom
  •   The Netherlands Institute for Sound and Vision
Python library for setting up simple data processing workflows (using DANE) [view more]
  • Multimedia processing
Created: 2022-07-18
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Suspended: Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.

deepfrog 0.2.1

  •   KNAW Humanities Cluster & CLST, Radboud University
A deep learning NLP suite (PoS,lemmatiser,NER) with FoLiA XML support [view more]
  • science
  • annotation
  • linguistics
  • nlp
  • text-processing
  • xml
Created: 2020-02-08
Modified: 2021-04-11
  • Active: The project has reached a stable, usable state and is being actively developed.
Created: 2022-07-19
Modified: 2024-05-22
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

did-summarizer

Linked Data summarizer driven by Decentralized Identifiers (DIDs) [view more]
Created: 2022-11-25
Modified: 2024-02-02
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Suspended: Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.
Created: 2019-04-29
Modified: 2020-07-08
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

Electronisch woordenboek van de Achterhoekse en Liemerse dialecten unknown

  •   Erwin Komen
Django web application that facilitates viewing and searching a dictionary of Dutch dialects from the regions 'Achterhoek' and 'Liemers' [view more]
  • dialect
  • dictionary
  • dutch
  • Posix
Created: 2019
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

Electronisch woordenboek van de Brabantse dialecten unknown

  •   Erwin Komen
Django web application that facilitates viewing and searching a dictionary of dialects from the Dutch province 'Noord-Brabant' as well as the Belgian provinces of Antwerpen, Vlaams-Brabant and Brussels [view more]
  • dialect
  • dictionary
  • dutch
  • Posix
Created: 2017
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

Electronisch woordenboek van de Gelderse dialecten unknown

  •   Erwin Komen
Django web application that facilitates viewing and searching a dictionary of Dutch dialects from the province 'Gelderland' [view more]
  • dialect
  • dictionary
  • dutch
  • Posix
Created: 2019
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

Electronisch woordenboek van de Limburgse dialecten unknown

  •   Erwin Komen
Django web application that facilitates viewing and searching a dictionary of the Dutch Limburgian dialects [view more]
  • dialect
  • dictionary
  • dutch
  • Posix
Created: 2016

FLAT

  • Web Application
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

FoLiA-Linguistic-Annotation-Tool 0.11.5

  •   KNAW Humanities Cluster & CLST, Radboud University
FLAT is a web-based linguistic annotation environment based around the FoLiA format (https://proycon.github.io/folia), a rich XML-based format for linguistic annotation. FLAT allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm. [view more]
  • Text Processing > Linguistic
  • annotation
  • computational linguistics
  • folia
  • linguistics
  • nlp
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2014-01-02
Modified: 2024-07-05
  • Command-line Application
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

foliadocserve 0.7.8

  •   KNAW Humanities Cluster & CLST, Radboud University
The FoLiA Document Server is a backend HTTP service to interact with documents in the FoLiA format, a rich XML-based format for linguistic annotation (http://proycon.github.io/folia). It provides an interface to efficiently edit FoLiA documents through the FoLiA Query Language (FQL). [view more]
  • Text Processing > Linguistic
  • nlp computational_linguistics rest database document server
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2015-02-12
Modified: 2024-02-07

FoLiA

  • Software Library
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

folia 0.0.6

  •   KNAW Humanities Cluster & CLST, Radboud University
High-performance library for handling the FoLiA XML format (Format for Linguistic Annotation) [view more]
  • science
  • annotation
  • linguistics
  • nlp
  • text-processing
  • xml
Created: 2019-06-08
Modified: 2020-11-16
  • Command-line Application
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

FoLiA tools 2.5.7

  •   KNAW Humanities Cluster & CLST, Radboud University
FoLiA-tools contains various Python-based command line tools for working with FoLiA XML (Format for Linguistic Annotation) [view more]
  • Annotating
  • https://w3id.org/nwo-research-fields#ComputationalLinguisticsandPhilology
  • Textual and linguistic corpora
  • annotation
  • computational linguistics
  • folia
  • nlp
  • search
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2011-01-14
Modified: 2024-05-14
  • Software Library
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

FoLiApy 2.5.11

  •   KNAW Humanities Cluster & CLST, Radboud University
An extensive library for processing FoLiA documents. FoLiA stands for Format for Linguistic Annotation and is a very rich XML-based format used by various Natural Language Processing tools. [view more]
  • Annotating
  • https://w3id.org/nwo-research-fields#ComputationalLinguisticsandPhilology
  • Textual and linguistic corpora
  • annotation
  • computational linguistics
  • folia
  • format
  • nlp
  • xml
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2010-05-27
Modified: 2024-03-28
  • Command-line Application
  • Active: The project has reached a stable, usable state and is being actively developed.

foliautils 0.22

Command-line utilities for working with the Format for Linguistic Annotation (FoLiA). [view more]
  • folia
  • linguistic annotation
  • natural language processing
  • nlp
  • xml
  • Posix
  • Command-line Application
  • Software Library
  • Active: The project has reached a stable, usable state and is being actively developed.

libfolia 2.20

This is a C++ Library for working with the Format for Linguistic Annotation (FoLiA). [view more]
  • folia
  • linguistic annotation
  • natural language processing
  • nlp
  • xml
  • Posix
  • Web Application
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

piereling 0.4

  •   KNAW Humanities Cluster & CLST, Radboud University
Piereling is a webservice and web-application to convert between a variety of document formats, mostly from and to FoLiA XML. It is intended for NLP pipelines. [view more]
  • Internet > WWW/HTTP > WSGI > Application
  • Text Processing > Linguistic
  • webservice nlp computational_linguistics rest folia conversion
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2019-10-18
Modified: 2023-11-01
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

Forced Alignment 2 0.3.1

This webservice provides an output file with word alignments given an NL speech recording and a transcription. [view more]
  • alignment
  • speech recognition
  • Linux
Created: 2020-03

Frog

  • Command-line Application
  • Software Library
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

Frog 0.33

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. It performs automatic linguistic enrichment such as part of speech tagging, lemmatisation, named entity recognition, shallow parsing, dependency parsing and morphological analysis. All NLP modules are based on TiMBL. [view more]
  • Annotating
  • Contextualizing
  • Linguistics
  • Named Entity Recognition
  • POS-Tagging
  • Segmenting
  • Tagging
  • Textual and content analysis
  • Tree-Tagging
  • dependency parsing
  • dutch
  • lemma
  • lemmatisation
  • natural language processing
  • ner
  • nlp
  • parser
  • part-of-speech tagging
  • pos
  • shallow parsing
  • tagger
  • Bsd
  • Linux
  • Macos
Created: 2011-03-31
Modified: 2023-12-05
  • Web Application
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

Frog-Webservice 2.7

Frog is a suite containing a tokeniser, Part-of-Speech tagger, lemmatiser, morphological analyser, shallow parser, and dependency parser for Dutch. This is the webservice for it, for both humans and machines. [view more]
  • Annotating
  • Contextualizing
  • Linguistics
  • Named Entity Recognition
  • POS-Tagging
  • Segmenting
  • Tagging
  • Textual and content analysis
  • Tree-Tagging
  • clam webservice rest nlp computational_linguistics rest
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2022-02-17
Modified: 2023-12-05
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

python-frog 0.6.10

Python binding to Frog, an NLP suite for Dutch doing part-of-speech tagging, lemmatisation, morphological analysis, named-entity recognition, shallow parsing, and dependency parsing. [view more]
  • Annotating
  • Contextualizing
  • Linguistics
  • Named Entity Recognition
  • POS-Tagging
  • Segmenting
  • Tagging
  • Textual and content analysis
  • Tree-Tagging
  • nlp computational_linguistics dutch pos lemmatizer
  • Bsd
  • Cython
  • Linux
  • Macos
  • Python
Created: 2014-09-07
Modified: 2023-12-05
  • Active: The project has reached a stable, usable state and is being actively developed.

toad v0.8

Toad: Trainer Of All Data, the Frog training collection [view more]
Created: 2015-12-08
Modified: 2023-02-22
  • Proof of Concept: An initial proof-of-concept implementation of the technology is available (alpha). It is not mature enough for end-users yet.
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

fusus 0.0.2

  •   Among, A Community for DH and MS
Workflow for converting Arabic scanned pages into readable text [view more]
  • Religion
  • Scientific/Engineering > Information Analysis
  • Sociology > History
  • Text Processing
  • Text Processing > Fonts
  • Text Processing > Markup
  • arabic
  • image processing
  • islam
  • medieval
  • OCR
  • text
  • Macos
  • Microsoft
  • Posix
  • Python
Created: 2020-03-03
Modified: 2023-04-11
  • Web Application
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Active: The project has reached a stable, usable state and is being actively developed.

g2pservice 0.3.4

  •   Louis ten Bosch
Grapheme to Phoneme converter. Input is a list of words (utf8). Choose one of the language options. [view more]
  • Internet > WWW/HTTP > WSGI > Application
  • Text Processing > Linguistic
  • speech
  • transcription
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2019-02-25
Modified: 2023-05-12

GaLAHaD

  • Server Application
  • Software Image
  • Web API
  • Web Application
  • 6 - Late prototype: Technology demonstrated in target setting, end-users adopt it for testing purposes.
  • Active: The project has reached a stable, usable state and is being actively developed.

GaLAHaD 1.2.2

GaLAHaD (Generating Linguistic Annotations for Historical Dutch) allows linguists to compare taggers, tag their own corpora, evaluate the results and export their tagged documents. [view more]
  • Analyzing
  • Annotating
  • Artificial intelligence, expert systems
  • Comparing
  • Computational linguistics and philology
  • Converting
  • Enriching
  • Lemmatizing
  • Linguistics
  • Machine Learning
  • Merging
  • POS-Tagging
  • Software for humanities
  • Tagging
  • Textual and linguistic corpora
  • Jvm
  • Linux
  • Node
Created: 2024-05-31
Modified: 2024-08-30
  • 6 - Late prototype: Technology demonstrated in target setting, end-users adopt it for testing purposes.
  • Active: The project has reached a stable, usable state and is being actively developed.

GaLAHaD Train Battery 1.0.0

Python program for training linguistic annotation taggers based on a configuration file and list of datasets. It prepares the resulting trained models for dockerization and adds relevant metadata. It is tagger software agnostic as long as a simple python shell is built around it. [view more]
  • Artificial intelligence, expert systems
  • Computational linguistics and philology
  • Linguistics
  • Linux
  • Python
Created: 2024-05-31
Modified: 2024-06-04
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

int-pie 1.0.0

  •   Enrique Manjavacas
  •   Mike Kestemont
  •   Thibault Clerice
The PIE tagger with custom modifications by the Dutch Language Institute (INT). [view more]
  • Analyzing
  • Annotating
  • Artificial intelligence, expert systems
  • Computational linguistics and philology
  • Enriching
  • Lemmatizing
  • Linguistics
  • Machine Learning
  • POS-Tagging
  • Tagging
  • Linux
  • Python
Created: 2024-05-31
Modified: 2024-06-05
  • Command-line Application
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Unsupported: The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.

Gecco 0.3.0

  •   KNAW Humanities Cluster & CLST, Radboud University
Generic Environment for Context-Aware Correction of Orthography [view more]
  • Text Processing > Linguistic
  • spelling corrector spell check nlp computational_linguistics rest
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2015-01-08
Modified: 2020-07-11
  • Command-line Application
  • Active: The project has reached a stable, usable state and is being actively developed.

Generale Missieven in Text-Fabric v1.1e

Conversion of the Generale Missieven to Text-Fabric, with a tutorial on how to work with the result [view more]
  • corpus-data
  • corpus-linguistics
  • corpus-processing
  • corpus-tools
  • dutch
  • history
  • nlp
  • Linux
  • Macos
  • Python
  • Windows
Created: 2020-09-02
Modified: 2024-03-27
  • Command-line Application
  • Web Application
  • https://w3id.org/research-technology-readiness-level#Level8Complete
    Warning: Status is not expressed in a known vocabulary
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

Glem 1.3.1

  •   Corien Bary
  •   Peter Berck
  •   Iris Hendrickx
  •   Wessel Stoop
GLEM is a lemmatizer for Ancient Greek. [view more]
  • Annotating
  • Computational linguistics and philology
  • Greek and Latin philology and literature
  • ancient greek
  • greek
  • lemma
  • lemmatisation
  • natural language processing
  • nlp
  • Posix
Created: 2017-04-09
Modified: 2023-10-05
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

grlc: the git repository linked data API constructor 1.3.7

  •   Albert Meroño-Peñuela
  •   Carlos Martinez
grlc, the git repository linked data API constructor, automatically builds Web APIs using SPARQL queries stored in git repositories. [view more]
  • linked-data
  • linked-data-api
  • semantic-web
  • sparql
  • swagger-ui
Created: 2015-11-13
Modified: 2022-03-21
  • Active: The project has reached a stable, usable state and is being actively developed.

hypodisc 0.1.0

  •   VU University Amsterdam
Hypothesis Discovery on RDF Knowledge Graphs [view more]
  • hypothesis generation
  • knowledge graphs
  • pattern discovery
  • rdf
  • Os
  • Python
Created: 2023-07-27
Modified: 2024-06-03
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

I-Analyzer 5.3.0

  •   Research Software Lab, Centre for Digital Humanities, Utrecht University
I-analyzer is a tool for exploring corpora (large collections of texts). You can use I-analyzer to find relevant documents, or to make visualisations to understand broader trends in the corpus. The interface is designed to be accessible for users of all skill levels. I-analyzer is primarily intended for academic research and higher education. We focus on data that is relevant for the humanities, but we are open to datasets that are relevant for other fields. [view more]
  • corpus research
  • data visualization
  • elasticsearch
  • natural language processing
  • text-mining
Created: 2016-09-01
Modified: 2023-12-08
  • Web Application

ineo

No description provided
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.
Created: 2023-05-17
Modified: 2024-09-19
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

Kaldi_NL v0.4.3

Code related to the Dutch instance and user groups of the KALDI speech recognition toolkit [view more]
  • dutch
  • kaldi
  • speech-recognition
  • speech-recognition-model
Created: 2016-04-22
Modified: 2023-11-01
  • Unsupported: The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.

LaMachine 2.28

LaMachine is a unified software distribution for Natural Language Processing. We integrate numerous open-source NLP tools, programming libraries, web-services, and web-applications in a single Virtual Research Environment that can be installed on a wide variety of machines. [view more]
  • installer
  • natural language processing
  • nlp
  • python
  • software distribution
  • Posix
Created: 2015-05-17

Lenticular Lens

  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

lenticular-lens 1.17

Lenticular Lens is a tool that allows users to construct linksets between entities from different Timbuctoo datasets (so-called data alignment or reconciliation). Lenticular Lens tracks the configuration and the algorithms used in the alignment, and can also report on manual corrections and the amount of manual validation done. [view more]
Created: 2019-01-16
Modified: 2022-10-26
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.
Created: 2021-01-22
Modified: 2023-08-16
  • Software Library
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

lingua-cli 0.4.0

  •   KNAW Humanities Cluster & CLST, Radboud University
Lingua-cli is a command line tool for language classification, using the lingua-rs library. [view more]
Created: 2022-04-16
Modified: 2024-06-10
  • Command-line Application
  • Software Library
  • Active: The project has reached a stable, usable state and is being actively developed.

mbt 3.10

MBT is a memory-based tagger-generator and tagger in one. The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences. MBT can, for instance, be used to generate part-of-speech taggers or chunkers for natural language processing. It has also been used for named-entity recognition, information extraction in domain-specific texts, and disfluency chunking in transcribed speech. [view more]
  • machine learning
  • memory based learning
  • natural language processing
  • nlp
  • tagger
  • Bsd
  • Linux
  • Macos

Media Suite

  • Web Application
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

CLARIAH Media Suite 6.10

  •   Jaap Blom
The CLARIAH Media Suite is a research environment in which researchers can search, bookmark, annotate and compare items from a number of cultural heritage collections [view more]
  • collection analysis
  • cultural heritage
  • data portal
  • faceted search
  • scholarly annotation
  • virtual workspace
  • Linux
Created: 2023-11-21
Modified: 2023-11-21

Nederlab

  • Software Library
  • Active: The project has reached a stable, usable state and is being actively developed.

MTAS 8.11.1.0

Multi Tier Annotation Search, a Solr/Lucene based library and plugin providing search and analysis on annotated and structured text. [view more]
  • annotations
  • big-data
  • cql
  • distributed
  • lucene
  • search
  • search-engine
  • search-in-text
  • solr
  • structure
  • text
  • text-analysis
Created: 2016-07-11
Modified: 2022-01-14
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

Nederlab Pipeline 0.8.0

A set of workflows for linguistic enrichment of historical Dutch [view more]
  • natural language processing
  • nlp
  • Posix
Created: 2017

Netwerk Digitaal Erfgoed (NDE)

  • Web Application
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

Network of Terms GraphQL API

GraphQL API for the Network of Terms, a search engine for finding terms in terminology sources (such as thesauri, classification systems and reference lists) [view more]
  • Identifying
  • graphql
  • linked-data
  • search
Created: 2020-04-17
  • Web Application
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

Network of Terms Reconciliation API

Reconciliation API for the Network of Terms, a search engine for finding terms in terminology sources (such as thesauri, classification systems and reference lists) [view more]
  • Identifying
  • graphql
  • linked-data
  • search
Created: 2020-04-17
  • Software Library
  • Suspended: Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.

OpenDutchWordnet

This repository provides a Python module for working with Open Dutch WordNet. It was created using Python 3.4. [view more]
Created: 2015-09-01
Modified: 2021-05-11
  • Command-line Application
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Active: The project has reached a stable, usable state and is being actively developed.

pagexml-tools 0.5.0

  •   Marijn Koolen
  •   Bram Buitendijk
Utility functions for reading PageXML files [view more]
  • Scientific/Engineering
  • Os
  • Python
Created: 2021-05-07
Modified: 2024-03-18
  • Web Application
  • Complete: The technology is complete, stable and deployed in production scenarios for end-users
  • Active: The project has reached a stable, usable state and is being actively developed.

PaQu 1.0.5

With PaQu (Parse & Query) you can search syntactically annotated Dutch-language corpora. PaQu supports two ways of searching. The first, simple way lets you search for word pairs, optionally together with their syntactic relation. The second, more complex way uses the XPath query language. A number of syntactically annotated corpora are available in PaQu by default, but you can also submit your own texts. These texts are then analysed by the automatic parser and stored, after which you can search them in the same way. [view more]
  • Linguistics
  • nwo:ComputationalLinguisticsandPhilology
  • Software for humanities
  • Structural Analysis
  • Alpino
  • Dependency parsing
  • SPOD: Syntactic profiler of Dutch
  • UD: Universal Dependencies
  • XPath
  • Docker
  • Linux
Created: 2014-05-21
Modified: 2024-04-24
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.
Created: 2022-04-19
Modified: 2024-04-11
  • Command-line Application
  • Software Library
  • Web Application
  • 6 - Late prototype: Technology demonstrated in target setting, end-users adopt it for testing purposes.
  • Active: The project has reached a stable, usable state and is being actively developed.

Ricgraph - Research in context graph v2.4

  •   Rik D.T. Janssen
Ricgraph, also known as Research in context graph, enables the exploration of researchers, teams, their results, collaborations, skills, projects, and the relations between these items. Ricgraph can store many types of items in a single graph. These items can be obtained from various systems and from multiple organizations. Ricgraph facilitates reasoning about these items because it infers new relations between items, relations that are not present in any of the separate source systems. It is flexible and extensible, and can be adapted to new application areas. Throughout this text, we illustrate how Ricgraph works by applying it to the application area of research information.

Motivation

Ricgraph, also known as Research in context graph, is software that is about relations between items. These items can be collected from various source systems and from multiple organizations. We explain how Ricgraph works by applying it to the application area of research information. We show the insights that can be obtained by combining information from various source systems, insights arising from new relations that are not present in each separate source system.

Research information is about anything related to research: research results, the persons in a research team, their collaborations, their skills, projects in which they have participated, as well as the relations between these entities. Examples of research results are publications, data sets, and software.

Example use cases from the application area of research information are:
(1) As a journalist, I want to find researchers with a certain skill and their publications, so that I can interview them for a newspaper article.
(2) As a librarian, I want to enrich my local research information system with research results that are in other systems but not in ours, so that we have a more complete view of research at our university.
(3) As a researcher, I want to find researchers from other universities that have co-authored publications written by the co-authors of my own publications, so that I can read their publications to find out if we share common research interests.

These use cases use different types of information (called items): researchers, skills, publications, etc. Most often, these types of information are not stored in one system, so the use cases may be difficult or time-consuming to answer. However, by using Ricgraph, these use cases (and many others) are easy to answer. Although this text illustrates Ricgraph in the application area of research information, the principle "relations between items from various source systems" is general, so Ricgraph can be used in other application areas.

Main contributions of Ricgraph

(1) Ricgraph can store many types of items in a single graph.
(2) Ricgraph harvests multiple source systems into a single graph.
(3) Ricgraph Explorer is the exploration tool for Ricgraph.
(4) Ricgraph facilitates reasoning about items because it infers new relations between items.
(5) Ricgraph can be tailored for an application area.

Read more about Ricgraph

For a gentle introduction to Ricgraph, read the reference publication: Rik D.T. Janssen (2024). Ricgraph: A flexible and extensible graph to explore research in context from various systems. SoftwareX, 26(101736). https://doi.org/10.1016/j.softx.2024.101736

Extensive documentation, publications, videos and source code can be found in the GitHub repository https://github.com/UtrechtUniversity/ricgraph

The website for Ricgraph can be found at https://www.ricgraph.eu [view more]
  • Analyzing
  • Browsing
  • Capturing
  • Discovering
  • Enriching
  • Exploration
  • Information Retrieval
  • Storing
  • Data enrichment
  • Data harvesting
  • Data linking
  • Graph
  • Graph database
  • Knowledge graph
  • Linked data
  • Metadata
  • Research in context graph
  • Ricgraph
  • Ricgraph Explorer
  • Ricgraph REST API
  • Utrecht University
Created: 2023-01-10
Modified: 2024-09-10
  • Web Application
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

search-ui 1.0.0

This repository contains the code for a Search UI to test the functionality of the basic vocabulary-recommender. [view more]
Created: 2022-11-15
Modified: 2022-12-21
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

shebanq v4.2z

Exposing the Hebrew Text Database of the ETCBC [view more]
  • annotation
  • etcbc
  • etcbc-data
  • hebrew
  • hebrew-bible
  • search-engine
  • text-fabric
  • Linux
  • Macos
  • Python
  • Selinux
  • Windows
Created: 2017-10-19
Modified: 2022-10-12
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

SPAQ

SPAQ (speech acquisition using Surveys) [view more]
Created: 2021-01-26
Modified: 2024-02-14

STAM

  • Software Library
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

stam 0.16.3

STAM is a powerful library for dealing with stand-off annotations on text. This is the Rust library. [view more]
  • Annotating
  • Textual and content analysis
  • Textual and linguistic corpora
  • annotation
  • linguistics
  • nlp
  • standoff
  • text-processing
Created: 2023-01-03
Modified: 2024-10-04
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Active: The project has reached a stable, usable state and is being actively developed.

stam v1.1.1

Stand-off Text Annotation Model (STAM) is a data model for stand-off-text annotation where any information on a text is represented as an annotation. This repository contains the model's full specification, extensions, schemas, examples and documentation. [view more]
  • Annotating
  • Textual and content analysis
  • Textual and linguistic corpora
  • annotation
  • linguistics
  • stand-off
  • text
  • text-annotation
  • webannotation
Created: 2021-09-09
Modified: 2024-09-17
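
The stand-off idea described for STAM above (the text is left untouched and every piece of information points into it) can be illustrated with a minimal sketch. This is not the STAM data model or the API of any STAM library: the `Annotation` class, its fields, and the sample text are hypothetical and serve only to show annotations that refer to character offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-off annotation: offsets into a text plus a key/value pair."""
    begin: int  # offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive)
    key: str
    value: str


def annotated_text(text: str, annotation: Annotation) -> str:
    """Return the span of the text that the annotation points at."""
    return text[annotation.begin:annotation.end]


# The text itself is never modified; annotations live next to it.
TEXT = "Alpino parses Dutch sentences."
ANNOTATIONS = [
    Annotation(0, 6, "pos", "PROPN"),
    Annotation(14, 19, "language", "nl"),
]

if __name__ == "__main__":
    for annotation in ANNOTATIONS:
        span = annotated_text(TEXT, annotation)
        print(f"{span!r}: {annotation.key}={annotation.value}")
```

Because the annotations only hold offsets, several independent annotation layers can refer to the same text without conflicting markup.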
  • Software Library
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

stam 0.10.0

STAM is a library for dealing with stand-off annotations on text. This is the Python binding. [view more]
  • Annotating
  • Textual and content analysis
  • Textual and linguistic corpora
  • annotation
  • linguistics
  • nlp
  • standoff
  • text-processing
Created: 2023-01-31
Modified: 2024-10-04
  • Command-line Application
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

stam-tools 0.9.0

Command-line tools for working with stand-off annotations on text (STAM) [view more]
  • Annotating
  • Textual and content analysis
  • Textual and linguistic corpora
  • annotation
  • linguistics
  • nlp
  • standoff
  • text-processing
Created: 2023-03-21
Modified: 2024-10-04
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

T-Scan 0.10.0

T-Scan is an analysis tool for Dutch texts to assess the complexity of the text, and is based on original work by Rogier Kraf [view more]
  • dutch
  • feature extraction
  • natural language processing
  • nlp
  • readability
  • Posix
Created: 2012-09-12
  • Command-line Application
  • Software Library
  • Web Application
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

text-fabric 12.5.5

  •   Dirk Roorda
Processor and browser for annotated text corpora [view more]
  • Archiving
  • Bible studies
  • Commenting
  • Computational linguistics and philology
  • Highlighting
  • Information Retrieval
  • Interpreting
  • Religious studies and theology
  • Rhetorical Analysis
  • Sharing
  • Structural Analysis
  • Textual and content analysis
  • Textual and linguistic corpora
  • akkadian
  • babylonian
  • bible
  • cuneiform
  • database
  • graph
  • greek
  • hebrew
  • linguistics
  • peshitta
  • quran
  • syriac
  • text
  • uruk
  • Javascript
  • Macos
  • Microsoft
  • Posix
  • Python
Created: 2017-10-19
Modified: 2024-10-03
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.
Created: 2022-03-08
Modified: 2024-10-07

TextRepo

  • Active: The project has reached a stable, usable state and is being actively developed.
Created: 2019-08-07
Modified: 2022-03-15
  • Command-line Application
  • Active: The project has reached a stable, usable state and is being actively developed.
Created: 2021-03-08
Modified: 2022-04-08

TICCL & PICCL

  • Unsupported: The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.

PICCL 0.9.5

A set of workflows for corpus building through OCR, post-correction, and normalisation. [view more]
  • natural language processing
  • nlp
  • ocr
  • Posix
Created: 2015
  • Software Library
  • Unsupported: The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.

TICCLTools 0.10

TicclTools is a collection of programs for processing data files. Together they constitute the bulk of TICCL: Text Induced Corpus-Cleanup. The software consists of individual modules that are invoked by the pipeline system PICCL. [view more]
  • natural language processing
  • nlp
  • normalization
  • ocr
  • Posix
Created: 2015

TiMBL

  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Active: The project has reached a stable, usable state and is being actively developed.

python3-timbl 2020.6.8

Python 3 language binding for the Tilburg Memory-Based Learner [view more]
  • Scientific/Engineering
  • Text Processing > Linguistic
  • k-nearest-neighbours
  • knn
  • machine-learning
  • python
  • timbl
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2013-02-11
Modified: 2020-06-08
  • Command-line Application
  • Software Library
  • Active: The project has reached a stable, usable state and is being actively developed.

TiMBL 6.9

TiMBL is an open source software package implementing several memory-based learning algorithms, among which IB1-IG, an implementation of k-nearest neighbor classification with feature weighting suitable for symbolic feature spaces, and IGTree, a decision-tree approximation of IB1-IG. All implemented algorithms have in common that they store some representation of the training set explicitly in memory. During testing, new cases are classified by extrapolation from the most similar stored cases. [view more]
  • decision tree
  • k-nearest neighbours
  • knn
  • machine learning
  • memory based learning
  • natural language processing
  • nlp
  • Bsd
  • Linux
  • Macos
Created: 1998
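
The memory-based approach described for TiMBL above (store the training cases, weight the symbolic features, classify new cases by their most similar stored cases) can be sketched in a few lines. This is an illustrative simplification, not TiMBL's implementation or API: the function names and the toy data are hypothetical, the feature weights are plain information gain, the distance is a weighted overlap count, and `k` counts neighbours rather than distinct distances.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(instances, labels, feature_index):
    """Reduction in label entropy obtained by knowing one symbolic feature."""
    by_value = defaultdict(list)
    for features, label in zip(instances, labels):
        by_value[features[feature_index]].append(label)
    remainder = sum(
        len(subset) / len(labels) * entropy(subset) for subset in by_value.values()
    )
    return entropy(labels) - remainder


def classify(instances, labels, weights, query, k=1):
    """Majority vote among the k stored cases with the smallest weighted overlap distance."""
    scored = []
    for features, label in zip(instances, labels):
        # Each mismatching feature adds its weight to the distance.
        distance = sum(w for w, a, b in zip(weights, features, query) if a != b)
        scored.append((distance, label))
    scored.sort(key=lambda pair: pair[0])
    nearest = [label for _, label in scored[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical toy data: the first feature predicts the class, the second does not.
INSTANCES = [("s", "round"), ("s", "square"), ("l", "round"), ("l", "square")]
LABELS = ["A", "A", "B", "B"]

if __name__ == "__main__":
    weights = [information_gain(INSTANCES, LABELS, i) for i in range(2)]
    print(weights)
    print(classify(INSTANCES, LABELS, weights, ("l", "triangle")))
```

The training set stays in memory unchanged; all the work happens at classification time, which is the defining trait of this family of algorithms.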
  • Active: The project has reached a stable, usable state and is being actively developed.

Timbuctoo 7.15

  •   Ronald Haentjens Dekker
  •   Pratham Joshi
  •   Meindert Kroese
  •   Martijn Maas
  •   Kerim Meijer
  •   Jauco Noordzij
  •   Walter Ravenek
  •   Henk van den Berg
  •   René van der Ark
An RDF datastore that gives researchers control over the sharing of data between datasets [view more]
  • berkeley-db
  • graphql
  • humanities
  • java
  • r2rml
  • rdf
Created: 2012-08-15
Modified: 2024-03-01

Ucto

  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

python-ucto 0.6.8

  •   KNAW Humanities Cluster & CLST, Radboud University
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto). [view more]
  • Text Processing > Linguistic
  • tokenizer tokenization tokeniser tokenisation nlp computational_linguistics ucto
  • Bsd
  • Cython
  • Linux
  • Macos
  • Python
Created: 2014-05-21
Modified: 2024-09-12
  • Command-line Application
  • Software Library
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

ucto 0.34

Ucto tokenizes text files: it separates words from punctuation, and splits sentences. This is one of the first tasks for almost any Natural Language Processing application. Ucto also offers several other basic preprocessing steps, such as changing case, that you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. [view more]
  • Annotating
  • Linguistics
  • Tagging
  • Textual and content analysis
  • natural language processing
  • nlp
  • tokenization
  • tokenizer
  • Bsd
  • Linux
  • Macos
Created: 2011-03-27
Modified: 2023-02-22
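
What the ucto entry above describes, separating words from punctuation and splitting sentences, can be sketched with a single regular expression. This is a deliberately naive illustration and not ucto's rule set or interface: the pattern, the function names and the set of sentence-final marks are assumptions, and real tokenisers also handle abbreviations, URLs, quotes and language-specific rules.

```python
import re

# Assumed pattern: a word (optionally with internal hyphens or apostrophes),
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

# Assumed sentence-final punctuation marks.
SENTENCE_END = {".", "!", "?"}


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def split_sentences(tokens):
    """Group tokens into sentences, closing a sentence at each final mark."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # keep a trailing sentence without a final mark
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences(tokenize("Hello, world! It's fine.")):
        print(sentence)
```

Even this small sketch shows why tokenisation is less trivial than it looks: a period after an abbreviation would wrongly end a sentence here.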
  • Web Application
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

Ucto-Webservice 2.5.2

  •   KNAW Humanities Cluster & CLST, Radboud University
Ucto is a rule-based tokeniser for multiple languages. This is the webservice for it, for both humans and machines. [view more]
  • Annotating
  • Linguistics
  • Tagging
  • Textual and content analysis
  • clam webservice rest nlp computational_linguistics rest
  • Bsd
  • Linux
  • Macos
  • Python
Created: 2022-04-08
Modified: 2024-03-14
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

udpipe-service 4.10

A REST service for an R / udpipe based tokenizer, lemmatizer, POS tagger and dependency parser. See https://bitbucket.org/fryske-akademy/udpipe for (Docker) setup. [view more]
Created: 2020-11-18
Modified: 2023-11-26
  • Software Library
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

vocabulary-recommender 2.0.0

A generic linked data vocabulary recommender library that provides recommendation functions for various backends. [view more]
  • lod
  • namespace
  • recommender
  • vocabulary
Created: 2022-09-05
Modified: 2022-12-23
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.
Created: 2018-07-25
Modified: 2019-03-24