Here you will find all tools (i.e. software and software services) developed in the CLARIAH project, as well as some tools from predecessors and sister projects. This list is automatically harvested from the tool producers and providers themselves, and updated daily. Our tools are designed for researchers and developers in the Humanities and Social Sciences. Not all tools are suitable for all audiences, and not all tools are mature and stable. This information should be clearly indicated for each tool, so you can make an informed judgement about whether a tool might be suitable for you.
Are you a developer whose tool is not yet included in the index, or do you have questions or comments on the metadata? Please read our contribution guidelines.
Name | Version | Interface type | Description | Links | Status | Maintainer | Authors | Producer/Provider |
---|---|---|---|---|---|---|---|---|
Alpino | ||||||||
Alpino | 0.0.0 |
|
Alpino parser and related tools for Dutch [view more]
Category:
|
|
||||
Alpino |
|
|
|
|
|
|||
Alpino-Webservice | 2.4 2023-11-01 11:56:10 +0100 |
|
Alpino is a dependency parser for Dutch, developed by Gertjan van Noord at the University of Groningen in the context of the PIONIER Project Algorithms for Linguistic Processing. This is the webservice for it. You can upload either tokenised or untokenised files (which will be automatically tokenised for you using ucto). The output will consist of a zip file containing XML files, one for each sentence in the input document. [view more]
Category:
Keywords:
|
|
|
|||
Alpino Webservice | 2.4 |
|
Alpino is a dependency parser for Dutch, developed by Gertjan van Noord at the University of Groningen in the context of the PIONIER Project Algorithms for Linguistic Processing. You can upload either tokenised or untokenised files (which will be automatically tokenised for you using ucto). The output will consist of a zip file containing XML files, one for each sentence in the input document. |
|
|
|||
AlpinoGraph | 1.0.5 2024-04-24 |
|
AlpinoGraph is a tool for searching syntactically annotated corpora. The tool uses AgensGraph. AgensGraph combines database technology (PostgreSQL) with Cypher, the standard query language for graphs. The search queries you can use in AlpinoGraph are therefore a mix of SQL and Cypher. AlpinoGraph adds a few extensions of its own, such as a simple but handy macro system and visualisation of the results. [view more]
Category:
Keywords:
|
|
||||
AlpinoGraph |
|
|
|
|
||||
alud | 2.14.0 2024-04-24 |
|
A Go package for deriving Universal Dependencies from Dutch sentences parsed with Alpino [view more]
Category:
Keywords:
|
|
||||
alud |
|
|
|
|
|
|||
github.com/rug-compling/alud |
|
|
|
|
|
|||
analiticcl | 0.4.6 2024-04-22 11:29:55 +0200 |
|
Analiticcl is an approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation [view more]
Keywords:
|
|
|
|||
AnnoRepo | ||||||||
AnnoRepo | 0.6.3 2024-04-03 17:17:29 +0200 |
|
Implementation of W3C Web Annotation Protocol (root project) [view more]
Keywords:
|
|
||||
annorepo-client | 0.1.3 2023-11-29 16:33:58 +0100 |
|
A Python client for accessing an AnnoRepo server [view more] |
|
||||
version |
|
|
|
|
|
|||
asrservice | 0.3 2024-04-12 10:39:45 +0200 |
|
An Automatic Speech Recognition Service for a variety of languages, powered by WhisperX [view more]
Category:
Keywords:
|
|
||||
Automatic Speech Recognition Service | 0.3 |
|
|
|
||||
asrservice |
|
|
|
|
|
|||
auchann | 0.2.0 2023-08-21 12:04:40 +0200 |
|
The AuChAnn (Automatic CHAT Annotation) package can generate CHAT annotations based on transcript-correction pairs of utterances. [view more] |
|
|
|||
auchann |
|
|
|
|
|
|||
Automatic Speech Recognition for Dutch | 0.6.2 |
|
This is a web-based automatic speech recogniser for Dutch, capable of transcribing Dutch speech recordings using multiple models. [view more]
Category:
Keywords:
|
|
||||
Automatic Transcription of Dutch Speech Recordings | 0.6.1 |
|
This webservice uses automatic speech recognition to provide the transcriptions of recordings spoken in Dutch. You can upload and process only one file per project. For bulk processing and other questions, please contact Henk van den Heuvel at h.vandenheuvel@let.ru.nl. |
|
|
|||
Automatic Speech Recognition for Dutch |
|
|
|
|
|
|||
Blacklab & Corpus Search | ||||||||
A Blacklab Server CLARIN FCS 2.0 endpoint | 0.1 2023-05-10 15:46:27 +0200 |
|
CLARIAH Federated content search corpora, developed by the Dutch Language Institute (INT), is a service to enable searching in multiple Dutch corpora at the same time. This application implements the CLARIN FCS 2.0 specification on top of Dutch language corpora. This repository hosts the source code. [view more]
Keywords:
|
|
||||
FCS Aggregator |
|
The Aggregator application is a part of the CLARIN-FCS common federated content search infrastructure. It serves as a user interface to perform queries to CLARIN-resources and display search results. The Aggregator communicates with components called endpoints, which are provided as a service by all centres who participate in the federated content search. Each endpoint provides access to one or more searchable resources. The user can select a specific resource or resources, based on the resource name or on the language, or search through all of them. The content of these resources is searched with the query supplied to the endpoint. The endpoint returns results to this query and the aggregator collects the responses from all the endpoints and displays them to the user. |
|
|
|
|||
Dutch FCS endpoints hosted at INT |
|
CLARIAH Federated content search backends - instances for several Dutch corpora |
|
|
|
|
||
BlackLab Corpus Search | 3.0.1 2022-10-06 13:08:42 +0200 |
|
The parent project for BlackLab Core and BlackLab Server. [view more]
Keywords:
|
|
||||
INT Corpus Frontend | 3.1.1 2024-02-02 16:25:03 +0300 |
|
A web application to search corpora through the BlackLab Server web service. [view more]
Keywords:
|
1 |
||||
Brieven als Buit search |
|
Brieven als Buit provided by the Dutch Language Institute in Leiden. |
|
|
|
|||
Corpus Hedendaags Nederlands |
|
CHN, provided by the Dutch Language Institute in Leiden. |
|
|
|
|||
OpenSoNaR |
|
OpenSoNaR, provided by the Dutch Language Institute in Leiden. |
|
|
|
|||
Broccoli | 0.39.0 2024-09-02 13:54:06 +0200 |
|
Da Broker 🥦 [view more] |
|
|
|
||
burgerLinker | 0.0.1-SNAPSHOT 2022-09-21 11:03:01 +0200 |
|
Command line tool for linking civil registries [view more] |
|
|
|
||
burgerLinker |
|
|
|
|
|
|||
CHAMD | 0.5.12 2024-03-13 11:22:57 +0100 |
|
Conversion and cleaning of CHILDES CHA files into PaQu Plaintext Metadata Format (to convert to Alpino). [view more] |
|
|
|
||
chamd |
|
|
|
|
|
|||
CLAM | 3.2.10 2024-03-14 |
|
Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice. [view more]
Keywords:
|
|
||||
clam |
|
CLAM Data & Client API - programming library for Python |
|
|
|
|
||
clamnewproject |
|
Developer tool to create a new CLAM project |
|
|
|
|
||
clamservice |
|
Webservice daemon, the core component of CLAM. May be invoked directly in development, often invoked indirectly via WSGI in production environments. |
|
|
|
|
||
CLARIAH LD Proxy | 1.0-SNAPSHOT 2021-06-28 21:21:07 +0200 |
|
Keep your LD URIs resolvable [view more] |
|
|
|
||
CLARIAH Tool Discovery | ||||||||
CLARIAH Tool Discovery | 1.6.4 2024-06-04 13:07:44 |
|
This is the over-arching project for CLARIAH Tool Discovery, its components harvest and aggregate codemeta from source repositories and service endpoints, automatically converting known metadata schemes in the process. This project holds the Tool Source Registry, pointing to all the tools that are to be harvested. It also holds the validation schema. [view more]
Category:
Keywords:
|
|
||||
CLARIAH Tools |
|
This is a web portal where you can find all tools (i.e. software and software services) developed in the CLARIAH project, as well as some tools from predecessors and sister projects. This list is automatically harvested from the tool producers and providers themselves, and updated daily. Our tools are designed for researchers and developers in the Humanities and Social Sciences. Not all tools are suitable for all audiences, and not all tools are mature and stable. This information should be clearly indicated for each tool, so you can make an informed judgement about whether a tool might be suitable for you. |
|
|
|
|||
codemeta-harvester | 0.4.0 2024-06-03 11:36:44 |
|
Harvest and aggregate codemeta from source repositories and service endpoints, automatically converting known metadata schemes in the process [view more]
Keywords:
|
|
||||
codemeta-harvester |
|
|
|
|
|
|||
codemeta-lod-to-cmdi | 1.0-SNAPSHOT 2023-05-15 11:54:11 +0200 |
|
CLARIAH Tool Discovery output (LOD -> CMDI conversion) [view more] |
|
|
|
||
codemeta-server | 0.4.1 2023-11-24 |
|
Web API serving codemeta software metadata using codemeta and schema.org, provides a SPARQL endpoint and also offers a human web-interface [view more]
Category:
Keywords:
|
|
||||
codemeta-server |
|
|
|
|
|
|||
codemeta2html | 0.1.0 2023-05-15 18:08:47 +0100 |
|
Convert software metadata in codemeta to html for visualisation, can generate fully-fledged static sites that serve well as a portal for a collection of software [view more]
Category:
Keywords:
|
|
||||
codemeta2html |
|
|
|
|
|
|||
codemeta2html |
|
|
|
|
|
|||
CodeMetaPy | 2.5.3 2024-06-14 11:33:47 +0200 |
|
Codemetapy is a command-line tool and Python library to work with the codemeta software metadata standard. Codemeta builds upon schema.org and defines a vocabulary for describing software source code. It maps various existing metadata standards to a unified vocabulary. Codemetapy allows you to generate codemeta from various sources. [view more]
Category:
Keywords:
|
|
||||
codemeta |
|
|
|
|
|
|||
codemetapy |
|
|
|
|
|
|||
CMD2RDF | 1.0.1 2021-03-07 18:35:08 +0100 |
|
No description provided
Keywords:
|
|
|
|
||
COBALT | unknown 2020-07-17 09:55:43 +0200 |
|
Corpus annotation tool [view more]
Keywords:
|
|
|
|
||
Colibri Core | 2.5.9 |
|
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. [view more]
Keywords:
|
|
||||
colibri-histogram |
|
Computes a histogram for ngram occurrences (and optionally skipgrams) in the corpus. This is a high-level convenience script over underlying tools. |
|
|
|
|
||
colibri-queryngrams |
|
Interactive command line tool to query n-grams with their counts from one or more plain-text corpus files. This is a high-level convenience script over underlying tools. |
|
|
|
|
||
colibri-loglikelihood |
|
Compares the frequency of patterns between two or more corpus files (plain text) by computing log likelihood, following the methodology of Rayson and Garside (2000), Comparing corpora using frequency profiling. In proceedings of the workshop on Comparing Corpora, held in conjunction with the 38th annual meeting of the Association for Computational Linguistics (ACL 2000). 1-8 October 2000, Hong Kong, pp. 1 - 6: http://www.comp.lancs.ac.uk/~paul/publications/rg_acl2000.pdf. This is a high-level convenience script over underlying tools. |
|
|
|
|
||
colibri-classencode |
|
Encodes a plain text corpus to a binary encoded corpus and a class file |
|
|
|
|
||
colibri-freqlist |
|
Extract n-grams (and optionally skipgrams) with their counts from one or more plain-text corpus files. This is a high-level convenience script over underlying tools. |
|
|
|
|
||
colibri-patternmodeller |
|
Extract, model and compare recurring patterns (n-grams, skipgrams, flexgrams) and their frequencies in text corpus data. This is the main tool of Colibri Core. |
|
|
|
|
||
colibri-ngramstats |
|
Computes a summary report on the count of ngrams (and optionally skipgrams) in the corpus. This is a high-level convenience script over underlying tools. |
|
|
|
|
||
colibri-classdecode |
|
Decodes a binary encoded corpus and a class file to a plain text corpus |
|
|
|
|
||
colibri-ngrams |
|
Extract n-grams of a particular size by moving a sliding window over the corpus. This is a high-level convenience script over underlying tools. |
|
|
|
|
||
colibri-coverage |
|
Computes the coverage of a training/background corpus on a particular test/foreground corpus, i.e. how many of the patterns in the test corpus were found during training, how many tokens are covered, and how this is all distributed. This is a high-level convenience script over underlying tools. |
|
|
|
|
||
colibri-reverseindex |
|
Computes and prints a reverse index of the corpus: for each token position in the corpus, all patterns that start at that position are shown. This is a high-level convenience script over underlying tools. |
|
|
|
|
||
colibri-findpatterns |
|
Find patterns in corpus data based on a presupplied list of patterns (one per line). This is a high-level convenience script over underlying tools. |
|
|
|
|
||
colibri-cooc |
|
Computes co-occurrence statistics (absolute co-occurrence or pointwise mutual information) between patterns in a corpus |
|
|
|
|
||
Corpus Editor for Syntactically Annotated Resources (Cesar) | unknown |
|
Django web application that communicates with the CorpusStudioWeb back-end 'Crpp'. Two main purposes: (1) browse texts, (2) conduct syntactic searches with definable output per hit. Searches are translated to Xquery 'under the hood' [view more]
Keywords:
|
|
|
|
||
RU-Cesar |
|
|
|
|
||||
cow_csvw | 1.21 2024-03-08 16:02:10 +0100 |
|
Integrated CSV to RDF converter, using CSVW and nanopublications [view more]
Keywords:
|
|
|
|||
cow_tool |
|
|
|
|
|
|||
cow_tool_cli |
|
|
|
|
|
|||
DANE | ||||||||
DANE | 0.4.3 2024-05-13 10:13:48 +0200 |
|
Utils for working with the Distributed Annotation and Enrichment system [view more]
Category:
|
|
||||
dane-asr-worker | 0.1.0 |
|
Automatic speech recognition through an external service. Depends on DANE-server [view more]
Category:
|
|
|
|
|
|
dane-download-worker | 0.9.0 |
|
Basic "DANE worker" that downloads input data via HTTP(s) URLs for further processing by other DANE workers. Depends on DANE-server [view more]
Category:
|
|
|
|
|
|
DANE-server | 0.3.1 2023-06-19 09:07:32 +0200 |
|
Back-end for the Distributed Annotation 'n' Enrichment (DANE) system [view more] |
|
||||
DANE-server |
|
|
|
|
|
|||
dane-workflows | 0.9.0 |
|
Python library for setting up simple data processing workflows (using DANE) [view more]
Category:
|
|
|
|
|
|
dane-workflows |
|
|
|
|
|
|||
deepfrog | 0.2.1 2021-04-11 15:29:58 +0200 |
|
A deep learning NLP suite (PoS,lemmatiser,NER) with FoLiA XML support [view more]
Category:
Keywords:
|
|
|
|||
Dexter | v0.15.0 2024-05-22 11:15:55 +0200 |
|
No description provided |
|
|
|
||
did-summarizer | unknown 2024-02-02 13:28:23 +0100 |
|
Linked Data summarizer driven by Decentralized Identifiers (DIDs) [view more] |
|
|
|
||
Dutch_FrameNet_Lexicon | unknown 2020-07-08 09:32:55 +0200 |
|
No description provided |
|
|
|
||
Electronisch woordenboek van de Achterhoekse en Liemerse dialecten | unknown |
|
Django web application that facilitates viewing and searching a dictionary of Dutch dialects from the regions 'Achterhoek' and 'Liemers' [view more]
Keywords:
|
|
|
|
||
e-WALD |
|
|
|
|
||||
Electronisch woordenboek van de Gelderse dialecten | unknown |
|
Django web application that facilitates viewing and searching a dictionary of Dutch dialects from the province 'Gelderland' [view more]
Keywords:
|
|
|
|
||
e-WGD |
|
|
|
|
||||
Electronisch woordenboek van de Brabantse dialecten | unknown |
|
Django web application that facilitates viewing and searching a dictionary of dialects from the Dutch province 'Noord-Brabant' as well as the Belgian provinces of Antwerpen, Vlaams-Brabant and Brussels [view more]
Keywords:
|
|
|
|
||
e-WBD |
|
|
|
|
||||
Electronisch woordenboek van de Limburgse dialecten | unknown |
|
Django web application that facilitates viewing and searching a dictionary of the Dutch Limburgian dialects [view more]
Keywords:
|
|
|
|
||
e-WLD |
|
|
|
|
||||
FLAT | ||||||||
FoLiA-Linguistic-Annotation-Tool | 0.11.5 2024-07-05 13:27:34 +0200 |
|
FLAT is a web-based linguistic annotation environment based around the FoLiA format (https://proycon.github.io/folia), a rich XML-based format for linguistic annotation. FLAT allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm. [view more]
Category:
Keywords:
|
|
|
|||
FLAT: the FoLiA Linguistic Annotation Tool |
|
|
|
|||||
FoLiA-Linguistic-Annotation-Tool |
|
|
|
|
|
|||
foliadocserve | 0.7.8 2024-02-07 16:51:51 +0100 |
|
The FoLiA Document Server is a backend HTTP service to interact with documents in the FoLiA format, a rich XML-based format for linguistic annotation (http://proycon.github.io/folia). It provides an interface to efficiently edit FoLiA documents through the FoLiA Query Language (FQL). [view more]
Category:
Keywords:
|
|
|
|||
foliadocserve |
|
|
|
|
|
|||
FoLiA | ||||||||
folia | 0.0.6 2020-11-16 14:24:33 +0100 |
|
High-performance library for handling the FoLiA XML format (Format for Linguistic Annotation) [view more]
Category:
Keywords:
|
|
|
|||
folia |
|
|
|
|
|
|||
FoLiA tools | 2.5.7 2024-05-14 12:13:07 +0200 |
|
FoLiA-tools contains various Python-based command line tools for working with FoLiA XML (Format for Linguistic Annotation) [view more]
Category:
Keywords:
|
|
|
|||
alpino2folia |
|
|
|
|
|
|||
conllu2folia |
|
|
|
|
|
|||
dcoi2folia |
|
|
|
|
|
|||
folia2annotatedtxt |
|
|
|
|
|
|||
folia2columns |
|
|
|
|
|
|||
folia2dcoi |
|
|
|
|
|
|||
folia2html |
|
|
|
|
|
|||
folia2rst |
|
|
|
|
|
|||
folia2salt |
|
|
|
|
|
|||
folia2stam |
|
|
|
|
|
|||
folia2txt |
|
|
|
|
|
|||
foliabench |
|
|
|
|
|
|||
foliacat |
|
|
|
|
|
|||
foliacorrect |
|
|
|
|
|
|||
foliacount |
|
|
|
|
|
|||
foliaerase |
|
|
|
|
|
|||
foliaeval |
|
|
|
|
|
|||
foliafreqlist |
|
|
|
|
|
|||
foliaid |
|
|
|
|
|
|||
folialangid |
|
|
|
|
|
|||
foliamerge |
|
|
|
|
|
|||
foliaquery |
|
|
|
|
|
|||
foliaquery1 |
|
|
|
|
|
|||
foliasetdefinition |
|
|
|
|
|
|||
foliaspec |
|
|
|
|
|
|||
foliaspec2json |
|
|
|
|
|
|||
foliaspec2rdf |
|
|
|
|
|
|||
foliasplit |
|
|
|
|
|
|||
foliatextcontent |
|
|
|
|
|
|||
foliatree |
|
|
|
|
|
|||
foliaupgrade |
|
|
|
|
|
|||
foliavalidator |
|
|
|
|
|
|||
rst2folia |
|
|
|
|
|
|||
tei2folia |
|
|
|
|
|
|||
transcribedspeech2folia |
|
|
|
|
|
|||
txt2folia |
|
|
|
|
|
|||
FoLiApy | 2.5.11 2024-03-28 17:23:49 +0100 |
|
An extensive library for processing FoLiA documents. FoLiA stands for Format for Linguistic Annotation and is a very rich XML-based format used by various Natural Language Processing tools. [view more]
Category:
Keywords:
|
|
|
|||
FoLiApy |
|
|
|
|
|
|||
foliautils | 0.22 |
|
Command-line utilities for working with the Format for Linguistic Annotation (FoLiA). [view more]
Keywords:
|
|
||||
FoLiA-alto |
|
Convert ALTO DIDL files into a series of FoLiA documents |
|
|
|
|
||
FoLiA-txt |
|
Convert plain text to FoLiA. The output will contain only <p> and <str> nodes. See ucto or rst2folia (FoLiA-tools) for alternatives. |
|
|
|
|
||
FoLiA-wordtranslate |
|
Simple word-by-word translator on the basis of a dictionary and/or rewrite rules |
|
|
|
|
||
FoLiA-pm |
|
Convert Political Mashup XML to FoLiA |
|
|
|
|
||
FoLiA-2text |
|
Convert FoLiA documents into plain text |
|
|
|
|
||
FoLiA-hocr |
|
Convert hOCR (as outputted by Tesseract) to FoLiA |
|
|
|
|
||
FoLiA-langcat |
|
Language Identification using textcat. |
|
|
|
|
||
FoLiA-page |
|
Convert PAGE XML to FoLiA |
|
|
|
|
||
FoLiA-collect |
|
Collect n-gram statistics from tsv files produced by FoLiA-stats, aggregating results. |
|
|
|
|
||
FoLiA-idf |
|
Count words in a series of FoLiA documents and compute IDF statistics, which are outputted to a tsv file |
|
|
|
|
||
FoLiA-stats |
|
Gather n-gram statistics over a series of FoLiA documents |
|
|
|
|
||
FoLiA-clean |
|
FoLiA-clean will produce a cleaned up version of a FoLiA file, or a whole directory of FoLiA files, removing specified annotation types and specified text classes |
|
|
|
|
||
FoLiA-correct |
|
Correct FoLiA documents using correction candidates generated by TICCL-rank (from ticcltools) |
|
|
|
|
||
libfolia | 2.20 |
|
This is a C++ Library for working with the Format for Linguistic Annotation (FoLiA). [view more]
Keywords:
|
|
||||
folialint |
|
FoLiA validation tool |
|
|
|
|
||
libfolia |
|
FoLiA Library with API for C++ |
|
|
|
|
||
piereling | 0.4 2023-11-01 11:43:34 +0100 |
|
Piereling is a webservice and web-application to convert between a variety of document formats, mostly from and to FoLiA XML. It is intended for NLP pipelines. [view more]
Category:
Keywords:
|
|
|
|||
Piereling | 0.4 |
|
Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc. |
|
|
|||
piereling |
|
|
|
|
|
|||
Forced Alignment 2 | 0.3.1 |
|
This webservice provides an output file with word alignments given an NL speech recording and a transcription. [view more]
Keywords:
|
|
||||
ForcedAlignment2 | 0.3.1 |
|
Forced Alignment of text and audio files |
|
|
|||
Forced Alignment 2 |
|
|
|
|
|
|||
Frog | ||||||||
Frog | 0.33 2023-12-05 15:43:06 +0100 |
|
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. It performs automatic linguistic enrichment such as part of speech tagging, lemmatisation, named entity recognition, shallow parsing, dependency parsing and morphological analysis. All NLP modules are based on TiMBL. [view more]
Category:
Keywords:
|
|
||||
mblem |
|
Memory-based Lemmatiser (standalone) |
|
|
|
|
||
mbma |
|
Memory-based Morphological Analysis (standalone) |
|
|
|
|
||
ner |
|
Named Entity Recogniser (standalone) |
|
|
|
|
||
libfrog |
|
Frog Library with API for C++ |
|
|
|
|
||
frog |
|
Command-line interface to the full NLP suite |
|
|
|
|
||
Frog-Webservice | 2.7 2023-12-05 16:06:08 +0100 |
|
Frog is a suite containing a tokeniser, Part-of-Speech tagger, lemmatiser, morphological analyser, shallow parser, and dependency parser for Dutch. This is the webservice for it, for both humans and machines. [view more]
Category:
Keywords:
|
|
||||
Frog Webservice | 2.7 |
|
Frog is a suite containing a tokeniser, Part-of-Speech tagger, lemmatiser, morphological analyser, shallow parser, and dependency parser for Dutch. |
|
|
|||
python-frog | 0.6.10 2023-12-05 15:47:42 +0100 |
|
Python binding to Frog, an NLP suite for Dutch doing part-of-speech tagging, lemmatisation, morphological analysis, named-entity recognition, shallow parsing, and dependency parsing. [view more]
Category:
Keywords:
|
|
||||
toad | v0.8 2023-02-22 17:02:10 +0100 |
|
Toad: Trainer Of All Data, the Frog training collection [view more] |
|
|
|
||
fusus | 0.0.2 2023-04-11 19:46:15 +0200 |
|
Workflow for converting Arabic scanned pages into readable text [view more]
Category:
Keywords:
|
|
|
|
||
g2pservice | 0.3.4 2023-05-12 13:09:12 +0200 |
|
Grapheme to Phoneme converter. Input is a list of words (utf8). Choose one of the language options. [view more]
Category:
Keywords:
|
|
|
|
||
Grapheme to Phoneme converter | 0.3.4 |
|
Grapheme to Phoneme (G2P) conversion. Input is a list of words (utf-8, one word per line). The G2P will output the best guess for the phonetic transcription per word. The system is trained on existing dictionaries. Please choose a language option. The system is a demo-version --- please refer to CLST for using G2P for long word lists. |
|
|
|||
g2pservice |
|
|
|
|
|
|||
GaLAHaD | ||||||||
GaLAHaD | 1.2.2 2024-08-30 14:38:25 +0200 |
|
GaLAHaD (Generating Linguistic Annotations for Historical Dutch) allows linguists to compare taggers, tag their own corpora, evaluate the results and export their tagged documents. [view more]
Category:
|
1 |
|
|||
GaLAHaD |
|
|
|
|
||||
GaLAHaD proxy Docker image |
|
|
|
|
|
|||
GaLAHaD API |
|
|
|
|
|
|||
GaLAHaD proxy |
|
|
|
|
|
|||
GaLAHaD server Docker image |
|
|
|
|
|
|||
GaLAHaD client Docker image |
|
|
|
|
|
|||
GaLAHaD Train Battery | 1.0.0 2024-06-04 |
|
Python program for training linguistic annotation taggers based on a configuration file and list of datasets. It prepares the resulting trained models for dockerization and adds relevant metadata. It is tagger software agnostic as long as a simple python shell is built around it. [view more]
Category:
|
|
||||
GaLAHaD Train Battery - Dockerizer |
|
|
|
|
|
|||
GaLAHaD Train Battery - Trainer |
|
|
|
|
|
|||
int-pie | 1.0.0 2024-06-05 15:51:23 +0200 |
|
The PIE tagger with custom modifications by the Dutch Language Institute (INT). [view more]
Category:
|
1 |
|
|||
evaluate |
|
|
|
|
|
|||
tag |
|
|
|
|
|
|||
train |
|
|
|
|
|
|||
Gecco | 0.3.0 2020-07-11 13:28:04 +0200 |
|
Generic Environment for Context-Aware Correction of Orthography [view more]
Category:
Keywords:
|
1 |
|
|||
gecco |
|
|
|
|
|
|||
Generale Missieven in Text-Fabric | v1.1e 2024-03-27 08:09:41 +0100 |
|
Conversion of Generale Missieven to Text-Fabric and a tutorial on how to work with the result [view more]
Keywords:
|
|
|
|
||
converter and tutorial notebooks |
|
|
|
|
|
|||
Glem | 1.3.1 2023-10-05 14:28:06 +0200 |
|
GLEM is a lemmatizer for Ancient Greek. [view more]
Category:
Keywords:
|
|
|
|||
Glem | 1.3.1 |
|
|
|
||||
glem |
|
Command-line interface to GLEM |
|
|
|
|
||
gretel | 4.2.4 2022-09-16 15:00:36 +0200 |
|
GrETEL4 (fork from CCL-KULeuven) [view more] |
|
||||
GrETEL 4 |
|
|
|
|
||||
grlc: the git repository linked data API constructor | 1.3.7 2022-03-21 22:20:00 +0100 |
|
grlc, the git repository linked data API constructor, automatically builds Web APIs using SPARQL queries stored in git repositories. [view more]
Keywords:
|
|
|
|
||
grlc: the git repository linked data API constructor |
|
|
|
|
|
|||
hypodisc | 0.1.0 2024-06-03 14:31:16 +0200 |
|
Hypothesis Discovery on RDF Knowledge Graphs [view more]
Keywords:
|
|
|
|||
hypodisc |
|
|
|
|
|
|||
I-Analyzer | 5.3.0 2023-12-08 11:23:35 +0100 |
|
I-analyzer is a tool for exploring corpora (large collections of texts). You can use I-analyzer to find relevant documents, or to make visualisations to understand broader trends in the corpus. The interface is designed to be accessible for users of all skill levels.
I-analyzer is primarily intended for academic research and higher education. We focus on data that is relevant for the humanities, but we are open to datasets that are relevant for other fields. [view more]
Keywords:
|
|
|
|
||
I-Analyzer |
|
|
|
|
||||
ineo | unknown |
|
No description provided |
|
unknown
|
|
|
|
Ineo - Start using digital humanities resources - Ineo |
|
Ineo lets you search, browse, find and select digital resources for your research in humanities and social sciences. The platform is already fully functional, but is still being filled with resource content. At the end of 2023, it will offer access to many tools, datasets, workflows, standards and educational material. |
|
|
|
|||
ineo-collaboration | unknown 2024-09-06 10:42:50 +0200 |
|
how to get metadata into INEO [view more] |
|
||||
Kaldi_NL | v0.4.3 2023-11-01 12:10:26 +0100 |
|
Code related to the Dutch instance and user groups of the KALDI speech recognition toolkit [view more]
Keywords:
|
|
|
|
||
LaMachine | 2.28 |
|
LaMachine is a unified software distribution for Natural Language Processing. We integrate numerous open-source NLP tools, programming libraries, web-services, and web-applications in a single Virtual Research Environment that can be installed on a wide variety of machines. [view more]
Keywords:
|
|
||||
Lenticular Lens | ||||||||
lenticular-lens | 1.0.0 2023-01-25 09:39:33 +0100 |
|
Vue frontend for Lenticular Lens [view more] |
|
||||
Lenticular Lens | 1.0.0 |
|
|
|
|
|||
lenticular-lens | 1.17 2022-10-26 11:53:49 +0200 |
|
Lenticular Lens is a tool which allows users to construct linksets between entities from different Timbuctoo datasets (so called data-alignment or reconciliation). Lenticular Lens tracks the configuration and the algorithms used in the alignment and is also able to report on manual corrections and the amount of manual validation done. [view more] |
|
||||
Golden Agents | lenticularlens.org |
|
|
|
|
||||
Lenticular Lens | 1.0.0 |
|
|
|
|
|||
lenticular-lens-postgresql | 1.3 2023-08-16 12:53:51 +0200 |
|
PostgreSQL extension for Lenticular Lens [view more] |
|
||||
lingua-cli | 0.4.0 2024-06-10 21:56:08 +0200 |
|
Lingua-cli is a command line tool for language classification, using the lingua-rs library. [view more] |
|
|
|||
lingua-cli |
|
|
|
|
|
|||
mbt | 3.10 |
|
MBT is a memory-based tagger-generator and tagger in one. The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences. MBT can, for instance, be used to generate part-of-speech taggers or chunkers for natural language processing. It has also been used for named-entity recognition, information extraction in domain-specific texts, and disfluency chunking in transcribed speech. [view more]
Keywords:
|
|
|
|||
libmbt |
|
Memory-based Tagging Library with API for C++ |
|
|
|
|
||
mbt |
|
Memory-based tagger, command-line tool |
|
|
|
|
||
Media Suite | ||||||||
CLARIAH Media Suite | 6.10 2023-11-21 |
|
The CLARIAH Media Suite is a research environment in which researchers can search, bookmark, annotate and compare items from a number of cultural heritage collections [view more]
Keywords:
|
|
|
|
||
CLARIAH Media Suite |
|
|
|
|
||||
Nederlab | ||||||||
MTAS | 8.11.1.0 2022-01-14 11:51:15 +0100 |
|
Multi Tier Annotation Search, a Solr/Lucene based library and plugin providing search and analysis on annotated and structured text. [view more]
Keywords:
|
|
||||
MTAS |
|
|
|
|
|
|||
Nederlab Pipeline | 0.8.0 |
|
A set of workflows for linguistic enrichment of historical Dutch [view more]
Keywords:
|
|
||||
nederlab-portal | unknown |
|
No description provided |
|
unknown
|
|
|
|
nederlab onderzoeksportaal |
|
|
|
|
||||
Netwerk Digitaal Erfgoed (NDE) | ||||||||
Dataset Register | unknown |
|
Live index of heritage datasets [view more]
Category:
Keywords:
|
|
||||
Dataset Register OpenAPI |
|
|
|
|
||||
Network of Terms GraphQL API | unknown |
|
GraphQL API for the Network of Terms, a search engine for finding terms in terminology sources (such as thesauri, classification systems and reference lists) [view more]
Category:
Keywords:
|
|
||||
Network of Terms GraphQL API |
|
|
|
|
||||
Network of Terms Reconciliation API | unknown |
|
Reconciliation API for the Network of Terms, a search engine for finding terms in terminology sources (such as thesauri, classification systems and reference lists) [view more]
Category:
Keywords:
|
|
||||
Network of Terms Reconciliation API |
|
|
|
|
||||
OpenDutchWordnet | unknown 2021-05-11 16:38:00 +0200 |
|
This repo provides a Python module to work with Open Dutch WordNet. It was created using Python 3.4. [view more] |
|
|
|
||
OpenDutchWordnet |
|
|
|
|
|
|||
pagexml-tools | 0.5.0 2024-03-18 14:49:12 +0100 |
|
Utility functions for reading PageXML files [view more]
Category:
|
|
|
|
||
version |
|
|
|
|
|
|||
PaQu | 1.0.5 2024-04-24 |
|
With PaQu (Parse & Query) you can search syntactically annotated Dutch-language corpora.
PaQu supports two ways of searching. The first, simple way lets you search for word pairs, optionally together with their syntactic relation. The second, more complex way uses the query language XPath.
A number of syntactically annotated corpora are available in PaQu by default. It is also possible to submit your own texts. These texts are then analysed by the automatic parser and stored. You can then search your own texts in the same way. [view more]
Category:
Keywords:
|
|
||||
PaQu |
|
|
|
|
||||
pure3dtools | 0.0.4 2024-04-11 21:53:43 +0200 |
|
Pure3D tools [view more] |
|
||||
pure3dtools |
|
|
|
|
|
|||
Ricgraph - Research in context graph | v2.4 2024-09-10 14:00:59 +0200 |
|
Ricgraph, also known as Research in context graph, enables the exploration of researchers, teams, their results, collaborations, skills, projects, and the relations between these items.
Ricgraph can store many types of items into a single graph. These items can be obtained from various systems and from multiple organizations. Ricgraph facilitates reasoning about these items because it infers new relations between items, relations that are not present in any of the separate source systems. It is flexible and extensible, and can be adapted to new application areas.
Throughout this text, we illustrate how Ricgraph works by applying it to the application area research information.
Motivation
Ricgraph, also known as Research in context graph, is software that is about relations between items. These items can be collected from various source systems and from multiple organizations. We explain how Ricgraph works by applying it to the application area research information. We show the insights that can be obtained by combining information from various source systems, insights arising from new relations that are not present in each separate source system.
Research information is about anything related to research: research results, the persons in a research team, their collaborations, their skills, projects in which they have participated, as well as the relations between these entities. Examples of research results are publications, data sets, and software.
Example use cases from the application area research information are:
(1) As a journalist, I want to find researchers with a certain skill and their publications, so that I can interview them for a newspaper article.
(2) As a librarian, I want to enrich my local research information system with research results that are in other systems but not in ours, so that we have a more complete view of research at our university.
(3) As a researcher, I want to find researchers from other universities that have co-authored publications written by the co-authors of my own publications, so that I can read their publications to find out if we share common research interests.
These use cases use different types of information (called items): researchers, skills, publications, etc. Most often, these types of information are not stored in one system, so the use cases may be difficult or time-consuming to answer. However, by using Ricgraph, these use cases (and many others) are easy to answer.
Although this text illustrates Ricgraph in the application area research information, the principle "relations between items from various source systems" is general, so Ricgraph can be used in other application areas.
Main contributions of Ricgraph
(1) Ricgraph can store many types of items in a single graph.
(2) Ricgraph harvests multiple source systems into a single graph.
(3) Ricgraph Explorer is the exploration tool for Ricgraph.
(4) Ricgraph facilitates reasoning about items because it infers new relations between items.
(5) Ricgraph can be tailored for an application area.
Read more about Ricgraph
For a gentle introduction to Ricgraph, read the reference publication: Rik D.T. Janssen (2024). Ricgraph: A flexible and extensible graph to explore research in context from various systems. SoftwareX, 26(101736). https://doi.org/10.1016/j.softx.2024.101736
Extensive documentation, publications, videos and source code can be found in the GitHub repository https://github.com/UtrechtUniversity/ricgraph
The website for Ricgraph can be found at https://www.ricgraph.eu [view more]
Category:
Keywords:
|
|
|
|
||
Script to harvest OpenAlex for Ricgraph |
|
|
|
|
|
|||
Script to harvest the Utrecht University staff pages for Ricgraph |
|
|
|
|
|
|||
Script to harvest the data repository Yoda for Ricgraph |
|
|
|
|
|
|||
Ricgraph REST API |
|
|
|
|
|
|||
Ricgraph Explorer |
|
|
|
|
|
|||
Script to harvest the Research Information System Pure for Ricgraph |
|
|
|
|
|
|||
Script to harvest the Research Software Directory for Ricgraph |
|
|
|
|
|
|||
Ricgraph |
|
|
|
|
|
|||
Script to call all harvest scripts |
|
|
|
|
|
|||
sastadev | 0.2.3 2024-06-18 16:31:39 +0200 |
|
Linguistic functions for SASTA tool [view more] |
|
||||
SASTA |
|
|
|
|
||||
sastadev |
|
|
|
|
|
|||
search-ui | 1.0.0 2022-12-21 08:48:13 +0100 |
|
This repository contains the code for a Search UI to test the functionality of the basic vocabulary-recommender. [view more] |
|
||||
@triply/search-ui | 1.0.0 |
|
|
|
|
|
||
shebanq | v4.2z 2022-10-12 10:12:53 +0200 |
|
Exposing the Hebrew Text Database of the ETCBC [view more]
Keywords:
|
|
|
|
||
SHEBANQ |
|
Search engine for biblical Hebrew based on the Biblia Hebraica Stuttgartensia (Amstelodamensis) database (formerly known as ETCBC, historically known as WIVU) |
|
|
||||
SHEBANQ |
|
|
|
|
||||
SPAQ | unknown 2024-02-14 17:18:20 +0100 |
|
SPAQ (speech acquisition using Surveys) [view more] |
|
|
|
||
STAM | ||||||||
stam | v1.1.0 2024-08-23 14:08:40 +0200 |
|
Stand-off Text Annotation Model (STAM) is a data model for stand-off text annotation where any information on a text is represented as an annotation. This repository contains the model's full specification, extensions, schemas, examples and documentation. [view more]
Category:
Keywords:
|
|
|
|
||
stam | 0.9.0 2024-08-29 18:19:39 +0200 |
|
STAM is a library for dealing with stand-off annotations on text. This is the Python binding. [view more]
Category:
Keywords:
|
|
||||
stam |
|
|
|
|
|
|||
stam | 0.15.0 2024-08-29 17:38:16 +0200 |
|
STAM is a powerful library for dealing with stand-off annotations on text. This is the Rust library. [view more]
Category:
Keywords:
|
|
||||
stam |
|
|
|
|
|
|||
stam-tools | 0.8.0 2024-08-29 18:02:54 +0200 |
|
Command-line tools for working with stand-off annotations on text (STAM) [view more]
Category:
Keywords:
|
|
||||
stam-tools |
|
|
|
|
|
|||
T-Scan | 0.10.0 |
|
T-Scan is an analysis tool for Dutch texts to assess the complexity of the text, and is based on original work by Rogier Kraf [view more]
Keywords:
|
|
||||
T-scan |
|
T-Scan is an analysis tool for Dutch text, mainly focusing on text complexity. It was initially conceptualized by Rogier Kraf and Henk Pander Maat. Rogier Kraf also programmed the first versions. From 2012 on, Henk Pander Maat supervised the development of the extended versions of the tool. These versions were programmed by Maarten van Gompel, Ko van der Sloot, Martijn van der Klis, Sheean Spoel and Luka van der Plas. |
|
|
||||
text-fabric | 12.5.3 2024-07-05 23:12:28 +0200 |
|
Processor and browser for annotated text corpora [view more]
Category:
Keywords:
|
|
|
|
||
Text-Fabric |
|
|
|
|
|
|||
Text-Fabric Browser |
|
|
|
|
|
|||
Text-Fabric |
|
|
|
|
|
|||
textannoviz | 0.15.9 2024-09-13 10:05:20 +0200 |
|
[view more] |
|
|
|
||
textannoviz | 0.15.9 |
|
|
|
|
|
||
TextRepo | ||||||||
textrepo | v1.19.0 2022-03-15 14:51:17 +0100 |
|
Text Repository [view more] |
|
|
|
||
textrepo-client | 0.5.1 2022-04-08 23:52:20 +0200 |
|
A Python client to access a textrepo server [view more] |
|
||||
version |
|
|
|
|
|
|||
TICCL & PICCL | ||||||||
PICCL | 0.9.5 |
|
A set of workflows for corpus building through OCR, post-correction, and normalisation. [view more]
Keywords:
|
|
||||
TICCLTools | 0.10 |
|
TicclTools is a collection of programs to process datafiles; together they constitute the bulk of TICCL: Text Induced Corpus-Cleanup. This software consists of individual modules that are invoked by the pipeline system PICCL. [view more]
Keywords:
|
|
||||
TICCLTools |
|
|
|
|
|
|||
TiMBL | ||||||||
python3-timbl | 2020.6.8 2020-06-08 22:37:07 +0200 |
|
Python 3 language binding for the Tilburg Memory-Based Learner [view more]
Category:
Keywords:
|
|
|
|||
TiMBL | 6.9 |
|
TiMBL is an open source software package implementing several memory-based learning algorithms, among which IB1-IG, an implementation of k-nearest neighbor classification with feature weighting suitable for symbolic feature spaces, and IGTree, a decision-tree approximation of IB1-IG. All implemented algorithms have in common that they store some representation of the training set explicitly in memory. During testing, new cases are classified by extrapolation from the most similar stored cases. [view more]
Keywords:
|
|
|
|||
libtimbl |
|
Memory-based Learning Library with API for C++ |
|
|
|
|
||
timbl |
|
Memory-based learner, command-line tool |
|
|
|
|
||
Timbuctoo | 7.15 2024-03-01 10:25:59 +0100 |
|
An RDF datastore that gives researchers control over the sharing of data between datasets
[view more]
Keywords:
|
|
|
|
||
Ucto | ||||||||
python-ucto | 0.6.8 2024-09-12 14:10:03 +0200 |
|
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto). [view more]
Category:
Keywords:
|
|
|
|||
ucto | 0.34 2023-02-22 12:17:06 +0100 |
|
Ucto tokenizes text files: it separates words from punctuation and splits sentences. This is one of the first tasks for almost any Natural Language Processing application. Ucto also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. [view more]
Category:
Keywords:
|
|
||||
libucto |
|
Ucto Library with API for C++ |
|
|
|
|
||
ucto |
|
Command-line interface to the tokenizer |
|
|
|
|
||
Ucto-Webservice | 2.5.2 2024-03-14 21:54:52 +0100 |
|
Ucto is a rule-based tokeniser for multiple languages. This is the webservice for it, for both humans and machines. [view more]
Category:
Keywords:
|
|
|
|||
Ucto Webservice | 2.5.2 |
|
Ucto is a Unicode-compliant tokeniser. It takes input in the form of one or more untokenised texts, and subsequently tokenises them. Several languages are supported, and the software is extensible to other languages. |
|
|
|||
udpipe-service | 4.10 2023-11-26 09:23:18 +0100 |
|
A REST service for an R/udpipe-based tokenizer, lemmatizer, POS tagger and dependency parser. See https://bitbucket.org/fryske-akademy/udpipe for (Docker) setup. [view more] |
|
||||
Service to tokenize, lemmatize, pos-tag and dependency parse using udpipe |
|
|
|
|
||||
vocabulary-recommender | 2.0.0 2022-12-23 13:24:13 +0100 |
|
A generic linked data vocabulary recommender library that provides recommendation functions for various backends. [view more]
Keywords:
|
|
||||
vocabulary-recommender |
|
|
|
|
|
|||
vurmpipe | 3.0 2019-03-24 22:55:06 +0100 |
|
VU Reading Machine Pipeline [view more] |
|