GaLAHaD

GaLAHaD (Generating Linguistic Annotations for Historical Dutch) allows linguists to compare taggers, tag their own corpora, evaluate the results and export their tagged documents.

Provided tools & services

GaLAHaD API

Note: No URL was registered for this service (yet)
Type
  • Web API
Documentation
Service Provider
Input data
  • TextDigitalDocument, encoding format: application/folia+xml
  • TextDigitalDocument, encoding format: application/tei+xml
  • TextDigitalDocument, encoding format: https://github.com/newsreader/NAF
  • TextDigitalDocument, encoding format: https://universaldependencies.org/format.html
  • TextDigitalDocument, encoding format: text/tab-separated-values
  • TextDigitalDocument, encoding format: text/plain
Output data
  • CreativeWork, encoding format: application/zip
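
As an illustration of how a client might use the declared input formats, the sketch below maps a file name to one of the encoding formats listed above. The file-name suffixes and the helper name `guess_input_format` are assumptions made for this example only; the metadata does not define them, and this is not part of the GaLAHaD API.

```python
from pathlib import PurePath
from typing import Optional

# Encoding formats exactly as listed in the metadata above.
# The file-name suffixes (keys) are assumptions for this sketch.
INPUT_FORMATS = {
    ".folia.xml": "application/folia+xml",
    ".tei.xml": "application/tei+xml",
    ".naf": "https://github.com/newsreader/NAF",
    ".conllu": "https://universaldependencies.org/format.html",
    ".tsv": "text/tab-separated-values",
    ".txt": "text/plain",
}

# The only output format declared in the metadata.
OUTPUT_FORMAT = "application/zip"


def guess_input_format(filename: str) -> Optional[str]:
    """Return the listed encoding format for a file name, or None if unknown."""
    name = PurePath(filename).name.lower()
    # Try longer suffixes first so a compound suffix wins over a shorter one.
    for suffix in sorted(INPUT_FORMATS, key=len, reverse=True):
        if name.endswith(suffix):
            return INPUT_FORMATS[suffix]
    return None
```

A caller could use this to reject unsupported files before uploading, for example by skipping any file for which `guess_input_format` returns `None`.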

GaLAHaD client Docker image

Type
  • Software Image

GaLAHaD proxy

Type
  • Server Application

GaLAHaD proxy Docker image

Type
  • Software Image

GaLAHaD server Docker image

Type
  • Software Image

Tool suite: GaLAHaD

The following closely related tools are in a tool suite together with GaLAHaD:

  • 6 - Late prototype: Technology demonstrated in target setting, end-users adopt it for testing purposes.
  • Active: The project has reached a stable, usable state and is being actively developed.

GaLAHaD Train Battery 1.0.0

Python program for training linguistic annotation taggers based on a configuration file and a list of datasets. It prepares the resulting trained models for dockerization and adds relevant metadata. It is agnostic to the tagger software, as long as a simple Python shell is built around it.
  • Artificial intelligence, expert systems
  • Computational linguistics and philology
  • Linguistics
  • Linux
  • Python
Created: 2024-05-31
Modified: 2024-06-04
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

int-pie 1.0.0

  •   Enrique Manjavacas
  •   Mike Kestemont
  •   Thibault Clerice
The PIE tagger with custom modifications by the Dutch Language Institute (INT).
  • Analyzing
  • Annotating
  • Artificial intelligence, expert systems
  • Computational linguistics and philology
  • Enriching
  • Lemmatizing
  • Linguistics
  • Machine Learning
  • POS-Tagging
  • Tagging
  • Linux
  • Python
Created: 2024-05-31
Modified: 2024-06-05

Citation

You can cite this software using the following citation generated from its metadata:

(2024) GaLAHaD 1.2.2. Instituut voor de Nederlandse taal.

Logs & Reviews

Name
Automatic software metadata validation report for GaLAHaD 1.2.2
Author
  • codemetapy validator using software.ttl
Date
2024-09-16 03:08:10
Review
Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems

Validation of GaLAHaD 1.2.2 was successful (score=4/5), but there are some remarks which you may or may not want to address:

1. Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)
Rating
★ ★ ★ ★ ☆
There was 1 error harvesting this metadata; please inspect the log.
(log file starts at Mon Sep 16 03:08:00 UTC 2024)

[harvester info] --> Processing galahad (https://github.com/INL/galahad) [Mon Sep 16 03:08:00 UTC 2024]

[harvester info] Git updating cached clone of https://github.com/INL/galahad...

[harvester info] Found release 1.2.2

[harvester info] Using '1.2.2'

[harvester info] Git reference: 1.2.2

[harvester info] Scanning directory /tmp/codemeta-harvester.cache/galahad for harvestable resources...

[harvester info] found codemeta-harvest.json for galahad (md5sum 6a1e01599a462c3e65c902c213911ef8); values in here take precendence over (override) those in later detection stages

[harvester info] Looking for license....

[harvester info] Found license Apache-2.0

[harvester info] Getting contributors from git...

[harvester info] No git contributors found

[harvester info] Getting top contributor from git...

[harvester info] Git top contributor  will be assigned as author (and maintainer) if none are found in the metadata

[harvester info] Extracting last and first commit date from git log....

[harvester info] Date created: 2024-05-31T16:59:02Z+0200, date modified: 2024-08-30T14:38:25Z+0200

[harvester info] Querying Github/GitLab API (https://github.com/INL/galahad)

[harvester info] Adding URL for found README: readme.md

[harvester info] Found releaseNotes

[harvester info] Querying Zenodo API for DOI (access token provided)...

[harvester info] Looking for TRL information in readme.md...

[harvester info] Looking for repostatus information in readme.md...

[harvester info] Looking for continuous integration information in readme.md...

[harvester info] Found CI https://github.com/INL/Galahad/actions/

[harvester info] Looking for documentation links in readme.md...

[harvester info] Falling back to git tag (1.2.2) if no version number is specified...

[harvester info] Inferring repostatus information from git activity (used only as a fallback if not explicitly provided)...

[harvester info] Inferred repostatus https://www.repostatus.org/#active

[harvester info] Looking for repostatus information in readme.md in master branch...

[harvester info] Setting group GaLAHaD

[harvester info] Reconciliating: codemetapy  --baseuri https://tools.dev.clariah.nl --baseuri https://tools.dev.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl --identifier "galahad" --codeRepository "https://github.com/INL/galahad" --validate /etc/software.ttl --released --enrich --textv "Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems" -O /tmp/out/galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-version.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-repostatus.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/90-authors.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/43-releasenotes.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/41-readme.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/40-gitapi.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/39-gitdate.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/29-license.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/12-ci.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/10-harvest.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/04-applicationSuite.galahad.codemeta.json 

-- begin log --

Passed 11 files/sources but specified 0 input types! Automatically guessing types...

Detected input types: [('/tmp/codemeta-harvester.cache//tmp/99-version.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/99-repostatus.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/90-authors.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/43-releasenotes.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/41-readme.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/40-gitapi.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/39-gitdate.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/29-license.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/12-ci.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/10-harvest.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/04-applicationSuite.galahad.codemeta.json', 'json')]

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://tools.dev.clariah.nl/galahad

Processing source #1 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-version.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/galahad)] processed 1 new triples, total is now 2

Processing source #2 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-repostatus.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/galahad)] processed 1 new triples, total is now 3

Processing source #3 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/90-authors.galahad.codemeta.json

    Found main resource with URI https://tools.dev.clariah.nl/galahad.topcontributor/snapshot

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/galahad)] processed 1 new triples, total is now 3

Processing source #4 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/43-releasenotes.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/galahad)] processed 2 new triples, total is now 5

Processing source #5 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/41-readme.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/galahad)] processed 1 new triples, total is now 6

Processing source #6 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/40-gitapi.galahad.codemeta.json

    Found main resource with URI https://tools.dev.clariah.nl/galahad/snapshot

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/galahad)] processed 13 new triples, total is now 18

Processing source #7 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/39-gitdate.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/galahad)] overriding old http://schema.org/dateCreated (2024-05-31T14:57:58Z -> 2024-05-31T16:59:02Z+0200)

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/galahad)] overriding old http://schema.org/dateModified (2024-09-13T15:03:33Z -> 2024-08-30T14:38:25Z+0200)

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/galahad)] processed 2 new triples, total is now 18

Processing source #8 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/29-license.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/galahad)] overriding old http://schema.org/license (http://spdx.org/licenses/Apache-2.0 -> Apache-2.0)

[CODEMETA CORRECTION (https://tools.dev.clariah.nl/galahad)] automatically converting license to spdx URI

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/galahad)] processed 1 new triples, total is now 18

Processing source #9 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/12-ci.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/galahad)] processed 1 new triples, total is now 19

Processing source #10 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/10-harvest.galahad.codemeta.json

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/galahad

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/dateCreated (2024-05-31T16:59:02Z+0200 -> 2024-05-31)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/codeRepository (https://github.com/INL/galahad -> git+https://github.com/INL/galahad.git)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/description ("Galahad". Goal: enable linguists to experiment with different taggers and use the result in other INT products  -> GaLAHaD (Generating Linguistic Annotations for Historical Dutch) allows linguists to compare taggers, tag their own corpora, evaluate the results and export their tagged documents.)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/downloadUrl (https://github.com/INL/galahad/archive/refs/tags/1.2.2.zip -> https://github.com/INL/galahad)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/name (galahad -> GaLAHaD)

[CODEMETA COMPOSITION (galahad)] overriding old https://codemeta.github.io/terms/contIntegration (https://github.com/INL/Galahad/actions/ -> https://github.com/INL/galahad/actions)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/producer (https://tools.dev.clariah.nl/org/dutch-language-institute -> https://www.ivdnt.org)

[CODEMETA COMPOSITION (galahad)] overriding old https://codemeta.github.io/terms/readme (https://github.com/INL/galahad/blob/1.2.2//readme.md -> https://github.com/INL/Galahad/blob/release/readme.md)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/releaseNotes (https://github.com/INL/galahad/releases/tag/1.2.2 -> https://github.com/INL/Galahad/releases)

[CODEMETA COMPOSITION (galahad)] processed 301 new triples, total is now 307

Processing source #11 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/04-applicationSuite.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/galahad

[CODEMETA COMPOSITION (galahad)] processed 1 new triples, total is now 308

Remapping URI to (possibly) new identifier and version component: https://tools.dev.clariah.nl/galahad -> https://tools.dev.clariah.nl/galahad/1.2.2

[CODEMETA VALIDATION (galahad)] done

VALIDATION https://tools.dev.clariah.nl/galahad/1.2.2 #1: Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)

-- end log --

[harvester info] Output written to /tmp/out/galahad.codemeta.json

[harvester info] Harvesting remote service URL https://portal.clarin.ivdnt.org/galahad for galahad: codemetapy  --baseuri https://tools.dev.clariah.nl --baseuri https://tools.dev.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl -O "/tmp/codemeta-harvester.cache//tmp/galahad.codemeta.json" "/tmp/out/galahad.codemeta.json" "https://portal.clarin.ivdnt.org/galahad"

-- begin log --

Passed 2 files/sources but specified 0 input types! Automatically guessing types...

Detected input types: [('/tmp/out/galahad.codemeta.json', 'json'), ('https://portal.clarin.ivdnt.org/galahad', 'web')]

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://tools.dev.clariah.nl/galahad

Processing source #1 of 2

Parsing json-ld file from /tmp/out/galahad.codemeta.json

    Found main resource with URI https://tools.dev.clariah.nl/galahad/1.2.2

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/galahad

[CODEMETA COMPOSITION (galahad)] processed 477 new triples, total is now 477

Processing source #2 of 2

Fallback: Obtaining metadata from remote URL https://portal.clarin.ivdnt.org/galahad

    Service replied with content-type text/html

Traceback (most recent call last):

  File "/usr/bin/codemetapy", line 8, in <module>

    sys.exit(main())

             ^^^^^^

  File "/usr/lib/python3.12/site-packages/codemeta/codemeta.py", line 335, in main

    g, res, args, contextgraph = build(**args.__dict__)

                                 ^^^^^^^^^^^^^^^^^^^^^^

  File "/usr/lib/python3.12/site-packages/codemeta/codemeta.py", line 688, in build

    for targetres in codemeta.parsers.web.parse_web(

  File "/usr/lib/python3.12/site-packages/codemeta/parsers/web.py", line 132, in parse_web

    raise MiddlewareObstructionException(

codemeta.parsers.web.MiddlewareObstructionException: Unable to extract metadata from https://portal.clarin.ivdnt.org/galahad because it immediately redirects to an external (SSO) login page rather than a proper landing page

-- end log --

[harvester error] Failed to obtain or process metadata from remote service URL https://portal.clarin.ivdnt.org/galahad for galahad

[harvester info] <-- Finished processing galahad (https://github.com/INL/galahad) [Mon Sep 16 03:08:13 UTC 2024]
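
The log above shows the metadata sources being processed in order, with values from later sources replacing earlier ones (the "overriding old ..." lines). The following is a minimal sketch of that last-writer-wins composition using plain dictionaries. It is not codemetapy's actual implementation, which works on RDF triples; the function name `compose` and the simplified property names are assumptions for illustration.

```python
from typing import Any, Dict, List, Tuple


def compose(
    sources: List[Tuple[str, Dict[str, Any]]],
) -> Tuple[Dict[str, Any], List[str]]:
    """Merge property dictionaries in order; later sources override earlier ones.

    Returns the merged properties and a list of messages describing each override.
    """
    merged: Dict[str, Any] = {}
    messages: List[str] = []
    for _source_name, properties in sources:
        for key, value in properties.items():
            # Record an override only when an existing value actually changes.
            if key in merged and merged[key] != value:
                messages.append(f"overriding old {key} ({merged[key]} -> {value})")
            merged[key] = value
    return merged, messages


if __name__ == "__main__":
    # Values taken from the log above; the grouping into two sources is simplified.
    merged, messages = compose([
        ("40-gitapi", {"name": "galahad",
                       "codeRepository": "https://github.com/INL/galahad"}),
        ("10-harvest", {"name": "GaLAHaD",
                        "codeRepository": "git+https://github.com/INL/galahad.git"}),
    ])
    for message in messages:
        print(message)
```

Because sources are applied in list order, placing the hand-written harvest file late in the list is what lets its values win over automatically detected ones in this sketch.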

        

Metadata Properties

Version
1.2.2 (release notes)
Interface types
  • Server Application
  • Software Image
  • Web API
  • Web Application
Source code repository
https://github.com/INL/galahad
Category
  • Analyzing
  • Annotating
  • Artificial intelligence, expert systems
  • Comparing
  • Computational linguistics and philology
  • Converting
  • Enriching
  • Lemmatizing
  • Linguistics
  • Machine Learning
  • Merging
  • POS-Tagging
  • Software for humanities
  • Tagging
  • Textual and linguistic corpora
Development Status
  • 6 - Late prototype: Technology demonstrated in target setting, end-users adopt it for testing purposes.
  • Active: The project has reached a stable, usable state and is being actively developed.
Issue Tracker (Support)
https://github.com/INL/galahad/issues
Documentation
License
Apache-2.0
Author(s)
Maintainer(s)
Contributor(s)
Producer
Programming Language
  • Javascript
  • Kotlin
  • Typescript
Continuous Integration Tests
https://github.com/INL/galahad/actions
Runtime Platform
  • JVM
  • Node
Operating System
  • Linux
Software dependencies
  • node-sass
  • @vue/eslint-config-typescript
  • klaxon
  • pinia
  • eslint-plugin-vue
  • axios
  • @typescript-eslint/parser
  • eslint
  • js-yaml
  • typescript
  • kotlinx-coroutines-core-jvm
  • vue-router
  • springdoc-openapi-starter-webmvc-ui
  • buffer
  • @typescript-eslint/eslint-plugin
  • content-disposition
  • snakeyaml
  • json-loader
  • spring-boot-devtools
  • kotlin-reflect
  • @types/js-yaml
  • safe-buffer
  • vue
  • @types/jest
  • log4j-api-kotlin
  • vite
  • kotlin-stdlib
  • mutationobserver-shim
  • @types/uuid
  • @rollup/plugin-yaml
  • uuid
  • sass
  • spring-boot-starter-web
  • vue-slider-component
  • @vitejs/plugin-vue
  • kotlinx-serialization-json-jvm
Metadata validation
★ ★ ★ ★ ☆
Created
2024-05-31
Last modified
2024-08-30 14:38:25 +0200