INT Corpus Frontend

A web application to search corpora through the BlackLab Server web service.

Provided tools & services

Brieven als Buit search

Brieven als Buit, provided by the Dutch Language Institute in Leiden.
Type
  • Web Application

Corpus Hedendaags Nederlands

CHN, provided by the Dutch Language Institute in Leiden.
Type
  • Web Application

OpenSoNaR

OpenSoNaR, provided by the Dutch Language Institute in Leiden.
Type
  • Web Application

Tool suite: Blacklab & Corpus Search

The following closely related tools are in a tool suite together with INT Corpus Frontend:

  • Web Application
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

A Blacklab Server CLARIN FCS 2.0 endpoint 0.1

CLARIAH Federated content search corpora, developed by the Dutch Language Institute (INT), is a service that enables searching multiple Dutch corpora at the same time. This application implements the CLARIN FCS 2.0 specification on top of Dutch language corpora. This repository hosts the source code.
  • BlackLab
  • CLARIN
  • corpus search
  • FCS 2.0
  • Federated Content Search
  • Nederlab
Created: 2016-09-11
Modified: 2023-05-10
  • Active: The project has reached a stable, usable state and is being actively developed.
Created: 2012-10-04
Modified: 2022-10-06

Citation

You can cite this software using the following citation generated from its metadata:

Logs & Reviews

Name
Automatic software metadata validation report for INT Corpus Frontend 3.1.1
Author
  • codemetapy validator using software.ttl
Date
2024-05-24 03:05:32
Review
Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems

Validation of INT Corpus Frontend 3.1.1 was successful (score=3/5), but there are some warnings which should be addressed:

1. Info: Software source code *SHOULD* link to a continuous integration service that builds the software and runs the software's tests (This is missing in the metadata)
2. Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)
3. Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)
4. Info: The funder *SHOULD* be acknowledged (This is missing in the metadata)
5. Info: The technology readiness level *SHOULD* be expressed (This is missing in the metadata)
Rating
★ ★ ★ ☆ ☆
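The five findings above all point to properties that are absent from the harvested CodeMeta record. The harvester log below shows that it also scans README.md for TRL, continuous integration and documentation information, so README badges or links are another route. The sketch below takes the other route and adds the properties directly to a `codemeta.json`. It is a minimal sketch, not part of codemetapy, and it assumes the findings map to the CodeMeta/schema.org properties `continuousIntegration`, `softwareHelp`, `referencePublication` and `funder`, with the TRL term appended to `developmentStatus`. Check that mapping against the CLARIAH Software Metadata Requirements linked above. The function names and every URL and value are hypothetical placeholders, not real project data.

```python
"""Hypothetical helper: add CodeMeta properties that the validator reported as missing."""
import json
from pathlib import Path
from typing import Any, Dict, List, Optional

# ASSUMED mapping from validator finding to CodeMeta/schema.org property.
# All values are HYPOTHETICAL placeholders; replace them with real project data.
MISSING_PROPERTIES: Dict[str, Any] = {
    "continuousIntegration": "https://example.org/ci/corpus-frontend",  # hypothetical CI URL
    "softwareHelp": {"@type": "WebSite", "url": "https://example.org/docs/corpus-frontend"},
    "referencePublication": [],  # "if any": empty values are skipped below
    "funder": {"@type": "Organization", "name": "Hypothetical Funder"},
}
# HYPOTHETICAL TRL term; real terms come from the vocabulary at
# https://w3id.org/research-technology-readiness-levels
TRL_TERM = "https://example.org/hypothetical-trl-term"


def add_missing(record: Dict[str, Any], additions: Dict[str, Any],
                trl_term: Optional[str] = None) -> List[str]:
    """Add properties absent from `record` without overwriting existing values.

    Returns the names of the properties that were added or extended.
    """
    added: List[str] = []
    for key, value in additions.items():
        if key not in record and value:  # skip keys already present and empty placeholders
            record[key] = value
            added.append(key)
    if trl_term:
        # developmentStatus may already hold a repostatus URL, so append instead of replacing.
        status = record.get("developmentStatus", [])
        if not isinstance(status, list):
            status = [status]
        if trl_term not in status:
            record["developmentStatus"] = status + [trl_term]
            added.append("developmentStatus")
    return added


def update_file(path: Path) -> List[str]:
    """Read a codemeta.json, add the missing properties and write it back."""
    record = json.loads(path.read_text(encoding="utf-8"))
    added = add_missing(record, MISSING_PROPERTIES, TRL_TERM)
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return added
```

Run it from the repository root with, for example, `update_file(Path("codemeta.json"))`. The helper only ever adds properties, so repeated runs leave existing values, such as the inferred repostatus in `developmentStatus`, untouched.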
There was 1 error while harvesting this metadata; please inspect the log.
(log file starts at Fri May 24 03:05:13 UTC 2024)

[harvester info] --> Processing corpus-frontend (https://github.com/INL/corpus-frontend) [Fri May 24 03:05:13 UTC 2024]

[harvester info] Git updating cached clone of https://github.com/INL/corpus-frontend...

[harvester info] Found release v3.1.1

[harvester info] Using 'v3.1.1'

[harvester info] Git reference: v3.1.1

[harvester info] Scanning directory /tmp/codemeta-harvester.cache/corpus-frontend for harvestable resources...

[harvester info] found pom.xml (Java/Maven) for corpus-frontend, converting to codemeta

[harvester info] Looking for license....

[harvester info] No license file found

[harvester info] Getting contributors from git...

[harvester info] Getting top contributor from git...

[harvester info] Git top contributor Koen Mertens <koen.mertens@ivdnt.org> will be assigned as author (and maintainer) if none are found in the metadata

[harvester info] Extracting last and first commit date from git log....

[harvester info] Date created: 2014-03-19T11:00:15Z+0100, date modified: 2024-02-02T16:25:03Z+0300

[harvester info] Querying Github/GitLab API (https://github.com/INL/corpus-frontend)

[harvester info] Adding URL for found README: README.md

[harvester info] Found releaseNotes

[harvester info] Querying Zenodo API for DOI (access token provided)...

[harvester info] Looking for TRL information in README.md...

[harvester info] Looking for repostatus information in README.md...

[harvester info] Looking for continuous integration information in README.md...

[harvester info] Looking for documentation links in README.md...

[harvester info] Falling back to git tag (v3.1.1) if no version number is specified...

[harvester info] Inferring repostatus information from git activity (used only as a fallback if not explicitly provided)...

[harvester info] Inferred repostatus https://www.repostatus.org/#active

[harvester info] Looking for repostatus information in README.md in master branch...

[harvester info] Setting group Blacklab & Corpus Search

[harvester info] Reconciliating: codemetapy  --baseuri https://tools.dev.clariah.nl --baseuri https://tools.dev.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl --identifier "corpus-frontend" --codeRepository "https://github.com/INL/corpus-frontend" --validate /etc/software.ttl --released --enrich --textv "Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems" -O /tmp/out/corpus-frontend.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-version.corpus-frontend.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-repostatus.corpus-frontend.codemeta.json /tmp/codemeta-harvester.cache//tmp/90-authors.corpus-frontend.codemeta.json /tmp/codemeta-harvester.cache//tmp/43-releasenotes.corpus-frontend.codemeta.json /tmp/codemeta-harvester.cache//tmp/41-readme.corpus-frontend.codemeta.json /tmp/codemeta-harvester.cache//tmp/40-gitapi.corpus-frontend.codemeta.json /tmp/codemeta-harvester.cache//tmp/39-gitdate.corpus-frontend.codemeta.json /tmp/codemeta-harvester.cache//tmp/32-contributors.corpus-frontend.codemeta.json /tmp/codemeta-harvester.cache//tmp/21-java.corpus-frontend.codemeta.json /tmp/codemeta-harvester.cache//tmp/04-applicationSuite.corpus-frontend.codemeta.json 

-- begin log --

Passed 10 files/sources but specified 0 input types! Automatically guessing types...

Detected input types: [('/tmp/codemeta-harvester.cache//tmp/99-version.corpus-frontend.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/99-repostatus.corpus-frontend.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/90-authors.corpus-frontend.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/43-releasenotes.corpus-frontend.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/41-readme.corpus-frontend.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/40-gitapi.corpus-frontend.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/39-gitdate.corpus-frontend.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/32-contributors.corpus-frontend.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/21-java.corpus-frontend.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/04-applicationSuite.corpus-frontend.codemeta.json', 'json')]

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://tools.dev.clariah.nl/corpus-frontend

Processing source #1 of 10

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-version.corpus-frontend.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/corpus-frontend

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/corpus-frontend)] processed 1 new triples, total is now 2

Processing source #2 of 10

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-repostatus.corpus-frontend.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/corpus-frontend

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/corpus-frontend)] processed 1 new triples, total is now 3

Processing source #3 of 10

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/90-authors.corpus-frontend.codemeta.json

    Found main resource with URI https://tools.dev.clariah.nl/corpus-frontend.topcontributor/snapshot

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/corpus-frontend

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/corpus-frontend)] processed 8 new triples, total is now 10

Processing source #4 of 10

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/43-releasenotes.corpus-frontend.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/corpus-frontend

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/corpus-frontend)] processed 2 new triples, total is now 12

Processing source #5 of 10

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/41-readme.corpus-frontend.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/corpus-frontend

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/corpus-frontend)] processed 1 new triples, total is now 13

Processing source #6 of 10

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/40-gitapi.corpus-frontend.codemeta.json

    Found main resource with URI https://tools.dev.clariah.nl/corpus-frontend/snapshot

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/corpus-frontend

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/corpus-frontend)] processed 12 new triples, total is now 24

Processing source #7 of 10

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/39-gitdate.corpus-frontend.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/corpus-frontend

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/corpus-frontend)] overriding old http://schema.org/dateCreated (2014-07-11T08:18:55Z -> 2014-03-19T11:00:15Z+0100)

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/corpus-frontend)] overriding old http://schema.org/dateModified (2024-05-22T08:27:13Z -> 2024-02-02T16:25:03Z+0300)

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/corpus-frontend)] processed 2 new triples, total is now 24

Processing source #8 of 10

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/32-contributors.corpus-frontend.codemeta.json

    Found main resource with URI https://tools.dev.clariah.nl/corpus-frontend.contributors/snapshot

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/corpus-frontend

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/corpus-frontend)] processed 68 new triples, total is now 87

Processing source #9 of 10

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/21-java.corpus-frontend.codemeta.json

    Found main resource with URI https://tools.dev.clariah.nl/nl.inl.blacklab.corpus-frontend/3.1.1

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/corpus-frontend

[CODEMETA COMPOSITION (nl.inl.blacklab.corpus-frontend)] overriding old http://schema.org/author (https://tools.dev.clariah.nl/stub/H-6ee3ea3283c5c3dd -> https://tools.dev.clariah.nl/stub/H-15b0695313c3fe60)

[CODEMETA COMPOSITION (nl.inl.blacklab.corpus-frontend)] overriding old http://schema.org/codeRepository (https://github.com/INL/corpus-frontend -> https://github.com/inl/corpus-frontend)

[CODEMETA COMPOSITION (nl.inl.blacklab.corpus-frontend)] overriding old http://schema.org/description (BlackLab Frontend, a feature-rich corpus search interface for BlackLab. -> A web application to search corpora through the BlackLab Server web service.)

[CODEMETA COMPOSITION (nl.inl.blacklab.corpus-frontend)] overriding old http://schema.org/name (corpus-frontend -> INT Corpus Frontend)

[CODEMETA COMPOSITION (nl.inl.blacklab.corpus-frontend)] overriding old http://schema.org/version (v3.1.1 -> 3.1.1)

[CODEMETA COMPOSITION (nl.inl.blacklab.corpus-frontend)] processed 100 new triples, total is now 177

Processing source #10 of 10

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/04-applicationSuite.corpus-frontend.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/corpus-frontend

[CODEMETA COMPOSITION (nl.inl.blacklab.corpus-frontend)] processed 1 new triples, total is now 178

Remapping URI to (possibly) new identifier and version component: https://tools.dev.clariah.nl/corpus-frontend -> https://tools.dev.clariah.nl/corpus-frontend/3.1.1

[CODEMETA VALIDATION (corpus-frontend)] done

[CODEMETA ENRICHMENT (corpus-frontend)] Guessing interface type http://schema.org/WebApplication based on clues

[CODEMETA ENRICHMENT (corpus-frontend)] considering first author as maintainer

VALIDATION https://tools.dev.clariah.nl/corpus-frontend/3.1.1 #1: Info: Software source code *SHOULD* link to a continuous integration service that builds the software and runs the software's tests (This is missing in the metadata)

VALIDATION https://tools.dev.clariah.nl/corpus-frontend/3.1.1 #2: Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)

VALIDATION https://tools.dev.clariah.nl/corpus-frontend/3.1.1 #3: Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)

VALIDATION https://tools.dev.clariah.nl/corpus-frontend/3.1.1 #4: Info: The funder *SHOULD* be acknowledged (This is missing in the metadata)

VALIDATION https://tools.dev.clariah.nl/corpus-frontend/3.1.1 #5: Info: The technology readiness level *SHOULD* be expressed (This is missing in the metadata)

-- end log --

[harvester info] Output written to /tmp/out/corpus-frontend.codemeta.json

[harvester info] Harvesting remote service URL https://portal.clarin.inl.nl/autocorp/ for corpus-frontend: codemetapy  --baseuri https://tools.dev.clariah.nl --baseuri https://tools.dev.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl -O "/tmp/codemeta-harvester.cache//tmp/corpus-frontend.codemeta.json" "/tmp/out/corpus-frontend.codemeta.json" "https://portal.clarin.inl.nl/autocorp/"

-- begin log --

Passed 2 files/sources but specified 0 input types! Automatically guessing types...

Detected input types: [('/tmp/out/corpus-frontend.codemeta.json', 'json'), ('https://portal.clarin.inl.nl/autocorp/', 'web')]

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://tools.dev.clariah.nl/corpus-frontend

Processing source #1 of 2

Parsing json-ld file from /tmp/out/corpus-frontend.codemeta.json

    Found main resource with URI https://tools.dev.clariah.nl/corpus-frontend/3.1.1

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/corpus-frontend

[CODEMETA COMPOSITION (corpus-frontend)] processed 199 new triples, total is now 199

Processing source #2 of 2

Fallback: Obtaining metadata from remote URL https://portal.clarin.inl.nl/autocorp/

    Service replied with content-type text/html

Traceback (most recent call last):

  File "/usr/bin/codemetapy", line 8, in <module>

    sys.exit(main())

  File "/usr/lib/python3.10/site-packages/codemeta/codemeta.py", line 335, in main

    g, res, args, contextgraph = build(**args.__dict__)

  File "/usr/lib/python3.10/site-packages/codemeta/codemeta.py", line 688, in build

    for targetres in codemeta.parsers.web.parse_web(

  File "/usr/lib/python3.10/site-packages/codemeta/parsers/web.py", line 132, in parse_web

    raise MiddlewareObstructionException(

codemeta.parsers.web.MiddlewareObstructionException: Unable to extract metadata from https://portal.clarin.inl.nl/autocorp/ because it immediately redirects to an external (SSO) login page rather than a proper landing page

-- end log --

[harvester error] Failed to obtain or process metadata from remote service URL https://portal.clarin.inl.nl/autocorp/ for corpus-frontend

[harvester info] Harvesting remote service URL https://opensonar.ivdnt.org/ for corpus-frontend: codemetapy  --baseuri https://tools.dev.clariah.nl --baseuri https://tools.dev.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl -O "/tmp/codemeta-harvester.cache//tmp/corpus-frontend.codemeta.json" "/tmp/out/corpus-frontend.codemeta.json" "https://opensonar.ivdnt.org/"

[harvester info] Harvesting remote service URL https://brievenalsbuit.ivdnt.org for corpus-frontend: codemetapy  --baseuri https://tools.dev.clariah.nl --baseuri https://tools.dev.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl -O "/tmp/codemeta-harvester.cache//tmp/corpus-frontend.codemeta.json" "/tmp/out/corpus-frontend.codemeta.json" "https://brievenalsbuit.ivdnt.org"

[harvester info] Harvesting remote service URL https://chn.ivdnt.org/ for corpus-frontend: codemetapy  --baseuri https://tools.dev.clariah.nl --baseuri https://tools.dev.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl -O "/tmp/codemeta-harvester.cache//tmp/corpus-frontend.codemeta.json" "/tmp/out/corpus-frontend.codemeta.json" "https://chn.ivdnt.org/"

[harvester info] <-- Finished processing corpus-frontend (https://github.com/INL/corpus-frontend) [Fri May 24 03:05:48 UTC 2024]


Metadata Properties

Version
3.1.1 (release notes)
Interface types
  • Web Application
Source code repository
https://github.com/INL/corpus-frontend
Keywords
  • corpus
Development Status
  • Active: The project has reached a stable, usable state and is being actively developed.
Issue Tracker (Support)
https://github.com/INL/corpus-frontend/issues
Documentation
License
Author(s)
Maintainer(s)
Contributor(s)
Producer
Programming Language
  • Java
Runtime Platform
  • Java
Software dependencies
  • Saxon-HE
  • commons-beanutils
  • commons-collections4
  • commons-configuration2
  • commons-io
  • commons-lang3
  • gson
  • javax.servlet-api
  • junit
  • slf4j-api
  • slf4j-jdk14
  • velocity
  • velocity-tools-generic
  • velocity-tools-view
Metadata validation
★ ★ ★ ☆ ☆
Created
2014-03-19 11:00:15 +0100
Last modified
2024-02-02 16:25:03 +0300