PICCL

A set of workflows for corpus building through OCR, post-correction, and normalisation.

Tool suite: TICCL & PICCL

The following closely related tools are in a tool suite together with PICCL:

  • Software Library
  • Unsupported: The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.

TICCLTools 0.10

TicclTools is a collection of programs for processing data files. Together they constitute the bulk of TICCL: Text Induced Corpus-Cleanup. The software consists of individual modules that are invoked by the pipeline system PICCL.
  • natural language processing
  • nlp
  • normalization
  • ocr
  • Posix
Created: 2015

References

Citation

Please cite the software using one of the reference publications above. If you want to cite the software directly, you can use the following citation, generated from the metadata:

PICCL 0.9.5. Centre for Language and Speech Technology.

Logs & Reviews

Name
Automatic software metadata validation report for PICCL 0.9.5
Author
  • codemetapy validator using software.ttl
Date
2024-07-22 03:12:20
Review
Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems

Validation of PICCL 0.9.5 was successful (score=3/5), but there are some warnings which should be addressed:

1. Info: An interface type *SHOULD* be expressed: Software source code should define one or more target products that are the resulting software applications offering specific interfaces (This is missing in the metadata)
2. Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)
3. Info: The technology readiness level *SHOULD* be expressed (This is missing in the metadata)
Rating
★ ★ ★ ☆ ☆
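The numbered findings in the review above follow a regular "N. Severity: message" pattern, so they can be tallied by severity with a few lines of Python. This is only a minimal sketch for reading such a report: the names FINDING and parse_findings are illustrative assumptions, not part of codemetapy, and the sketch makes no claim about how the validator derives the 3/5 score.

```python
import re
from collections import Counter

# The three findings, copied verbatim from the review text of this report.
REPORT = """\
1. Info: An interface type *SHOULD* be expressed: Software source code should define one or more target products that are the resulting software applications offering specific interfaces (This is missing in the metadata)
2. Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)
3. Info: The technology readiness level *SHOULD* be expressed (This is missing in the metadata)
"""

# Hypothetical pattern: "<number>. <Severity>: <message>"
FINDING = re.compile(r"^\s*(\d+)\.\s+(\w+):\s+(.*)$")


def parse_findings(text):
    """Return a list of (number, severity, message) tuples found in text."""
    findings = []
    for line in text.splitlines():
        match = FINDING.match(line)
        if match:
            findings.append((int(match.group(1)), match.group(2), match.group(3)))
    return findings


findings = parse_findings(REPORT)
severity_counts = Counter(severity for _, severity, _ in findings)

for severity, count in sorted(severity_counts.items()):
    print(f"{severity}: {count}")
```

Grouping by severity makes it easy to see that only one of the three findings (the missing documentation) is a warning; the other two are informational.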
(log file starts at Mon Jul 22 03:12:18 UTC 2024)

[harvester info] --> Processing piccl (https://github.com/LanguageMachines/PICCL) [Mon Jul 22 03:12:18 UTC 2024]

[harvester info] Git updating cached clone of https://github.com/LanguageMachines/PICCL...

[harvester info] Found release v0.9.5

[harvester info] Using 'v0.9.5'

[harvester info] Git reference: v0.9.5

[harvester info] Scanning directory /tmp/codemeta-harvester.cache/piccl for harvestable resources...

[harvester info] found codemeta.json for piccl (md5sum bc252783fbb5adeb07f8d56d41b33821); **NOTE: this is considered authoritative and most other detection methods will be skipped now!**

[harvester info] Inferring repostatus information from git activity (used only as a fallback if not explicitly provided)...

[harvester info] Inferred repostatus https://www.repostatus.org/#inactive

[harvester info] Looking for repostatus information in README.md in master branch...

[harvester info] Found repostatus (master branch) https://www.repostatus.org/#unsupported

[harvester info] Setting group TICCL & PICCL

[harvester info] Reconciliating: codemetapy  --baseuri https://tools.dev.clariah.nl --baseuri https://tools.dev.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl --identifier "piccl" --codeRepository "https://github.com/LanguageMachines/PICCL" --validate /etc/software.ttl --released --enrich --textv "Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems" -O /tmp/out/piccl.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-repostatus.piccl.codemeta.json /tmp/codemeta-harvester.cache//tmp/10-jsonld.piccl.codemeta.json /tmp/codemeta-harvester.cache//tmp/05-repostatus.piccl.codemeta.json /tmp/codemeta-harvester.cache//tmp/04-applicationSuite.piccl.codemeta.json 

-- begin log --

Passed 4 files/sources but specified 0 input types! Automatically guessing types...

Detected input types: [('/tmp/codemeta-harvester.cache//tmp/99-repostatus.piccl.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/10-jsonld.piccl.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/05-repostatus.piccl.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/04-applicationSuite.piccl.codemeta.json', 'json')]

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://tools.dev.clariah.nl/piccl

Processing source #1 of 4

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-repostatus.piccl.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/piccl

[CODEMETA COMPOSITION (https://tools.dev.clariah.nl/piccl)] processed 1 new triples, total is now 2

Processing source #2 of 4

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/10-jsonld.piccl.codemeta.json

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/piccl

[CODEMETA COMPOSITION (piccl)] overriding old https://codemeta.github.io/terms/developmentStatus (https://www.repostatus.org/#inactive -> active)

[CODEMETA CORRECTION (piccl)] automatically converting status active to repostatus URI

[CODEMETA CORRECTION (piccl)] automatically converting spdx license URI from https:// to http:///

[CODEMETA COMPOSITION (piccl)] processed 126 new triples, total is now 126

Processing source #3 of 4

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/05-repostatus.piccl.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/piccl

[CODEMETA COMPOSITION (piccl)] overriding old https://codemeta.github.io/terms/developmentStatus (https://www.repostatus.org/#active -> https://www.repostatus.org/#unsupported)

[CODEMETA COMPOSITION (piccl)] processed 1 new triples, total is now 126

Processing source #4 of 4

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/04-applicationSuite.piccl.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.dev.clariah.nl/piccl

[CODEMETA COMPOSITION (piccl)] processed 1 new triples, total is now 127

Remapping URI to (possibly) new identifier and version component: https://tools.dev.clariah.nl/piccl -> https://tools.dev.clariah.nl/piccl/0.9.5

[CODEMETA VALIDATION (piccl)] done

[CODEMETA ENRICHMENT (piccl)] adding author https://tools.dev.clariah.nl/stub/H4eb9c81916eb2cb4 as contributor

[CODEMETA ENRICHMENT (piccl)] adding author https://orcid.org/0000-0002-1046-0006 as contributor

[CODEMETA ENRICHMENT (piccl)] considering first author as maintainer

VALIDATION https://tools.dev.clariah.nl/piccl/0.9.5 #1: Info: An interface type *SHOULD* be expressed: Software source code should define one or more target products that are the resulting software applications offering specific interfaces (This is missing in the metadata)

VALIDATION https://tools.dev.clariah.nl/piccl/0.9.5 #2: Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)

VALIDATION https://tools.dev.clariah.nl/piccl/0.9.5 #3: Info: The technology readiness level *SHOULD* be expressed (This is missing in the metadata)

-- end log --

[harvester info] Output written to /tmp/out/piccl.codemeta.json

[harvester info] <-- Finished processing piccl (https://github.com/LanguageMachines/PICCL) [Mon Jul 22 03:12:20 UTC 2024]
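The composition steps in the log above show the four sources being processed in order, with a later source overriding an earlier value of developmentStatus (inactive, then active, then unsupported). That behaviour can be illustrated with a small "last source wins" merge. This is a sketch under stated assumptions, not codemetapy's implementation: the functions compose and normalise_status are hypothetical, the source contents other than the status values are invented for illustration, and the real tool operates on RDF triples rather than Python dictionaries.

```python
REPOSTATUS_BASE = "https://www.repostatus.org/#"
STATUS_KEY = "developmentStatus"


def normalise_status(value):
    """Expand a bare status word such as 'active' to a repostatus URI."""
    if value.startswith("http"):
        return value
    return REPOSTATUS_BASE + value


def compose(sources):
    """Merge sources in order; a later source overrides an earlier value.

    Returns the merged properties and a list of recorded overrides as
    (source_name, key, old_value, new_value) tuples.
    """
    merged = {}
    overrides = []
    for name, data in sources:
        for key, value in data.items():
            if key == STATUS_KEY:
                value = normalise_status(value)
            old = merged.get(key)
            if old is not None and old != value:
                overrides.append((name, key, old, value))
            merged[key] = value
    return merged, overrides


# Source order as in the log. The status values are taken from the log;
# the "name" and "applicationSuite" entries are assumed for illustration.
sources = [
    ("99-repostatus", {STATUS_KEY: REPOSTATUS_BASE + "inactive"}),
    ("10-jsonld", {STATUS_KEY: "active", "name": "PICCL"}),
    ("05-repostatus", {STATUS_KEY: REPOSTATUS_BASE + "unsupported"}),
    ("04-applicationSuite", {"applicationSuite": "TICCL & PICCL"}),
]

merged, overrides = compose(sources)

for source, key, old, new in overrides:
    print(f"{source}: overriding {key} ({old} -> {new})")
print("final status:", merged[STATUS_KEY])
```

With this ordering, the status inferred from git activity acts only as a fallback, and the status found in README.md ends up as the final value, which matches the "Unsupported" development status reported for PICCL.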

        

Metadata Properties

Version
0.9.5 (release notes)
Interface types
  • Unknown
Software website
Source code repository
 https://github.com/LanguageMachines/PICCL
Keywords
  • natural language processing
  • nlp
  • ocr
Development Status
  • Unsupported: The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.
Issue Tracker (Support)
https://github.com/LanguageMachines/PICCL/issues
Documentation
License
Author(s)
Maintainer(s)
Contributor(s)
Producer
Funder
Programming Language
  • Nextflow
Continuous Integration Tests
https://travis-ci.org/LanguageMachines/PICCL
Operating System
  • POSIX
Software dependencies
  • Tesseract OCR
  • TICCLtools
  • aNtiLoPe
  • Nextflow
  • FoLiA utilities
Metadata validation
★ ★ ★ ☆ ☆
Created
2015