Tokenization Improvements

Now that the tokenizer module has been around for a bit, it's time to make some improvements based on the initial architecture. This ticket will serve as the master ticket for the other dev tasks that improve this service.

  • #76 (closed) - Tokenization of Local Bags (for ingest)
  • #94 - Update Tokenization API
    • Send a JSON body of filenames for tokens - easier encoding/decoding of paths with special characters
  • #95 (closed) - Maintain a Cache of Tokens
    • Probably just an H2 db, nothing fancy. Might rework flow from here as well.
  • #96 (closed) - Updates to the Tokenization Process
    • Changes to facilitate ingest tokenization and a local cache
    • Should make both changes a matter of adding a few classes and plugging them in
  • Investigate Intermittent Stopping of Tokenization
    • Seems to be related to objects not being removed from the thread pool.
    • Might not be necessary after above changes.
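
For #94, a rough sketch of what a JSON body of filenames could look like. The class name `FilenameBody`, the `filenames` field, and the sample paths are all assumptions for illustration, not the actual API. The point is only that paths with spaces, `#`, `%`, or `?` go into the body as-is and need JSON string escaping rather than URL encoding. A real implementation would most likely use an existing JSON library instead of the hand-written escaper below.

```java
import java.util.List;

public class FilenameBody {

    // Escape a string for use inside a JSON string literal:
    // quotes, backslashes, and control characters only.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Build a body of the (hypothetical) shape {"filenames":["...","..."]}.
    public static String toJson(List<String> filenames) {
        StringBuilder sb = new StringBuilder("{\"filenames\":[");
        for (int i = 0; i < filenames.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(escape(filenames.get(i))).append('"');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // Characters such as space, '#', '%', and '?' need no special handling here.
        System.out.println(toJson(List.of("data/my file #1.txt", "data/100%?.xml")));
    }
}
```

With this shape, the server side only has to parse the body and read the list back, so no path ever passes through URL or query-string decoding.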