Tokenization Improvements
Now that the tokenizer module has been around for a while, it's time to make some improvements based on the initial architecture. This ticket will serve as the master ticket for the related dev tasks that improve this service.
- #76 (closed) - Tokenization of Local Bags (for ingest)
- #94 - Update Tokenization API
  - Send a JSON body of filenames for tokens: easier encoding/decoding of paths with special characters
- #95 (closed) - Maintain a Cache of Tokens
  - Probably just an H2 db, nothing fancy. Might rework flow from here as well.
- #96 (closed) - Updates to the Tokenization Process
  - Changes to facilitate ingest tokenization and a local cache
  - Should make both changes a matter of adding a few classes and plugging them in
- Investigate Intermittent Stopping of Tokenization
  - Seems to be related to objects not being removed from the thread pool.
  - Might not be necessary after the above changes.
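
As a rough illustration of the #94 idea (sending filenames in a JSON body rather than encoding them into the URL), here is a minimal sketch. The class name `TokenRequestBody`, the `filenames` field, and both helper methods are hypothetical and not taken from the existing API. The escaping is hand-rolled only to keep the example dependency-free; the real service would more likely use whatever JSON library it already ships with.

```java
import java.util.List;

// Hypothetical sketch: builds a JSON request body listing filenames to tokenize.
public class TokenRequestBody {

    // Escape a string so it is safe inside a JSON string literal.
    public static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Build a body of the form {"filenames":[...]}; the field name is an assumption.
    public static String toJsonBody(List<String> filenames) {
        StringBuilder sb = new StringBuilder("{\"filenames\":[");
        for (int i = 0; i < filenames.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(escapeJson(filenames.get(i))).append('"');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // Characters such as space, '#', '%', '+' and '?' need no URL encoding here.
        System.out.println(toJsonBody(List.of("data/my file#1.txt", "data/50%+more?.bin")));
    }
}
```

The point of the sketch is that only quotes, backslashes, and control characters need escaping in the body, so paths containing spaces, `#`, `%`, `+`, or `?` pass through unchanged instead of depending on matching URL encoding and decoding on both sides.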