ace issues
https://gitlab.umiacs.umd.edu/adapt/ace/-/issues
2019-04-24T14:06:24-04:00

https://gitlab.umiacs.umd.edu/adapt/ace/-/issues/61
Investigate Potential AuditThread Deadlock
2019-04-24T14:06:24-04:00
Ghost User

At UCSD we saw an AuditThread that had been idle for 13 days, which suggests that a deadlock might exist somewhere in the AuditThread. They have audit blocking enabled, with a maximum block of 5 minutes. In my testing, when the block goes beyond 5 minutes, a NullPointerException is thrown and the audit fails.
At the very least, I think it might be time to rework how the blocking works, now that we have better tools such as Functions, Suppliers, etc. It might also be good to cap the number of retries instead of the total time spent waiting. That way, instead of checking elapsed time, we only increment the attempt counter and sleep for the specified interval.
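A minimal sketch of the retry-count approach described above, assuming a hypothetical `RetryBlocker.blockWithRetries` helper (not part of ACE or `IMSService`) that receives the availability check as a `java.util.function.Supplier`. It gives up after a fixed number of attempts rather than after a total elapsed time, and returns an empty `Optional` instead of `null` when every attempt fails, so callers must handle the give-up case explicitly.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper (not part of ACE): blocks by retrying a check a fixed
// number of times instead of tracking the total elapsed wait.
final class RetryBlocker {

    private RetryBlocker() {
    }

    /**
     * Calls {@code check} up to {@code maxAttempts} times, sleeping
     * {@code sleepMillis} between unsuccessful attempts.
     *
     * @return the first non-empty result, or {@code Optional.empty()} if every
     *         attempt failed or the thread was interrupted while sleeping
     */
    static <T> Optional<T> blockWithRetries(Supplier<Optional<T>> check,
                                            int maxAttempts,
                                            long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = check.get();
            if (result.isPresent()) {
                return result;
            }
            // No need to sleep after the last failed attempt.
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag for the caller, then give up.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        // All attempts failed: say so explicitly rather than returning null.
        return Optional.empty();
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for an IMS availability check that succeeds on the third call.
        Optional<String> result = blockWithRetries(
                () -> calls.incrementAndGet() >= 3 ? Optional.of("ready") : Optional.<String>empty(),
                5, 10L);
        // prints "ready after 3 attempts"
        System.out.println(result.orElse("gave up") + " after " + calls.get() + " attempts");
    }
}
```

With this design the longest possible sleep is `(maxAttempts - 1) * sleepMillis`, plus whatever time the checks themselves take. For example, 31 attempts with a 10 second sleep allows at most 30 sleeps, or 300 seconds.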
### Info so far
* It occurs during an audit, so the beginning and end of the audit are OK
* RequestThread log entries are seen continuously, so the batch has not yet been closed
* ValidationThread log entries are not seen, which makes it a good candidate for where the blocking occurs
* This would put the point where the thread sits idle somewhere around `validator.add(item.getFileDigest(), token);`
* The TokenValidator itself seems sound and should be able to process without a lock
* `TokenValidator::add` does attempt to acquire a lock, which can block, but it should only block while `TokenValidator::processBatch` holds that lock
* `TokenValidator::processBatch` only seems to block when communicating with the IMS; I don't think the database calls are what causes the indefinite block (though it's possible)
* It's possible that the old `IMSService::blockUntil` had a bug in it, though it seemed to be sound

https://gitlab.umiacs.umd.edu/adapt/ace/-/issues/37
Audit threads not always cleaning up properly after system_errors
2018-01-23T08:32:46-05:00
Ghost User

We've noticed that when there are problems connecting to the IMS, the audit threads in the Audit Manager do not always terminate. This could be caused by extra validation/request batch threads still running, but it leaves other audits unresponsive (they can't be submitted) and causes unintuitive behavior in the UI.
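One way to keep threads from outliving an audit after an IMS error is to have the audit own its batch workers and shut them down in a `finally` block. The sketch below assumes the request/validation batches can be modeled as `Callable` tasks on an `ExecutorService` and that they respond to interruption. `AuditCleanupSketch`, `runAudit`, and the status strings are hypothetical and do not reflect how ACE's Audit Manager is actually structured.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch (not ACE code): an audit that always stops its batch
// worker threads, whether it completes, fails, or is interrupted.
final class AuditCleanupSketch {

    private AuditCleanupSketch() {
    }

    static String runAudit(ExecutorService workers, List<Callable<Void>> batchTasks) {
        CompletionService<Void> completed = new ExecutorCompletionService<>(workers);
        try {
            for (Callable<Void> task : batchTasks) {
                completed.submit(task);
            }
            // Collect results in completion order so the first failure is seen
            // immediately, even if a slower batch was submitted before it.
            for (int i = 0; i < batchTasks.size(); i++) {
                try {
                    completed.take().get();
                } catch (ExecutionException e) {
                    return "system_error: " + e.getCause().getMessage();
                }
            }
            return "complete";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            // Runs on every exit path: interrupt any batch still running and
            // wait briefly for the worker threads to exit.
            workers.shutdownNow();
            try {
                if (!workers.awaitTermination(5, TimeUnit.SECONDS)) {
                    System.err.println("warning: batch threads did not stop in time");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        // One batch fails as if the IMS were unreachable; the other would
        // otherwise keep its thread busy for a minute.
        Callable<Void> imsFailure = () -> {
            throw new IllegalStateException("IMS unreachable");
        };
        Callable<Void> slowBatch = () -> {
            Thread.sleep(60_000);
            return null;
        };
        String status = runAudit(workers, Arrays.asList(imsFailure, slowBatch));
        System.out.println(status + ", workers terminated: " + workers.isTerminated());
    }
}
```

Because `ExecutorCompletionService` returns tasks in completion order, the first failure ends the audit regardless of submission order. Note that `shutdownNow` only interrupts the workers; a batch blocked in a call that ignores interruption would still need its own timeout.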