Issue created Apr 11, 2019 by Michael Ritter (@shake, Contributor)

Investigate Potential AuditThread Deadlock

At UCSD we saw an AuditThread that had been idle for 13 days, which suggests a deadlock may exist somewhere in the AuditThread. They have audit blocking enabled with a maximum block of 5 minutes. In my testing, once the block goes beyond 5 minutes a NullPointerException is thrown and the audit fails.

At the very least, I think it might be time to rework how the blocking works now that we have better tools such as Functions, Suppliers, etc. It might also be good to look into a maximum number of retries instead of a maximum total time spent waiting. That way, instead of checking against elapsed time, we only increment an attempt counter and sleep for the specified amount of time between attempts.
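A rough sketch of what retry-based blocking could look like with a Supplier. The class name `RetryingBlocker`, the `retry` method, and the convention that a `null` result means "not available yet" are all assumptions made for illustration; none of them come from the ACE code base.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper for illustration only; not part of ACE. */
final class RetryingBlocker {
    private RetryingBlocker() {}

    /**
     * Calls the supplier up to maxAttempts times, sleeping for pause between
     * attempts. A null result is treated as "not available yet" (assumption).
     * Returns Optional.empty() when every attempt fails or the thread is
     * interrupted, so callers never receive a bare null.
     */
    static <T> Optional<T> retry(Supplier<T> call, int maxAttempts, Duration pause) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = call.get();
            if (result != null) {
                return Optional.of(result);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pause.toMillis());
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

The only state is the attempt counter, so there is no elapsed-time arithmetic to get wrong, and the worst-case wait is bounded by roughly `(maxAttempts - 1) * pause` plus the time spent in the calls themselves. Returning an `Optional` makes the exhausted case explicit for the caller.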

Info so far

  • It occurs during an audit, so the beginning and end are ok
  • RequestThread log entries are seen continuously, so the batch has not yet been closed
  • ValidationThread log entries are not seen, a good candidate for where the blocking is occurring
  • This would put the point where we go idle somewhere around validator.add(item.getFileDigest(), token);
  • The TokenValidator itself seems sound, should be able to process without a lock
  • TokenValidator::add does attempt to acquire a lock, which can block, but that lock is only held during TokenValidator::processBatch
  • TokenValidator::processBatch only seems to block when communicating with the IMS; I don't see the database calls being what is causing us to block indefinitely (though maybe it's possible)
  • It's possible that the old IMSService::blockUntil had a bug in it, though it seemed to be sound
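One way to confirm or rule out the lock in add as the idle point is to bound the wait so that a hang surfaces as a failure rather than an idle thread. The sketch below is a hypothetical stand-in, not the real TokenValidator: the class `TimedAddSketch`, its fields, and its method signatures are assumptions, and it models only the locking pattern described above (add takes the lock, processBatch holds it while working).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a batching validator; not the real TokenValidator. */
final class TimedAddSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> pending = new ArrayList<>();

    /** Returns false instead of blocking forever if the lock is not free in time. */
    boolean add(String item, long timeout, TimeUnit unit) {
        try {
            if (!lock.tryLock(timeout, unit)) {
                return false; // caller can log this and retry or fail the audit
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            pending.add(item);
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Holds the lock for the whole batch, mirroring the behavior described above. */
    void processBatch(Runnable work) {
        lock.lock();
        try {
            work.run(); // stands in for the IMS and database calls
            pending.clear();
        } finally {
            lock.unlock();
        }
    }

    /** Number of items waiting for the next batch. */
    int pendingCount() {
        lock.lock();
        try {
            return pending.size();
        } finally {
            lock.unlock();
        }
    }
}
```

If add started returning false in production logs while processBatch was in flight, that would point at the IMS communication inside processBatch as the place where the lock is held indefinitely.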