
Who present in meeting
Commit
Details
Root Cause Analysis

Actions


Takeaways

...

10/09/2020 16:45

Who present in meeting: Jay, Rob, Isaac, Pete R, Mark L, Luna, Phill C
Commit: a1bfda21dc8
Details: Failure in the functional-tests stage, test classes PushAuthenticationSenderNode and PushResultVerifierTest
Root Cause Analysis

This looks like a Miranda timeout: the mock push service is not indicating that it has received a message. The commit that triggered the build does not appear to be directly related to the failure we are seeing.

Failure seen in functional tests https://temper-dashboard.engineering.forgerock.com/openam/master/functional/test-run/a1bfda21dc848bcd0dac3274697efc473953138f/functional-tests

PushAuthenticationSenderNode and PushResultVerifierTest both timed out while waiting for Miranda to witness a state change. The build that showed this failure was not displayed in the dashboard because of failures in Bitbucket. Instead, the investigation focused on PRs that had recently been merged to master, and it was noted that the commit o3ceae saw similar failures in the functional-test stages when run by Jenkins.

Actions

Reverted the offending commit: https://stash.forgerock.org/projects/OPENAM/repos/openam/pull-requests/12352/overview

Master has been left locked while we wait for this commit to be built and verify that the issue is resolved.

Takeaways

Additional scrutiny should be applied when checking that Jenkins has approved a given commit. This is especially important when build failures have been frequent and the Jenkins approval has taken some time to obtain, and when multiple other individuals have already approved the PR.

...

10/09/2020 16:42

Who present in meeting: Jay, Rob, Isaac, Pete R, Mark L, Luna
Commit: a1bfda21dc8
Details: Failure in the image-smoke-tests stage, test class JwtBearerTokenEndpoint
Root Cause Analysis

See RCA on incident report "07/09/2020 15:49"

Actions

Moved the previously created JIRA, for which a mitigation had already been attempted, back into process.

Takeaways

This failure has been seen three times in the last 48 hours and needs addressing.

An attempt to mitigate this issue was committed in 161065b9785, but it does not appear to have worked. Further investigation may be required. A potential solution would be to replace the hard-coded 'alias.localtest.me' with '<random string>.localtest.me', though this may simply hide the underlying issue of realms not being deleted before subsequent tests are run.
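
As a rough illustration of the '<random string>.localtest.me' idea, the sketch below generates a unique alias for each test run. It is written in Python for brevity and is not taken from the test suite: the function name `random_realm_alias` and the `t` prefix are hypothetical, and only the `localtest.me` base domain comes from the report above.

```python
import uuid

# Base domain taken from the report; everything else here is illustrative.
BASE_DOMAIN = "localtest.me"


def random_realm_alias(base_domain: str = BASE_DOMAIN) -> str:
    """Return a unique alias of the form '<random string>.<base_domain>'."""
    # uuid4().hex yields lowercase hex characters; a fixed letter prefix keeps
    # the label starting with a letter, and 13 characters is well under the
    # 63-character limit for a single DNS label.
    label = "t" + uuid.uuid4().hex[:12]
    return f"{label}.{base_domain}"


if __name__ == "__main__":
    # Each call produces a different alias, so a realm left behind by an
    # earlier test cannot collide with the one created by the current test.
    print(random_realm_alias())
    print(random_realm_alias())
```

This only avoids collisions between runs. As noted above, it would not remove realms left behind by earlier tests, so the cleanup problem would still need its own investigation.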

...

09/09/2020 09:51

Who present in meeting: Jay, Isaac, Rob, Rich, Andy
Commit: 9bf21928638
Details

14 functional tests marked as failed.

Test class: TestSessionResourceV2V3AndV4

SessionResource endpoint version 4_0 returns valid:false in JSON with HTTP 200
when using stateless Sessions
when getSessionInfoAndResetIdleTime is called after admin has logged the user out
when an admin session ID is sent along with the request in a cookie
Root Cause Analysis

Believed to be a recurring flaky test. Failure observed on master on 03/09/2020 at 09:30. Also observed recently on a pull request.

Actions

Tracked under AME-20214.

Takeaways

Given the test is flaky, the recommendation is to allow the pipeline to continue.

...