
Template

dd/MM/yyyy hh:mm

Who present in meeting
Commit
Details
Root Cause Analysis

Actions


Takeaways

21/09/2020 10:35

Who present in meeting: Alun, Andrew F, Dipu, Jay, Kajetan, Mark L, Michael Carter, Ravi, Gabor, Rich W
Commit: https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/ebb287c18edf9edfffc19c9039e2ad355d5b8ad7

However, there were hidden commits that did not appear on the dashboard prior to this commit. 
https://ci.forgerock.org/blue/organizations/jenkins/OpenAM-master%2FOpenAM-Pipeline/detail/master/4781/changes

The commit responsible for the failure could therefore have been any one of these:
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/ebb287c18edf9edfffc19c9039e2ad355d5b8ad7
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/32a20cc2219d618c5b4051487a0661515fa308c2
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/1ce940e79e92aded40ea7fa630f5f739419af4e7
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/9a2b1cc8c0cfac1d6dca218ae79da3baf2a4ba55
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/723e746fed2785418ce9bf915fd150a9d41d3c83
Details: publish-artifacts stage failure
Root Cause Analysis
[Pipeline] [gcp-gce-jenkins-agent-2] unstash
[Pipeline] [gcp-gce-jenkins-agent-2] unstash
[Pipeline] [gcp-gce-jenkins-agent-2] unstash
[Pipeline] [gcp-gce-jenkins-agent-2] sh
[gcp-gce-jenkins-agent-2] + docker pull gcr.io/forgerock-io/am/docker-build:7.1.0-ebb287c18edf9edfffc19c9039e2ad355d5b8ad7
[gcp-gce-jenkins-agent-2] Error response from daemon: manifest for gcr.io/forgerock-io/am/docker-build:7.1.0-ebb287c18edf9edfffc19c9039e2ad355d5b8ad7 not found
[Pipeline] [gcp-gce-jenkins-agent-2] }
[Pipeline] [gcp-gce-jenkins-agent-2] // withEnv
[Pipeline] [gcp-gce-jenkins-agent-2] }
[Pipeline] [gcp-gce-jenkins-agent-2] // node
[Pipeline] [gcp-gce-jenkins-agent-2] }
[Pipeline] [gcp-gce-jenkins-agent-2] // stage
[Pipeline] [gcp-gce-jenkins-agent-2] }
[Pipeline] [gcp-gce-jenkins-agent-2] // timeout
[Pipeline] [gcp-gce-jenkins-agent-2] echo
[gcp-gce-jenkins-agent-2] Exception occurred:
[gcp-gce-jenkins-agent-2] hudson.AbortException: script returned exit code 1
[gcp-gce-jenkins-agent-2] at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.handleExit(DurableTaskStep.java:558)
[gcp-gce-jenkins-agent-2] at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:504)
[gcp-gce-jenkins-agent-2] at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:450)
[gcp-gce-jenkins-agent-2] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[gcp-gce-jenkins-agent-2] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[gcp-gce-jenkins-agent-2] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[gcp-gce-jenkins-agent-2] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[gcp-gce-jenkins-agent-2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[gcp-gce-jenkins-agent-2] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[gcp-gce-jenkins-agent-2] at java.lang.Thread.run(Thread.java:748)

It appears that the publish-artifacts stage is still trying to pull the AM Docker image, which no longer exists.

Actions

Update publish-artifacts.groovy to stop pulling the AM Docker image (AME-20378).

Takeaways
  • Jenkins log file was very large and difficult to pull and search in order to find the issue → is it possible to fragment the logs, or at least to open them in a browser?
  • Awareness of which pipeline stages need updating after certain AM modifications may have been lacking.
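
One way to cope with a very large log is to stream it line by line and report only the lines that look like failures. The following is a minimal Python sketch, not part of the pipeline; the patterns are taken from the log excerpt above, and the function name and the command-line usage are assumptions made for illustration.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

# Patterns copied from the failure excerpt in this report; extend as needed.
ERROR_PATTERNS = re.compile(
    r"Error response from daemon|Exception occurred|AbortException"
)


def find_error_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for every line matching an error pattern.

    Accepts any iterable of lines, so an open file handle can be passed
    and the log is never loaded into memory in one piece.
    """
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERNS.search(line):
            yield number, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_errors.py jenkinslogs.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, text in find_error_lines(handle):
            print(f"{number}: {text}")
```

Because the function takes an iterable, the same code works on a downloaded file or on a response streamed from the log URL.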


14/09/2020 15:09

Who present in meeting: David L, Rob W, Pete, Jay B
Commit: f5c396902fe755a2b6415961403ecef9420e118b
Details: The build stage has failed.
Root Cause Analysis

The Build stage has failed with the exception:

Exception: hudson.AbortException: script returned exit code 1
 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.handleExit(DurableTaskStep.java:558)
 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:504)
 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:450)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Primary Report: See more logs at https://qa.forgerock.com/am/master/f5c396902fe755a2b6415961403ecef9420e118b/343813e9-8727-4f7f-a221-82d94b3cdf68/build/jenkinslogs.txt
... (23 lines excluded) ...
+ docker ps --filter label=1
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
+ docker container prune --force --filter label=1
Total reclaimed space: 0B
+ docker ps --filter label=1
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
script returned exit code 1
Exception occurred: hudson.AbortException: script returned exit code 1
 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.handleExit(DurableTaskStep.java:558)
 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:504)
 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:450)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
+ curl --show-error --silent -u **** 'https://ci.forgerock.org/blue/rest/organizations/jenkins/pipelines/OpenAM-master/pipelines/OpenAM-Pipeline/branches/master/runs/4771/nodes/?limit=10000'
+ mkdir 2064b2e3-6d44-4764-b8ad-8fcdb301ede6
+ curl --show-error --silent -u **** 'https://ci.forgerock.org/blue/rest/organizations/jenkins/pipelines/OpenAM-master/pipelines/OpenAM-Pipeline/branches/master/runs/4771/nodes/1183/log/?start=0'

The build stage on PR#11539 was successful. We have noticed that the PR was run with the configuration:

CI Build Configuration
mock-build=true
fast-mode=true
only-stages=ui-admin-smoke-tests,mandatory-coverage

The change included in the PR attempts to copy the stashed m2 files into every stage's input directory. This was run on a PR with mock-build, which skips the build stage and uses stashes from a previous run for subsequent stages. On the merge to master, this tries to copy the m2 stash into the input directory of the build stage. At this point, the m2 stash does not exist, and because the cp command exits with an error code if there is nothing to copy, an exception is thrown.
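
The failure mode described here (a copy that errors when its source is missing) can be avoided by checking for the stash before copying. The following is a minimal Python sketch of that guard, not the actual pipeline change; the function name and directory names are hypothetical, and the real scripts are Groovy and shell.

```python
import shutil
from pathlib import Path


def copy_stash_if_present(stash_dir: Path, input_dir: Path) -> bool:
    """Copy stash_dir into input_dir and return True.

    Returns False, without raising, when the stash is missing or empty,
    which is the situation the build stage hits on master because no
    earlier stage has produced an m2 stash yet.
    """
    if not stash_dir.is_dir() or not any(stash_dir.iterdir()):
        return False
    input_dir.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok requires Python 3.8 or later.
    shutil.copytree(stash_dir, input_dir / stash_dir.name, dirs_exist_ok=True)
    return True
```

A stage that genuinely requires the stash can still treat a False return as an error, while the build stage can ignore it.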

Actions

Revert commit from master

Takeaways

Running the PR with mock-build=true hid a break in the Jenkins scripts: the build stage was skipped on the PR, so the failure only surfaced on master.

The build could add a comment and task to the PR in Stash that will warn the developer of the risks (AME-20337).


11/09/2020 16:50

Who present in meeting: Jay, Andy, Isaac, Pete R, Luna
Commit: 469cf832d38
Details: Failure in the functional-tests stage test classes: TestSetDynamicAttributes
Root Cause Analysis

Looks like a Miranda timeout as the mock push service is not indicating that it has received a message. The commit that triggered this does not appear to be directly related to the failure we are seeing.

Failure seen in functional tests https://temper-dashboard.engineering.forgerock.com/openam/master/functional/test-run/469cf832d38f4e5a908d20e4c648176e017f4c6e/functional-tests

The test history does not seem to show this test as being flaky; however, the run on the PR prior to merging was successful, and local runs during the post mortem were also successful.

The test has succeeded in isolation locally, when looped in temper and run multiple times against a locally deployed AM. The stage was also green prior to merging the PR.

Actions

PR 12419 has been created to re-run the functional-tests stage. If it fails, the commit will be reverted; if it succeeds, an investigation will be needed into the flakiness of the test.

Takeaways


10/09/2020 16:45

Who present in meeting: Jay, Rob, Isaac, Pete R, Mark L, Luna, Phill C
Commit: a1bfda21dc8
Details: Failure in the functional-tests stage test classes: PushAuthenticationSenderNode, PushResultVerifierTest
Root Cause Analysis

Looks like a Miranda timeout as the mock push service is not indicating that it has received a message. The commit that triggered this does not appear to be directly related to the failure we are seeing.

Failure seen in functional tests https://temper-dashboard.engineering.forgerock.com/openam/master/functional/test-run/a1bfda21dc848bcd0dac3274697efc473953138f/functional-tests

PushAuthenticationSenderNode and PushResultVerifierTest both timed out while awaiting a state change to be witnessed by Miranda. The build that indicated this failure was not displayed to us in the dashboard due to failures in Bitbucket. Instead, investigation focused on PRs that had been merged to master recently, and it was noted that commit 03ceae saw similar failures in the functional-tests stage when run by Jenkins.

Actions

Reverted commit 03ceae73a16 with PR-12352.

Master has been left locked while we wait for this commit to be built and verify that the issue is resolved.

Takeaways

Additional scrutiny should be applied to checking that Jenkins has approved a given commit. This is especially important in situations where build failures have been frequent and it has taken some time to get the Jenkins approval, and when multiple other individuals have approved the PR.

When commit 03ceae73a16 was merged to master, no Jenkins Post-Commit build was run. The next run failed at the functional-tests stage due to a Jenkins restart. Subsequent commits were also missed by Jenkins, further delaying visibility of the issue.
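
Missed post-commit builds could be detected by comparing the commits on master with the commits that have a recorded build. The following is a minimal Python sketch of that comparison only; how the two lists are obtained (for example from Bitbucket and Jenkins) is left out, and the function names are hypothetical.

```python
from typing import Iterable, List


def same_commit(a: str, b: str) -> bool:
    """Treat an abbreviated SHA and a full SHA as equal when one prefixes the other."""
    return a.startswith(b) or b.startswith(a)


def commits_without_build(
    master_commits: Iterable[str], built_commits: Iterable[str]
) -> List[str]:
    """Return the commits on master, in order, that no build has covered."""
    built = list(built_commits)
    return [
        sha
        for sha in master_commits
        if not any(same_commit(sha, candidate) for candidate in built)
    ]


if __name__ == "__main__":
    # Illustrative input only, using SHAs that appear in this report.
    on_master = ["03ceae73a16", "a1bfda21dc8"]
    built = ["a1bfda21dc848bcd0dac3274697efc473953138f"]
    print(commits_without_build(on_master, built))
```

Any commit returned by such a check would be a candidate for a manually triggered run before further merges.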


10/09/2020 16:42

Who present in meeting: Jay, Rob, Isaac, Pete R, Mark L, Luna
Commit: a1bfda21dc8
Details: Failure in the image-smoke-tests stage test class JwtBearerTokenEndpoint
Root Cause Analysis: See RCA on incident report "07/09/2020 15:49"

Actions

The previously created JIRA and its attempted mitigation were raised back into process.
Takeaways

This failure has been seen three times in the last 48 hours and needs addressing.

An attempt to mitigate this issue was committed in 161065b9785, but it does not appear to have worked. Further investigation may be required. A potential solution would be to replace the hard-coded 'alias.localtest.me' with '<random string>.localtest.me', though this may simply hide the underlying issue of realms not being deleted before subsequent tests are run.


09/09/2020 09:51

Who present in meeting: Jay, Isaac, Rob, Rich, Andy
Commit: 9bf21928638
Details

14 functional tests marked as failed.

Test class: TestSessionResourceV2V3AndV4

SessionResource endpoint version 4_0 returns valid:false in JSON with HTTP 200
when using stateless Sessions
when getSessionInfoAndResetIdleTime is called after admin has logged the user out
when an admin session ID is sent along with the request in a cookie
Root Cause Analysis: Believed to be a recurring flaky test. Failure observed on master at 03/09/2020 09:30. Also observed recently on a pull request.

Actions

Tracked under AME-20214.

Takeaways: Given the test is flaky, the recommendation is to allow the pipeline to continue.

08/09/2020 12:44

Who present in meeting: Isaac, Luna, Jay, Kevin
Commit: https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/d96990b1e478f9e3b6c442f5bb5fbd9ff4061d61
Details: Failure in the image-smoke-tests stage test class JwtBearerTokenEndpoint
Root Cause Analysis: See RCA on incident report "07/09/2020 15:49"

Actions

Priority raised.

Takeaways: This failure has been seen twice in the last 24 hours and probably needs addressing.

08/09/2020 10:35

Who present in meeting: Isaac, Luna, Kevin, Jay, Rob, Andy F, Richard
Commit: 9e89805d62c
Details

InterruptedException in JavaDoc build.

java.lang.InterruptedException
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
 at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:275)
 at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111)
 at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:248)
 at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadSynchronously(CpsStepContext.java:237)
 at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:298)
 at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:67)
 at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:264)
 at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:247)
 at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:180)
 at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
 at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
 at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
 at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
 at com.forgerock.pipeline.reporting.ElasticSearchPipelineRunStorage.updateRunStage(file:/var/lib/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/reporting/ElasticSearchPipelineRunStorage.groovy:94)
 at com.forgerock.pipeline.reporting.PipelineRun.updateStatus(/forgerock/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/reporting/PipelineRun.groovy:205)
 at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2030)
 at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2015)
 at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2056)
 at com.forgerock.pipeline.reporting.PipelineRun.updateStatus(/forgerock/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/reporting/PipelineRun.groovy:191)
 at com.forgerock.pipeline.reporting.PipelineRun.updateStageStatusAsInProgress(/forgerock/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/reporting/PipelineRun.groovy:164)
 at WorkflowScript.runPipeline(WorkflowScript:148)
 at com.forgerock.pipeline.stage.StageRunner.accept(file:/var/lib/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/stage/StageRunner.groovy:81)
 at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2030)
 at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2015)
 at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2056)
 at com.forgerock.pipeline.stage.StageRunner.accept(file:/var/lib/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/stage/StageRunner.groovy:80)
 at com.forgerock.pipeline.stage.StageRunner.runStage(file:/var/lib/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/stage/StageRunner.groovy:136)
 at ___cps.transform___(Native Method)
 at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:57)
 at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
 at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
 at sun.reflect.GeneratedMethodAccessor650.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
 at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:103)
 at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
 at sun.reflect.GeneratedMethodAccessor650.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
 at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:60)
 at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
 at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
 at sun.reflect.GeneratedMethodAccessor650.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
 at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
 at com.cloudbees.groovy.cps.Next.step(Next.java:83)
 at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
 at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
 at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122)
 at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261)
 at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
 at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$101(SandboxContinuable.java:34)
 at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.lambda$run0$0(SandboxContinuable.java:59)
 at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:237)
 at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:58)
 at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
 at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:332)
 at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:83)
 at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:244)
 at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:232)
 at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
 at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Root Cause Analysis: Appears to be caused by JENKINS-46507.

Actions

Jenkins was restarted twice to prevent further builds from falling over. Robin suggests the root cause can only be fixed in GCP.

Takeaways: Look out for InterruptedException and flag ASAP for Jenkins restart.

07/09/2020 15:49

Who present in meeting: Isaac, Luna, Pete, Rob
Commit: 311cb771cee
Details: Repetition of the ExternalDjTest test failure seen at 047da8872ad.
Root Cause Analysis: Same as original.

Actions

We expect the original fix, which is in PR, to resolve this issue. If it recurs, we can take further action.

Takeaways

07/09/2020 15:49

Who present in meeting: Isaac, Luna, Pete, Rob
Commit: https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/fac4299620899d8e1b4ab1f83890ffabbc55f1e0
Details: OAuth2 image smoke test failure in RFC7523
Root Cause Analysis

5 failures were located in the image smoke test run https://qa.forgerock.com/am/master/fac4299620899d8e1b4ab1f83890ffabbc55f1e0/e167c118-a9f4-464c-868a-c654fad8b0c8/image-smoke-tests/ImageSmokeTestsReport/class-com.forgerock.openam.functionaltest.oauth2.rfc7523.JwtBearerTokenEndpoint.html


The commit that triggered the failure does not appear to contain changes that would have caused this failure on their own:

https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/fac4299620899d8e1b4ab1f83890ffabbc55f1e0


It appears the test expects a specific realm not to exist at the start of each test. This realm name appears to be hardcoded ("alias.localtest.me"). The test then fails when it attempts to create this realm at the start of the test.

Actions

A JIRA task, AME-20296, has been created to investigate why the realms are not being cleaned up correctly during the run.

Takeaways: It is important to not hardcode significant values (realm names, user names, etc.) in tests. Instead, randomly generate them on each test run, so that issues caused by cleanup failing are mitigated. However, doing so may mask failures in our cleanup code or potential concurrency issues.
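
The idea of generating the realm alias per test run can be sketched as below. This is Python for illustration only (the functional tests themselves are not written in Python), and the function name, the "alias" prefix, and the suffix length are assumptions.

```python
import secrets


def unique_realm_alias(prefix: str = "alias", base_domain: str = "localtest.me") -> str:
    """Return an alias such as 'alias-3f9c1a7b.localtest.me'.

    The random suffix means a realm left behind by a previous test
    cannot collide with the one this test is about to create.
    """
    return f"{prefix}-{secrets.token_hex(4)}.{base_domain}"
```

As noted above, this only avoids the collision. A leftover realm would still exist, so a separate check that cleanup actually succeeded remains worthwhile.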

07/09/2020 15:29

Who present in meeting: Isaac, Luna, Pete, Rob
Commit: 18aa0358e1a
Details

UserConfigTreeTest failed during a FT run on a commit.

Root Cause Analysis

UserConfigTreeTest has a failure:

https://qa.forgerock.com/am/master/18aa0358e1a6f5bffef0f6fdffe35e34b04efcb4/96600e67-8ef0-4f14-b5cd-e27f4e6e7ff5/functional-tests/functional-tests-report/FunctionalTestsReportReport/class-com.forgerock.openam.functionaltest.auth.trees.UserConfigTreeTest.html#e6b7b8f113a134da6daed496e999f8676e8aa7fa4104a49e148420abb583ca38


In a commit subsequent to this one, we see a successful run of this test:

https://temper-dashboard.engineering.forgerock.com/openam/master/functional/test-run/fac4299620899d8e1b4ab1f83890ffabbc55f1e0/functional-tests


The commit that triggered the failure does not appear to contain changes that would have caused this failure on their own:

https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/18aa0358e1a6f5bffef0f6fdffe35e34b04efcb4


The test history is available in this Kibana Query; however, at the time of writing, test-events is re-indexing, so we are unable to view the query results.

Actions

There are no AM logs with which to proceed with investigation at this point; see the link to the failed test information. AME-20295 has been created for further investigation.

Takeaways: Similar situation to the break at 07/09/2020 11:00.

07/09/2020 14:40

Who present in meeting: Pete, Emma
Commit: N/A
Details: 3 instances of am-jenkins-static-lightweight-agent running at once, 2 of which are showing as offline in Jenkins.
Root Cause Analysis

groovy.json.JsonException: Unable to determine the current character, it is not a string, number, array, or object

The above exception is being thrown in the logs for every executor in a given pipeline. Each failure then creates a process on an am-jenkins-static-lightweight-agent instance. The exception is caused by the response from Jenkins, when trying to download log files for a stage, being an HTML error page rather than the expected JSON.
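
A defensive way to handle this is to check the status and content type before parsing, and to treat anything else as "no data". The following is a minimal Python sketch of that check, not the pipeline library's actual code (which is Groovy); the function name and its parameters are hypothetical.

```python
import json
from typing import Any, Optional


def parse_json_response(status: int, content_type: str, body: str) -> Optional[Any]:
    """Return the parsed JSON body, or None when the response is unusable.

    An HTML error page is rejected either by its status code, by its
    content type, or, as a last resort, by the parser itself.
    """
    if status != 200 or "json" not in content_type.lower():
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None
```

Returning None lets the caller log the problem once and skip the log download, rather than raising in every executor.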


At the end of the log file, the following is observed:
[gcp-gce-jenkins-agent-3] Cannot contact am-jenkins-static-lightweight-agent-8lw19p: java.lang.InterruptedException
[gcp-gce-jenkins-agent-2] Cannot contact am-jenkins-static-lightweight-agent-8lw19p: java.lang.InterruptedException
[gcp-gce-jenkins-agent-5] Cannot contact am-jenkins-static-lightweight-agent-8lw19p: java.lang.InterruptedException
[gcp-gce-jenkins-agent-6] Cannot contact am-jenkins-static-lightweight-agent-8lw19p: java.lang.InterruptedException
[gcp-gce-jenkins-agent-4] Cannot contact am-jenkins-static-lightweight-agent-8lw19p: java.lang.InterruptedException

Actions

Takeaways: The cause of the java.lang.InterruptedException is still not clear; these notes have been added to AME-20247.



07/09/2020 11:00

Who present in meeting: Jay B, Phil A, Dipu S, Kevin U, Isaac
Commit: https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/37d4aa2551b41c45f7719202fefe2b16e84fd122
Details: There is an intermittent failure of the SocialAuthNodeTest in the functional tests stage.
Root Cause Analysis: There was no obvious root cause. The commit that failed was not relevant, and the subsequent commit passed.

Actions

This is an intermittent failure; AME-20294 has been raised to track it.

Takeaways: It was obvious that the test was flaky because the next commit had run and passed; if master had been locked immediately, this would not have been obvious. One suggestion when implementing the automatic locking of master is to create a PR that re-runs the failed stage.

04/09/2020 10:40

Date/Time: 04/09/2020 10:40
Who present in meeting: Phil A, Emma, Pete, Gabor
Commit: 047da8872ad
Details

This is an intermittent failure of the ExternalDJTest in functional-tests.

Root Cause Analysis

This is an intermittent failure caused by AM caching ExternalDJ connections based on the DJ Host URL. There is already a ticket and a PR open to change the functional test API to avoid this issue.
AME-20134

Actions

Takeaways

03/09/2020 15:40

Date/Time: 03/09/2020 15:40
Who present in meeting: Phil A, Emma, Pete, Phill C
Commit: b25bc870b07 ← Seen on this commit, but caused by 1b4c3e77036, which did not appear on the dashboard.
Details

Build Log

Root Cause Analysis

Build failure caused by unit test failure.

Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.215 sec <<< FAILURE! - in FbcMasterRulesTest


Tests fail due to redundant JSON files. These files are no longer required as they are already provided in the base config of the docker image.

Actions

Takeaways


03/09/2020 09:30

Date/Time: 03/09/2020 09:30
Who present in meeting: Phil A, Andrew, Andy, Emma, Isaac, Pete
Commit: b42a6d9395b ← Seen on this commit, but potentially in the range cce413bc3dc...b42a6d9395b
Details

Temper Report

Kibana Query

Functional tests stage had 14 failures in REST-Session.

We can see from the Kibana Query that this has failed 3 times for the same reason in the last month 

Root Cause Analysis


Actions

Created AME-20280.

Takeaways: There was an issue which hampered this investigation. For some reason, commits to master aren't always triggering pipeline runs. This issue was seen last week and "fixed" with a Jenkins & Bitbucket restart. This requires further investigation from Releng and IT.


01/09/2020 16:30

Date/Time: 01/09/2020 16:30
Who present in meeting: Rich, Phil A, Jay
Commit: c6fd48493b8
Details

Temper Report

Kibana Query

Image smoke stage had 6 failures.

We can see from the Kibana Query that this has failed 3 times for the same reason in the last month 

Root Cause Analysis
The stack trace suggests it might be a timing issue with DS:


2020-09-01 12:46:56,733 [main] WARN com.forgerock.openam.functionaltest.commands.ConfigManager - Could not execute command: Delete realm: L0p3dEJlYXJlclRva2VuRW5kcG9pbnQtSUdqMXpDVmNCS1B4STB3
org.forgerock.http.protocol.ResponseException: Got unsuccessful response: 400 Bad Request
{"code":400,"reason":"Bad Request","message":"Such node does not exist in the directory server."}
when attempting to read http://am.localtest.me:8080/am/json/global-config/realms/L0p3dEJlYXJlclRva2VuRW5kcG9pbnQtSUdqMXpDVmNCS1B4STB3 [1.0]
 at com.forgerock.openam.functionaltest.HttpUtils.isSuccessful(HttpUtils.java:76)
 at com.forgerock.openam.functionaltest.CrestClient.delete(CrestClient.java:245)
 at com.forgerock.openam.functionaltest.CrestClient.delete(CrestClient.java:218)
 at com.forgerock.openam.functionaltest.api.Realm$Builder.lambda$create$2(Realm.java:422)
 at com.forgerock.openam.functionaltest.commands.Command$1.execute(Command.java:51)
 at com.forgerock.openam.functionaltest.commands.ConfigManager.revert(ConfigManager.java:93)
 at org.forgerock.cuppa.internal.TestBlockRunner.runHooks(TestBlockRunner.java:145)
 at org.forgerock.cuppa.internal.TestBlockRunner.runBlockHooks(TestBlockRunner.java:131)
 at org.forgerock.cuppa.internal.TestBlockRunner.run(TestBlockRunner.java:89)
 at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
 at org.forgerock.cuppa.internal.TestBlockRunner.run(TestBlockRunner.java:87)
 at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
 at org.forgerock.cuppa.internal.TestBlockRunner.run(TestBlockRunner.java:87)
 at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
 at org.forgerock.cuppa.internal.TestBlockRunner.run(TestBlockRunner.java:87)
 at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
 at org.forgerock.cuppa.internal.TestBlockRunner.run(TestBlockRunner.java:87)
 at org.forgerock.cuppa.Runner.runTests(Runner.java:195)
 at org.forgerock.cuppa.Runner.lambda$run$1(Runner.java:150)
 at org.forgerock.cuppa.internal.TestContainer.runTests(TestContainer.java:276)
 at org.forgerock.cuppa.Runner.run(Runner.java:146)
 at com.forgerock.openam.functionaltest.SurefireProvider.execute(SurefireProvider.java:151)
 at com.forgerock.openam.functionaltest.SurefireProvider.lambda$executeWithRerun$0(SurefireProvider.java:138)
 at com.forgerock.openam.functionaltest.SurefireProvider.executeWithRerunAndReports(SurefireProvider.java:156)
 at com.forgerock.openam.functionaltest.SurefireProvider.executeWithRerun(SurefireProvider.java:137)
 at com.forgerock.openam.functionaltest.SurefireProvider.invoke(SurefireProvider.java:124)
 at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:290)
 at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:242)
 at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:121)
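
The realm identifier in the failed delete request looks like Base64. The following minimal Python sketch decodes it using only the standard library; the use of the URL-safe alphabet and the padding handling are assumptions (this particular identifier contains no characters where the two alphabets differ).

```python
import base64


def decode_realm_id(realm_id: str) -> str:
    """Decode a Base64 realm identifier, tolerating missing '=' padding."""
    padded = realm_id + "=" * (-len(realm_id) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


if __name__ == "__main__":
    print(decode_realm_id("L0p3dEJlYXJlclRva2VuRW5kcG9pbnQtSUdqMXpDVmNCS1B4STB3"))
```

The identifier decodes to /JwtBearerTokenEndpoint-IGj1zCVcBKPxI0w, which suggests the realm that could not be deleted belongs to the JwtBearerTokenEndpoint test class.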

Actions

Created Jira: AME-20231
Takeaways: Flaky test needs fixing. It would also be helpful to have AM logs added for the image stages.


01/09/2020 15:54

Date/Time: 01/09/2020 15:54

Who present in meeting
Commit: 3a4f2d33af55fd2db7bd272a499c206247b85113
Details

Build stage failed.
Clicked on the build "dot".
Found the Jenkins log URL nested in the stack trace.
Navigated to the end of the logs.

maven_1 | [INFO] ------------------------------------------------------------------------
maven_1 | [INFO] BUILD FAILURE
maven_1 | [INFO] ------------------------------------------------------------------------
maven_1 | [INFO] Total time: 27:39 min
maven_1 | [INFO] Finished at: 2020-08-25T09:33:38Z
maven_1 | [INFO] ------------------------------------------------------------------------
maven_1 | [ERROR] Failed to execute goal org.apache.maven.plugins:maven-archetype-plugin:3.0.1:integration-test (default-integration-test) on project auth-tree-node-archetype:
maven_1 | [ERROR] Archetype IT 'basic' failed: Execution failure: exit code = 1
maven_1 | [ERROR] -> [Help 1]
maven_1 | [ERROR]
maven_1 | [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
maven_1 | [ERROR] Re-run Maven using the -X switch to enable full debug logging.
maven_1 | [ERROR]
maven_1 | [ERROR] For more information about the errors and possible solutions, please read the following articles:
maven_1 | [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
maven_1 | [ERROR]
maven_1 | [ERROR] After correcting the problems, you can resume the build with the command
maven_1 | [ERROR] mvn <goals> -rf :auth-tree-node-archetype

Existing ticket - AME-18031


3a4f2d33af55fd2db7bd272a499c206247b85113
Same as above but different AM artefact

Root Cause Analysis: Whilst building the Maven archetype module, the build couldn't download an artifact.
It shouldn't be being built, as it has been built beforehand by the Maven reactor.
Which AM artefact is downloaded incorrectly is inconsistent.

Actions

Response to the incident was to allow the subsequent commit to run. Further investigation is needed before the root cause can be established.

Tickets Raised:

  • AME-20191 - re-run failed builds due to dependency resolution issues downloading AM artifact incorrectly - Emma
  • AME-20192 - Instrument the build process to verify the artefacts are being stored correctly locally on filesystem - Rob
Takeaways: Thinking is needed around the tree node archetype not explicitly declaring its required dependencies.
Could we remove the integration test (the cause) if its value is not significant?



