
Template

dd/MM/yyyy hh:mm

Who present in meeting


Commit
Details
Root Cause Analysis

  

Actions


Takeaways

26/10/2020 10:40

Who present in meeting
Commit
89a455dc1cd and 576010836c9
Details

The commits relate to PR-12727

There are 3 failures in the image-smoke-tests pipeline stage on master:

WebAuthentication Auth Tree creates a session for the user
WebAuthentication Auth Tree selects the success outcome
WebAuthentication Auth Tree creates a session for the user

There are also 3 failures in the same test class for the PR build:

WebAuthentication Auth Tree creates a session for the user
WebAuthentication Auth Tree creates a session for the user
WebAuthentication Auth Tree logs in successfully

These failures are produced by the com.forgerock.openam.functionaltest.auth.trees.WebAuthenticationTree tests (source code).

A note on the PR says that many of these test cases are marked as flaky.

Root Cause Analysis

These tests appear to be flaky. There is no evidence that the tests were flaky in earlier commits.

Actions

Revisited at approx. 15:00

The rerun failed again - the same tests are flaky

Decision - to revert 89a455dc1cd and re-run the image-smoke-tests in this PR

  • Wait for the new PR build to complete before deciding whether or not to revert the second (i.e. the previous) commit as well.
  • Raise a Jira issue for the flaky tests: AME-20588
Takeaways
Check whether a PR contains multiple commits when investigating.

21/10/2020 14:20

Who present in meeting
Commit
5cc003ddb45 and b4f2fcbed09
Details

The build stage fails with a message similar to this:

Failed to execute goal on project openam-distribution-kit: Could not resolve dependencies for project org.forgerock.am:openam-distribution-kit:pom:7.1.0-SNAPSHOT: Could not find artifact org.forgerock.am:openam-idpdiscovery-war:jar:classes:7.1.0-20201021.113205-153 in forgerock-internal-snapshots (https://maven.forgerock.org/repo/internal-snapshots) -> [Help 1]
Root Cause Analysis

  Not known

Actions

  • Keep master locked until the pipeline goes green on the currently running commit
  • Raise a Jira issue: AME-20553
Takeaways

21/10/2020 10:40

Who present in meeting
Commit
07859d3b805
Details

The publish-artifacts stage has failed.

The jenkinslogs.txt file is empty (has zero bytes). This is a known issue with Jenkins

Root Cause Analysis

Richard Ward tells us that it is not currently possible to replay the publish-artifacts pipeline stage for any specific commit.

Exception: hudson.AbortException: script returned exit code 128

+ git tag 07859d3b80536885e64403d1cf1c209ba2d18eb2

fatal: tag '07859d3b80536885e64403d1cf1c209ba2d18eb2' already exists

script returned exit code 128
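
The failure above comes from `git tag` refusing to recreate a tag that the original run had already made. One possible mitigation, sketched here in Python purely for illustration, is to check for the tag before creating it. The real pipeline step is not shown in these notes, and `ensure_tag` and its `run` parameter are hypothetical names.

```python
import subprocess


def ensure_tag(tag, run=subprocess.run):
    """Create `tag` at HEAD unless it already exists.

    Returns True if the tag was created, False if it was already present.
    `run` is injectable so the logic can be exercised without a repository.
    """
    # `git rev-parse -q --verify` exits non-zero when the ref does not exist.
    check = run(
        ["git", "rev-parse", "-q", "--verify", f"refs/tags/{tag}"],
        capture_output=True,
    )
    if check.returncode == 0:
        # A replayed build would land here instead of failing with exit code 128.
        return False
    run(["git", "tag", tag], check=True)
    return True
```

Whether skipping the tag is the right behaviour for a replay is a separate decision; the sketch only shows how the exit code 128 could be avoided.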

Actions

  • unlock master
Takeaways
Do not expect a replayed build to successfully publish artifacts.
Commit
50baa59be24 and 07859d3b805
Details
k8s stage (PIT tests) failures
Root Cause Analysis

Suspected cause: CLOUD-2651

Actions


Takeaways
Notify the Slack channel #plat-intg-test in the case of k8s failures.


20/10/2020 16:50

Who present in meeting
Commit
220af541ae0
Details

The image smoke tests fail with the message "The realm alias.localtest.me already exists".
A Jira issue already exists for this failure, which has been reported previously.

Root Cause Analysis

See Jira issue AME-20296

Actions

  • Keep master locked until the pipeline goes green
Takeaways

20/10/2020 10:40

Who present in meeting
Commit
7374072cdb4
Details
OAuthSecretsApiIntegration test in cross-upgrade-tests-from-7.0.0 failing intermittently
Root Cause Analysis

The password policy for DS 7 has changed, and the stack trace hints that the password prefix used to create users contains a word from the server's dictionary. However, it is not clear why the test fails only intermittently. A Jira has been raised and proposed solutions captured on it.

Further investigation showed that the password generated for this failure contains a dictionary word ("barn"). The Jira has been updated with the details.
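
One plausible reading of the intermittency, stated here as an assumption rather than a finding, is that part of each password is generated randomly, so only some passwords happen to contain a dictionary word. The sketch below illustrates that idea and one way to guard against it. The word list, the substring check and the function names are all hypothetical; they are not the DS 7 policy or the test's actual generator.

```python
import random
import string

# Hypothetical stand-in for the server's dictionary; the real list is far larger.
DICTIONARY = {"barn", "pass", "word", "test"}


def contains_dictionary_word(password, words=DICTIONARY):
    """True if any dictionary word occurs as a case-insensitive substring."""
    lowered = password.lower()
    return any(word in lowered for word in words)


def generate_password(prefix, length=12, rng=None, max_attempts=100):
    """Append random letters and digits to `prefix`, retrying on dictionary hits."""
    rng = rng or random.Random()
    alphabet = string.ascii_letters + string.digits
    for _ in range(max_attempts):
        candidate = prefix + "".join(rng.choice(alphabet) for _ in range(length))
        if not contains_dictionary_word(candidate):
            return candidate
    # Reached when the prefix itself contains a dictionary word, for example.
    raise RuntimeError("could not generate a password free of dictionary words")
```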

Actions

  • unlock master
  • raise a Jira issue: AME-20530
Takeaways

16/10/2020 11:50

Who present in meeting
Commit
a1f2eea3651 8a30dfc1066 4dd2c176514
Details
The build stage failed.
The Auth Tree Node Archetype Maven module fails to pull Maven artifacts.

Stacktrace for one of them:

Failed to execute goal on project basic: Could not resolve dependencies for project archetype.it:basic:jar:0.1-SNAPSHOT: Could not find artifact org.forgerock.am:openam-license-core:jar:7.1.0-20201016.101716-151 in forgerock-internal-snapshots (https://maven.forgerock.org/repo/internal-snapshots) -> [Help 1]
[gcp-gce-jenkins-agent-1] maven_1 | [INFO] org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal on project basic: Could not resolve dependencies for project archetype.it:basic:jar:0.1-SNAPSHOT: Could not find artifact org.forgerock.am:openam-license-core:jar:7.1.0-20201016.101716-151 in forgerock-internal-snapshots (https://maven.forgerock.org/repo/internal-snapshots)


Pull request raised to disable the flaky integration test

https://stash.forgerock.org/projects/OPENAM/repos/openam/pull-requests/12691/overview

Ref AME-18031 which is a bug to resolve the failures. I will close this bug after merging.

AME-20192 is an improvement to fix (rather than circumvent) the issue. This will remain open

Root Cause Analysis

  openam-auth-node-archetype is known to be flaky. A PR is already in progress.

Actions

  • Remove the Auth Tree Node Archetype maven module until a resolution is found. 
  • Richard Ward is going to update the Jira details
  • Keep master locked for now
Takeaways
There is a risk that the failing commits will lose visibility. There is an open question about whether the failed commits should be replayed.

16/10/2020 10:40

Who present in meeting
Commit
7c5580a516f
Details

It appears to be a flaky issue when lodestar is trying to connect to AM.
k8s stage failed with the following message:

Traceback (most recent call last):
  File "/home/jenkins/workspace/ster_OpenAM-Pipeline_master-7KWUIRPT6HNFLY2QMB7XJXASJIKIIS2JOOSQIWWLTWXAK7TL7H6Q/lodestar/spyglaas/tests/k8s/postcommit/am/test_aaa_deploy.py", line 21, in test_deploy
    get_spyglaas_run().deploy_components(profile=self.profile)
  File "/home/jenkins/workspace/ster_OpenAM-Pipeline_master-7KWUIRPT6HNFLY2QMB7XJXASJIKIIS2JOOSQIWWLTWXAK7TL7H6Q/lodestar/spyglaas/lib/spyglaas_run.py", line 203, in deploy_components
    raise Exception('Deployment failed')
Root Cause Analysis

  Possibly related to LODESTAR-486

Actions

  • unlock master
Takeaways
We believe it is not necessary to lock master for k8s and pit1 stages. See the note on 25/09/2020.

15/10/2020 15:55

Who present in meeting
Commit
9eb753df9ee
Details

amster-config-upgrader-tests stage failed


Root Cause Analysis

  UpgradeTest#saml-entities-to-secrets fails intermittently. It appears to delete an object that does not exist, and a ConcurrentModificationException is thrown.


Actions

  • master is locked
  • raise a Jira ticket: AME-20495
  • wait for the next pipeline to succeed

Takeaways

15/10/2020 10:40

Who present in meeting
Commit
ea2fc6d5ffa
Details

According to the temper dashboard, the first stage of the pipeline "build" is still IN_PROGRESS

The Jenkins server shows that the build failed: https://ci.forgerock.org/job/OpenAM-master/job/OpenAM-Pipeline/job/master/4849/

This failing build was replayed (i.e. for the same commit) and the next Jenkins pipeline was successful.

Root Cause Analysis

The pipeline stage image-smoke-tests gives a stage exception:   

Exception: java.lang.IllegalStateException: Stage Exception at Script3.compose(Script3.groovy:384)

Similar failures have been occurring in PR builds in the last few days. Here is an example root cause:

maven_1 | [ERROR] Failed to execute goal on project openam-logback: Could not resolve dependencies for project org.forgerock.am:openam-logback:jar:7.1.0-SNAPSHOT: Failed to collect dependencies at org.forgerock.opendj:opendj-server:jar:7.0.0 -> org.apache.jclouds.api:s3:jar:2.2.0 -> org.apache.jclouds.api:sts:jar:2.2.0 -> org.apache.jclouds:jclouds-core:jar:2.2.0 -> com.google.inject:guice:jar:3.0: Failed to read artifact descriptor for com.google.inject:guice:jar:3.0: Could not transfer artifact com.google.inject:guice:pom:3.0 from/to maven-central-remote (http://maven.forgerock.org:80/repo/maven-central-remote/): /home/jenkins/.m2/repository/com/google/inject/guice/3.0/guice-3.0.pom.part (No such file or directory) -> [Help 1]

Actions

  • unlock master
  • raise a Jira issue for the IllegalStateException stage exceptions: AME-20553
  • inform Maxwell that there is an issue with the temper dashboard reporting for the failed image-smoke-tests (the link to the build is not correct)
Takeaways

14/10/2020 10:40

Who present in meeting
Commit
92f88bceca6742bd1ddc6d5f1a14e3a9faa5eb36
Details
LdapTree functional test failure - LDAP tree creates a session for the user
Root Cause Analysis

Flaky test. The same test was passing before the branch was merged into master (PR: https://stash.forgerock.org/projects/OPENAM/repos/openam/pull-requests/12141/overview).

Test report: https://qa.forgerock.com/am/master/92f88bceca6742bd1ddc6d5f1a14e3a9faa5eb36/d346f8d9-b279-4d32-8af4-345cc18e2599/functional-tests/functional-tests-report/FunctionalTestsReportReport/class-com.forgerock.openam.functionaltest.auth.trees.LdapTree.html#failed-tests

FAILED LDAP tree creates a session for the user
when in a sub-realm
when a tree is configured to authenticate with LDAP
when the password has expired and there is a grace period
when the user enters a new password and confirms it
when the user logs in with the updated password
....
org.junit.ComparisonFailure: [Login state] expected:<[SUCCESS]> but was:<[FAIL]>
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at com.forgerock.openam.functionaltest.api.authentication.process.AuthenticationClient.expectState(AuthenticationClient.java:251)
	at com.forgerock.openam.functionaltest.api.authentication.process.AuthenticationClient.expectSuccess(AuthenticationClient.java:196)
	at com.forgerock.openam.functionaltest.auth.trees.LdapTree.<cuppa test>(LdapTree.java:642)

Actions

  • master is locked
  • kick off a new build
  • record the failure on the existing Jira ticket: AME-20476
Takeaways

13/10/2020 09:50

Who present in meeting
Commit
b8aa24ce67b200f2b41340673eed85d58db5c453
Details

The commit prior to the failing commit b8aa24ce67b200f2b41340673eed85d58db5c453 was f0d1946a6610cbd073a3b8189aceb044bcfd2218

The build for commit f0d1946a6610cbd073a3b8189aceb044bcfd2218 failed early and was replayed.

The following Jira has been raised for this: AME-20495

Root Cause Analysis

There is a single test failure but it is not known if this is a genuine failure or a flaky test.

Test report: https://qa.forgerock.com/am/master/b8aa24ce67b200f2b41340673eed85d58db5c453/ec8aef09-6974-43b5-aac8-f41d1342506b/amster-config-upgrader-tests/AmsterConfigUpgraderTestsReport/class-com.forgerock.openam.amster.upgradester.UpgradeTest.html#00ebc085adab450f44d6cd963181eeefe8d3836e2db70bb73b754819d924522f

2020-10-13 01:48:36,245 [SystemTimerPool] [] ERROR c.i.s.l.e.LDAPv3PersistentSearch - Unable to start persistent search for baseDN dc=openam,dc=forgerock,dc=org: 
Operation failed:
Result Code: Connect Error
Diagnostic Message: No operational connection factories available
Matched DN: 
2020-10-13 01:48:57,445 [smIdmThreadPool-8] [cc67ab22-c193-4477-827d-da27d7087a7f-187860] ERROR o.f.o.s.p.PushNotificationServiceConfigHelperFactory - Unable to retrieve instance of the ServiceConfig for realm /.
2020-10-13 01:48:57,445 [smIdmThreadPool-8] [cc67ab22-c193-4477-827d-da27d7087a7f-187860] ERROR o.f.o.s.p.PushNotificationService - Unable to update preferences for organization /
org.forgerock.openam.services.push.PushNotificationException: Unable to find config for PushNotificationConfig.
	at org.forgerock.openam.services.push.PushNotificationService.getConfigHelper(PushNotificationService.java:284)

  

Actions

Create a dummy PR to re-run the failing stage only: https://stash.forgerock.org/projects/OPENAM/repos/openam/pull-requests/12650/overview

Wait for the PR build to complete before deciding whether or not to revert the suspect commit. 

Takeaways

12/10/2020 15:00

Who present in meeting
Commit
b52d33489792dc4f7641dc9dfdd9eadf9c15a020
Details
Exception: java.lang.IllegalStateException: Stage Exception at Script3.compose(Script3.groovy:384)
Root Cause Analysis

It appears that temper deployed and configured AM but then could not log in.
The file functional-tests-5/jenkinslogs.txt was empty.

Found the following in functional-tests-5/full-log.txt

2020-10-12 13:52:29,926 [main] INFO com.forgerock.openam.functionaltest.setup.Configure - Configuration complete!
2020-10-12 13:52:32,239 [main] ERROR com.forgerock.openam.functionaltest.deployment.implementations.AdminTokenManager - Unable to get Token with login. Response status: [Status: 500 Internal Server Error] Response text: {"code":500,"reason":"Internal Server Error","message":"Authentication Error!!"}
2020-10-12 13:52:32,289 [main] ERROR com.forgerock.openam.functionaltest.deployment.implementations.AdminTokenManager - Unable to get Token with login. Response status: [Status: 500 Internal Server Error] Response text: {"code":500,"reason":"Internal Server Error","message":"Authentication Error!!"}
2020-10-12 13:52:32,350 [main] ERROR com.forgerock.openam.functionaltest.deployment.implementations.AdminTokenManager - Unable to get Token with login. Response status: [Status: 500 Internal Server Error] Response text: {"code":500,"reason":"Internal Server Error","message":"Authentication Error!!"}
2020-10-12 13:52:32,409 [main] ERROR com.forgerock.openam.functionaltest.deployment.implementations.AdminTokenManager - Unable to get Token with login. Response status: [Status: 500 Internal Server Error] Response text: {"code":500,"reason":"Internal Server Error","message":"Authentication Error!!"}
2020-10-12 13:52:37,449 [Thread-0] INFO com.forgerock.openam.functionaltest.deploy.CargoContainer - Container has shutdown within the configured time

Actions

  • kick off Jenkins pipeline with the aim of promoting the build
  • unlock master if the next build passes
Takeaways

12/10/2020 10:40

Who present in meeting
Commit
702e49a68771895afeae96a2ce25a0a196ea5538
Details

Failing test: com.forgerock.openam.functionaltest.auth.trees.LdapTree

The test failure could not be reproduced locally.

The test passed on the PR build.

Root Cause Analysis
Believed to be a flaky test.
org.junit.ComparisonFailure: [Login state] expected:<[SUCCESS]> but was:<[FAIL]>
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at com.forgerock.openam.functionaltest.api.authentication.process.AuthenticationClient.expectState(AuthenticationClient.java:251)
	at com.forgerock.openam.functionaltest.api.authentication.process.AuthenticationClient.expectSuccess(AuthenticationClient.java:196)
	at com.forgerock.openam.functionaltest.auth.trees.LdapTree.<cuppa test>(LdapTree.java:640)


Actions

  • unlock master
  • kick off Jenkins pipeline with the aim of promoting the build
  • raise AME issue for the flaky test: AME-20476
Takeaways

09/10/2020 12:05

Who present in meeting
Commit
09c17227ec6d780b9d2300cf76c26abdef2a4203
Details
*** Elastic Search Send ***
url: http://elasticsearch.temper.internal.forgerock.com:9200/runs/stage/AXUMa2amxT1HpMvxv57d/_update
json: {"doc":{"endTime":"2020-10-09T10:43:58.292Z","status":"SUCCESS","primaryReport":"","reports":{"jenkinslogs.txt":"https://qa.forgerock.com/am/master/09c17227ec6d780b9d2300cf76c26abdef2a4203/bbc9ba1d-b757-46e5-8da7-ec44f1bf913a/checkstyle/jenkinslogs.txt"}}}
Exception occurred: 
 java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
Root Cause Analysis

  Elasticsearch unreachable

Actions

  • keep master locked
  • wait for the next build to run
  • check with team maxwell
Takeaways

07/10/2020 16:50

Who present in meeting
Commit
709a7b5327a
Details
Build stuck in waiting state before failing with java.lang.InterruptedException
Root Cause Analysis

Elasticsearch was unreachable for some time during the investigation. Upon being able to make a connection, it was in a yellow state.

Actions

Await the completion of elasticsearch recovery and re-run the commit on master before unlocking.

Takeaways
There is no timeout set on the HttpURLConnection used to post to Elasticsearch. The InterruptedException may be thrown by Jenkins when the post fails to return. A PR has been created to try adding a timeout:
https://stash.forgerock.org/projects/RE/repos/groovy-pipeline-libs/pull-requests/44/overview
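
The pipeline library itself is Groovy on top of HttpURLConnection, and the PR's actual change is not reproduced here. The same idea is sketched below in Python for illustration only; the function name and the 10-second default are assumptions. The point is that a bounded timeout turns a hung post into an ordinary, catchable error.

```python
import json
import urllib.request


def post_json(url, payload, timeout=10.0):
    """POST `payload` as JSON; return True on a 2xx response, False on error or timeout."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        # `timeout` bounds the connect and each blocking socket read,
        # so the call cannot wait indefinitely on an unresponsive server.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

A caller can then decide whether a failed status update should fail the stage or only be logged.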

06/10/2020 16:20

Who present in meeting
Commit
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/0ed5d46d8f
Details

Build stage broken

Root Cause Analysis

16:15:16 [gcp-gce-jenkins-agent-2] maven_1 | [INFO] [ERROR] Failed to execute goal on project basic: Could not resolve dependencies for project archetype.it:basic:jar:0.1-SNAPSHOT: The following artifacts could not be resolved: org.forgerock.am:openam-license-servlet:jar:7.1.0-SNAPSHOT, org.forgerock.am:openam-exceptions:jar:7.1.0-SNAPSHOT: Could not find artifact org.forgerock.am:openam-license-servlet:jar:7.1.0-20201006.145343-125 in forgerock-internal-snapshots (https://maven.forgerock.org/repo/internal-snapshots) -> [Help 1]
16:15:16 [gcp-gce-jenkins-agent-2] maven_1 | [INFO] org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal on project basic: Could not resolve dependencies for project archetype.it:basic:jar:0.1-SNAPSHOT: The following artifacts could not be resolved: org.forgerock.am:openam-license-servlet:jar:7.1.0-SNAPSHOT, org.forgerock.am:openam-exceptions:jar:7.1.0-SNAPSHOT: Could not find artifact org.forgerock.am:openam-license-servlet:jar:7.1.0-20201006.145343-125 in forgerock-internal-snapshots (https://maven.forgerock.org/repo/internal-snapshots)
16:15:16 [gcp-gce-jenkins-agent-2] maven_1 | [INFO] at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.getDependencies (LifecycleDependencyResolver.java:269)

16:15:16 [gcp-gce-jenkins-agent-2] maven_1 | [ERROR] Failed to execute goal org.apache.maven.plugins:maven-archetype-plugin:3.0.1:integration-test (default-integration-test) on project auth-tree-node-archetype:
16:15:16 [gcp-gce-jenkins-agent-2] maven_1 | [ERROR] Archetype IT 'basic' failed: Execution failure: exit code = 1

Actions

Bug already raised AME-18031

Takeaways

05/10/2020 12:45

Who present in meeting
Alun, Isaac, Phil A
Commit
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/13e0cff46eb40c2c50a92b4982aeb1268e3a4469
Details
javadoc and checkstyle stage failures
Root Cause Analysis

  [gcp-gce-jenkins-agent-5] Exception occurred:
[gcp-gce-jenkins-agent-5] java.lang.InterruptedException
[gcp-gce-jenkins-agent-5] at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
[gcp-gce-jenkins-agent-5] at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:275)
[gcp-gce-jenkins-agent-5] at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111)
[gcp-gce-jenkins-agent-5] at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:248)
[gcp-gce-jenkins-agent-5] at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadSynchronously(CpsStepContext.java:237)
[gcp-gce-jenkins-agent-5] at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:298)
[gcp-gce-jenkins-agent-5] at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:67)
[gcp-gce-jenkins-agent-5] at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:264)
[gcp-gce-jenkins-agent-5] at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:247)
[gcp-gce-jenkins-agent-5] at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:180)
[gcp-gce-jenkins-agent-5] at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
...

...


This looks like an issue with Jenkins. A similar error was seen on a PR build running at the same time:
https://ci.forgerock.org/job/OpenAM-master/job/OpenAM-Pipeline/job/PR-11119/68/console


The InterruptedException was likely caused by Elasticsearch resource limits being exhausted by the Maxwell team during the Firestore migration.

Actions

Kill the current build and kick off another one. Wait until we get some green dots on the build pipeline before unlocking master.

Takeaways


02/10/2020 07:40

Who present in meeting
Commit
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/7ea3c53727096e7531085e84f5603338d8e67c9e
Details

amster-config-upgrader-tests - single test reported as failing: "When upgrading from 5.5.0" (AME-20394)

The build ran twice for this commit. On the dashboard the failure was only seen in the second run; however, the reports for both runs show that the first run also reported that failure.

Root Cause Analysis

Flaky reporting in the pipeline dashboard: the incorrect report was displayed.

Actions

Revisited at 15:23

Rerun failed again - flakiness of pipeline

Decision - to revert 7ea3c53727096e7531085e84f5603338d8e67c9e and run pit1 on its PR

Unlock master after the "revert" run if pit1 is successful.

Edit 05/10/2020 12:07: GlobalDataStoreServiceTest in amster-tests was shown to be flaky once master was reverted, raised AME-20441

Takeaways-

30/09/2020 20:19

Who present in meeting
Commit
7ea2b9d661f19e40e81148fc439188f41f17dedf
Details
Exceptions in the functional-tests-1 & functional-tests-3 stages
Root Cause Analysis

Both stages completed successfully, but then exceptions were thrown while trying to push data to Elasticsearch.

[gcp-gce-jenkins-agent-5] json: {"doc":{"endTime":"2020-09-30T17:43:28.392Z","status":"SUCCESS", (rest of json removed)
[Pipeline] [gcp-gce-jenkins-agent-5] echo
[Pipeline] [gcp-gce-jenkins-agent-5] echo
[gcp-gce-jenkins-agent-5] Exception occurred:
[gcp-gce-jenkins-agent-5] java.lang.InterruptedException


Upon checking the Elasticsearch cluster health, it was in a red state, with a new container starting within the last hour. The cluster health had improved to yellow during this post mortem.

Actions

Master to remain locked while Elasticsearch recovers to a green state.
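
The "wait for green" rule could be checked mechanically, roughly as follows. This is a sketch, not something the pipeline is known to do; it assumes the body of an Elasticsearch `GET /_cluster/health` response, whose `status` field is `green`, `yellow` or `red`. `cluster_is_at_least` is a hypothetical helper name.

```python
import json

# Elasticsearch cluster health values, ordered from worst to best.
HEALTH_RANK = {"red": 0, "yellow": 1, "green": 2}


def cluster_is_at_least(health_json, required="green"):
    """Return True if the `status` in a cluster health response meets `required`.

    A missing or unrecognised status is treated as red, the most cautious choice.
    """
    status = json.loads(health_json).get("status", "red")
    return HEALTH_RANK.get(status, 0) >= HEALTH_RANK[required]
```

With the yellow state seen during this post mortem, `cluster_is_at_least('{"status": "yellow"}')` is False, so master would stay locked.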

Takeaways

30/09/2020 16:30

Who present in meeting
Commit

The error was seen on commit 00cd223e55ec1384168cbf4d147c37fec71087e1, but it was likely caused by an earlier commit.

Details

Commit 00cd223e55ec1384168cbf4d147c37fec71087e1 shows test failures for SocialAuthNodeTest. However, this commit does not change code that could affect that test. Instead, it addresses an issue with the after-hook processing logic for the pipeline. That issue was likely preventing stages from uploading the test results to the QA bucket.

Root Cause Analysis

The failures present on the functional-test run are a repeated flaky test which was originally captured 07/09/2020 11:00 in commit 37d4aa2551b41c45f7719202fefe2b16e84fd122.

Test reports for comparison:

There is no further action to take at this stage as this is already captured by a ticket for the DRIVE team to investigate.

Actions

AME-20294 has already been raised to track this issue.

Master will now be unlocked.

Takeaways


30/09/2020 15:30

Who present in meeting
Alun, Pete, Andy
Commit
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/5bb62adb4c75df1f2480ab3e8d767b259adf8498
Details
Failure in build stage
Root Cause Analysis


[gcp-gce-jenkins-agent-2] maven_1 | [INFO] ------------------------------------------------------------------------
[gcp-gce-jenkins-agent-2] maven_1 | [INFO] BUILD FAILURE
[gcp-gce-jenkins-agent-2] maven_1 | [INFO] ------------------------------------------------------------------------
[gcp-gce-jenkins-agent-2] maven_1 | [INFO] Total time: 30:34 min
[gcp-gce-jenkins-agent-2] maven_1 | [INFO] Finished at: 2020-09-30T14:21:23Z
[gcp-gce-jenkins-agent-2] maven_1 | [INFO] ------------------------------------------------------------------------
[gcp-gce-jenkins-agent-2] maven_1 | [ERROR] Failed to execute goal org.apache.maven.plugins:maven-archetype-plugin:3.0.1:integration-test (default-integration-test) on project auth-tree-node-archetype:
[gcp-gce-jenkins-agent-2] maven_1 | [ERROR] Archetype IT 'basic' failed: Execution failure: exit code = 1
[gcp-gce-jenkins-agent-2] maven_1 | [ERROR] -> [Help 1]
[gcp-gce-jenkins-agent-2] maven_1 | [ERROR]
[gcp-gce-jenkins-agent-2] maven_1 | [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[gcp-gce-jenkins-agent-2] maven_1 | [ERROR] Re-run Maven using the -X switch to enable full debug logging.
[gcp-gce-jenkins-agent-2] maven_1 | [ERROR]
[gcp-gce-jenkins-agent-2] maven_1 | [ERROR] For more information about the errors and possible solutions, please read the following articles:
[gcp-gce-jenkins-agent-2] maven_1 | [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[gcp-gce-jenkins-agent-2] maven_1 | [ERROR]
[gcp-gce-jenkins-agent-2] maven_1 | [ERROR] After correcting the problems, you can resume the build with the command
[gcp-gce-jenkins-agent-2] maven_1 | [ERROR] mvn <goals> -rf :auth-tree-node-archetype
[gcp-gce-jenkins-agent-2] master-4811-2-build-1601473774375-maven_maven_1 exited with code 1

Actions

AME-18031 already exists to cover this.

Takeaways
Plugin works a charm

30/09/2020 10:45

Who present in meeting
Emma, Gabor, Pete, Phil, Rich, Andrew, Andy
Commit
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/e77dcaecdc1250a167f671764234b699b29a29d6
Details
Publish artifacts stage showing as skipped even though the FTs are still showing as in progress
Root Cause Analysis

There are two issues here

  1. There were failures in the functional tests - a recurrence of TestSessionResourceV2V3AndV4
  2. An error in the execution of the after hook which locks master, causing the subsequent after hook which updates the stage status to not run

Actions

Takeaways

30/09/2020 07:30

Who present in meeting

Andrew, Pete

Commit
Details

Exception: Publish Artifacts

Exception: java.lang.NullPointerException: Cannot get property 'stageName' on null object

Maven 3.2.5
— Use a tool from a predefined Tool Installation
Cannot get property 'stageName' on null object
https://ci.forgerock.org/blue/organizations/jenkins/OpenAM-master%2FOpenAM-Pipeline/detail/master/4807/pipeline/10965
Root Cause Analysis
An oversight in an update to the pipeline libs: an unnecessary line was not removed.

Actions

Takeaways

Nothing new

Stage does not run on pull requests

25/09/2020 09:15

Who present in meeting
Commit
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/381841a1c83d2c0af80ff24a5ebc3274e1ab5c54
Details
K8S stage failure
Root Cause Analysis

INFO [loop_until]: Function succeeded after 10m 12s (rc=0) - failed to find expected pattern: .*readyReplicas:1.*replicas:1.* - retry
ERROR [loop_until]: Function does not return expected pattern (.*readyReplicas:1.*replicas:1.*)
DEBUG --- stdout ---
DEBUG map[conditions:[map[lastTransitionTime:2020-09-25T01:06:47Z lastUpdateTime:2020-09-25T01:06:47Z message:Deployment does not have minimum availability. reason:MinimumReplicasUnavailable status:False type:Available] map[lastTransitionTime:2020-09-25T01:16:48Z lastUpdateTime:2020-09-25T01:16:48Z message:ReplicaSet "am-f6cd79d56" has timed out progressing. reason:ProgressDeadlineExceeded status:False type:Progressing]] observedGeneration:1 replicas:1 unavailableReplicas:1 updatedReplicas:1]
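
The log above comes from a polling helper that re-runs a status command until its output matches a regular expression. The real `loop_until` belongs to lodestar and is not shown in these notes; the following is a minimal sketch of the technique, with assumed parameter names and defaults. In the output shown, `readyReplicas` is absent, so a loop of this kind can only end by timing out.

```python
import re
import time


def loop_until(func, pattern, timeout=600.0, interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call `func` until its string output matches `pattern` or `timeout` elapses.

    Returns the matching output; raises TimeoutError carrying the last output otherwise.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    regex = re.compile(pattern, re.DOTALL)
    deadline = clock() + timeout
    while True:
        output = func()
        if regex.search(output):
            return output
        if clock() >= deadline:
            raise TimeoutError(
                f"pattern {pattern!r} not found; last output: {output}"
            )
        sleep(interval)
```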

Actions

No action. This is considered to be a flaky stage, and it is also outside the set of stages that should be considered for post mortem.

Takeaways
K8s and pit1 stages do not need to be considered for post mortem or locking of master. If there are multiple failures in a row, a post mortem should be considered.

24/09/2020 16:37

Who present in meeting
Alun, Pete R, Kevin, Rich, Andy F, Mark L
Commit
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/189386a7a5139f0ca7672f6bf2b29385f6731dfb
Details
Failure in build stage
Root Cause Analysis
[gcp-gce-jenkins-agent-1] maven_1 | [INFO] [ERROR] Failed to execute goal on project basic: Could not resolve dependencies for project archetype.it:basic:jar:0.1-SNAPSHOT: Failed to collect dependencies at org.forgerock.am:openam-core:jar:7.1.0-SNAPSHOT -> org.forgerock.am:openam-idsvcs-schema:jar:7.1.0-SNAPSHOT: Failed to read artifact descriptor for org.forgerock.am:openam-idsvcs-schema:jar:7.1.0-SNAPSHOT: Could not find artifact org.forgerock.am:openam-schema:pom:7.1.0-20200924.145523-100 in forgerock-internal-snapshots (https://maven.forgerock.org/repo/internal-snapshots) -> [Help 1]

From inspecting Artifactory, that artifact does not appear to exist: https://maven.forgerock.org/repo/internal-snapshots/org/forgerock/am/openam-schema/7.1.0-SNAPSHOT/

This seems to be the same issue as observed in the post mortem on 01/09/2020 15:54.
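
When checking Artifactory by hand, it can help to turn the coordinates printed in the Maven error into the exact file URL. The sketch below assumes the standard Maven repository layout and the `group:artifact:extension[:classifier]:version` form seen in these logs; `snapshot_artifact_url` is a hypothetical helper, not part of any pipeline library.

```python
import re


def snapshot_artifact_url(coordinates, repo_url):
    """Map Maven coordinates, as printed in resolver errors, to a repository URL.

    Accepts group:artifact:extension:version or
    group:artifact:extension:classifier:version.
    """
    parts = coordinates.split(":")
    if len(parts) == 4:
        group, artifact, extension, version = parts
        classifier = None
    elif len(parts) == 5:
        group, artifact, extension, classifier, version = parts
    else:
        raise ValueError(f"unexpected coordinates: {coordinates}")

    # Timestamped snapshots (e.g. 7.1.0-20200924.145523-100) are stored in the
    # directory of their base version (7.1.0-SNAPSHOT).
    match = re.fullmatch(r"(.+)-\d{8}\.\d{6}-\d+", version)
    directory = f"{match.group(1)}-SNAPSHOT" if match else version

    filename = f"{artifact}-{version}"
    if classifier:
        filename += f"-{classifier}"
    filename += f".{extension}"
    return "/".join(
        [repo_url.rstrip("/"), group.replace(".", "/"), artifact, directory, filename]
    )
```

For the coordinates in the error above, this points into the same `openam-schema/7.1.0-SNAPSHOT/` directory as the link in these notes.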


Actions

AME-18031 is already created. Adding the DRIVE label and removing Maxwell.

Master has not been locked. We will not lock it; instead we will await the next build results, as this issue is expected to resolve itself in the next build.

Takeaways
The above issue should be escalated.

24/09/2020 07:10

Who present in meeting
Commit
Details

Exception: publish-artifacts

The debug message associated with the change can be seen correctly in the Jenkins log.

Other runs did not execute publish-artifacts because of prior failures in publish artifacts.  For some strange reason publish-artifacts did not run for the breaking change - one would expect this exception to have been seen there.  The stage was skipped.

Root Cause Analysis

publish-artifacts/jenkinslogs.txt is empty


From: https://ci.forgerock.org/blue/organizations/jenkins/OpenAM-master%2FOpenAM-Pipeline/detail/master/4793/pipeline/9858

Shell Script >
+ git checkout -b master origin/master
fatal: A branch named 'master' already exists.
script returned exit code 128

Actions

Investigate why the publish-artifacts was skipped on the breaking run - It should have run if all the mandatory stages passed (Jira to be created by Pete Rogers)

Revert the change - https://stash.forgerock.org/projects/OPENAM/repos/openam/pull-requests/12493/diff

Consider adding the ability to test such changes, e.g. in a sandbox, if this becomes a common occurrence.

Reopen master

Takeaways

23/09/2020 17:05

Who present in meeting
Commithttps://stash.forgerock.org/projects/OPENAM/repos/openam/commits/9200808dd24adc68d29c7560019425a7d9e53391
DetailsRecurrence of TestSessionResourceV2V3AndV4
Root Cause Analysis

Actions

None - AME-20280 investigation/fix in progress

Takeaways

23/09/2020 12:25

Who present in meeting
Commit

https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/f28cca1c7df0e0be52302979821840704bc0a5ce (Add script for creating IdCloud branches)

N.B. This commit did not appear on the dashboard https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/df16d0bba412008992c49e22bab9df48ed85390b (Add license header)

Details

amster config upgrader test failure:

"com.forgerock.openam.amster.upgradester.UpgradeTest Configuration Upgrader 1 failed Amster upgrade succeeds when Upgrading from 5.5.0 when in subrealmsandpolicies test case"

Root Cause Analysis

Flaky test

The changes are unrelated to the failing test.

Actions

Passes in subsequent pipeline run.

AME-20394

Takeaways

This investigation was hampered by Elasticsearch not functioning and therefore not providing enough information. Starting at the end of this sprint, Elasticsearch will be migrated to Firestorm for more stability.

23/09/2020 10:45

Who present in meetingAlun, Rob, Dipu, Andrew V, Kevin, Pete
Commithttps://stash.forgerock.org/projects/OPENAM/repos/openam/commits/f3077961b992ca57b5e37ed9e0370c7da07257aa
Details

Failing image smoke test: TestRESTRealmPut

java.lang.RuntimeException: Failed to create realm
	at com.forgerock.openam.functionaltest.api.Realm$Builder.create(Realm.java:413)
	at com.forgerock.openam.functionaltest.config.IDMConfigImpl.createRealm(IDMConfigImpl.java:67)
	at com.forgerock.openam.functionaltest.restcommon.realm.TestRESTRealmPut.setup(TestRESTRealmPut.java:76)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.forgerock.openam.functionaltest.cuppa.TemperTestInstantiator.invokeMethod(TemperTestInstantiator.java:246)
	at com.forgerock.openam.functionaltest.cuppa.TemperTestInstantiator.<cuppa test>(TemperTestInstantiator.java:154)
Caused by: org.forgerock.http.protocol.ResponseException: Got unsuccessful response: 409 Conflict
{"code":409,"reason":"Conflict","message":"The realm TestRESTRealmPut already exists."}
when attempting to create http://am.localtest.me:8080/am/json/global-config/realms [1.0]: {name=TestRESTRealmPut, parentPath=/, active=true, aliases=[]}
	at com.forgerock.openam.functionaltest.HttpUtils.isSuccessful(HttpUtils.java:76)
	at com.forgerock.openam.functionaltest.CrestClient.create(CrestClient.java:87)
	at com.forgerock.openam.functionaltest.api.Realm$Builder.create(Realm.java:408)
	... 8 more


java.lang.RuntimeException: Failed to delete realm
	at com.forgerock.openam.functionaltest.api.Realm.delete(Realm.java:195)
	at java.base/java.util.Optional.ifPresent(Optional.java:183)
	at com.forgerock.openam.functionaltest.restcommon.realm.TestRESTRealmPut.testPutNewRealm(TestRESTRealmPut.java:100)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.forgerock.openam.functionaltest.cuppa.TemperTestInstantiator.invokeMethod(TemperTestInstantiator.java:246)
	at com.forgerock.openam.functionaltest.cuppa.TemperTestInstantiator.<cuppa test>(TemperTestInstantiator.java:178)
Caused by: org.forgerock.http.protocol.ResponseException: Got unsuccessful response: 400 Bad Request
{"code":400,"reason":"Bad Request","message":"Such node does not exist in the directory server."}
when attempting to read http://am.localtest.me:8080/am/json/global-config/realms/L3Rlc3RQdXROZXdSZWFsbUtEWEZ2 [1.0]
	at com.forgerock.openam.functionaltest.HttpUtils.isSuccessful(HttpUtils.java:76)
	at com.forgerock.openam.functionaltest.CrestClient.delete(CrestClient.java:245)
	at com.forgerock.openam.functionaltest.CrestClient.delete(CrestClient.java:218)
	at com.forgerock.openam.functionaltest.api.Realm.delete(Realm.java:192)
	... 8 more
Root Cause Analysis

The test passed in the PR. This appears to be a flaky test failure that is already covered by a DRIVE Jira, AME-20296.
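The realm ID in the failing delete URL in the stack trace above is Base64 text. Decoding it is a quick way to see which realm the cleanup was targeting: for this trace it decodes to a realm path other than TestRESTRealmPut, the realm named in the 409 error. The helper name `decode_realm_id` is an assumption for illustration, and `base64 -d` assumes a GNU coreutils or BusyBox environment.

```shell
#!/bin/sh
# Hypothetical helper: decode a Base64 realm ID taken from a
# /json/global-config/realms/<id> URL in a test log.
decode_realm_id() {
    # printf avoids feeding a trailing newline into base64.
    printf '%s' "$1" | base64 -d
}

# Realm ID copied from the "Failed to delete realm" trace.
decode_realm_id "L3Rlc3RQdXROZXdSZWFsbUtEWEZ2"   # prints /testPutNewRealmKDXFv
echo
```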


Actions

Unlock master

Takeaways

Need to address that existing Jira so we don't fall foul of the same issue in the future.

21/09/2020 10:35

Who present in meetingAlun, Andrew F, Dipu, Jay, Kajetan, Mark L, Michael Carter, Ravi, Gabor, Rich W
Commithttps://stash.forgerock.org/projects/OPENAM/repos/openam/commits/ebb287c18edf9edfffc19c9039e2ad355d5b8ad7

However, there were hidden commits that did not appear on the dashboard prior to this commit. 
https://ci.forgerock.org/blue/organizations/jenkins/OpenAM-master%2FOpenAM-Pipeline/detail/master/4781/changes

The commit responsible for the failure could therefore have been any one of these:
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/ebb287c18edf9edfffc19c9039e2ad355d5b8ad7
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/32a20cc2219d618c5b4051487a0661515fa308c2
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/1ce940e79e92aded40ea7fa630f5f739419af4e7
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/9a2b1cc8c0cfac1d6dca218ae79da3baf2a4ba55
https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/723e746fed2785418ce9bf915fd150a9d41d3c83
Detailspublish-artifacts stage failure 
Root Cause Analysis
[Pipeline] [gcp-gce-jenkins-agent-2] unstash
[Pipeline] [gcp-gce-jenkins-agent-2] unstash
[Pipeline] [gcp-gce-jenkins-agent-2] unstash
[Pipeline] [gcp-gce-jenkins-agent-2] sh
[gcp-gce-jenkins-agent-2] + docker pull gcr.io/forgerock-io/am/docker-build:7.1.0-ebb287c18edf9edfffc19c9039e2ad355d5b8ad7
[gcp-gce-jenkins-agent-2] Error response from daemon: manifest for gcr.io/forgerock-io/am/docker-build:7.1.0-ebb287c18edf9edfffc19c9039e2ad355d5b8ad7 not found
[Pipeline] [gcp-gce-jenkins-agent-2] }
[Pipeline] [gcp-gce-jenkins-agent-2] // withEnv
[Pipeline] [gcp-gce-jenkins-agent-2] }
[Pipeline] [gcp-gce-jenkins-agent-2] // node
[Pipeline] [gcp-gce-jenkins-agent-2] }
[Pipeline] [gcp-gce-jenkins-agent-2] // stage
[Pipeline] [gcp-gce-jenkins-agent-2] }
[Pipeline] [gcp-gce-jenkins-agent-2] // timeout
[Pipeline] [gcp-gce-jenkins-agent-2] echo
[gcp-gce-jenkins-agent-2] Exception occurred:
[gcp-gce-jenkins-agent-2] hudson.AbortException: script returned exit code 1
[gcp-gce-jenkins-agent-2] at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.handleExit(DurableTaskStep.java:558)
[gcp-gce-jenkins-agent-2] at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:504)
[gcp-gce-jenkins-agent-2] at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:450)
[gcp-gce-jenkins-agent-2] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[gcp-gce-jenkins-agent-2] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[gcp-gce-jenkins-agent-2] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[gcp-gce-jenkins-agent-2] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[gcp-gce-jenkins-agent-2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[gcp-gce-jenkins-agent-2] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[gcp-gce-jenkins-agent-2] at java.lang.Thread.run(Thread.java:748)

It appears that the publish-artifacts stage is still trying to pull the am Docker image, which no longer exists.

Actions

Update publish-artifacts.groovy to stop pulling the am Docker image (AME-20378).

Takeaways
  • The Jenkins log file was very large and difficult to download and search in order to find the issue. Is it possible to split the logs, or at least to open them in a browser?
  • There may not have been enough awareness of the pipeline stages that need updating when certain AM modifications are made.


14/09/2020 15:09

Who present in meetingDavid L, Rob W, Pete, Jay B
Commitf5c396902fe755a2b6415961403ecef9420e118b
DetailsThe build stage has failed.
Root Cause Analysis

The Build stage has failed with the exception:

Exception: hudson.AbortException: script returned exit code 1
 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.handleExit(DurableTaskStep.java:558)
 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:504)
 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:450)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Primary Report: See more logs at https://qa.forgerock.com/am/master/f5c396902fe755a2b6415961403ecef9420e118b/343813e9-8727-4f7f-a221-82d94b3cdf68/build/jenkinslogs.txt
... (23 lines excluded) ...
+ docker ps --filter label=1
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
+ docker container prune --force --filter label=1
Total reclaimed space: 0B
+ docker ps --filter label=1
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
script returned exit code 1
Exception occurred: hudson.AbortException: script returned exit code 1
 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.handleExit(DurableTaskStep.java:558)
 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:504)
 at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:450)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
+ curl --show-error --silent -u **** 'https://ci.forgerock.org/blue/rest/organizations/jenkins/pipelines/OpenAM-master/pipelines/OpenAM-Pipeline/branches/master/runs/4771/nodes/?limit=10000'
+ mkdir 2064b2e3-6d44-4764-b8ad-8fcdb301ede6
+ curl --show-error --silent -u **** 'https://ci.forgerock.org/blue/rest/organizations/jenkins/pipelines/OpenAM-master/pipelines/OpenAM-Pipeline/branches/master/runs/4771/nodes/1183/log/?start=0'

The build stage on PR#11539 was successful. We have noticed that the PR was run with the configuration:

CI Build Configuration
mock-build=true
fast-mode=true
only-stages=ui-admin-smoke-tests,mandatory-coverage

The change included in the PR attempts to copy the stashed m2 files into every stage's input directory. This was run on a PR with mock-build, which skips the build stage and uses stashes from a previous run for subsequent stages. On the merge to master, the change tries to copy the m2 stash into the input directory of the build stage. At this point the m2 stash does not exist, and because the cp command exits with an error code if there is nothing to copy, an exception is thrown.
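The failure mode described above can be avoided by checking for the stash before copying. The following is a minimal sketch under assumptions, not the pipeline's actual code: the helper name `copy_m2_stash` and both directory arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy a stashed m2 directory into a stage's input
# directory, treating a missing or empty stash as "nothing to do".
copy_m2_stash() {
    stash_dir="$1"
    input_dir="$2"
    # The build stage runs before any m2 stash exists, so a missing or
    # empty stash is expected there and must not fail the stage.
    if [ ! -d "$stash_dir" ] || [ -z "$(ls -A "$stash_dir")" ]; then
        echo "no m2 stash at $stash_dir, skipping copy"
        return 0
    fi
    mkdir -p "$input_dir"
    # "dir/." copies the directory contents, including dotfiles.
    cp -R "$stash_dir/." "$input_dir/"
}
```

With this shape, a real copy failure (for example a permissions error) still returns a non-zero exit code, while the "no stash yet" case does not.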

Actions

Revert commit from master

Takeaways

Because the PR was run with mock-build=true, the build stage was skipped, so the breakage in the Jenkins scripts was not seen until the merge to master.

The build could add a comment and task to the PR in Stash to warn the developer of the risks (AME-20337).


11/09/2020 16:50

Who present in meetingJay, Andy, Isaac, Pete R, Luna
Commit 469cf832d38
DetailsFailure in the functional-tests stage test classes: TestSetDynamicAttributes
Root Cause Analysis

Looks like a Miranda timeout as the mock push service is not indicating that it has received a message. The commit that triggered this does not appear to be directly related to the failure we are seeing.

Failure seen in functional tests https://temper-dashboard.engineering.forgerock.com/openam/master/functional/test-run/469cf832d38f4e5a908d20e4c648176e017f4c6e/functional-tests

The test history does not seem to show this test as being flaky, however the run on the PR prior to merging was successful and local runs during the post mortem were also successful.

The test has succeeded in isolation locally, when looped in temper and run multiple times against a locally deployed AM. The stage was also green prior to merging the PR.

Actions

PR 12419 has been created to re-run the functional-tests stage. If it fails, the commit will be reverted; if it succeeds, an investigation will be needed into the flakiness of the test.

Takeaways


10/09/2020 16:45

Who present in meetingJay, Rob, Isaac, Pete R, Mark L, Luna, Phill C
Commita1bfda21dc8
DetailsFailure in the functional-tests stage test classes: PushAuthenticationSenderNode, PushResultVerifierTest
Root Cause Analysis

Looks like a Miranda timeout as the mock push service is not indicating that it has received a message. The commit that triggered this does not appear to be directly related to the failure we are seeing.

Failure seen in functional tests https://temper-dashboard.engineering.forgerock.com/openam/master/functional/test-run/a1bfda21dc848bcd0dac3274697efc473953138f/functional-tests

PushAuthenticationSenderNode and PushResultVerifierTest both timed out while waiting for Miranda to witness a state change. The build that showed this failure was not displayed on the dashboard due to failures in Bitbucket. Instead, the investigation focused on PRs that had recently been merged to master, and it was noted that commit 03ceae73a16 saw similar failures in the functional-tests stage when run by Jenkins.

Actions

Reverted commit 03ceae73a16 with PR-12352.

Master has been left locked while we wait for this commit to be built and verify that the issue is resolved.

Takeaways

Additional scrutiny should be applied to checking that Jenkins has approved a given commit. This is especially important in situations where build failures have been frequent and it has taken some time to get the Jenkins approval, and when multiple other individuals have approved the PR.

When commit 03ceae73a16 was merged to master, no Jenkins post-commit build was run. The next run failed at the functional-tests stage due to a Jenkins restart. Subsequent commits were also missed by Jenkins, further delaying visibility of the issue.


10/09/2020 16:42

Who present in meetingJay, Rob, Isaac, Pete R, Mark L, Luna
Commita1bfda21dc8
DetailsFailure in the image-smoke-tests stage test class JwtBearerTokenEndpoint
Root Cause AnalysisSee RCA on incident report "07/09/2020 15:49"

Actions

The previously created Jira, on which a mitigation was attempted, has been moved back into progress.
Takeaways

This failure has been seen three times in the last 48 hours and needs addressing.

An attempt to mitigate this issue was committed in 161065b9785, but it does not appear to have worked. Further investigation may be required. A potential solution would be to replace the hard-coded 'alias.localtest.me' with '<random string>.localtest.me', though this may simply hide the underlying issue of realms not being deleted before subsequent tests are run.
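The '<random string>.localtest.me' idea can be sketched as follows. The tests themselves are Java, so this shell version only illustrates the naming scheme; the helper name `random_realm_alias` and the `alias-` prefix are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: build a realm alias with a random DNS label so
# that a realm left over from an earlier test cannot collide with it.
random_realm_alias() {
    # 8 random bytes rendered as 16 lowercase hex characters.
    suffix="$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')"
    echo "alias-${suffix}.localtest.me"
}

# Prints a different alias on each call.
random_realm_alias
```

As noted above, a random alias would only avoid the collision; realms that are not cleaned up would still accumulate.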


09/09/2020 09:51

Who present in meetingJay, Isaac, Rob, Rich, Andy
Commit9bf21928638
Details

14 functional tests marked as failed.

Test class: TestSessionResourceV2V3AndV4

SessionResource endpoint version 4_0 returns valid:false in JSON with HTTP 200
when using stateless Sessions
when getSessionInfoAndResetIdleTime is called after admin has logged the user out
when an admin session ID is sent along with the request in a cookie
Root Cause Analysis

Believed to be a recurring flaky test. The failure was observed on master at 03/09/2020 09:30 and also recently on a pull request.

Actions

Tracked under AME-20214.

Takeaways

Given the test is flaky, the recommendation is to allow the pipeline to continue.

08/09/2020 12:44

Who present in meetingIsaac, Luna, Jay, Kevin
Commithttps://stash.forgerock.org/projects/OPENAM/repos/openam/commits/d96990b1e478f9e3b6c442f5bb5fbd9ff4061d61
DetailsFailure in the image-smoke-tests stage test class JwtBearerTokenEndpoint
Root Cause AnalysisSee RCA on incident report "07/09/2020 15:49"

Actions

Priority raised.

TakeawaysThis failure has been seen twice in the last 24 hours and probably needs addressing.

08/09/2020 10:35

Who present in meetingIsaac, Luna, Kevin, Jay, Rob, Andy F, Richard
Commit9e89805d62c
Details

InterruptedException in JavaDoc build.

java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:275)
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:248)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadSynchronously(CpsStepContext.java:237)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:298)
	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:67)
	at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:264)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:247)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:180)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
	at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
	at com.forgerock.pipeline.reporting.ElasticSearchPipelineRunStorage.updateRunStage(file:/var/lib/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/reporting/ElasticSearchPipelineRunStorage.groovy:94)
	at com.forgerock.pipeline.reporting.PipelineRun.updateStatus(/forgerock/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/reporting/PipelineRun.groovy:205)
	at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2030)
	at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2015)
	at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2056)
	at com.forgerock.pipeline.reporting.PipelineRun.updateStatus(/forgerock/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/reporting/PipelineRun.groovy:191)
	at com.forgerock.pipeline.reporting.PipelineRun.updateStageStatusAsInProgress(/forgerock/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/reporting/PipelineRun.groovy:164)
	at WorkflowScript.runPipeline(WorkflowScript:148)
	at com.forgerock.pipeline.stage.StageRunner.accept(file:/var/lib/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/stage/StageRunner.groovy:81)
	at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2030)
	at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2015)
	at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2056)
	at com.forgerock.pipeline.stage.StageRunner.accept(file:/var/lib/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/stage/StageRunner.groovy:80)
	at com.forgerock.pipeline.stage.StageRunner.runStage(file:/var/lib/jenkins/jobs/OpenAM-master/jobs/OpenAM-Pipeline/branches/master/builds/4756/libs/java-pipeline-libs/src/com/forgerock/pipeline/stage/StageRunner.groovy:136)
	at ___cps.transform___(Native Method)
	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:57)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
	at sun.reflect.GeneratedMethodAccessor650.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:103)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
	at sun.reflect.GeneratedMethodAccessor650.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:60)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
	at sun.reflect.GeneratedMethodAccessor650.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122)
	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261)
	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$101(SandboxContinuable.java:34)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.lambda$run0$0(SandboxContinuable.java:59)
	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:237)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:58)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:332)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:83)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:244)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:232)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Root Cause AnalysisAppears to be caused by JENKINS-46507.

Actions

Jenkins was restarted twice to prevent further builds from falling over. Robin suggests the root cause can only be fixed in GCP.

TakeawaysLook out for InterruptedException and flag ASAP for Jenkins restart.

07/09/2020 15:49

Who present in meetingIsaac, Luna, Pete, Rob
Commit311cb771cee
DetailsRepetition of the ExternalDjTest test failure seen at 047da8872ad.
Root Cause AnalysisSame as original.

Actions

We expect the original fix, which is in a PR, to resolve this issue. If it recurs, we can take further action.

Takeaways

07/09/2020 15:49

Who present in meetingIsaac, Luna, Pete, Rob
Commithttps://stash.forgerock.org/projects/OPENAM/repos/openam/commits/fac4299620899d8e1b4ab1f83890ffabbc55f1e0
DetailsOauth2 ImageSmokeTestFailure in RFC7523
Root Cause Analysis

5 failures were located in the image smoke test run https://qa.forgerock.com/am/master/fac4299620899d8e1b4ab1f83890ffabbc55f1e0/e167c118-a9f4-464c-868a-c654fad8b0c8/image-smoke-tests/ImageSmokeTestsReport/class-com.forgerock.openam.functionaltest.oauth2.rfc7523.JwtBearerTokenEndpoint.html


The commit that triggered the failure does not appear to contain any change that would have caused this failure on its own:

https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/fac4299620899d8e1b4ab1f83890ffabbc55f1e0


It appears the test expects a specific realm not to exist at the start of each test. The realm name appears to be hardcoded ("alias.localtest.me"). If that realm is still present from an earlier test, the test fails when it attempts to create the realm at the start of the test.

Actions

A Jira task, AME-20296, has been created to investigate why the realms are not being cleaned up correctly during the run.

TakeawaysIt is important to not hardcode significant values (realm names, user names, etc.) in tests. Instead, randomly generate them on each test run, so that issues caused by cleanup failing are mitigated. However, doing so may mask failures in our cleanup code or potential concurrency issues.

07/09/2020 15:29

Who present in meetingIsaac, Luna, Pete, Rob
Commit18aa0358e1a
Details

UserConfigTreeTest failed during a FT run on a commit.

Root Cause Analysis

UserConfigTreeTest has a failure:

https://qa.forgerock.com/am/master/18aa0358e1a6f5bffef0f6fdffe35e34b04efcb4/96600e67-8ef0-4f14-b5cd-e27f4e6e7ff5/functional-tests/functional-tests-report/FunctionalTestsReportReport/class-com.forgerock.openam.functionaltest.auth.trees.UserConfigTreeTest.html#e6b7b8f113a134da6daed496e999f8676e8aa7fa4104a49e148420abb583ca38


In a commit subsequent to this one, we see a successful run of this test:

https://temper-dashboard.engineering.forgerock.com/openam/master/functional/test-run/fac4299620899d8e1b4ab1f83890ffabbc55f1e0/functional-tests


The commit that triggered the failure does not appear to contain any change that would have caused this failure on its own:

https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/18aa0358e1a6f5bffef0f6fdffe35e34b04efcb4


The test history is available in this Kibana Query; however, at the time of writing, test-events is re-indexing, so we are unable to view the query results.

Actions

There are no AM logs with which to proceed with the investigation at this point - see the link to the failed test information. AME-20295 has been created for further investigation.

TakeawaysSimilar situation to the break at 07/09/2020 11:00.

07/09/2020 14:40

Who present in meetingPete, Emma
CommitN/A
Details3 instances of am-jenkins-static-lightweight-agent running at once, 2 of which are showing as offline in Jenkins.
Root Cause Analysis

groovy.json.JsonException: Unable to determine the current character, it is not a string, number, array, or object

The above exception is thrown in the logs for every executor in a given pipeline. Each failure then creates a process on an am-jenkins-static-lightweight-agent instance. The exception is caused by Jenkins returning an HTML error page, rather than the expected JSON, when the pipeline tries to download the log files for a stage.
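One way to make this failure easier to diagnose is to check that a response body looks like JSON before handing it to the parser, so that an HTML error page is reported as such. The following is a minimal sketch of the check only: the pipeline's real parsing is done with groovy.json, and the helper name `looks_like_json` is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the body's first non-whitespace
# character starts a JSON object or array.
looks_like_json() {
    first_char="$(printf '%s' "$1" | tr -d ' \t\r\n' | cut -c1)"
    case "$first_char" in
        '{'|'[') return 0 ;;
        *) return 1 ;;
    esac
}

# Example: an HTML error page is rejected before any JSON parsing.
body='<html><body>Service Unavailable</body></html>'
if looks_like_json "$body"; then
    echo "parse as JSON"
else
    echo "not JSON, skipping parse"
fi
```

This is a cheap pre-check rather than validation; checking the HTTP status code or the Content-Type header of the response would be a stricter alternative.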


At the end of the log file, the following is observed,
[gcp-gce-jenkins-agent-3] Cannot contact am-jenkins-static-lightweight-agent-8lw19p: java.lang.InterruptedException
[gcp-gce-jenkins-agent-2] Cannot contact am-jenkins-static-lightweight-agent-8lw19p: java.lang.InterruptedException
[gcp-gce-jenkins-agent-5] Cannot contact am-jenkins-static-lightweight-agent-8lw19p: java.lang.InterruptedException
[gcp-gce-jenkins-agent-6] Cannot contact am-jenkins-static-lightweight-agent-8lw19p: java.lang.InterruptedException
[gcp-gce-jenkins-agent-4] Cannot contact am-jenkins-static-lightweight-agent-8lw19p: java.lang.InterruptedException

Actions

Takeaways

The cause of the java.lang.InterruptedException is still not clear. These notes have been added to AME-20247.



07/09/2020 11:00

Who present in meetingJay B, Phil A, Dipu S, Kevin U, Isaac
Commithttps://stash.forgerock.org/projects/OPENAM/repos/openam/commits/37d4aa2551b41c45f7719202fefe2b16e84fd122
DetailsThere is an intermittent failure of the SocialAuthNodeTest in the functional tests stage.
Root Cause Analysis

There was no obvious root cause: the commit that failed was not relevant, and the subsequent commit passed.

Actions

This is an intermittent failure, and AME-20294 has been raised to track it.

Takeaways

It was obvious that the test was flaky only because the next commit had already run and passed. If master had been locked immediately, this would not have been apparent. One suggestion when implementing the automatic locking of master is to create a PR that re-runs the failed stage.

04/09/2020 10:40

Date/Time04/09/2020 10:40
Who present in meetingPhil A, Emma, Pete, Gabor
Commit047da8872ad
Details

This is an intermittent failure of the ExternalDJTest in functional-tests.

Root Cause Analysis

This is an intermittent failure caused by AM caching ExternalDJ connections based on the DJ Host URL. There is already a ticket and a PR open to change the functional test API to avoid this issue.
AME-20134

Actions

Takeaways

03/09/2020 15:40

Date/Time03/09/2020 15:40
Who present in meetingPhil A, Emma, Pete, Phill C
Commitb25bc870b07 ← Seen on this commit, but caused by 1b4c3e77036 which did not appear on the dashboard.
Details

Build Log

Root Cause Analysis

Build failure caused by unit test failure.

Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.215 sec <<< FAILURE! - in FbcMasterRulesTest


Tests fail due to redundant JSON files. These files are no longer required as they are already provided in the base config of the docker image.

Actions

Takeaways


03/09/2020 09:30

Date/Time
03/09/2020 09:30
Who present in meeting
Phil A, Andrew, Andy, Emma, Isaac, Pete
Commit
b42a6d9395b ← Seen on this commit, but potentially in the range cce413bc3dc...b42a6d9395b
Details

Temper Report

Kibana Query

The functional tests stage had 14 failures in REST-Session.

The Kibana query shows that this has failed 3 times for the same reason in the last month.

Root Cause Analysis


Actions

Created AME-20280

Takeaways
This investigation was hampered by commits to master not always triggering pipeline runs, for reasons that are not yet known. The same issue was seen last week and "fixed" with a Jenkins and Bitbucket restart. It requires further investigation from Releng and IT.


01/09/2020 16:30

Date/Time
01/09/2020 16:30
Who present in meeting
Rich, Phil A, Jay
Commit
c6fd48493b8
Details

Temper Report

Kibana Query

The image smoke stage had 6 failures.

The Kibana query shows that this has failed 3 times for the same reason in the last month.

Root Cause Analysis
The stack trace suggests that this might be a timing issue with DS.


2020-09-01 12:46:56,733 [main] WARN com.forgerock.openam.functionaltest.commands.ConfigManager - Could not execute command: Delete realm: L0p3dEJlYXJlclRva2VuRW5kcG9pbnQtSUdqMXpDVmNCS1B4STB3
org.forgerock.http.protocol.ResponseException: Got unsuccessful response: 400 Bad Request

{"code":400,"reason":"Bad Request","message":"Such node does not exist in the directory server."}

when attempting to read http://am.localtest.me:8080/am/json/global-config/realms/L0p3dEJlYXJlclRva2VuRW5kcG9pbnQtSUdqMXpDVmNCS1B4STB3 [1.0]
    at com.forgerock.openam.functionaltest.HttpUtils.isSuccessful(HttpUtils.java:76)
    at com.forgerock.openam.functionaltest.CrestClient.delete(CrestClient.java:245)
    at com.forgerock.openam.functionaltest.CrestClient.delete(CrestClient.java:218)
    at com.forgerock.openam.functionaltest.api.Realm$Builder.lambda$create$2(Realm.java:422)
    at com.forgerock.openam.functionaltest.commands.Command$1.execute(Command.java:51)
    at com.forgerock.openam.functionaltest.commands.ConfigManager.revert(ConfigManager.java:93)
    at org.forgerock.cuppa.internal.TestBlockRunner.runHooks(TestBlockRunner.java:145)
    at org.forgerock.cuppa.internal.TestBlockRunner.runBlockHooks(TestBlockRunner.java:131)
    at org.forgerock.cuppa.internal.TestBlockRunner.run(TestBlockRunner.java:89)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
    at org.forgerock.cuppa.internal.TestBlockRunner.run(TestBlockRunner.java:87)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
    at org.forgerock.cuppa.internal.TestBlockRunner.run(TestBlockRunner.java:87)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
    at org.forgerock.cuppa.internal.TestBlockRunner.run(TestBlockRunner.java:87)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
    at org.forgerock.cuppa.internal.TestBlockRunner.run(TestBlockRunner.java:87)
    at org.forgerock.cuppa.Runner.runTests(Runner.java:195)
    at org.forgerock.cuppa.Runner.lambda$run$1(Runner.java:150)
    at org.forgerock.cuppa.internal.TestContainer.runTests(TestContainer.java:276)
    at org.forgerock.cuppa.Runner.run(Runner.java:146)
    at com.forgerock.openam.functionaltest.SurefireProvider.execute(SurefireProvider.java:151)
    at com.forgerock.openam.functionaltest.SurefireProvider.lambda$executeWithRerun$0(SurefireProvider.java:138)
    at com.forgerock.openam.functionaltest.SurefireProvider.executeWithRerunAndReports(SurefireProvider.java:156)
    at com.forgerock.openam.functionaltest.SurefireProvider.executeWithRerun(SurefireProvider.java:137)
    at com.forgerock.openam.functionaltest.SurefireProvider.invoke(SurefireProvider.java:124)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:290)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:242)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:121)
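If the failed realm deletion is indeed a timing issue, one possible mitigation is to retry the cleanup command after a short delay. The sketch below is an assumption-laden illustration, not the functional test framework's API: `Retry` and `withRetry` are hypothetical names, and whether a retry actually resolves this failure has not been established.

```java
import java.util.function.Supplier;

// Hypothetical helper; not part of the functional test framework.
final class Retry {

    /**
     * Runs the action up to maxAttempts times, sleeping delayMillis between
     * attempts, and rethrows the last failure if every attempt fails.
     */
    static <T> T withRetry(Supplier<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop retrying.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

A cleanup hook could wrap the delete call, for example `Retry.withRetry(() -> deleteRealm(name), 3, 500)`, where `deleteRealm` is a hypothetical method standing in for the real command.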

Actions

Created Jira: AME-20231
Takeaways
The flaky test needs fixing. It would also be helpful to have AM logs added for the image stages.


01/09/2020 15:54

Date/Time

01/09/2020 15:54

Who present in meeting
Commit
3a4f2d33af55fd2db7bd272a499c206247b85113
Details

The build stage failed. Steps taken to find the failure:
Clicked on the build "dot"
Found the Jenkins log URL nested in the stack trace
Navigated to the end of the logs

maven_1 | [INFO] ------------------------------------------------------------------------
maven_1 | [INFO] BUILD FAILURE
maven_1 | [INFO] ------------------------------------------------------------------------
maven_1 | [INFO] Total time: 27:39 min
maven_1 | [INFO] Finished at: 2020-08-25T09:33:38Z
maven_1 | [INFO] ------------------------------------------------------------------------
maven_1 | [ERROR] Failed to execute goal org.apache.maven.plugins:maven-archetype-plugin:3.0.1:integration-test (default-integration-test) on project auth-tree-node-archetype:
maven_1 | [ERROR] Archetype IT 'basic' failed: Execution failure: exit code = 1
maven_1 | [ERROR] -> [Help 1]
maven_1 | [ERROR]
maven_1 | [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
maven_1 | [ERROR] Re-run Maven using the -X switch to enable full debug logging.
maven_1 | [ERROR]
maven_1 | [ERROR] For more information about the errors and possible solutions, please read the following articles:
maven_1 | [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
maven_1 | [ERROR]
maven_1 | [ERROR] After correcting the problems, you can resume the build with the command
maven_1 | [ERROR] mvn <goals> -rf :auth-tree-node-archetype

Existing ticket - AME-18031


3a4f2d33af55fd2db7bd272a499c206247b85113
Same failure as above, but for a different AM artefact

Root Cause Analysis
Whilst building the Maven archetype module, the build could not download an AM artifact.
The artifact should not be being built at this point, as it has already been built beforehand by the Maven reactor.
The AM artefact that is downloaded incorrectly is not the same from one failure to the next.

Actions

Response to the incident was to allow the subsequent commit to run. Further investigation is needed before the root cause can be established.

Tickets Raised:

  • AME-20191 - re-run failed builds due to dependency resolution issues downloading AM artifact incorrectly - Emma
  • AME-20192 - Instrument the build process to verify the artefacts are being stored correctly locally on filesystem - Rob
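As a starting point for the instrumentation described in AME-20192, the sketch below checks whether an artifact file is present in a Maven local repository. It is a minimal illustration under stated assumptions: `LocalRepoCheck` and the coordinates are hypothetical, the repository is assumed to be in the default `~/.m2/repository` location, and the conventional `groupId/artifactId/version/artifactId-version[-classifier].extension` layout is assumed. Snapshot artifacts may be stored under timestamped file names, which this sketch does not handle.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: look for an artifact file in a Maven local repository.
final class LocalRepoCheck {

    /** Builds the conventional local repository path for the given coordinates. */
    static Path artifactPath(Path repoRoot, String groupId, String artifactId,
                             String version, String classifier, String extension) {
        String suffix = (classifier == null || classifier.isEmpty()) ? "" : "-" + classifier;
        String fileName = artifactId + "-" + version + suffix + "." + extension;
        Path dir = repoRoot;
        for (String part : groupId.split("\\.")) {
            dir = dir.resolve(part); // each groupId segment becomes a directory
        }
        return dir.resolve(artifactId).resolve(version).resolve(fileName);
    }

    /** Returns true if the artifact is present as a regular file. */
    static boolean isStored(Path repoRoot, String groupId, String artifactId,
                            String version, String classifier, String extension) {
        return Files.isRegularFile(
                artifactPath(repoRoot, groupId, artifactId, version, classifier, extension));
    }

    public static void main(String[] args) {
        // Assumes the default local repository location; a build may override it.
        Path repoRoot = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        // Hypothetical coordinates, used only for illustration.
        System.out.println(isStored(repoRoot, "org.example", "demo-artifact", "1.0.0", "classes", "jar"));
    }
}
```

Running a check like this immediately before the archetype integration test would show whether the artifact was missing locally at that moment or was present but still re-resolved from the remote repository.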
Takeaways
One line of thinking is that the tree node archetype does not explicitly declare its required dependencies.
Could we remove the archetype integration test (the cause of the failure) if its value is not significant?








