This beginner's guide explains how to troubleshoot a performance issue, whether it was found in general use or caught during regression testing. It assumes that the performance tests use the Gatling and PyForge frameworks. The tests run nightly or weekly and can be found here: http://jenkins-fr.internal.forgerock.com:8080/view/AM%20Stress/job/AM-7.1.0/
Performance issues are also reported in the #am_performance Slack channel.
See this page for information about PyForge: https://pyforge.engineering.forgerock.com/docs/getting-started
Config file parameters
Below is an example of the Stress parameters used by the tests.
If the performance drop was reported on a regression test, update the config to match the Jenkins job’s configuration.
PyForge Test Command
This can be found in the Jenkins console output and looks like this:
Results and Reports
PyForge generates a folder for each run under the results directory. The Gatling report can be found in <PYFORGE_HOME>/results/<TIMESTAMP>/<SUITE_PREFIX>/graph/
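As a small illustration of the layout above, the sketch below builds the report path and picks the most recently modified run folder. The function names are hypothetical helpers, not part of PyForge, and choosing the latest run by modification time is an assumption about how you want to select it.

```python
from pathlib import Path
from typing import Optional


def report_dir(pyforge_home: str, timestamp: str, suite_prefix: str) -> Path:
    """Build <PYFORGE_HOME>/results/<TIMESTAMP>/<SUITE_PREFIX>/graph."""
    return Path(pyforge_home) / "results" / timestamp / suite_prefix / "graph"


def latest_run(pyforge_home: str) -> Optional[Path]:
    """Return the most recently modified run folder under results, or None."""
    results = Path(pyforge_home) / "results"
    if not results.is_dir():
        return None
    runs = [p for p in results.iterdir() if p.is_dir()]
    # Assumption: the newest run is the one whose folder was modified last.
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)
```

Calling latest_run with your PyForge checkout directory gives the folder of the last run, under which the suite's graph directory holds the Gatling report.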
Finding the commit
The first step in troubleshooting a regression is to find the offending commit. The performance regression tests run nightly, so several AM commits may land between one run and the next, and a reported drop in performance usually spans a few of them. If the offending commit is not obvious, use git bisect. If you build OpenAM locally at a specific commit, copy the build to the <PYFORGE_HOME>/archives folder on the remote machine.
The steps involved to run the tests:
- Make sure you have the appropriate configuration (See Config file section)
- Run with the last known good commit
- Run with the bad commit
- Continue the git bisect until you find the offending commit
See Replicating for how to run the tests. At each step, record the throughput (Req/s) and response times from the Gatling report.
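The bisect procedure above can be sketched as a binary search over the commits between the last good and the first bad run. This is a minimal illustration, not a replacement for git bisect: measure_rps is a hypothetical stand-in for "run the test at this commit and read Req/s from the Gatling report", and the 5% tolerance is an arbitrary assumption that you should tune to the noise you see between runs.

```python
from typing import Callable, Sequence


def is_regression(baseline_rps: float, candidate_rps: float,
                  tolerance: float = 0.05) -> bool:
    """True when throughput dropped by more than `tolerance` (a fraction)."""
    return candidate_rps < baseline_rps * (1.0 - tolerance)


def first_bad_commit(commits: Sequence[str],
                     measure_rps: Callable[[str], float],
                     tolerance: float = 0.05) -> str:
    """Binary-search commits (oldest first) for the first one that regresses.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the offending one is also bad.
    """
    baseline = measure_rps(commits[0])
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_regression(baseline, measure_rps(commits[mid]), tolerance):
            bad = mid   # regression already present at mid
        else:
            good = mid  # mid is still fine, look later
    return commits[bad]


# Hypothetical recorded throughput (Req/s) per commit, oldest first.
recorded = {"c1": 1000.0, "c2": 990.0, "c3": 985.0, "c4": 800.0, "c5": 805.0}
culprit = first_bad_commit(list(recorded), recorded.__getitem__)
print(culprit)
```

Each iteration halves the range, so about log2(N) test runs are needed for N commits, which matters when a single stress run is long.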
Sometimes the performance drop is spread across several commits. In that case bisecting may not isolate a single culprit, and you may need alternative approaches such as profiling and sampling.
It is better to use a Lab machine, as running locally can give inconsistent results. Once you have PyForge cloned, running locally and running in the Lab should be similar, but there is a process for getting a Lab machine (see Run in the Lab).
Run in the Lab
Go to https://wikis.forgerock.org/confluence/pages/viewpage.action?spaceKey=QA&title=Grenoble+Lab and check whether any machines are available. If there is one, put your name against it and update the page. The #grenoble-lab Slack channel can help with credentials and any other problems. Once you have the credentials, ssh to the box (you will need to be on the VPN for this), create a directory with your name under /external/testuser, and clone PyForge into it. Once a test has run, the results can be accessed over HTTP, for example: http://gouda.internal.forgerock.com/external/testuser/ravigeda/pyforge/results/
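Judging only from the example URL above, the HTTP path appears to mirror the directory path on the box. The sketch below encodes that assumption; results_url is a hypothetical helper, and the default clone directory name pyforge is taken from the example rather than from any documented convention.

```python
from pathlib import PurePosixPath


def results_url(host: str, user_dir: str, clone_dir: str = "pyforge") -> str:
    """Map /external/testuser/<user_dir>/<clone_dir>/results to an HTTP URL.

    Assumption: the web server exposes the filesystem path unchanged.
    """
    path = PurePosixPath("/external/testuser") / user_dir / clone_dir / "results"
    return f"http://{host}{path}/"


print(results_url("gouda.internal.forgerock.com", "ravigeda"))
```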
For detailed information on flame graphs, see: http://www.brendangregg.com/flamegraphs.html
IntelliJ IDEA (Ultimate) has async-profiler and flame graph visualisation built in. You can start the PyForge performance test that you need to investigate and attach the profiler to it.
This is useful when the issue can be replicated locally, but sometimes the problem only appears under load on a remote machine. In such cases, you can set up async-profiler on the box. See https://github.com/jvm-profiling-tools/async-profiler
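As a rough sketch of what running async-profiler on the box involves, the helper below assembles a command line that CPU-profiles a JVM for a fixed duration and writes a flame graph. The -e, -d and -f options are async-profiler options; the launcher name and output format depend on the version installed (profiler.sh in 1.x and 2.x, asprof in 3.x, and 1.x writes SVG rather than HTML), so treat the defaults, the PID and the output path here as assumptions.

```python
import shlex
from typing import List


def profiler_command(pid: int, seconds: int = 30,
                     output: str = "/tmp/flamegraph.html",
                     launcher: str = "./profiler.sh") -> List[str]:
    """Build an async-profiler invocation: CPU-profile `pid` for `seconds`."""
    # -e: event to sample, -d: duration in seconds, -f: output file
    return [launcher, "-e", "cpu", "-d", str(seconds), "-f", output, str(pid)]


# 12345 is a placeholder; use the PID of the AM JVM under test.
print(shlex.join(profiler_command(12345)))
```

Start the profiler once the load test has ramped up, so that the samples reflect steady-state load rather than startup.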
Alternatives include using the perf utility together with a map file of JVM symbols.
Detailed steps are available here: https://maheshsenniappan.medium.com/java-performance-profiling-using-flame-graphs-e29238130375
To monitor CPU usage, GC activity, memory usage or live threads, you can use VisualVM. You can also take a thread dump to diagnose deadlocks. If you are running locally, run the performance test and open the process in VisualVM. If you are running the performance test on a remote machine, you need to enable a JMX connection. To enable it in the PyForge environment, add or append the following Java args in the OpenAM section, as shown in bold here
Once PyForge has started the test, you can connect VisualVM to the remote process.
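For orientation, the standard JVM system properties for a remote JMX listener can be assembled as in the sketch below. These are generic JVM options and may differ from the exact args used in the PyForge configuration; the port and hostname are hypothetical, and disabling authentication and SSL is only reasonable on a trusted network such as the Lab.

```python
from typing import List


def jmx_args(port: int, hostname: str) -> List[str]:
    """Standard JVM properties for an unauthenticated, non-SSL JMX listener."""
    return [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        # Pin the RMI port to the same value so only one port must be reachable.
        f"-Dcom.sun.management.jmxremote.rmi.port={port}",
        "-Dcom.sun.management.jmxremote.authenticate=false",
        "-Dcom.sun.management.jmxremote.ssl=false",
        # Hostname that remote clients such as VisualVM will connect back to.
        f"-Djava.rmi.server.hostname={hostname}",
    ]


# Hypothetical port and host; substitute the Lab machine's values.
print(" ".join(jmx_args(9010, "lab-host.example.com")))
```

In VisualVM, the matching connection is added as a JMX connection using the same host and port.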