Performance Management Methodology®
The key to understanding your system is learning it at a deeper level. To do that, you need a structured methodology that covers all the core performance components (CPU, disk and memory).
The methodology is simple, but it must be applied on a regular basis (e.g. weekly or monthly reporting).
- Historical data must be collected:
This data can be collected in various ways depending on your platform of choice. On IBM Power Systems, Collection Services, NMON or SAR data is collected throughout the day. MPG’s performance management best practice is to collect data every 5 minutes.
- Structured Performance Guidelines must be established:
Just like our children’s grades in school (A, B, C, D & F), you must create a set of guidelines to be used in your environment. These guidelines (a.k.a. “Best Practice Guidelines”) will be used to determine how your system is performing. Performance guidelines can be unique to your company (e.g. your homegrown application routinely experiences 2 second response time), or industry guidelines can be used (e.g. machine pool faulting must be under 10 faults/sec).
It’s not uncommon to have a combination. The key is to have a measuring stick.
An example of structured guidelines is shown below:
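As a minimal sketch of how such guidelines could be written down and applied, the Python below grades a measured value against per-metric thresholds. The `Guideline` class, the `grade` function, the grade names and every threshold other than the 10 faults/sec figure mentioned above are hypothetical placeholders, not MPG's actual guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guideline:
    """One row of a guideline table (hypothetical structure)."""
    metric: str
    unit: str
    good_below: float      # values under this are graded "good"
    marginal_below: float  # values under this (but not good) are "marginal"


# Only the 10 faults/sec figure comes from the text; all other
# numbers are made-up placeholders to be replaced with your own.
GUIDELINES = {
    "machine_pool_faults": Guideline("machine_pool_faults", "faults/sec", 10.0, 15.0),
    "app_response_time": Guideline("app_response_time", "seconds", 2.5, 4.0),
}


def grade(metric: str, value: float) -> str:
    """Return "good", "marginal" or "poor" for a measured value."""
    g = GUIDELINES[metric]
    if value < g.good_below:
        return "good"
    if value < g.marginal_below:
        return "marginal"
    return "poor"


if __name__ == "__main__":
    for metric, value in [("machine_pool_faults", 4.2), ("app_response_time", 3.1)]:
        print(f"{metric}: {value} {GUIDELINES[metric].unit} -> {grade(metric, value)}")
```

Keeping the thresholds in one table means the same measuring stick is applied every reporting period, which is the point of the step.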
- Structured Monthly Performance Reporting:
Here is where the rubber meets the road: we measure our real performance components against the guidelines we established in step 2. In this step we measure all the critical performance components:
- CPU Utilization
- Disk Utilization
- Disk Response Time
- 5250 Response Time
- Faulting Rate
and so on...
Here is a monthly report example where we measure CPU for the previous month:
Here is a 12 month historical trend:
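For readers who want to reproduce this kind of summary themselves, here is a rough sketch of a CPU report computed from interval samples. The function name `cpu_report`, the 80% guideline and the sample values are assumptions for illustration only; they are not taken from the reports described above.

```python
from statistics import mean


def cpu_report(samples, guideline_pct):
    """Summarize CPU utilization samples (in percent) against a guideline."""
    if not samples:
        raise ValueError("no samples to report on")
    over = [s for s in samples if s > guideline_pct]
    return {
        "average_pct": round(mean(samples), 1),
        "peak_pct": max(samples),
        "intervals_over": len(over),
        "pct_intervals_over": round(100 * len(over) / len(samples), 1),
    }


# Made-up samples; in practice there would be one value per
# collection interval for the whole reporting period.
samples = [35.0, 42.5, 91.0, 88.0, 40.0]
report = cpu_report(samples, guideline_pct=80.0)

if __name__ == "__main__":
    for name, value in report.items():
        print(f"{name}: {value}")
```

Running the same function over each of the last 12 months gives the data points for a historical trend.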
- Monthly Resource Consumption Analysis:
Anyone can measure CPU, disk and memory metrics. However, it’s what you do next that counts. After understanding the core performance components, you must measure the jobs that are running on your system.
The goal is to answer these questions:
- What jobs are consuming the most CPU ms?
- What jobs are experiencing the most IOs?
- What jobs are experiencing the most faults?
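The three questions above amount to ranking jobs by a total per metric. A minimal sketch, assuming per-job records have already been exported from your collection tool into dictionaries (the field names `job`, `cpu_ms`, `ios` and `faults`, the job names and all values are hypothetical):

```python
from collections import defaultdict


def top_jobs(records, metric, n=3):
    """Sum one metric per job and return the n largest consumers."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["job"]] += rec[metric]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up records; a job may appear many times in one month.
records = [
    {"job": "NIGHTLY_BATCH", "cpu_ms": 1200, "ios": 300, "faults": 5},
    {"job": "ORDER_ENTRY", "cpu_ms": 400, "ios": 900, "faults": 40},
    {"job": "NIGHTLY_BATCH", "cpu_ms": 800, "ios": 100, "faults": 2},
    {"job": "BACKUP", "cpu_ms": 150, "ios": 2000, "faults": 1},
]

if __name__ == "__main__":
    for metric in ("cpu_ms", "ios", "faults"):
        print(metric, top_jobs(records, metric))
```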
The reason step 4 is so vital is demonstrated in the picture:
Here we see a real-life example where one job suddenly used 1657% more CPU ms each month. As you can see, the phenomenon started in September, and the problem has been on the system for 3 months!
When you do monthly resource consumption analysis, not only do you learn your system at an even deeper level, but problem jobs tend to stick out like a sore thumb.
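A jump like this can also be caught automatically by comparing each job's monthly total with the previous month. The sketch below is one possible approach, not the tooling used to produce the picture; the function names, the 100% threshold and the sample totals are all hypothetical, with the numbers chosen only so that one job shows a 1657% increase.

```python
def pct_change(previous, current):
    """Percent increase from previous to current (1657.0 means 1657% more)."""
    if previous <= 0:
        raise ValueError("previous total must be positive")
    return (current - previous) * 100 / previous


def flag_jumps(monthly_cpu_ms, threshold_pct=100.0):
    """Return {job: largest month-over-month % increase} for jobs at or over the threshold.

    monthly_cpu_ms maps a job name to its monthly CPU ms totals in
    chronological order.
    """
    flagged = {}
    for job, totals in monthly_cpu_ms.items():
        for prev, cur in zip(totals, totals[1:]):
            if prev <= 0:
                continue  # no baseline to compare against
            change = pct_change(prev, cur)
            if change >= threshold_pct:
                flagged[job] = max(change, flagged.get(job, change))
    return flagged


# Made-up monthly totals (CPU ms) for two hypothetical jobs.
history = {
    "JOB_A": [1000, 1000, 17570, 17600],
    "JOB_B": [500, 520, 510, 530],
}
flagged = flag_jumps(history)

if __name__ == "__main__":
    for job, change in flagged.items():
        print(f"{job}: +{change:.0f}% CPU ms month over month")
```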