This is an analysis of the data contained in the file sar22. The data was collected on 06/22/2005, from 00:20:00 to 17:20:00, from the system 'ux3005'. There were 17 data records used to produce this analysis. The operating system used to produce the sar report was Release 5.3 of AIX. The system configuration data in the sar report indicated that 8.0 processors were configured. 12288 megabytes of memory were seen in the system configuration data.
Data collected by the ps -elf command on 06/22/2005 from 00:20:00 to 17:20:00, and stored in the file /opt/sarcheck/ps/20050622, will also be analyzed. This program will attempt to match the starting and ending times of the ps -elf data with those of the sar report file named sar22.
When the data was collected, no CPU bottleneck could be detected. A memory bottleneck was seen. No significant I/O bottleneck was seen. A change to at least one tunable parameter has been recommended. Limits to future growth have been noted in the Capacity Planning section.
Some of the defaults used by SarCheck's rules have been overridden using the sarcheck_parms file. See the Custom Settings section of the report for more information.
All recommendations contained in this report are based solely on the conditions which were present when the performance data was collected. It is possible that conditions which were not present at that time may cause some of these recommendations to result in worse performance. To minimize this risk, analyze data from several different days, implement only regularly occurring recommendations, and implement them one at a time or as groups of related parameters.
Additional memory may improve performance. More than half of the system's memory was pinned, making less memory available to meet the needs of processes running on the system. If possible, borrow some memory for test purposes, and monitor system performance and resource utilization before and after its installation.
Change the value of maxfree from 1088 to 1984 with the command 'vmo -o maxfree=1984'. The -o flag changes the value of a parameter only until the next reboot. To make the change permanent, use the command 'vmo -p -o maxfree=1984'. The magnitude of this change has been limited to prevent the recommendation of very large changes. Changing this parameter in smaller increments is a much safer way to tune the system. This change is recommended based on formulas discussed at IBM's pSeries Technical University and at the UserBlue conference. The following data was used in this calculation: The maxpgahead value used was 8. The value of lcpu reported by sar was 8.0. The number of active CPUs reported by sysconf is 1.
Change the value of minfree from 960 to 1920 with the command 'vmo -o minfree=1920'. The -o flag changes the value of a parameter only until the next reboot. To make the change permanent, use the command 'vmo -p -o minfree=1920'. The magnitude of this change has been limited to prevent the recommendation of very large changes. Changing this parameter in smaller increments is a much safer way to tune the system. This change is recommended based on formulas discussed at IBM's pSeries Technical University and at the UserBlue conference. The following data was used in this calculation: The number of memory pools seen was 4. The value of lcpu reported by sar was 8.0. The number of active CPUs reported by sysconf is 1.
Change the value of the maxperm parameter to 60 with the command 'vmo -o maxperm%=60'. The -o flag changes the value of a parameter only until the next reboot. To make the change permanent, use the command 'vmo -p -o maxperm%=60'. This should move the value of maxperm in the direction that will improve performance. This change will not be helpful if the system's primary function is an NFS server or if it is doing a lot of raw database I/O. The recorded value for maxperm was 80.0 percent.
Change the value of the minperm parameter to 15 with the command 'vmo -o minperm%=15'. The -o flag changes the value of a parameter only until the next reboot. To make the change permanent, use the command 'vmo -p -o minperm%=15'. This recommended value has been set to preserve the relationship between minperm and the recommended maxperm value. If you choose not to change maxperm because the system is an NFS server or it is performing a lot of raw database I/O, minperm should not be changed either. The recorded value for minperm was 20.0 percent.
A CPU upgrade is not recommended because the current CPU had significant unused capacity.
No disk recommendations have been made because no bottleneck was seen.
An average of 29.64 percent of this partition's entitled CPU capacity (%entc) was used during the monitoring period. The percentage peaked at 54.50 from 04:20:03 to 05:20:02. There were 0.82 physical processors in use when the percentage of entitled CPU capacity was at its peak.

The average number of physical processors consumed by this partition (physc) was 0.44. The peak number of physical processors consumed was 0.82 from 04:20:03 to 05:20:02.
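The %entc and physc figures above are linked through the partition's entitled capacity, which the report does not state. As a rough cross-check, and assuming %entc is simply physc divided by the entitlement, the entitlement can be backed out of the reported numbers. The function name below is illustrative and is not part of SarCheck.

```python
def implied_entitlement(physc: float, entc_percent: float) -> float:
    """Back out entitled capacity (in processors) from physc and %entc.

    Assumes %entc = 100 * physc / entitlement. That definition is an
    assumption made for this sketch, not something the report states.
    """
    if entc_percent <= 0:
        raise ValueError("entc_percent must be positive")
    return physc / (entc_percent / 100.0)


# Figures quoted in the report for the whole period and for the peak interval.
average = implied_entitlement(physc=0.44, entc_percent=29.64)
peak = implied_entitlement(physc=0.82, entc_percent=54.50)

print(f"implied entitlement from averages: {average:.2f} processors")
print(f"implied entitlement from peak:     {peak:.2f} processors")
```

Both estimates land close to 1.5 processors. The small difference between them is consistent with physc being reported to only two decimal places.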
Information in this paragraph is taken from the sar -u report. This information may not be completely accurate on a micropartitioned POWER5 system and is provided because people are used to seeing it. Average CPU utilization (%usr + %sys) was only 22.5 percent. This indicates that spare CPU capacity exists. If any performance problems were seen during the entire monitoring period, they were not caused by a lack of CPU power. CPU utilization peaked at 46 percent from 04:20:03 to 05:20:02. The CPU was waiting for I/O (%wio) an average of 6.8 percent of the time. The time that the system was waiting for I/O peaked at 19 percent from 01:20:01 to 03:20:03.

The preceding graph shows the relationship between %entc data and the sum of %usr and %sys. The %entc data is more accurate and should be used instead of the traditional %usr and %sys metrics. The %wio column is probably not very accurate but higher values are likely to indicate times of greater I/O activity. Because the %usr, %sys, and %wio data is not accurate on micropartitioned POWER5 systems, it has not been used to calculate the percent of time that the system was idle.
Information in this paragraph is taken from the runq-sz and %runocc columns in the sar -q report and may not be completely accurate on a micropartitioned POWER5 system. The run queue had an average length of 1.4 which indicates that processes were generally not bound by latent demand for CPU resources. The run queue was usually occupied, despite the lack of a significant run queue length. This condition is usually seen when the number of CPU-intensive processes is low. It is likely that the performance of these processes is closely related to CPU speed.
No buffer cache activity was seen in the sar -b data. This is normal for AIX systems, which typically do not use the traditional buffer cache.
The average rate at which I/O was blocked because an LVM had to wait for pbufs was 0.20 per second. The peak rate was 6.69 per second from 01:00:00 to 01:20:01. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst when LVMs had to wait for pbufs, then a problem may be that the number of pbufs was insufficient. A recommendation to increase the number of pbufs was not made because a memory-poor environment was seen and because the amount of pinned memory was close to the maximum permitted by the maxpin tunable parameter.
The average rate at which I/O was blocked because the kernel had to wait for a free bufstruct (called fsbuf in vmstat -v) was 2.51 per second. The peak rate was 41.73 per second from 00:40:00 to 01:00:00. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst when the kernel had to wait for bufstructs, then a problem may be that bufstructs could not be allocated quickly enough to meet the I/O load. A recommendation to increase the number of bufstructs was not made because a memory-poor environment was seen.

The above graph shows when the rate of I/O blocking was highest. If these times are the ones when performance was poor, it may be possible to improve performance by increasing the appropriate number of buffers.
The average context switch rate (cswch/s) was 5276.59 per second. The context switch rate (cswch/s) peaked at 11278.0 per second from 10:20:01 to 11:20:03. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst during the period of peak context switching, then a problem may be that too many processes were blocked for I/O or IPC.
There was no indication of swapped out processes in the ps -elf data. Processes which have been swapped out are usually found only on systems that have a very severe memory shortage.
The average number of page replacement cycles per second (cycle/s) was 0.015. The number of page replacement cycles per second peaked at 0.06 from 00:20:00 to 01:20:01. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst during the period of peak replacement cycle activity, then a shortage of physical memory may be the performance bottleneck.
The average number of kernel threads waiting to be paged in (swpq-sz) was 1.85. This number peaked at 2.5 from 02:20:06 to 03:20:03. When the peak was reached, the swap queue was occupied 95 percent of the time. A more useful statistic is sometimes available by multiplying the swpq-sz data by the percent of time the queue was occupied. In this case, the average was 0.83 and the peak was 2.38 from 02:20:06 to 03:20:03. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst when the number of kernel threads waiting to be paged in was at its peak, then a shortage of physical memory may be the performance bottleneck.
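The occupancy-weighted swap queue figure described above is a single multiplication. The sketch below reproduces the peak value from the two numbers quoted for that interval; the function name is hypothetical.

```python
def weighted_queue_length(swpq_sz: float, occupancy_percent: float) -> float:
    """Scale the swap queue length by the share of time the queue was occupied."""
    if not 0.0 <= occupancy_percent <= 100.0:
        raise ValueError("occupancy_percent must be between 0 and 100")
    return swpq_sz * occupancy_percent / 100.0


# Peak interval 02:20:06 to 03:20:03: swpq-sz of 2.5, queue occupied 95% of the time.
peak_weighted = weighted_queue_length(swpq_sz=2.5, occupancy_percent=95.0)
print(f"occupancy-weighted peak swap queue length: {peak_weighted:.3f}")
```

The result, 2.375, agrees with the reported peak of 2.38 after rounding.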
The following graph shows any significant statistics relating to page replacement cycle rate, number of kernel threads waiting to be paged in, and number of swapped processes.

The average page out rate to the paging spaces was 23.36 per second. The paging space page out rate peaked at 78.22 from 02:00:02 to 02:20:01. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst when the paging space page out rate was at its peak, then a shortage of physical memory may be the performance bottleneck. The following graph shows the rate of paging operations to the paging spaces.

The recorded setting for maxpin leaves 2457.60 megabytes of memory unpinnable. A memory-poor environment was seen and more than half of the system's memory was pinned. When most of the memory is pinned, less memory can be freed to meet the needs of processes running on the system.
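The unpinnable figure can be related back to the configured memory. The report does not state the maxpin setting itself, so the sketch below assumes the unpinnable amount is simply the share of configured memory left over after maxpin, and derives the implied percentage from the two numbers in this report.

```python
TOTAL_MEMORY_MB = 12288.0   # configured memory reported at the top of the analysis
UNPINNABLE_MB = 2457.60     # memory that maxpin leaves unpinnable, per the report

# Share of memory that can never be pinned, and the maxpin percentage it implies.
# Assumption: unpinnable = (100 - maxpin%) / 100 * total memory.
unpinnable_share = UNPINNABLE_MB / TOTAL_MEMORY_MB
implied_maxpin_percent = 100.0 * (1.0 - unpinnable_share)
max_pinnable_mb = TOTAL_MEMORY_MB - UNPINNABLE_MB

print(f"unpinnable share of memory: {unpinnable_share:.1%}")
print(f"implied maxpin setting:     {implied_maxpin_percent:.0f}%")
print(f"maximum pinnable memory:    {max_pinnable_mb:.1f} MB")
```

Under that assumption the figures correspond to a maxpin setting of 80 percent, which leaves roughly 9830 MB that may be pinned.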

A change to the value of maxperm has been recommended in order to bring it down toward the calculated target value of 16.0 percent. The target is based on the average value of numperm but it is too far from the recorded value of maxperm for the change to be reasonably implemented in one step. By avoiding very large changes, this program minimizes the problems seen in large, traumatic changes. Because this recommendation will not cause a change to the relationship between maxperm or minperm values and the average numperm value, performance improvement is not likely to be dramatic. The calculated average value of numperm was 20.6 percent.
A recommendation has been made to change the value of minperm to 15. This change will preserve the relationship between the minperm and maxperm parameters.
The following graph shows the relationship between numperm, minperm, and maxperm.

A change to the value of maxfree and/or minfree has been recommended based on formulas discussed at IBM's pSeries Technical University. The magnitude of these changes has been limited to prevent the recommendation of very large changes. Changing these parameters in smaller increments is a much safer way to tune the system. The maxpgahead value used was 8. The number of memory pools seen was 4. The value of lcpu reported by sar was 8.0. The number of active CPUs reported by sysconf is 1.
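The report does not spell out the formulas it used. One relationship that is consistent with the recommended values is maxfree = minfree + maxpgahead × lcpu; the sketch below treats that as an assumption inferred from the numbers, not as SarCheck's actual rule, and checks it against the recommendations.

```python
def maxfree_for(minfree: int, maxpgahead: int, lcpu: int) -> int:
    """Assumed relationship: keep maxfree above minfree by maxpgahead pages per logical CPU."""
    return minfree + maxpgahead * lcpu


MAXPGAHEAD = 8   # value quoted in the report
LCPU = 8         # lcpu reported by sar

current_minfree, current_maxfree = 960, 1088
recommended_minfree = 1920

recommended_maxfree = maxfree_for(recommended_minfree, MAXPGAHEAD, LCPU)
print(f"maxfree implied by minfree={recommended_minfree}: {recommended_maxfree}")

# The recommended minfree is exactly twice the current value, which fits the
# report's statement that the size of the change was limited.
print(f"minfree growth factor: {recommended_minfree / current_minfree:.1f}x")
```

With these inputs the assumed relationship yields 1984, the same maxfree value the report recommends.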
No I/O bottleneck was seen in the sar statistics, therefore no changes are recommended for maxpgahead.
The value of numclust is 1. If fast disk devices, disk arrays, or striped logical volumes are in use, the performance of disk writes could be improved by increasing this value. SarCheck does not have access to enough information about the system's disk devices to make any specific recommendation for tuning numclust.
The minimum multiprogramming level has been set to 2. This is a safe value for small configurations and may be low for larger configurations. This parameter is very dependent on workload and the correct value cannot be determined with sar and ps data. A memory shortage has been seen and a value which is too low may cause performance problems. More information can be found on page 23 of Frank Waters' "AIX Performance Tuning".
The average rate of System V semaphore calls (sema/s) was 0.001 per second. No problems have been seen, and no changes have been recommended for System V semaphore parameters. Note that SarCheck only checks these parameters' relationships to each other since semaphore usage data is not available.
No System V message activity (msg/s) was seen. No problems have been seen, and no changes have been recommended for System V message parameters. Note that SarCheck only checks these parameters' relationships to each other since message usage data is not available.
There were no times when enforcement of the process threshold limit (kproc-ov) prevented the creation of kernel processes. This indicates that no problems were seen in this area.
The ratio of exec to fork system calls was 1.00. This indicates that PATH variables are efficient.
The average system-wide local I/O rate as measured by the r+w/s column in the sar -d data was 370.53 per second. This I/O rate peaked at 1038 per second from 01:20:01 to 02:20:06.

The following graph shows the average percent busy and service time for up to 5 disks, not sorted with the -dbusy or -dserv switches. Only the first 5 disks to appear in the sar report appear in the graph and these may not be the ones that you want to see. A more useful graph can be created by using the -dbusy or -dserv switches.

Note: 45 disks were present. By default, the presence of more than 12 disks causes SarCheck to only report on the busiest disks. This is meant to control the verbosity of this report. To see all disks included in the report, use the -d option.
The -dtoo switch has been used to format disk statistics into the following table.

Disk Device Statistics

| Disk Device | Average Percent Busy | Peak Percent Busy | Queue Depth when occupied | Average Service Time (ms) |
|---|---|---|---|---|
| hdisk1 | 17.18 | 57.0 | 1.8 | 10.7 |
| hdisk0 | 26.12 | 74.0 | 1.8 | 9.6 |
| hdisk8 | 2.41 | 34.0 | 313.1 | 3.7 |
| hdisk22 | 3.24 | 24.0 | 0.3 | 3.6 |
| hdisk11 | 2.18 | 36.0 | 242.4 | 2.4 |
| hdisk23 | 4.29 | 22.0 | 0.3 | 2.3 |
| hdisk5 | 5.00 | 74.0 | 1293.2 | 12.2 |
| hdisk24 | 1.82 | 27.0 | 0.5 | 3.5 |
| hdisk7 | 2.35 | 32.0 | 350.9 | 5.7 |
| hdisk27 | 7.88 | 19.0 | 0.3 | 3.4 |
The disk device hdisk1 was busy an average of 17.18 percent of the time and had an average queue length of 1.8 (when occupied). This indicates that the device is not a performance bottleneck. During the peak interval from 01:20:01 to 02:20:06, the disk was 57.0 percent busy. Peak disk busy statistics can be used to help understand performance problems. If performance was worst when the disk was busiest, then a performance bottleneck may be that disk. The average service time reported for this device and its accompanying disk subsystem was 10.7 milliseconds. This is relatively fast. Service time is the delay between the time a request was sent to a device and the time that the device signaled completion of the request.
The disk device hdisk0 was busy an average of 26.12 percent of the time and had an average queue length of 1.8 (when occupied). This indicates that the device is not a performance bottleneck. During the peak interval from 01:20:01 to 02:20:06, the disk was 74.0 percent busy. The average service time reported for this device and its accompanying disk subsystem was 9.6 milliseconds. This is relatively fast.
The disk device hdisk8 was busy an average of 2.41 percent of the time and had an average queue length of 313.1 (when occupied). This usage pattern is typical of corrupted sar data. During the peak interval from 01:20:01 to 02:20:06, the disk was 34.0 percent busy. The average service time reported for this device and its accompanying disk subsystem was 3.7 milliseconds. This is indicative of a very fast disk or a disk controller with cache. Queue length on this device peaked at an unlikely 377.5. This data is surprising and may indicate a problem with the sar -d statistics.
The disk device hdisk22 was busy an average of 3.24 percent of the time and had an average queue length of 0.3 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 3.6 milliseconds. This is indicative of a very fast disk or a disk controller with cache.
The disk device hdisk11 was busy an average of 2.18 percent of the time and had an average queue length of 242.4 (when occupied). This usage pattern is typical of corrupted sar data. During the peak interval from 03:20:03 to 04:20:03, the disk was 36.0 percent busy. The average service time reported for this device and its accompanying disk subsystem was 2.4 milliseconds. This is indicative of a very fast disk or a disk controller with cache. Queue length on this device peaked at an unlikely 249.1. This data is surprising and may indicate a problem with the sar -d statistics.
The disk device hdisk23 was busy an average of 4.29 percent of the time and had an average queue length of 0.3 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 2.3 milliseconds. This is indicative of a very fast disk or a disk controller with cache.
The disk device hdisk5 was busy an average of 5.00 percent of the time and had an average queue length of 1293.2 (when occupied). This usage pattern is typical of corrupted sar data. During the peak interval from 00:20:00 to 01:20:01, the disk was 74.0 percent busy. The average service time reported for this device and its accompanying disk subsystem was 12.2 milliseconds. This service time is acceptable. Queue length on this device peaked at an unlikely 1485.4. This data is surprising and may indicate a problem with the sar -d statistics.
The disk device hdisk24 was busy an average of 1.82 percent of the time and had an average queue length of 0.5 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 3.5 milliseconds. This is indicative of a very fast disk or a disk controller with cache.
The disk device hdisk7 was busy an average of 2.35 percent of the time and had an average queue length of 350.9 (when occupied). This usage pattern is typical of corrupted sar data. During the peak interval from 01:20:01 to 02:20:06, the disk was 32.0 percent busy. The average service time reported for this device and its accompanying disk subsystem was 5.7 milliseconds. This is indicative of a very fast disk or a disk controller with cache. Queue length on this device peaked at an unlikely 438.2. This data is surprising and may indicate a problem with the sar -d statistics.
The disk device hdisk27 was busy an average of 7.88 percent of the time and had an average queue length of 0.3 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 3.4 milliseconds. This is indicative of a very fast disk or a disk controller with cache.
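Several of the devices above combine a low percent busy with a queue depth in the hundreds, which the report treats as a sign of corrupted sar data. A minimal sketch of that kind of screening follows, using the table values. The cutoff of 100 is a hypothetical threshold chosen for illustration; SarCheck's own rule is not documented in this report.

```python
from typing import NamedTuple


class DiskStat(NamedTuple):
    name: str
    avg_busy_pct: float
    peak_busy_pct: float
    queue_depth: float   # average queue depth when occupied
    service_ms: float    # average service time in milliseconds


# Values copied from the disk device statistics table.
DISKS = [
    DiskStat("hdisk1", 17.18, 57.0, 1.8, 10.7),
    DiskStat("hdisk0", 26.12, 74.0, 1.8, 9.6),
    DiskStat("hdisk8", 2.41, 34.0, 313.1, 3.7),
    DiskStat("hdisk22", 3.24, 24.0, 0.3, 3.6),
    DiskStat("hdisk11", 2.18, 36.0, 242.4, 2.4),
    DiskStat("hdisk23", 4.29, 22.0, 0.3, 2.3),
    DiskStat("hdisk5", 5.00, 74.0, 1293.2, 12.2),
    DiskStat("hdisk24", 1.82, 27.0, 0.5, 3.5),
    DiskStat("hdisk7", 2.35, 32.0, 350.9, 5.7),
    DiskStat("hdisk27", 7.88, 19.0, 0.3, 3.4),
]

SUSPECT_QUEUE_DEPTH = 100.0  # hypothetical cutoff, not taken from SarCheck


def suspect_disks(disks, cutoff=SUSPECT_QUEUE_DEPTH):
    """Return names of disks whose reported queue depth exceeds the cutoff."""
    return [d.name for d in disks if d.queue_depth > cutoff]


flagged = suspect_disks(DISKS)
print("disks with implausible queue depth:", ", ".join(flagged))
```

With this cutoff the screen picks out hdisk8, hdisk11, hdisk5, and hdisk7, the same four devices the report describes as showing a pattern typical of corrupted sar data.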
Data collected by ps -elf indicated that at 14:00:00 there was a peak of 972 processes present. This was the largest number of processes seen with ps -elf but it is not likely to be the absolute peak because the operating system does not store the true "high-water mark" for this statistic. There was an average of 968.1 processes present.

No runaway processes, memory leaks, or suspiciously large processes were detected in the data contained in file /opt/sarcheck/ps/20050622.
No table was generated because no unusual resource utilization was seen in the ps -elf data.
This section is designed to provide the user with a rudimentary linear capacity planning model and should be used for rough approximations only. These estimates assume that an increase in workload will affect the usage of all resources equally. These estimates should be used on days when the load is heaviest to determine approximately how much spare capacity remains at peak times.
Based on the limited data available in this single sar report, the system should be able to support a limited increase in workload at peak times before the first resource bottleneck affects performance. See the following paragraphs for additional information.

The CPU can support an increase in workload of approximately 65 percent at peak times. Since page outs and/or swapping were detected, an increase in workload should be accompanied by an increase in memory. The busiest disk can support a workload increase of approximately 1 percent at peak times. For more information on peak resource utilization, refer to the Resource Analysis section of this report.
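The linear model behind these estimates can be sketched as the ratio of a utilization ceiling to the observed peak. The ceilings used below (90 percent of entitled CPU capacity and 75 percent disk busy) are assumptions chosen because they reproduce the figures in this report; the thresholds SarCheck actually applies are not stated here and may be overridden in the sarcheck_parms file.

```python
def linear_headroom_percent(peak_utilization_pct: float, threshold_pct: float) -> float:
    """Percent workload growth possible before the peak reaches the threshold.

    Assumes utilization scales linearly with workload, as the capacity
    planning section itself warns.
    """
    if peak_utilization_pct <= 0:
        raise ValueError("peak_utilization_pct must be positive")
    return max(0.0, 100.0 * (threshold_pct / peak_utilization_pct - 1.0))


CPU_THRESHOLD_PCT = 90.0    # assumed ceiling for %entc
DISK_THRESHOLD_PCT = 75.0   # assumed ceiling for percent busy

cpu_headroom = linear_headroom_percent(54.5, CPU_THRESHOLD_PCT)    # peak %entc
disk_headroom = linear_headroom_percent(74.0, DISK_THRESHOLD_PCT)  # hdisk0 peak busy

print(f"CPU headroom:  {cpu_headroom:.1f}%")
print(f"disk headroom: {disk_headroom:.1f}%")
```

Under these assumed ceilings the sketch gives about 65.1 percent for CPU and 1.4 percent for the busiest disk, matching the remaining-capacity figures in the report's summary statistics.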
The default HSIZE value was changed in the sarcheck_parms file from 0.70 to 1.20 times the default gnuplot width.
Please note: In no event can Aptitune Corporation be held responsible for any damages, including incidental or consequent damages, in connection with or arising out of the use or inability to use this software. All trademarks belong to their respective owners.
This software provided for the exclusive use of: Your Company. This software expires on 08/19/2005 (mm/dd/yyyy). Code version: 6.02.01. Serial number: 39495969.
(c) copyright 1995-2005 by Aptitune Corporation, Plaistow NH, USA, All Rights Reserved. http://www.sarcheck.com
Statistics for system ux3005

| Statistic | Value | Start of peak interval | End of peak interval | Date of peak interval |
|---|---|---|---|---|
| System ID on sar report | 00C9F42E4C00 | | | |
| System ID of this system | 000481674C00 | | | |
| System model number is | IBM Model 7042/7043 (ED) | | | |
| Statistics collected on | 06/22/2005 | | | |
| Average phys processors consumed | 0.44 | | | |
| Peak phys processors consumed | 0.82 | 04:20:03 | 05:20:02 | 06/22/2005 |
| Average entitled capacity consumed | 29.64% | | | |
| Peak entitled capacity consumed | 54.5% | 04:20:03 | 05:20:02 | 06/22/2005 |
| Average CPU utilization | 22.5% | | | |
| Peak CPU utilization | 46% | 04:20:03 | 05:20:02 | 06/22/2005 |
| Average user CPU utilization | 7.1% | | | |
| Average sys CPU utilization | 15.4% | | | |
| Average waiting for I/O | 6.8% | | | |
| Average run queue length | 1.4 | | | |
| Peak run queue length | 1.6 | 04:20:03 | 05:20:02 | 06/22/2005 |
| Average run queue occupancy | 53.4% | | | |
| Average swap queue length | 0.83 | | | |
| Peak swap queue length | 2.4 | 02:20:06 | 03:20:03 | 06/22/2005 |
| Peak page replacement cycle rate | 0.06 | 00:20:00 | 01:20:01 | 06/22/2005 |
| Max paging space page outs | 78.22 | 02:00:02 | 02:20:01 | 06/22/2005 |
| Max paging space page ins | 99.74 | 02:00:02 | 02:20:01 | 06/22/2005 |
| Max swapped processes seen by ps | 0 | | | |
| Avg number of processes seen by ps | 968.1 | | | |
| Max number of processes seen by ps | 972 | 14:00:00 | | 06/22/2005 |
| Average numperm value | 20.58% | | | |
| Average context switch rate | 5276.59/sec | | | |
| Number of kproc overflows seen | 0 | | | |
| Disk device w/highest peak | hdisk0 | | | |
| Avg pct busy for that disk | 26.1% | | | |
| Peak pct busy for that disk | 74.0% | 01:20:01 | 02:20:06 | 06/22/2005 |
| Avg I/Os blocked for pbuf | 0.20/sec | | | |
| Peak I/Os blocked for pbuf | 6.69/sec | 01:00:00 | 01:20:01 | 06/22/2005 |
| Avg I/Os blocked for fsbuf | 2.51/sec | | | |
| Peak I/Os blocked for fsbuf | 41.73/sec | 00:40:00 | 01:00:00 | 06/22/2005 |
| Approx CPU capacity remaining | 65.1% | | | |
| Approx I/O bandwidth remaining | 1.4% | | | |
| Can memory support add'l load | Limited | | | |