SarCheck(TM): Automated Analysis of HP-UX sar and ps data

(English text version 5.00.00)


This is an analysis of the data contained in the file /tmp/rpt. The data was collected on 05/02/2002, from 08:00:00 to 17:00:00, from the HP9000/785/C360 system 'hippie'. There were 108 data records used to produce this analysis. The operating system used to produce the sar report was HP-UX Release B.11.00. 1 processor is present. 64 megabytes of memory are present. The creation of PNG graphs in directory /tmp has been requested.

Data collected by the ps -elf command on 05/02/2002 from 08:00:00 to 17:00:00, and stored in the file /usr/local/ps/20020502, will also be analyzed.

SUMMARY

When the data was collected, no CPU bottleneck could be detected. A memory bottleneck was seen. No significant I/O bottleneck was seen. A change to at least one tunable parameter has been recommended.

Some of the defaults used by SarCheck's rules have been overridden using the sarcheck_parms file. See the Custom Settings section of the report for more information.

RECOMMENDATIONS SECTION

All recommendations contained in this report are based solely on the conditions which were present when the performance data was collected. It is possible that conditions which were not present at that time may cause some of these recommendations to result in worse performance. To minimize this risk, analyze data from several different days, implement only regularly occurring recommendations, and implement them one at a time.

Additional memory may improve performance. If possible, borrow some memory for test purposes, and monitor system performance and resource utilization before and after its installation.

Change 'bufpages' from 0 to 3276. SarCheck has determined that this value should result in a more efficient use of the system's memory.

No disk recommendations have been made because no bottleneck was seen.

It may be possible to reduce memory utilization by reducing the parameter 'maxdsiz'. This parameter defines the maximum data segment size of a single process, so a smaller value limits how much memory any one process can allocate for its data. Processes that need more than the new limit will fail to allocate memory, so the optimum value is application dependent and experimentation is required.

Use the System Administration Manager (SAM) to change the values of tunable parameters. More information on the SAM utility and relinking the kernel is available in the System Administration Tasks manual.

RESOURCE ANALYSIS SECTION

Average CPU utilization was only 0.2 percent. This indicates that spare CPU capacity exists. If any performance problems were seen during the entire monitoring period, they were not caused by a lack of CPU power.

The CPU was waiting for I/O an average of 1.0 percent of the time. This statistic does not indicate the presence of an I/O bottleneck. The time that the system was waiting for I/O peaked at 20 percent from 08:10:01 to 08:15:02.

Graph of CPU utilization

The CPU was idle (neither busy nor waiting for I/O) and had nothing to do an average of 98.8 percent of the time. If overall performance was good, this means that on average, the CPU was lightly loaded. If performance was generally unacceptable, the bottleneck may have been caused by remote file I/O which cannot be directly measured with sar and therefore cannot be considered by SarCheck.

The run queue had an average depth of 1.0, which indicates that processes were generally not bound by latent demand for CPU resources.

The peak run queue occupancy was 100 percent, seen from 11:20:00 to 11:25:00. The following graph shows both run queue length and occupancy. Occupancy is plotted as %runocc/100, so a run queue that was occupied 100 percent of the time would be shown as a vertical line reaching a height of 1.0.

Graph of run queue length
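The two series on the run queue graph are related. Assuming the usual System V sar -q definitions, runq-sz is the average length of the run queue while it is occupied and %runocc is the percentage of the interval during which it was occupied, so their product is the time-averaged number of runnable processes. The sketch below is illustrative only: the helper names are hypothetical, and the inputs reuse the report's average depth of 1.0 and peak occupancy of 100 percent as example values even though they were not measured in the same interval.

```python
def plotted_occupancy(runocc_pct: float) -> float:
    """Convert sar -q %runocc (0-100) to the 0.0-1.0 scale used on the graph."""
    if not 0.0 <= runocc_pct <= 100.0:
        raise ValueError("%runocc must be between 0 and 100")
    return runocc_pct / 100.0


def time_averaged_run_queue(runq_sz: float, runocc_pct: float) -> float:
    """Average number of runnable processes over the whole interval.

    Assumption: runq-sz is measured only while the queue is occupied,
    so it is scaled by the fraction of time the queue was occupied.
    """
    return runq_sz * plotted_occupancy(runocc_pct)


# Example inputs only (average depth 1.0, peak occupancy 100 percent).
print(plotted_occupancy(100.0))             # 1.0
print(time_averaged_run_queue(1.0, 100.0))  # 1.0
# Hypothetical interval with 25 percent occupancy.
print(time_averaged_run_queue(1.0, 25.0))   # 0.25
```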

The syncer daemon used 0.009 percent of the CPU from 08:00:00 to 17:00:00. The syncer is responsible for writing data from the buffer cache to disk. Its activity was not high enough to cause a problem.

This system's buffer cache is dynamic, meaning that its size is determined by the amount of free memory on the system. Buffer cache data indicates that increasing the size of dbc_max_pct would probably not be effective because memory pressure would prevent the buffer cache from growing much beyond the value specified by dbc_min_pct. Based on the current values of dbc_min_pct and dbc_max_pct, the buffer cache can range in size from 12.8 to 25.6 megabytes of memory.
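The 12.8 to 25.6 megabyte range is simply a percentage of the 64 megabytes of physical memory. The following minimal sketch shows that arithmetic, assuming dbc_min_pct=20 and dbc_max_pct=40 (values inferred from the reported range, not read from the kernel) and a 4 KB page size. The same page conversion shows that the recommended bufpages value of 3276 pages corresponds to roughly the dbc_min_pct size. The function names are hypothetical.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KB memory pages


def buffer_cache_range_mb(memory_mb: float, dbc_min_pct: float, dbc_max_pct: float) -> tuple[float, float]:
    """Minimum and maximum dynamic buffer cache size, in megabytes."""
    return memory_mb * dbc_min_pct / 100.0, memory_mb * dbc_max_pct / 100.0


def pages_to_mb(pages: int, page_size_kb: int = PAGE_SIZE_KB) -> float:
    """Convert a page count (such as bufpages) to megabytes."""
    return pages * page_size_kb / 1024.0


low, high = buffer_cache_range_mb(64, 20, 40)  # inferred percentages
print(f"buffer cache range: {low:.1f} to {high:.1f} MB")  # 12.8 to 25.6 MB
print(f"bufpages=3276 -> {pages_to_mb(3276):.1f} MB")      # 12.8 MB
```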

A graph of buffer cache utilization was not generated because the size of the buffer cache stayed close to the value of dbc_min_pct, so the graph would only show a flat line. This typically means that memory pressure was preventing the dynamic buffer cache from growing.

At least one indication of a memory shortage was seen in the following statistics. Data collected with ps -elf shows that the sched daemon used 5 seconds of CPU time, which indicates a possible memory shortage. The same data shows that the vhand daemon used 10 seconds of CPU time, which also indicates a possible memory shortage; this is confirmed by other statistics related to memory utilization. The swap out rate peaked at 1.07 per second from 10:40:00 to 10:45:01. Peak resource utilization statistics can help explain performance problems: if performance was worst during the period of peak swap out activity, the bottleneck was likely a memory shortage.

Graph of swap out rate

The minimum number of free pages of memory seen was 79. The value of lotsfree was 535 pages, the maximum value of gpgslim seen was 308 pages, and the value of desfree was 133 pages. If the minimum number of free pages drops below the value of desfree, the system should benefit from additional memory.

The following graph shows that freemem occasionally dipped below the value of desfree, indicating that at times the system could benefit from more memory. The fact that gpgslim was not constant is a further indication of this.

Graph of free memory
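The desfree rule above can be applied mechanically to each freemem sample. The following minimal sketch uses the threshold values reported here (all in pages). The desfree test is the rule stated in this report; the gpgslim test assumes the usual HP-UX behaviour in which vhand starts paging when free memory falls below gpgslim. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PagingThresholds:
    """HP-UX paging thresholds in pages (values taken from this report)."""
    desfree: int = 133
    gpgslim: int = 308   # maximum value seen; gpgslim floats over time
    lotsfree: int = 535


def classify_freemem(freemem_pages: int, t: PagingThresholds) -> str:
    # Rule used in this report: below desfree means more memory should help.
    if freemem_pages < t.desfree:
        return "below desfree"
    # Assumption: vhand pages out while free memory is below gpgslim.
    if freemem_pages < t.gpgslim:
        return "below gpgslim"
    return "above gpgslim"


thresholds = PagingThresholds()
print(classify_freemem(79, thresholds))  # below desfree (minimum seen in this report)
```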

The fs_async flag is not set. This may result in reduced disk performance, but keeps filesystem data structures consistent in the event of a system crash. This option is currently in the state recommended for production systems. Since no disk I/O bottleneck was seen on this system, setting the fs_async flag would be unlikely to provide enough of an improvement to justify the additional risk.

No unusual configurable parameter values were seen in those parameters which relate to the process accounting system. The current values of acctsuspend and acctresume are unlikely to have an impact on system performance.

The inode cache did not overflow, and was completely full in only 0.9 percent of the samples (1 of the 108) collected during the monitoring period. On UNIX operating systems such as HP-UX, which use the inode table as a cache, this indicates that the inode cache may actually be somewhat larger than necessary.

Graph of inode cache usage

The process and open file tables were less than 80.0 percent full. Peak table usage statistics (max used/table size) as reported by sar: Process table: 72/276. Open file table: 290/920.
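The percentages in the statistics summary follow directly from these max used/table size pairs. The following minimal sketch reproduces them, applies the 80 percent check stated above, and also tests the "can hold at least twice as many entries" statement from the capacity planning section (true when a table is at most half full). The helper name is hypothetical.

```python
def table_usage_pct(used: int, size: int) -> float:
    """Peak table usage as a percentage of table size."""
    if size <= 0:
        raise ValueError("table size must be positive")
    return 100.0 * used / size


PEAK_USAGE = {"Process table": (72, 276), "Open file table": (290, 920)}
WARN_PCT = 80.0  # threshold quoted in the report text

for name, (used, size) in PEAK_USAGE.items():
    pct = table_usage_pct(used, size)
    status = "OK" if pct < WARN_PCT else "near capacity"
    can_double = 2 * used <= size  # room for twice the peak entries
    print(f"{name}: {pct:.1f}% used, {status}, can hold 2x: {can_double}")
# Process table: 26.1% used, OK, can hold 2x: True
# Open file table: 31.5% used, OK, can hold 2x: True
```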

The file table, controlled by the nfile parameter, was much larger than necessary. There is nothing to gain by reducing the size of this table, so no change to the parameter 'nfile' is recommended.

No System V semaphore activity was seen. No problems have been seen, and no changes have been recommended for System V semaphore parameters. Note that SarCheck only checks these parameters' relationships to each other, since semaphore usage data is not available. Algorithms used by SarCheck to check these relationships are available in the help text of SAM.

No System V message activity was seen. No problems have been seen, and no changes have been recommended for System V message parameters. Note that SarCheck only checks these parameters' relationships to each other, since message usage data is not available. Algorithms used by SarCheck to check these relationships are available in the help text of SAM, and in the file /usr/include/sys/msg.h.

The ratio of exec to fork system calls was 0.97. This indicates that PATH variables are efficient.

The following graph shows the average percent busy and service time for the single disk present. The disks were not sorted with the -dbusy or -dserv switches. The lack of disk activity can be clearly seen in the disk's %busy statistics.

The -dtoo switch has been used to format disk statistics into the following table.

Disk Device Statistics
Disk Device   Average Percent Busy   Peak Percent Busy   Queue Depth When Occupied   Average Service Time (ms)
c0t6d0        1.31                   22.1                3.2                         16.9

The disk device c0t6d0 was busy an average of 1.31 percent of the time and had an average queue depth of 3.2 when occupied. This usage pattern is typical of sync activity, meaning the syncer's transfers of data from the system buffer cache to disk. The average service time reported for this device and its accompanying disk subsystem was 16.9 milliseconds. Service time is the delay between the time a request is sent to a device and the time the device signals completion of the request. A service time of 16.9 milliseconds is somewhat slow for a modern disk drive, and the disappointing performance may be due to the disk or its controller.
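The busy percentage and service time together imply a rough request rate. The following minimal sketch uses the Utilization Law (utilization = throughput × service time), assuming sar's %busy and service time are measured consistently over the same intervals. This is a back-of-the-envelope check, not a figure reported by SarCheck, and the function name is hypothetical.

```python
def implied_io_rate(busy_pct: float, service_time_ms: float) -> float:
    """Estimate completed I/O requests per second with the Utilization Law.

    U = X * S, so X = U / S, with U as a fraction and S in seconds.
    """
    if service_time_ms <= 0:
        raise ValueError("service time must be positive")
    utilization = busy_pct / 100.0
    service_time_s = service_time_ms / 1000.0
    return utilization / service_time_s


# Average values for c0t6d0 from the disk statistics table.
print(f"{implied_io_rate(1.31, 16.9):.2f} requests/sec")  # 0.78 requests/sec

# Peak interval, assuming (hypothetically) service time stayed at 16.9 ms.
print(f"{implied_io_rate(22.1, 16.9):.1f} requests/sec")  # 13.1 requests/sec
```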

At 08:15:02, ps -elf data indicated that there were 69 processes present. This was the largest number of processes seen with ps -elf, but it is probably not the absolute peak, because the operating system does not store the true "high-water mark" for this statistic.

Graph of the number of processes present

No runaway processes, memory leaks, or suspiciously large processes were detected in the data contained in file /usr/local/ps/20020502. No table was generated because no unusual resource utilization was seen in the ps -elf data.

CAPACITY PLANNING SECTION

This section provides a rudimentary linear capacity planning model and should be used for rough approximations only. The estimates assume that an increase in workload will affect the usage of all resources equally. They are most useful when based on data from the days with the heaviest load, to determine approximately how much spare capacity remains at peak times.

WARNING: Data in this section may be inaccurate because the length of the average sampling interval was only 5.00 minutes. When the interval is less than 10 minutes, peak statistics are likely to underestimate the remaining amount of CPU or disk capacity.

Based on the limited data available in this single sar report, the system should be able to support a very limited increase in workload at peak times before the first resource bottleneck affects performance. See the following paragraphs for additional information.

Graph of remaining room for growth

The CPU can support an increase in workload of at least 100 percent at peak times. Since page outs and/or swapping were detected, an increase in workload should be accompanied by an increase in memory. The busiest disk can support a workload increase of at least 100 percent at peak times. For more information on peak CPU and disk utilization, refer to the Resource Analysis section of this report.
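The linear model described at the start of this section reduces to scaling peak utilization up to a threshold. The following minimal sketch assumes that the 88 percent MAXCPU threshold from the Custom Settings section applies to the CPU estimate, uses the 4 percent peak CPU utilization from the statistics summary, and reports results at or above 100 percent as "at least 100 percent", as the text above does. SarCheck's exact formula is not given in this report, so treat this as an approximation; the function names are hypothetical.

```python
def growth_headroom_pct(peak_util_pct: float, threshold_pct: float) -> float:
    """Percent workload increase before peak utilization reaches the threshold,
    assuming utilization grows linearly with workload."""
    if peak_util_pct <= 0:
        raise ValueError("peak utilization must be positive")
    return (threshold_pct / peak_util_pct - 1.0) * 100.0


def describe(headroom_pct: float, cap_pct: float = 100.0) -> str:
    """Report-style wording; negative headroom (already over threshold) shows as 0."""
    if headroom_pct >= cap_pct:
        return f"at least {cap_pct:.0f} percent"
    return f"{max(headroom_pct, 0.0):.0f} percent"


cpu = growth_headroom_pct(4.0, 88.0)  # peak CPU 4%, MAXCPU 88%
print(f"{cpu:.0f}")   # 2100
print(describe(cpu))  # at least 100 percent
```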

All system tables measured by sar -v can hold at least twice as many entries as were seen.

CUSTOM SETTINGS SECTION

The default MAXCPU threshold was changed in the sarcheck_parms file from 95.0 to 88.0 percent.

The gnuplot graph directory specified in the sarcheck_parms file with the GRAPHDIR keyword was /tmp.

ERROR: An attempt to change the AVGRQ threshold failed because the new value was out of bounds. An attempt to change the MAXRQ threshold failed because the new value was out of bounds.

Please note: In no event can Aptitune Corporation be held responsible for any damages, including incidental or consequential damages, in connection with or arising out of the use or inability to use this software. All trademarks belong to their respective owners. Evaluation copy for: Your Company. This software expires on 06/05/2002 (mm/dd/yyyy). SC9000 Code version: 5.00.00. Serial number: 00055555.

Thank you for trying this evaluation copy of SarCheck. To order a licensed version of this software, just type 'analyze9000 -o' at the prompt to produce the order form, and follow the instructions.

(c) copyright 1995-2002 by Aptitune Corporation, Plaistow NH, USA, All Rights Reserved. http://www.sarcheck.com

Statistics for system hippie
Statistic                           Value          Start of peak interval  End of peak interval  Date of peak interval
System model number                 9000/785/C360
Statistics collected on             05/02/2002
Average CPU utilization             0.2%
Peak CPU utilization                4%             Multiple peaks          Multiple peaks
Average user CPU utilization        0.2%
Average sys CPU utilization         0.0%
Average waiting for I/O             1.0%
Average run queue depth             1.0
Peak run queue depth                1.3            10:35:01                10:40:00              05/02/2002
Average swap queue occupancy        0.1%
Average swap out rate               0.44/sec
Average cache read hit ratio        98.5%
Average cache write hit ratio       64.7%
Disk device w/highest peak          c0t6d0
Avg pct busy for that disk          1.31%
Peak pct busy for that disk         22.13%         08:10:01                08:15:02              05/02/2002
Max number of processes seen by ps  69             08:15:02                                      05/02/2002
Percent of process tbl used         26.1%
Process table overflows             No
Percent of file table used          31.5%
File table overflows                No
Inode cache pct of time full        0.9%
Inode cache overflows               No
Approx CPU capacity remaining       100%+
Approx I/O bandwidth remaining      100%+
Remaining process tbl capacity      100%+
Remaining file table capacity       100%+
Can memory support add'l load       No