SarCheck(TM): Automated Analysis of HP-UX sar and ps data

(English text version 6.00.03)


NOTE: This software is scheduled to expire on 01/11/2005 and has not yet been tied to your system's Machine ID. To permanently activate SarCheck, please run /usr/local/bin/analyze9000 -o and send the output to us so that we can generate an activation key for you.

This is an analysis of the data contained in the file /tmp/rpt. There were 2 days of data collected from 11/29/2004 to 11/30/2004, from the HP9000/785/C360 system 'hippie'. There were 200 data records used to produce this analysis. The operating system used to produce the sar report was HP-UX Release B.11.00. 1 processor is present. 64 megabytes of memory are present.

Data collected by the ps -elf command during 2 days between 11/29/2004 and 11/30/2004 will also be analyzed. This program will attempt to match the starting and ending times of the ps -elf data with those of the sar report file named /tmp/rpt.

SUMMARY

When the data was collected, no CPU bottleneck could be detected. A memory bottleneck was seen. No significant I/O bottleneck was seen. A change to at least one tunable parameter has been recommended.

Limits to future growth have been noted in the Capacity Planning section.

At least one possible runaway process has been detected. See the Resource Analysis section for details.

RECOMMENDATIONS SECTION

All recommendations contained in this report are based solely on the conditions which were present when the performance data was collected. It is possible that conditions which were not present at that time may cause some of these recommendations to result in worse performance. To minimize this risk, analyze data from several different days, implement only regularly occurring recommendations, and implement them one at a time.

Additional memory may improve performance. If possible, borrow some memory for test purposes, and monitor system performance and resource utilization before and after its installation.

Change 'bufpages' from 0 to 2457. SarCheck has determined that a fixed value should be a more efficient way of sizing the buffer cache, which has effectively been held at one size by a memory-poor condition.
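As a rough cross-check, the recommended 'bufpages' value can be converted from pages to megabytes. The sketch below assumes a 4 KB page size, which the report does not state directly but which is consistent with its own figure of 423 pages being 1.652 mb. The function name is illustrative only.

```python
# Assumption: 4 KB pages (consistent with "423 pages / 1.652 mb" in this report).
PAGE_SIZE_BYTES = 4096


def pages_to_megabytes(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a page count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return pages * page_size / (1024 * 1024)


# Recommended bufpages value from the report.
print(f"bufpages=2457 is about {pages_to_megabytes(2457):.2f} MB")
```

Under that assumption, 2457 pages is close to the 9.6 megabyte lower bound of the dynamic buffer cache reported in the Resource Analysis section.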

No disk recommendations have been made because no bottleneck was seen.

It may be possible to reduce memory utilization by reducing the parameter 'maxdsiz'. This parameter defines the maximum data segment size, and a smaller value will prevent users from taking up as much memory. The optimum value of this parameter is application dependent and experimentation is required.

Use the System Administration Manager (SAM) to change the values of tunable parameters. More information on the SAM utility and relinking the kernel is available in the System Administration Tasks manual.

RESOURCE ANALYSIS SECTION

Average CPU utilization was only 19.2 percent. This indicates that spare CPU capacity exists. If any performance problems were seen during the entire monitoring period, they were not caused by a lack of CPU power. User CPU as measured by the %usr column in the sar -u data averaged 18.62 percent and system CPU (%sys) averaged 0.56 percent. The sys/usr ratio averaged 0.03 : 1. CPU utilization peaked at 29 percent during multiple time intervals.
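The averages above can be checked with simple arithmetic. This is a minimal sketch using only the %usr and %sys figures quoted in this paragraph; the function and variable names are illustrative.

```python
def cpu_summary(usr_pct, sys_pct):
    """Return (total CPU utilization, sys/usr ratio) from sar -u averages."""
    total = usr_pct + sys_pct
    ratio = sys_pct / usr_pct if usr_pct else float("inf")
    return total, ratio


total, ratio = cpu_summary(usr_pct=18.62, sys_pct=0.56)
print(f"total CPU: {total:.1f}%  sys/usr ratio: {ratio:.2f} : 1")
```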

The CPU was waiting for I/O an average of 0.5 percent of the time. This statistic does not indicate the presence of an I/O bottleneck. The time that the system was waiting for I/O peaked at 20 percent from 09:00:00 to 09:10:01, on 11/30/2004.

Graph of CPU utilization

The CPU was idle (neither busy nor waiting for I/O) and had nothing to do an average of 80.3 percent of the time. If overall performance was good, this means that on average, the CPU was lightly loaded. If performance was generally unacceptable, the bottleneck may have been caused by remote file I/O which cannot be directly measured with sar and therefore cannot be considered by SarCheck.

The run queue had an average depth of 1.3 which indicates that processes were generally not bound by latent demand for CPU resources. The run queue was usually occupied, despite the lack of a significant run queue depth. This condition is usually seen when the number of CPU-intensive processes is low. It is likely that the performance of these processes is closely related to CPU speed.

The peak run queue occupancy seen was 100 percent from 11:30:00 to 17:00:01, on 11/29/2004. The following graph shows both the run queue length and occupancy. The occupancy is shown as %runocc/100, where a run queue occupied 100 percent of the time would be shown as a vertical line reaching a height of 1.0.

Graph of run queue length

The syncer daemon used 0.03 percent of the CPU from 07:30:01 to 17:00:01. The syncer is responsible for writing data from the buffer cache to disk. Its activity indicates that it is not so active as to cause a problem.

This system's buffer cache is dynamic, meaning that its size is determined by the amount of free memory on the system. Buffer cache data indicates that increasing the size of dbc_max_pct would probably not be effective because memory pressure would prevent the buffer cache from growing much beyond the value specified by dbc_min_pct. Based on the current values of dbc_min_pct and dbc_max_pct, the buffer cache can range in size from 9.6 to 25.6 megabytes of memory. The actual size of the dynamic buffer cache ranged from 9.6 to 9.9 megabytes of memory.
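The relationship between the two tunables and the reported size range can be sketched as follows. The report does not print the actual values of dbc_min_pct and dbc_max_pct; the percentages below are back-calculated from the 9.6 to 25.6 megabyte range and the 64 megabytes of memory, so treat them as inferred rather than read from the system.

```python
def implied_pct(size_mb, memory_mb):
    """Percentage of physical memory that a given cache size represents."""
    return 100 * size_mb / memory_mb


def dbc_range_mb(memory_mb, min_pct, max_pct):
    """Buffer cache size range in MB for given dbc_min_pct / dbc_max_pct."""
    return memory_mb * min_pct / 100, memory_mb * max_pct / 100


MEMORY_MB = 64  # from the report header

# Inferred (not reported) tunable values.
min_pct = implied_pct(9.6, MEMORY_MB)
max_pct = implied_pct(25.6, MEMORY_MB)
print(f"implied dbc_min_pct ~ {min_pct:.0f}, dbc_max_pct ~ {max_pct:.0f}")
print(dbc_range_mb(MEMORY_MB, 15, 40))
```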

The following graph shows that the actual size of the dynamic buffer cache did not stray far from the minimum allowed by the dbc_min_pct parameter. This is common on systems where the memory pressure prevents the buffer cache from growing. The average size of the dynamic buffer cache was 2465 pages.

Graph of dynamic buffer cache utilization

At least one indication of a memory shortage was seen in the following statistics: Data collected with ps -elf shows that the sched daemon used 67 seconds of CPU time. This indicates a memory shortage. Data collected with ps -elf shows that the vhand daemon used 54 seconds of CPU time. This indicates a possible memory shortage, which is confirmed by other statistics related to memory utilization. The swap out rate indicates that an intermittent memory bottleneck may have existed. This may result in inconsistent performance. The following graph illustrates the fact that the swap out rate peaked at 1.03 per second from 11:20:01 to 11:30:00, on 11/29/2004. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst during the period of peak swap out activity, then the performance bottleneck may be a memory shortage.

Graph of swap out rate

The minimum number of free pages of memory seen was 35. The value of lotsfree was 586 pages and the maximum value of gpgslim seen was 338 pages. The value of desfree was 146. If the minimum number of free pages drops below the value of desfree, the system should benefit from additional memory.
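The rule stated above (minimum free pages below desfree suggests the system would benefit from more memory) can be expressed as a small check. This is only a sketch of that one comparison using the figures in this paragraph; the function names are hypothetical and it is not a description of how SarCheck itself evaluates memory.

```python
def needs_more_memory(min_free_pages, desfree_pages):
    """True when the lowest observed free-page count fell below desfree."""
    return min_free_pages < desfree_pages


def shortfall_pages(min_free_pages, desfree_pages):
    """How many pages below desfree the system dropped (0 if it never did)."""
    return max(0, desfree_pages - min_free_pages)


MIN_FREE = 35   # minimum free pages seen
DESFREE = 146   # value of desfree

print(needs_more_memory(MIN_FREE, DESFREE), shortfall_pages(MIN_FREE, DESFREE))
```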

The following graph illustrates the fact that freemem dropped below the value of minfree, which is the threshold for deactivations to begin. Deactivations are a response to excessive memory pressure.

Graph of free memory

The fs_async flag is not set. This may result in reduced disk performance, but keeps filesystem data structures consistent in the event of a system crash. This option is currently in the state recommended for production systems. Since no disk I/O bottleneck was seen on this system, setting the fs_async flag would be unlikely to provide enough of an improvement to justify the additional risk.

The average context switching rate was 248.2 per second. This works out to an average of one context switch every 4.03 milliseconds. No recommendations have been made to the timeslice parameter because no problems were seen with the context switching rate.
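The interval between context switches is simply the reciprocal of the rate. A minimal sketch, with an illustrative function name:

```python
def ms_per_context_switch(switches_per_second):
    """Average time between context switches, in milliseconds."""
    if switches_per_second <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / switches_per_second


print(f"{ms_per_context_switch(248.2):.2f} ms")
```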

No unusual configurable parameter values were seen in those parameters which relate to the process accounting system. The current values of acctsuspend and acctresume are unlikely to have an impact on system performance.

The inode cache did not overflow, but was completely full in 79.5 percent of the samples collected during the monitoring period. No inode table recommendation has been made because a change to the size of the table would not be helpful.

Graph of inode cache usage

The process and open file tables were less than 80.0 percent full. Peak table usage statistics (max used/table size) as reported by sar: Process table: 90/276. Open file table: 431/920.
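The peak usage figures can be turned into percentages and compared against the 80.0 percent mark mentioned above. This sketch uses only the max used/table size pairs quoted in this paragraph; the helper name is illustrative.

```python
def table_usage_pct(max_used, table_size):
    """Peak usage of a kernel table as a percentage of its size."""
    return 100 * max_used / table_size


tables = {
    "process table": (90, 276),
    "open file table": (431, 920),
}

for name, (used, size) in tables.items():
    pct = table_usage_pct(used, size)
    status = "below" if pct < 80.0 else "at or above"
    print(f"{name}: {pct:.1f}% full ({status} 80.0%)")
```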

Graph of open file table usage

The file table, controlled by the nfile parameter, was much larger than necessary. There is nothing to gain by reducing the size of this table, so no change to the parameter 'nfile' is recommended.

The average rate of System V semaphore calls was 0.001 per second. No problems have been seen, and no changes have been recommended for System V semaphore parameters. Note that SarCheck only checks these parameters' relationships to each other since semaphore usage data is not available. Algorithms used by SarCheck to check these relationships are available in the help text of SAM.

No System V message activity was seen. No problems have been seen, and no changes have been recommended for System V message parameters. Note that SarCheck only checks these parameters' relationships to each other since message usage data is not available. Algorithms used by SarCheck to check these relationships are available in the help text of SAM, and in the file /usr/include/sys/msg.h.

The ratio of exec to fork system calls was 0.82. This indicates that PATH variables are efficient.

One volume group was seen and the maxvgs parameter was set to 10. This leaves plenty of room for growth and no changes to maxvgs have been recommended.

The -dtoo switch has been used to format volume group statistics into the following table.

Volume Group Statistics
VG Name     Current PVs   Active PVs   Current LVs   Open LVs   Total Size   Free Space
/dev/vg00   1             1            9             9          4.00 GB      1.99 GB

The volume group /dev/vg00 contained 1 physical volume and 9 logical volumes. All of the logical volumes were open. The size of the group was 4.00 gigabytes, of which 50.24 percent was allocated and 49.76 percent was free.
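The allocated and free percentages can be reproduced from the megabyte figures that pvdisplay reports for the group's only physical volume (2056 megabytes allocated and 2036 megabytes free, as quoted in the c0t6d0 discussion of this report). This is a sketch under the assumption that those two figures account for the whole group.

```python
def allocation_split(allocated_mb, free_mb):
    """Return (percent allocated, percent free) for a volume group."""
    total = allocated_mb + free_mb
    if total <= 0:
        raise ValueError("volume group size must be positive")
    return 100 * allocated_mb / total, 100 * free_mb / total


allocated_pct, free_pct = allocation_split(allocated_mb=2056, free_mb=2036)
print(f"allocated: {allocated_pct:.2f}%  free: {free_pct:.2f}%")
```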

The following graph shows the average percent busy and service time for 2 disks, not sorted with the -dbusy or -dserv switches. The lack of disk activity can be clearly seen in the %busy statistics for the disks.

The -dtoo switch has been used to format disk statistics into the following table.

Disk Device Statistics
Disk Device (vol group)   Average Percent Busy   Peak Percent Busy   Queue Depth When Occupied   Average Service Time (ms)
c0t6d0 (/dev/vg00)        0.70                   22.28               0.8                         17.0
c1t2d0                    0.00                   0.41                0.5                         1220.1

The disk device c0t6d0 was busy an average of 0.70 percent of the time and had an average queue depth of 0.8 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 17.0 milliseconds. This is somewhat slow for a modern disk drive, and the disappointing performance may be due to the disk or its controller. Service time is the delay between the time a request was sent to a device and the time that the device signaled completion of the request. The disk device c0t6d0 was reported by pvdisplay as being a 4.00 gigabyte disk. 2036 megabytes of space was reported as being free and 2056 megabytes have been allocated. This disk device was a part of volume group /dev/vg00 and contained 9 logical volumes. At least one logical volume occupied noncontiguous physical extents on the disk. The following paragraph will provide more details.

The logical volume /dev/vg00/lvol5 was located in more than one place on disk c0t6d0. If this logical volume is busy and it is not mirrored, performance will suffer because the disk's read/write heads are likely to travel back and forth in an inefficient manner. The gap between two places where the logical volume was located was 386 blocks in size. This was more than one third of the disk's total size and is a large gap. If /dev/vg00/lvol5 was an active logical volume, large gaps are likely to have been a contributing factor in the slow service time seen on disk volume c0t6d0.

The disk device c1t2d0 was busy an average of 0.00 percent of the time and had an average queue depth of 0.5 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 1220.1 milliseconds. This is so slow that the sar statistics may be unreliable or the device may be something other than a conventional hard disk. Floppy disk drives on model 800 systems will cause this message to be printed. Due to the suspiciously slow average service time, statistics from this device will not be used in capacity planning or in the comma-separated statistics.

At 12:00:00 on 11/30/2004 ps -elf data indicated that there were 91 processes present. This was the largest number of processes seen with ps -elf but it is not likely to be the absolute peak because the operating system does not store the true "high-water mark" for this statistic. There were an average of 74.3 processes present.

Graph of the number of processes present

The -ptoo switch has been used to format ps -elf data into the following table.

Interesting ps -elf data
Command          User   User ID   Percent CPU   Memory Growth   Memory Use
/usr/bin/X11/X   root   1569      30.64         -36.0 kb/hr     423 pages (1.652 mb)

CPU usage seen in /usr/bin/X11/X, owned by root, pid 1569. Between 09:20:00 and 17:00:01, 8458 seconds of CPU time were used. CPU utilization by this process averaged 30.64 percent during that interval.
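The average CPU figure for this process follows from dividing the CPU seconds used by the length of the interval. The sketch below assumes both timestamps fall on the same day, which the paragraph above does not state explicitly; the function name is illustrative.

```python
from datetime import datetime


def cpu_percent(cpu_seconds, start, end, fmt="%H:%M:%S"):
    """Average CPU utilization of a process over a same-day time interval."""
    elapsed = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    if elapsed <= 0:
        raise ValueError("end must be later than start on the same day")
    return 100 * cpu_seconds / elapsed


# 8458 CPU seconds between 09:20:00 and 17:00:01, as reported for /usr/bin/X11/X.
print(f"{cpu_percent(8458, '09:20:00', '17:00:01'):.2f}%")
```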

CAPACITY PLANNING SECTION

This section is designed to provide the user with a rudimentary linear capacity planning model and should be used for rough approximations only. These estimates assume that an increase in workload will affect the usage of all resources equally. These estimates should be used on days when the load is heaviest to determine approximately how much spare capacity remains at peak times.

Based on the limited data available in these sar reports, the system should be able to support a very limited increase in workload at peak times before the first resource bottleneck affects performance. See the following paragraphs for additional information.

Graph of remaining room for growth

The CPU can support an increase in workload of at least 100 percent at peak times. Since page outs and/or swapping were detected, an increase in workload should be accompanied by an increase in memory. The busiest disk can support a workload increase of at least 100 percent at peak times. For more information on peak CPU and disk utilization, refer to the Resource Analysis section of this report.

The process table, controlled by the parameter 'nproc', can support at least a 100 percent increase in the number of entries. The file table, controlled by the parameter 'nfile', can support approximately a 71 percent increase in the number of entries.
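The report does not say how these growth figures are derived. One calculation that happens to reproduce the 71 percent figure for 'nfile' is to ask how much peak usage could grow before the table reaches 80 percent full, the same threshold mentioned in the Resource Analysis section. The 80 percent ceiling here is an assumption, not a documented SarCheck algorithm, and the function name is hypothetical.

```python
def headroom_pct(peak_used, table_size, ceiling=0.80):
    """Percent growth in entries before usage reaches `ceiling` of the table.

    Assumption: growth is measured up to 80% full, not to 100% full.
    """
    if peak_used <= 0:
        raise ValueError("peak_used must be positive")
    return 100 * (ceiling * table_size / peak_used - 1)


for name, used, size in [("nproc", 90, 276), ("nfile", 431, 920)]:
    growth = headroom_pct(used, size)
    shown = "100%+" if growth >= 100 else f"{growth:.1f}%"
    print(f"{name}: {shown}")
```

With the peak usage figures from the Resource Analysis section (90/276 and 431/920), this yields more than 100 percent for the process table and about 70.8 percent for the file table.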

Please note: In no event can Aptitune Corporation be held responsible for any damages, including incidental or consequential damages, in connection with or arising out of the use or inability to use this software. All trademarks belong to their respective owners. This software is licensed for the exclusive use of: test. This software must be activated by 01/11/2005 (mm/dd/yyyy). SC9000 Code version: 6.00.03. Serial number: 58483828.

This software is updated frequently. For information on the latest version, contact the party from whom SarCheck was originally purchased, or visit our web site.

NOTE: This software appears to be unregistered. Please register with us by printing the registration form using 'analyze9000 -o', filling it out, and sending it to us via snail mail, fax, or email.

(c) copyright 1995-2004 by Aptitune Corporation, Plaistow NH, USA, All Rights Reserved. http://www.sarcheck.com

Statistics for system, hippie, Start of peak interval, End of peak interval, Date of peak interval
System model number is, 9000/785/C360
Statistics collected from, 11/29/2004
Statistics collected until, 11/30/2004
Average CPU utilization, 19.2%
Peak CPU utilization, 29%, Multiple peaks, Multiple peaks
Average user CPU utilization, 18.6%
Average sys CPU utilization, 0.6%
Average waiting for I/O, 0.5%
Average run queue depth, 1.3
Peak run queue depth, 1.9, Multiple peaks, Multiple peaks
Average swap queue occupancy, 0.0%
Average swap out rate, 0.87/sec
Average cache read hit ratio, 98.8%
Average cache write hit ratio, 59.0%
Disk device w/highest peak, c0t6d0
Avg pct busy for that disk, 0.70%
Peak pct busy for that disk, 22.28%, 09:00:00, 09:10:01, 11/30/2004
Avg number of processes seen by ps, 74.3
Max number of processes seen by ps, 91, 12:00:00, , 11/30/2004
Percent of process tbl used, 32.6%
Process table overflows, No
Percent of file table used, 46.8%
File table overflows, No
Inode cache pct of time full, 79.5%
Inode cache overflows, No
Approx CPU capacity remaining, 100%+
Approx I/O bandwidth remaining, 100%+
Remaining process tbl capacity, 100%+
Remaining file table capacity, 70.8%
Can memory support add'l load, No
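The comma-separated statistics above lend themselves to simple scripted processing, for example to track values across several days of reports. The following is a minimal sketch that takes the label before the first comma as the key and the first token after it as the value; the function names are hypothetical and peak-interval columns are ignored.

```python
def parse_stats(lines):
    """Map each statistic label to the first value that follows its comma."""
    stats = {}
    for line in lines:
        key, sep, rest = line.partition(",")
        if not sep:
            continue  # no comma: not a "label, value" line
        tokens = rest.replace(",", " ").split()
        if tokens:
            stats[key.strip()] = tokens[0]
    return stats


def as_percent(value):
    """Convert values such as '19.2%' or '100%+' to a float."""
    return float(value.rstrip("%+"))


sample = [
    "Average CPU utilization, 19.2%",
    "Peak CPU utilization, 29% Multiple peaks Multiple peaks",
    "Approx CPU capacity remaining, 100%+",
    "Can memory support add'l load, No",
]
stats = parse_stats(sample)
print(as_percent(stats["Average CPU utilization"]), stats["Can memory support add'l load"])
```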