SarCheck(TM): Automated Analysis of SCO OpenServer sar and ps data

(English text version 3.50)


This is an analysis of the data contained in the file sar10. The data was collected on 01/10/1997, from 08:00:01 to 16:00:01, from system 'scosysv'. There were 7 sar data records used to produce this analysis. Operating system is SCO OpenServer Release 3.2v5.0.0. 1 processor is present. 16 megabytes of memory are present.

SUMMARY

When the data was collected, no CPU bottleneck could be detected. At least one disk drive was busy enough to cause performance degradation. A change has been recommended to at least one tunable parameter. Limits to future growth have been noted in the Capacity Planning section.

RECOMMENDATIONS SECTION

All recommendations contained in this report are based solely on the conditions which were present when the performance data was collected. It is possible that conditions which were not present at that time may cause some of these recommendations to result in worse performance. To minimize this risk, analyze data from several different days, implement only regularly occurring recommendations, and implement them one at a time.

A CPU upgrade is not recommended because the current CPU had significant unused capacity.

Consider balancing the load on disk devices by moving some of the I/O from Sdsk-0 to Sdsk-1, which was only 0.9 percent busy. Please note that available disk space statistics are not present in the sar -d report, and therefore have not been considered in these recommendations.

Increase NBUF from 2800 to 3360. This change will use an additional 613760 bytes of memory. The parameter NBUF can be changed by running the configure(ADM) utility and going to category 1.

Analysis of the freemem data indicates that as an alternative buffer enlargement strategy, the value of NBUF can be safely raised as high as 6970 without creating a memory-poor environment. This change will use an additional 4270080 bytes of memory. We recommend that you avoid making huge, sudden increases in NBUF. In some cases, large increases may create a kernel which will not boot successfully, and could lock up when initializing kernel buffers.

The HT namei cache hit ratio was only 87.2 percent. Change the value of HTCACHEENTS from 665 to 731. This change will use an additional 2904 bytes of memory. The parameter HTCACHEENTS can be changed by running the configure(ADM) utility and going to category 4.

Change the value of NMPBUF from 36 to 52. This parameter is used to set the number of 4KB pages of memory reserved for scatter-gather, transfer, and copy request buffers. This change will use an additional 65536 bytes of memory. The parameter NMPBUF can be changed by running the configure(ADM) utility and going to category 1.

The implementation of all recommended changes will use an additional 682200 bytes of memory. This total does not include the more aggressive alternate NBUF sizing recommendation. We do not recommend implementing all of these changes at once.
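
The memory figures quoted in the recommendations above can be cross-checked with a few lines of arithmetic. The following Python sketch is not part of SarCheck; it only re-adds the numbers printed in this report. The per-unit costs it derives are inferred from those numbers and are not taken from SCO documentation.

```python
# Old value, new value, and additional bytes, copied from the recommendations above.
changes = {
    "NBUF": (2800, 3360, 613760),
    "HTCACHEENTS": (665, 731, 2904),
    "NMPBUF": (36, 52, 65536),
}

# Bytes per added unit, inferred from the report's own figures (an assumption,
# not a documented kernel constant).
per_unit = {
    name: extra // (new - old) for name, (old, new, extra) in changes.items()
}

# Sum of the primary recommendations; the alternate NBUF sizing is excluded,
# as it is in the report's total.
total_extra = sum(extra for _, _, extra in changes.values())

for name, (old, new, extra) in changes.items():
    print(f"{name}: +{new - old} units, {extra} bytes, {per_unit[name]} bytes each")
print(f"Total: {total_extra} bytes")
```

The total agrees with the 682200 bytes stated above, and the NMPBUF cost per unit agrees with the 4KB page size mentioned in the NMPBUF recommendation.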

More information on the configure utility is available in the OpenServer Performance Guide. After you use configure to change parameters, relink the kernel and reboot the system to implement the changes.

RESOURCE ANALYSIS SECTION

Average CPU utilization was only 22.4 percent. This indicates that spare capacity exists within the CPU. If any performance problems were seen during the monitoring period, they were not caused by a lack of CPU power. CPU utilization peaked at 30 percent from 08:00:01 to 08:20:01.

The run queue had an average depth of 1.1. This indicates that there was not likely to be a performance problem caused by processes waiting for the CPU.

The CPU was idle (neither busy nor waiting for I/O) and apparently had nothing to do an average of 75.3 percent of the time. If overall performance was good, this means that on average, the CPU was lightly loaded. If performance was generally unacceptable, the bottleneck may have been caused by remote file I/O which cannot be directly measured with sar and cannot be considered by SarCheck.

The average cache hit ratio of logical reads was only 73.9 percent. It may be possible to improve performance and reduce disk I/O by increasing NBUF, the number of buffers.
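
For readers who want to reproduce this statistic from raw sar -b data, the read cache hit ratio is conventionally computed as (1 - bread/lread) * 100, where bread is the rate of physical block reads and lread is the rate of logical reads. The Python sketch below assumes that definition. The input rates are hypothetical and were chosen only to reproduce the 73.9 percent figure; they do not come from the sar10 file.

```python
def read_cache_hit_ratio(bread: float, lread: float) -> float:
    """Percentage of logical reads satisfied from the buffer cache."""
    if lread <= 0:
        raise ValueError("lread must be positive")
    return (1.0 - bread / lread) * 100.0


# Hypothetical rates (not taken from the report's data).
example = read_cache_hit_ratio(bread=26.1, lread=100.0)
print(round(example, 1))
```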

In the event of a system crash, an average of 20 seconds worth of data will be lost because it will not have been written to disk. This is controlled by the NAUTOUP and BDFLUSHR parameters. This statistic has been calculated using the formula: NAUTOUP + (BDFLUSHR / 2).
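
The formula above can be read as follows: a modified buffer must age NAUTOUP seconds before it is eligible to be written, and it then waits, on average, half of the BDFLUSHR interval for the next flush pass. The Python sketch below applies the formula as stated. The parameter values in the example are hypothetical; this report does not list the actual NAUTOUP and BDFLUSHR settings on 'scosysv'.

```python
def avg_seconds_at_risk(nautoup: int, bdflushr: int) -> float:
    """Average seconds of unwritten data, using NAUTOUP + (BDFLUSHR / 2)."""
    return nautoup + bdflushr / 2


# Hypothetical settings that happen to yield 20 seconds.
print(avg_seconds_at_risk(nautoup=10, bdflushr=20))
```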

The ratio of exec to fork system calls was 1.03. This indicates that PATH variables are efficient.

The HT namei cache hit ratio was only 87.2 percent. HTCACHEENTS should be increased until the percentage of hits averages 90 percent or above. Note that namei caching is only performed when directory names and filenames are 14 characters or less in length. For best performance, pathname components should be less than 15 characters long.
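
One way to act on the 14 character limit is to look for pathname components that exceed it. The Python sketch below is only an illustration; the function name and the sample path are hypothetical and are not part of SarCheck or SCO OpenServer.

```python
from typing import List

NAMEI_CACHE_MAX_LEN = 14  # limit stated in this report


def uncached_components(path: str) -> List[str]:
    """Return the components of path that are too long to be namei-cached."""
    return [part for part in path.split("/") if len(part) > NAMEI_CACHE_MAX_LEN]


# Hypothetical path containing one over-long directory name.
print(uncached_components("/usr/local/accounting_reports_1997/jan.dat"))
```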

No DTFS filesystem activity was seen during the monitoring period.

No evidence of a memory shortage was seen in the following statistics: The swap queue was occupied an average of 0 percent of the time. The average rate at which the page stealer and/or swapper daemons have reclaimed pages of memory was 0.7 per second. The average swap out transfer request rate was 0.0 per second.

Some of the swap area was used during the monitoring period. Together with the information in the previous paragraph, this indicates that the system is neither memory-rich nor memory-poor.

The average number of free pages reported by sar was significantly higher than the value of GPGSHI, confirming that the system is not chronically short of memory.

Non-zero values were seen in the ompb/s column of the sar -h report, indicating that scatter-gather buffers were not available and processes may have been temporarily put to sleep. The recommended change to NMPBUF is designed to fix this problem.

The value of MAXUP is 100 and the maximum grown size of the process table as reported by sar was 71. There is no reason to change the value of MAXUP or MAX_PROC based on this data. It is all right for the value of MAXUP to be greater than the grown size of the process table because the table will grow dynamically.

The device Sdsk-0 was busy an average of 68.9 percent of the time and had an average queue depth of 2.0 (when occupied). This indicates that the device was likely to be a performance bottleneck. During the peak interval from 08:00:01 to 08:20:01, the disk was 84.8 percent busy. Peak disk busy statistics can be used to help understand performance problems. If performance was worst when the disk was busiest, then a performance bottleneck may be that disk. The average service time reported for this device and its accompanying disk subsystem was 16.2 milliseconds. This is somewhat slow for a modern disk drive, and the disappointing performance may be due to the disk, the location of multiple filesystems on the disk, or the disk controller. Service time is the delay between the time a request was sent to a device and the time that the device signaled completion of the request.

The device Sdsk-1 was busy an average of 0.9 percent of the time and had an average queue depth of 2.0 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 12.9 milliseconds. This service time is good.
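
The busy percentages and service times above can be combined to estimate how many requests per second each device was handling. The Python sketch below assumes the utilization law (utilization = request rate * service time) and assumes that the reported service time applies to each request. The resulting rates are implied values only; they are not printed in this report and should be compared against the r+w/s column of sar -d.

```python
def implied_request_rate(busy_percent: float, service_ms: float) -> float:
    """Requests per second implied by utilization = rate * service time."""
    if service_ms <= 0:
        raise ValueError("service_ms must be positive")
    return (busy_percent / 100.0) / (service_ms / 1000.0)


# Average busy percent and service time as reported above.
sdsk0 = implied_request_rate(68.9, 16.2)
sdsk1 = implied_request_rate(0.9, 12.9)
print(round(sdsk0, 1), round(sdsk1, 1))
```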

More information on performance analysis and tuning can be found in the SCO OpenServer(TM) Performance Guide, which is part of the SCO OpenServer Release 5 documentation set. The SarCheck reference guide contains a bibliography of relevant performance documentation.

CAPACITY PLANNING SECTION

This section is designed to provide the user with a rudimentary linear capacity planning model and should be used for rough approximations only. These estimates assume that an increase in workload will affect the usage of all resources equally. These estimates should be used on days when the load is heaviest to determine approximately how much spare capacity remains at peak times.

Based on the limited data available in this single sar report, the system cannot support an increase in workload at peak times without some loss of performance or reliability, and the bottleneck is likely to be disk I/O. Implementation of some of the suggestions in the recommendations section may help to increase the system's capacity.

The CPU can support an increase in workload of at least 100 percent at peak times. Because some swap space was used and significant paging or swapping statistics were not seen, the amount of memory present can probably handle a moderate increase in workload. The busiest disk can support a workload increase of approximately 0 percent at peak times. For more information on peak CPU and disk utilization, refer to the Resource Analysis section of this report.
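
A linear model of this kind can be sketched as follows: if utilization grows in proportion to workload, the remaining headroom is the ratio of a utilization ceiling to the peak utilization, minus one. The Python sketch below is not SarCheck's algorithm. The 85 percent ceiling is a hypothetical value chosen for illustration; the thresholds actually used to produce the estimates above are not stated in this report.

```python
def headroom_percent(peak_util: float, ceiling: float = 85.0) -> float:
    """Workload growth (percent) before peak_util reaches ceiling, linear model."""
    if peak_util <= 0:
        raise ValueError("peak_util must be positive")
    return max(0.0, (ceiling / peak_util - 1.0) * 100.0)


# Peak utilization figures from the Resource Analysis section.
cpu = headroom_percent(30.0)
disk = headroom_percent(84.8)
print(round(cpu), round(disk))
```

Under that assumed ceiling, the sketch is consistent with the estimates above: more than 100 percent headroom for the CPU and approximately 0 percent for the busiest disk.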

Please note: In no event can Aurora Software Inc. be held responsible for any damages, including incidental or consequential damages, in connection with or arising out of the use or inability to use this software. All trademarks belong to their respective owners. Evaluation copy for: Your Company. This software expires on 02/28/1997 (mm/dd/yyyy). Code version: 3.50. Serial number: 00028784.

Thank you for trying this evaluation copy of SarCheck. To order a licensed version of this software, just type 'analyze -o' at the prompt to produce the order form, and follow the instructions.

(c) copyright 1994-1996 by Aurora Software Inc., Plaistow NH, USA, All Rights Reserved.