The SarCheck utility analyzes your system for possible performance bottlenecks such as memory shortages and 'leaks', disk load imbalances, CPU bottlenecks, runaway processes, and improperly set tunable parameters. It also determines approximately how much additional workload the system can support at peak times of the day. It does this by analyzing a user-specified sar report and the output of ps -elf, and by scanning various files and kernel structures. It then produces a plain English report which explains the resource bottlenecks seen and makes recommendations which can improve system performance.
Important: SarCheck's recommendations are designed to produce incremental improvements, so SarCheck works best when it's run regularly. Unlike some other performance tools, no attempt is made to guess the ultimately correct value for any parameter. That just doesn't work. Instead, SarCheck will recommend that you increase or decrease values based on the data available, and will continue to recommend changes until there is no more room for improvement. According to the SCO OpenServer Performance Guide and most experts, performance tuning is, by definition, an iterative process involving trial and error. SarCheck will not only help you to make those iterative changes, but will also explain the reasons for each recommendation.
Another difference between SarCheck and other tuning tools is that SarCheck will not recommend changes unless it's trying to fix a specific problem. Other tools will recommend small irrelevant changes to tunable parameters regardless of whether any true benefit will be provided. This is a numbers game that we don't play. Some customers are impressed by demo software that recommends a lot of changes. Our philosophy is that you don't want to fix problems that don't exist, you don't want to reboot your system any more than you have to, and you don't want a tuning tool to recommend anything unnecessary. If you disagree with this, please feel free to call and enlighten us.
The final philosophical difference between SarCheck and other performance tools is that SarCheck does not monitor system activity. It leaves that job to the sar and ps utilities, and analyzes the data contained in their output. Since sar and ps are included with the operating system, we didn't see a need to create yet another monitor for you to buy.
SarCheck can be run from its menu or from the command line. For security reasons, your account must have permission to read the sar and ps data or report files that you wish to analyze.
Based on its analysis of the resources and statistics described above, SarCheck may recommend a variety of steps which can be taken to improve system performance.
This version of SarCheck, version 3, is designed to work with SCO UnixWare 7. Other versions are available for HP-UX, SCO OpenServer 5, Solaris SPARC, and the older SCO UNIX 3.2v4.
Log in as root, put the 3.5" diskette in the drive, and use the following command:
tar xvf
This will install the following files on your system:
/usr/local/bin/analyze
/usr/local/bin/sarcheck
/usr/local/bin/ps1
/usr/local/bin/ps2
/usr/local/etc/analyze.txt
/usr/local/etc/analyze.key
/usr/bin/analyze
/usr/bin/sarcheck
Any existing files with these names will be overwritten.
The installation of SarCheck does not require rebuilding the kernel. This is important because it means that SarCheck will not increase the size of your kernel and you don't have to worry about incompatibilities between SarCheck and third party drivers.
One of the most powerful features in Version 3 is its ability to analyze data collected by ps -elf and add this information to its analysis of sar data. A few simple steps are required to take advantage of this powerful new feature:
As an alternative, the following cron entries are oriented towards the 24x7 monitoring that many administrators prefer:
0,20,40 * * * * /usr/local/bin/ps1
5 23 * * * /usr/local/bin/ps2
As a rule, we ship this software with the activation key enabled for a short period. If you have bought this software from a distributor or reseller, the software may require activation. If this software prompts you for an activation key, call us at +1-603-382-4200, or fax us at +1-603-382-4247. Please note that because SarCheck uses sar to collect its data, your system can collect the data for later analysis, whether or not SarCheck has been activated.
To reactivate the software or change the expiration date, we will need the SarCheck serial number and the Operating System serial number, which can be found by typing:
analyze -s
Remove the following files from your system:
/usr/local/bin/analyze
/usr/local/bin/sarcheck
/usr/local/etc/analyze.txt
/usr/local/etc/analyze.key
/usr/local/bin/ps1
/usr/local/bin/ps2
/usr/bin/analyze
/usr/bin/sarcheck
and if present, /usr/local/etc/analyze.dlr.
Note: Because SarCheck is not linked into your system's kernel, you will not lose any kernel changes if you deinstall SarCheck.
Note: The sarcheck command is actually a shell script that executes the binary analyze. There are a number of settings within this script that can be edited by those proficient in shell scripting and editing.
To analyze sar statistics from a menu, log in as root and type:
sarcheck
The message sarcheck: not found usually indicates that /usr/local/bin is not in your PATH and you've removed the symbolic link located in /usr/bin/sarcheck.
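To check the first of those conditions, you can test whether /usr/local/bin is one of the entries in PATH. The following is a minimal sketch; the function name path_has_dir is illustrative and is not part of SarCheck.

```shell
# path_has_dir DIR [PATHSTRING]
# Succeeds if DIR is one of the colon-separated entries of PATHSTRING
# (defaults to the current $PATH). The function name is hypothetical.
path_has_dir() {
    dir=$1
    path=${2-$PATH}
    case ":$path:" in
        *":$dir:"*) return 0 ;;
        *)          return 1 ;;
    esac
}

if path_has_dir /usr/local/bin; then
    echo "/usr/local/bin is in PATH"
else
    echo "/usr/local/bin is not in PATH; add it or type the full path"
fi
```

If the directory is missing from PATH, typing /usr/local/bin/sarcheck should still work.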
A series of choices will appear on the screen. The easiest way to get started is to accept all the defaults by pressing the Enter/Return key, in which case the previous day's sar data will be analyzed. For security reasons, your account must have permission to access the sar and ps data or report files that you wish to analyze. We recommend that you run SarCheck as root in order to ensure that you have access to sar data.
The first prompt will ask whether you want to analyze sar data or a sar report, or exit sarcheck. Sar data is usually found in /usr/adm/sa/sann, where nn is the day of the month. Sar reports are already reduced into a readable form and are usually found in /usr/adm/sa/sarnn. Please note that unless you enable sar, the data and reports will not exist.
Analyzing the reports will be marginally faster than analyzing the data, but an advantage to analyzing the data is that you can choose the start and end times for analysis by editing the sarcheck script. If you use the d option but attempt to analyze a report, you may encounter a bug in sar which causes a core dump. If you want to analyze a report, be sure to respond with an 'r'.
Analyze what?     d   sar data
                  r   a sar report
                  *   Accept remaining defaults
                  x   exit sarcheck
                      (default = d): _
After you pick the d or r option, you will be prompted to enter the name of the data or report file. In either case the default will be the statistics from the previous business day. The sarcheck script will change your working directory to /usr/adm/sa, so you do not have to type the absolute path of the file. If you wish to exit sarcheck, enter an x.
Sar data is usually found in /usr/adm/sa/sann. Based on user-definable defaults, data from 08:00 to 17:01 will be analyzed. Enter the name of the sar data file that you wish to analyze.
Available data files in /usr/adm/sa: sa20 sa21 sa25 sa26 sa27 sa28 sa29 sa30 (default = sa29):
Note that if you run sarcheck on a Saturday, Sunday or Monday, Friday's statistics will be analyzed by default. This is because weekend statistics are frequently not representative of a "loaded" system, and there is a possibility that misleading recommendations would be generated.
You can override the option to exclude the analysis of weekend data by editing the sarcheck script and changing the value of SKIPWKND from Y to N. If you want to analyze a 24 hour day instead of the default 08:00 to 17:01, change the value of ST to 00:00 and change the value of EN to 23:01.
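The default-day rule described above can be sketched as a small shell function. This is an illustration only and is not taken from the sarcheck script: the function name days_back is hypothetical, the weekday is assumed to be an abbreviated English name such as date +%a prints in the C locale, and it assumes that setting SKIPWKND to N simply selects the previous calendar day.

```shell
# days_back WEEKDAY [SKIPWKND]
# Prints how many days before WEEKDAY the default data file was collected.
# Hypothetical helper; SKIPWKND defaults to Y as described in the text.
days_back() {
    skip=${2:-Y}
    if [ "$skip" = "Y" ]; then
        case $1 in
            Sun) echo 2; return ;;   # Sunday    -> Friday
            Mon) echo 3; return ;;   # Monday    -> Friday
        esac
    fi
    echo 1                           # Saturday -> Friday, otherwise yesterday
}

for day in Fri Sat Sun Mon Tue; do
    echo "$day: analyze data from $(days_back "$day") day(s) earlier"
done
```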
The next option allows you to pick formatting. The default will produce a report with page numbers and page breaks (ctrl-L) included. For users that prefer to paginate the report with another tool, such as pg, the p option will suppress these page breaks. You can also choose to produce an HTML document at this point, and can decide whether to format the disk analysis in table form. HTML documents are best viewed with a web browser. If you wish to exit sarcheck, enter an x.
Pick formatting:  n   Normal, with page breaks
                  p   Page breaks suppressed
                  h   Create HTML document, no disk table
                  t   Create HTML document, with disk table
                  o   Create HTML document, disk table only
                  *   Accept remaining defaults
                  x   exit sarcheck
                      (default = n): _
The verbosity option controls how verbose the SarCheck report is. The default verbose mode may produce a report 4 pages long, while Superquiet mode may only contain 5 lines of text and will automatically suppress page breaks. Please note that instructions for implementing recommendations, explanations, and alternate tuning strategies may be suppressed by the quiet modes. When you're first using SarCheck, we recommend using the verbose mode so that you don't miss anything. If you wish to exit sarcheck, enter an x.
Verbosity level:  v   Verbose mode
                  q   Quiet, most verbiage suppressed
                  Q   Superquiet, all verbiage Suppressed
                  *   Accept remaining defaults
                  x   exit sarcheck
                      (default = v): _
Analysis of ps -elf data will provide you with a closer look at memory bottlenecks and the ability to detect runaway processes and memory leaks. The verbose analysis of ps -elf data will increase SarCheck's sensitivity to potential runaway processes and memory leaks, sometimes to the point of generating "false alarms". See the section entitled "Setup of ps -elf Data Collection" for more information. If you wish to exit sarcheck, enter an x.
Analyze ps -elf   n   No, analyze sar data only
data?             y   Yes, analyze sar and ps data
                  e   Enhanced sensitivity of ps data analysis
                  *   Accept remaining defaults
                  x   exit sarcheck
                      (default = y): _
This option is used to decide where to send the analysis. Note that some of these choices will be different based on the pager you use and any modifications made to the sarcheck script.
The tabular summary option is used to print a summary of statistics in table form at the end of the report, and an example is included later in the manual. If HTML output has been selected, an HTML table is created. This option is useful for transferring statistics to a spreadsheet or graphics program, for producing output which can be easily parsed by other programs, and for generating an easy to read table at the end of an HTML page.
Tabular Summary?  y   Print a tabular summary at the
                      end of the report
                  i   Print a tabular summary instead of
                      the report
                  n   Print the report without a summary
                  *   Accept remaining defaults
                  x   exit sarcheck
                      (default = n):
If you choose to send the output to a file, you'll be prompted for the name of the file. The default file name is /tmp/yyyymmddhhmmss, which is a date/time stamp. You can modify the default by editing the sarcheck script. If you wish to exit sarcheck, enter an x.
Send output to:   1   more (the screen)
                  2   lp -s (a printer)
                  3   A file
                      (default = 1): _
For security reasons, your account must have permission to access the sar data or report files that you wish to analyze, and SarCheck may need to interrogate the kernel. We recommend that you run SarCheck as root in order to ensure that you have access to the necessary data. To analyze a sar report file called sar12, type:
analyze sar12
The message analyze: not found usually indicates that /usr/local/bin is not in your PATH and you've removed the symbolic link located in /usr/bin/analyze.
For best results, pipe the output to more so that you can read it, or redirect it to a file if you want to save it. A report will be produced which contains information about your system, a brief summary, a recommendations section (if applicable), and a resource analysis section. For more information, see the section entitled "How to interpret the analysis".
For users that prefer to paginate the report with another utility such as pg, the -p option will suppress page numbers and page breaks. To take advantage of this option, type:
analyze -p sar12|pg
SarCheck can be run automatically by adding an entry to root's crontab file. Determine the time that /usr/lib/sa/sa2 is run, and use cron to run /usr/bin/analyze after that time. Here are two examples which assume sa2 is run at 18:00, the analysis will be done at 18:30, and the entire crontab entry will really be on one line:
In order to print a SarCheck analysis every weeknight, use the following entry:
30 18 * * 1-5 /usr/local/bin/analyze /usr/adm/sa/sar`date +\%d` | lp -s
To keep all of SarCheck's recommendations in the /usr/ops directory, use the following entry:
30 18 * * 1-5 /usr/local/bin/analyze /usr/adm/sa/sar`date +\%d` > /usr/ops/`date +\%y\%m\%d`
Because the output of the analyze program is stdout, you can pipe or redirect it in lots of ways. It can be printed, mailed, stored... whatever works best in your environment.
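As one illustration of mailing the output, the sketch below sends a saved report only when it is not empty. It assumes that analyze -r writes nothing when there are no recommendations (an inference from the option's description, not something confirmed here) and that a mailer such as mailx is available. The function name send_if_nonempty is hypothetical.

```shell
# send_if_nonempty FILE COMMAND [ARGS...]
# Feeds FILE to COMMAND on stdin, but only if FILE is not empty.
# Hypothetical helper, not part of SarCheck.
send_if_nonempty() {
    file=$1
    shift
    if [ -s "$file" ]; then
        "$@" < "$file"
    fi
}

# Possible use from a script run by cron (commented out because it
# needs SarCheck and a mailer to be installed):
# out=/tmp/sarcheck.$$
# /usr/local/bin/analyze -r /usr/adm/sa/sar`date +%d` > "$out"
# send_if_nonempty "$out" mailx -s "SarCheck recommendations" root
# rm -f "$out"
```

Passing the command as arguments keeps the helper independent of any one mailer, so lp -s or another program could be substituted.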
Another important new feature in SarCheck Version 3 is the ability to analyze multiple days of data at once. SarCheck will analyze any number of concatenated sar reports. The limitation is that these reports must actually exist. Here is an example of how to analyze all sar reports from the first seven days of the month.
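A minimal sketch of one way to do this follows, assuming the reports are in /usr/adm/sa and are named sar01 through sar07. The function name concat_sar_reports and the output file /tmp/sar.week are illustrative choices, not names used by SarCheck.

```shell
# concat_sar_reports DIR OUTFILE
# Concatenates DIR/sar01 .. DIR/sar07 (those that exist) into OUTFILE.
# Fails if none of the reports exist. Hypothetical helper.
concat_sar_reports() {
    dir=$1
    out=$2
    : > "$out" || return 1
    found=0
    for f in "$dir"/sar0[1-7]; do
        [ -f "$f" ] || continue    # skip missing days or an unmatched pattern
        cat "$f" >> "$out"
        found=1
    done
    [ "$found" -eq 1 ]
}

# Possible use (commented out so the sketch has no side effects):
# concat_sar_reports /usr/adm/sa /tmp/sar.week && analyze /tmp/sar.week | more
```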
Please note that the analyze program does not work if wildcard characters are used as a filename. Wildcard characters should be used with the cat command in order to produce a single file for the analyze program. Once you've become used to working with concatenated sar reports, you'll probably discover that the find command in /usr/lib/sa/sa2 removes the sar reports too quickly, and the naming convention used in /usr/lib/sa/sa1 is too restrictive. Make copies of the sa1 and sa2 scripts, and then modify them to meet your needs.
Please note that SarCheck will not analyze ps -elf data when it analyzes multiple days of sar report which have been concatenated.
| -c | Turn off the capacity planning section. |
| -d | Print info on all disks if more than 12 disk drives were seen in sar. Because the SarCheck report will produce a paragraph on each disk, reports may get too verbose on systems with a large number of disk devices. Without this option, SarCheck will "filter out" information on disks which are lightly used. |
| -dbusy | If the -dtbl switch is used, -dbusy will sort the disk information by average percent busy. |
| -dserv | If the -dtbl switch is used, -dserv will sort the disk information by average service time. |
| -dtbl | If the -html switch is used, -dtbl will produce a table of disk statistics instead of generating a paragraph on each disk. Cells in the table will be color coded to highlight interesting disk statistics. This option is recommended for large systems where a large number of individual paragraphs on disk activity would be hard to comprehend. |
| -dtoo | If the -html switch is used, -dtoo will produce a table of disk statistics in addition to generating a paragraph on each disk. Cells in the table will be color coded to highlight interesting disk statistics and will link to the appropriate paragraph. |
| -h | Displays help text (how to use SarCheck). |
| -hm | Instructions for analyzing multiple days of sar data. |
| -hp | Instructions for analyzing supplemental ps -elf data. |
| -html | Produce a report formatted with HTML tags. See the section "HTML output" for more information. |
| -k | Allows you to change the activation key and expiration date |
| -o | Prints an order form for those wishing to purchase a license. |
| -p | Suppress page numbering & page breaks. |
| -ps | Incorporate the analysis of ps -elf file /usr/local/ps/yyyymmdd where the date is extracted from the sar data. |
| -pf | Include analysis of a specified file containing ps -elf data |
| -pv | Verbose analysis of ps -elf data, overridden by -Q and -q |
| -plp | Suppress warnings about suspiciously large processes |
| -pml | Suppress warnings about possible memory leaks |
| -prp | Suppress warnings about possible runaway processes |
| -Q | Print the least verbose analysis, automatically sets -p. |
| -q | Print a less verbose analysis. |
| -r | Print an analysis only if recommendations are made. |
| -s | Display all the information needed to activate SarCheck. |
| -t | This option will produce a summary of interesting statistics in a tabular format. This output can be parsed with relative ease. If the -html switch is used, the statistics will be presented in an HTML table, and cells in the table will be color coded to highlight noteworthy statistics. This option works well with the -dtbl switch. |
| -tonly | This option will produce nothing but a summary of interesting statistics in a tabular format. All recommendations, analysis, and other hopefully interesting text will vanish. If the -html switch is used, the statistics will be presented in an HTML table, and cells in the table will be color coded to highlight noteworthy statistics. |
| -w | Suppress newlines for export to PC-based WP programs. |
To display the online help text, type:
analyze -h
This sends a subset of the instructions found in this manual to stdout, which defaults to the screen.
For other help text, type:
analyze -hp or analyze -hm
Note that the -hp option will produce more than 24 lines of text and may be easier to read if piped to more, less, pg, or whatever you prefer.
Important: Understand that you can inadvertently cause SarCheck to produce misleading or incorrect recommendations. SarCheck looks at an individual day of data reported by sar and ps, and that data should reflect the most active time of the day when performance is most important, and the most active days of the week or month.
An analysis of a non-peak load will produce recommendations that may be inappropriate. SarCheck will warn you about this if resource utilization appears to be unusually light. Choose the day that you wish to analyze by specifying the data or report files that represent the busiest days. Determine the busiest times of the day, and if necessary, modify the sarcheck script to analyze data from only that time period.
For example, if your system is in use from 8:00AM to 5:00PM, and runs batch jobs and backups at night with plenty of time to spare, performance is probably most important during the day. To produce the most useful report for SarCheck, use a command such as:
sar -A -s8:00 -e17:01 > reportfile
This command will use all options (-A), and will only include data collected between 8:00AM (-s8:00) and 5:00PM (-e17:01) in the report. If the busiest day in the last few weeks was the 29th of the month, and you want to produce a report of system activity between 8:00 and 5:00 on that day, use the following command:
sar -A -s8:00 -e17:01 -f /usr/adm/sa/sa29 > report29
Note that 17:01 was used instead of 17:00. This is because the ending time must be later than the last data you want included in the report. To produce the same report for the entire day, just leave out the -s and -e switches as follows:
sar -A -f /usr/adm/sa/sa29 > report29
Of course, this will only work if /usr/adm/sa/sa29 actually exists. Sar data will not exist unless sar has been enabled with the sar_enable command.
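The existence check can be built into a small wrapper so that a missing data file produces a clear message instead of an empty report. This is a sketch under the assumptions already stated in the text (data files named sann in /usr/adm/sa, and the 08:00 to 17:01 window); the function name make_sar_report is hypothetical.

```shell
# make_sar_report DAY [DIR]
# Writes a 08:00-17:01 sar report for day-of-month DAY to ./reportDAY,
# after checking that the data file DIR/saDAY exists.
# Hypothetical helper; DIR defaults to /usr/adm/sa.
make_sar_report() {
    day=$1
    dir=${2:-/usr/adm/sa}
    data=$dir/sa$day
    if [ ! -f "$data" ]; then
        echo "no sar data in $data; has sar been enabled?" >&2
        return 1
    fi
    sar -A -s8:00 -e17:01 -f "$data" > "report$day"
}

# Possible use (commented out because it needs sar data to exist):
# make_sar_report 29 && analyze report29 | more
```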
Adding the -ps option will give you more information if the system is set up to record ps -elf data. See the section entitled "Setup of ps -elf data collection" for more information.
It's important to analyze reports from days when the processing load was greatest. On those days, SarCheck may find resource bottlenecks which did not exist on days when the system did less work.
If you analyze data from the weekend, SarCheck may tell you how to optimize the system for weekend processing. Whether that makes sense or not in your environment (and it frequently won't) is up to you. SarCheck will warn you if statistics from a given time period appear to show an idle or lightly used system, but you know best when your system's performance is most important. Like any other program which processes data, the rule remains: "Garbage in, garbage out".
At the beginning of the analysis, the name of the sar report file, the date, time, number of intervals, number of processors seen, amount of memory, and system name are printed for identification purposes. If the -ps option has been selected, the name of the ps -elf data file will be printed as well.
Important: When data from one system is analyzed on another, it is likely to result in incorrect or misleading recommendations. A warning message will appear if the name of the system on the sar report is different than the name of the system running SarCheck. This is because the contents of the messages, stune and mtune files are likely to be different on different systems.
A warning message will appear if impossible data is seen in the sar report. Examples would be CPU utilization of 313% or a swap queue occupancy of -88%. The type of sar data which contains the problem will be identified.
A message will be printed if 3 or more sar options have been chosen, but at least one is missing. These messages will also appear if you use SarCheck to analyze the output of sar -uqv, or any other combination of 3 or more options. While it can be useful to explicitly specify the options you're interested in when analyzing sar data manually, SarCheck is best run with all sar's options (-A) in use.
The Summary section will highlight any bottlenecks that were seen in the areas of CPU, memory, or I/O, will indicate if tunable parameters need to be changed, and will point out if limits to future growth were found by the capacity planning algorithms. If no bottlenecks are seen, the summary will say so, and point out that no recommendations will be made.
The Recommendations section is present only if SarCheck has recommendations to make. The recommendations in this section are based solely on the data contained in the sar and ps files, and the values of various tunable parameters, and should be taken in that context. For example, if batch jobs are run on Saturdays, and SarCheck analyzes statistics from that day, it may decide that an I/O bottleneck existed and spare memory was present, and an increase in buffer size may be appropriate. Following these recommendations may improve performance on Saturdays, but could hurt performance during the week by reducing the amount of memory available to users.
The changes to tunable parameters recommended by SarCheck are designed to cause slow, gradual improvement in order to prevent "surprises". Whenever it is reasonable to do so, changes to the value of many tunables are limited to about 20 or 30 percent. These gradual changes are designed to prevent any unanticipated side effects of a major change in a tunable parameter. Whenever possible, SarCheck will calculate the amount of memory that will be used or saved if a recommendation is followed.
In some cases, significant changes will be recommended. For example, the value of NHBUF must be either a power of 2 or zero, therefore it cannot be changed gradually.
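The constraint on NHBUF can be checked with a short shell test before a new value is entered. This is an illustrative sketch only: the function name valid_nhbuf is hypothetical, and it checks nothing beyond the "power of 2 or zero" rule stated above.

```shell
# valid_nhbuf N
# Succeeds if N is zero or a power of 2, the only values the text says
# NHBUF may take. Hypothetical helper.
valid_nhbuf() {
    n=$1
    # A power of 2 has a single bit set, so n AND (n - 1) is zero.
    [ "$n" -ge 0 ] && [ $(( $n & ($n - 1) )) -eq 0 ]
}

for n in 0 64 100 1024; do
    if valid_nhbuf "$n"; then
        echo "$n: acceptable"
    else
        echo "$n: not acceptable"
    fi
done
```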
Due to the interrelationships between tunable parameters and system resources, SarCheck goes beyond the basic rules of thumb whenever possible. For example, the rule of thumb for the %rcache field in sar -b statistics is that the value should be greater than 90 percent. SarCheck looks at the average %rcache value, percent of time that %rcache is less than 90 percent, whether or not the system is memory poor, the speed and number of disks, any signs of disk bottlenecks, the number of processors, and the current values of BUFHWM and NHBUF (both the values which were specified using the configure utility and the ones which are actually in use) before it makes recommendations.
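Two of the inputs named above, the average %rcache and the share of intervals in which it fell below 90 percent, are easy to compute by hand. The sketch below is not SarCheck's algorithm; it only illustrates those two figures. It assumes the %rcache values have already been extracted, one per line, and the function name rcache_summary is hypothetical.

```shell
# rcache_summary
# Reads one %rcache value per line on stdin and prints the average and
# the percentage of intervals below the 90 percent rule of thumb.
# Hypothetical helper; exits non-zero if no values were read.
rcache_summary() {
    awk '
        NF { n++; sum += $1; if ($1 < 90) low++ }
        END {
            if (n == 0) { print "no data"; exit 1 }
            printf "average %.1f, below 90 in %.0f%% of intervals\n", sum / n, 100 * low / n
        }'
}

# Four sample intervals (made-up values for illustration):
printf '%s\n' 95 88 97 92 | rcache_summary
```

With the sample values shown, the average is 93.0 and one interval in four is below 90, so the rule of thumb is met on average but not in every interval.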
The Resource Analysis section translates the data contained in the sar report into English. Much of this data is provided for reference, and explanations are given where appropriate. The implications of various statistics regarding CPU utilization, buffer sizing, memory utilization, system table sizes, and disk I/O bandwidth are presented in this section.
The times when the various resources are most heavily used appear in this section. If these times correlate well with the times that performance degradation was reported, it can be inferred that exhaustion of these resources may be a cause of performance problems. Peak usage statistics are also used by the capacity planning section.
If the -ps option was selected, the results of SarCheck's search for runaway processes, memory leaks, and suspiciously large processes will be reported.
The Capacity Planning section can be used to approximate the amount of capacity left on the system, based solely on the sar data being analyzed. CPU, memory, disk, and system table use statistics are examined in order to determine which resource is likely to become exhausted first.
This section is not meant to perform the same functions as the more expensive tools available for larger systems. It is designed to help meet the needs of system administrators, many of whom are managing growing systems and need to know how much "room" is left before various resources become exhausted.
The exhaustion of resources is defined as any single interval in which CPU usage exceeded 90 percent, a disk was busy more than 75 percent of the time, swapping was detected, or the process table was more than 80 percent full. Because the interval with the greatest resource usage is used, the capacity planning report will be less accurate if peak resource use occurred during an interval of less than 10 minutes or if multiple sar reports are being analyzed at once.
Disclaimers, trademark information, etc. At the end of the report is a disclaimer, trademark and copyright information, your software serial number, code version, licensee, and if applicable, the software's expiration date.
To get the most benefit out of SarCheck, we recommend using it as follows:
Use the -o option of analyze to print an order form, ask your reseller or distributor, or contact us directly.
Please check the FAQ section at the end of this manual. If you need additional assistance:
Call us at 1-603-382-4200, fax us at 1-603-382-4247, write to us at PO Box 1033, Plaistow NH 03865, USA, use our email address: support@sarcheck.com, visit our web site at http://www.sarcheck.com, or contact the party from whom you purchased SarCheck.
This release contains between eight and ten files:
/usr/local/bin/analyze This is the program which performs the analysis.
/usr/local/bin/sarcheck This is the menu-driven front end script for 'analyze'.
/usr/local/bin/ps1 This is a script that collects ps -elf data, and is roughly analogous to the sa1 script used by sar.
/usr/local/bin/ps2 This is a script that cleans up ps -elf data, and is roughly analogous to a subset of the sa2 script used by sar.
/usr/local/etc/analyze.txt This file contains the text used to produce the analysis. In general, we recommend that you do not modify this file, because it may leave us unable to support the software. Users outside of the United States may modify the spelling of certain words in the file if they wish. For example, the word 'utilization' can be changed to 'utilisation'. If you would like a non-English version of SarCheck, please call us.
/usr/local/etc/analyze.key This file contains the software's expiration date (if applicable) and the activation key. Please do not edit or otherwise tamper with this file, as this may permanently damage the software. If you need an activation key, call us for instructions.
/usr/bin/analyze A symbolic link to /usr/local/bin/analyze.
/usr/bin/sarcheck A symbolic link to /usr/local/bin/sarcheck.
/usr/local/etc/analyze.dlr This optional file is present when you've received SarCheck from certain distributors or resellers. The purpose of this file is to provide you with that distributor's or reseller's phone number for sales and support information. (not implemented at time of this printing)
/usr/local/etc/scuwman.html This file is an HTML copy of this manual. We're currently rolling it into the product. If you'd like a copy and it wasn't included on the diskette, please contact us.
The following example was produced with the -w option, used to suppress page breaks and newlines. This option sounds pretty odd, but it's really useful when exporting SarCheck reports to a PC-based Word Processing program. The text of the SarCheck report is printed in Courier font, and the explanation follows.
SarCheck(TM): AUTOMATED ANALYSIS OF SCO UnixWare 7 sar and ps data (English text version 3.00)
The title line prints the version of the /usr/local/etc/analyze.txt file which was in use. Different versions can be used for languages other than English. If you are interested in a non-English version of SarCheck, please call us.
This is an analysis of the data contained in the file sar15. The data was collected on 01/15/2000, from 08:00:00 to 14:00:01, from system 'aurora'. There were 6 sar data records used to produce this analysis. Operating system is SCO UnixWare Release 7.1.0. 1 processor is present. 64 megabytes of memory are present.
Data collected by the ps -elf command on 01/15/2000 from 08:00:00 to 14:00:01, and stored in the file /usr/local/ps/20000115, will also be analyzed.
This introductory paragraph prints the name of the sar report which was analyzed, when the data was collected, the number of records contained in the sar report, and other information about the system environment. Information concerning additional ps -elf data will also be displayed if analysis of ps data has been requested.
When the data was collected, no CPU bottleneck could be detected. No significant I/O bottleneck was seen. A change has been recommended to at least one tunable parameter. Limits to future growth have been noted in the Capacity Planning section.
The summary lists any bottlenecks detected, and any problems which may impact the accuracy of the analysis. If SarCheck found no problems and was unable to make any recommendations, that fact would be mentioned here.
All recommendations contained in this report are based solely on the conditions which were present when the performance data was collected. It is possible that conditions which were not present at that time may cause some of these recommendations to result in worse performance. To minimize this risk, analyze data from several different days, implement only regularly occurring recommendations, and implement them one at a time.
The first paragraph of the recommendations section explains how to implement the recommendations. More information on this topic can be found in the "How to produce the most accurate analysis..." and "How to get the most from SarCheck" sections of this manual.
A CPU upgrade is not recommended because the current CPU had significant unused capacity.
An opinion on the need for a CPU upgrade is made regardless of the findings. This is one of the few cases in the Recommendations Section where SarCheck will mention a resource even if there is no need for a change. In addition to parameter tuning recommendations, SarCheck will suggest hardware upgrades where they are likely to help.
No disk recommendations have been made because no bottleneck was seen.
SarCheck will help you to use your existing hardware whenever possible. Disk balancing, additional disks, or faster disks are recommended here if needed.
Change the value of NHBUF from 64 to 1024. The parameter NHBUF can be changed by running the System Tuner tool in the System Folder of the SCO Admin tool.
Change the value of DNLCSIZE from 1350 to 1755. The %dnlc average may have been skewed by a large number of files being accessed for the first time in a while. Use of the find command can cause this to happen and the DNLCSIZE recommendation may be affected by that. The parameter DNLCSIZE can be changed by running idtune(ADM).
These are examples of parameter tuning recommendations. As a rule, SarCheck will recommend small, incremental changes to the system's tunables in order to produce gradual change.
In some cases, such as with the parameter NHBUF, the value of the parameter must be a power of 2. Small changes to this parameter are therefore impossible, and large changes are safe.
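If you want to check a candidate value for a power-of-2 parameter such as NHBUF before entering it, a small shell sketch like the one below may help. The function names is_pow2 and next_pow2 are our own illustrations and are not part of SarCheck or of the system tuner.

```shell
# is_pow2 N: succeed if N is a positive power of two.
is_pow2() {
    n=$1
    [ "$n" -gt 0 ] || return 1
    while [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
    done
    [ "$n" -eq 1 ]
}

# next_pow2 N: print the smallest power of two that is >= N.
next_pow2() {
    p=1
    while [ "$p" -lt "$1" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

is_pow2 1024 && echo "1024 is a valid power of 2"
next_pow2 65    # prints 128
```

Both the old value (64) and the recommended value (1024) in the example above pass this check.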
More information on changing tunable parameters is available in the system tuner and various man pages.
The resource analysis section is the place where various aspects of resource utilization are discussed regardless of whether a problem was seen.
Average CPU utilization was only 28.7 percent. This indicates that spare capacity exists within the CPU. If any performance problems were seen during the monitoring period, they were not caused by a lack of CPU power. CPU utilization peaked at 31 percent from 13:00:00 to 14:00:01.
CPU utilization statistics from the sar -u report are analyzed here. In addition to average CPU utilization, occasionally heavy utilization and peak utilization are noted. The times of peak resource utilization are noted throughout this section and are provided to help you detect any correlation between peak resource utilization and peak performance degradation.
The run queue had an average depth of 1.0. The run queue was always occupied, despite the lack of a significant run queue depth. This condition is usually seen when the number of CPU-intensive processes is low. It is likely that the performance of these processes is closely related to CPU speed.
The run queue size indicates the average number of "ready to run" processes. The average length of this queue and the percent of time it was occupied are analyzed, and are used to confirm the presence of a CPU bottleneck.
The CPU was idle (neither busy nor waiting for I/O) and apparently had nothing to do an average of 69.8 percent of the time. If overall performance was good, this means that on average, the CPU was lightly loaded. If performance was generally unacceptable, the bottleneck may have been caused by remote file I/O which cannot be directly measured with sar and cannot be considered by SarCheck.
The 7.1.0 version of sar -u includes a %intr column. Version 3.00 of SarCheck will consider this to be "idle time". As more information on this column becomes available, we will add its analysis to SarCheck.
In cases where the system is frequently idle, the percentage of idle time is analyzed. This is an indication of the average amount of time the CPU was neither busy nor waiting for I/O. The percentage of time that the CPU was waiting for I/O may also be reported, and can be a useful way of confirming the presence of I/O bottlenecks. The 'waiting for I/O' statistic is easily skewed by tape or floppy disk access.
The average cache hit ratio of logical reads was 100.0 percent, and the average cache hit ratio of logical writes was 93.0 percent. The cache hit ratios of logical reads and writes are high enough to indicate that system buffer sizes do not need to be increased.
System buffer tuning, controlled primarily by the BUFHWM and NHBUF parameters, is very important when trying to achieve optimum performance. This is one of many areas where SarCheck explains the reasons for its recommendations.
In the event of a system crash, an average of 60 seconds worth of data will be lost because it will not have been written to disk. This is controlled by the NAUTOUP and FDFLUSHR parameters. This statistic has been calculated using the formula: NAUTOUP + (FDFLUSHR / 2).
SarCheck calculates the potential for losing data in a system crash. In cases where the amount of data at risk is unusually high, SarCheck will warn you and may recommend action.
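The formula quoted in the report, NAUTOUP + (FDFLUSHR / 2), is easy to evaluate by hand. The sketch below does the arithmetic with awk. The function name crash_window is our own, and the values passed to it are hypothetical; they are not the settings of the system in the sample report.

```shell
# crash_window NAUTOUP FDFLUSHR: print NAUTOUP + (FDFLUSHR / 2) in seconds.
crash_window() {
    awk -v nautoup="$1" -v fdflushr="$2" \
        'BEGIN { printf "%.1f\n", nautoup + (fdflushr / 2) }'
}

# Hypothetical values, chosen only for illustration.
crash_window 50 20    # prints 60.0
```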
The ratio of exec to fork system calls was 0.88. This indicates that PATH variables are efficient.
Wherever possible, SarCheck will hunt for clues about poor performance. The likelihood of inefficient PATH variables can be inferred from sar data, though no problem was seen in this example.
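The ratio itself is simply the exec rate divided by the fork rate. The sketch below shows the calculation; the function name exec_fork_ratio and the two rates are hypothetical, picked so that the result matches the 0.88 shown in the sample report.

```shell
# exec_fork_ratio FORK_RATE EXEC_RATE: print exec/s divided by fork/s.
exec_fork_ratio() {
    awk -v f="$1" -v e="$2" 'BEGIN {
        if (f <= 0) { print "n/a"; exit }   # avoid dividing by zero
        printf "%.2f\n", e / f
    }'
}

# Hypothetical rates, chosen only for illustration.
exec_fork_ratio 0.50 0.44    # prints 0.88
```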
No evidence of a memory shortage was seen in the following statistics: The swap queue was occupied an average of 1 percent of the time. The average length of the swap queue was 0.0. The average validity fault rate (vflt/s), also known as the address translation page fault rate was 1.6 per second. The average rate at which virtual pages were placed on the freelist was 1.0 per second. The average rate at which virtual pages were scanned by the page stealing daemon was 7.9 per second. The average swap out transfer request rate was 0.0 per second.
Some of the swap area was used during the monitoring period. Together with the information in the previous paragraph, this indicates that the system is neither memory-rich, nor memory-poor.
As seen in the above paragraphs, a number of metrics are used to identify resource bottlenecks.
The directory name lookup cache hit ratio was only 86.3 percent. DNLCSIZE should be increased until the percent of hits averages 90 percent or above. The iget/s rate peaked at 1 from 13:00:00 to 14:00:01. The iget/s rate measures how many s5, ufs, vxfs, and sfs files were located by inode entry per second. This typically happens as a result of directory name lookup cache misses. During the period of peak iget activity, the directory name lookup cache hit rate was 86 percent. It is possible that an event occurred during this period which flushed the cache and may result in a DNLCSIZE recommendation which would not help performance. Heavy use of the find command is an example of the kind of activity which could flush the cache.
The directory name lookup cache should be properly tuned. SarCheck explains the tuning strategy used to achieve the desired result and provides other hints that may be relevant to your system environment.
The value of MAXUP is 80 and the size of the process table as reported by sar was 400. There is no reason to change the value of MAXUP or NPROC based on this data.
Here is a parameter that should be looked at differently in various SCO operating systems. MAXUP is not dynamic, but its value should be less than the size of the process table. In SCO's OpenServer operating system, the process table is dynamic, in SCO UNIX 3.2v4 it is fixed, and in UnixWare 7, it is "autotuned" by default. Autotuning sets the maximum allowable size for the process table, but does not attempt to find its optimum value.
The device sd011 was busy an average of 6.7 percent of the time and had an average queue depth of 1.6 (when occupied). This usage pattern is typical of that generated by sync activity. Sync activity refers to the bdflush daemon's efforts to transfer data from the system buffer cache to disk. The average service time reported for this device and its accompanying disk subsystem was 5.5 milliseconds. This is indicative of a very fast disk or a disk controller with cache. Service time is the delay between the time a request was sent to a device and the time that the device signaled completion of the request.
Disk activity is analyzed in depth. Peak and average busy time, queue depth, and service time are used to identify problems in disk load, load balancing, buffer sizing, and hardware recommendations. The most common SarCheck support question is "Why does SarCheck say that my fast new disk drives are slow?". This is usually due to simultaneous requests for I/O to different filesystems that are physically spaced far apart on the same disk, or to older controllers used with new disks. If increasing the size of the system buffer can help the problem, SarCheck will recommend it.
No runaway processes, memory leaks, or suspiciously large processes were detected in the data contained in file /usr/local/ps/20000115.
One of the most powerful new features in SarCheck version 3 is the ability to detect problems at the process level. SarCheck will monitor your system, and will warn you about any of a number of problems.
More information on performance analysis and tuning can be found in the system tuner's help text. The SarCheck reference guide contains a bibliography of relevant performance documentation.
Unfortunately, SarCheck can't answer every question, so we help you to find the information you need. There's also a bibliography at the end of this manual.
The section is designed to provide the user with a rudimentary linear capacity planning model and should be used for rough approximations only. These estimates assume that an increase in workload will affect the usage of all resources equally. These estimates should be used on days when the load is heaviest to determine approximately how much spare capacity remains at peak times.
The Capacity Planning section can help you to understand how much additional load your system can support. This feature is not designed to replace the features found in mainframe-type capacity planning tools, but rather to give you an approximation of how much room for growth remains. Please note the disclaimer in the paragraph above.
Based on the data available in this single sar report, the system should be able to support a moderate increase in workload at peak times, and memory is likely to be the first resource bottleneck. See the following paragraphs for additional information.
This paragraph summarizes the amount of capacity remaining in your system during peak times. If all system resources monitored could support an increase in workload of at least 100 percent, the summary will say that no impending capacity limits were seen. If the first bottleneck is likely to occur in memory, the amount of capacity remaining will not be quantified. This is because the data required for that kind of complex memory modeling cannot be found in the sar report.
The CPU can support an increase in workload of at least 100 percent at peak times. Because some swap space was used and significant paging or swapping statistics were not seen, the amount of memory present can probably handle a moderate increase in workload. The busiest disk can support a workload increase of at least 100 percent at peak times. For more information on peak CPU and disk utilization, refer to the Resource Analysis section of this report.
All system tables measured by sar -v can hold at least twice as many entries as were seen.
The paragraphs above give a more detailed breakdown of remaining capacity. Again, please note that these numbers are approximate and will vary from day to day. After analyzing a number of sar reports, you will have a pretty good idea of how much capacity remains on your system.
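To get a feel for what a linear capacity model means, here is a rough sketch. It assumes that a resource is exhausted at 100 percent utilization and that utilization grows in direct proportion to workload. The function name headroom and the 100 percent ceiling are our assumptions for illustration; this is not necessarily the calculation SarCheck performs internally.

```shell
# headroom PEAK_PCT [CEILING_PCT]: print the percent increase in workload
# that a linear model says would raise PEAK_PCT to CEILING_PCT (default 100).
headroom() {
    awk -v peak="$1" -v ceiling="${2:-100}" 'BEGIN {
        if (peak <= 0) { print "unbounded"; exit }
        printf "%.0f\n", (ceiling / peak - 1) * 100
    }'
}

headroom 31    # peak CPU utilization from the sample report; prints 223
headroom 14    # peak percent busy for sd011; prints 614
```

Both results are well over 100, which is consistent with the report saying that the CPU and the busiest disk can support an increase of "at least 100 percent".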
Please note: In no event can Aurora Software Inc. be held responsible for any damages, including incidental or consequent damages, in connection with or arising out of the use or inability to use this software. All trademarks belong to their respective owners. Evaluation copy for: Your Company. This software expires on 02/28/2000 (mm/dd/yyyy). Code version: 3.00. Serial number: 00028784.
Thank you for trying this evaluation copy of SarCheck. To order a licensed version of this software, just type 'analyze -o' at the prompt to produce the order form, and follow the instructions.
This "Thank You" message appears on demo copies of the software. Licensed versions and Beta versions display a different message.
(c) copyright 1994-2000 by Aurora Software Inc., Plaistow NH, USA, All Rights Reserved.
The disclaimers, copyright notices, and expiration date (if any) are all important and you should read them.
Statistics for system: aurora
Statistics collected on: 01/15/00
Average CPU utilization: 28.7%
Peak CPU utilization: 31%
Average user CPU utilization: 20.2%
Average sys CPU utilization: 8.5%
Average waiting for I/O: 1.5%
Average run queue depth: 1.0
Peak run queue depth: 1.0
Actual DNLC hit percentage: 86.30%
Pct of phys memory unused: 30.3%
Average page scanning rate: 7.9/sec
Peak page scanning rate: 37.3/sec
Page scanning threshold: 5.0/sec
Average cache read hit ratio: 100.0%
Average cache write hit ratio: 93.0%
Disk device w/highest peak: sd011
Avg pct busy for that disk: 6.7%
Peak pct busy for that disk: 14.0%
Approx CPU capacity remaining: 100%+
Approx I/O bandwidth remaining: 100%+
Remaining process tbl capacity: 100%+
Can memory support add'l load: Moderate
This is the output that you'll see if you're using the -t or -tonly switches. This table is much easier to parse than the standard text-based SarCheck report. When used in conjunction with the -html switch (described in the next section), this information is formatted into a table and unusual values are flagged by coloring the cells which contain those values. If you have a browser handy, you'll want to try using the -html and -t switches together.
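Because each line of the table is a label, a colon, and a value, a script can pull out a single statistic with a few lines of awk. The sketch below assumes the layout is exactly as printed in this manual; the function name tbl_value is our own and is not part of SarCheck.

```shell
# tbl_value LABEL: read "label: value" lines on stdin and print the value
# of the first line whose label matches LABEL exactly.
tbl_value() {
    awk -v label="$1" '
        {
            i = index($0, ":")          # the first colon ends the label
            if (i == 0 || found) next
            name = substr($0, 1, i - 1)
            value = substr($0, i + 1)
            sub(/[ \t]+$/, "", name)    # trim trailing blanks from label
            sub(/^[ \t]+/, "", value)   # trim leading blanks from value
            if (name == label) { print value; found = 1 }
        }'
}

# Sample lines copied from the table above.
sample='Average CPU utilization: 28.7%
Actual DNLC hit percentage: 86.30%
Disk device w/highest peak: sd011'

printf '%s\n' "$sample" | tbl_value "Actual DNLC hit percentage"   # prints 86.30%
```

In practice you would pipe the output of analyze with the -tonly switch into the function instead of using the sample text.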
Thanks for your interest and support!
We've added the option of automatically inserting HTML tags into the output of SarCheck. This will enable you to post the output of SarCheck on your corporate intranet and will enable us to do some really amazing things in the future.
We hope that you'll try it out and make any suggestions you feel would be helpful. To use this feature, use the -html switch when running analyze. For example, the command
analyze -html sar12 > /tmp/rpt12.html
will produce a report with HTML tags which can be read by your favorite browser. The most important parts of the report will be printed in bold type, and headings are used to clarify what you're looking at.
The -t, -tonly, -dtbl, and -dtoo switches can be used in conjunction with the -html switch to produce HTML tables, and these can make it easier to understand what SarCheck is telling you. In some cases, cells of these tables will use a different background color as a means of highlighting interesting data.
Almost all of the new features that we add to SarCheck are based on feedback from our customers. If you have any ideas for making SarCheck even better, please contact us!
The number of switches and options available in SarCheck continues to grow, and this section is designed to help you decide how to do what you want. In order to maintain some level of clarity, all examples will analyze the sar report file /var/adm/sa/sar23. The output of the analyze program is stdout, so you'll probably want to pipe it to more or redirect it to a file.
Example 1: Analyzing a sar report. We're going to start with the simplest possible example. The command below will run the analyze program and tell it to analyze the sar report file /var/adm/sa/sar23.
analyze /var/adm/sa/sar23
Example 2: Removing the page breaks. The -p switch removes the page breaks. This is especially useful when piping the report to pg instead of more. A number of other switches, the -html switch for example, will automatically invoke -p where it makes sense.
analyze -p /var/adm/sa/sar23
Example 3: Analyzing ps -elf output in conjunction with the sar report. The /opt/sarcheck/bin/ps1 script is used to collect ps -elf data which can be used in conjunction with the sar report. The -ps switch tells the analyze program to search for the ps -elf data and include it in the report if possible.
analyze -ps /var/adm/sa/sar23
Example 4: Creating an HTML-formatted report. Let's combine a few switches this time. The -html switch makes the -p switch unnecessary, but we want to incorporate ps -elf data into the analysis and we want to see a quick table of some statistics at the end of the report.
analyze -html -ps -t /var/adm/sa/sar23
Example 5: Same as above, but on a big system. This time we're analyzing sar data from a huge system, with a large number of disks. By default, SarCheck will filter out information on 'uninteresting' disks, but it will still produce a paragraph on each disk. This can get a little ridiculous and hard to read, so we'll use the -dtbl switch to format the disk information into an HTML table, and the -dbusy switch to sort the disk information so that the busiest disks are at the top of the table. As with the -t switch, cells will be colored if SarCheck wants to draw your attention to specific data. Note that the -dtbl switch requires the -html switch.
analyze -html -ps -t -dtbl -dbusy /var/adm/sa/sar23
Example 6: Emailing the output. If you manage a network of 200 systems, you may want to email interesting SarCheck reports to yourself, but you probably don't want to be spammed with 200 messages a day saying that everything's okay. The -r switch prevents SarCheck from producing any output at all if there are no recommendations. The -Q switch reduces verbiage to a minimum.
analyze -ps -r -Q /var/adm/sa/sar23 | mail root@wherever.com
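Depending on your mail program, piping empty output to mail may still send an empty message. If that happens on your system, a small wrapper like the sketch below can help. The function name send_if_nonempty is our own; it is not part of SarCheck.

```shell
# send_if_nonempty CMD [ARGS...]: read a report on stdin and pipe it to CMD
# only when the report contains at least one character.
send_if_nonempty() {
    report=$(cat)
    if [ -n "$report" ]; then
        printf '%s\n' "$report" | "$@"
    fi
}

# Intended use (not run here):
#   analyze -ps -r -Q /var/adm/sa/sar23 | send_if_nonempty mail root@wherever.com
```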
Example 7: Suppressing the detection of memory leaks. False alarms are not uncommon when SarCheck attempts to detect memory leaks. Some programs, such as those found on systems running Oracle, will grow over time. This is apparently a deliberate memory leak, and depending on the behavior of programs running on your system, you may want to suppress the reporting of memory leaks, runaway processes, or unusually large processes. The -pml switch can be used to suppress the reporting of memory leaks as follows:
analyze -ps -pml /var/adm/sa/sar23
Example 8: Specifying a different ps -elf file. If you move the sarcheck files to a directory other than /opt/sarcheck and you want to analyze ps -elf data, you have to tell the analyze program where to find the data. This example looks at the sar report sar23 and the ps -elf data in /tmp/pselffile:
analyze -ps -pv /tmp/pselffile /var/adm/sa/sar23
Q. Why do I get the message "sarcheck: not found"?
A. Your PATH variable does not contain the /usr/local/bin directory, or the symbolic link in /usr/bin/sarcheck is not there. The cause of the message "analyze: not found" is the same.
Q. Why do I get the message "/usr/local/bin/sarcheck: 2525 Memory fault - core dumped"?
A. This error is typically caused by a bug in sar which is triggered when you ask the sarcheck script to analyze sar data, but you provide the name of a sar report. Note that the number which appears in the error message will vary.
Q. Why do I get the message "lc: sa[0-3][0-9] not found: No such file or directory (error 2)"?
A. This message indicates that there are no sar data files for sarcheck to analyze. The most common cause of this message is that there's a problem with the way sar is set up, or maybe the kernel was built at least 7 days ago and the system has not been rebooted.
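A quick way to confirm this diagnosis is to count the files whose names match the sa[0-3][0-9] pattern from the error message. The sketch below does that; the function name count_sa_files is our own, and /var/adm/sa is only our assumption about where your sar data files are kept.

```shell
# count_sa_files DIR: print how many regular files in DIR match sa[0-3][0-9].
count_sa_files() {
    count=0
    for f in "$1"/sa[0-3][0-9]; do
        # If nothing matches, the pattern is left unexpanded and -f fails.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Intended use (the directory is an assumption):
#   count_sa_files /var/adm/sa
```

A result of 0 means that there are no sar data files in that directory for sarcheck to analyze.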
Q. Why does SarCheck tell me that my fast new disks are slow?
A. The speed of disks (as reported by the manufacturer) is usually better than the speed reported by sar. Soft I/O errors, poor locality of reference, and problems with the disk controllers are frequently responsible. Unfortunately, sar doesn't give us enough information to identify the true cause or recommend a solution.
Q. Should I implement recommendations which only show up occasionally?
A. Feel free to try, but implement the regularly occurring recommendations first, since those will address the most frequently occurring problems. If SarCheck occasionally recommends increasing the amount of memory, you should certainly try it. On systems with some extra memory, SarCheck will be able to make additional recommendations that could not be made on systems where memory is "tight".
Q. Every time I make changes based on SarCheck's recommendations, it makes more recommendations. Why doesn't it just figure out the correct values for all the parameters?
A. That's not how real performance tuning works. There are no "correct" values because tuning is a series of compromises between various system resources. Performance tuning is trial and error, and gradual change is the only way to do it. Page 433 of Henriksen & Henriksen's "UnixWare 7 System Administration" provides some guidance on correct tuning methodology.
UNIX System V Performance Management. 1994. Englewood Cliffs, NJ.: PTR Prentice Hall. ISBN 0-13-106429-1.
SCO Open Desktop/SCO Open Server System Administrators Guide. 1994. Englewood Cliffs, NJ.: PTR Prentice Hall. ISBN 0-13-106808-3.
Henriksen & Henriksen. UnixWare 7 System Administration. 1999. Macmillan Technical Publishing. ISBN 1-57870-080-9.
Loukides, M. System Performance Tuning. 1991. Sebastopol, CA.: O'Reilly & Associates, Inc. ISBN 0-937175-60-9.
Majidimehr, A. Optimizing UNIX for Performance. 1996. Englewood Cliffs, NJ.: PTR Prentice Hall. ISBN 0-13-111551-0.
Miscovich, G. and Simons, D. The SCO Performance Tuning Handbook. 1994. Englewood Cliffs, NJ.: PTR Prentice Hall. ISBN 0-13-102690-9. (Written for SCO UNIX version 3.2v4, but well worth the money)
SCO OpenServer Performance Guide. 1995. Santa Cruz, CA.: The Santa Cruz Operation, Inc. (Included with the OpenServer Release 5 documentation set)
We'd like to thank the following people for their suggestions, ideas, support, and yes, even a few bug reports:
D. J. Blackwood, William Drescher, Robert P. Fries, Steve Gardiner, Jeff Hyman, Berni Jubb, Peter Kettle, Bob Long, Nancy Lorenz, Bela Lubkin, Rich Marotta, Gene Martin, Tom Melvin, Jim Pazarena, Lee Penn, Tom Podnar, Jean-Pierre Radley, Charlie Russel, David Simons, Bob Willey, and a number of others.