SarCheck can convert sar reports into CSV format, which is used by many spreadsheets and graphing tools. It can also produce graphs if you have a version of gnuplot installed that supports PNG or JPEG output. If you ask for HTML output and the production of graphs, SarCheck will insert the graphs into the HTML document. The graphs are inserted with <img> tags, each with a description in the tag's alt attribute in order to meet accessibility requirements.
Important: SarCheck has not been designed to analyze sar data from one system on another. This is because not all of the data needed is in the sar and ps reports, and SarCheck has to look for additional data in the kernel. If SarCheck uses sar data from one system and kernel data from another system, it will issue a warning.
SarCheck's recommendations are designed to produce incremental improvements, so SarCheck should be run regularly. No attempt is made to guess the ultimately correct value for any parameter based on a single day's sar data. Instead, SarCheck will recommend that you increase or decrease values based on the data available, and will continue to recommend changes until there is no more room for improvement. Performance tuning is, by definition, a process of trial and error. SarCheck will not only help you to make those changes, but will also explain the reasons for each recommendation.
SarCheck is different from other performance tools because it does not monitor system activity. In much the same way that a UNIX performance expert would approach the problem, SarCheck analyzes the data available with sar, ps, and other utilities you've already paid for. Since sar is included with the operating system, we didn't see a need to create yet another monitor for you to buy.
SarCheck can be run from the command line, as a cron job, or from a menu-driven front end script. For reasons of safety and security, SarCheck will not attempt to change tunable parameters or anything else in the kernel.
Based on its analysis of the resources and statistics described above, SarCheck may recommend a variety of steps which can be taken to improve system performance.
The first step should be to decide where SarCheck will reside.
You may accept all of SarCheck's defaults. We recommend this to eliminate confusion with the installation of future updates. If you have a sarcheck_parms file created in /usr/local/etc you will need to copy it to /opt/sarcheck/etc. You should also move the ps -elf data to /opt/sarcheck/ps.
As an alternative, you may use SarCheck in /opt/sarcheck but leave the ps -elf data in /usr/local/ps. If you choose this approach, create a sarcheck_parms file in the /opt/sarcheck/etc/ directory and add the following entry to tell SarCheck where to find the ps -elf data:
PSELFDIR /usr/local
Finally, you may want to keep everything as it is now by moving SarCheck to /usr/local. The best way to do this is to move SarCheck from /opt/sarcheck to /usr/local and then use the SARCHECKDIR environment variable to point to /usr/local. For more information, see the section entitled How to move SarCheck to another directory.
To install the software, log in as root, put the compressed file in /tmp, then uncompress and detar it. This only takes a few seconds.
dosread scaix.taz | zcat | tar xvf -
This will install SarCheck on your system. See the section entitled Files included in this release for details. The installation of SarCheck does not require rebuilding the kernel. This is important because it means that SarCheck will not increase the size of your kernel, and you won't have to reboot your system. Setting up sar may require a reboot but usually it doesn't.
To test SarCheck, type
/opt/sarcheck/bin/analyze /opt/sarcheck/etc/aixsar22 | more
Warning: Do not implement the recommendations produced by analyzing the test file aixsar22! This file has been included for test purposes only.
To reduce typing, you may want to add /opt/sarcheck/bin to root's PATH.
To install the software, log in as root and then uncompress and detar it. This only takes a few seconds.
zcat < scaix.taz | tar xvf -
This will install SarCheck on your system. See the section entitled Files included in this release for details.
To test SarCheck, type
/opt/sarcheck/bin/analyze /opt/sarcheck/etc/aixsar22 | more
Warning: Do not implement the recommendations produced by analyzing the test file aixsar22! This file has been included for test purposes only.
To reduce typing, you may want to add /opt/sarcheck/bin to root's PATH.
Uncomment the entries in adm's crontab file which run the sa1 and sa2 scripts. These can be found in the file /var/spool/cron/crontabs/adm. As a rule, that file should be edited with smit or crontab -e whenever possible.
Here are some recommended cron entries. These entries will capture data once an hour at non-peak times and every 20 minutes during the system's busiest times. Feel free to modify these entries to best capture statistics from your system's busiest times. We recommend capturing sar data every 10 to 60 minutes.
The entries you'll uncomment should look something like this:
#0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3 &
#0 * * * 0,6 /usr/lib/sa/sa1 &
#0 18-7 * * 1-5 /usr/lib/sa/sa1 &
#5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 3600 -ubcwyaqvm &
For best results, change the line that runs the sa2 script to look like this:
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 3600 -A &
The -A switch will cause all of the data collected by sar to be reported. The -ubcwyaqvm switch prevents some data from being reported.
As an alternative, the following cron entries are oriented towards the 24x7 monitoring that many administrators prefer:
0 * * * * /usr/lib/sa/sa1 1200 3 &
45 23 * * * /usr/lib/sa/sa2 -A &
If you have edited the crontab entries with vi instead of crontab -e or smit, it may be necessary to reboot the system for the changes to take effect.
0,20,40 8-17 * * 1-5 /opt/sarcheck/bin/ps1
5 17 * * 1-5 /opt/sarcheck/bin/ps2
As an alternative, the following cron entries are oriented towards the 24x7 monitoring that many administrators prefer:
0,20,40 * * * * /opt/sarcheck/bin/ps1
45 23 * * * /opt/sarcheck/bin/ps2
We recommend using smit or crontab -e to modify the crontab file.
The ps -elf data collected on large systems can take up a considerable amount of space. If you want to store this data somewhere other than /opt/sarcheck/ps, you can specify a different directory with the PSELFDIR keyword in the sarcheck_parms file.
WARNING: If you choose to specify a different directory, be sure to pick a directory that is not used for anything else. The ps2 script removes any file in the ps -elf directory which is more than 14 days old, and you don't want to accidentally remove files which contain something other than ps -elf data.
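Before pointing PSELFDIR at an existing directory, it may help to preview which files an age-based cleanup would remove. The following is a minimal sketch, not part of SarCheck: the function name list_stale_files is hypothetical, and it assumes the cleanup is equivalent to a find with -mtime +14, which the actual script may or may not use.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with SarCheck.
# list_stale_files DIR [DAYS]
# Prints the regular files under DIR that were last modified more than
# DAYS days ago (default 14). Nothing is deleted.
list_stale_files() {
    dir=$1
    days=${2:-14}
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # -mtime +N matches files whose age in whole days is greater than N
    find "$dir" -type f -mtime +"$days" -print
}

# Example (commented out): preview a candidate ps -elf directory
# list_stale_files /bigdisk/sarcheck_ps
```

If the listing shows anything other than old ps -elf data, choose a different directory.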
For the sake of consistency and convenience, we use the precompiled gnuplot binary built by Ready to Run Software, Inc. It is available on our web site at http://www.sarcheck.com/gnuplot/
To set up gnuplot, please follow the instructions on the web site.
SarCheck was not designed to be moved regularly from one system to another in an effort to provide "quick fixes" to a number of systems. Quick fixes will not allow you to take advantage of the long-term iterative tuning that SarCheck makes possible.
/opt/sarcheck/bin/sarcheck
To reduce typing, you may want to add /opt/sarcheck/bin to root's PATH.
A series of choices will appear on the screen. If you accept all the defaults by pressing the Enter key, the previous day's sar data will be analyzed, and this is the easiest way to get started. For security reasons, your account must have permission to access the sar data or report files that you wish to analyze.
The first question will ask you whether you want to analyze sar data or a sar report. Sar data is usually found in /var/adm/sa/sann, where nn is the day of the month. Sar reports are already reduced into a readable form and are usually found in /var/adm/sa/sarnn.
Analyzing the reports will be marginally faster than analyzing the data, but an advantage to analyzing the data is that you can control the start and end times by changing sarcheck's defaults. To change any of the defaults, see the section How to change the SarCheck menu defaults.
Analyze what?      d  sar data (sa files)
                   r  A sar report (sar files)
                   c  Concatenate existing sar reports
                   *  Accept all defaults
                   x  Exit SarCheck
(keyword = DR, default = d): _
After you pick the d or r options, you will be prompted to enter the name of the data or report file. In either case the default will be the statistics from the previous business day. The c option will concatenate all of the sar reports present and will not ask you for the name of a file. The sarcheck script will change your working directory, so you do not have to use the absolute path of the file. To accept all defaults, enter an asterisk; to exit sarcheck, enter an x. To change any of the defaults, see the section How to change the SarCheck menu defaults.
Sar data is usually found in /var/adm/sa/sann. Based on user-definable defaults, data from 08:00 to 17:00 will be analyzed. Enter the name of the sar data file that you wish to analyze.
Available data files in /var/adm/sa:
sa20, sa21, sa25, sa26, sa27, sa28, sa29, sa30
(default = sa29):
Note that if you run sarcheck on a Saturday, Sunday or Monday, Friday's statistics will be analyzed by default. This is because weekend statistics are usually not representative of a "loaded" system, and there is a possibility that misleading recommendations would be generated. To change the default of excluding the analysis of weekend data, see the section How to change the SarCheck menu defaults.
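The weekend rule above can be sketched as a small shell function. This is only an illustration of the rule as described, not the code used by the sarcheck script; the function name is hypothetical, and the day-of-week numbering (1 = Monday through 7 = Sunday) is the one printed by date +%u.

```shell
#!/bin/sh
# Hypothetical illustration of the "previous business day" default.
# days_back_to_business_day DOW
# DOW is the day of the week as printed by `date +%u` (1=Mon ... 7=Sun).
# Prints how many days to go back to reach the previous business day.
days_back_to_business_day() {
    case $1 in
        1) echo 3 ;;              # Monday   -> Friday
        7) echo 2 ;;              # Sunday   -> Friday
        2|3|4|5|6) echo 1 ;;      # Tue..Sat -> the day before
        *) echo "day of week must be 1-7" >&2; return 1 ;;
    esac
}

# Example (commented out):
# days_back_to_business_day "$(date +%u)"
```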
The next option allows you to pick formatting. The default will produce a report with page numbers and page breaks (ctrl-L) included. For users that prefer to paginate the report with another tool, such as pg, the p option will suppress these page breaks. You can also choose to produce an HTML document at this point, and can decide whether to format the disk analysis in table form. HTML documents are best viewed with a web browser. If you wish to exit sarcheck, enter an x.
Pick formatting:   n  Normal, with page breaks
                   p  Page breaks suppressed
                   h  Create HTML document, no disk table
                   t  Create HTML document, with disk table
                   o  Create HTML document, disk table only
                   *  Accept remaining defaults
                   x  exit sarcheck
(keyword = OPT, default = n): _
The verbosity option controls how verbose the SarCheck report is. The default verbose mode may produce a report 5 pages long, while a superquiet report may contain only 5 lines of text. Please note that instructions for implementing recommendations, explanations, and alternate tuning strategies may be suppressed by the quiet modes. When you're first using SarCheck, we recommend using the verbose mode so that you don't miss anything. The superquiet mode will automatically suppress page breaks.
Verbosity level:   v  Verbose mode
                   q  Quiet, most verbiage suppressed
                   Q  Superquiet, all verbiage suppressed
                   *  accept all defaults
                   x  Exit SarCheck
(keyword = VERBOSE, default = v): _
Analysis of ps -elf data will provide you with a closer look at memory bottlenecks and the ability to detect runaway processes and memory leaks. The enhanced sensitivity option increases the probability of generating "false alarms". See the section entitled Setup of ps -elf data collection for more information. If you wish to exit sarcheck, enter an x.
Analyze ps -elf    n  No, analyze sar data only
data?              y  Yes, analyze sar and ps data
                   e  Enhanced sensitivity of ps data analysis
                   *  accept all defaults
                   x  exit sarcheck
(keyword = PSELFOPT, default = y): _
The disk filtration option is useful for large systems and has no effect on smaller systems. If more than 12 disks are present, sarcheck will only print a paragraph on each disk which has significant activity. SarCheck will try to determine what "significant" means on your system. If you find that SarCheck's filtration is too aggressive, use "z" to show all disks with some activity. We have discovered that on systems with several thousand disk devices, it is common for most of them to be completely idle. The "z" option will filter out only disks with no activity at all.
Disk filtration:   y  Filter disk analysis more than 12 disks.
                   n  Analyze all disk activity if more than 12 seen.
                   z  Analyze all disks with some activity.
                   *  Accept all defaults
                   x  Exit sarcheck
(keyword = DISKFLTR, default = y):
The tabular summary is used to print a summary of statistics in table form at the end of the report, and an example is included later in the manual. If HTML output has been selected, an HTML table is created. This option is useful for transferring statistics to a spreadsheet or graphics program, for producing output which can be easily parsed by other programs, and for generating an easy-to-read table at the end of an HTML page.
Tabular Summary?   y  Print a tabular summary at the end of the report
                   i  Print a tabular summary instead of the report
                   n  Print the report without a summary
                   *  Accept remaining defaults
                   x  exit sarcheck
(keyword = TABULAR, default = y):
This option is used to decide where to send the analysis. Note that some of these choices will be different based on the pager you use and any modifications made to the defaults.
If you choose to send the output to a file, you'll be prompted for the name of the file. The default file name is /tmp/yyyymmddhhmmss, which is a date/time stamp. You can modify the default by editing the sarcheck script. If you wish to exit sarcheck, enter an x.
Send output to:    1  more (the screen)
                   2  lp -s (a printer)
                   3  A file
                   x  exit sarcheck
(keyword = OUTOPT, default = 1): _
/opt/sarcheck/bin/analyze sar12
To reduce typing, you may want to add /opt/sarcheck/bin to root's PATH.
For best results, pipe the output to more so that you can read it, or redirect it to a file if you want to save it. A report will be produced which contains information about your system, a brief summary, a recommendations section (if applicable), a resource analysis section, and a capacity planning section (if not suppressed). For more information, see the section entitled How to Interpret the Analysis.
For users that prefer to paginate the report with another utility, such as pg, the -p option will suppress page numbers and page breaks. To take advantage of this option, type:
/opt/sarcheck/bin/analyze -p sar12 | pg
In order to print a SarCheck analysis every weeknight, use the following entry:
5 18 * * 1-5 /opt/sarcheck/bin/analyze /var/adm/sa/sar`date +\%d` | lp -s
To keep all of SarCheck's recommendations in the /usr/ops directory, use the following entry:
5 18 * * 1-5 /opt/sarcheck/bin/analyze /var/adm/sa/sar`date +\%d` > /usr/ops/`date +\%y\%m\%d`
Because the output of the analyze program is stdout, you can pipe or redirect it in lots of ways. It can be printed, mailed, stored... whatever works best in your environment.
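A note on the cron entries above: in a crontab, an unescaped percent sign ends the command and the rest of the line is passed to it as standard input, which is why the entries use \%. In a script or at the shell prompt the backslashes are not needed. The sketch below shows the same dated file name being built in a script; the function name dated_report_path is hypothetical, and /usr/ops is simply the directory used in the example entry.

```shell
#!/bin/sh
# Hypothetical helper: build the dated output path used in the cron example.
# dated_report_path [DIR]
# Prints DIR/yymmdd for today's date (DIR defaults to /usr/ops).
dated_report_path() {
    dir=${1:-/usr/ops}
    # Plain % is correct here; only crontab lines need \%
    echo "$dir/$(date +%y%m%d)"
}

# Example (commented out): what a wrapper script run from cron might do
# /opt/sarcheck/bin/analyze "/var/adm/sa/sar$(date +%d)" > "$(dated_report_path)"
```

Putting the command in a small wrapper script like this keeps the crontab entry short and avoids the escaping altogether.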
cat /var/adm/sa/sar0[1-7] > /tmp/multisar
/opt/sarcheck/bin/analyze /tmp/multisar | more
Please note that the analyze program does not work if wildcard characters are used as a filename. Wildcard characters should be used with the cat command in order to produce a single file for the analyze program. Once you've become used to working with concatenated sar reports, you'll probably discover that the find command in /usr/lib/sa/sa2 removes the sar reports too quickly, and the naming convention used in /usr/lib/sa/sa1 is too restrictive. Make copies of the original sa1 and sa2 scripts, and then modify them to meet your needs.
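If some days are missing from the sar directory, a wildcard passed to cat can silently produce a shorter file than you expected. The following sketch concatenates an explicit list of days and reports any that are missing. It is not part of SarCheck, and the function name concat_sar_reports is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with SarCheck.
# concat_sar_reports DIR OUTFILE DAY...
# Concatenates DIR/sarDAY for each DAY given, in the order given, into
# OUTFILE. Missing reports are skipped with a message on stderr.
# Returns non-zero if no report at all was found.
concat_sar_reports() {
    dir=$1
    out=$2
    shift 2
    : > "$out" || return 1          # create or truncate the output file
    found=0
    for day in "$@"; do
        f="$dir/sar$day"
        if [ -r "$f" ]; then
            cat "$f" >> "$out"
            found=$((found + 1))
        else
            echo "skipping missing report: $f" >&2
        fi
    done
    [ "$found" -gt 0 ]
}

# Example (commented out):
# concat_sar_reports /var/adm/sa /tmp/multisar 01 02 03 04 05 06 07 &&
#     /opt/sarcheck/bin/analyze /tmp/multisar | more
```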
The keywords for sarcheck's menu options can be found when running the sarcheck script, and the value should be one of the choices on the menu. For example, if you want the output of SarCheck to include a tabular summary at the end of the report, here is the menu selection that you will see:
Tabular Summary?   y  Print a tabular summary at the end of the report
                   i  Print a tabular summary instead of the report
                   n  Print the report without a summary
                   *  Accept remaining defaults
                   x  exit sarcheck
(keyword = TABULAR, default = n):
You can see the name of the keyword and the options available. To change the default from 'n' to 'y' for this menu item, add the following line to the sarcheck_parms file:
TABULAR y
Now when you run the sarcheck script, the default behavior will be to print a tabular summary at the end of the report.
After the keyword and its value are parsed by the sarcheck script, the rest of the line is ignored and is available as a comment. Any line that starts with something other than a valid keyword is also treated as a comment and is ignored.
Once you have decided to change the defaults, create or edit the sarcheck_parms file. Here is an example of a sarcheck_parms file where the starting and ending times used for analysis have been changed, page numbering is suppressed, and a tabular summary is printed at the end of the report. Note that since the sarcheck script only looks at the first two fields on each line, the rest of the line is treated as a comment and lines that don't start with valid keywords are also treated as comments:
# file to customize sarcheck created by
# Jess the sys admin on March 23, 2002
#
ST 06:00 starting time is 6AM
EN 15:00 ending time is 3PM
OPT p suppress page numbering
TABULAR y add a tabular summary
GRAPHDIR /diskfarm/sarcheck/images
A complete list of keywords supported in the sarcheck_parms file can be found in Appendix A.
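To check what a sarcheck_parms file will actually set, it can help to strip the comments and look at only the first two fields of each keyword line. The sketch below mimics the two-field rule described above; it is not the parser used by the sarcheck script. The function name show_parms is hypothetical, and the keyword list contains only keywords mentioned in this manual, not the complete list from Appendix A.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with SarCheck.
# show_parms FILE
# Prints "KEYWORD VALUE" for every line whose first field is a known
# keyword. Everything after the second field, and every line that does
# not start with a known keyword, is treated as a comment.
show_parms() {
    awk -v keys="ST EN OPT TABULAR GRAPHDIR PSELFDIR SARCHECKDIR DR VERBOSE PSELFOPT DISKFLTR OUTOPT" '
        BEGIN {
            n = split(keys, k, " ")
            for (i = 1; i <= n; i++) valid[k[i]] = 1
        }
        NF >= 2 && ($1 in valid) { print $1, $2 }
    ' "$1"
}

# Example (commented out):
# show_parms /opt/sarcheck/etc/sarcheck_parms
```

Run against the example file above, this would print only the five keyword and value pairs, without the trailing comments.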
The first way is to let SarCheck build graphs with the gnuplot utility. By adding the -png, -jpg, or -jpeg switches, SarCheck can use gnuplot to produce PNG or JPEG graphs and can insert those graphs in the HTML output of SarCheck. This will enable you to post some really interesting SarCheck reports on your corporate intranet. To produce an HTML report with PNG graphs, use the -html and -png switches when running analyze. For example, the command
analyze -html -png sar12 > rpt12.html
will produce an HTML report which can be read by your favorite browser. The most important parts of the report will be printed in bold type and headings are used to clarify what you're looking at. Graphs are inserted in appropriate places in the body of the report and some additional text is added to help explain the significance of the graphs. For more information, see Appendix B: Options available when running 'analyze'.
If you want SarCheck to produce graphs without the accompanying SarCheck report, use the -gonly switch to produce "graphs only".
The second way to produce graphs is by exporting CSV (Comma Separated Value) formatted data to a graphing program. The -gr switch in the analyze program will turn a sar report into output in CSV format. This format is easily understood by most spreadsheets. It's more work, but it is the best way to go if you don't have gnuplot or want to produce custom graphs.
Example 1: Analyzing a sar report. We're going to start with the simplest possible example. The command below will run the analyze program and tell it to analyze the sar report file /var/adm/sa/sar23.
analyze /var/adm/sa/sar23
Example 2: Removing the page breaks. The -p switch removes the page breaks. This is especially useful when piping the report to pg instead of more. A number of other switches, the -html switch for example, will automatically invoke -p where it makes sense.
analyze -p /var/adm/sa/sar23
Example 3: Analyzing ps -elf output in conjunction with the sar report. The /opt/sarcheck/bin/ps1 script is used to collect ps -elf data which can be used in conjunction with the sar report. The -ps switch tells the analyze program to search for the ps -elf data and include it in the report if possible.
analyze -ps /var/adm/sa/sar23
Example 4: Creating an HTML-formatted report. Let's combine a few switches this time. The -html switch makes the -p switch unnecessary, but we want to incorporate ps -elf data into the analysis and we want to see a quick table of some statistics at the end of the report.
analyze -html -ps -t /var/adm/sa/sar23
Example 5: Creating an HTML-formatted report with embedded graphs. This is the same as the previous example, except that graphs are now embedded in the HTML output and they will be visible when the output is viewed with a browser. For this to work properly, you must have a copy of gnuplot installed, SarCheck must be able to find it, and the output must be redirected to a file. The file is then opened with a browser that can display PNG graphs. You can also use the -jpeg or -jpg switches if your version of gnuplot supports jpeg output.
analyze -html -ps -t -png /var/adm/sa/sar23 > sar23.html
Example 6: Creating an HTML-formatted report, but this time on a big system. This time we're analyzing sar data from a large system, and there are 100 disks on the system. By default, SarCheck will filter out information on 'uninteresting' disks, but it will still produce a paragraph on each disk. This can get a little ridiculous and hard to read, so we'll use the -dtbl switch to format the disk information into an HTML table, and the -dbusy switch to sort the disk information so that the busiest disks are at the top of the table. The -ptbl switch is also being used to format ps -elf statistics into a table. As with the -t switch, cells will be colored if SarCheck wants to draw your attention to specific data. Note that the -dtbl and -ptbl switches are most useful with the -html switch.
analyze -html -ptbl -t -dtbl -dbusy /var/adm/sa/sar23
Example 7: Emailing the output. If you manage a network of 200 systems, you may want to email interesting SarCheck reports to yourself, but you probably don't want to be spammed with 200 messages a day saying that everything's okay. The -r switch prevents SarCheck from producing any output at all if there are no recommendations. The -Q switch reduces verbiage to a minimum.
analyze -ps -r -Q /var/adm/sa/sar23 | mail root@wherever.com
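Some mail programs will still send a message when their input is empty. If yours does, a small wrapper can check the output before mailing it. The following is a sketch under that assumption and is not part of SarCheck: the function name mail_if_output and the MAILER variable are hypothetical, and the address is a placeholder.

```shell
#!/bin/sh
# Hypothetical wrapper, not shipped with SarCheck.
# mail_if_output RECIPIENT COMMAND [ARGS...]
# Runs COMMAND, and mails its standard output to RECIPIENT only if the
# output is not empty. Set MAILER to use a mail program other than "mail".
mail_if_output() {
    rcpt=$1
    shift
    tmp=${TMPDIR:-/tmp}/sarcheck_mail.$$
    "$@" > "$tmp"
    status=0
    if [ -s "$tmp" ]; then          # -s: file exists and is not empty
        ${MAILER:-mail} "$rcpt" < "$tmp"
        status=$?
    fi
    rm -f "$tmp"
    return $status
}

# Example (commented out):
# mail_if_output root@wherever.com \
#     /opt/sarcheck/bin/analyze -ps -r -Q /var/adm/sa/sar23
```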
Example 8: Suppressing the detection of memory leaks. False alarms are not uncommon when SarCheck attempts to detect memory leaks. Some programs, such as those found on systems running Oracle, will grow over time. This is apparently a deliberate memory leak, and depending on the behavior of programs running on your system, you may want to suppress the reporting of memory leaks, runaway processes, or unusually large processes. The -pml switch can be used to suppress memory leaks as follows:
analyze -ps -pml /var/adm/sa/sar23
Example 9: Specifying a different ps -elf file. If you move the sarcheck files to a directory other than /opt/sarcheck and you want to analyze ps -elf data, you have to tell the analyze program where to find the data. This example looks at the sar report sar23 and the ps -elf data in the file /tmp/pselffile:
analyze -pf /tmp/pselffile /var/adm/sa/sar23
Example 10: Generating graphs from a sar report. A sar report can be turned into a .csv file which can be graphed with a spreadsheet using the -gr switch:
analyze -gr /var/adm/sa/sar23 > graph.csv
Example A will use an environment variable called SARCHECKDIR.
SARCHECKDIR=/tmp/sarcheck
export SARCHECKDIR
Step 2: Create the new directories for SarCheck: /tmp/sarcheck/bin, /tmp/sarcheck/etc, /tmp/sarcheck/ps, and /tmp/sarcheck/doc.
Step 3: Move the existing files to the new directories
mv /opt/sarcheck/bin/* /tmp/sarcheck/bin
mv /opt/sarcheck/etc/* /tmp/sarcheck/etc
mv /opt/sarcheck/ps/* /tmp/sarcheck/ps
mv /opt/sarcheck/doc/* /tmp/sarcheck/doc
Step 4: If SarCheck is running in a process created after this change, it should recognize the environment variable. For details, see the man page for 'environment'.
Step 5: Type echo $SARCHECKDIR to verify the new location. To reduce typing, you may want to add the new location to your PATH:
PATH=$PATH:/tmp/sarcheck/bin
Step 2: Move the existing files to the new directories
mv /opt/sarcheck/bin/* /tmp/sarcheck/bin
mv /opt/sarcheck/etc/* /tmp/sarcheck/etc
mv /opt/sarcheck/ps/* /tmp/sarcheck/ps
mv /opt/sarcheck/doc/* /tmp/sarcheck/doc
Step 3: Create the file /opt/sarcheck/etc/sarcheck_parms (yes, you might have just moved a file with this name) and add the following line to the file with your favorite editor:
SARCHECKDIR /tmp/sarcheck
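The directory creation and file moves in the steps above can be combined into one function. This is a minimal sketch rather than a supported script: the function name relocate_tree is hypothetical, and it assumes the four subdirectories named in the steps (bin, etc, ps, and doc) are the only ones that need to move.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with SarCheck.
# relocate_tree SRC DEST
# Creates DEST/bin, DEST/etc, DEST/ps and DEST/doc, then moves the
# contents of the matching SRC subdirectories into them.
relocate_tree() {
    src=$1
    dest=$2
    for sub in bin etc ps doc; do
        mkdir -p "$dest/$sub" || return 1
        [ -d "$src/$sub" ] || continue
        for f in "$src/$sub"/*; do
            [ -e "$f" ] || continue     # empty directory: nothing matched
            mv "$f" "$dest/$sub/" || return 1
        done
    done
    return 0
}

# Example (commented out), run as root:
# relocate_tree /opt/sarcheck /tmp/sarcheck
```

After the move, you still need to tell SarCheck about the new location with the SARCHECKDIR environment variable or the sarcheck_parms entry, as described above.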
Analyze the data or report files that represent the busiest days. Determine the busiest times of the day, and if necessary, modify the menu defaults to analyze data from only that time period. To change any of the defaults, see the section How to change the SarCheck menu defaults.
For example, if your system is in use from 8:00AM to 5:00PM, and runs batch jobs and backups at night with plenty of time to spare, performance is probably most important during the day. To help SarCheck produce the most useful analysis, use a command such as:
sar -A -s8 -e17 > reportfile
This command will use all options (-A), and will only include data collected between 8:00AM (-s8) and 5:00PM (-e17) in the report. If the busiest day in the last few weeks was the 29th of the month, and you want to produce a report of system activity between 8:00 and 5:00 on that day, use the following command:
sar -A -s8 -e17 -f /var/adm/sa/sa29 > report29
Of course, this will only work if /var/adm/sa/sa29 actually exists. Sar data should be regularly collected by a crontab entry.
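A small wrapper can check that the data file exists before running sar, so that a missing file produces a clear message instead of an empty report. The sketch below uses the same sar switches as the command above; the function name make_sar_report and the SAR variable are hypothetical and are not part of SarCheck.

```shell
#!/bin/sh
# Hypothetical wrapper, not shipped with SarCheck.
# make_sar_report DAY OUTFILE [SA_DIR]
# Produces a sar report covering 8:00 to 17:00 from SA_DIR/saDAY
# (SA_DIR defaults to /var/adm/sa). Fails if the data file is missing.
make_sar_report() {
    day=$1
    out=$2
    sadir=${3:-/var/adm/sa}
    data="$sadir/sa$day"
    if [ ! -r "$data" ]; then
        echo "sar data file not found: $data" >&2
        return 1
    fi
    # Set SAR to override the sar binary that is used
    ${SAR:-sar} -A -s8 -e17 -f "$data" > "$out"
}

# Example (commented out):
# make_sar_report 29 report29 && /opt/sarcheck/bin/analyze report29 | more
```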
It's important to analyze reports from days when the processing load was greatest. It may be that on those days, SarCheck will find resource bottlenecks which did not exist on days when the system did less work.
If you analyze data from the weekend, SarCheck may tell you how to optimize the system for weekend processing. Whether that makes sense or not in your environment (and it frequently won't) is up to you.
SarCheck does not need (and cannot use) sar reports produced with sar's -P switch. Please do not use the -P switch when producing reports for SarCheck to analyze.
/opt/sarcheck/bin/analyze -h
This sends a subset of the instructions found in this manual to standard output (stdout), which defaults to the screen.
A FAQ section can also be found at the end of this manual, and updated information can be found on the SarCheck web site (http://www.sarcheck.com/).
Important: When data from one system is analyzed on another, it is likely to result in incorrect or misleading recommendations. A warning message will appear if the name of the system on the sar report is different than the name of the system running SarCheck, or if the operating system version recorded by sar is different than the version reported by the uname command. This is because the values of tunable parameters, memory size, etc., are likely to be different on different systems.
Warning messages will appear if impossible data is seen in the sar report. Examples would be CPU utilization of 313% or a swap queue occupancy of -88%. The type of sar data which contains the problem will be identified. SarCheck will still produce a report, but you should realize that the analysis of anomalous data is, as always, likely to follow the rule of 'garbage in, garbage out'.
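The kind of sanity check described above can be illustrated with a few lines of awk. This is only a sketch of the idea and not SarCheck's own test: the function name flag_impossible_pct is hypothetical, and the input format (a label followed by a percentage on each line) is an assumption made for the example, not the layout of a sar report.

```shell
#!/bin/sh
# Hypothetical illustration, not SarCheck's algorithm.
# flag_impossible_pct
# Reads "label value" pairs on standard input and prints a message for
# every percentage that is below 0 or above 100.
flag_impossible_pct() {
    awk '$2 + 0 < 0 || $2 + 0 > 100 {
        printf "impossible value for %s: %s%%\n", $1, $2
    }'
}

# Example (commented out):
# printf 'cpu 313\nswpq-occ -88\ncpu 45\n' | flag_impossible_pct
```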
The Summary section will highlight any bottlenecks that were seen in the areas of CPU, memory, or I/O, and will indicate if any kernel parameters need to be changed. If no bottlenecks are seen, the summary will say so, and point out that no recommendations will be made.
If runaway processes, memory leaks, or suspiciously large processes have been detected, a message will appear at the end of the Summary section.
The Recommendations section is present only if SarCheck has recommendations to make. If SarCheck thinks that everything is fine, no recommendations will be made. This is a normal condition and once the system is properly tuned, you should not be surprised to see a lack of recommendations.
The recommendations are based solely on the data contained in the sar file and the values of various tunable parameters, and should be taken in that context. For example, if batch jobs are run on Saturdays, and SarCheck analyzes statistics from that day, it may decide that an I/O bottleneck existed and spare memory was present, and therefore, an increase in buffer size may be appropriate. Following these recommendations may improve performance on Saturdays, but could hurt performance during the week by reducing the amount of memory available to users.
The changes to tunable parameters recommended by SarCheck are deliberately small. Gradual changes produce slow, steady improvement and prevent the unanticipated side effects that a major change in a tunable parameter can cause.
Due to the interrelationships between tunable parameters and system resources, SarCheck goes beyond the basic rules of thumb whenever possible.
The Resource Analysis section translates the data contained in the sar report into English. Much of this data is provided for reference, and explanations are given where appropriate. The implications of various statistics regarding CPU utilization, buffer sizing, memory utilization, system table sizes, and disk I/O bandwidth are presented in this section.
The times when key resources are most heavily used appear in this section. If these times correlate well with the times that performance degradation was reported, it can be inferred that exhaustion of these resources may be a cause of performance problems. Peak usage statistics are also used by the capacity planning section.
The Capacity Planning section can be used to approximate the amount of capacity left on the system, based solely on the sar data being analyzed. CPU, memory, and disk statistics are examined in order to determine which resource is likely to become exhausted first.
This section is not meant to perform the same functions as the more expensive tools available for large systems. It is designed to help meet the needs of system administrators, many of whom are managing growing systems and need to know how much "room" is left before various resources become exhausted.
The exhaustion of resources is defined as any single interval in which CPU usage exceeded 90 percent, a disk was busy more than 75 percent of the time, or swapping was detected. Because the interval with the greatest resource usage is used, the capacity planning report will be less accurate if peak resource use occurred during an interval of less than 10 minutes. Note that these thresholds can be overridden with the sarcheck_parms file.
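The definition of exhaustion given above can be written as a simple test on one interval's statistics. The sketch below restates the default thresholds from the text and is not SarCheck's implementation. The function name is_exhausted is hypothetical, and its optional fourth and fifth arguments are stand-ins for overridden thresholds, not sarcheck_parms keywords.

```shell
#!/bin/sh
# Hypothetical illustration, not SarCheck's algorithm.
# is_exhausted CPU_PCT DISK_BUSY_PCT SWAPPING [CPU_MAX] [DISK_MAX]
# Succeeds (exit status 0) if the interval counts as exhausted: CPU usage
# above CPU_MAX (default 90), a disk busy more than DISK_MAX percent of
# the time (default 75), or any swapping (SWAPPING not equal to 0).
is_exhausted() {
    awk -v cpu="$1" -v disk="$2" -v swap="$3" \
        -v cpu_max="${4:-90}" -v disk_max="${5:-75}" 'BEGIN {
        hit = (cpu + 0 > cpu_max + 0) || (disk + 0 > disk_max + 0) || (swap + 0 != 0)
        exit !hit
    }'
}

# Example (commented out):
# if is_exhausted 93.5 40 0; then echo "resource exhausted"; fi
```

Note that values exactly at a threshold (90 percent CPU or 75 percent disk busy) do not count as exhausted, since the text says the threshold must be exceeded.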
The Custom Settings section is where both successful and unsuccessful changes to SarCheck's default thresholds are reported. See the sections How to change the menu defaults and How to change SarCheck's algorithms for more information.
Disclaimers, trademark information, etc. At the end of the report is a disclaimer, trademark and copyright information, your software serial number, code version, licensee, and if applicable, the software's expiration date.
In some parts of the world, local resellers may charge prices which are higher than our list price because they pay for the currency conversions, international shipping, duties, support, etc. We urge our customers to support their resellers.
Call us at +1-603-382-4200,
fax us at +1-603-382-4247,
write to us at PO Box 1033, Plaistow NH 03865, USA,
use our email address: support@sarcheck.com,
visit our web site at http://www.sarcheck.com/
or contact the party from whom you purchased SarCheck.
/opt/sarcheck/bin/analyze: This program performs the analysis.
/opt/sarcheck/bin/sarcheck: This is the front end for analyze. It's a simple Bourne shell script which allows you to analyze the previous business day's sar data by pressing the enter key a few times. Create a sarcheck_parms file if you want to customize this script. See the section How to change the menu defaults for more information.
/opt/sarcheck/etc/analyze.txt: This file contains the text used to produce the analysis. In general, we recommend that you do not modify this file, because it may leave us unable to support the software. Users outside of the United States may modify the spelling of certain words in the file if they wish. For example, the word 'utilization' can be changed to 'utilisation'. If you would like a non-English version of SarCheck, please call us.
/opt/sarcheck/etc/analyze.key: This file contains the activation key. This file is not meant to be edited directly and tampering with it may permanently disable SarCheck.
/opt/sarcheck/bin/ps1: This is a script that collects ps -elf data, and is roughly analogous to the sa1 script used by sar.
/opt/sarcheck/bin/ps2: This is a script that cleans up ps -elf data, and is roughly analogous to a subset of the sa2 script used by sar.
/opt/sarcheck/bin/vmsparse: This is a program called by the ps1 script. It collects data from vmstat -s and stores it in the data files found in /opt/sarcheck/ps. We anticipate collecting more data in this way in the future as we continue to need data which has not been collected by sar. Feel free to run this program from the command line if you want to see the output or measure its resource utilization.
/opt/sarcheck/bin/ondemand: This is a script which can be used to get recommendations that are almost "real time". If your system is slow and you want to collect and analyze data while the system is slow, this script will enable you to do it. We are trying to determine if this script meets your needs and haven't received much feedback. Please let us know what you think.
/opt/sarcheck/doc/aixdoc600.html: This is the document that you are reading.
/opt/sarcheck/etc/aixsar22: A sample sar report.
/opt/sarcheck/etc/sarcheck_parms: This file is not actually included with the SarCheck distribution but you might want to create it in order to modify the SarCheck menu defaults or the thresholds used by SarCheck's algorithms.
SarCheck(TM): Automated Analysis of AIX sar and ps data (English text version 6.02.01)
NOTE: The features and functionality of this beta software may not match the documentation exactly. Please contact us or visit our web site if you have any questions.
This is an analysis of the data contained in the file ./sar22. The data was collected on 06/22/2005, from 00:20:00 to 17:20:00, from the system 'ux3005'. There were 17 data records used to produce this analysis. The operating system used to produce the sar report was Release 5.3 of AIX. The operating system used to produce this report is AIX Release 4.3.3.0. The accuracy of this analysis will be compromised if the sar report and the analysis did not use the same operating system. The system configuration data in the sar report indicated that 8.0 processors were configured. 12288 megabytes of memory were seen in the system configuration data.
Data collected by the ps -elf command on 06/22/2005 from 00:20:00 to 17:20:00, and stored in the file /opt/sarcheck/ps/20050622, will also be analyzed. This program will attempt to match the starting and ending times of the ps -elf data with those of the sar report file named ./sar22.
If the operating system version reported by sar does not match the one you're using to do the analysis or the system names don't match, a warning will be printed.
When the data was collected, no CPU bottleneck could be detected. A memory bottleneck was seen. No significant I/O bottleneck was seen. A change to at least one tunable parameter has been recommended. Limits to future growth have been noted in the Capacity Planning section.
Some of the defaults used by SarCheck's rules have been overridden using the sarcheck_parms file. See the Custom Settings section of the report for more information.
All recommendations contained in this report are based solely on the conditions which were present when the performance data was collected. It is possible that conditions which were not present at that time may cause some of these recommendations to result in worse performance. To minimize this risk, analyze data from several different days, implement only regularly occurring recommendations, and implement them one at a time.
Additional memory may improve performance. More than half of the system's memory was pinned, making less memory available to meet the needs of processes running on the system. If possible, borrow some memory for test purposes, and monitor system performance and resource utilization before and after its installation.
Change the value of maxfree from 1088 to 1984 with the command 'vmo -o maxfree=1984'. The -o flag changes the value of a parameter only until the next reboot. To make the change permanent, use the command 'vmo -p -o maxfree=1984'. The magnitude of this change has been limited to prevent the recommendation of very large changes. Changing this parameter in smaller increments is a much safer way to tune the system. This change is recommended based on formulas discussed at IBM's pSeries Technical University and at the UserBlue conference. The following data was used in this calculation: The maxpgahead value used was 8. The value of lcpu reported by sar was 8.0. The number of active CPUs reported by sysconf is 1.
Change the value of minfree from 960 to 1920 with the command 'vmo -o minfree=1920'. The -o flag changes the value of a parameter only until the next reboot. To make the change permanent, use the command 'vmo -p -o minfree=1920'. The magnitude of this change has been limited to prevent the recommendation of very large changes. Changing this parameter in smaller increments is a much safer way to tune the system. This change is recommended based on formulas discussed at IBM's pSeries Technical University and at the UserBlue conference. The following data was used in this calculation: The number of memory pools seen was 4. The value of lcpu reported by sar was 8.0. The number of active CPUs reported by sysconf is 1.
Change the value of the maxperm parameter to 60 with the command 'vmo -o maxperm%=60'. The -o flag changes the value of a parameter only until the next reboot. To make the change permanent, use the command 'vmo -p -o maxperm%=60'. This should bring the value of maxperm down in the direction that will improve performance. This change will not be helpful if the system's primary function is an nfs server or it is doing a lot of raw database I/O. The recorded value for maxperm was 80.0 percent.
Change the value of the minperm parameter to 15 with the command 'vmo -o minperm%=15'. The -o flag changes the value of a parameter only until the next reboot. To make the change permanent, use the command 'vmo -p -o minperm%=15'. This recommended value has been set to match that of the maxperm recommendation. If you choose not to change maxperm because the system is an nfs server or it is performing a lot of raw database I/O, minperm should not be changed either. The recorded value for minperm was 20.0 percent.
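The four recommendations above all follow the same pattern: one vmo command that lasts until the next reboot and one with -p for a permanent change. A small Python sketch that only builds and prints those command strings (it does not run vmo) is shown below; the function name is illustrative and the values are the ones recommended in this sample report.

```python
def vmo_commands(name, value):
    """Return (temporary, permanent) vmo command strings for one tunable."""
    temporary = f"vmo -o {name}={value}"     # lasts only until the next reboot
    permanent = f"vmo -p -o {name}={value}"  # form the report gives for a permanent change
    return temporary, permanent


# Values taken from the recommendations in this sample report.
recommended = {"maxfree": 1984, "minfree": 1920, "maxperm%": 60, "minperm%": 15}

for name, value in recommended.items():
    temporary, permanent = vmo_commands(name, value)
    print(temporary)
    print(permanent)
```

Printing the commands rather than executing them fits the advice given earlier in the report to implement recommendations one at a time.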
Please note that the formulas typically used to set many parameters can cause problems when manual adjustments are being made.
A CPU upgrade is not recommended because the current CPU had significant unused capacity.
No disk recommendations have been made because no bottleneck was seen.
An average of 29.64 percent of this partition's entitled CPU capacity (%entc) was used during the monitoring period. The percentage peaked at 54.50 from 04:20:03 to 05:20:02. There were 0.82 physical processors in use when the percentage of entitled CPU capacity was at its peak.
The average number of physical processors consumed by this partition (physc) was 0.44. The peak number of physical processors consumed was 0.82 from 04:20:03 to 05:20:02.
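The %entc and physc figures are related: %entc expresses physc as a percentage of the partition's entitled capacity. A minimal Python sketch, assuming that relationship and using only numbers from this report, suggests an entitlement of roughly 1.5 processors. The function name is illustrative.

```python
def entitlement(physc, entc_pct):
    """Estimate entitled capacity from processors consumed and %entc."""
    return physc / (entc_pct / 100.0)


peak = entitlement(0.82, 54.50)     # peak interval, 04:20:03 to 05:20:02
average = entitlement(0.44, 29.64)  # whole monitoring period

print(f"estimate from peak: {peak:.2f}, estimate from average: {average:.2f}")
```

The two estimates differ slightly because the report rounds physc and %entc before printing them.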
Information in this paragraph is taken from the sar -u report. This information may not be completely accurate on a micropartitioned POWER5 system and is provided because people are used to seeing it. Average CPU utilization (%usr + %sys) was only 22.5 percent. This indicates that spare CPU capacity exists. If any performance problems were seen during the entire monitoring period, they were not caused by a lack of CPU power. CPU utilization peaked at 46 percent from 04:20:03 to 05:20:02. The CPU was waiting for I/O (%wio) an average of 6.8 percent of the time. The time that the system was waiting for I/O peaked at 19 percent from 01:20:01 to 03:20:03.
The preceding graph shows the relationship between %entc data and the sum of %usr and %sys. The %entc data is more accurate and should be used instead of the traditional %usr and %sys metrics. The %wio column is probably not very accurate but higher values are likely to indicate times of greater I/O activity. Because the %usr, %sys, and %wio data is not accurate on micropartitioned POWER5 systems, it has not been used to calculate the percent of time that the system was idle.
Information in this paragraph is taken from the runq-sz and %runocc columns in the sar -q report and may not be completely accurate on a micropartitioned POWER5 system. The run queue had an average length of 1.4 which indicates that processes were generally not bound by latent demand for CPU resources. The run queue was usually occupied, despite the lack of a significant run queue length. This condition is usually seen when the number of CPU-intensive processes is low. It is likely that the performance of these processes is closely related to CPU speed.
No buffer cache activity was seen in the sar -b data. This is normal for AIX systems, which typically do not use the traditional buffer cache.
The average rate at which I/O was blocked because an LVM had to wait for pbufs was 0.20 per second. The peak rate was 6.69 per second from 01:00:00 to 01:20:01. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst when LVMs had to wait for pbufs, then a problem may be that the number of pbufs was insufficient. A recommendation to increase the number of pbufs was not made because a memory-poor environment was seen. A recommendation to increase the number of pbufs was not made because the amount of pinned memory was close to the maximum permitted by the maxpin tunable parameter.
The average rate at which I/O was blocked because the kernel had to wait for a free bufstruct (called fsbuf in vmstat -v) was 2.51 per second. The peak rate was 41.73 per second from 00:40:00 to 01:00:00. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst when the kernel had to wait for bufstructs, then a problem may be that bufstructs could not be allocated quickly enough to meet the I/O load. A recommendation to increase the number of bufstructs was not made because a memory-poor environment was seen.

The above graph shows when the rate of I/O blocking was highest. If these times are the ones when performance was poor, it may be possible to improve performance by increasing the appropriate number of buffers.
The average context switch rate (cswch/s) was 5276.59 per second. The context switch rate (cswch/s) peaked at 11278.0 per second from 10:20:01 to 11:20:03. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst during the period of peak context switching, then a problem may be that too many processes were blocked for I/O or IPC.
There was no indication of swapped out processes in the ps -elf data. Processes which have been swapped out are usually found only on systems that have a very severe memory shortage.
The average number of page replacement cycles per second (cycle/s) was 0.015. The number of page replacement cycles per second peaked at 0.06 from 00:20:00 to 01:20:01. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst during the period of peak replacement cycle activity, then a shortage of physical memory may be the performance bottleneck.
The average number of kernel threads waiting to be paged in (swpq-sz) was 1.85. The number of kernel threads waiting to be paged in (swpq-sz) peaked at 2.5 from 02:20:06 to 03:20:03. When the peak was reached, the swap queue was occupied 95 percent of the time. A more useful statistic is sometimes available by multiplying the swpq-sz data by the percent of time the queue was occupied. In this case, the average was 0.83 and the peak was 2.38 from 02:20:06 to 03:20:03. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst when the number of kernel threads waiting to be paged in was at its peak, then a shortage of physical memory may be the performance bottleneck.
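The occupancy-weighted figure can be reproduced with one multiplication. This Python sketch uses the peak values quoted above (a queue length of 2.5, occupied 95 percent of the time) and arrives at about 2.38; the function name is illustrative.

```python
def weighted_queue(swpq_sz, occupancy_pct):
    """Scale the queue length by the fraction of time the queue was occupied."""
    return swpq_sz * occupancy_pct / 100.0


peak = weighted_queue(2.5, 95)
print(f"occupancy-weighted peak swap queue length: {peak:.3f}")
```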
The following graph shows any significant statistics relating to page replacement cycle rate, number of kernel threads waiting to be paged in, and number of swapped processes.

The average page out rate to the paging spaces was 23.36 per second. The paging space page out rate peaked at 78.22 from 02:00:02 to 02:20:01. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst when the paging space page out rate was at its peak, then a shortage of physical memory may be the performance bottleneck. The following graph shows the rate of paging operations to the paging spaces.

The recorded setting for maxpin leaves 2457.60 megabytes of memory unpinnable. A memory-poor environment was seen and more than half of the system's memory was pinned. When most of the memory is pinned, less memory can be freed to meet the needs of processes running on the system.
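The 2457.60 megabyte figure is consistent with the 12288 megabytes of memory reported at the top of this analysis and a maxpin setting of 80 percent. The 80 percent value is an assumption inferred from those two numbers; the report does not print it. A short Python sketch of the arithmetic:

```python
def unpinnable_mb(total_mb, maxpin_pct):
    """Memory, in megabytes, that can never be pinned under a given maxpin percentage."""
    return total_mb * (100.0 - maxpin_pct) / 100.0


# 12288 MB comes from the report; 80 percent is an inferred maxpin value.
print(f"{unpinnable_mb(12288, 80):.2f} megabytes cannot be pinned")
```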

A change to the value of maxperm has been recommended in order to bring it down toward the calculated target value of 16.0 percent. The target is based on the average value of numperm but it is too far from the recorded value of maxperm for the change to be reasonably implemented in one step. By avoiding very large changes, this program minimizes the problems seen in large, traumatic changes. Because this recommendation will not cause a change to the relationship between maxperm or minperm values and the average numperm value, performance improvement is not likely to be dramatic. The calculated average value of numperm was 20.6 percent.
A recommendation has been made to change the value of minperm to 15. This change will preserve the relationship between the minperm and maxperm parameters.
The following graph shows the relationship between numperm, minperm, and maxperm.

A change to the value of maxfree and/or minfree has been recommended based on formulas discussed at IBM's pSeries Technical University. The magnitude of these changes has been limited to prevent the recommendation of very large changes. Changing these parameters in smaller increments is a much safer way to tune the system. The maxpgahead value used was 8. The number of memory pools seen was 4. The value of lcpu reported by sar was 8.0. The number of active CPUs reported by sysconf is 1.
No I/O bottleneck was seen in the sar statistics, therefore no changes are recommended for maxpgahead.
The value of numclust is 1. If fast disk devices, disk arrays, or striped logical volumes are in use, the performance of disk writes could be improved by increasing this value. SarCheck does not have access to enough information about the system's disk devices to make any specific recommendation for tuning numclust.
The minimum multiprogramming level has been set to 2. This is a safe value for small configurations and may be low for larger configurations. This parameter is very dependent on workload and the correct value cannot be determined with sar and ps data. A memory shortage has been seen and a value which is too low may cause performance problems. More information can be found on page 23 of Frank Waters' "AIX Performance Tuning".
The average rate of System V semaphore calls (sema/s) was 0.001 per second. No problems have been seen, and no changes have been recommended for System V semaphore parameters. Note that SarCheck only checks these parameters' relationships to each other since semaphore usage data is not available.
No System V message activity (msg/s) was seen. No problems have been seen, and no changes have been recommended for System V message parameters. Note that SarCheck only checks these parameters' relationships to each other since message usage data is not available.
Semaphore and message parameters are among the most confusing of all kernel parameters. Whenever possible, the complex relationships between parameters are checked automatically by SarCheck, and problems are reported.
There were no times when enforcement of the process threshold limit (kproc-ov) prevented the creation of kernel processes. This indicates that no problems were seen in this area.
The ratio of exec to fork system calls was 1.00. This indicates that PATH variables are efficient.
The average system-wide local I/O rate as measured by the r+w/s column in the sar -d data was 370.53 per second. This I/O rate peaked at 1038 per second from 01:20:01 to 02:20:06.

The following graph shows the average percent busy and service time for up to 5 disks, not sorted with the -dbusy or -dserv switches. Only the first 5 disks to appear in the sar report appear in the graph and these may not be the ones that you want to see. A more useful graph can be created by using the -dbusy or -dserv switches.

Note: 45 disks were present. By default, the presence of more than 12 disks causes SarCheck to only report on the busiest disks. This is meant to control the verbosity of this report. To see all disks included in the report, use the -d option.
The -dtoo switch has been used to format disk statistics into the following table.
Disk Device Statistics

| Disk Device | Average Percent Busy | Peak Percent Busy | Queue Depth when occupied | Average Service Time (ms) |
|---|---|---|---|---|
| hdisk1 | 17.18 | 57.0 | 1.8 | 10.7 |
| hdisk0 | 26.12 | 74.0 | 1.8 | 9.6 |
| hdisk8 | 2.41 | 34.0 | 313.1 | 3.7 |
| hdisk22 | 3.24 | 24.0 | 0.3 | 3.6 |
| hdisk11 | 2.18 | 36.0 | 242.4 | 2.4 |
| hdisk23 | 4.29 | 22.0 | 0.3 | 2.3 |
| hdisk5 | 5.00 | 74.0 | 1293.2 | 12.2 |
| hdisk24 | 1.82 | 27.0 | 0.5 | 3.5 |
| hdisk7 | 2.35 | 32.0 | 350.9 | 5.7 |
| hdisk27 | 7.88 | 19.0 | 0.3 | 3.4 |
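The text above mentions the -dbusy and -dserv switches for sorting disks. As a rough sketch of what ordering this table by average percent busy or by service time might look like, the Python below sorts the ten rows shown. How SarCheck itself orders and filters disks is not documented here, so the sort keys are assumptions.

```python
# (device, avg % busy, peak % busy, queue depth when occupied, avg service time in ms)
rows = [
    ("hdisk1", 17.18, 57.0, 1.8, 10.7),
    ("hdisk0", 26.12, 74.0, 1.8, 9.6),
    ("hdisk8", 2.41, 34.0, 313.1, 3.7),
    ("hdisk22", 3.24, 24.0, 0.3, 3.6),
    ("hdisk11", 2.18, 36.0, 242.4, 2.4),
    ("hdisk23", 4.29, 22.0, 0.3, 2.3),
    ("hdisk5", 5.00, 74.0, 1293.2, 12.2),
    ("hdisk24", 1.82, 27.0, 0.5, 3.5),
    ("hdisk7", 2.35, 32.0, 350.9, 5.7),
    ("hdisk27", 7.88, 19.0, 0.3, 3.4),
]

# Sort descending so the busiest or slowest devices come first.
by_busy = sorted(rows, key=lambda r: r[1], reverse=True)
by_service = sorted(rows, key=lambda r: r[4], reverse=True)

print("Busiest on average:  ", [r[0] for r in by_busy[:5]])
print("Slowest service time:", [r[0] for r in by_service[:5]])
```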
The disk device hdisk1 was busy an average of 17.18 percent of the time and had an average queue length of 1.8 (when occupied). This indicates that the device is not a performance bottleneck. During the peak interval from 01:20:01 to 02:20:06, the disk was 57.0 percent busy. Peak disk busy statistics can be used to help understand performance problems. If performance was worst when the disk was busiest, then a performance bottleneck may be that disk. The average service time reported for this device and its accompanying disk subsystem was 10.7 milliseconds. This is relatively fast. Service time is the delay between the time a request was sent to a device and the time that the device signaled completion of the request.
The disk device hdisk0 was busy an average of 26.12 percent of the time and had an average queue length of 1.8 (when occupied). This indicates that the device is not a performance bottleneck. During the peak interval from 01:20:01 to 02:20:06, the disk was 74.0 percent busy. The average service time reported for this device and its accompanying disk subsystem was 9.6 milliseconds. This is relatively fast.
The disk device hdisk8 was busy an average of 2.41 percent of the time and had an average queue length of 313.1 (when occupied). This usage pattern is typical of corrupted sar data. During the peak interval from 01:20:01 to 02:20:06, the disk was 34.0 percent busy. The average service time reported for this device and its accompanying disk subsystem was 3.7 milliseconds. This is indicative of a very fast disk or a disk controller with cache. Queue length on this device peaked at an unlikely 377.5. This data is surprising and may indicate a problem with the sar -d statistics.
The disk device hdisk22 was busy an average of 3.24 percent of the time and had an average queue length of 0.3 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 3.6 milliseconds. This is indicative of a very fast disk or a disk controller with cache.
The disk device hdisk11 was busy an average of 2.18 percent of the time and had an average queue length of 242.4 (when occupied). This usage pattern is typical of corrupted sar data. During the peak interval from 03:20:03 to 04:20:03, the disk was 36.0 percent busy. The average service time reported for this device and its accompanying disk subsystem was 2.4 milliseconds. This is indicative of a very fast disk or a disk controller with cache. Queue length on this device peaked at an unlikely 249.1. This data is surprising and may indicate a problem with the sar -d statistics.
The disk device hdisk23 was busy an average of 4.29 percent of the time and had an average queue length of 0.3 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 2.3 milliseconds. This is indicative of a very fast disk or a disk controller with cache.
The disk device hdisk5 was busy an average of 5.00 percent of the time and had an average queue length of 1293.2 (when occupied). This usage pattern is typical of corrupted sar data. During the peak interval from 00:20:00 to 01:20:01, the disk was 74.0 percent busy. The average service time reported for this device and its accompanying disk subsystem was 12.2 milliseconds. This service time is acceptable. Queue length on this device peaked at an unlikely 1485.4. This data is surprising and may indicate a problem with the sar -d statistics.
The disk device hdisk24 was busy an average of 1.82 percent of the time and had an average queue length of 0.5 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 3.5 milliseconds. This is indicative of a very fast disk or a disk controller with cache.
The disk device hdisk7 was busy an average of 2.35 percent of the time and had an average queue length of 350.9 (when occupied). This usage pattern is typical of corrupted sar data. During the peak interval from 01:20:01 to 02:20:06, the disk was 32.0 percent busy. The average service time reported for this device and its accompanying disk subsystem was 5.7 milliseconds. This is indicative of a very fast disk or a disk controller with cache. Queue length on this device peaked at an unlikely 438.2. This data is surprising and may indicate a problem with the sar -d statistics.
The disk device hdisk27 was busy an average of 7.88 percent of the time and had an average queue length of 0.3 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 3.4 milliseconds. This is indicative of a very fast disk or a disk controller with cache.
Data collected by ps -elf indicated that at 14:00:00 there was a peak of 972 processes present. This was the largest number of processes seen with ps -elf but it is not likely to be the absolute peak because the operating system does not store the true "high-water mark" for this statistic. There was an average of 968.1 processes present.

No runaway processes, memory leaks, or suspiciously large processes were detected in the data contained in file /opt/sarcheck/ps/20050622.
No table was generated because no unusual resource utilization was seen in the ps -elf data.
This section is designed to provide the user with a rudimentary linear capacity planning model and should be used for rough approximations only. These estimates assume that an increase in workload will affect the usage of all resources equally. These estimates should be used on days when the load is heaviest to determine approximately how much spare capacity remains at peak times.
Based on the limited data available in this single sar report, the system should be able to support a limited increase in workload at peak times before the first resource bottleneck affects performance. See the following paragraphs for additional information.

The CPU can support an increase in workload of approximately 65 percent at peak times. Since page outs and/or swapping were detected, an increase in workload should be accompanied by an increase in memory. The busiest disk can support a workload increase of approximately 1 percent at peak times. For more information on peak resource utilization, refer to the Resource Analysis section of this report.
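The percentages above are consistent with a simple linear calculation: the gap between the exhaustion threshold and the observed peak, divided by the peak. The Python sketch below is an inference from the numbers in this report (the default 90 percent CPU and 75 percent disk thresholds, a 54.5 percent peak %entc, and a 74.0 percent peak busy for hdisk0), not a statement of SarCheck's actual algorithm.

```python
def headroom_pct(peak_pct, threshold_pct):
    """Percent workload growth before the peak reaches the threshold (linear model)."""
    if peak_pct <= 0:
        raise ValueError("peak must be positive")
    # Never report negative headroom once the threshold has been passed.
    return max(0.0, (threshold_pct - peak_pct) / peak_pct * 100.0)


cpu = headroom_pct(54.5, 90.0)   # peak entitled capacity consumed
disk = headroom_pct(74.0, 75.0)  # peak percent busy of the busiest disk

print(f"CPU headroom: {cpu:.1f}%  disk headroom: {disk:.1f}%")
```

Under this model the disk, not the CPU, is the first limit to growth, which matches the report's statement that only a limited increase in workload can be supported.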
The default GRAPHDIR was changed with the -gd switch to /graphs.
The default HSIZE value was changed in the sarcheck_parms file from 0.70 to 1.20 times the default gnuplot width.
Please note: In no event can Aptitune Corporation be held responsible for any damages, including incidental or consequent damages, in connection with or arising out of the use or inability to use this software. All trademarks belong to their respective owners.
This is beta quality software and is to be used only in conjunction with a beta test program. This software is likely to contain defects and its recommendations should be regarded skeptically. This software provided for the exclusive use of: test. This software expires on 08/19/2005 (mm/dd/yyyy). Code version: 6.02.01. Serial number: 39495969.
(c) copyright 1995-2005 by Aptitune Corporation, Plaistow NH, USA, All Rights Reserved. http://www.sarcheck.com
Statistics for system, ux3005

| Statistic | Value | Start of peak interval | End of peak interval | Date of peak interval |
|---|---|---|---|---|
| System ID on sar report, | 00C9F42E4C00 | |||
| System ID of this system, | 000481674C00 | |||
| System model number is, | IBM Model 7042/7043 (ED) | |||
| Statistics collected on, | 06/22/2005 | |||
| Average phys processors consumed, | 0.44 | |||
| Peak phys processors consumed, | 0.82 | 04:20:03 | 05:20:02 | 06/22/2005 |
| Average entitled capacity consumed, | 29.64% | |||
| Peak entitled capacity consumed, | 54.5% | 04:20:03 | 05:20:02 | 06/22/2005 |
| Average CPU utilization, | 22.5% | |||
| Peak CPU utilization, | 46% | 04:20:03 | 05:20:02 | 06/22/2005 |
| Average user CPU utilization, | 7.1% | |||
| Average sys CPU utilization, | 15.4% | |||
| Average waiting for I/O, | 6.8% | |||
| Average run queue length, | 1.4 | |||
| Peak run queue length, | 1.6 | 04:20:03 | 05:20:02 | 06/22/2005 |
| Average run queue occupancy, | 53.4% | |||
| Average swap queue length, | 0.83 | |||
| Peak swap queue length, | 2.4 | 02:20:06 | 03:20:03 | 06/22/2005 |
| Peak page replacement cycle rate, | 0.06 | 00:20:00 | 01:20:01 | 06/22/2005 |
| Max paging space page outs, | 78.22 | 02:00:02 | 02:20:01 | 06/22/2005 |
| Max paging space page ins, | 99.74 | 02:00:02 | 02:20:01 | 06/22/2005 |
| Max swapped processes seen by ps, | 0 | |||
| Avg number of processes seen by ps, | 968.1 | |||
| Max number of processes seen by ps, | 972 | 14:00:00 | | 06/22/2005 |
| Average numperm value, | 20.58% | |||
| Average context switch rate, | 5276.59/sec | |||
| Number of kproc overflows seen, | 0 | |||
| Disk device w/highest peak, | hdisk0 | |||
| Avg pct busy for that disk, | 26.1% | |||
| Peak pct busy for that disk, | 74.0% | 01:20:01 | 02:20:06 | 06/22/2005 |
| Avg I/Os blocked for pbuf, | 0.20/sec | |||
| Peak I/Os blocked for pbuf, | 6.69/sec | 01:00:00 | 01:20:01 | 06/22/2005 |
| Avg I/Os blocked for fsbuf, | 2.51/sec | |||
| Peak I/Os blocked for fsbuf, | 41.73/sec | 00:40:00 | 01:00:00 | 06/22/2005 |
| Approx CPU capacity remaining, | 65.1% | |||
| Approx I/O bandwidth remaining, | 1.4% | |||
| Can memory support add'l load, | Limited | |||
Thanks for your interest and support!
A: We support AIX 4.2 through 5.3 and SarCheck ought to work with 4.1. One binary works with both 32- and 64-bit versions of AIX.
Q: If I have other kinds of UNIX systems, can I try SarCheck on those too?
A: Sure! Fill out our order form and we'll send you eval copies of our released products. SarCheck is also available for Solaris SPARC 2.5 and up, HP-UX versions 10 and 11, and most Linux 2.2 through 2.6 kernels. Even if your primary platform is AIX, we encourage you to try SarCheck on other platforms so that you can see what it does and understand what we want to do to help AIX sys admins.
Q. Why doesn't SarCheck tell me about my disks?
A. SarCheck uses sar to collect data on disk activity and not all versions of sar do this. In some cases, operating system patches available from IBM may help to make sar -d data available.
Q. Should I implement recommendations that only show up occasionally?
A. Feel free to try, but first implement the regularly occurring recommendations, since those will address the most frequently occurring problems. If SarCheck occasionally recommends increasing the amount of memory, you should certainly try it. On systems with some extra memory, SarCheck will be able to make additional recommendations that could not be made on systems where memory is "tight".
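One way to separate regular recommendations from occasional ones is to count how many days each one appears on. The Python sketch below does this for invented sample data; the recommendation strings, the five-day window, and the three-day cutoff are all assumptions chosen for illustration, not SarCheck behavior.

```python
from collections import Counter


def regular_recommendations(daily, min_days=3):
    """Return recommendations that appear on at least min_days of the days."""
    # Each day is converted to a set so a recommendation counts once per day.
    counts = Counter(rec for day in daily for rec in set(day))
    return sorted(rec for rec, n in counts.items() if n >= min_days)


# Invented sample data: one collection of recommendations per day.
daily = [
    ["decrease maxperm", "increase minfree"],
    ["decrease maxperm"],
    ["decrease maxperm", "add memory"],
    ["decrease maxperm", "increase minfree"],
    ["increase minfree"],
]

print(regular_recommendations(daily))
```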
Q. Every time I make changes based on SarCheck's recommendations, it makes more recommendations. Why doesn't it just figure out the correct values for all the parameters?
A. That's not how real performance tuning works. There are no "correct" values because tuning is a series of compromises between various system resources. Performance tuning involves a certain degree of trial and error, and gradual change is the only way to do it.
Q. When I try to run sarcheck, I get the message "sarcheck: not found". What's wrong?
A. Check the following:
Q. Why did SarCheck stop producing reports?
A. Usually this is because the software has expired. Run '/opt/sarcheck/bin/analyze' and look for the expiration date at the bottom of the usage text. If you've licensed SarCheck and the expiration date doesn't make sense to you, run 'analyze -s' and send us the output.
Q. How do I collect data over a 24 hour period?
A. The crontab entries should look like this:
0 * * * * /usr/lib/sa/sa1 1200 3 &
45 23 * * 1-5 /usr/lib/sa/sa2 -i 1200 -A &
Q. How do I collect data every 10 minutes from 08:00 to 18:00?
A. The crontab entries should look like this:
0 8-17 * * 1-5 /usr/lib/sa/sa1 600 6 &
0 18-7 * * 1-5 /usr/lib/sa/sa1 &
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 600 -A &
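The two numeric arguments to sa1 are the sampling interval in seconds and the number of samples. As a quick check, assuming each hourly cron invocation should cover the full hour until the next one starts, the product of the two should be 3600 seconds. A short Python sketch of that arithmetic for the entries above:

```python
def coverage_seconds(interval_s, count):
    """Total time covered by one sa1 invocation."""
    return interval_s * count


# (interval in seconds, number of samples) from the crontab entries above.
schedules = {
    "24-hour collection": (1200, 3),
    "10-minute collection": (600, 6),
}

for label, (interval_s, count) in schedules.items():
    total = coverage_seconds(interval_s, count)
    print(f"{label}: {count} samples every {interval_s // 60} minutes = {total} seconds")
```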
Chukran, R. Accelerating AIX. 1998. Reading, MA.: Addison Wesley. ISBN 0-201-63382-5.
Majidimehr, A. Optimizing UNIX for Performance. 1996. Englewood Cliffs, NJ.: PTR Prentice Hall. ISBN 0-13-111551-0.
Musumeci, G. & Loukides, M. System Performance Tuning, 2nd Edition. 2002. Sebastopol, CA.: O'Reilly & Associates, Inc. ISBN 0-596-00284-X.
Waters, F. AIX Performance Tuning. 1996. Upper Saddle River, NJ.: PTR Prentice Hall. ISBN 0-13-386707-2.
D. J. Blackwood, Calvin Breyley, William Drescher, Robert P. Fries, Steve Gardiner, Jeff Hyman, Berni Jubb, Peter Kettle, Bob Long, Nancy Lorenz, Bela Lubkin, Rich Marotta, Gene Martin, Tom Melvin, Lee Penn, Tom Podnar, Jean-Pierre Radley, Charlie Russel, David Simons, Dave Venus, and a number of others.
| Keyword | Description |
|---|---|
| PAGER | The pager to be used to display the analysis on the screen. The default is more, but pg or less are common alternatives. |
| LPS | The command for printing the analysis. The default is lp -s. |
| PSELFDIR | The directory where SarCheck will look for the ps -elf data. The analyze program and the ps1 and ps2 scripts will use this new directory. WARNING! Please pick a directory that contains nothing but ps -elf data! The ps2 script will use the find command to remove any file in the specified directory which is more than 14 days old. We have tried to limit the potential damage by adding the -name switch to the find command but you should still be very careful with this. |
| SARCHECKDIR | The directory where the SarCheck program resides. This can be changed from the default directory of /opt/sarcheck/bin. If this is used, all other entries in the /opt/sarcheck/etc/sarcheck_parms file will be ignored. Refer to the section entitled, How to move SarCheck to another directory. |
| ETCDIR | The location of the file analyze.dlr. This file will be used if we ever use resellers who want to offer their own support. There is no point in changing this parameter at this time. |
| ST | The starting time for the analysis. The default is 08:00 and this should be entered in 24 hour format. |
| EN | The ending time for the analysis. The default is 17:00 and this should be entered in 24 hour format. |
| DR | Whether to analyze sar data or a sar report. The default is 'd'. For a list of options, run the sarcheck script and see what options are on the screen when the keyword is DR. |
| OPT | How to format the report. The default is 'n'. |
| VERBOSE | Whether the output should be verbose or quiet. The default is 'v'. |
| PSELFOPT | How verbose the ps -elf output should be. This option is used primarily to increase SarCheck's sensitivity to problems in the ps -elf data. The default is 'n'. |
| DISKFLTR | Whether or not to filter the disk analysis. Filtering sar's disk data is useful on large systems with many disks when you're not using the HTML disk table option. The default is 'y'. |
| TABULAR | Whether or not to print a tabular summary at the end of the report or print a tabular summary instead of the report. The default is 'n'. |
| OUTOPT | This option controls where the output of the sarcheck script should go. The default is '1'. |
| GNUPLOT | The version of gnuplot present on your system. The default value is 3.7. |
| GNUPLOTDIR | The directory in which you've installed gnuplot. |
| GRAPHDIR | The directory in which the graphs will be stored. |
| HSIZE | Change the default width of the graphs generated by gnuplot. If you want to see graphs that are wider than the ones produced by the default width of 0.7, this keyword can be used to produce wider graphs. |
| HTMLGRAPHDIR | The directory referenced in the HTML <img> tag. |
| DMY | Change the default date format to dd/mm/yyyy. |
| YMD | Change the default date format to yyyy/mm/dd. |
| NORMSS | Disable the running of the rmss utility, which SarCheck otherwise runs to see if any memory is being kept from use. A few systems don't respond to rmss, causing SarCheck to hang. |
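Because of the PSELFDIR warning above, it may be worth checking a candidate directory before pointing SarCheck at it. The following Python sketch only reports; it deletes nothing. It lists files older than 14 days and separates those whose names look like ps -elf data files from everything else. The yyyymmdd name pattern is an assumption based on the default file naming, the exact pattern used by the ps2 script is not shown in this manual, and the function name audit_directory is hypothetical.

```python
import re
import time
from pathlib import Path

MAX_AGE_DAYS = 14                       # age limit quoted in the warning
NAME_PATTERN = re.compile(r"^\d{8}$")   # assumption: files are named yyyymmdd


def audit_directory(directory, now=None):
    """Report files older than MAX_AGE_DAYS without deleting anything.

    Returns two sorted lists of file names: old files that match the
    assumed ps -elf naming pattern, and old files that do not.
    """
    now = time.time() if now is None else now
    cutoff = now - MAX_AGE_DAYS * 86400
    expected, unexpected = [], []
    for path in sorted(Path(directory).iterdir()):
        # Skip subdirectories and anything modified within the age limit.
        if not path.is_file() or path.stat().st_mtime >= cutoff:
            continue
        if NAME_PATTERN.match(path.name):
            expected.append(path.name)
        else:
            unexpected.append(path.name)
    return expected, unexpected
```

If the second list is not empty, the directory holds old files that are not ps -elf data, and a different directory is probably a safer choice for PSELFDIR.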
The sarcheck_parms file can also be used to change the defaults used to generate HTML output.
| Keyword | Allowed range | Default |
| BGCOLOR | Any valid color | #FFEE88 |
| TEXTCOLOR | Any valid color | black |
| REDCOLOR | Any valid color | #FF9999 |
| PINKCOLOR | Any valid color | #FFCC99 |
BGCOLOR: The background color specified in the bgcolor attribute of the HTML <body> tag.
TEXTCOLOR: The text color specified in the text attribute of the HTML <body> tag.
REDCOLOR: The background color specified in the bgcolor attribute of certain tags.
PINKCOLOR: The background color specified in the bgcolor attribute of certain tags.
These changes can be implemented using the /opt/sarcheck/etc/sarcheck_parms file.
Please note that the default values of SarCheck's thresholds have been established based on feedback from hundreds or thousands of systems and these values should not be overridden without good reason. Here is a list of thresholds which can be overridden, and the meaning of each is described below:
| Keyword | Allowed range | Default |
| AVGCPU | 50 - 100 | 80 |
| MAXCPU | 50 - 100 | 95 |
| AVGWIO | 1 - 50 | 7 |
| AVGRQ | 1 - 50 | 3.5 |
| MAXRQ | 1 - 500 | 5.0 |
| CAPCPU | 25 - 100 | 90 |
| CAPDSK | 10 - 100 | 75 |
| AVSWPQ | 0.01 - 100 | 1.0 |
| PSPGOUT | 0.01+ | 10 |
| CPULIM | 0.05 - 100 | 20 |
| MLRATE | 1+ | 200 |
| LGPROC | 32+ | formula |
| SYSUSR | 0 - 999 | 2.5 |
AVGCPU: When average CPU utilization exceeds this value, SarCheck considers the system to be busy enough to cause concern.
MAXCPU: When peak CPU utilization exceeds this value, SarCheck assumes that performance degradation is likely.
AVGWIO: When the average value of the sar -u %wio column exceeds this value, SarCheck looks for evidence to corroborate an I/O bottleneck.
AVGRQ: When the average length of the run queue exceeds this value, SarCheck considers it to be an indication of a CPU bottleneck.
MAXRQ: When the maximum length of the run queue exceeds this value, SarCheck assumes that performance degradation is likely.
CAPCPU: The value used to calculate the increase in CPU load that the system can support at peak times.
CAPDSK: The value used to calculate the increase in I/O load on the busiest disk that the system can support at peak times.
AVSWPQ: When the average length of the swap queue reported by sar exceeds this value, SarCheck considers memory pressure to be excessive.
PSPGOUT: The value used to decide that the page out rate to the paging spaces is high enough to indicate a shortage of memory. Some people believe that any page outs to the paging spaces indicate a lack of memory but a small value is likely to indicate a brief problem that may not have a noticeable impact on performance.
CPULIM: The threshold in computed CPU utilization SarCheck uses to decide if a runaway process has been detected in ps -elf data.
MLRATE: The threshold in KB of memory per hour used by SarCheck to decide if a memory leak has been detected in ps -elf data.
LGPROC: The minimum size of a process which SarCheck will report as being suspiciously large.
SYSUSR: The threshold used to decide if it's worth mentioning if there is an unusual amount of %sys activity relative to %usr activity. The default of 2.5 means that %sys activity needs to be at least 2.5 times greater than %usr activity for this to be reported.
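As a rough illustration of how the CPU-related thresholds read in practice, the following Python sketch compares a few summary values against the defaults of AVGCPU, MAXCPU, and SYSUSR. The CpuSummary class and the cpu_findings function are hypothetical names invented for this example. They are not part of SarCheck, and SarCheck's actual logic is not shown in this manual.

```python
from dataclasses import dataclass
from typing import List

# Defaults copied from the threshold table.
AVGCPU = 80.0   # average CPU utilization, percent
MAXCPU = 95.0   # peak CPU utilization, percent
SYSUSR = 2.5    # ratio of %sys to %usr


@dataclass
class CpuSummary:
    """Hypothetical container for values summarized from sar -u data."""
    avg_busy: float    # average CPU utilization, percent
    peak_busy: float   # peak CPU utilization, percent
    avg_sys: float     # average %sys
    avg_usr: float     # average %usr


def cpu_findings(s: CpuSummary) -> List[str]:
    """Return a list of threshold conditions that the summary meets."""
    findings = []
    if s.avg_busy > AVGCPU:
        findings.append("average CPU utilization exceeds AVGCPU")
    if s.peak_busy > MAXCPU:
        findings.append("peak CPU utilization exceeds MAXCPU")
    # "At least 2.5 times greater" is read here as sys >= SYSUSR * usr.
    if s.avg_sys > 0 and s.avg_sys >= SYSUSR * s.avg_usr:
        findings.append("%sys is unusually high relative to %usr")
    return findings
```

For example, a summary with 50 percent %sys and 20 percent %usr meets the SYSUSR condition exactly, because 50 is 2.5 times 20.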
It is possible to set these parameters to values which can make SarCheck's recommendations meaningless or incorrect. Please override the default values with care.
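One way to override values with care is to check each proposed value against the allowed range before editing the sarcheck_parms file. The Python sketch below encodes the ranges and defaults from the threshold table. The function name check_override is hypothetical, treating both ends of each range as inclusive is an assumption, and this check is not something SarCheck itself provides.

```python
import math

# (low, high, default) copied from the threshold table. An open-ended
# range such as "0.01+" is modeled with math.inf. The default for LGPROC
# is listed only as "formula", so it is represented as None here.
THRESHOLDS = {
    "AVGCPU": (50, 100, 80),
    "MAXCPU": (50, 100, 95),
    "AVGWIO": (1, 50, 7),
    "AVGRQ": (1, 50, 3.5),
    "MAXRQ": (1, 500, 5.0),
    "CAPCPU": (25, 100, 90),
    "CAPDSK": (10, 100, 75),
    "AVSWPQ": (0.01, 100, 1.0),
    "PSPGOUT": (0.01, math.inf, 10),
    "CPULIM": (0.05, 100, 20),
    "MLRATE": (1, math.inf, 200),
    "LGPROC": (32, math.inf, None),
    "SYSUSR": (0, 999, 2.5),
}


def check_override(keyword, value):
    """Return an error message, or None if the value is in range.

    Assumption: both ends of each range are inclusive.
    """
    if keyword not in THRESHOLDS:
        return f"{keyword}: not an overridable threshold"
    low, high, _default = THRESHOLDS[keyword]
    if low <= value <= high:
        return None
    allowed = f"{low}+" if math.isinf(high) else f"{low} - {high}"
    return f"{keyword}={value} is outside the allowed range {allowed}"
```

A value inside the allowed range can still make the recommendations less useful, so a check like this only catches outright invalid entries.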
| -c | Turn off the capacity planning section. |
| -csv | Produce output in comma separated value (CSV) format. If the -html switch is used in conjunction with the -csv switch, these statistics will be printed as two HTML tables. If the -html switch is not used, the -csv switch will cause a SarCheck report to be generated with CSV output of statistics only. Disk statistics will be generated as if the -dtbl switch was used, and a tabular summary will be generated as if the -t switch was used.
Please note that the -csv switch puts parts of the SarCheck analysis into CSV format. The -gr switch is used to put the sar report and the tabular summary (see the -t switch) into CSV format. |
| -d | Print info on all disks if more than 12 disk drives were seen in sar. Because the SarCheck report will produce a paragraph on each disk, reports may get too verbose on systems with 30 or more disk devices. Without this option, SarCheck will "filter out" information on disks which are lightly used. Please note that not all versions of AIX support the collection of disk data by sar. |
| -dblp | Suppress warnings about suspiciously large database processes. |
| -dbml | Suppress warnings about possible memory leaks in database processes. |
| -dbrp | Suppress warnings about possible runaway database processes. |
| -dbusy | If the -dtbl switch is used, -dbusy will sort the disk information by average percent busy. Please note that not all versions of AIX support the collection of disk data by sar. |
| -diag | This option will add a paragraph to the report showing how full SarCheck's internal tables have become. If a table comes too close to becoming full, a message should appear in the SarCheck report asking you to send a copy of the report to support@sarcheck.com. This switch will also print the exact command used to produce the report. |
| -dmy | This switch causes the date format used in the SarCheck report to appear in the format dd/mm/yyyy. |
| -dnz | Suppress the reporting of disks with no activity. This option is most likely to be useful when SarCheck is used on systems with thousands of disk devices. In one case where data on all 2,505 disks were reported in an HTML report using both tables and text, the size of the report approached one megabyte. The size of the report was reduced by 90 percent with this switch. |
| -dserv | If the -dtbl switch is used, -dserv will sort the disk information by average service time. Please note that not all versions of AIX support the collection of disk data by sar and versions that do report disk statistics still have problems with service time. When the AIX implementation of sar supports this, we will be ready. |
| -dtbl | If the -html switch is used, -dtbl will produce a table of disk statistics instead of generating a paragraph on each disk. Cells in the table will be color coded to highlight interesting disk statistics. This option is recommended for large systems where 50 or more individual paragraphs on disk activity would be hard to comprehend. Please note that not all versions of AIX support the collection of disk data by sar.
If the -html switch is not used, -dtbl will cause disk statistics to be output in a comma separated value (CSV) format. CSV output should generally be produced with the -csv switch, but it can be done by using -dtbl too. |
| -dtoo | If the -html switch is used, -dtoo will produce a table of disk statistics in addition to generating a paragraph on each disk. Cells in the table will be color coded to highlight interesting disk statistics and will link to the appropriate paragraph. Please note that not all versions of AIX support the collection of disk data by sar.
If the -html switch is not used, -dtoo will cause disk statistics to be output in a comma separated value (CSV) format. In addition, the -csv switch generates a paragraph on each disk. CSV output should generally be produced with the -csv switch, but it can also be done by using -dtoo. |
| -en | Specify the ending time for data to be analyzed in a 24 hour format. Specifying 17 will cause data through 17:00:00 to be analyzed, and specifying 17:30 will cause any data after 17:30:00 to be excluded from the analysis. This switch will work on a single day or multiple days of data and is usually used in conjunction with the -st switch. The default for this value is controlled by the sarcheck_parms keyword EN. |
| -g24 | This switch will change the appearance of multiday graphs. It changes the graph to be displayed with a 24-hour x axis and data from different days to be superimposed. This can help to spot activity that occurs at the same time each day. |
| -gd | Change the directory in which SarCheck puts the graphs generated by gnuplot. |
| -gonly | Produce graphs only. This switch should be used together with the -jpeg, -jpg, or -png switches. The names of the graphs produced will be sent to stdout and no report will be produced. |
| -gr | Produce output which can be used by graphing tools. While the output is in comma separated value (CSV) format, this option is different from the -csv switch because it reformats the sar report instead of reformatting SarCheck's analysis. |
| -h | Displays brief instructions and shows all of the possible switches. |
| -hg | How to produce graphs using the -jpg, -jpeg, and -png switches. |
| -hgd | Change the directory where the graphs appear to be in the HTML output's <img> tags. |
| -hm | How to analyze multiple days of sar data. |
| -hp | How to analyze supplemental ps -elf data. |
| -html | Insert HTML tags in text for use by a browser. The -dtbl, -dtoo, -dserv, -dbusy, -ptbl, -ptoo, -t, -png and -jpg switches are likely to be of interest to you if you're using -html. |
| -jpeg or -jpg | These switches will cause SarCheck to look for gnuplot and use it to produce graphs in JPEG format. The naming convention used by SarCheck will append either ".jpeg" or ".jpg" to the file name of the graph, depending on the switch you use. The creation of JPEG formatted graphs uses less CPU time than the creation of PNG formatted graphs. JPEG formatted graphs are also larger and do not look as crisp as PNG graphs, but they are much more likely to display correctly with older browsers. |
| -k | Allows you to change the activation key and software expiration date. |
| -mdy | Force the default mm/dd/yyyy date format to be used if it's overridden by the use of a non-English text file or entries of DMY or YMD in the sarcheck_parms file. |
| -noparms | Ignore the contents of the sarcheck_parms file when generating the report. |
| -normss | Don't run the rmss program to see if any memory is being kept from use. A few systems don't respond to rmss, causing SarCheck to hang. |
| -o | Prints an order/registration form for those wishing to purchase a software license, or register their licensed software. |
| -p | Suppress page numbering & page breaks. This is especially useful when the output is piped to pg. |
| -ps | Incorporate the analysis of a single ps -elf file called /opt/sarcheck/ps/yyyymmdd where the date is extracted from the sar data. |
| -pd | Change the directory in which SarCheck expects to find ps -elf data. SarCheck will still determine the name of the ps -elf data file and the purpose of this switch is to allow you to store ps -elf data wherever you want. This data can take up a considerable amount of space. |
| -pf | Include analysis of a specified file containing ps -elf data. |
| -pv | Verbose analysis of ps -elf data, overridden by the -Q and -q switches. |
| -plp | Suppress warnings about suspiciously large processes. |
| -pml | Suppress warnings about possible memory leaks. |
| -prp | Suppress warnings about possible runaway processes. |
| -png | This switch will cause SarCheck to look for gnuplot and use it to produce graphs in PNG format. The naming convention used by SarCheck will append ".png" to the file name of the graph. The creation of PNG formatted graphs takes more CPU time on AIX. PNG formatted graphs are also smaller and look cleaner than JPEG graphs, but may not display correctly with older browsers. |
| -ptbl | If the -html switch is used, -ptbl will produce a table of ps -elf statistics instead of generating a paragraph on each process whose resource utilization exceeds the threshold. Cells in the table will be color coded to highlight the interesting statistics. This option is recommended for systems where a large number of individual paragraphs would be hard to comprehend.
If the -html switch is not used, -ptbl will cause ps -elf statistics to be output in a comma separated value (CSV) format. CSV output should generally be produced with the -csv switch, but it can be done by using -ptbl too. |
| -ptoo | If the -html switch is used, -ptoo will produce a table of ps -elf statistics in addition to generating a paragraph on each process whose resource utilization exceeds the threshold. Cells in the table will be color coded to highlight interesting statistics.
If the -html switch is not used, -ptoo will cause ps -elf statistics to be output in a comma separated value (CSV) format. In addition, the -csv switch generates a paragraph on each process whose resource utilization exceeds the threshold. CSV output should generally be produced with the -csv switch, but it can also be done by using -ptoo. |
| -Q | Print a non-verbose (super-Quiet) analysis. This option automatically sets the -p option. |
| -q | Print a less verbose (quiet) analysis. |
| -r | Print an analysis only if recommendations are made. |
| -ret0 | Force a return code of zero. The analyze program normally returns zero if no recommendations are made and one if it makes recommendations. This option exists because some scheduling tools report non-zero return codes as errors or exceptional conditions. |
| -s | Display all the information needed to activate SarCheck. |
| -st | Specify the starting time for data to be analyzed in a 24 hour format. Specifying 09 (or just 9) will cause data starting at 09:00:00 to be analyzed, and specifying 9:30 will cause analysis to start with any data collected at or after 09:30:00. This switch will work on a single day or multiple days of data and is usually used in conjunction with the -en switch. The default for this value is controlled by the sarcheck_parms keyword ST. |
| -summ | Display only the text summary at the beginning of the SarCheck report. |
| -t | This option will produce a summary of interesting statistics in a tabular format. This output can be parsed with relative ease. If the -html switch is used, the statistics will be presented in an HTML table, and cells in the table will be color coded to highlight noteworthy statistics. This option works well with -dtbl. |
| -tonly | This option will produce nothing but a summary of interesting statistics in a tabular format. All recommendations, analysis, and other hopefully interesting text will vanish. If the -html switch is used, the statistics will be presented in an HTML table, and cells in the table will be color coded to highlight noteworthy statistics. |
| -w | Suppress page breaks and newline characters, primarily for export to PC-based word processing programs. |
| -wide | Change the width of the graphs generated by gnuplot. If you want to see graphs that are wider than the ones produced by the default width of 0.7, this switch can be used to produce wider graphs. For more flexibility, use the sarcheck_parms keyword HSIZE. |
| -ymd | This switch causes the date format used in the SarCheck report to appear in the format yyyy/mm/dd. |
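The time formats accepted by the -st and -en switches can be illustrated with a short Python sketch. It parses values such as 9, 09, or 9:30 and tests whether a sample time falls inside the window, treating both ends as inclusive to match the descriptions of data "at or after" the start and "through" the end. The function names parse_clock and in_window are hypothetical, and this is only one reading of the switch descriptions, not SarCheck's own code. The defaults of 08:00 and 17:00 are those of the ST and EN keywords.

```python
from datetime import time


def parse_clock(value):
    """Parse '9', '09', or '9:30' into a datetime.time (24 hour format)."""
    hours, _, minutes = value.partition(":")
    return time(int(hours), int(minutes or 0))


def in_window(sample, st="08:00", en="17:00"):
    """Return True if a sample time falls within the -st/-en window.

    Both ends are treated as inclusive: a sample at exactly the start
    time or exactly the end time is kept.
    """
    return parse_clock(st) <= sample <= parse_clock(en)
```

Under this reading, a sample taken at 17:00:00 is kept when the ending time is 17, while one taken at 17:00:01 is not.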