Supportsave
When you suspect problems in a (Brocade) fabric, one of the first things you should collect is the supportsave files from all relevant switches in that fabric.
This gives the support guys at HDS and Brocade at least a snapshot of the current situation, and it may give insight into where a particular problem comes from and what its root cause is.
This does not, however, explain why a particular connectivity problem occurred when hosts see intermittent IO errors. Very often the cause is congestion somewhere else in the fabric, which ties up buffer credits here and there. That has immediate consequences for one or more other hosts as well, especially when they are very busy.
The statistics counters on a switch do not start at a particular error condition; they accumulate from the moment they were last reset, which is either when the switch was rebooted or when the statistics were manually cleared. A counter on its own therefore does not tell you when its errors occurred.
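Because the counters only ever add up from the last reset, the useful number is usually the difference between two saved readings. A minimal Python sketch of that comparison, assuming each saved reading has already been turned into a dictionary keyed by (port, counter name). The snapshot format, the counter names and `counter_deltas` itself are hypothetical and not part of FOS:

```python
from typing import Dict, Tuple

# Hypothetical snapshot format, not FOS output: (port, counter_name) -> count.
Snapshot = Dict[Tuple[int, str], int]


def counter_deltas(older: Snapshot, newer: Snapshot) -> Snapshot:
    """Return only the counters that increased between two snapshots.

    If a counter went down, it was reset (cleared or switch rebooted) in
    between, so the newer value is the best available count since then.
    """
    deltas: Snapshot = {}
    for key, new_value in newer.items():
        old_value = older.get(key, 0)
        delta = new_value - old_value if new_value >= old_value else new_value
        if delta > 0:
            deltas[key] = delta
    return deltas


if __name__ == "__main__":
    monday = {(4, "crc_err"): 10, (4, "disc_c3"): 200, (7, "crc_err"): 0}
    tuesday = {(4, "crc_err"): 10, (4, "disc_c3"): 950, (7, "crc_err"): 3}
    print(counter_deltas(monday, tuesday))
    # {(4, 'disc_c3'): 750, (7, 'crc_err'): 3}
```

Port 4's CRC counter looks alarming in either snapshot on its own, but it did not move between them; the discards on port 4 and the new CRC errors on port 7 are what happened in that window.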
If you encounter these intermittent issues and cannot pin them down to a hardware defect or another physical issue, you have to start thinking about congestion and frame loss inside the Fibre Channel fabric.
"???? Huhhhh ???? Loss of frames??? I thought Fibre Channel was so reliable?!?!"
:-) Yes, it is. However, everything is only as reliable as its weakest link, so if you try to push the Mississippi River through a garden hose you will most likely get wet. (Very wet, to say the least.)
"So how do I clear those counters and what do you need???"
The best way to set this up is to schedule the clearing once a week (or once a day) at a fixed time. SSH and/or telnet scripts fired off via cron or another scheduler are very useful for this. That way it becomes very clear when the errors started (provided you also save the stats before you clear them).
To clear these counters you have to use two commands:
1. statsclear
2. slotstatsclear
The slotstatsclear command is not documented, but it is very important: it also clears the statistics for the back-end ports, which connect internally to other ports over the backplane.
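A scheduled job that combines the weekly schedule with the two clear commands could look like the following Python sketch. It assumes the switch accepts a single CLI command as an ssh argument (for example `ssh admin@switch01 porterrshow`) and that key-based login is configured. The host name, file paths and function names are hypothetical; the commands are the ones named in this post, and the stats are saved before anything is cleared.

```python
import datetime
import pathlib
import subprocess
from typing import Callable

# Reports to save before clearing, then the two clear commands from this post.
SAVE_COMMANDS = ["porterrshow", "slotstatsshow -c 1"]
CLEAR_COMMANDS = ["statsclear", "slotstatsclear"]


def ssh_run(host: str, command: str) -> str:
    """Run one CLI command on the switch over ssh (assumes key-based login)."""
    result = subprocess.run(["ssh", host, command],
                            capture_output=True, text=True, check=True)
    return result.stdout


def save_then_clear(host: str, log_dir: pathlib.Path,
                    run: Callable[[str, str], str] = ssh_run) -> pathlib.Path:
    """Save the current stats to a timestamped file, then clear the counters."""
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M")
    log_file = log_dir / f"{host}-{stamp}.txt"
    with log_file.open("w") as out:
        for command in SAVE_COMMANDS:
            out.write(f"### {command}\n{run(host, command)}\n")
    # Only clear once the saved copy has been written.
    for command in CLEAR_COMMANDS:
        run(host, command)
    return log_file


# Hypothetical crontab entry running a wrapper script weekly, Mondays at 06:00:
# 0 6 * * 1 /usr/bin/python3 /opt/scripts/clear_fabric_stats.py
```

If one of the save commands fails, `subprocess.run(..., check=True)` raises and the clear commands are never sent, so you do not lose counters you have not saved. If a command asks for confirmation on your FOS level, the runner will need adjusting.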
If you notice an increase in errors on hosts, you need to start capturing the stats. Again, the best way to do this is to script it and fire it off at regular intervals.
There are mainly three reports we look at:
1. porterrshow
2. slotstatsshow
3. sloterrshow
Again, the last two are not documented (yet) but are very important. Below is the usage summary (be aware that this is FOS level dependent):
Usage: slotstatsshow/sloterrshow [options [] ..]
Options   Default    Explanation
===================================================================
-s        1-10       slots to poll. e.g. 1-3,5,7-9
-p        3          sec. period between polls
-c        1000000    stop after this #polls
-a        disabled   select all ctrs; override rule in config
-r        disabled   show ctrs if diff. between 2 polls > 0
-i        all        display only these asic counters
                     1 - Condor, 2 - Pinball, 3 - FCIP
-x        none       do not display these asic counters
-u        enabled    disable display control ctrs
-f                   alternate counter definition file.
                     default is /etc/fabos/slotstatsshow.conf
-v        enable     display runtime debug messages
-d                   parse config. file w/o execution
-m        by port#   display table entries by counter type.
-e        none       shell cmd to execute when > 0
                     e.g. "tracedump -n"
                          "supportsave -n -c"
                          supportshow asic_db
-t        any        trigger counter specification
                     er_ - any ctrs of form "er_*"
                     er_/3/*/0 - ctrs on slot 3/*/0
                     er_/3/*/1 - ctrs on slot 3/*/1
-h                   display this help message
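When scripting these captures it is easy to mistype options, so building the command line in code can help. A small Python sketch using only the `-s`, `-p`, `-c`, `-r` and `-v` options from the summary above; `slot_command` is a hypothetical helper, and since the options are FOS level dependent you should check the help output on your own switches first:

```python
from typing import List, Optional


def slot_command(report: str, slots: Optional[str] = None,
                 period: Optional[int] = None, count: Optional[int] = None,
                 changed_only: bool = False, verbose: bool = False) -> str:
    """Build a slotstatsshow or sloterrshow command line (hypothetical helper)."""
    if report not in ("slotstatsshow", "sloterrshow"):
        raise ValueError(f"unsupported report: {report}")
    args: List[str] = [report]
    if slots is not None:
        args += ["-s", slots]           # slots to poll, e.g. "1-3,5,7-9"
    if period is not None:
        args += ["-p", str(period)]     # seconds between polls (default 3)
    if count is not None:
        args += ["-c", str(count)]      # stop after this many polls
    if changed_only:
        args.append("-r")               # only ctrs that changed between 2 polls
    if verbose:
        args.append("-v")               # runtime debug messages, per the help
    return " ".join(args)


print(slot_command("sloterrshow", count=5, period=60, verbose=True))
# sloterrshow -p 60 -c 5 -v
```

Passing an explicit count is worth making a habit: with the default of 1000000 polls the command would keep running for a very long time.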
So, in short:
1. clear stats once a week or once a day
2. when you notice errors, start capturing
A good command sequence might be:
porterrshow
slotstatsshow -c 1
sloterrshow -c5 -v -p60
This gives a one-time overview of front-end port errors since the last clearing, one overview of the busiest ports, and five samples of all ASIC ports at one-minute intervals.
Include this overview in your communication to HDS support.
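To collect the sequence above from every relevant switch in the fabric in one go, a Python sketch along these lines could be used. As before, it assumes commands can be passed as an ssh argument with key-based login; the switch names, output directory and function names are hypothetical.

```python
import datetime
import pathlib
import subprocess
from typing import Callable, Iterable, List

# The capture sequence suggested above.
CAPTURE_SEQUENCE = [
    "porterrshow",
    "slotstatsshow -c 1",
    "sloterrshow -c5 -v -p60",
]


def ssh_run(host: str, command: str) -> str:
    """Run one CLI command on a switch over ssh (assumes key-based login)."""
    result = subprocess.run(["ssh", host, command],
                            capture_output=True, text=True, check=True)
    return result.stdout


def capture_fabric(switches: Iterable[str], out_dir: pathlib.Path,
                   run: Callable[[str, str], str] = ssh_run) -> List[pathlib.Path]:
    """Run the capture sequence on each switch; write one report file per switch."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    reports: List[pathlib.Path] = []
    for host in switches:
        report = out_dir / f"{host}-{stamp}.txt"
        with report.open("w") as out:
            for command in CAPTURE_SEQUENCE:
                out.write(f"### {host}: {command}\n")
                out.write(run(host, command) + "\n")
        reports.append(report)
    return reports
```

Calling `capture_fabric(["admin@switch01", "admin@switch02"], pathlib.Path("/tmp/hds-case"))` would leave one text file per switch to attach to the support case. The switches are polled one after the other, and sloterrshow alone takes several minutes per switch with these options, so if you need samples from the same time window on every switch, run one job per switch in parallel instead.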
Cheers,
E