*******************************************************************
*                                                                 *
*  README for FMS: Fabric Management System for Myrinet Networks  *
*                                                                 *
*******************************************************************

===================================================
Overview

The fabric management system is a collection of tools and processes used
to manage a Myrinet network.  This system relies on a database formed by a
collection of files which describe how the network is connected.  Since
there is a description of how the network is supposed to look, it is
possible to report discrepancies between the observed network state and
the desired network state.  For example, without this description, a
failed switch-to-switch link could be routed around, but could not easily
be reported as missing, since there would be no way of knowing the link is
supposed to be there.

The two primary processes in the Fabric Management System are fm_server
(Fabric Management Server) and fma (Fabric Management Agent).  fm_server
runs on one machine and serves as a focal point for all management
activity on the fabric.  All errors are reported to fm_server, and are
available for viewing through a variety of means.

There is one fma process on each Myrinet node.  This process replaces the
previous mapper (mx_mapper or gm_mapper) and expands its functionality.
Any errors noticed by the fma process are reported to fm_server and are
then made available to system operators.

The fabric description database consists of several files in a directory,
all of which are easily human-readable and editable.  These files are read
and written by the fm_server process.
The current file list is:

   enclosures.csv - a list of all enclosures in the system
   linecards.csv  - a list of all linecards, and where they are
   hosts.csv      - a list of all the hosts in the fabric
   nics.csv       - a list of all the NICs in the fabric
   links.csv      - a list of all the fiber connections in the system

When error conditions are detected in the fabric, alerts are generated by
the fm_server process.  These can be queried on a regular basis, or
fm_server can be configured to proactively report alerts through a
user-defined mechanism.


Error Reporting by FMS

Errors detected by or reported to the FMS process are reported to system
administrators through "alerts," which are discussed further in Appendix B
below.  Alerts can be queried remotely with the command "fm_show_alerts",
through web CGI scripts, or through log file monitoring; alternatively, the
FMS can be configured to run a user-specified command whenever an alert
occurs.

The FMS can detect exceptional conditions, either errors or warnings, by
monitoring the switch enclosures and through communication with the fma
processes running on each node.


Errors Detected Directly By FMS

The FMS process periodically polls the switch enclosures to monitor link
status and environmental conditions.  The most common problems reported by
the switches are down links, noisy links, and overly high operating
temperatures.

The FMS monitors the connection status of all fma processes in the system.
The absence of an fma connection from a host which is expected to be
present generates an alert, as does the loss of connectivity to any fma.

Switch enclosures report the up/down status of each link.  The FMS compares
this with the expected link status and generates an alert if appropriate.
If a link transitions from up to down too many times in a given time
period, the link is marked as "flaky" and an alert is generated.

The badcrc counters on each link are monitored, and too many badcrcs within
a given time period generate an alert.
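The flaky-link and badcrc checks above both sum a per-link count over a
time period and compare the sum with a threshold.  A minimal sketch of that
idea follows.  It is not the FMS implementation: the function name
flagged_intervals is hypothetical, and the sketch assumes fixed,
back-to-back intervals aligned to multiples of the interval length.  The
defaults mirror lf_badcrc_threshold (5) and low_freq_monitor_interval
(120 seconds) from Appendix D.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def flagged_intervals(samples: Iterable[Tuple[int, int]],
                      interval_s: int = 120,
                      threshold: int = 5) -> List[int]:
    """Return the start times of intervals whose summed count exceeds threshold.

    samples: (timestamp_seconds, count_delta) pairs, for example the increase
    in a link's badcrc counter between two switch polls.
    Intervals are fixed windows starting at multiples of interval_s
    (an assumption made for this sketch).
    """
    totals = defaultdict(int)
    for t, delta in samples:
        start = (t // interval_s) * interval_s  # window containing this sample
        totals[start] += delta
    # The threshold is the maximum allowed, so only a larger sum is flagged.
    return sorted(start for start, total in totals.items() if total > threshold)


# Polls every 30 s: 1+2+1+2 = 6 badcrcs in the first 120 s window.
polls = [(0, 1), (30, 2), (60, 1), (90, 2), (120, 0), (150, 1)]
print(flagged_intervals(polls))  # [0]
```

For the flaky-link case, the same function could be called with
interval_s=1800 and threshold=10 (the very_low_freq_monitor_interval and
vlf_portflip_threshold defaults) on a list of up-to-down transitions.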
If the temperature reported by a linecard exceeds a set threshold, an alert
is generated.  This threshold defaults to a value that is below the
shutdown temperature of the linecard, but higher than should be seen in
practice.

The thresholds for all of these alerts can be controlled with the
"fm_settings" command.


Errors Detected By FMA

The fma continuously monitors the NICs in a host for several conditions.  A
CRC error rate which exceeds a set threshold will generate an alert, as
will SRAM parity errors in the NIC.

The fabric is continually verified by the fma processes, and any change in
fabric topology is reported to the FMS.  Depending on the source of the
change, this may result in an alert being generated.


===================================================
Quickstart for FMS/FMA

The FMS/FMA combination is used as a complete replacement for the mapper.

The FMS is a process which manages the fabric.  It need not have access to
the Myrinet fabric, but it must be on a machine with IP access to each
compute node and to the Myrinet switches used in the fabric.

Since the FMS has a socket connection to each fma and also to each switch
enclosure, make sure the system running fm_server allows enough open file
descriptors for every node in your fabric, plus every enclosure, plus
another fifty or more to be safe.  (Sometimes a disconnected fma may
reconnect before the OS realizes the previously used socket is now closed.)

Pick a machine to run the FMS on, and install the FMS package as described
in Appendix F.  Start the program "fm_server" with the flag "-d" to cause
it to run in the background as a daemon.  fm_server will print the name of
its logfile for confirmation.

   $ fm_server -d

Each compute node should also have a copy of the installation directory
(which is read-only) but does not need the writeable run directory.
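The file descriptor rule of thumb earlier in this quickstart is simple
arithmetic: one socket per fma, one per enclosure, and a margin of at least
fifty.  The sketch below computes that estimate and compares it with the
soft RLIMIT_NOFILE limit using Python's standard resource module (Unix
only).  It is an illustration, not an FMS tool; the function names are
hypothetical, and because the limit checked is that of the Python process,
it should be run as the same user and in the same shell environment that
will start fm_server.

```python
import resource


def required_fds(num_nodes: int, num_enclosures: int, margin: int = 50) -> int:
    """One socket per fma (node), one per switch enclosure, plus a margin
    for stale sockets left behind when an fma reconnects."""
    return num_nodes + num_enclosures + margin


def check_fd_limit(num_nodes: int, num_enclosures: int, margin: int = 50) -> bool:
    """Return True if this process's soft open-file limit covers the estimate."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= required_fds(num_nodes, num_enclosures, margin)


print(required_fds(1024, 16))  # 1090
```

For a 1024-node fabric with 16 enclosures, check_fd_limit(1024, 16) asks
whether the soft limit is at least 1090; if not, raise it (for example with
"ulimit -n") before starting fm_server.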
Start the FMA on each compute node as:

   $ fma -d -s

You can restart either fm_server or any or all fmas without needing to
restart the other processes, so you should only need to start fm_server
once.

Use the fm_status command to see the current status of fm_server.

   $ fm_status


===================================================
Appendix A - Program Usage

The following is a list of programs that work within the FMS environment.

   fm_server      - The Fabric Management Server
   fma            - The Fabric Management Agent
   fm_switch      - Manage the list of enclosures in the database.
   fm_status      - Show current fabric status.
   fm_db2wirelist - Print a list of switch connections in a readable format.
   fm_settings    - View and change settings for the FMS.
   fm_show_alerts - Print a list of currently active alerts.
   fm_ack_alert   - Acknowledge an alert.
   fm_maint       - Place a linecard into or out of maintenance mode.
   fm_walkroute   - Show the path through the fabric a given route will take.
   fm_fixup_db    - May need to be run after a software upgrade which
                    changes the database format.

All tools look for the database files by default in /var/run/fms/database.
The location and name of the database directory can be overridden by
environment variables or command line arguments.

In order to easily support new hardware types produced by Myricom in the
future, the description of all the hardware products is table driven.
These tables are included as part of the fabric management installation,
and their location can also be changed through environment variables or
command line arguments.  The default installation directory for the
fabric management system is /opt/fms.

Most tools have the following in common:

   The Fabric Management System run directory defaults to "/var/run/fms".
      The environment variable FMS_RUN overrides this default.
      The command line option "-R" overrides both of these.

   The fabric description is kept in the database directory inside the run
   directory.  The database name defaults to "database".
      The environment variable FMS_DB_NAME overrides this default.
      The command line option "-N" overrides both of these.

1. Server/Agent processes

fm_server - The fabric server, fm_server, runs on a server which need not
   be part of the Myrinet fabric, but must have IP connectivity to all
   nodes in the fabric and to the monitoring linecards in all the switches.
   The fm_server process must have filesystem access to the database files.

   fm_server
      [ -d ]        - run in background as daemon
      [ -p ]        - port on which to listen, default 3333
      [ -s ]        - use syslog for messages in addition to log file
      [ -D ]        - enable debug output
      [ -R ] [ -N ] - run dir and DB name
      [ -h ]        - print usage message
      [ -V ]        - print FMS version

fma - This runs as a persistent agent on each Myrinet node.  The fma node
   must have IP connectivity to the node on which fm_server is running, but
   need not have access to the database files.

   The name of the fm_server's host is obtained from, in order of
   increasing precedence:
      --with-fms= from configure command
      FMS_SERVER= from "make install" command
      The environment variable "FMS_SERVER"
      The command line argument "-s"

   fma
      [ -s ]  - name of the node running the fm_server
      [ -d ]  - run in background as daemon
      [ -m ]  - Myrinet only network (no Myri-over-ethernet)
      [ -p ]  - run using partitions (no standalone mode)
      [ -x ]  - fabric consists of ID-less xbars only
      [ -i ]  - fabric consists of xbars with IDs only
      [ -l ]  - set mapping level
      [ -D ]  - enable debug output
      [ -R ]  - run dir
      [ -L ]  - log filename, in if relative, "-" for stdout
      [ -h ]  - print usage message
      [ -V ]  - print FMS version

   -i can be used to speed mapping when you have a fabric consisting of
      only xbars with IDs (16-port 2G xbars are the only xbars without IDs).

   -x can be used to speed mapping if your fabric consists of ONLY xbars
      with no IDs (all 16-port 2G xbars).

   -m can be used to speed mapping when you have a fabric that has Myrinet
      Ethernet Gateways in it, but you are not running Myrinet-over-Ethernet
      on any nodes.

2. Database commands - All of these commands are run on a node which has
   filesystem access to the database files.
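Each database command below accepts the -R and -N options and locates the
database directory using the precedence rules given earlier in this
appendix: a built-in default, then the FMS_RUN and FMS_DB_NAME environment
variables, then the command line.  A hypothetical helper that mirrors those
rules is sketched here; it is not part of FMS, and treating an empty
environment variable as unset is an assumption of the sketch.

```python
import os
from typing import Mapping, Optional

DEFAULT_RUN_DIR = "/var/run/fms"   # documented default run directory
DEFAULT_DB_NAME = "database"       # documented default database name


def resolve_db_dir(run_opt: Optional[str] = None,
                   name_opt: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Return the database directory, lowest to highest precedence:
    default, then FMS_RUN / FMS_DB_NAME, then -R / -N (run_opt / name_opt).

    Empty strings are treated as "not set" (an assumption of this sketch).
    """
    if env is None:
        env = os.environ
    run_dir = run_opt or env.get("FMS_RUN") or DEFAULT_RUN_DIR
    db_name = name_opt or env.get("FMS_DB_NAME") or DEFAULT_DB_NAME
    return os.path.join(run_dir, db_name)


print(resolve_db_dir(env={}))  # /var/run/fms/database
print(resolve_db_dir(run_opt="/scratch/fms", env={"FMS_DB_NAME": "test"}))
# /scratch/fms/test
```

The fma server name follows the same "later source wins" pattern, with the
four sources listed under fma above.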
fm_switch - fm_switch is used to view and manage the list of enclosures in
   the fabric.  This command should not be used to modify the list of
   enclosures while fm_server is running - it is intended as a setup
   command to be used before fm_server is started.

   fm_switch
      [ -a ]        - add to the database of switches
      [ -d ]        - remove from the database of switches
      [ -R ] [ -N ] - run dir and DB name
      [ -h ]        - print usage message
      [ -V ]        - print FMS version

fm_db2wirelist - This reads the database of connections and prints out a
   list of the contents of each switch's slots and where everything is
   connected.  Running and reviewing this as a first step after creating
   the database is a good way to notice links that are out.  This is run on
   a node which has filesystem access to the database files.  Since this
   program does not modify the database, it can be run at any time.

   fm_db2wirelist
      [ -R ] [ -N ] - run dir and DB name
      [ -h ]        - print usage message
      [ -V ]        - print FMS version

fm_settings - This is used to view and change settings for the FMS.  Run it
   to change parameters while the FMS is not running; the FMS will pick up
   the new values when it restarts.  These parameters are saved in the
   database table "fms_settings".

   fm_settings
      [ -p ]        - set to
      [ -l ]        - list all parameters and their values
      [ -R ] [ -N ] - run dir and DB name
      [ -h ]        - print usage message
      [ -V ]        - print FMS version

fm_walkroute - This is used to see the exact path a packet takes through
   the network given a starting host and a sequence of route bytes.  This
   is run on a node which has filesystem access to the database files.
   Since this program does not modify the database, it can be run at any
   time.
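fm_walkroute echoes the route bytes back as signed port offsets; in the
example usage that follows this description, the bytes bd bf 90 91 are
shown as -3 -1 16 17.  All four pairs are consistent with reading the low
six bits of each byte as a two's-complement offset, which is what this
hypothetical helper does.  The encoding is inferred from that single
example rather than taken from a Myrinet specification, so treat it as an
assumption.

```python
from typing import List


def route_byte_to_offset(byte: int) -> int:
    """Decode one route byte into a signed port offset.

    Assumption inferred from the fm_walkroute example: the low six bits hold
    a two's-complement offset, so 0xbd -> -3 and 0x90 -> 16.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError(f"route byte out of range: {byte!r}")
    low6 = byte & 0x3F
    # Sign-extend from 6 bits: values 32..63 represent -32..-1.
    return low6 - 64 if low6 & 0x20 else low6


def decode_route(hex_bytes: str) -> List[int]:
    """Decode a space-separated hex route such as "bd bf 90 91"."""
    return [route_byte_to_offset(int(tok, 16)) for tok in hex_bytes.split()]


print(decode_route("bd bf 90 91"))  # [-3, -1, 16, 17]
```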
   fm_walkroute
      [ -f ]        - the host from which to start
      [ -n ]        - which NIC to use
      [ -p ]        - which port on the NIC to use
      [ -g ]        - give extra-gory details about internal links
      [ -R ] [ -N ] - run dir and DB name
      [ -h ]        - print usage message
      [ -V ]        - print FMS version

   Example usage:

      $ fm_walkroute -f host1 -- bd bf 90 91
      Walking route: -3 -1 16 17 from host host1 nic 0
      host host1 nic 0, rail 0 - switch1, slot 15, port 9
      switch1, slot 14, port 6 - switch2, slot 9, port 28
      switch2, slot 6, port 22 - host host37, nic 0, rail 0

fm_fixup_db - May need to be run after a software upgrade which changes the
   database format.  The database is read and re-written if any changes are
   needed.  If no changes are needed, the database is left untouched.

   fm_fixup_db
      [ -R ] [ -N ] - run dir and DB name
      [ -h ]        - print usage message
      [ -V ]        - print FMS version

3. FMS Client Commands - These programs make IP queries to the fm_server
   and need only be run on nodes which have IP access to the fm_server.
   The fm_server must be running for these commands to work.

fm_status - Print a summary of fm_server status.

   fm_status
      [ -s ]  - address of node with fm_server process
      [ -h ]  - print usage message
      [ -V ]  - print FMS version

   Example:

      $ fm_status
      FMS Fabric status
      32      hosts known
      31      FMAs found
      1       un-ACKed alerts
      Mapping is complete, last map generated by fog20
      Database is complete

   "hosts known" is the count of all hosts in the database.
   "FMAs found" is the number of FMAs currently in contact with fm_server.
   "un-ACKed alerts" is a count of alerts not yet ACKed (see fm_show_alerts).
   "Mapping is complete/in progress" tells whether mapping activity is
      occurring at this moment.
   "Database is/is not complete" tells whether the resolution of xbars
      found by mapping into specific linecards is complete yet or not.

fm_show_alerts - Print a list of active alerts.  By default, this prints
   only alerts which have not been ACKed and are not relics.  (See the
   appendix on "Alerts" below.)
   Each alert has a unique index which can be passed to fm_ack_alert to
   acknowledge the alert.

   fm_show_alerts
      [ -a ]  - show ACKed alerts also (marked with [A])
      [ -r ]  - show relic alerts also (marked with [R])
      [ -s ]  - address of node with fm_server process
      [ -h ]  - print usage message
      [ -V ]  - print FMS version

fm_ack_alert - Acknowledge an alert.  This marks an alert as ACKed,
   possibly causing its deletion.  (See the appendix on "Alerts" below.)

   fm_ack_alert
      [ -i ]  - ACK alert with ID
      [ -s ]  - address of node with fm_server process
      [ -h ]  - print usage message
      [ -V ]  - print FMS version

fm_maint - Place a linecard into or out of maintenance mode.  While in
   maintenance mode, the xbars on a linecard will be treated as though they
   do not exist.

   fm_maint
        -e               - switch enclosure name
        -l               - slot number of linecard to maintain
      [ -p ]             - port number to maintain
        -m { up | down } - set the state to up or down
      [ -s ]             - address of node with fm_server process
      [ -h ]             - print usage message
      [ -V ]             - print FMS version

   This is useful for working on a linecard while jobs are in progress,
   since no remapping is caused.  If the optional port number is not
   specified, then all ports on the linecard are affected.


Appendix B - Alerts

Alerts are created when certain exceptional events occur and are reported
to fm_server.  Alerts persist within the FMS until they are cleared.
Clearing usually requires the alert to be acknowledged (ACKed) and the
condition which caused the alert to have cleared.  Once the alert has been
acknowledged, it is marked as "ACKed".  Once the condition that caused the
alert has cleared, it is marked as a "relic."  Most alerts are deleted only
after they have been both relic-ed and ACKed.

The following is a list of all alerts and their meanings.  The "Flags" line
for each alert type may contain "NEED_ACK" or "ACK_ONLY" or both.

If NEED_ACK is present, once the alert becomes a relic, it still needs an
ACK before it is deleted entirely.
If NEED_ACK is not present, the alert is deleted as soon as it becomes a
relic.  If ACK_ONLY is specified, the alert is deleted as soon as it is
ACKed.  Without this flag, the alert will persist until becoming a relic,
even after it has been ACKed.

HOST_NO_INITIAL_FMA
   Description:  No FMA connection has been established since FMS was started
   Initiated_by: timeout waiting for FMA connection
   Cancelled_by: attachment to FMA
   Flags:
   struct {
      lf_string_t hostname;
   }
   Format: "Have never gotten FMA contact from %s"
   Args: hostname

HOST_LOST_FMA
   Description:  Connectivity was lost to the FMA on a host
   Initiated_by: A connection to a running FMA was lost
   Cancelled_by: re-attachment to FMA
   Flags: NEED_ACK
   struct {
      lf_string_t hostname;
   }
   Format: "Lost FMA contact from %s"
   Args: hostname

HOST_LINK_DOWN
   Description:  A Myrinet link between a host and switch is disconnected
   Initiated_by: inability to pass traffic through a link
   Cancelled_by: resumption of traffic through the link or removal of the link
   Flags: NEED_ACK
   struct {
      lf_string_t hostname;
      uint16_t nic;
      uint16_t nic_interface;
      lf_string_t enclosure;
      uint16_t slot;
      uint16_t port;
      uint16_t subport;
   }
   Format: "Link from %s, nic %d:p%d to %s, slot %d, port %d:%d is down"
   Args: hostname nic nic_interface enclosure slot port subport

HOST_PORT_DOWN
   Description:  A Myrinet port on a host is down, other end unknown
   Initiated_by: inability to pass traffic through a link
   Cancelled_by: resumption of traffic through the link
   Flags: NEED_ACK
   struct {
      lf_string_t hostname;
      uint16_t nic;
      uint16_t nic_port;
   }
   Format: "Link from %s, nic %d, p%d is disconnected"
   Args: hostname nic nic_port

HOST_SRAM_PARITY_ERROR
   Description:  A NIC on a host has experienced an SRAM parity error
   Initiated_by: SRAM parity error reported by NIC
   Cancelled_by: ACK
   Flags: ACK_ONLY NEED_ACK
   struct {
      lf_string_t hostname;
      uint32_t nic_id;
      lf_string_t serial_no;
   }
   Format: "%s, NIC %d (serial_no=%s) got an SRAM Parity Error"
   Args: hostname nic_id serial_no

HOST_FIRMWARE_DIED
   Description:  A NIC on a host has stopped responding
   Initiated_by: NIC error reported by Myri interface
   Cancelled_by: ACK
   Flags: ACK_ONLY NEED_ACK
   struct {
      lf_string_t hostname;
      uint32_t nic_id;
      lf_string_t serial_no;
   }
   Format: "%s, NIC %d (serial_no=%s), is not responding"
   Args: hostname nic_id serial_no

HOST_SWITCH_LINK_BADCRC_COUNT
   Description:  A NIC port has a badcrc count that is too high.
   Initiated_by: A NIC port accumulates too many badcrcs over the sample period.
   Cancelled_by: ACKed by user
   Flags: NEED_ACK ACK_ONLY
   struct {
      lf_string_t hostname;
      uint16_t nic;
      uint16_t nic_interface;
      lf_string_t enclosure;
      uint16_t slot;
      uint16_t port;
      uint16_t subport;
      uint32_t badcrc_count;
      uint32_t seconds;
   }
   Format: "Link from %s, nic %d:p%d to %s, slot %d, port %d:%d: %d Bad CRC packets in %d seconds"
   Args: hostname nic nic_interface enclosure slot port subport badcrc_count seconds

HOST_UNRECOGNIZED_NIC_TYPE
   Description:  A NIC on a host has an unrecognized product ID
   Initiated_by: Inspection of NIC product ID reported by fma
   Cancelled_by: ACK
   Flags: ACK_ONLY NEED_ACK
   struct {
      lf_string_t hostname;
      lf_string_t product_id;
      uint32_t nic_id;
   }
   Format: "%s, NIC %d, unrecognized product ID \"%s\""
   Args: hostname nic_id product_id

SWITCH_XBARPORT_DISABLED
   Description:  An xbar port on an enclosure has been manually disabled
   Initiated_by: xbar port is seen to be disabled
   Cancelled_by: xbar port is no longer disabled
   Flags:
   struct {
      lf_string_t enclosure;
      uint32_t slot;
      uint32_t xbar;
      uint32_t port;
   }
   Format: "Enclosure %s, slot %d, xbar %d, port %d disabled"
   Args: enclosure slot xbar port

SWITCH_EXT_LINK_DOWN
   Description:  An external Myrinet link between two switches is disconnected
   Initiated_by: inability to pass traffic through a link
   Cancelled_by: resumption of traffic through the link or removal of the link
   Flags: NEED_ACK
   struct {
      lf_string_t enclosure1;
      uint16_t slot1;
      uint16_t port1;
      uint16_t subport1;
      lf_string_t enclosure2;
      uint16_t slot2;
      uint16_t port2;
      uint16_t subport2;
   }
   Format: "Link from %s, slot %d, port %d:%d to %s, slot %d, port %d:%d is down"
   Args: enclosure1 slot1 port1 subport1 enclosure2 slot2 port2 subport2

SWITCH_INT_LINK_DOWN
   Description:  An internal Myrinet link between two xbars is disconnected
   Initiated_by: inability to pass traffic through a link
   Cancelled_by: resumption of traffic through the link or removal of the link
   Flags: NEED_ACK
   struct {
      lf_string_t enclosure;
      uint16_t slot1;
      uint16_t xbar1;
      uint16_t port1;
      uint16_t slot2;
      uint16_t xbar2;
      uint16_t port2;
   }
   Format: "Internal link from %s, slot %d, xbar %d, port %d to slot %d, xbar %d, port %d is down"
   Args: enclosure slot1 xbar1 port1 slot2 xbar2 port2

SWITCH_XBARPORT_DOWN
   Description:  An xbar port on an enclosure is down
   Initiated_by: xbar port is seen to be down
   Cancelled_by: xbar port is no longer down
   Flags: NEED_ACK
   struct {
      lf_string_t enclosure;
      uint32_t slot;
      uint32_t xbar;
      uint32_t port;
   }
   Format: "Enclosure %s, slot %d, xbar %d, port %d is down"
   Args: enclosure slot xbar port

SWITCH_XBARPORT_UPDOWN_COUNT
   Description:  An xbar port has toggled state too frequently
   Initiated_by: xbar port changes to "down" too many times within the sample period
   Cancelled_by: ACKed by user
   Flags: NEED_ACK
   struct {
      lf_string_t enclosure;
      uint32_t slot;
      uint32_t xbar;
      uint32_t port;
      uint32_t updown_count;
      uint32_t seconds;
   }
   Format: "Enclosure %s, slot %d, xbar %d, port %d: %d state changes in %d seconds, port disabled"
   Args: enclosure slot xbar port updown_count seconds

SWITCH_XBARPORT_BADCRC_COUNT
   Description:  An xbar port has a badcrc count that is too high.
   Initiated_by: An xbar accumulates too many badcrcs over the sample period.
   Cancelled_by: ACKed by user
   Flags: NEED_ACK ACK_ONLY
   struct {
      lf_string_t enclosure;
      uint32_t slot;
      uint32_t xbar;
      uint32_t port;
      uint32_t badcrc_count;
      uint32_t seconds;
   }
   Format: "Enclosure %s, slot %d, xbar %d, port %d: %d Bad CRC packets in %d seconds."
   Args: enclosure slot xbar port badcrc_count seconds

SWITCH_EXT_LINK_BADCRC_COUNT
   Description:  An external link has a badcrc count that is too high.
   Initiated_by: An xbar port accumulates too many badcrcs over the sample period.
   Cancelled_by: ACKed by user
   Flags: NEED_ACK ACK_ONLY
   struct {
      lf_string_t enclosure1;
      uint16_t slot1;
      uint16_t port1;
      uint16_t subport1;
      lf_string_t enclosure2;
      uint16_t slot2;
      uint16_t port2;
      uint16_t subport2;
      uint32_t badcrc_count;
      uint32_t seconds;
      lf_string_t extra_text;
   }
   Format: "Link from %s, slot %d, port %d:%d to %s, slot %d, port %d:%d: %d Bad CRC packets in %d seconds%s"
   Args: enclosure1 slot1 port1 subport1 enclosure2 slot2 port2 subport2 badcrc_count seconds extra_text

SWITCH_INT_LINK_BADCRC_COUNT
   Description:  An internal link has a badcrc count that is too high.
   Initiated_by: An xbar port accumulates too many badcrcs over the sample period.
   Cancelled_by: ACKed by user
   Flags: NEED_ACK ACK_ONLY
   struct {
      lf_string_t enclosure;
      uint16_t slot1;
      uint16_t xbar1;
      uint16_t port1;
      uint16_t slot2;
      uint16_t xbar2;
      uint16_t port2;
      uint32_t badcrc_count;
      uint32_t seconds;
      lf_string_t extra_text;
   }
   Format: "Internal link from %s, slot %d, xbar %d, port %d to slot %d, xbar %d, port %d: %d Bad CRC packets in %d seconds%s"
   Args: enclosure slot1 xbar1 port1 slot2 xbar2 port2 badcrc_count seconds extra_text

SWITCH_HOST_LINK_BADCRC_COUNT
   Description:  A host link has a badcrc count that is too high.
   Initiated_by: An xbar port accumulates too many badcrcs over the sample period.
   Cancelled_by: ACKed by user
   Flags: NEED_ACK ACK_ONLY
   struct {
      lf_string_t enclosure;
      uint16_t slot;
      uint16_t port;
      uint16_t subport;
      lf_string_t hostname;
      uint16_t nic;
      uint16_t nic_interface;
      uint32_t badcrc_count;
      uint32_t seconds;
   }
   Format: "Link from %s, slot %d, port %d:%d to %s, nic %d:p%d: %d Bad CRC packets in %d seconds"
   Args: enclosure slot port subport hostname nic nic_interface badcrc_count seconds

SWITCH_XCVR_DISABLED
   Description:  A transceiver port on an enclosure has been manually disabled
   Initiated_by: transceiver port is seen to be disabled
   Cancelled_by: transceiver port is no longer disabled
   Flags:
   struct {
      lf_string_t enclosure;
      uint32_t slot;
      uint32_t port;
   }
   Format: "Enclosure %s, slot %d, port %d disabled"
   Args: enclosure slot port

SWITCH_XCVR_SIGNAL_LOST
   Description:  A transceiver port on an enclosure has lost signal
   Initiated_by: transceiver signal_lost noted
   Cancelled_by: transceiver signal_lost condition clears
   Flags: NEED_ACK
   struct {
      lf_string_t enclosure;
      uint32_t slot;
      uint32_t port;
   }
   Format: "Enclosure %s, slot %d, transceiver port %d lost signal"
   Args: enclosure slot port

SWITCH_LINECARD_HOT
   Description:  A linecard is too hot
   Initiated_by: observed temp is over threshold
   Cancelled_by: all temps are less than threshold - hysteresis value
   Flags: NEED_ACK
   struct {
      lf_string_t enclosure;
      uint32_t slot;
   }
   Format: "Enclosure %s, slot %d is running hot"
   Args: enclosure slot

SWITCH_LINECARD_OVERTEMP
   Description:  A linecard is so hot it shut down
   Initiated_by: overtemp count increased
   Cancelled_by: ACK only
   Flags: NEED_ACK ACK_ONLY
   struct {
      lf_string_t enclosure;
      uint32_t slot;
   }
   Format: "Enclosure %s, slot %d has experienced an overtemp shutdown"
   Args: enclosure slot

SWITCH_CANNOT_READ
   Description:  Cannot read data from the monitoring line card
   Initiated_by: inability to contact switch
   Cancelled_by: contact restored to switch
   Flags: NEED_ACK
   struct {
      lf_string_t enclosure;
   }
   Format: "Cannot contact monitoring linecard on %s"
   Args: enclosure

SWITCH_MAINTENANCE_MODE
   Description:  A linecard is in maintenance mode
   Initiated_by: User brings down a linecard for maintenance
   Cancelled_by: User takes linecard out of maintenance mode
   Flags: NEED_ACK
   struct {
      lf_string_t enclosure;
      uint32_t slot;
   }
   Format: "Slot %d on enclosure %s is down for maintenance."
   Args: slot enclosure


Appendix C - Database File Formats

Every DB file starts with 2 rows of column headers.  The first row defines
the data type of each column, and the second row defines the name of each
column.

enclosures.csv - defines the name and type of each enclosure

   string,string
   name,product_id
   clos0,M3-E128
   clos1,M3-E128
   spine0,M3-E128

linecards.csv - defines the type and location of every linecard

   string,integer,string,string
   enclosure_name,enclosure_slot,product_id,serial_no
   clos0,1,M3-SW16-8F,4936
   clos0,9,M3-SW16-8F,26848
   clos1,1,M3-SW16-8F,4937
   spine0,1,M3-SPINE-8F,22781

hosts.csv - names every host

   string,string
   hostname,sw_version
   host0000,GM
   host0001,GM
   host0002,GM
   host0003,GM

nics.csv - type and location of every NIC

   string,integer,MAC,integer,integer,string,string
   hostname,host_nic_id,mac_addr,ports,subports,serial_no,product_id
   host0000,0,00:60:dd:49:97:01,1,1,26848,M3F-PCIXD-2
   host0001,0,00:60:dd:49:97:02,1,1,31875,M3F-PCIXD-2
   host0002,0,00:60:dd:49:97:03,1,1,6878,M3F-PCIXD-2

links.csv - every link in the system

   string,integer,integer,integer,string,integer,integer,integer
   name_1,slot_1,port_1,subport_1,name_2,slot_2,port_2,subport_2
   host0000,0,0,0,clos0,9,8,0
   host0001,0,0,0,clos0,9,9,0
   host0002,0,0,0,clos0,9,10,0
   clos0,1,8,0,spine0,1,0,0
   clos1,1,8,0,spine0,1,1,0


Appendix D - FMS Settings

The following parameters may be set using fm_settings to control the
behavior of the Fabric Management System.

low_freq_monitor_interval             120 seconds [default]
   This specifies the "low frequency" interval for summing certain counts
   and comparing them to thresholds.
   For example, if too many badcrc counts are seen on a switch during this
   period, an alert will be raised.

lf_badcrc_threshold                   5 badcrcs [default]
   Maximum number of badcrcs allowed on a link during the low-frequency
   interval before an alert is generated.

lf_fatal_badcrc_threshold             100 badcrcs [default]
   If more than this many badcrcs are seen on a link during the
   low-frequency interval, an alert is raised and the link may be disabled
   to traffic.

very_low_freq_monitor_interval        1800 seconds [default]
   This specifies the "very low frequency" interval for summing certain
   counts and comparing them to thresholds.

vlf_portflip_threshold                10 transitions [default]
   If a port goes up and down more than this many times during the
   very-low-frequency interval, an alert is raised.

switch_query_interval                 30 seconds [default]
   Interval between queries to the monitoring linecards on the switch
   enclosures.

link_verify_interval                  30 seconds [default]
   Interval for the FMAs to verify each link in the fabric.

link_verify_timeout                   100 ms [default]
   Time allowed for a response to a link verification packet to return
   before trying again or marking the link down.

link_verify_retries                   3 retries [default]
   Number of times to retry probing a link before it is marked down.

nic_scout_timeout                     250 ms [default]
   Amount of time to wait for a NIC to reply to a scout packet.

nic_scout_retries                     3 retries [default]
   Number of retries before giving up on scouting a NIC.

nic_query_interval                    60 seconds [default]
   Interval at which NICs verify each other's presence on the fabric.

map_request_timeout                   90 seconds [default]
   When the FMS requests a map from an fma, this is the maximum amount of
   time it should wait before re-requesting from another fma.

resolve_packet_send_count             500 packets [default]
resolve_packet_min                    240 packets [default]
resolve_packet_max                    1200 packets [default]
resolve_retries                       5 retries [default]
   These settings control fabric resolution when using anonymous 2G
   16-port xbars.
   They should only be adjusted with guidance from Myricom support.

alert_exec_cmd                        [default]
   The path to a command which will be executed every time an alert is
   generated by fm_server.  The text of the alert will be passed into the
   stdin of this command.  This command should return as soon as possible
   rather than lingering (waiting for user input, for example), since
   fm_server will wait() for this command to finish.

preferred_mapper                      [default]
   A comma-separated list of nodes which the FMS should prefer when
   choosing an fma to create fabric topology maps.  If empty, the FMS is
   free to choose any fma to generate a map.


Appendix E - Legacy tools

These are not in general use any longer, but still exist.

fm_create_db - This takes a map file generated by gm_mapper and a list of
   switch names and generates the database files for use by other tools.
   This is much the same as the existing wirelist tool.  The resulting
   fabric database will contain only the hosts and links included in the
   specified map file.  Any links or hosts not present in the map file can
   easily be added manually afterwards.

   fm_create_db
        -s          - list of switch names
        -l          - low "invalid-route" threshold
        -m          - gm_mapper map file
      [ -R ] [ -N ] - run dir and DB name

   switch_list is a file with one switch name or IP address per line.  This
   should be the complete list of switches providing connectivity for the
   hosts in the map file, and each must have a monitoring line card
   installed.

   map_file is a map file generated by gm_mapper.  For best results, it
   should be as complete as possible, with no links out.

fm_watch_switches - fm_watch_switches periodically reads the information
   from all switch enclosures specified in the fabric database and reports
   information which may require attention.  It monitors certain counts
   for each port, and reports any whose delta since the last iteration
   exceeds a specified threshold.
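   The delta check just described amounts to comparing two successive
   snapshots of per-port counters.  The following is a hypothetical sketch,
   not the tool's code: the snapshot layout (a dictionary keyed by an
   (enclosure, slot, xbar, port) tuple) and the function name are invented,
   and treating a counter that goes backwards as a reset is an assumption.
   Its default threshold of 5 matches the documented default for the -t
   option.

```python
from typing import Dict, List, Tuple

PortKey = Tuple[str, int, int, int]  # hypothetical (enclosure, slot, xbar, port)


def ports_over_threshold(prev: Dict[PortKey, int],
                         cur: Dict[PortKey, int],
                         threshold: int = 5) -> List[Tuple[PortKey, int]]:
    """Return (port, delta) for every port whose counter grew by more than
    `threshold` since the previous poll.

    Ports missing from `prev` are skipped (no baseline yet).  A counter that
    went backwards is assumed to have been reset, so its current value is
    used as the delta.
    """
    report = []
    for port, value in sorted(cur.items()):
        if port not in prev:
            continue
        delta = value - prev[port]
        if delta < 0:
            delta = value  # assumed counter reset
        if delta > threshold:
            report.append((port, delta))
    return report
```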
   Variables monitored for each crossbar port are:
      badcrcs     - the number of badcrcs seen
      rx_timeouts - the number of receive timeouts
      tx_timeouts - the number of transmit timeouts

   fm_watch_switches can also be used to report which ports, either
   transceiver or crossbar ports, have been manually disabled.

   fm_watch_switches
      [ -a ]        - show absolute values used to compute deltas
      [ -t ]        - reporting threshold, default 5
      [ -i ]        - polling interval in seconds, default 30
      [ -g ]        - watch goodcrc counts
      [ -r ]        - show RX Timeout counts
      [ -d ]        - report disabled links
      [ -R ] [ -N ] - run dir and DB name
      [ -V ]        - print FMS version

   Usage: fm_watch_switches is a good diagnostic test to run when looking
   for links with high CRC counts, especially while jobs are running, since
   load will be high.  fm_watch_switches does not interact with the fabric
   at all, so it is completely non-intrusive.  If any links start showing
   excessively high badcrc rates, diagnostics should be performed (replace
   the cable, etc.).

fm_linktest - fm_linktest is used to look for failed or marginal
   interswitch links.  It takes as input the fabric description and tests
   every interswitch link in the fabric individually.  This includes links
   between crossbars inside a switch.  Marginal or failed links are
   identified and reported.

   fm_linktest
      [ -n ]        - which NIC to use, default 0
      [ -i ]        - which NIC interface to use, default all
      [ -l ]        - length of test packets in KB, default 4
      [ -R ] [ -N ] - run dir and DB name

   -i specifies which interface to use on multi-interface NICs like E
   cards.  This option is not normally used since all interfaces are tested
   by default.

   Usage: fm_linktest can be run periodically as a health check of the
   fabric.

   Example output:

      The link from clos1 / slot 2 / port 5 to spine3 / slot 6 / port 13
      seems completely down, no traffic passes.

   Or

      The link from clos1 / slot 2 / port 5 to spine3 / slot 6 / port 13
      seems marginal, 50 / 80 packets were successfully transmitted.
   Or

      The internal link from clos1 / slot 9 / xbar 3 / port 17 to
      clos1 / slot 0 / xbar 0 / port 1 is XXX.

   This last message indicates that there is definitely a card or
   enclosure problem.  The first two messages indicate that cable
   diagnostics need to be run.

   fm_linktest can be run at any time to look for dead links, but when run
   on an active network, false positives may appear on marginal links.  The
   option -l 4 (just a few packets) is good for looking for dead links and
   can be used at any time.  The option -l 4000 is primarily used for
   finding marginal links and should be run on an unloaded network.


=====================
Appendix F - Building and Setup

Building FMS with MX

The FMS is integrated into the MX install package.  To build and use FMS
with MX, you need to specify "--enable-fms" on the MX configure line.  FMS
will then be installed wherever you install MX, which is /opt/mx by
default.

To specify the name of the FMS server (the node on which the fm_server
process will be run) at configure time, use "--with-fms-server=".  To
defer this specification until install time, or to override it, you may
install MX with "make install FMS_SERVER=".  Finally, you may override the
location of the FMS server manually by editing
/opt/mx/bin/mx_start_mapper.  The optional ":port" is used to override the
default port on which to connect to the fm_server process.

   $ configure --enable-fms --with-fms-server=
   $ make install

===================================================
To build and install FMS as a standalone module:

Download FMS.

   http://www.myri.com/ftp/pub/reese/fms_src.tar.gz

   $ tar zxf fms_src.tar.gz
   $ cd fma
   $ ./configure
   $ make
   $ make install

By default, FMS assumes that MX is the low-level firmware and that MX is
installed in /opt/mx.  (If GM firmware is used, the default installation
directory for GM is /opt/gm.)  The FMS package is installed in the default
directory, /opt/fms.
If you would like to specify a different FMS installation directory, pass
the --prefix option to configure.  E.g.,

   $ ./configure --prefix=

To use GM instead of MX, or to specify an alternate install directory for
GM or MX, pass the following option(s) to configure:

   $ ./configure --with-myri-api=gm --with-myri-install-dir=

where is the installation directory for MX or GM.

Note: As previously mentioned, if you would like to install FMS into a
different directory, you can pass the --prefix argument to configure or
specify the installation directory at the "make install" step.

   $ ./configure --prefix=
   $ make install

or:

   $ make install prefix=

===================================================
Setup

Add /opt/fms/bin to your path.

FMS needs a writeable directory for keeping the fabric database;
/var/run/fms is the default and preferred location.

   $ mkdir -p /var/run/fms

If you need to use a different directory for the FMS install dir
(/opt/fms), please set and export the environment variable FMS_INSTALL.
This will allow all the tools to find the directory automatically.
Similarly, the default directory of /var/run/fms can be overridden by
FMS_RUN.

Define the Myricom switch enclosures by using the "fm_switch" command:

   $ fm_switch -a

Where is the DNS name or IP address for the monitoring linecard for each
enclosure in your fabric.  Repeat for each enclosure until all are added.

To see just a list of the enclosures currently defined, run:

   $ fm_switch

If you need to remove a switch from the list, use the "-d" option:

   $ fm_switch -d
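Adding many enclosures one at a time can be scripted.  The sketch below
assumes that fm_switch takes the enclosure name (or monitoring linecard
address) as the argument following -a, as the setup steps above imply; the
input format (one name per line, "#" starting a comment) and the function
names are hypothetical.  By default it only prints the commands; pass
dry_run=False to execute them.  As noted for fm_switch in Appendix A, do
this while fm_server is not running.

```python
import subprocess
from typing import Iterable, List


def read_switch_names(lines: Iterable[str]) -> List[str]:
    """Collect enclosure names, skipping blank lines and # comments."""
    names = []
    for line in lines:
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


def add_enclosures(names: Iterable[str], dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one `fm_switch -a <name>` command per
    enclosure.  Assumes fm_switch is on PATH and fm_server is stopped."""
    commands = [["fm_switch", "-a", name] for name in names]
    for cmd in commands:
        if dry_run:
            print("$", " ".join(cmd))
        else:
            # check=True stops at the first enclosure fm_switch rejects.
            subprocess.run(cmd, check=True)
    return commands
```

For example, add_enclosures(read_switch_names(open("switches.txt"))) would
print one "$ fm_switch -a ..." line per enclosure listed in a hypothetical
switches.txt file.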