Network reliability and controller availability - Metasys - LIT-12012458 - Field Device - 13.0

Metasys IP Networks for BACnet/IP Controllers Technical Bulletin

Document type: Technical Bulletin
Document number: LIT-12012458
Version: 13.0
Revision date: 2024-08-20
Product status: Active

In addition to determining the best network architecture for a deployment, it is important to determine the best method for connecting the individual IP devices to the network. For information about methods of connecting devices, see Network topologies.

Connecting a device directly to the switch provides the greatest availability as the device is not affected by any other device. However, it is also the most expensive option as it requires a switch port for each device as well as a dedicated cable run from the device back to the switch. Direct connections are recommended for the following devices:
  • Devices that are mission critical (for example, air handling units serving many VAV box controllers, devices in regulated environments, and VAV box controllers serving IT server rooms).
  • Devices that are remotely located from other Metasys devices.

Connecting devices in a daisy-chain is a much more cost-friendly approach as it reduces both the number of required switch ports and the amount of Ethernet cable required to connect the devices. However, the availability of a device in a daisy-chain can be affected by the other devices in the chain. If a device fails or is removed from a daisy-chain, communication is lost with all the devices behind the failed device in the daisy-chain. While those devices can continue to operate independently, they do not have access to external points such as outside air temperature, nor can they exchange BACnet messages with other controllers or the network engine.

Connecting devices using MRP (ring topology) offers the cost advantages of a daisy-chain by reducing both the number of required switch ports and the length of Ethernet cable required. Unlike a daisy-chain, MRP mitigates the connectivity problem by providing a redundant network connection. Under normal operating conditions, the Media Redundancy Manager sends all traffic through one of the ring ports. If there is a communication fault, the Media Redundancy Manager begins sending messages through both ports. If there is only one failure, no other controller loses network connectivity. The ring operates like two short chains until the original fault is corrected.
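
The difference between the two behaviors can be illustrated with a minimal sketch. It is not part of the Metasys software; the function names and the position numbering (1 is the controller nearest the switch) are assumptions made for illustration, and only controller failures are modeled, not cable faults. A ring is treated as a line of controllers whose two ends both reach the switch.

```python
def online_in_chain(size, failed):
    """Positions still reachable in a daisy-chain (hypothetical helper).

    Everything at or behind the first failed controller loses the switch.
    """
    first_fault = min(failed, default=size + 1)
    return [p for p in range(1, size + 1) if p < first_fault]


def online_in_ring(size, failed):
    """Positions still reachable in a ring (hypothetical helper).

    Traffic can arrive from either end, so only the failed controllers and
    any controllers trapped between two failures are cut off.
    """
    if not failed:
        return list(range(1, size + 1))
    first_fault, last_fault = min(failed), max(failed)
    return [p for p in range(1, size + 1) if p < first_fault or p > last_fault]


# One failure at position 4 in a group of 10 controllers.
print(online_in_chain(10, {4}))     # [1, 2, 3]
print(online_in_ring(10, {4}))      # [1, 2, 3, 5, 6, 7, 8, 9, 10]
# A second failure at position 8 isolates positions 5 to 7 in the ring.
print(online_in_ring(10, {4, 8}))   # [1, 2, 3, 9, 10]
```

With a single failure the ring loses only the failed controller, while the chain loses every controller behind it.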

Redundancy is useful in the case of controller failure; it is also useful for routine maintenance such as software upgrades. Most maintenance procedures can be arranged so that only one device is offline at a time. In this situation, MRP provides maximum availability and performs much better than a daisy-chain.

It is possible to calculate the effect of random controller failures on the availability of the system. The figures below show the average number of controllers that are still online after a small number of random controller failures. All the data points are for a system with 100 controllers divided into chains or rings of a fixed size. The graphs show that using more chains or rings reduces the risk due to a failure. The risk for a single failure is reduced because most controllers are on chains or rings that do not have a failure. However, using more chains and rings increases the number of switch ports required, and hence the total cost of the network.

A comparison of the graphs in Figure 1 shows the advantage of rings over chains. The graph on the left is for chains. If all 100 controllers are connected as a single chain, then on average only 50 controllers remain online after the first random failure. The graph on the right is for rings. Because MRP provides a redundant network connection, 99 controllers are still online after one controller failure. This advantage carries over to a system with multiple rings. If the 100 controllers are arranged as two rings of 50, the first two failures may occur on different rings, in which case 98 controllers are still online. If both failures occur on the same ring, the controllers between the two failed devices also go offline. On average, however, two rings of 50 with two random failures still leave about 90 controllers online. Two rings of 50 require four switch ports, so a fair comparison requires that the chain-based system use the same number of switch ports and hence incur the same network cost. For four chains of 25 controllers with two random failures, on average 76 controllers are still online (plotted as a gray triangle on the left graph). Thus the average number of online controllers is much higher for MRP than for chains (90 versus 76).
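
The averages quoted above can be reproduced with a short calculation. The sketch below is one possible model, not necessarily the method used to generate Figure 1: it assumes that the failed controllers are chosen uniformly at random from all controllers, that a failed controller counts as offline, and that the controller count divides evenly into groups. The function names are hypothetical.

```python
from math import comb


def expected_online_chains(total, size, failures):
    """Average controllers online with chains of `size` after random failures."""
    ways = comb(total, failures)  # all equally likely sets of failed controllers
    # Position j (1 = nearest the switch) is online only if none of the j
    # controllers from the switch up to and including itself has failed.
    per_chain = sum(comb(total - j, failures) for j in range(1, size + 1))
    return (total // size) * per_chain / ways


def expected_online_rings(total, size, failures):
    """Average controllers online with rings of `size` after random failures."""
    ways = comb(total, failures)
    per_ring = 0
    for j in range(1, size + 1):
        left_clear = comb(total - j, failures)               # 1..j all healthy
        right_clear = comb(total - (size - j + 1), failures)  # j..size all healthy
        ring_clear = comb(total - size, failures)             # whole ring healthy
        # Online if either path to the switch is clear (inclusion-exclusion).
        per_ring += left_clear + right_clear - ring_clear
    return (total // size) * per_ring / ways


print(round(expected_online_chains(100, 100, 1), 1))  # 49.5
print(round(expected_online_rings(100, 100, 1), 1))   # 99.0
print(round(expected_online_rings(100, 50, 2), 1))    # 90.1
print(round(expected_online_chains(100, 25, 2), 1))   # 76.1
```

Under these assumptions the results agree with the rounded values in the text: about 50 and 99 for a single chain or ring of 100 after one failure, and about 90 versus 76 for the four-port comparison.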

As another example, suppose that the system requirement is that 80% of controllers remain online after four random controller failures. The graphs in Figure 1 can be used to set a design rule. For chains, the network needs to be 10 chains of 10 controllers. For MRP, the network needs to be four rings of 25 controllers. Because MRP requires two switch ports per ring, the chain design uses 10 ports and the MRP design uses 8, a net savings of two ports, or 20% of the network overhead.
Figure 1. Comparison of reliability
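
The design-rule example can also be checked numerically. The following sketch uses the same simplified model as the reliability comparison (uniformly random controller failures, a failed controller counts as offline, two switch ports per ring and one per chain). The group sizes tried here are an assumption chosen for illustration and are not necessarily the data points plotted in Figure 1.

```python
from math import comb


def expected_online(total, size, failures, ring):
    """Average controllers online for equal groups of `size` (hypothetical helper)."""
    ways = comb(total, failures)
    per_group = 0
    for j in range(1, size + 1):
        # Path toward the first switch port is clear of failures.
        reach = comb(total - j, failures)
        if ring:
            # Add the path toward the second port; subtract the overlap.
            reach += comb(total - (size - j + 1), failures)
            reach -= comb(total - size, failures)
        per_group += reach
    return (total // size) * per_group / ways


def switch_ports(total, size, ring):
    """Switch ports needed: one per chain, two per ring."""
    return (total // size) * (2 if ring else 1)


TOTAL, FAILURES = 100, 4
for size in (100, 50, 25, 20, 10):  # assumed candidate group sizes
    for ring in (False, True):
        kind = "ring " if ring else "chain"
        avg = expected_online(TOTAL, size, FAILURES, ring)
        ports = switch_ports(TOTAL, size, ring)
        print(f"{kind} size {size:3d}: {avg:5.1f} online, {ports:2d} ports")
```

Under this model, 10 chains of 10 come out at roughly 80 controllers online with 10 ports, two rings of 50 fall short of the target, and four rings of 25 exceed it with 8 ports, which is consistent with the design rule described above.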

Lab testing indicates that it is technically feasible to build very long daisy-chains and MRP rings, so the maximum practical length of a chain or ring is determined by other factors. One factor is the physical extent of the network. If a chain or ring has 100 controllers separated by 50-meter cables, then there are five kilometers of cable (a little more than three miles). Some buildings are at this scale, but most are not. Reliability is another factor. If five kilometers of cable carry data back to a single switch port, there is an appreciable risk that some length of cable experiences a catastrophic event that makes it unable to carry data. The trade-off between the value of reliability and the cost of a failure pushes practical systems toward shorter chains and rings.

The graphs in Figure 1 are a guide to the relative reliability of star, chain, and MRP. They show that shorter chains or rings are more reliable than longer ones; that more chains or rings are more reliable than fewer; and that one ring is more reliable than two chains of half the length. The following factors must also be considered for both chain and ring.

  • The criticality of network connectivity to the device. Devices that do not depend on input from other devices, such that loss of network connectivity has minimal impact on the device’s ability to perform its HVAC function, are good candidates to be daisy-chained or put into large rings. Conversely, devices that require a high level of connectivity availability (for example, air handling units and devices in a validated environment) are not good candidates. If such devices are included in a daisy-chain or ring, they must be placed near the switch port or, for the most critical devices, connected directly to the switch.
  • The proximity of a device to other devices versus the switch. If a device is closer to the switch than to another device, then the cost of the additional cable to connect the device through a daisy-chain or through MRP must be weighed against the cost of a dedicated switch port for the device.
  • The physical capability of the device. Only devices with two Ethernet ports are candidates for daisy-chain and MRP. Devices with a single Ethernet port can only be connected at the end of the daisy-chain, which is the position at the most risk of losing connectivity to the network. These devices cannot be connected to rings.
  • The local IT policies. Many IT departments apply security templates to their switch ports that limit the number of devices that can be served by a single switch port. If a daisy-chain is to be connected to the customer’s IT network, the number of devices that can be daisy-chained may depend on IT’s willingness to loosen or waive such policies. It is unlikely that the switches that are owned by IT departments support MRP. However, by providing our own infrastructure, we have the opportunity to provide appropriate policies.
  • The capacity for future expansion. Leave room to grow, because it is usually easier to add a device to a chain or ring than to find an unused switch port. An additional consideration for rings is ease of replacement. If failed components can be repaired or replaced quickly, then rings can be made larger. MRP makes the system tolerant of single failures. The maximum benefit is received when problems are addressed quickly so that there is never more than one failure in the system.