Traffic Monitor Administration

Installing Traffic Monitor

The following are hard requirements for Traffic Monitor to operate:

  • CentOS 7+
  • Successful install of Traffic Ops (usually on a separate machine)
  • Administrative access to that Traffic Ops instance

These are the recommended hardware specifications for a production deployment of Traffic Monitor:

  • 8 CPUs
  • 16GB of RAM
  • For optimal performance, it is also recommended that you know the geographic coordinates and/or mailing address of the site where the Traffic Monitor machine resides

  1. Enter the Traffic Monitor server into Traffic Portal

    Note

    For legacy compatibility reasons, the ‘Type’ field of a new Traffic Monitor server must be ‘RASCAL’.

  2. Make sure the FQDN of the Traffic Monitor is resolvable in DNS.

  3. Install Traffic Monitor, either from source or by using yum(8) or rpm(8) to install a traffic_monitor-<version string>.rpm package generated by the instructions in Building Traffic Control

  4. Configure Traffic Monitor according to Configuring Traffic Monitor

  5. Start Traffic Monitor, usually by starting its systemd(1) service

  6. Verify that Traffic Monitor is running, for example by pointing a web browser at port 80 on the Traffic Monitor host.
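
Steps 2 and 6 above can also be checked from a script instead of a browser. The following is a minimal sketch, not part of Traffic Monitor itself: the function names resolves and responds are hypothetical, and the resolver and opener parameters exist only so the checks can be exercised without a live network.

```python
import socket
import urllib.request


def resolves(fqdn, resolver=socket.getaddrinfo):
    """Return True if the name resolves to at least one address (step 2)."""
    try:
        return len(resolver(fqdn, None)) > 0
    except socket.gaierror:
        return False


def responds(host, port=80, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if an HTTP GET to the host is answered successfully (step 6)."""
    url = f"http://{host}:{port}/"
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

For a host named, say, tm.example.test (a placeholder), calling resolves("tm.example.test") and then responds("tm.example.test") mirrors the manual DNS and browser checks.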

Configuring Traffic Monitor

Configuration Overview

Traffic Monitor is configured via two JSON configuration files, traffic_ops.cfg and traffic_monitor.cfg, by default located in the conf directory in the install location. traffic_ops.cfg contains Traffic Ops connection information. Specify the URL, username, and password for the instance of Traffic Ops of which this Traffic Monitor is a member. traffic_monitor.cfg contains log file locations, as well as detailed application configuration variables such as processing flush times and initial poll intervals.

Once started with the correct configuration, Traffic Monitor downloads its configuration from Traffic Ops and begins polling cache servers. Once every cache server has been polled, Health Protocol state is available via RESTful JSON endpoints and a web browser UI.
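
Because both files are JSON, a quick pre-flight check can catch a missing connection setting before Traffic Monitor is started. The sketch below is illustrative only: the key names url, username, and password are assumptions made for this example and are not taken from this document, so compare them against the traffic_ops.cfg shipped with your installation.

```python
import json

# Hypothetical key names; check your installed traffic_ops.cfg for the real ones.
REQUIRED_KEYS = ("url", "username", "password")


def missing_settings(cfg_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in a JSON config."""
    cfg = json.loads(cfg_text)
    return [key for key in required if not cfg.get(key)]
```

An empty list means every assumed setting is present; json.loads raises an error if the file is not valid JSON at all.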

Cache Polling URL

The cache servers are polled at the URL specified in the health.polling.url parameter, on the cache server’s profile.

The config file of this parameter must be rascal.properties.

The value is a template with the text ${hostname} being replaced with the cache server’s Network IP (IPv4), and ${interface_name} being replaced with the cache server’s network Interface Name. For example, http://${hostname}/_astats?application=&inf.name=${interface_name}.

If the template contains a port, that port will be used, and the cache server’s HTTPS and TCP Ports will not be added.

If the template does not contain a port, then if the template starts with https the cache server’s HTTPS Port will be added, and if the template doesn’t start with https the cache server’s TCP Port will be added.

Examples, all for a cache server with IP 192.0.2.42, TCP Port 8080, and HTTPS Port 8443 (${interface_name} is shown unexpanded):

  • Template http://${hostname}/_astats?application=&inf.name=${interface_name} becomes http://192.0.2.42:8080/_astats?application=&inf.name=${interface_name}.
  • Template https://${hostname}/_astats?application=&inf.name=${interface_name} becomes https://192.0.2.42:8443/_astats?application=&inf.name=${interface_name}.
  • Template http://${hostname}:1234/_astats?application=&inf.name=${interface_name} becomes http://192.0.2.42:1234/_astats?application=&inf.name=${interface_name}.
  • Template https://${hostname}:1234/_astats?application=&inf.name=${interface_name} becomes https://192.0.2.42:1234/_astats?application=&inf.name=${interface_name}.
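
The substitution and port rules described in this section can be sketched in Python. This is an illustration of the rules as stated here, not Traffic Monitor's actual code; the function name expand_polling_url and its parameters are hypothetical, and the interface name is left unexpanded when none is supplied.

```python
from urllib.parse import urlsplit, urlunsplit


def expand_polling_url(template, ip, tcp_port, https_port, interface_name=None):
    """Expand a health.polling.url template for one cache server."""
    url = template.replace("${hostname}", ip)
    if interface_name is not None:
        url = url.replace("${interface_name}", interface_name)

    parts = urlsplit(url)
    if parts.port is not None:
        # The template names a port explicitly, so it is used unchanged.
        return url

    # No port in the template: choose one based on how the template starts.
    port = https_port if template.startswith("https") else tcp_port
    return urlunsplit(parts._replace(netloc=f"{parts.netloc}:{port}"))
```

Passing interface_name="eth0" (a placeholder value) would additionally replace ${interface_name} in the query string.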

Stat and Health Flush Configuration

Traffic Monitor has a health flush interval, a stat flush interval, and a stat buffer interval. Recall that Traffic Monitor polls both stats and health. The health poll is so small and fast that a buffer is largely unnecessary. However, in a large CDN, the stat poll may involve thousands of cache servers with thousands of stats each, or more, and CPU may be a bottleneck.

The flush intervals, health_flush_interval_ms and stat_flush_interval_ms, indicate how often to flush stats or health, if results are continuously coming in with no break. This prevents starvation. Ideally, if there is enough CPU, the flushes should never occur. The default flush times are 200 milliseconds, which is suggested as a reasonable starting point; operators may adjust them higher or lower depending on the need to get health data and stop directing client traffic to unhealthy cache servers as quickly as possible, balanced by the need to reduce CPU usage.

The stat buffer interval, stat_buffer_interval_ms, also provides a temporal buffer for stat processing. Stats will not be processed except after this interval, whereupon all pending stats will be processed, unless the flush interval occurs as a starvation safety. The stat buffer and flush intervals may be thought of as a state machine with two states: the “buffer state” accepts results until the buffer interval has elapsed, whereupon the “flush state” is entered, and results are accepted while outstanding, and processed either when no results are outstanding or the flush interval has elapsed.
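
The two-state behavior described above can be modeled in a few lines of Python. This is a simplified, result-driven sketch for intuition only: the class name StatBuffer, its method, and the caller-supplied outstanding flag are all hypothetical, and Traffic Monitor's real implementation may be structured quite differently.

```python
class StatBuffer:
    """Toy model of the stat buffer interval and stat flush interval."""

    def __init__(self, buffer_ms, flush_ms):
        self.buffer_ms = buffer_ms
        self.flush_ms = flush_ms
        self.state = "buffer"
        self.state_start_ms = 0  # time at which the current state was entered
        self.pending = []

    def on_result(self, now_ms, result, outstanding):
        """Accept one result; return a batch to process, or None.

        outstanding is True when more results are already waiting.
        """
        self.pending.append(result)
        if self.state == "buffer":
            if now_ms - self.state_start_ms < self.buffer_ms:
                return None  # still buffering
            self.state = "flush"
            self.state_start_ms = now_ms
        # Flush state: process once nothing is outstanding, or once the
        # flush interval has elapsed (the starvation safety).
        if not outstanding or now_ms - self.state_start_ms >= self.flush_ms:
            batch, self.pending = self.pending, []
            self.state = "buffer"
            self.state_start_ms = now_ms
            return batch
        return None
```

With buffer_ms=10 and flush_ms=5, results arriving back to back are held until 10 ms have passed and then processed as one batch no later than 5 ms after that, even if more results keep arriving.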

Note that this means the stat buffer interval acts as “bufferbloat,” increasing the average and maximum time a cache server may be down before it is processed and marked as unhealthy. If the stat buffer interval is non-zero, the average time a cache server may be down before being marked unavailable is half the poll time plus half the stat buffer interval, and the maximum time is the poll time plus the stat buffer interval. For example, if the stat poll time is 6 seconds, and the stat buffer interval is 4 seconds, the average time a cache server may be unhealthy before being marked is \(\frac{6}{2} + \frac{4}{2} = 5\) seconds, and the maximum time is \(6+4=10\) seconds. For this reason, if operators feel the need to add a stat buffer interval, it is recommended to start with a very low duration, such as 5 milliseconds, and increase as necessary.
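
The rule of thumb above (half the poll time plus half the stat buffer interval on average, their sum at worst) is easy to evaluate for candidate settings. The helper below is a hypothetical convenience, not a Traffic Monitor API; both arguments are in seconds.

```python
def detection_delay(poll_s, stat_buffer_s):
    """Return (average, maximum) seconds before a down cache server is marked."""
    average = poll_s / 2 + stat_buffer_s / 2
    maximum = poll_s + stat_buffer_s
    return average, maximum
```

Calling detection_delay(6, 4) evaluates the 6-second poll and 4-second buffer example, and detection_delay(6, 0.005) shows how little a 5-millisecond buffer adds.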

It is not recommended to set either flush interval to 0, regardless of the stat buffer interval. This will cause new results to be immediately processed, with little to no processing of multiple results concurrently. Result processing does not scale linearly. For example, processing 100 results at once does not cost significantly more CPU usage or time than processing 10 results at once. Thus, a flush interval which is too low will cause increased CPU usage, and potentially increased overall poll times, with little or no benefit. The default value of 200 milliseconds is recommended as a starting point for configuration tuning.
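
The tuning advice in this section can be turned into a small sanity check over the interval settings. This is a sketch under stated assumptions: the function name interval_warnings is hypothetical, the settings are passed as a plain dictionary keyed by the parameter names used above, a missing flush interval is assumed to take the 200 millisecond default, and a missing stat buffer interval is assumed to be 0.

```python
FLUSH_KEYS = ("health_flush_interval_ms", "stat_flush_interval_ms")
DEFAULT_FLUSH_MS = 200  # default flush time given in this section


def interval_warnings(cfg):
    """Return human-readable warnings for discouraged interval settings."""
    warnings = []
    for key in FLUSH_KEYS:
        if cfg.get(key, DEFAULT_FLUSH_MS) == 0:
            warnings.append(
                f"{key} is 0: results are processed immediately with little "
                "batching, which raises CPU usage"
            )
    # Assumption: an unset stat buffer interval means no buffering.
    if cfg.get("stat_buffer_interval_ms", 0) > 0:
        warnings.append(
            "stat_buffer_interval_ms is non-zero: unhealthy cache servers "
            "take longer to be marked"
        )
    return warnings
```

An empty list means none of the settings discouraged in this section were found.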

Troubleshooting and Log Files

Traffic Monitor log files are in /opt/traffic_monitor/var/log/.