1. Description

The purpose of this software is to monitor various properties and metrics of networked computers, such as CPU and memory usage, network interface utilization, hard drive load, and so forth. The monitoring server runs an OpenGL-accelerated graphics engine, which is meant to be run on a dedicated monitor in full screen. Clients connect to the server via a TCP/IP connection and keep the server notified about their state. The TCP/IP connections can be established over the internet, but as no authentication is implemented yet, the software is intended for monitoring private subnets.

Client modules can be programmed for any platform, providing whatever information is desired. Currently, I have implemented clients for a couple of the most popular x86 Unices (see implemented clients). The clients are very easy to write after the desired information is dug out (from the kernel, for example), so more clients are to come as I feel a need for them.

Currently there's only one screen capture, as it adequately demonstrates the current look of the software. Each visual element has a background shaded according to the host whose information it represents. The additional red background on thunder's CPU utilization indicates that it has exceeded its critical limit (50%, adjustable).

2. Visualisation

2.1. Graphics engine

As the software is often run on a server or a workstation that's already in use and has a spare video output (thus avoiding an extra dedicated computer), the graphics engine needs to be as light as possible to avoid any unneeded load on the host. The graphics rendering was originally done in software, but as the project matured the engine began to occupy unreasonable amounts of CPU time (around 20% on my server at that time), so I decided to port it to OpenGL and let a GPU do some of the work.

The OpenGL version of the engine is indeed a lot lighter for the CPU. In fact, when I ran the program on my workstation, its CPU utilization was below 0.1%. It seems, however, that some OpenGL implementations use busy loops to wait for the vertical sync of the display, thus increasing the CPU utilization to around 5-7%. Obviously, this is highly dependent on the OpenGL implementation and the hardware used.

There are several optimizations done in the graphics engine. For example, there are no OpenGL texture changes during the rendering; all bitmaps are dynamically packed into one larger texture during initialization. This includes font bitmaps, as fonts are handled internally, without any dependencies on font or text libraries. Visual elements are made as dynamic as possible, and objects are animated as they disappear from and reappear on the display. Graphics code is written in C++.
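The text does not say how the bitmaps are packed into the single texture. As a minimal sketch of one common approach, the following "shelf" packer places bitmaps left to right in rows and reports where each one lands in the atlas. The names `Rect` and `ShelfPacker` are hypothetical and are not taken from the actual engine.

```cpp
#include <algorithm>
#include <cassert>

// Position and size of one bitmap inside the atlas, in texels.
struct Rect {
    int x, y, w, h;
};

// Hypothetical shelf packer: fills the atlas row by row ("shelves").
// Illustration only; the engine's real packing method is not documented.
class ShelfPacker {
public:
    ShelfPacker(int atlasWidth, int atlasHeight)
        : width_(atlasWidth), height_(atlasHeight) {
        assert(atlasWidth > 0 && atlasHeight > 0);
    }

    // Tries to place a w x h bitmap. On success, stores the placement
    // in `out` and returns true; returns false if the bitmap does not fit.
    bool insert(int w, int h, Rect& out) {
        if (w <= 0 || h <= 0 || w > width_) return false;
        // Start a new shelf when the current one has no room left.
        if (cursorX_ + w > width_) {
            cursorY_ += shelfHeight_;
            cursorX_ = 0;
            shelfHeight_ = 0;
        }
        if (cursorY_ + h > height_) return false;
        out = Rect{cursorX_, cursorY_, w, h};
        cursorX_ += w;
        shelfHeight_ = std::max(shelfHeight_, h);
        return true;
    }

private:
    int width_, height_;
    int cursorX_ = 0, cursorY_ = 0;
    int shelfHeight_ = 0;  // height of the tallest bitmap on the current shelf
};
```

Each returned `Rect` would then be converted to texture coordinates once, so that drawing a glyph or an icon only selects a different region of the same bound texture.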

2.2. Client visualisation

Client information can be visualised using meters, lists, bars, etc. See the screen capture for demonstration. Clients report discrete information about their state, such as CPU ticks or transferred bytes of a network interface, at adjustable intervals. The graphics engine then turns these discrete samples into continuous motion, for example that of a meter.

A change of a value can be handled as a static, unpredictable change, which is slid smoothly over a sin()-based transition. This is useful for a metric that remains relatively static, such as memory or file system usage. When changes are predictable, e.g. CPU ticks that are reported every second, a meter can be told to dynamically adjust to the updates so that it remains one update behind and smoothly interpolates the values between updates. Each client's background is colored according to the host it belongs to. Client information that is considered critically important is highlighted with an additional red background.
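The two transition styles described above can be sketched roughly as follows. The exact easing curve and the use of linear interpolation between updates are assumptions, and the names `sineEase`, `slide` and `LaggedMeter` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Sine-based ease-in/ease-out: maps t in [0, 1] to [0, 1] with zero
// slope at both ends. The curve actually used by the engine is an assumption.
double sineEase(double t) {
    const double pi = 3.14159265358979323846;
    if (t <= 0.0) return 0.0;
    if (t >= 1.0) return 1.0;
    return 0.5 * (1.0 + std::sin(pi * (t - 0.5)));
}

// Static, unpredictable change: slide from the old value to the new one
// over `duration` seconds using the sine ease.
double slide(double from, double to, double elapsed, double duration) {
    assert(duration > 0.0);
    return from + (to - from) * sineEase(elapsed / duration);
}

// Predictable updates: the meter stays one update behind and moves
// linearly from the previous sample to the latest one during the interval.
struct LaggedMeter {
    double previous = 0.0;  // sample shown at the start of the interval
    double latest = 0.0;    // most recent sample received
    double interval = 1.0;  // expected seconds between updates

    void update(double value) {
        previous = latest;
        latest = value;
    }

    // `sinceUpdate` is the time in seconds since the last update arrived.
    double displayed(double sinceUpdate) const {
        double t = sinceUpdate / interval;
        if (t < 0.0) t = 0.0;
        if (t > 1.0) t = 1.0;
        return previous + (latest - previous) * t;
    }
};
```

Staying one update behind means the meter always knows its target in advance, so it never has to jump or guess. The cost is a display delay of one reporting interval.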

2.3. Prioritizing

As several different hosts with multiple clients can be attached to the server at the same time, prioritizing is needed. When clients connect to the server, they define likely limits for the data they supply, and a critical limit. Clients build up priority based on their value (such as CPU utilization), changes in their value (such as a change in file system usage), or both. This means that a host's CPU monitor, for example, will build up priority rapidly when its utilization goes from 0% to 100%, usually exceeding a threshold and resulting in it being brought onto the screen.

When a client is shown on screen, it burns up its priority at a rate that depends on how crowded the screen is. If there are several important monitors occupying space, priority gets burned faster and the monitor doesn't get as much screen time as it would if the screen were idle. Screen time on a server that monitors more hosts is usually more valuable, so clients earn less screen time with a given amount of priority.
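The build-up and burning of priority could be modelled along these lines. This is only a sketch under stated assumptions: the weights, the formulas and the `MetricPriority` name are all hypothetical, and the real server may combine the terms differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical priority model for one monitored metric. All weights,
// names and formulas are assumptions made for illustration.
struct MetricPriority {
    double low = 0.0;            // likely lower limit reported by the client
    double high = 100.0;         // likely upper limit reported by the client
    double valueWeight = 1.0;    // priority earned per second at full scale
    double changeWeight = 10.0;  // priority earned per full-scale change
    double priority = 0.0;
    double lastLevel = 0.0;      // last value, normalized to [0, 1]

    // Feed a new sample; `dt` is the time in seconds since the last one.
    void sample(double value, double dt) {
        assert(high > low && dt >= 0.0);
        double level = std::min(1.0, std::max(0.0, (value - low) / (high - low)));
        priority += valueWeight * level * dt;                     // from the value
        priority += changeWeight * std::fabs(level - lastLevel);  // from the change
        lastLevel = level;
    }

    // Burn priority while on screen; a crowded screen burns it faster.
    void burn(double dt, int monitorsOnScreen, double burnRate = 1.0) {
        assert(dt >= 0.0 && monitorsOnScreen >= 1);
        priority = std::max(0.0, priority - burnRate * monitorsOnScreen * dt);
    }
};
```

With these example weights, a CPU meter jumping from 0% to 100% earns 11 units in one sample. Three monitors sharing the screen would burn that down three times as fast as a single monitor would.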

If a meter continuously changes its state, it begins to get less priority for its changes. This is natural - if you constantly see lots of changes in CPU utilization or memory usage, for example due to ongoing compiler processes, the corresponding meters can be considered predictable, and their "interestingness" factor decreases. If unpredictable changes happen in idling hosts, they generate more priority and initially earn more attention than the ones being constantly stressed.
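One way to make constantly changing meters less "interesting" is to track recent change activity and divide the priority of each new change by it. The sketch below assumes an exponentially decaying activity score. The `ChangeHabituation` name, the half-life and the formula are hypothetical and do not come from the actual server.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical "habituation" of a meter: the more it has changed recently,
// the less priority each new change earns.
struct ChangeHabituation {
    double activity = 0.0;   // decaying sum of recent change magnitudes
    double halfLife = 30.0;  // seconds for `activity` to halve (assumed value)

    // Returns the priority earned by a change of the given magnitude
    // (in normalized units), `dt` seconds after the previous sample.
    double priorityForChange(double magnitude, double dt) {
        assert(magnitude >= 0.0 && dt >= 0.0 && halfLife > 0.0);
        activity *= std::pow(0.5, dt / halfLife);  // forget old changes
        double earned = magnitude / (1.0 + activity);
        activity += magnitude;                     // remember this change
        return earned;
    }
};
```

A change on a host that has been idle earns its full magnitude, while the same change on a host that has been fluctuating for a while earns only a fraction of it.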

Some types of metrics also get constant increases of priority, making them appear on the screen every now and then to burn up the accumulated priority. This is ideal for file system usage, for example, which stays practically unchanged all the time but which a user might still want to see regularly. All in all, the purpose of the prioritizing system is to display the most important information without bothering the user with excessive amounts of useless information (the level can be adjusted).

3. Clients

3.1. Operation principle

Clients are responsible for gathering the monitored information. During startup, a client sends the server its name, the visual objects it wants to use (any meters, bars, etc.), and its limits. The client then proceeds to update the metric or metrics in question by supplying new values to the server through the established connection, until explicitly shut down.

Client-server communication takes place over a TCP/IP connection, using a custom text-based protocol. If the monitored metric is a cumulative counter, such as the total bytes transferred through a network interface, but throughput monitoring is desired, the server can (on request) compute the throughput automatically from the supplied raw data.
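Deriving a throughput from a cumulative counter amounts to dividing the difference between two readings by the time between them. The following is a minimal sketch of that calculation. The `RateFromCounter` class and its handling of counter resets are assumptions, not the server's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Converts a monotonically increasing counter (e.g. total bytes
// transferred) into a rate. Hypothetical helper for illustration.
class RateFromCounter {
public:
    // Feed a raw counter reading taken at `timeSeconds` (seconds since an
    // arbitrary start). Returns units per second since the previous
    // reading. Returns 0 for the first reading, for a timestamp that did
    // not advance, and for a counter that went backwards (for example
    // after a client restart).
    double update(std::uint64_t counter, double timeSeconds) {
        assert(timeSeconds >= 0.0);
        double rate = 0.0;
        if (hasPrevious_ && timeSeconds > lastTime_ && counter >= lastCounter_) {
            rate = static_cast<double>(counter - lastCounter_) /
                   (timeSeconds - lastTime_);
        }
        hasPrevious_ = true;
        lastCounter_ = counter;
        lastTime_ = timeSeconds;
        return rate;
    }

private:
    bool hasPrevious_ = false;
    std::uint64_t lastCounter_ = 0;
    double lastTime_ = 0.0;
};
```

Doing this on the server keeps clients trivial: a network client only has to forward the raw byte counters it reads from the operating system.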

3.2. Implemented clients

Currently the clients provide the following information:

CPU: System time, user space time and total time [%]
Memory: Cache memory, buffer memory, used memory and swap usage
Network: Receive throughput, send throughput and total throughput for the selected interfaces
Ping: Status (online/offline/lags), hostname and ping [ms] for the selected hosts
File system usage: Host name of the file system owner, mount point, total capacity, usage [%] and available capacity for local file systems
File system device activity: Throughput of the devices for the selected file systems

Currently the following clients are implemented for the listed operating systems:

CPU, Memory, Network, Ping, File system usage, File system device activity
CPU, Memory, File system usage (using GNU df)
CPU, File system usage (using GNU df)