Operation

There are two main scenarios involved in starting up and interacting with katgpucbf and its constituent engines:

the instantiation and running of a complete end-to-end correlator, and
the invocation of individual engines (dsim, fgpu, xbgpu) for more fine-grained testing and debugging.

The first requires a mechanism to orchestrate the simultaneous spin-up of a correlator’s required components, that is, some combination of dsim(s), F-Engine(s) and XB-Engine(s). For this purpose, katgpucbf uses the infrastructure provided by katsdpcontroller, which is discussed in the following section.

For testing and debugging individual engines, their inner workings are explained in more detail in each engine’s dedicated documentation.

The main thing to note is that, however they are invoked (via orchestration or individually), the engines are controlled via katcp commands issued to their <host>:<port>. netcat (nc) is likely the most readily available tool for this job, but ntsh tidies up these exchanges and generally makes interaction easier.
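As a rough illustration of what a client such as nc or ntsh sends over the connection, the sketch below encodes a single katcp request line. It assumes the katcp version 5 escaping rules and is not the API of any katcp library; escape_arg and format_request are hypothetical names.

```python
# Minimal katcp request encoder (a sketch; assumes katcp v5 escaping rules).
# escape_arg and format_request are hypothetical helpers, not a library API.

_ESCAPES = {
    "\\": "\\\\",   # backslash
    " ": "\\_",     # space
    "\0": "\\0",    # null
    "\n": "\\n",    # newline
    "\r": "\\r",    # carriage return
    "\x1b": "\\e",  # escape character
    "\t": "\\t",    # tab
}


def escape_arg(arg: object) -> str:
    """Escape a single request argument."""
    text = str(arg)
    if text == "":
        return "\\@"  # katcp's representation of an empty argument
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def format_request(name: str, *args: object) -> bytes:
    """Build one newline-terminated request line, e.g. b'?help\\n'."""
    words = ["?" + name] + [escape_arg(arg) for arg in args]
    return (" ".join(words) + "\n").encode("ascii")
```

For example, format_request("log-level", "debug") produces b"?log-level debug\n", which is the same line you would type into nc.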

katsdpcontroller

This package (katgpucbf) provides the components of a correlator (engines and simulators), but not the mechanisms to start up and orchestrate all the components as a cohesive unit. That is provided by katsdpcontroller.

For production use it is strongly recommended that katsdpcontroller is used to manage the correlator. Nevertheless, it is possible to run the individual pieces manually, or to implement an alternative controller. The remaining sections in this chapter describe the interfaces that are used by katsdpcontroller to communicate with the correlator components.

There are two parts to katsdpcontroller: a master controller and a product controller. There is a single product controller per instantiated correlator. It is responsible for:

starting up the appropriate correlator components with suitable arguments, given a high-level description of the desired correlator configuration;
monitoring the health of those components;
registering them with Consul, so that infrastructure such as Prometheus can discover them;
proxying their katcp sensors, so that clients need only subscribe to sensors from the product controller rather than individual components;
in some cases, aggregating or renaming those sensors, to present a correlator-wide suite of sensors, without clients needing to know about the individual engines;
providing additional correlator-wide katcp sensors;
providing correlator-wide katcp requests, which are implemented by issuing similar but finer-grained requests to the individual engines.

The master controller manages product controllers (and hence correlators), starting them up and shutting them down on request from the user. In a system supporting subarrays, there will typically be a single master controller and zero or more product controllers at any one time.

It is worth noting that katsdpcontroller was originally written to control the MeerKAT Science Data Processor and later extended to control correlators, so it has a number of features, requests and sensors that are not relevant to correlators.

Starting the correlator

The katgpucbf repository comes with a scratch/ directory, which contains handy scripts for correlator and engine invocation. Note, however, that the layout and usage of these scripts are tailored to SARAO DSP’s internal lab development environment (e.g. host and interface names), and the scripts don’t necessarily undergo the same review rigour as the main codebase. For these reasons, treat these scripts as examples of how to run the components of katgpucbf rather than as set-in-stone modi operandi.

End-to-end correlator startup

If you intend to start up a correlator with sim_correlator.py, you will need a running master controller, as described under katsdpcontroller. The script provides an array of options for configuring your correlator; running ./sim_correlator.py --help gives a brief explanation of the arguments. Below is an example of a full command to run a 4k, 4-antenna, L-band correlator:

./sim_correlator.py -a 4 -c 4096 -i 0.5 \
    --adc-sample-rate 1712e6 \
    --name my_test_correlator \
    --image-override katgpucbf:harbor.sdp.kat.ac.za/dpp/katgpucbf:latest \
    lab5.sdp.kat.ac.za

Executing this command contacts the master controller to request that a new correlator product be configured. The master controller determines how many of each type of engine are required based on these input parameters, and launches them accordingly across the pool of available processing nodes.

Individual engine startup

The arguments required for individual engine invocation can be seen by running one of {dsim, fgpu, xbgpu} --help in an appropriately configured terminal environment. A few of them are mandatory, and stitching the entire incantation together by hand quickly becomes tiresome. For this reason, the scripts under scratch/{fgpu, xbgpu} are included in the repository.

The scripts for standalone engine usage are prepopulated with typical configuration values for your convenience, and are usually named run-{dsim, fgpu, xbgpu}.sh. Note that although the F- and XB-Engines can run standalone, they require some form of stimulus to truly exercise them. For example, fgpu requires a corresponding dsim to produce data for ingest. Similarly, xbgpu requires an appropriately configured fsim (F-Engine Packet Simulator). In short, the engines do nothing until they are given input to process.

Todo

NGC-730 Update scratch directory to have a single config sub-directory. Also add comments to the scripts themselves to make them easier to follow.

Note

Before choosing which engine to test, check the number of GPUs available in the target processing node. The CUDA library honours the CUDA_VISIBLE_DEVICES environment variable, similar to the mechanism discussed by katsdpsigproc. For example, running export CUDA_VISIBLE_DEVICES=0 in your terminal makes engines subsequently invoked from it use only GPU 0.
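The same effect can be achieved per invocation from Python by passing a modified environment to the child process. This is a minimal sketch; gpu_env, launch_on_gpu and visible_devices_in_child are hypothetical helpers, not part of katgpucbf. Note that CUDA renumbers the visible devices, so with CUDA_VISIBLE_DEVICES=1 the engine sees that GPU as device 0.

```python
import os
import subprocess
import sys


def gpu_env(device: int) -> dict:
    """Copy the current environment, restricting CUDA to a single GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(device)
    return env


def launch_on_gpu(cmd: list, device: int) -> subprocess.CompletedProcess:
    """Run cmd (e.g. an engine command line) so that it only sees `device`."""
    return subprocess.run(cmd, env=gpu_env(device), check=True,
                          capture_output=True, text=True)


def visible_devices_in_child(device: int) -> str:
    """Quick check of what a child process actually sees in its environment."""
    probe = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    return launch_on_gpu(probe, device).stdout.strip()
```

For example, launch_on_gpu(["xbgpu", "--help"], 1) would run xbgpu restricted to the second GPU of the node.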

To test a 4k, 4-antenna XB-Engine processing L-band data, use the following commands in separate terminals on two separate servers. This will launch a single F-Engine Packet Simulator on host1 and a single xbgpu instance on host2:

[Connect to host1 and activate the local virtual environment]
(katgpucbf) user@host1:~/katgpucbf$ spead2_net_raw fsim --interface <interface name> --ibv \
                                    --array-size 4 --channels 4096 \
                                    --channels-per-substream 1024 \
                                    239.10.10.10+1:7148
.
.
.
[Connect to host2 and activate the local virtual environment]
(katgpucbf) user@host2:~/katgpucbf$ spead2_net_raw numactl -C 1 xbgpu \
                                    --src-affinity 0 --src-comp-vector 0 \
                                    --dst-affinity 1 --dst-comp-vector 1 \
                                    --src-interface <interface name> \
                                    --dst-interface <interface name> \
                                    --src-ibv --dst-ibv \
                                    --adc-sample-rate 1712e6 --array-size 4 \
                                    --channels 4096 \
                                    --channels-per-substream 1024 \
                                    --samples-between-spectra 8192 \
                                    --katcp-port 7150 \
                                    239.10.10.10:7148 239.10.11.10:7148

Naturally, it is up to the user to ensure that command-line parameters are consistent across the components under test, e.g. by using the same --array-size for the data generated by the fsim and for the xbgpu instance.
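A small pre-flight check can catch such mismatches before launching anything. This sketch compares a hypothetical set of shared options between the fsim and xbgpu command lines; the option list and helper names are assumptions, and the parser is deliberately naive (a valueless flag immediately followed by a positional argument would be mis-read).

```python
# Hypothetical list of options that fsim and xbgpu must agree on.
SHARED_OPTIONS = ("--array-size", "--channels", "--channels-per-substream")


def option_values(argv: list) -> dict:
    """Map each '--option value' pair in argv to its value.

    Flags without a value (e.g. --ibv) are skipped because the next word
    also starts with '--'.
    """
    values = {}
    for word, following in zip(argv, argv[1:]):
        if word.startswith("--") and not following.startswith("--"):
            values[word] = following
    return values


def mismatched_options(fsim_argv: list, xbgpu_argv: list,
                       shared=SHARED_OPTIONS) -> dict:
    """Return {option: (fsim value, xbgpu value)} for every disagreement."""
    fsim, xbgpu = option_values(fsim_argv), option_values(xbgpu_argv)
    return {
        opt: (fsim.get(opt), xbgpu.get(opt))
        for opt in shared
        if fsim.get(opt) != xbgpu.get(opt)
    }
```

An empty result means the shared options agree; otherwise each entry shows the two conflicting values.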

Note

ibverbs requires the CAP_NET_RAW capability on Linux hosts. See spead2’s discussion on ensuring this is configured correctly for your usage.

Pinning thread affinities

Todo

NGC-730 Update run-{dsim, fgpu, xbgpu}.sh scripts to standardise on either numactl or taskset.

spead2’s performance tuning discussion outlines the need to set the affinity of all threads that aren’t specifically pinned by --{src, dst}-affinity. These include the main Python thread, but libraries such as CUDA also tend to spin up helper threads. In the xbgpu example above, this is the role of the numactl -C 1 prefix.
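The numactl -C 1 prefix restricts the whole process to CPU 1 before the engine starts. The Python sketch below shows the same idea using only the standard library: the affinity mask is set in the child between fork and exec, so every thread the engine later creates inherits it. run_pinned and affinity_seen_by_child are hypothetical helpers, the approach is Linux-only, and preexec_fn is documented as unsafe when the parent process has threads, so for real runs numactl or taskset remains the simpler choice.

```python
import os
import subprocess
import sys


def run_pinned(cmd: list, cpus: set) -> subprocess.CompletedProcess:
    """Run cmd with its CPU affinity set before exec, similar to `numactl -C`.

    Linux only. The mask is applied in the child after fork and before exec,
    so the main thread and any helper threads it later creates inherit it,
    unless they explicitly re-pin themselves (as spead2's threads do).
    """
    cpus = set(cpus)

    def pin() -> None:
        # Runs in the single-threaded child, so this pins the whole process.
        os.sched_setaffinity(0, cpus)

    return subprocess.run(cmd, preexec_fn=pin, check=True,
                          capture_output=True, text=True)


def affinity_seen_by_child(cpus: set) -> str:
    """Report the CPU set a freshly launched Python child finds itself on."""
    probe = [sys.executable, "-c",
             "import os; print(sorted(os.sched_getaffinity(0)))"]
    return run_pinned(probe, cpus).stdout.strip()
```

For example, run_pinned(["xbgpu", "--help"], {1}) corresponds roughly to numactl -C 1 xbgpu --help.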

Testing without a high-speed data network

katgpucbf allows the user to develop, debug and test its engines without a high-speed (e.g. 100GbE) data network. Omitting the --{src, dst}-ibv command-line parameters means data is sent and received through ordinary sockets rather than the InfiniBand Verbs API. Because that traffic passes through the kernel network stack, you can, for example, capture engine data with tcpdump(8), including on a machine that doesn’t support ibverbs.

Note

The data rates you intend to process are still limited by the NIC in your host machine. To run engines comfortably without a high-speed data network, consider reducing --adc-sample-rate by, e.g., a factor of ten, as this value largely determines the engines’ data rates.
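A rough worked example of why the factor of ten matters, assuming the raw digitiser payload is sample rate × bits per sample × polarisations, with 10-bit samples and two polarisations. These defaults are assumptions modelled on MeerKAT-style digitiser data, and packet overheads are ignored; adc_payload_rate_gbps is a hypothetical helper.

```python
def adc_payload_rate_gbps(adc_sample_rate: float, sample_bits: int = 10,
                          polarisations: int = 2) -> float:
    """Approximate raw sample payload rate in Gb/s, ignoring packet headers.

    The defaults (10-bit samples, two polarisations) are assumptions; adjust
    them to match your dsim settings.
    """
    return adc_sample_rate * sample_bits * polarisations / 1e9
```

Under these assumptions, the full L-band rate of 1712e6 gives about 34.2 Gb/s, well beyond a typical 10GbE NIC, whereas 171.2e6 gives about 3.4 Gb/s.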

Controlling the correlator

The correlator components are controlled using katcp. A user can connect to the relevant <host>:<port> and issue a ?help to see the full range of requests available. The <host> and <port> values for individual engines are configurable at runtime, whereas those for the correlator’s product controller are returned by the master controller after startup. Standard katcp requests (such as querying and subscribing to sensors) are not covered here; only application-specific requests are listed. Sensors are described in katcp sensors.
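To see the request/reply pattern behind ?help, the sketch below sends a bare request over a plain TCP socket and gathers the #help informs that precede the final !help reply. It is a simplification for illustration, not the aiokatcp or katcp-python API: it ignores message IDs, does not unescape arguments, and request and collect_reply are hypothetical names.

```python
from __future__ import annotations

import socket
from collections.abc import Iterable


def collect_reply(lines: Iterable[str], name: str) -> tuple[list[str], list[list[str]]]:
    """Gather the informs and final reply for request `name`.

    Returns (reply arguments, list of inform arguments). Lines belonging to
    other messages, such as the #version-connect informs a server sends on
    connection, are skipped.
    """
    informs = []
    for line in lines:
        words = line.split()
        if not words or words[0][1:] != name:
            continue
        if words[0][0] == "#":
            informs.append(words[1:])
        elif words[0][0] == "!":
            return words[1:], informs
    raise ConnectionError(f"no !{name} reply received")


def request(host: str, port: int, name: str, timeout: float = 5.0):
    """Send a request without arguments and wait for its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(f"?{name}\n".encode("ascii"))
        with sock.makefile("r", encoding="ascii", newline="\n") as reader:
            return collect_reply(reader, name)
```

Calling request with an engine's host and katcp port (7150 in the xbgpu example) and the name "help" returns the reply arguments (e.g. ["ok", "<count>"]) plus one inform per available request.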

dsim

?signals spec [period]

Change the signals that are generated. The signal specification is described in Signal specification. The resulting signal will be periodic with a period of period samples. The given period must be a divisor of the --max-period command-line argument, which is also the default period if none is specified.
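A client-side check of the period rule, as a sketch: resolve_period is a hypothetical helper, and 2**20 in the usage note is an illustrative value rather than the real --max-period default. It only checks divisibility; the dsim may impose further constraints of its own.

```python
from __future__ import annotations


def resolve_period(max_period: int, period: int | None = None) -> int:
    """Validate the optional period argument of ?signals.

    The period must divide --max-period, which is also used when no period
    is given.
    """
    if period is None:
        return max_period
    if period <= 0 or max_period % period != 0:
        raise ValueError(f"period {period} does not divide max period {max_period}")
    return period
```

For example, resolve_period(2**20, 8192) returns 8192, while resolve_period(2**20, 3000) raises ValueError because 3000 is not a divisor of 2**20.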

The dither that is applied is cached on startup, but is independent for the different streams. Repeating the same command thus gives the same results, provided any randomised terms (such as wgn) use fixed seeds.

It returns an ADC timestamp, which indicates the first sample generated with the new signals. This is kept for backwards compatibility; the same information can be found in the steady-state-timestamp sensor.

?time

Return the current UNIX timestamp on the server running the dsim. This can be used to get an approximate idea of which data is in flight, without depending on the dsim host and the client having synchronised clocks.
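Assuming the dsim’s ADC timestamps count samples since a synchronisation epoch whose UNIX time is known (called sync_time here, a hypothetical input), the server time returned by ?time can be turned into an approximate sample index. approx_sample_index is a hypothetical helper and ignores any latency between the dsim’s clock reading and the samples leaving the host.

```python
def approx_sample_index(server_time: float, sync_time: float,
                        adc_sample_rate: float) -> int:
    """Estimate the ADC timestamp (sample count) being generated at server_time.

    sync_time is the assumed UNIX time at which the ADC timestamp was zero.
    """
    return int((server_time - sync_time) * adc_sample_rate)
```

Comparing this estimate with the timestamp returned by ?signals (or the steady-state-timestamp sensor) gives a rough indication of whether data generated with the new signals is already in flight.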

fgpu

?gain stream input [values...]: Set the complex gains. This has the same semantics as the equivalent katsdpcontroller command, but input must be 0 or 1 to select the input polarisation.
?gain-all stream values...: Set the complex gains for both inputs. This has the same semantics as the equivalent katsdpcontroller command.
?delays stream start-time values...: Set the delay polynomials. This has the same semantics as the equivalent katsdpcontroller command, but takes exactly two delay model specifications (for the two polarisations).
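The sketch below builds fgpu request lines as text, for example to paste into nc or ntsh. The value formats are assumptions to verify against the katsdpcontroller documentation: gains as Python-style complex literals, and each delay model as delay,delay_rate:phase,phase_rate. The helper names and the stream name antenna-channelised-voltage used in the usage note are hypothetical.

```python
# Hypothetical helpers that build fgpu request lines as text.
# Assumed formats (verify against the katsdpcontroller documentation):
#   gains:  Python-style complex literals such as 0.5-0.25j
#   delays: "delay,delay_rate:phase,phase_rate" per polarisation


def format_gain(value: complex) -> str:
    """Render a complex gain as e.g. '1.0+0.0j'."""
    return f"{value.real}{value.imag:+}j"


def gain_request(stream: str, pol: int, values: list) -> str:
    if pol not in (0, 1):
        raise ValueError("fgpu input must be 0 or 1 (one per polarisation)")
    gains = [format_gain(complex(v)) for v in values]
    return " ".join(["?gain", stream, str(pol)] + gains)


def delays_request(stream: str, start_time: float, models: list) -> str:
    """models holds (delay, delay_rate, phase, phase_rate) for each polarisation."""
    if len(models) != 2:
        raise ValueError("fgpu expects exactly two delay models, one per polarisation")
    specs = [f"{d},{d_rate}:{p},{p_rate}" for d, d_rate, p, p_rate in models]
    return " ".join(["?delays", stream, str(start_time)] + specs)
```

For example, gain_request("antenna-channelised-voltage", 0, [1, 0.5 - 0.25j]) gives "?gain antenna-channelised-voltage 0 1.0+0.0j 0.5-0.25j".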

xbgpu

?capture-start, ?capture-stop: Enable or disable transmission of output data. This does not affect transmission of descriptors, which cannot be disabled. In the initial state transmission is disabled, unless the --tx-enabled command-line option has been passed.

Shutting down the correlator

End-to-end correlator shutdown

A user can issue a ?product-deconfigure command to the correlator’s product controller by connecting to its <host>:<port>. This command triggers the stop procedure of all engines and dsims running in the target correlator. More specifically:

the product controller instructs the orchestration software to stop the containers running the engines,
which is received by the engines as a SIGTERM,
finally triggering a halt in the engines for a graceful shutdown.
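The SIGTERM path can be sketched with asyncio: a signal handler sets an event, and the engine’s main loop notices it and winds down. This is a generic illustration assuming an asyncio-based engine, not katgpucbf’s actual shutdown code; run_until_terminated, example_engine and demo_sigterm are hypothetical names. SIGINT, which Ctrl + C sends, is handled the same way.

```python
import asyncio
import os
import signal


async def run_until_terminated(main):
    """Run main(stop) to completion, setting `stop` on SIGTERM or SIGINT.

    Unix only; asyncio signal handlers must be installed from the main thread.
    """
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    signals = (signal.SIGTERM, signal.SIGINT)
    for sig in signals:
        loop.add_signal_handler(sig, stop.set)
    try:
        return await main(stop)
    finally:
        for sig in signals:
            loop.remove_signal_handler(sig)


async def example_engine(stop: asyncio.Event) -> str:
    """Stand-in for an engine main loop: work until asked to stop, then tidy up."""
    while not stop.is_set():
        await asyncio.sleep(0.01)  # placeholder for processing one batch
    # A real engine would now drain its pipeline and stop its streams.
    return "stopped cleanly"


async def demo_sigterm(delay: float = 0.05) -> str:
    """Send SIGTERM to this process after `delay` seconds, as a container runtime would."""
    asyncio.get_running_loop().call_later(delay, os.kill, os.getpid(), signal.SIGTERM)
    return await run_until_terminated(example_engine)
```

Running asyncio.run(demo_sigterm()) exits via the graceful path instead of being killed outright by the default SIGTERM action.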

The shutdown procedures are broadly similar for the dsim, fgpu and xbgpu. Ultimately they all:

finish calculations on data currently in their pipelines,
stop the transmission of their SPEAD descriptors, and
in the case of fgpu and xbgpu, stop their spead2 receivers, which allows for a more natural ending of internal processing operations.
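Why stopping the receivers gives a natural end to processing can be shown with a toy asyncio pipeline: once the receiver stops, it pushes an end marker, and the downstream stage finishes everything already queued before exiting. This is a simplified sketch with hypothetical names; the real engines use spead2 streams and GPU work queues rather than a single asyncio.Queue.

```python
import asyncio

_END = object()  # sentinel marking the end of the input stream


async def receiver(queue: asyncio.Queue, source, stop: asyncio.Event) -> None:
    """Feed chunks downstream until the source ends or a stop is requested."""
    for chunk in source:
        if stop.is_set():
            break  # stop receiving; chunks already queued still get processed
        await queue.put(chunk)
    await queue.put(_END)


async def processor(queue: asyncio.Queue, results: list) -> None:
    """Process everything queued, then exit when the end marker arrives."""
    while True:
        chunk = await queue.get()
        if chunk is _END:
            return
        results.append(chunk * 2)  # placeholder for the real computation


async def run_pipeline(source, stop: asyncio.Event) -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=4)
    results: list = []
    await asyncio.gather(receiver(queue, source, stop), processor(queue, results))
    return results
```

If a stop is requested while chunk 3 of a stream is arriving, the chunks before it are still processed and the pipeline then exits on its own, with no cancellation needed.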

Individual engine shutdown

Once you’ve sufficiently tested, debugged and/or reached the desired level of confusion, there are two options for engine shutdown:

press Ctrl + C in the terminal window where the engine was invoked, or
connect to the engine’s <host>:<port> and issue a ?halt.

After either of these approaches, the engine shuts down cleanly and quietly according to the common Shutdown procedures. As the F-Engine Packet Simulator (fsim) is a simple CLI utility with no katcp support, it only requires a Ctrl + C to end operations.