Digitiser Packet Simulator
The digitiser simulator (dsim for short) is a tool that provides the same heap format as the MeerKAT digitisers, with a configurable payload.
Usage
The dsim process generates an arbitrary number of single-pol data streams.
However, it only uses a single sending thread, so in practice it does not
scale well beyond two streams (a dual-pol antenna) for typical MeerKAT
bandwidths. Instead, one can use multiple instances.
It also relies heavily on the ibverbs support in spead2 for performance at
typical MeerKAT bandwidths. It can nevertheless be used without it, but the
bandwidth will most likely need to be reduced. Pass --ibv
to
use the ibverbs acceleration.
When using multiple processes, it is usually necessary to synchronise them.
The --sync-time
specifies a time (in the past) that will correspond
to a zero timestamp. The synchronisation is accurate to about a millisecond,
provided that all threads are pinned to specific CPU cores and real-time
scheduling is used to prevent other tasks from sharing those cores. Streams
sent by the same process are interleaved in a single transmit queue, so will
be perfectly synchronised as they leave the NIC (but could be desynchronised
by a multi-path network).
By default the content of the signal is a sine wave with a fixed frequency.
However, the signal is highly configurable with the --signals
option. A domain-specific language (DSL) allows continuous waves and Gaussian
noise to be combined with basic operators (see below). The signals to send can
also be changed on the fly by issuing the ?signals
command over katcp.
Signal specification
Basics
To specify a signal, one writes an expression followed by a semi-colon. This provides the signal for a single polarisation, so must be repeated for the number of single-pol streams. For example, the following [1] generates a continuous wave on the first polarisation and noise on the second polarisation:
cw(1.0, 1e9);
wgn(0.05);
Note that the semi-colons are required. A common mistake is to forget the final semi-colon. The following functions are available:
cw(amplitude, frequency)
Continuous wave with the given amplitude and frequency (in Hz). There is currently no way to directly control phase, although the
delay
function below gives limited control.comb(amplitude, frequency)
A comb of impulses, each one sample wide with the given amplitude. Note that if the frequency doesn’t correspond to an integer number of samples, these will not be precisely periodic as each impulse time will be rounded to the nearest sample.
wgn(std [, entropy])
White Gaussian noise with given standard deviation. Optionally, one may provide a non-negative integer seed in entropy to give reproducible results.
delay(signal, delay)
Delay another signal expression by delay samples. For example,
delay(cw(1.0, 1e9), 10)
would shift the phase of a CW. Only integer numbers of samples are supported (including negative values).constant
A real number can be used as a signal, which will be used for all samples (DC).
The output magnitude is limited to the range -1 to 1, so typically the
amplitude for cw
and comb
should be at most 1, and the std for
wgn
should be much less than 1.
Operators
In addition to these basic building blocks, signals can be combined with the
operators +
, -
and *
. It should be noted that these operators can
only be used on signals: the scalar arguments like amplitude and std must
be literal constants rather than expressions.
Variables
As in Python, it is also possible to assign an expression to a variable, and use the variable several times later. This has several advantages:
It saves typing.
The common part only needs to be computed once, speeding up evaluation.
Random choices (such as in
wgn
) are “locked in” to the variable. That simplifies creation of correlated signals without needing to explicitly choose entropy values.
As an example, the following specification defines two signals which share a sine wave and some noise, and adds further noise that is uncorrelated between the polarisations:
base = cw(1.0, 1e9) + wgn(0.1);
base + wgn(0.05);
base + wgn(0.05);
Variables can only be defined once, and must be defined before they are used. As before, statements that don’t define a variable define one of the outputs, and there must be exactly one such statement per single-pol stream.
Dithering
By default, the signal is dithered as a final step, by adding random values uniformly selected from the interval [-0.5, 0.5) least significant bits. The dither values are chosen independently for each single-pol stream, so that they are uncorrelated.
Dithering can be disabled for an output by wrapping the expression in
nodither()
. A nodither
signal can be assigned to a variable, but it
cannot be combined with other signals using operators nor modified using
delay
.
Design
Signal generation
It would be extremely challenging for a CPU to simulate a signal in real-time,
particularly given the need to pack the results into 10-bit samples. Instead,
a window of signal is generated on startup, or on request to change the
signal, and then replayed over and over. The length of this window is
determined by the --signal-heaps
command-line option.
This has a few implications:
The frequency resolution for
cw
andcomb
is limited by the inverse of the window length. For example, a sinusoidal signal must have an integer number of cycles per window, which means that the frequency is rounded to a multiple of \(\frac{\text{adc-sample-rate}}{\text{signal-heaps}\times \text{heap-samples}}\).Noise is correlated in time, and when averaging over long periods of time (longer than the window), the standard deviation does not decrease with the square root of the integration time. Similarly, the sample mean converges to the mean of the generated window rather than the population mean.
To speed up the signal generation, dask is used to parallelise the process
across multiple CPU cores. Dask presents a numpy-like interface, but
internally splits arrays into chunks and performs computations for each chunk
in parallel. The chunk size is determined by the constant
katgpucbf.dsim.signal.CHUNK_SIZE
.
The simulator also populates the saturation count and flag in the
digitiser_status
SPEAD item. A per-heap saturation count is computed with
dask (prior to bit-packing), and then packed into the digitiser_status
bit-field with serial code. This accumulation could be done in parallel for
better efficiency, but the current approach is reasonably performant and does
not require the sampling code to have any knowledge of the format of the
digitiser_status
item.
Generating reproducible random signals needs to be done carefully when
parallelising. The given random seed is first used to produce a
SeedSequence
for each chunk, and each chunk then uses
an independent generator seeded with its corresponding sequence. This ensures
that different instances of the simulator will produce the same sequence given
the same entropy (hence giving correlated noise). Note that the result is
dependent on the chunk size.
Transmission
Most of the heavy work of transmission is handled by spead2. To minimise
overheads, the heaps are pre-defined, and put into a
spead2.send.HeapReferenceList
for bulk transmission with
spead2.send.asyncio.AsyncStream.async_send_heaps()
. Additionally,
spead2’s rate limiting is used to control the simulated digitiser clock
speed. Since spead2 sends data in small bursts (64 KiB) between sleeps, the
delivery of packets will not be as smooth as from a real MeerKAT digitiser.
To avoid stalling transmission, it is important that spead2’s C++ worker thread always has more data to send, as the latency of signalling end-of-transmission to Python and then waiting for Python to respond with new heaps would be significant. To accommodate this, the window is split in half, and each call to spead2 sends only half the window. As soon as one half finishes transmission, the Python code prepares it to be sent again, in parallel with spead2 starting transmission of the other half.
Although the signal is recycled, some work is still needed to prepare a half-window for retransmission, because the timestamps need to be updated. To make this as efficient as possible, all the timestamps are allocated in a single numpy array, and each heap references the appropriate entry of the array. This allows a range of timestamps to be updated with a single numpy operation, rather than a Python loop.
Allowing the signal to be changed mid-flow is done with double-buffering. The new signal is computed asynchronously into a spare second window. Once that’s completed, the spare and active windows are swapped. The new spare window may still be referenced by in-flight heaps, so it is necessary to await transmission of those heaps before allowing the signal to be changed again.