katgpucbf.xbgpu.correlation module

Module wrapping the ASTRON Tensor-Core Correlation Kernels in the MeerKAT katsdpsigproc framework.

Todo

Eventually modify the classes to support 4- and 16-bit input samples. The kernel supports this, but it is not exposed to the user. There is no use case for this at the moment, so this is a low priority.

class katgpucbf.xbgpu.correlation.Correlation(template: CorrelationTemplate, command_queue: AbstractCommandQueue, n_batches: int)

Bases: Operation

Tensor-Core correlation kernel.

Specifies the shape of the input sample and output visibility buffers required by the kernel. The parameters specified in the CorrelationTemplate object are used to determine the shape of the buffers. There is an outer-most dimension called “batches”, over which the operation is parallelised. Not all batches need to be processed every time: set the first_batch and last_batch attributes to control which batches will be computed.

The input sample buffer must have the shape: [n_batches][n_ants][channels][spectra_per_heap][polarisations]

There is an alignment requirement for spectra_per_heap due to the implementation details of the kernel. For 8-bit input mode, spectra_per_heap must be a multiple of 16.

Each input element is a complex 8-bit integer sample. NumPy does not support 8-bit complex numbers, so an extra trailing dimension of size 2 is added to hold the real and imaginary components.
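As an illustration of the layout described above, the following minimal sketch builds a host-side NumPy array with the documented dimensions plus the trailing axis of size 2. The sizes are hypothetical, and storing the real part first and the imaginary part second is an assumption that this page does not confirm.

```python
import numpy as np

# Hypothetical sizes, chosen for illustration only.
n_batches, n_ants, n_channels, n_spectra_per_heap, n_pols = 2, 4, 16, 32, 2

# In 8-bit mode spectra_per_heap must be a multiple of 16.
assert n_spectra_per_heap % 16 == 0

# Documented dimensions, followed by a trailing axis of size 2 for the
# complex components (assumed order: real, imaginary).
shape = (n_batches, n_ants, n_channels, n_spectra_per_heap, n_pols, 2)
samples = np.zeros(shape, dtype=np.int8)

# Store the sample 3 - 5j for batch 0, antenna 1, channel 2, spectrum 3, pol 0.
samples[0, 1, 2, 3, 0] = (3, -5)

# Convert to a complex array for host-side checks.
as_complex = samples[..., 0].astype(np.float32) + 1j * samples[..., 1].astype(np.float32)
```

The array is only a host-side model of the buffer contents; allocating and binding the device buffer is done through the Operation interface and is not shown here.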

With 8-bit input samples, the value -128i is not supported by the kernel as there is no 8-bit complex conjugate representation of this number. Passing -128i into the kernel will produce incorrect values at the output.
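One way to guard against this on the host is to clamp samples before they reach the kernel. The sketch below uses a hypothetical helper, `clamp_samples`, which is not part of this module. It conservatively replaces -128 with -127 in both the real and imaginary components, which is stricter than the restriction stated above.

```python
import numpy as np


def clamp_samples(samples: np.ndarray) -> np.ndarray:
    """Return a copy of int8 samples with every -128 replaced by -127.

    Hypothetical helper: -127 can be negated within int8, -128 cannot.
    """
    return np.maximum(samples, np.int8(-127))


raw = np.array([[-128, 5], [0, -128], [127, -127]], dtype=np.int8)
safe = clamp_samples(raw)
```

Clamping changes the affected samples by one least-significant bit, so whether it is acceptable depends on the application.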

The output visibility buffer must have the shape [channels][baselines][COMPLEX]. In 8-bit mode, each element in this visibility matrix is a 32-bit integer value.

Calling this object does not directly update the output. Instead, it updates an intermediate buffer (called mid_visibilities). To produce the output, call reduce(). reduce() can also flag data that was missing during the accumulation by writing a special value in its place. This is controlled by the present_baselines slot, which has one boolean entry per baseline (antenna pair).
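The flagging step can be modelled on the host with NumPy. This is only a sketch of the idea, not the kernel's implementation. The helper `flag_missing` is hypothetical, and the assumption that the four polarisation products of antenna pair i occupy consecutive entries 4i to 4i+3 of the baseline axis is not confirmed by this page.

```python
import numpy as np

# Magic value documented for this module.
MISSING = np.array([-2147483648, 1], dtype=np.int32)


def flag_missing(visibilities: np.ndarray, present_baselines: np.ndarray) -> np.ndarray:
    """Return a copy of visibilities with absent antenna pairs set to MISSING.

    visibilities: int32 array of shape (channels, 4 * n_pairs, 2).
    present_baselines: bool array of shape (n_pairs,).
    """
    # Assumed layout: four consecutive output baselines per antenna pair.
    present = np.repeat(present_baselines, 4)
    out = visibilities.copy()
    out[:, ~present] = MISSING
    return out


vis = np.ones((2, 12, 2), dtype=np.int32)
flagged = flag_missing(vis, np.array([True, False, True]))
```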

Currently only 8-bit input mode is supported.

static get_baseline_index(ant1: int, ant2: int) → int

Get index in the visibilities matrix for baseline (ant1, ant2).

The visibilities matrix indexing is as follows:

     ant2 = 0  1  2  3  4
         +---------------
ant1 = 0 | 00 01 03 06 10
       1 |    02 04 07 11
       2 |       05 08 12
       3 |          09 13
       4 |             14

This function requires that \(ant2 \ge ant1\).
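The table is consistent with a column-by-column triangular numbering, in which column ant2 starts at index ant2 * (ant2 + 1) / 2. The sketch below is a hypothetical reimplementation derived from the table, not the library's source code.

```python
def baseline_index(ant1: int, ant2: int) -> int:
    """Index of baseline (ant1, ant2) in the triangular ordering tabulated above."""
    if not 0 <= ant1 <= ant2:
        raise ValueError("requires 0 <= ant1 <= ant2")
    # Columns 0 .. ant2-1 hold ant2 * (ant2 + 1) / 2 entries in total;
    # ant1 is the offset within column ant2.
    return ant2 * (ant2 + 1) // 2 + ant1


# Walk the table column by column for five antennas.
indices = [baseline_index(a1, a2) for a2 in range(5) for a1 in range(a2 + 1)]
```

For five antennas this yields the consecutive indices 0 to 14, matching the table.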

reduce() → None: Finalise computation of the output visibilities from the internal buffer.

zero_visibilities() → None: Zero all the values in the internal buffer.

class katgpucbf.xbgpu.correlation.CorrelationTemplate(context: AbstractContext, n_ants: int, n_channels_per_substream: int, n_spectra_per_heap: int, input_sample_bits: int)

Bases: object

Template class for the Tensor-Core correlation kernel.

The template creates a Correlation that will run the compiled kernel. The parameters are used to compile the kernel and by the Correlation to specify the shape of the memory buffers connected to this kernel.

The number of baselines used here differs from the canonical count in radio astronomy, which is:

\[n_{baselines} = \frac{n_{ants} * (n_{ants} + 1)}{2}\]

Because we have a dual-polarised telescope, we calculate four ‘baselines’ for each canonical baseline as calculated above, namely \(h_1 h_2\), \(h_1 v_2\), \(v_1 h_2\), and \(v_1 v_2\). So the list of baselines appears four times as long as you might expect.
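The arithmetic can be checked with a short sketch. The function names are hypothetical and serve only to illustrate the two counts.

```python
def canonical_baselines(n_ants: int) -> int:
    """Antenna pairs including autocorrelations: n_ants * (n_ants + 1) / 2."""
    return n_ants * (n_ants + 1) // 2


def correlator_baselines(n_ants: int) -> int:
    """Four polarisation products (hh, hv, vh, vv) per canonical baseline."""
    return 4 * canonical_baselines(n_ants)


n_canonical = canonical_baselines(64)
n_output = correlator_baselines(64)
```

For 64 antennas this gives 2080 canonical baselines and 8320 output baselines.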

Parameters:

n_ants – The number of antennas that will be correlated. Each antenna is expected to produce two polarisations.
n_channels_per_substream – The number of frequency channels to be processed.
n_spectra_per_heap – The number of time samples to be processed per frequency channel.
input_sample_bits – The number of bits per input sample. Only 8-bit samples are supported at the moment.

instantiate(command_queue: AbstractCommandQueue, n_batches: int) → Correlation: Create a Correlation using this template to build the kernel.

katgpucbf.xbgpu.correlation.MIN_COMPUTE_CAPABILITY = (7, 2): Minimum CUDA compute capability needed for the kernel (with 8-bit samples)

katgpucbf.xbgpu.correlation.MISSING = array([-2147483648, 1], dtype=int32): Magic value indicating missing data
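A consumer of the output can detect flagged visibilities by comparing both components against this value. The following is a minimal sketch; `missing_mask` is a hypothetical helper, and the constant is redefined locally so that the example is self-contained.

```python
import numpy as np

# Same value as katgpucbf.xbgpu.correlation.MISSING, redefined for the example.
MISSING = np.array([-2147483648, 1], dtype=np.int32)


def missing_mask(visibilities: np.ndarray) -> np.ndarray:
    """Boolean mask over (channels, baselines), True where the entry is MISSING."""
    # Both components of the trailing axis must match.
    return np.all(visibilities == MISSING, axis=-1)


vis = np.zeros((2, 3, 2), dtype=np.int32)
vis[0, 1] = MISSING
vis[1, 2] = (-2147483648, 0)  # matches only one component, so it is not flagged
mask = missing_mask(vis)
```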

katgpucbf.xbgpu.correlation.device_filter(device: AbstractDevice) → bool: Determine whether a device is suitable for running the kernel.
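The core of such a check is a comparison of (major, minor) version tuples. The sketch below shows only that comparison, using a hypothetical helper `meets_minimum`. How the real device_filter reads the compute capability from an AbstractDevice is not shown on this page and is not modelled here.

```python
# Same value as katgpucbf.xbgpu.correlation.MIN_COMPUTE_CAPABILITY.
MIN_COMPUTE_CAPABILITY = (7, 2)


def meets_minimum(compute_capability: tuple[int, int]) -> bool:
    """Compare (major, minor) against the minimum; tuples compare element-wise in order."""
    return compute_capability >= MIN_COMPUTE_CAPABILITY


results = {cc: meets_minimum(cc) for cc in [(6, 1), (7, 0), (7, 2), (7, 5), (8, 6)]}
```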