.. Copyright 2025-2026

.. _Compression:

Compression
###########

Lossy compression is a widely used data reduction technique in scientific computing. By accepting a controlled loss
of precision, compressors such as SZ and ZFP can achieve significant reductions in data volume while preserving the
features that matter for downstream analysis. DTLMod models the performance impact of compression on in situ
workflows without actually compressing any data: it simulates the computational cost of compression and
decompression and adjusts the volume of data transported through the DTL according to a compression ratio.

How compression works in DTLMod
-------------------------------

Unlike decimation, compression does not change the **shape** of a variable. A :math:`1000 \times 1000` array remains
a :math:`1000 \times 1000` array after compression. What changes is the **byte-size** of the variable: the number
of bytes transported through the DTL is divided by the compression ratio. This reflects the fact that real-world
lossy compressors produce a bitstream that is smaller than the original data but still represents all the elements
of the array.

Compression is a **publisher-side only** operation. Applying compression on the subscriber side is not meaningful
because compression aims to reduce the volume of data that needs to be transported---which requires intervention
before the data leaves the publisher.

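As a concrete illustration of this size accounting (plain Python arithmetic; the names below are not DTLMod API):

```python
# A 1000 x 1000 array of double-precision values on the publisher side.
shape = (1000, 1000)
element_size = 8  # bytes per double-precision value
original_bytes = shape[0] * shape[1] * element_size  # 8,000,000 bytes

# With a compression ratio of 4, the logical shape is unchanged but only
# a quarter of the bytes travel through the DTL.
ratio = 4.0
transported_bytes = original_bytes / ratio

print(shape)              # (1000, 1000) -- shape is preserved
print(transported_bytes)  # 2000000.0    -- byte-size is divided by the ratio
```
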
Compressor profiles
-------------------

DTLMod provides three ways to determine the compression ratio for a variable:

**Fixed ratio.** The simplest option: you directly specify the desired compression ratio. This is useful when you
already know, from experiments or from the literature, the compression ratio achieved by a particular compressor
on data similar to yours. The ratio must be at least 1.0 (a ratio of 1 means no size reduction).

**SZ profile.** This profile is inspired by the `SZ lossy compressor <https://szcompressor.org/>`_, a
prediction-based algorithm. SZ achieves high compression ratios on smooth scientific data because it can accurately
predict neighboring values and only store the (small) prediction errors. The compression ratio is derived from two
user-specified parameters:

- **accuracy** (or error bound): the maximum acceptable pointwise error. Tighter accuracy requirements reduce the
  compression ratio because more bits are needed to represent the prediction residuals.

- **data smoothness**: a value between 0 and 1 that characterizes how regular the data is. Smooth data
  (e.g., temperature fields) yields higher compression ratios because predictions are more accurate. Noisy or
  turbulent data yields lower ratios.

The model computes the ratio as:

.. math::

   r = \max\!\Big(1,\;\alpha \cdot \left(-\log_{10} \varepsilon\right)^{\beta} \cdot (0.5 + \sigma)\Big)

where :math:`\varepsilon` is the accuracy, :math:`\sigma` is the data smoothness, and :math:`\alpha = 3.0`,
:math:`\beta = 0.8` are empirical parameters fitted from published benchmarks on scientific datasets.

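A minimal Python sketch of this formula (illustrative only; `sz_ratio` is not part of DTLMod's API):

```python
import math

ALPHA = 3.0  # empirical parameters quoted above
BETA = 0.8

def sz_ratio(accuracy: float, smoothness: float) -> float:
    """SZ-inspired ratio: max(1, alpha * (-log10(eps))**beta * (0.5 + sigma))."""
    return max(1.0, ALPHA * (-math.log10(accuracy)) ** BETA * (0.5 + smoothness))

# Smoother data (larger sigma) yields a higher ratio at the same error bound.
print(sz_ratio(1e-3, 0.9))  # ~10.1
print(sz_ratio(1e-3, 0.1))  # ~4.3
```
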
**ZFP profile.** This profile is inspired by the `ZFP compressor <https://computing.llnl.gov/projects/zfp>`_,
a transform-based algorithm. ZFP organizes data into small blocks, applies a near-orthogonal transform, and
encodes the resulting coefficients with a fixed number of bits per value. The compression ratio depends primarily
on the requested accuracy:

.. math::

   \text{rate} = \max(1,\;-\log_2 \varepsilon + 1) \quad;\quad r = \frac{64}{\text{rate}}

where the rate represents the number of bits per double-precision value after compression. Higher accuracy
requirements increase the rate and therefore decrease the compression ratio.

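The same sketch style applies to the ZFP-inspired profile (hypothetical helper, not DTLMod's API):

```python
import math

def zfp_ratio(accuracy: float) -> float:
    """ZFP-inspired ratio: rate = max(1, -log2(eps) + 1) bits per
    double-precision value; the ratio divides the original 64 bits by it."""
    rate = max(1.0, -math.log2(accuracy) + 1.0)
    return 64.0 / rate

# Tighter accuracy -> more bits per value -> lower compression ratio.
print(zfp_ratio(1e-3))  # rate ~ 11.0 bits -> ratio ~ 5.8
print(zfp_ratio(1e-6))  # rate ~ 20.9 bits -> ratio ~ 3.1
```
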
Compression and decompression costs
------------------------------------

Two independent cost parameters control the simulated computational overhead of compression:

- **compression cost per element**: the number of floating-point operations incurred per array element when
  compressing the data on the publisher side.

- **decompression cost per element**: the number of floating-point operations incurred per array element when
  decompressing the data on the subscriber side, after it has been received.

Both parameters default to 1.0. The compression and decompression costs for a variable are computed as:

.. math::

   C_{\text{compress}} = c_{\text{comp}} \times \frac{N_{\text{local}}}{\text{element\_size}}

.. math::

   C_{\text{decompress}} = c_{\text{decomp}} \times \frac{N_{\text{local}}}{\text{element\_size}}

where :math:`N_{\text{local}}` is the local size of the variable in bytes and :math:`\text{element\_size}` is the
size of one array element. The compression cost is incurred by the publisher right before putting the variable into
the DTL, and the decompression cost is incurred by the subscriber right after receiving it.

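Both formulas reduce to "flops per element times the number of local elements". A small sketch (hypothetical helper, not DTLMod's API):

```python
def simulated_flops(local_bytes: int, element_size: int,
                    cost_per_element: float = 1.0) -> float:
    """Cost model from the formulas above: flops per element times the
    number of local elements (local byte-size / element size)."""
    return cost_per_element * (local_bytes / element_size)

# A 1000 x 1000 array of doubles on one publisher rank, default cost 1.0:
# 8,000,000 bytes / 8 bytes per element = 1,000,000 elements -> 1e6 flops.
print(simulated_flops(8_000_000, 8))       # 1000000.0  (compression side)
print(simulated_flops(8_000_000, 8, 2.0))  # 2000000.0  (e.g. a costlier decompression)
```
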
Per-transaction variability
---------------------------

In practice, the compression ratio achieved on a given variable varies from one time step to the next as the data
evolves. DTLMod can model this variability through an optional **ratio variability** parameter that introduces a
bounded, deterministic perturbation around the nominal compression ratio at each transaction. This enables the
simulation of realistic scenarios in which the effectiveness of compression fluctuates over the course of a run.

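The text does not specify the exact perturbation scheme, so the following is only one possible bounded, deterministic realization, purely for illustration:

```python
import math

def perturbed_ratio(nominal: float, variability: float, transaction: int) -> float:
    """One possible bounded, deterministic perturbation: the multiplicative
    factor stays within [1 - variability, 1 + variability] and depends only
    on the transaction index, so a simulation remains reproducible."""
    factor = 1.0 + variability * math.sin(float(transaction))
    return max(1.0, nominal * factor)

# With a nominal ratio of 4.0 and 20% variability, the perturbed ratio
# always stays within [3.2, 4.8].
for t in range(5):
    r = perturbed_ratio(4.0, 0.2, t)
    assert 3.2 <= r <= 4.8
```
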
Re-parameterization
-------------------

As with decimation, compression parameters can be updated between transactions. You can change the compression
ratio, switch compressor profiles, adjust accuracy or smoothness, or modify the cost parameters for a variable that
is already being compressed. Only the parameters that are explicitly provided in the update are modified; the
others retain their previous values. This supports the simulation of adaptive compression strategies that adjust
their settings in response to changes in the data.
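
The partial-update semantics can be sketched as a plain dictionary merge (the parameter names below are hypothetical, not DTLMod's API):

```python
def apply_update(current: dict, update: dict) -> dict:
    """Merge semantics from the text: only keys explicitly present in the
    update are overwritten; every other setting keeps its previous value."""
    merged = dict(current)
    merged.update(update)
    return merged

settings = {"ratio": 4.0, "compression_cost": 1.0, "decompression_cost": 1.0}
settings = apply_update(settings, {"ratio": 8.0})
print(settings["ratio"])             # 8.0  -- updated
print(settings["compression_cost"])  # 1.0  -- retained
```
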