Sparsification¶

Audience: readers who want to understand the pruning criterion and its effect on accuracy/performance.

Sparsification reduces memory load by pruning history with controlled error. This is particularly relevant for long simulations run on GPUs. The pruning criterion and reconstruction are implemented in src/sparsify/.

What is pruned: interior time‑nodes of the 1D history grid t1grid along $t_1$. Endpoints are always kept.

Pruning criterion (CPU and GPU): for each interior i ≥ 2 with i + 1 < N, compute a smoothness measure using QK, QR and their t‑derivatives on the stencil {i−2, i−1, i, i+1} with non‑uniform gaps Δt:

Let $t_{\text{left}} = t[i-2],\; t_{\text{mid}} = t[i]$; define $\Delta_1 = t[i-1] - t_{\text{left}},\; \Delta_2 = t_{\text{mid}} - t_{\text{left}},\; \Delta_3 = t[i+1] - t_{\text{mid}}$ and scale $s = \Delta_2/12$.
For each component $j = 1\dots \text{len}$, accumulate

$$ \displaystyle s\,\big\lvert 2\,(QK[i]-QK[i-2]) - \Delta_2\,\big(\tfrac{\mathrm d QK[i-1]}{\Delta_1} + \tfrac{\mathrm d QK[i+1]}{\Delta_3}\big) \big\rvert \; +\; s\,\big\lvert 2\,(QR[i]-QR[i-2]) - \Delta_2\,\big(\tfrac{\mathrm d QR[i-1]}{\Delta_1} + \tfrac{\mathrm d QR[i+1]}{\Delta_3}\big) \big\rvert. $$

If the total is below the threshold, node i is erasable; otherwise it is kept.

Index reconstruction and derivative scaling¶

Build the kept index list inds including $0$ and $N-1$.
Build indsD by shifting interior kept indices by $+1$ for derivative‑anchored data; set indsD[0]=0.
Compute tfac per kept chunk: tfac[0]=1, and for i > 0

$$ \displaystyle \text{tfac}[i] = \frac{t\big[\text{inds}[i]\big] - t\big[\text{inds}[i-1]\big]}{t\big[\text{indsD}[i]\big] - t\big[\text{indsD}[i]-1\big]}. $$

Gather arrays with these indices:
- QK, QR, r use inds.
- dQK, dQR, dr use indsD and are multiplied by tfac to preserve derivative consistency under grid compression.
Compress t1grid with inds and recompute $\Delta t$ and delta_t_ratio as $\Delta t_i/\Delta t_{i-1}$ for $i \ge 2$.

Cadence and modes¶

The GPU implementation evaluates flags at even indices for efficiency; CPU checks all interior indices.
Aggressive vs conservative modes only change the threshold value and sweep cadence; the mechanism is identical.
After sparsification, the code may try SERK2; whether this is enabled is a runtime configuration choice (see Usage).

Choosing a threshold¶

The default threshold should be safe in the context of the mixed spherical $p$-spin model.
If a new threshold is needed, start from the default (tuned for len and ε) and validate on short runs by comparing C and R slices and derived observables (energy, gFDR/FDT diagnostics) with sparsification off. Increase threshold for more compression; decrease for more accuracy.

Implementation references: include/sparsify/sparsify_utils.hpp, src/sparsify/sparsify_utils.cpp (CPU), src/sparsify/sparsify_utils.cu (GPU). Post‑prune, interpolation is re‑initialized automatically by downstream calls.

flowchart TD
  start([Start sparsification sweep])
  gather[Gather stencil<br/>&lbrace;i-2, i-1, i, i+1&rbrace;]
  compute[Compute smoothness<br/>metric using QK/QR/derivatives]
  threshold{Metric &lt; threshold?}
  erase[Mark node erasable]
  keep[Keep node]
  next{More interior nodes?}
  rebuild[Rebuild kept index lists<br/>inds, indsD]
  scale[Rescale derivatives<br/>with tfac]
  compress[Write compressed histories<br/>update Δt ratios]
  done([Sparsification complete])

  start --> gather --> compute --> threshold
  threshold -->|Yes| erase --> next
  threshold -->|No| keep --> next
  next -->|Yes| gather
  next -->|No| rebuild --> scale --> compress --> done