Sparsification
Audience: readers who want to understand the pruning criterion and its effect on accuracy/performance.
Sparsification reduces memory use by pruning nodes from the stored history while keeping the resulting error below a threshold. This is particularly relevant for long simulations run on GPUs. The pruning criterion and reconstruction are implemented in src/sparsify/.
What is pruned: interior time‑nodes of the 1D history grid t1grid along \(t_1\). Endpoints are always kept.
Pruning criterion (CPU and GPU): for each interior i ≥ 2 with i + 1 < N, compute a smoothness measure using QK, QR and their t‑derivatives on the stencil {i−2, i−1, i, i+1} with non‑uniform gaps Δt:
- Let \(t_{\text{left}} = t[i-2],\; t_{\text{mid}} = t[i]\); define \(\Delta_1 = t[i-1] - t_{\text{left}},\; \Delta_2 = t_{\text{mid}} - t_{\text{left}},\; \Delta_3 = t[i+1] - t_{\text{mid}}\) and scale \(s = \Delta_2/12\).
- For each component \(j = 1\dots \text{len}\), accumulate
$$ \displaystyle s\,\big\lvert 2\,(QK[i]-QK[i-2]) - \Delta_2\,\big(\tfrac{\mathrm d QK[i-1]}{\Delta_1} + \tfrac{\mathrm d QK[i+1]}{\Delta_3}\big) \big\rvert \; +\; s\,\big\lvert 2\,(QR[i]-QR[i-2]) - \Delta_2\,\big(\tfrac{\mathrm d QR[i-1]}{\Delta_1} + \tfrac{\mathrm d QR[i+1]}{\Delta_3}\big) \big\rvert. $$
- If the total is below the threshold, node i is erasable; otherwise it is kept.
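The criterion above can be sketched in a few lines of Python. This is a minimal illustration of the formula as written here, not the code in src/sparsify/. The function names `smoothness_metric` and `is_erasable`, and the row-per-node layout (`X[k][j]` is component j at time-node k), are assumptions made for the example.

```python
def smoothness_metric(t, QK, QR, dQK, dQR, i):
    """Accumulated smoothness measure for interior node i.

    Assumed layout: X[k][j] is component j of array X at time-node k.
    """
    n = len(t)
    if not (i >= 2 and i + 1 < n):
        raise ValueError("node i must satisfy i >= 2 and i + 1 < N")

    t_left, t_mid = t[i - 2], t[i]
    d1 = t[i - 1] - t_left   # Delta_1
    d2 = t_mid - t_left      # Delta_2
    d3 = t[i + 1] - t_mid    # Delta_3
    s = d2 / 12.0            # scale

    total = 0.0
    for j in range(len(QK[i])):
        res_k = 2.0 * (QK[i][j] - QK[i - 2][j]) - d2 * (
            dQK[i - 1][j] / d1 + dQK[i + 1][j] / d3
        )
        res_r = 2.0 * (QR[i][j] - QR[i - 2][j]) - d2 * (
            dQR[i - 1][j] / d1 + dQR[i + 1][j] / d3
        )
        total += s * (abs(res_k) + abs(res_r))
    return total


def is_erasable(t, QK, QR, dQK, dQR, i, threshold):
    """Node i is erasable when the accumulated measure is below the threshold."""
    return smoothness_metric(t, QK, QR, dQK, dQR, i) < threshold
```

As a sanity check, if the derivative arrays are assumed to hold the slope multiplied by the local step t[k] − t[k−1], data that is linear in t gives a vanishing measure on a non‑uniform grid, so such nodes are always erasable.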
Index reconstruction and derivative scaling
- Build the kept index list inds, including \(0\) and \(N-1\).
- Build indsD by shifting interior kept indices by \(+1\) for derivative‑anchored data; set indsD[0]=0.
- Compute tfac per kept chunk: tfac[0]=1, and for i > 0
$$ \displaystyle \text{tfac}[i] = \frac{t\big[\text{inds}[i]\big] - t\big[\text{inds}[i-1]\big]}{t\big[\text{indsD}[i]\big] - t\big[\text{indsD}[i]-1\big]}. $$
- Gather arrays with these indices:
    - QK, QR, r use inds.
    - dQK, dQR, dr use indsD and are multiplied by tfac to preserve derivative consistency under grid compression.
- Compress t1grid with inds and recompute \(\Delta t\) and delta_t_ratio as \(\Delta t_i/\Delta t_{i-1}\) for \(i \ge 2\).
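The reconstruction steps above can be sketched as follows. This is an illustrative Python sketch, not the project's implementation: the helper names (`build_index_lists`, `derivative_factors`, `gather`, `compress_grid`) are hypothetical, the last endpoint \(N-1\) is assumed to stay unshifted in indsD so the index remains in range, and \(\Delta t_i\) is assumed to mean \(t_i - t_{i-1}\) on the compressed grid.

```python
def build_index_lists(keep):
    """keep[k] is True when node k survives; both endpoints are always kept."""
    n = len(keep)
    inds = [k for k in range(n) if k in (0, n - 1) or keep[k]]
    # Interior kept indices shift by +1; endpoints stay put (assumption for N-1).
    indsD = [k + 1 if 0 < k < n - 1 else k for k in inds]
    return inds, indsD


def derivative_factors(t, inds, indsD):
    """tfac[0] = 1; otherwise the kept gap divided by the gap at indsD[i]."""
    tfac = [1.0]
    for i in range(1, len(inds)):
        kept_gap = t[inds[i]] - t[inds[i - 1]]
        stored_gap = t[indsD[i]] - t[indsD[i] - 1]
        tfac.append(kept_gap / stored_gap)
    return tfac


def gather(values, derivs, inds, indsD, tfac):
    """Values use inds; derivative data use indsD and are rescaled by tfac."""
    new_values = [list(values[k]) for k in inds]
    new_derivs = [[f * x for x in derivs[k]] for k, f in zip(indsD, tfac)]
    return new_values, new_derivs


def compress_grid(t, inds):
    """Compressed grid, its steps, and the step ratios for i >= 2."""
    t_new = [t[k] for k in inds]
    # dt[0] is a placeholder so that dt[i] = t_new[i] - t_new[i-1] for i >= 1.
    dt = [0.0] + [t_new[i] - t_new[i - 1] for i in range(1, len(t_new))]
    ratio = {i: dt[i] / dt[i - 1] for i in range(2, len(t_new))}
    return t_new, dt, ratio
```

On a uniform six‑node grid with nodes 2 and 4 erased, this gives inds = [0, 1, 3, 5] and indsD = [0, 2, 4, 5]; the factors for the two merged intervals are 2, reflecting that each kept gap is twice the gap the derivative data was stored on.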
Cadence and modes
- The GPU implementation evaluates flags at even indices for efficiency; CPU checks all interior indices.
- Aggressive vs conservative modes only change the threshold value and sweep cadence; the mechanism is identical.
- After sparsification, the code may try SERK2; whether this is enabled is a runtime configuration choice (see Usage).
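The difference between the two sweep patterns can be made concrete with a small Python sketch. It only enumerates which interior indices would be tested; the function name `candidate_nodes` and the reading of "even indices" as i = 2, 4, 6, … are assumptions for illustration.

```python
def candidate_nodes(n, even_only):
    """Interior indices i with i >= 2 and i + 1 < n that a sweep would test.

    even_only=True mimics the GPU pattern described above (even i only);
    even_only=False mimics the CPU pattern (every interior i).
    """
    step = 2 if even_only else 1
    return list(range(2, n - 1, step))
```

For a nine‑node grid, the full sweep visits indices 2 through 7, while the even‑only sweep visits 2, 4 and 6. A direct consequence of the even‑only pattern is that two adjacent nodes are never flagged in the same sweep.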
Choosing a threshold
- The default threshold should be safe in the context of the mixed spherical \(p\)-spin model.
- If a new threshold is needed, start from the default (tuned for len and ε) and validate on short runs by comparing C and R slices and derived observables (energy, gFDR/FDT diagnostics) against a run with sparsification off. Increase the threshold for more compression; decrease it for more accuracy.
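One way to organize such a validation is sketched below in Python. Everything here is hypothetical scaffolding rather than part of the code base: `run_observable` stands for whatever short run produces an observable sampled at fixed output times, and passing `None` is an assumed convention for "sparsification off".

```python
def max_relative_deviation(reference, candidate, floor=1e-12):
    """Largest relative difference between two equally sampled observables."""
    if len(reference) != len(candidate):
        raise ValueError("observables must be sampled at the same times")
    return max(
        abs(c - r) / max(abs(r), floor) for r, c in zip(reference, candidate)
    )


def largest_acceptable_threshold(run_observable, thresholds, tolerance):
    """Scan thresholds in increasing order and keep the last one within tolerance.

    run_observable(threshold) is a user-supplied callable (hypothetical);
    run_observable(None) is taken to mean a run with sparsification off.
    """
    reference = run_observable(None)
    accepted = None
    for thr in sorted(thresholds):
        if max_relative_deviation(reference, run_observable(thr)) <= tolerance:
            accepted = thr
        else:
            break  # stop at the first threshold that is too aggressive
    return accepted
```

The scan stops at the first failure on the assumption that the error grows with the threshold; if that does not hold for a given observable, checking every candidate is the safer choice.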
Implementation references: include/sparsify/sparsify_utils.hpp, src/sparsify/sparsify_utils.cpp (CPU), src/sparsify/sparsify_utils.cu (GPU). Post‑prune, interpolation is re‑initialized automatically by downstream calls.
flowchart TD
start([Start sparsification sweep])
gather[Gather stencil<br/>{i-2, i-1, i, i+1}]
compute[Compute smoothness<br/>metric using QK/QR/derivatives]
threshold{Metric < threshold?}
erase[Mark node erasable]
keep[Keep node]
next{More interior nodes?}
rebuild[Rebuild kept index lists<br/>inds, indsD]
scale[Rescale derivatives<br/>with tfac]
compress[Write compressed histories<br/>update Δt ratios]
done([Sparsification complete])
start --> gather --> compute --> threshold
threshold -->|Yes| erase --> next
threshold -->|No| keep --> next
next -->|Yes| gather
next -->|No| rebuild --> scale --> compress --> done