How-to: Add a New Kernel

Audience: developers adding performance-critical CPU/CUDA kernels.

This repo has paired CPU/CUDA implementations for performance-critical kernels (convolution, interpolation/search helpers, etc.). This page shows the “house style” for adding a new one.

1) Decide the interface

Keep the call signature identical across CPU and GPU builds.

Typical pattern:

- Public declaration in include/<module>/<name>.hpp.
- Implementation in src/<module>/<name>.cpp.
- Optional CUDA implementation in src/<module>/<name>.cu.

If the kernel is used from both host and device code, add a small wrapper API and keep device details internal.
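
As a rough illustration of this pattern, the sketch below exposes one public function whose signature stays the same in CPU and GPU builds, with the implementation detail kept in an internal namespace. The names `example`, `detail`, `scale_add`, and `scale_add_cpu` are hypothetical and not taken from this repo; in a real kernel the declaration would go in include/<module>/<name>.hpp and the definition in src/<module>/<name>.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace example {  // hypothetical namespace, for illustration only

namespace detail {
// Internal CPU path. A CUDA build would add a device path next to it,
// hidden behind the same public function.
inline void scale_add_cpu(const double* x, const double* y, double a,
                          double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}
}  // namespace detail

// Public API: callers see this signature in both CPU and GPU builds.
inline std::vector<double> scale_add(const std::vector<double>& x,
                                     const std::vector<double>& y,
                                     double a) {
    assert(x.size() == y.size());  // inputs must have matching lengths
    std::vector<double> out(x.size());
    detail::scale_add_cpu(x.data(), y.data(), a, out.data(), out.size());
    return out;
}

}  // namespace example
```

Callers only ever write `example::scale_add(x, y, a)`; raw pointers, launch parameters, and device buffers never appear in the public signature.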

2) Add the CPU implementation

- Create the header under include/<module>/.
- Implement the kernel in src/<module>/<name>.cpp.
- Prefer existing utilities over new ones:
  - math helpers: include/math/*
  - memory/telemetry: include/core/gpu_memory_utils.hpp, include/core/console.hpp
  - convolution scaffolding: include/convolution/* and src/convolution/*
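
A CPU implementation can start as a plain reference loop. The sketch below is one possible shape, assuming a causal discrete convolution as the kernel; the function name `convolve_causal_cpu` is hypothetical, and the code deliberately uses only the standard library because the exact APIs of the repo helpers listed above are not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference kernel: causal discrete convolution,
//   out[i] = sum over j of kernel[j] * signal[i - j],
// with j limited to 0 <= j <= i and j < kernel.size().
// Plain loops keep it easy to audit, so it can act as the ground truth
// that an optional CUDA path is compared against later.
std::vector<double> convolve_causal_cpu(const std::vector<double>& signal,
                                        const std::vector<double>& kernel) {
    assert(!kernel.empty());
    std::vector<double> out(signal.size(), 0.0);
    for (std::size_t i = 0; i < signal.size(); ++i) {
        // Number of kernel taps that overlap the signal at position i.
        const std::size_t jmax = (i + 1 < kernel.size()) ? i + 1 : kernel.size();
        double acc = 0.0;
        for (std::size_t j = 0; j < jmax; ++j) {
            acc += kernel[j] * signal[i - j];
        }
        out[i] = acc;
    }
    return out;
}
```

For example, a signal of {1, 2, 3, 4} with the kernel {1, 1} yields {1, 3, 5, 7}. Once the reference is trusted, inner loops can be swapped for the repo's math helpers where they fit.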

3) Add the CUDA implementation (optional)

- Create src/<module>/<name>.cu.
- Guard CUDA-only bits with #if DMFE_WITH_CUDA.
- Reuse device utilities from include/core/* (and include/core/device_utils.cuh where appropriate).

Practical advice:

- Keep kernel launches and device memory management localized in the .cu file.
- If the kernel must run on either CPU or GPU depending on --gpu, make that decision in the calling code rather than inside the kernel, and provide both paths.
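
The guard and the higher-level decision might fit together as in the sketch below. It assumes the build system defines `DMFE_WITH_CUDA` to 1 for CUDA builds; when the macro is undefined, the preprocessor evaluates it as 0 and only the CPU path is compiled. The names `square`, `square_cpu`, `square_gpu`, and `built_with_cuda` are hypothetical, and silently falling back to the CPU path in a CPU-only build is an assumption made for the example, not documented repo behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU path: always compiled.
void square_cpu(const double* in, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * in[i];
    }
}

#if DMFE_WITH_CUDA
// Hypothetical launcher, defined in src/<module>/<name>.cu. It would own
// the device allocations, the copies, and the kernel launch.
void square_gpu(const double* in, double* out, std::size_t n);
#endif

// Reports whether a GPU path was compiled into this binary.
constexpr bool built_with_cuda() {
#if DMFE_WITH_CUDA
    return true;
#else
    return false;
#endif
}

// Higher-level entry point: use_gpu mirrors the --gpu option.
void square(const std::vector<double>& in, std::vector<double>& out,
            bool use_gpu) {
    assert(in.size() == out.size());
#if DMFE_WITH_CUDA
    if (use_gpu) {
        square_gpu(in.data(), out.data(), in.size());
        return;
    }
#else
    (void)use_gpu;  // CPU-only build: assumed fallback to the CPU path
#endif
    square_cpu(in.data(), out.data(), in.size());
}
```

With this layout the kernel files contain no knowledge of command-line options, and a CPU-only build never references a CUDA symbol.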

4) Wire it into the build

Register the new translation unit(s) in CMakeLists.txt:

- Add src/<module>/<name>.cpp to the CPU sources list.
- If applicable, add src/<module>/<name>.cu to the GPU sources list.

Use existing kernel files as templates (the src/convolution/ and src/interpolation/ folders are good starting points).

5) Validate correctness and performance

Minimum checks:

- CPU vs GPU: run the same small grid with --gpu false and --gpu true and compare the results.
- Stability: confirm the output contains no NaN or inf values.
- Performance: time at least one representative call site (or check step_metrics.txt in debug mode).
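
The three checks above can be sketched as small standalone helpers. The function names `all_finite`, `max_abs_diff`, and `time_seconds` are hypothetical and not part of this repo, and the acceptable CPU/GPU difference is left to the caller because it depends on the kernel and its precision.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// True if every value is finite (no NaN, no +/-inf).
bool all_finite(const std::vector<double>& v) {
    return std::all_of(v.begin(), v.end(),
                       [](double x) { return std::isfinite(x); });
}

// Largest absolute element-wise difference between two equally sized results.
double max_abs_diff(const std::vector<double>& a,
                    const std::vector<double>& b) {
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}

// Wall-clock seconds taken by one call of f.
template <typename F>
double time_seconds(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Run `all_finite` on both results before `max_abs_diff`: a NaN difference does not raise the running maximum, so it would otherwise go unnoticed. When timing a GPU path, keep in mind that CUDA kernel launches are asynchronous, so the timed callable should include whatever synchronization or copy-back makes the result available on the host.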

If it’s a user-visible enhancement, add a short note to the relevant concept/tutorial page.