How-to: Add a New Kernel
Audience: developers adding performance-critical CPU/CUDA kernels.
This repo has paired CPU/CUDA implementations for performance-critical kernels (convolution, interpolation/search helpers, etc.). This page shows the “house style” for adding a new one.
1) Decide the interface
Keep the call signature identical across CPU and GPU builds.
Typical pattern:
- Public declaration in include/<module>/<name>.hpp.
- Implementation in src/<module>/<name>.cpp.
- Optional CUDA implementation in src/<module>/<name>.cu.
If the kernel is used from both host and device code, add a small wrapper API and keep device details internal.
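As a rough illustration of this layout, the single-file sketch below shows a public wrapper whose signature does not change between CPU-only and CUDA builds. The namespace `dmfe_example`, the kernel name `axpy`, and the `use_gpu` parameter are hypothetical and not taken from the repo. The silent CPU fallback in CPU-only builds is also an assumption.

```cpp
// Single-file sketch; comments mark where each part would live in the repo layout.
// All names (dmfe_example, axpy, axpy_cpu, axpy_gpu, use_gpu) are hypothetical.
#include <cassert>
#include <cstddef>
#include <vector>

namespace dmfe_example {

namespace detail {
// Would live in src/<module>/<name>.cpp: plain host loop, y[i] += a * x[i].
inline void axpy_cpu(double a, const double* x, double* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

#if DMFE_WITH_CUDA
// Would be defined in src/<module>/<name>.cu; callers never see device details.
void axpy_gpu(double a, const double* x, double* y, std::size_t n);
#endif
}  // namespace detail

// Would be declared in include/<module>/<name>.hpp.
// The signature is the same whether or not the build has CUDA enabled.
inline void axpy(double a, const std::vector<double>& x,
                 std::vector<double>& y, bool use_gpu) {
    assert(x.size() == y.size());
#if DMFE_WITH_CUDA
    if (use_gpu) {
        detail::axpy_gpu(a, x.data(), y.data(), x.size());
        return;
    }
#else
    (void)use_gpu;  // assumption: CPU-only builds fall back to the CPU path
#endif
    detail::axpy_cpu(a, x.data(), y.data(), x.size());
}

}  // namespace dmfe_example
```

Callers only ever see `axpy(a, x, y, use_gpu)`. Raw pointers and the device entry point stay in `detail`, so call sites do not change when CUDA is switched on or off.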
2) Add the CPU implementation
- Create the header under include/.
- Implement in src/.
- Prefer existing utilities:
  - math helpers: include/math/*
  - memory/telemetry: include/core/gpu_memory_utils.hpp, include/core/console.hpp
  - convolution scaffolding: include/convolution/* and src/convolution/*
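A CPU implementation can start as a plain reference loop that a later CUDA path is checked against. The sketch below uses a hypothetical kernel, `convolve_causal`, which is not part of the repo. It deliberately avoids the repo's own helpers, because their APIs are not described on this page. In real code, the utilities listed above would be preferred where they apply.

```cpp
// Hypothetical CPU reference kernel; the name and signature are illustrative only.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Causal discrete convolution:
//   out[i] = sum over j = 0 .. min(i, K - 1) of kernel[j] * signal[i - j],
// where K is the kernel length. The output has the same length as the signal.
std::vector<double> convolve_causal(const std::vector<double>& kernel,
                                    const std::vector<double>& signal) {
    assert(!kernel.empty());
    std::vector<double> out(signal.size(), 0.0);
    for (std::size_t i = 0; i < signal.size(); ++i) {
        // Only taps that reach back to existing samples contribute.
        const std::size_t jmax = std::min(i, kernel.size() - 1);
        double acc = 0.0;
        for (std::size_t j = 0; j <= jmax; ++j) {
            acc += kernel[j] * signal[i - j];
        }
        out[i] = acc;
    }
    return out;
}
```

For example, a kernel of `{1, 1}` applied to the signal `{1, 2, 3}` yields `{1, 3, 5}`. Each output is the current sample plus the previous one.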
3) Add the CUDA implementation (optional)
- Create src/<module>/<name>.cu.
- Guard CUDA-only bits with #if DMFE_WITH_CUDA.
- Reuse device utilities from include/core/* (and include/core/device_utils.cuh where appropriate).
Practical advice:
- Keep kernel launches and device memory management localized.
- If the kernel needs to run in both modes depending on --gpu, keep the decision at a higher level and provide both paths.
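The guard and the "decide at a higher level" advice might look roughly like the sketch below. The names `scale_kernel`, `scale_gpu`, `scale_cpu`, `RunConfig`, and `run_step` are hypothetical. `RunConfig::gpu` stands in for however the parsed `--gpu` option is actually stored. The guarded section uses only standard CUDA runtime calls and skips error checking for brevity. Real code would more likely go through the repo's device utilities. Without `DMFE_WITH_CUDA` the file compiles as ordinary C++ and only the CPU path exists.

```cpp
// Hypothetical sketch of a guarded CUDA path plus a higher-level dispatch.
#include <cassert>
#include <cstddef>

#if DMFE_WITH_CUDA
#include <cuda_runtime.h>

// Device kernel: one thread per element, with a bounds check for the last block.
__global__ void scale_kernel(double a, double* data, std::size_t n) {
    const std::size_t i =
        static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= a;
    }
}

// Allocation, copies, launch, and cleanup all stay inside this one function.
// Return codes of the CUDA calls are not checked in this sketch.
void scale_gpu(double a, double* host, std::size_t n) {
    if (n == 0) return;  // a launch with zero blocks is invalid
    const std::size_t bytes = n * sizeof(double);
    double* dev = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&dev), bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    const unsigned int threads = 256;
    const unsigned int blocks =
        static_cast<unsigned int>((n + threads - 1) / threads);
    scale_kernel<<<blocks, threads>>>(a, dev, n);
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);  // blocking copy
    cudaFree(dev);
}
#endif  // DMFE_WITH_CUDA

// CPU path: always available.
void scale_cpu(double a, double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= a;
    }
}

// Hypothetical stand-in for the parsed --gpu option.
struct RunConfig {
    bool gpu = false;
};

// The CPU/GPU decision is made once here, not inside the kernels.
void run_step(const RunConfig& cfg, double a, double* data, std::size_t n) {
#if DMFE_WITH_CUDA
    if (cfg.gpu) {
        scale_gpu(a, data, n);
        return;
    }
#else
    (void)cfg;  // assumption: CPU-only builds ignore the GPU request
#endif
    scale_cpu(a, data, n);
}
```

Because `scale_gpu` owns its device buffer from allocation to release, nothing outside the function needs to know that device memory was involved.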
4) Wire it into the build
Register the new translation unit(s) in CMakeLists.txt:
- add src/<module>/<name>.cpp to the CPU sources list
- if applicable, add src/<module>/<name>.cu to the GPU sources list
Use existing kernel files as templates (the src/convolution/ and src/interpolation/ folders are good starting points).
5) Validate correctness and performance
Minimum checks:
- CPU vs GPU: compare results on a small grid with --gpu false vs --gpu true.
- Stability: ensure no NaNs/inf are produced.
- Performance: time at least one representative call site (or check step_metrics.txt in debug mode).
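The three checks above can be approximated with a few small helpers. This is a sketch rather than the repo's test harness, and the names `max_abs_diff`, `all_finite`, and `time_ms` are hypothetical. It assumes the CPU and GPU results have already been collected into host vectors.

```cpp
// Hypothetical validation helpers; not part of the repo.
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest element-wise absolute difference between two equally sized results.
// NaN differences are ignored by std::max here, so pair this with all_finite().
double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}

// True when every value is neither NaN nor +/- infinity.
bool all_finite(const std::vector<double>& v) {
    return std::all_of(v.begin(), v.end(),
                       [](double x) { return std::isfinite(x); });
}

// Wall-clock time of a single call, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

A typical use is to run the kernel once per mode and check `all_finite` on both outputs. After that, compare `max_abs_diff(cpu, gpu)` against a tolerance chosen for the kernel. Bitwise equality between CPU and GPU results is generally not expected. When timing a GPU path, the timed callable should include a device synchronization or a blocking copy. Kernel launches are asynchronous, so without one the measurement covers only the launch.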
If it’s a user-visible enhancement, add a short note to the relevant concept/tutorial page.