Profiler
Profiler flows for scans, trace replay, autotuning, and applying results to runtime setup.
Audience: Integration engineers and performance engineers who need to benchmark convolution performance, explore IRC compression quality, or generate runtime plans.
Use this chapter when you want to measure TDSE performance on your hardware, compare execution choices, or generate a runtime plan you can apply in code.
Use this chapter to answer "how should I measure?" Then use Backend and Performance to answer "how should I interpret those results, choose a backend, and decide what evidence is strong enough for PoC or deployment?"
This split is intentional: Telemetry covers continuous runtime observability, while this chapter covers short, explicit measurement passes and runtime-plan generation.
What the Profiler Does
The TDSE Profiler is a CLI module (tdse profiler) that:
- benchmarks convolution performance on your hardware
- derives an optimal execution plan (backend selection, thread count, precision)
- scans IRC compression parameters for the best speed/accuracy trade-off
- produces a runtime_plan that can be passed directly to tdse_backend_apply_plan()
The profiler does not modify your pack. It measures and recommends; Runtime stays in charge of execution.
Quick Start
Minimal IRC scan with a synthetic lowpass input:
tdse profiler irc-scan \
--np 8 --nh 257 \
--scan-prefix-counts 16,32,64 \
--scan-growth-values 2,4 \
--scan-input lowpass \
--scan-modes both
This scans all combinations of prefix counts [16, 32, 64] and growth values
[2, 4] for both V1 and V2 IRC modes, using a built-in lowpass filter as the
reference impulse response.
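As a quick sanity check on scan size, the grid described above can be enumerated in a few lines of Python. This is only an illustration of the combinatorics; the order in which the profiler actually visits the combinations is not documented here and is an assumption.

```python
from itertools import product

# Values taken from the Quick Start command line.
prefix_counts = [16, 32, 64]
growth_values = [2, 4]
modes = ["v1", "v2"]  # what "--scan-modes both" expands to

# One scan point per (mode, prefix, growth) triple.
combinations = list(product(modes, prefix_counts, growth_values))

for mode, prefix, growth in combinations:
    print(f"mode={mode} prefix={prefix} growth={growth}")
print("total scan points:", len(combinations))
```

Adding one more prefix count or growth value multiplies the number of scan points, so it is worth estimating the grid size before launching a long trace replay.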
IRC Scan
What IRC Compression Does
Impulse Response Compression replaces a long impulse-response tail with a short
prefix plus an exponential growth approximation. This reduces the per-step
convolution cost from O(P^2 · nh) to O(P^2 · prefix_length) at the expense of
a controllable accuracy loss.
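The cost expressions above imply a simple upper bound on the achievable gain: the P^2 factor cancels, leaving nh / prefix_length. The sketch below computes that ratio for the Quick Start values. The function name is hypothetical, and the ratio ignores the cost of the tail approximation and any constant factors, so the measured speedup reported by the scan should be expected to be lower.

```python
def ideal_cost_ratio(nh: int, prefix_length: int) -> float:
    """Ratio of O(P^2 * nh) to O(P^2 * prefix_length); P^2 cancels out."""
    if nh <= 0 or prefix_length <= 0:
        raise ValueError("nh and prefix_length must be positive")
    return nh / prefix_length


# nh=257 as in the Quick Start, with the scanned prefix lengths.
for prefix in (16, 32, 64):
    print(f"prefix={prefix}: ideal ratio ~ {ideal_cost_ratio(257, prefix):.2f}")
```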
When to use IRC:
- nh is large (hundreds or thousands of taps)
- the impulse response has a smooth exponential tail
- per-step latency matters more than absolute spectral accuracy
When not to use IRC:
- nh is small (under 50 taps)
- the impulse response has sharp features in the tail
- the application requires bit-exact convolution
Scan Parameters
| Parameter | Meaning | Typical Values |
|---|---|---|
| --np | port count for the synthetic model | your model's np |
| --nh | history depth for the synthetic model | your model's nh |
| --scan-prefix-counts | comma-separated prefix lengths to test | 16,32,64,128 |
| --scan-growth-values | comma-separated growth factors to test | 2,4 |
| --scan-input | input source: lowpass or trace | lowpass for quick scan |
| --scan-modes | IRC mode(s): v1, v2, or both | both |
IRC Modes
| Mode | Description | Trade-off |
|---|---|---|
| V1 | single-growth exponential tail | simpler, slightly less accurate |
| V2 | piecewise growth with prefix | more accurate for complex tails |
Output Formats
| Flag | Output | Use When |
|---|---|---|
| --out-md <path> | Markdown summary report | human review |
| --out-csv <path> | CSV with per-combination metrics | spreadsheet analysis |
| --out-recommend-json <path> | machine-readable recommendation | feeding into build pipeline |
Interpreting Results
The scan output includes per-combination metrics:
| Metric | Meaning |
|---|---|
| rel_rms | relative RMS error of compressed vs full convolution |
| peak_abs | peak absolute error |
| speedup | speedup factor over uncompressed convolution |
| spec_rel_rms | spectral-domain relative RMS (if spectral metrics enabled) |
| spec_max_rel | spectral-domain maximum relative error |
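To make the two time-domain metrics concrete, here is a minimal sketch of how rel_rms and peak_abs could be computed from a full and a compressed convolution output. The exact normalization the profiler uses is not stated in this chapter, so the formulas below (RMS of the error divided by RMS of the reference, and maximum absolute difference) are assumptions, and the sample values are made up.

```python
import math


def rel_rms(reference, approx, eps=1e-30):
    """Assumed definition: RMS(reference - approx) / RMS(reference)."""
    if len(reference) != len(approx):
        raise ValueError("sequences must have the same length")
    err_sq = sum((r - a) ** 2 for r, a in zip(reference, approx))
    ref_sq = sum(r ** 2 for r in reference)
    # The 1/N factors of both RMS values cancel in the ratio.
    return math.sqrt(err_sq) / (math.sqrt(ref_sq) + eps)


def peak_abs(reference, approx):
    """Assumed definition: largest absolute sample-wise difference."""
    if len(reference) != len(approx):
        raise ValueError("sequences must have the same length")
    return max(abs(r - a) for r, a in zip(reference, approx))


# Illustrative data only: output of a full vs a compressed convolution.
full = [1.0, 0.5, 0.25, 0.125]
compressed = [1.0, 0.5, 0.26, 0.12]
print("rel_rms :", rel_rms(full, compressed))
print("peak_abs:", peak_abs(full, compressed))
```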
A recommendation line appears if a feasible candidate exists:
RECOMMEND: mode=v2 prefix=64 growth=4 rel_rms=8.3e-4 speedup=3.2x
If no candidate meets all constraints:
NO_FEASIBLE: ... plus TOP1..TOP3 fallback rows
Trace Replay Scan
Instead of a synthetic lowpass, you can scan against a real simulation trace from your application. This produces more representative accuracy metrics.
CSV Input Format
Required columns: t, x0, x1, ..., x{np-1}
- One sample per row
- Time column must be strictly increasing
- Non-uniform intervals are allowed
- Missing or empty values are rejected
Example for np=3:
t,x0,x1,x2
0.000,1.0,0.0,0.5
0.001,0.95,0.05,0.48
0.002,0.90,0.10,0.46
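Before handing a trace to the profiler, it can be useful to check it against the rules listed above. The following is a minimal pre-flight check, not the profiler's own parser: the function name is hypothetical, and it assumes a header row and the default column names (t, x0, x1, ...).

```python
import csv
import io


def validate_trace(text, np_ports, time_col="t", port_prefix="x"):
    """Check required columns, non-empty cells, and strictly increasing time.

    Returns the number of samples, or raises ValueError on the first problem.
    """
    reader = csv.DictReader(io.StringIO(text))
    required = [time_col] + [f"{port_prefix}{i}" for i in range(np_ports)]
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    prev_t = None
    rows = 0
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        values = []
        for col in required:
            cell = row.get(col)
            if cell is None or cell.strip() == "":
                raise ValueError(f"line {line_no}: empty value in column {col}")
            values.append(float(cell))  # raises ValueError on non-numeric text
        if prev_t is not None and values[0] <= prev_t:
            raise ValueError(f"line {line_no}: time is not strictly increasing")
        prev_t = values[0]
        rows += 1
    return rows


sample = """t,x0,x1,x2
0.000,1.0,0.0,0.5
0.001,0.95,0.05,0.48
0.002,0.90,0.10,0.46
"""
print("samples:", validate_trace(sample, np_ports=3))
```

Non-uniform time steps pass this check, matching the rule that only strict monotonicity is required.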
Trace-Related Options
| Option | Meaning | Default |
|---|---|---|
| --scan-trace-csv <path> | input CSV file (required with --scan-input trace) | - |
| --scan-trace-time-col <name> | time column name | t |
| --scan-trace-port-prefix <prefix> | port column prefix | x (matches x0..x{np-1}) |
| --scan-trace-has-header 0\|1 | CSV has header row | 1 |
| --scan-trace-time-units s\|ms\|us\|ns | time unit conversion | s |
| --scan-trace-use-all 0\|1 | use full trace length | 1 |
| --scan-trace-start-index <k> | start from sample k | 0 |
| --scan-trace-max-steps <n> | cap number of steps | 0 (unbounded) |
Example: Trace Replay with Autotune
tdse profiler irc-scan \
--np 8 --nh 257 \
--scan-prefix-counts 32,64,128 \
--scan-growth-values 2,4 \
--scan-input trace \
--scan-trace-csv ./my_trace.csv \
--scan-trace-time-col t \
--scan-trace-port-prefix x \
--scan-trace-has-header 1 \
--scan-trace-time-units s \
--scan-trace-use-all 1 \
--scan-autotune 1 \
--scan-target-rel-rms 1e-3 \
--scan-objective speedup \
--scan-prefer-mode either \
--out-md ./irc_scan_report.md \
--out-csv ./irc_scan_results.csv \
--out-recommend-json ./irc_recommend.json
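When the scan is driven from a build script rather than typed by hand, the same command can be assembled programmatically. The sketch below uses only flags that appear in this chapter; the helper name and its parameters are hypothetical, and the actual invocation is left commented out because it requires the tdse CLI to be installed.

```python
import shlex


def build_irc_scan_command(np_ports, nh, prefix_counts, growth_values,
                           trace_csv, target_rel_rms="1e-3",
                           recommend_json="./irc_recommend.json"):
    """Assemble an argument list for a trace-replay scan with autotune."""
    return [
        "tdse", "profiler", "irc-scan",
        "--np", str(np_ports),
        "--nh", str(nh),
        "--scan-prefix-counts", ",".join(str(p) for p in prefix_counts),
        "--scan-growth-values", ",".join(str(g) for g in growth_values),
        "--scan-input", "trace",
        "--scan-trace-csv", trace_csv,
        "--scan-autotune", "1",
        "--scan-target-rel-rms", target_rel_rms,
        "--out-recommend-json", recommend_json,
    ]


cmd = build_irc_scan_command(8, 257, [32, 64, 128], [2, 4], "./my_trace.csv")
print(shlex.join(cmd))

# To run it (requires the tdse CLI on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.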
Autotune
Autotune automatically selects the best IRC parameters subject to accuracy constraints.
Options
| Option | Meaning | Default |
|---|---|---|
| --scan-autotune 0\|1 | enable autotune | 0 (off) |
| --scan-target-rel-rms <float> | maximum acceptable rel_rms | 1e-3 |
| --scan-target-peak-abs <float> | maximum acceptable peak_abs | 0 (disabled) |
| --scan-target-spec-rel-rms <float> | maximum acceptable spec_rel_rms | 0 (disabled) |
| --scan-target-spec-max-rel <float> | maximum acceptable spec_max_rel | 0 (disabled) |
| --scan-objective <type> | optimization goal (see below) | speedup |
| --scan-prefer-mode v1\|v2\|either | preferred IRC mode | either |
Objective Functions
| Objective | Selects The Combination That... |
|---|---|
| speedup | maximizes speedup subject to accuracy constraints |
| maxspeed | maximizes raw speed regardless of nh reduction |
| minnh | minimizes effective nh subject to accuracy constraints |
| minerror | minimizes error subject to minimum speedup threshold |
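The speedup objective can be pictured as a filter followed by a maximization. The sketch below is an illustration of that idea only, not the profiler's implementation: the function name, the row layout, and the sample numbers are all hypothetical, and only the rel_rms and peak_abs constraints are modeled (with a peak_abs limit of 0 treated as disabled, matching the option table).

```python
def pick_by_speedup(rows, target_rel_rms=1e-3, target_peak_abs=0.0):
    """Return the feasible row with the highest speedup, or None."""
    def feasible(row):
        if row["rel_rms"] > target_rel_rms:
            return False
        # A limit of 0 means the peak_abs constraint is disabled.
        if target_peak_abs > 0 and row["peak_abs"] > target_peak_abs:
            return False
        return True

    candidates = [row for row in rows if feasible(row)]
    if not candidates:
        return None  # corresponds to the NO_FEASIBLE case
    return max(candidates, key=lambda row: row["speedup"])


# Made-up scan results for illustration.
rows = [
    {"mode": "v1", "prefix": 32, "growth": 2, "rel_rms": 4.0e-3, "peak_abs": 2e-3, "speedup": 5.1},
    {"mode": "v2", "prefix": 64, "growth": 4, "rel_rms": 8.3e-4, "peak_abs": 5e-4, "speedup": 3.2},
    {"mode": "v2", "prefix": 128, "growth": 4, "rel_rms": 2.0e-4, "peak_abs": 1e-4, "speedup": 1.8},
]
best = pick_by_speedup(rows)
print(best)
```

In this example the fastest row is rejected because it misses the rel_rms target, which is the usual reason a recommendation lands on a mid-sized prefix.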
Output
When autotune succeeds:
RECOMMEND: mode=v2 prefix=64 growth=4 rel_rms=8.3e-4 speedup=3.2x
When no combination meets all constraints:
NO_FEASIBLE: target_rel_rms=1e-3, best_rel_rms=2.1e-3
TOP1: mode=v2 prefix=128 growth=4 rel_rms=2.1e-3 speedup=1.8x
TOP2: ...
TOP3: ...
Use the fallback rows to decide whether to relax constraints or accept the best available option.
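If a script needs to react to these lines, a small key=value parser is usually enough. The sketch below assumes the line layout shown in the examples above (a tag, a colon, then space- or comma-separated key=value tokens); the real output may differ in detail, and for machine consumption the --out-recommend-json file is likely the more robust source.

```python
def parse_result_line(line):
    """Split 'TAG: k1=v1 k2=v2' into (tag, {k1: v1, k2: v2}) with string values."""
    tag, _, rest = line.partition(":")
    fields = {}
    for token in rest.replace(",", " ").split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return tag.strip(), fields


tag, fields = parse_result_line(
    "RECOMMEND: mode=v2 prefix=64 growth=4 rel_rms=8.3e-4 speedup=3.2x"
)
print(tag, fields)

# Numeric conversion is left to the caller; speedup carries a trailing 'x'.
speedup = float(fields["speedup"].rstrip("x"))
print("speedup as float:", speedup)
```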
Spectral Metrics
Enable frequency-domain consistency checking during scan:
tdse profiler irc-scan \
--np 8 --nh 257 \
--scan-prefix-counts 32,64,128 \
--scan-growth-values 2,4 \
--scan-input lowpass \
--scan-lowpass-alpha 0.95 \
--scan-spectral 1 \
--scan-spectral-nfreq 128 \
--scan-spectral-fmin 0 \
--scan-spectral-fmax 0 \
--scan-spectral-weighting lowfreq
Spectral Options
| Option | Meaning | Default |
|---|---|---|
| --scan-spectral 0\|1 | enable spectral metrics | 0 |
| --scan-spectral-nfreq <int> | number of frequency sample points | 128 |
| --scan-spectral-fmin <Hz> | lower frequency bound | 0 |
| --scan-spectral-fmax <Hz> | upper frequency bound (auto-derives from tau if <=fmin) | 0 |
| --scan-spectral-weighting flat\|lowfreq\|band | weighting for spec_rel_rms | flat |
| --scan-spectral-band f1,f2 | active frequency band when weighting is band | - |
Metric Definitions
The spectral metrics compare the original impulse response H(w) against the
compressed version H'(w):
- H(w) = sum_k h[k] * exp(-j*w*tau[k]) (NUDFT/DTFT on the configured grid)
- rel_f = ||H(w) - H'(w)||_F / (||H(w)||_F + eps)
- spec_rel_rms = sqrt(mean_w(rel_f^2))
- spec_max_rel = max_w(rel_f)
The weighting parameter controls how spec_rel_rms aggregates across frequency:
- flat: equal weight at all frequencies
- lowfreq: heavier weight at low frequencies (typical for power systems)
- band f1,f2: only frequencies in [f1, f2] contribute
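The definitions above can be followed step by step in a small pure-Python sketch. It covers the flat weighting only and makes several simplifying assumptions that are not taken from the profiler: a single port (so the Frobenius norm reduces to an absolute value), a compressed response sampled on the same tau grid as the original, and w interpreted as angular frequency 2*pi*f. All names and sample values are illustrative.

```python
import cmath
import math


def response(h, tau, omega):
    """H(w) = sum_k h[k] * exp(-j * w * tau[k]) for one angular frequency."""
    return sum(hk * cmath.exp(-1j * omega * tk) for hk, tk in zip(h, tau))


def spectral_metrics(h, h_comp, tau, omegas, eps=1e-30):
    """Return (spec_rel_rms, spec_max_rel) with flat weighting."""
    rel = []
    for w in omegas:
        full = response(h, tau, w)
        comp = response(h_comp, tau, w)
        rel.append(abs(full - comp) / (abs(full) + eps))
    spec_rel_rms = math.sqrt(sum(r * r for r in rel) / len(rel))
    spec_max_rel = max(rel)
    return spec_rel_rms, spec_max_rel


# Illustrative decaying impulse response and a crude "compressed" version
# whose tail (last four taps) has been dropped.
tau = [k * 1e-3 for k in range(8)]
h = [0.5 ** (k + 1) for k in range(8)]
h_comp = h[:4] + [0.0] * 4
omegas = [2 * math.pi * f for f in (0.0, 50.0, 100.0, 200.0)]

rms, worst = spectral_metrics(h, h_comp, tau, omegas)
print("spec_rel_rms:", rms)
print("spec_max_rel:", worst)
```

A lowfreq or band weighting would replace the plain mean with a weighted mean over the same per-frequency values.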
Applying Profiler Output to Runtime
The profiler produces a runtime_plan block compatible with
tdse_backend_apply_plan().
Using the Recommendation JSON
- Run the profiler with --out-recommend-json plan.json
- Load the JSON in your build pipeline
- At runtime, apply the plan:
tdse_backend_apply_plan(model, plan_json, plan_len);
The plan JSON contains backend selection, thread count, and precision settings derived from the profiler's benchmarks.
Manual Application
If you prefer to apply individual settings:
tdse_backend_set(model, backend_id);
tdse_local_threads_set(model, thread_count);
tdse_compute_precision_set(model, precision);
See Backend and Performance and Runtime API Summary for the full backend and performance API.
CLI Smoke Test
Verify the profiler CLI is working:
tdse profiler irc-scan \
--np 2 --nh 16 \
--scan-prefix-counts 4,8 \
--scan-growth-values 2 \
--scan-input lowpass \
--out-md /dev/stdout
This should complete in under a second and print a Markdown table of scan results.
