Profiler flows for scans, trace replay, autotuning, and applying results to runtime setup.

Audience: Integration engineers and performance engineers who need to benchmark convolution performance, explore IRC compression quality, or generate runtime plans.

Use this chapter when you want to measure TDSE performance on your hardware, compare execution choices, or generate a runtime plan you can apply in code.

Use this chapter to answer "how should I measure?" Then use Backend and Performance to answer "how should I interpret those results, choose a backend, and decide what evidence is strong enough for PoC or deployment?"

This split is intentional: Telemetry covers continuous runtime observability, while this chapter covers short, explicit measurement passes and runtime-plan generation.

What the Profiler Does

The TDSE Profiler is a CLI module (tdse profiler) that:

benchmarks convolution performance on your hardware
derives a recommended execution plan (backend selection, thread count, precision)
scans IRC compression parameters for the best speed/accuracy trade-off
produces a runtime_plan that can be passed directly to tdse_backend_apply_plan()

The profiler does not modify your pack. It measures and recommends; Runtime stays in charge of execution.

Quick Start

Minimal IRC scan with a synthetic lowpass input:

tdse profiler irc-scan \
  --np 8 --nh 257 \
  --scan-prefix-counts 16,32,64 \
  --scan-growth-values 2,4 \
  --scan-input lowpass \
  --scan-modes both

This scans all combinations of prefix counts [16, 32, 64] and growth values [2, 4] for both V1 and V2 IRC modes (3 × 2 × 2 = 12 candidates), using a built-in lowpass filter as the reference impulse response.
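
To make the size of that scan concrete, the following sketch enumerates the grid in Python. It assumes the profiler evaluates the full Cartesian product of modes, prefix counts, and growth values; the variable names are illustrative and not part of the TDSE tooling.

```python
from itertools import product

prefix_counts = [16, 32, 64]   # --scan-prefix-counts 16,32,64
growth_values = [2, 4]         # --scan-growth-values 2,4
modes = ["v1", "v2"]           # --scan-modes both

# Every (mode, prefix, growth) triple, assuming a full Cartesian product.
candidates = list(product(modes, prefix_counts, growth_values))

for mode, prefix, growth in candidates:
    print(f"mode={mode} prefix={prefix} growth={growth}")

print(len(candidates), "candidates")
```

Adding one more prefix count or growth value multiplies the scan size, so it can help to estimate the candidate count this way before launching a long trace-replay scan.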

IRC Scan

What IRC Compression Does

Impulse Response Compression (IRC) replaces a long impulse-response tail with a short prefix plus an exponential growth approximation. This reduces the per-step convolution cost from O(P^2 · nh) to O(P^2 · prefix_length), where P is the port count (`np`), at the expense of a controllable accuracy loss.
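
As a rough guide, the cost model above implies an upper bound on the achievable speedup. The sketch below is only that arithmetic: the P^2 factor appears in both costs and cancels, leaving nh / prefix_length. It assumes the tail approximation costs nothing, so measured `speedup` values should be expected to come in below this bound. The function name is hypothetical.

```python
def ideal_speedup(nh: int, prefix_length: int) -> float:
    """Upper bound on speedup implied by the cost model alone.

    O(P^2 * nh) / O(P^2 * prefix_length) reduces to nh / prefix_length.
    The cost of evaluating the tail approximation is ignored (assumption).
    """
    if nh <= 0 or prefix_length <= 0:
        raise ValueError("nh and prefix_length must be positive")
    return nh / prefix_length


for prefix in (16, 32, 64):
    print(f"nh=257 prefix={prefix}: at most {ideal_speedup(257, prefix):.1f}x")
```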

When to use IRC:

nh is large (hundreds or thousands of taps)
the impulse response has a smooth exponential tail
per-step latency matters more than absolute spectral accuracy

When not to use IRC:

nh is small (under 50 taps)
the impulse response has sharp features in the tail
the application requires bit-exact convolution

Scan Parameters

Parameter	Meaning	Typical Values
`--np`	port count for the synthetic model	your model's `np`
`--nh`	history depth for the synthetic model	your model's `nh`
`--scan-prefix-counts`	comma-separated prefix lengths to test	`16,32,64,128`
`--scan-growth-values`	comma-separated growth factors to test	`2,4`
`--scan-input`	input source: `lowpass` or `trace`	`lowpass` for quick scan
`--scan-modes`	IRC mode(s): `v1`, `v2`, or `both`	`both`
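
When scans are launched from a script, it can be convenient to build the argument list from Python values instead of concatenating strings. The helper below is a minimal sketch that uses only the flags from the table above; the function name `irc_scan_argv` is hypothetical, and it assumes the `tdse` executable is on `PATH` when the command is eventually run.

```python
import shlex


def irc_scan_argv(np_ports, nh, prefix_counts, growth_values,
                  scan_input="lowpass", modes="both"):
    """Build the argument list for `tdse profiler irc-scan`."""
    return [
        "tdse", "profiler", "irc-scan",
        "--np", str(np_ports),
        "--nh", str(nh),
        "--scan-prefix-counts", ",".join(str(p) for p in prefix_counts),
        "--scan-growth-values", ",".join(str(g) for g in growth_values),
        "--scan-input", scan_input,
        "--scan-modes", modes,
    ]


argv = irc_scan_argv(8, 257, [16, 32, 64, 128], [2, 4])
print(shlex.join(argv))  # shell-quoted form, handy for logs
# To execute: subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems if paths with spaces are added later (for example with `--out-md`).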

IRC Modes

Mode	Description	Trade-off
V1	single-growth exponential tail	simpler, slightly less accurate
V2	piecewise growth with prefix	more accurate for complex tails

Output Formats

Flag	Output	Use When
`--out-md <path>`	Markdown summary report	human review
`--out-csv <path>`	CSV with per-combination metrics	spreadsheet analysis
`--out-recommend-json <path>`	machine-readable recommendation	feeding into build pipeline

Interpreting Results

The scan output includes per-combination metrics:

Metric	Meaning
`rel_rms`	relative RMS error of compressed vs full convolution
`peak_abs`	peak absolute error
`speedup`	speedup factor over uncompressed convolution
`spec_rel_rms`	spectral-domain relative RMS (if spectral metrics enabled)
`spec_max_rel`	spectral-domain maximum relative error
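
For intuition about the two time-domain metrics, here is a small sketch of how `rel_rms` and `peak_abs` could be computed from a full-convolution output and a compressed-convolution output. The normalization (RMS of the error divided by RMS of the reference) is an assumption based on the descriptions above; the profiler's exact definition, especially across multiple ports, may differ.

```python
import math


def rel_rms(full, compressed, eps=1e-30):
    """RMS of the error divided by RMS of the full-convolution output."""
    if len(full) != len(compressed) or not full:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(full)
    err = math.sqrt(sum((c - f) ** 2 for f, c in zip(full, compressed)) / n)
    ref = math.sqrt(sum(f ** 2 for f in full) / n)
    return err / (ref + eps)  # eps guards against an all-zero reference


def peak_abs(full, compressed):
    """Largest absolute sample-wise difference."""
    return max(abs(c - f) for f, c in zip(full, compressed))


# Made-up sample values, for illustration only.
full = [1.0, 0.5, 0.25, 0.125]
compressed = [1.0, 0.5, 0.26, 0.12]
print(f"rel_rms={rel_rms(full, compressed):.3e} "
      f"peak_abs={peak_abs(full, compressed):.3e}")
```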

A recommendation line appears if a feasible candidate exists:

RECOMMEND: mode=v2 prefix=64 growth=4 rel_rms=8.3e-4 speedup=3.2x

If no candidate meets all constraints:

NO_FEASIBLE: ... plus TOP1..TOP3 fallback rows

Trace Replay Scan

Instead of a synthetic lowpass, you can scan against a real simulation trace from your application. This produces more representative accuracy metrics.

CSV Input Format

Required columns: t, x0, x1, ..., x{np-1}

One sample per row
Time column must be strictly increasing
Non-uniform intervals are allowed
Missing or empty values are rejected

Example for np=3:

t,x0,x1,x2
0.000,1.0,0.0,0.5
0.001,0.95,0.05,0.48
0.002,0.90,0.10,0.46
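
Before handing a trace to the profiler, it can save time to check the rules above locally. The following sketch validates a headered CSV against them: required columns present, no empty values, and a strictly increasing time column. It is an illustrative pre-check, not the profiler's own parser, and the function name is hypothetical.

```python
import csv
import io


def validate_trace(stream, np_ports, time_col="t", port_prefix="x"):
    """Return the number of samples, or raise ValueError on the first violation."""
    reader = csv.DictReader(stream)
    required = [time_col] + [f"{port_prefix}{i}" for i in range(np_ports)]
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    prev_t = None
    count = 0
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        for col in required:
            if row[col] is None or row[col].strip() == "":
                raise ValueError(f"row {row_no}: empty value in {col!r}")
            float(row[col])  # raises ValueError on non-numeric text
        t = float(row[time_col])
        if prev_t is not None and t <= prev_t:
            raise ValueError(f"row {row_no}: time {t} is not strictly increasing")
        prev_t = t
        count += 1
    return count


sample = """t,x0,x1,x2
0.000,1.0,0.0,0.5
0.001,0.95,0.05,0.48
0.002,0.90,0.10,0.46
"""
print(validate_trace(io.StringIO(sample), np_ports=3), "samples OK")
```

For a file on disk, open it with `open(path, newline="")` and pass the file object in place of the `StringIO`.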

Trace Options

Option	Meaning	Default
`--scan-trace-csv <path>`	input CSV file (required with `--scan-input trace`)	-
`--scan-trace-time-col <name>`	time column name	`t`
`--scan-trace-port-prefix <prefix>`	port column prefix	`x` (matches `x0..x{np-1}`)
`--scan-trace-has-header 0|1`	CSV has header row	`1`
`--scan-trace-time-units s|ms|us|ns`	time unit conversion	`s`
`--scan-trace-use-all 0|1`	use full trace length	`1`
`--scan-trace-start-index <k>`	start from sample k	`0`
`--scan-trace-max-steps <n>`	cap number of steps	`0` (unbounded)

Example: Trace Replay with Autotune

tdse profiler irc-scan \
  --np 8 --nh 257 \
  --scan-prefix-counts 32,64,128 \
  --scan-growth-values 2,4 \
  --scan-input trace \
  --scan-trace-csv ./my_trace.csv \
  --scan-trace-time-col t \
  --scan-trace-port-prefix x \
  --scan-trace-has-header 1 \
  --scan-trace-time-units s \
  --scan-trace-use-all 1 \
  --scan-autotune 1 \
  --scan-target-rel-rms 1e-3 \
  --scan-objective speedup \
  --scan-prefer-mode either \
  --out-md ./irc_scan_report.md \
  --out-csv ./irc_scan_results.csv \
  --out-recommend-json ./irc_recommend.json

Autotune

Autotune automatically selects the best IRC parameters subject to accuracy constraints.

Options

Option	Meaning	Default
`--scan-autotune 0|1`	enable autotune	`0` (off)
`--scan-target-rel-rms <float>`	maximum acceptable `rel_rms`	`1e-3`
`--scan-target-peak-abs <float>`	maximum acceptable `peak_abs`	`0` (disabled)
`--scan-target-spec-rel-rms <float>`	maximum acceptable `spec_rel_rms`	`0` (disabled)
`--scan-target-spec-max-rel <float>`	maximum acceptable `spec_max_rel`	`0` (disabled)
`--scan-objective <type>`	optimization goal (see below)	`speedup`
`--scan-prefer-mode v1|v2|either`	preferred IRC mode	`either`

Objective Functions

Objective	Selects The Combination That...
`speedup`	maximizes speedup subject to accuracy constraints
`maxspeed`	maximizes raw speed regardless of nh reduction
`minnh`	minimizes effective nh subject to accuracy constraints
`minerror`	minimizes error subject to a minimum speedup threshold
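
The selection logic for the `speedup` objective can be pictured as a filter followed by a maximum. The sketch below is an assumption about how such a selection could work, not the profiler's implementation: a target of 0 disables that constraint (mirroring the option defaults), and a candidate is feasible when each enabled metric is at or below its target. The rows are made-up numbers and the function name is hypothetical.

```python
def pick_by_speedup(rows, target_rel_rms=1e-3, target_peak_abs=0.0):
    """Return the feasible row with the highest speedup, or None if none qualifies."""
    def feasible(r):
        if target_rel_rms > 0 and r["rel_rms"] > target_rel_rms:
            return False
        if target_peak_abs > 0 and r["peak_abs"] > target_peak_abs:
            return False
        return True

    ok = [r for r in rows if feasible(r)]
    return max(ok, key=lambda r: r["speedup"]) if ok else None


rows = [  # illustrative values only
    {"mode": "v1", "prefix": 32, "growth": 2, "rel_rms": 4.0e-3, "peak_abs": 2e-3, "speedup": 5.1},
    {"mode": "v2", "prefix": 64, "growth": 4, "rel_rms": 8.3e-4, "peak_abs": 6e-4, "speedup": 3.2},
    {"mode": "v2", "prefix": 128, "growth": 4, "rel_rms": 2.0e-4, "peak_abs": 1e-4, "speedup": 1.8},
]
best = pick_by_speedup(rows)
print(best)
```

Here the fastest row is rejected because its `rel_rms` exceeds the target, so the second row wins. Tightening the target far enough leaves no feasible row, which corresponds to the `NO_FEASIBLE` case.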

Output

When autotune succeeds:

RECOMMEND: mode=v2 prefix=64 growth=4 rel_rms=8.3e-4 speedup=3.2x

When no combination meets all constraints:

NO_FEASIBLE: target_rel_rms=1e-3, best_rel_rms=2.1e-3
TOP1: mode=v2 prefix=128 growth=4 rel_rms=2.1e-3 speedup=1.8x
TOP2: ...
TOP3: ...

Use the fallback rows to decide whether to relax constraints or accept the best available option.
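
If a script needs to react to these lines from captured console output, a small parser is enough. The sketch below assumes the line shapes shown in the examples above (`TAG: key=value ...`); the real output may carry additional fields, and for anything automated the file written by `--out-recommend-json` is likely the more robust source. The function name is hypothetical.

```python
import re

LINE_RE = re.compile(r"^(RECOMMEND|NO_FEASIBLE|TOP\d+):\s*(.*)$")


def parse_result_line(line):
    """Split a result line into (tag, {key: value}); return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    tag, rest = m.groups()
    fields = {}
    for token in re.split(r"[,\s]+", rest):  # fields are separated by spaces or commas
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return tag, fields


tag, fields = parse_result_line(
    "RECOMMEND: mode=v2 prefix=64 growth=4 rel_rms=8.3e-4 speedup=3.2x")
print(tag, fields["mode"], int(fields["prefix"]),
      float(fields["rel_rms"]), float(fields["speedup"].rstrip("x")))
```

Values are kept as strings so the caller decides how to convert them; note the trailing `x` on `speedup`.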

Spectral Metrics

Enable frequency-domain consistency checking during scan:

tdse profiler irc-scan \
  --np 8 --nh 257 \
  --scan-prefix-counts 32,64,128 \
  --scan-growth-values 2,4 \
  --scan-input lowpass \
  --scan-lowpass-alpha 0.95 \
  --scan-spectral 1 \
  --scan-spectral-nfreq 128 \
  --scan-spectral-fmin 0 \
  --scan-spectral-fmax 0 \
  --scan-spectral-weighting lowfreq

Spectral Options

Option	Meaning	Default
`--scan-spectral 0|1`	enable spectral metrics	`0`
`--scan-spectral-nfreq <int>`	number of frequency sample points	`128`
`--scan-spectral-fmin <Hz>`	lower frequency bound	`0`
`--scan-spectral-fmax <Hz>`	upper frequency bound; auto-derived from tau when the value is <= `fmin`	`0`
`--scan-spectral-weighting flat|lowfreq|band`	weighting for `spec_rel_rms`	`flat`
`--scan-spectral-band f1,f2`	active frequency band when weighting is `band`	-

Metric Definitions

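The spectral metrics compare H(w), the frequency response of the original impulse response, against H'(w), the frequency response of the compressed version. Here ||·||_F is the Frobenius norm taken at a single frequency, and eps is a small constant that keeps the ratio finite where H(w) vanishes: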

H(w) = sum_k h[k] * exp(-j*w*tau[k]) (NUDFT/DTFT on the configured grid)
rel_f = ||H(w) - H'(w)||_F / (||H(w)||_F + eps)
spec_rel_rms = sqrt(mean_w(rel_f^2))
spec_max_rel = max_w(rel_f)

The weighting parameter controls how spec_rel_rms aggregates across frequency:

flat: equal weight at all frequencies
lowfreq: heavier weight at low frequencies (typical for power systems)
band f1,f2: only frequencies in [f1, f2] contribute
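
The definitions above can be tried out on a toy example. The sketch below handles a single scalar impulse response, so the Frobenius norm reduces to an absolute value, and it assumes w = 2·pi·f for frequencies given in Hz. It implements only `flat` and `band` aggregation, because the exact shape of the `lowfreq` weights is not specified in this chapter. It also assumes `band` affects only `spec_rel_rms`, as the option table suggests. All function names are hypothetical.

```python
import cmath
import math


def freq_response(h, tau, w):
    """H(w) = sum_k h[k] * exp(-j*w*tau[k]) for one scalar response."""
    return sum(hk * cmath.exp(-1j * w * tk) for hk, tk in zip(h, tau))


def spectral_metrics(h, h_comp, tau, freqs_hz, band=None, eps=1e-30):
    """Return (spec_rel_rms, spec_max_rel); band=(f1, f2) restricts the RMS to that band."""
    rel = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f
        H = freq_response(h, tau, w)
        Hc = freq_response(h_comp, tau, w)
        rel.append((f, abs(H - Hc) / (abs(H) + eps)))
    used = [r for f, r in rel if band is None or band[0] <= f <= band[1]]
    if not used:
        raise ValueError("no frequency points fall inside the band")
    spec_rel_rms = math.sqrt(sum(r * r for r in used) / len(used))
    spec_max_rel = max(r for _, r in rel)
    return spec_rel_rms, spec_max_rel


tau = [k * 1e-3 for k in range(8)]             # uniform 1 ms tap times
h = [0.5 ** k for k in range(8)]               # decaying response
h_comp = h[:4] + [0.0] * 4                     # crude stand-in: tail truncated
freqs = [i * 500.0 / 15 for i in range(16)]    # 16 points from 0 to 500 Hz
print(spectral_metrics(h, h_comp, tau, freqs))
print(spectral_metrics(h, h_comp, tau, freqs, band=(0.0, 100.0)))
```

Because `tau` is passed explicitly, the same code works for non-uniform tap times, which is the NUDFT case mentioned above.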

Applying Profiler Output to Runtime

The profiler produces a runtime_plan block compatible with tdse_backend_apply_plan().

Using the Recommendation JSON

Run the profiler with --out-recommend-json plan.json
Load the JSON in your build pipeline
At runtime, apply the plan:

tdse_backend_apply_plan(model, plan_json, plan_len);

The plan JSON contains backend selection, thread count, and precision settings derived from the profiler's benchmarks.
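
For the "load the JSON in your build pipeline" step, a build script can at least confirm that the file is well-formed before it is packaged. The sketch below reads the file, fails fast on malformed JSON, and returns the raw bytes. It assumes the runtime call expects the UTF-8 bytes and their length; that is an assumption about `plan_json` and `plan_len`, not something this chapter states. The function name and the placeholder document are hypothetical; the real plan schema is not shown here.

```python
import json
import os
import tempfile


def load_plan_bytes(path):
    """Read a plan file, reject malformed JSON, and return the raw bytes."""
    with open(path, "rb") as fh:
        data = fh.read()
    json.loads(data.decode("utf-8"))  # raises ValueError on malformed input
    return data


# Demo with a placeholder document standing in for real profiler output.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "plan.json")
    with open(demo_path, "w", encoding="utf-8") as fh:
        fh.write('{"placeholder": true}')
    plan = load_plan_bytes(demo_path)
    print(len(plan), "bytes")
```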

Manual Application

If you prefer to apply individual settings:

tdse_backend_set(model, backend_id);
tdse_local_threads_set(model, thread_count);
tdse_compute_precision_set(model, precision);

See the Backend and Performance chapter and the Runtime API Summary for the full backend and performance API.

CLI Smoke Test

Verify the profiler CLI is working:

tdse profiler irc-scan \
  --np 2 --nh 16 \
  --scan-prefix-counts 4,8 \
  --scan-growth-values 2 \
  --scan-input lowpass \
  --out-md /dev/stdout

This should complete in under a second and print a Markdown table of scan results.
