
Profiler

Profiler flows for scans, trace replay, autotuning, and applying results to runtime setup.

Audience: Integration engineers and performance engineers who need to benchmark convolution performance, explore Impulse Response Compression (IRC) quality, or generate runtime plans.

Use this chapter when you want to measure TDSE performance on your hardware, compare execution choices, or generate a runtime plan you can apply in code.

Use this chapter to answer "how should I measure?" Then use Backend and Performance to answer "how should I interpret those results, choose a backend, and decide what evidence is strong enough for PoC or deployment?"

This split is intentional: Telemetry covers continuous runtime observability, while this chapter covers short, explicit measurement passes and runtime-plan generation.

What the Profiler Does

The TDSE Profiler is a CLI module (tdse profiler) that:

  1. benchmarks convolution performance on your hardware
  2. derives an optimal execution plan (backend selection, thread count, precision)
  3. scans IRC compression parameters for the best speed/accuracy trade-off
  4. produces a runtime_plan that can be passed directly to tdse_backend_apply_plan()

The profiler does not modify your pack. It measures and recommends; Runtime stays in charge of execution.

Quick Start

Minimal IRC scan with a synthetic lowpass input:

tdse profiler irc-scan \
  --np 8 --nh 257 \
  --scan-prefix-counts 16,32,64 \
  --scan-growth-values 2,4 \
  --scan-input lowpass \
  --scan-modes both

This scans every combination of prefix count (16, 32, 64), growth value (2, 4), and IRC mode (V1, V2), 12 candidates in total, using a built-in lowpass filter as the reference impulse response.
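
To see what the scan covers, the candidate grid can be enumerated by hand. The following Python sketch only mirrors the flag values from the command above; the order in which the profiler actually visits candidates is not documented here and is an assumption.

```python
from itertools import product

# Values copied from the Quick Start command line.
prefix_counts = [16, 32, 64]   # --scan-prefix-counts 16,32,64
growth_values = [2, 4]         # --scan-growth-values 2,4
modes = ["v1", "v2"]           # --scan-modes both

# Every (mode, prefix, growth) triple is one scan candidate.
grid = list(product(modes, prefix_counts, growth_values))

for mode, prefix, growth in grid:
    print(f"mode={mode} prefix={prefix} growth={growth}")

print(len(grid), "candidates")
```

Adding one more prefix count or growth value multiplies the grid size, which is worth keeping in mind when a scan runs against a long trace.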

IRC Scan

What IRC Compression Does

Impulse Response Compression (IRC) replaces a long impulse-response tail with a short prefix plus an exponential growth approximation. This reduces the per-step convolution cost from O(P^2 · nh) to O(P^2 · prefix_length), where P is the port count (np) and nh is the history depth, at the expense of a controllable accuracy loss.
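
As a rough worked example, the cost expressions above can be evaluated for the Quick Start dimensions. This sketch assumes P is the port count (np), counts only multiply-accumulates, and ignores the cost of evaluating the compressed tail, so it gives an upper bound on speedup rather than a prediction. The helper name is illustrative.

```python
def conv_macs_per_step(ports: int, taps: int) -> int:
    """Multiply-accumulates per time step for a dense ports x ports convolution."""
    return ports * ports * taps


ports, nh, prefix_length = 8, 257, 64

full = conv_macs_per_step(ports, nh)                    # 8 * 8 * 257
compressed = conv_macs_per_step(ports, prefix_length)   # 8 * 8 * 64

# Ratio of the two cost terms; the tail approximation is not counted here.
ideal_speedup = full / compressed

print(f"full={full} compressed={compressed} ideal_speedup={ideal_speedup:.2f}")
```

Measured speedups from a scan will usually fall below this ratio, because the tail approximation and memory traffic also cost time.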

When to use IRC:

  • nh is large (hundreds or thousands of taps)
  • the impulse response has a smooth exponential tail
  • per-step latency matters more than absolute spectral accuracy

When not to use IRC:

  • nh is small (under 50 taps)
  • the impulse response has sharp features in the tail
  • the application requires bit-exact convolution

Scan Parameters

Parameter             Meaning                                 Typical Values
--np                  port count for the synthetic model      your model's np
--nh                  history depth for the synthetic model   your model's nh
--scan-prefix-counts  comma-separated prefix lengths to test  16,32,64,128
--scan-growth-values  comma-separated growth factors to test  2,4
--scan-input          input source: lowpass or trace          lowpass for quick scan
--scan-modes          IRC mode(s): v1, v2, or both            both

IRC Modes

Mode  Description                     Trade-off
V1    single-growth exponential tail  simpler, slightly less accurate
V2    piecewise growth with prefix    more accurate for complex tails

Output Formats

Flag                         Output                            Use When
--out-md <path>              Markdown summary report           human review
--out-csv <path>             CSV with per-combination metrics  spreadsheet analysis
--out-recommend-json <path>  machine-readable recommendation   feeding into a build pipeline

Interpreting Results

The scan output includes per-combination metrics:

Metric        Meaning
rel_rms       relative RMS error of compressed vs full convolution
peak_abs      peak absolute error
speedup       speedup factor over uncompressed convolution
spec_rel_rms  spectral-domain relative RMS (if spectral metrics enabled)
spec_max_rel  spectral-domain maximum relative error
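
For intuition, the two time-domain metrics can be reproduced on a pair of sample sequences. The definitions below (RMS of the error divided by RMS of the reference, and the largest absolute sample difference) are the conventional ones and are assumed here; the profiler's exact normalization may differ. The sample values are illustrative only.

```python
import math


def rel_rms(reference, approx):
    """RMS of the error divided by RMS of the reference (assumed definition)."""
    n = len(reference)
    err = math.sqrt(sum((a - r) ** 2 for r, a in zip(reference, approx)) / n)
    ref = math.sqrt(sum(r ** 2 for r in reference) / n)
    if ref == 0.0:
        raise ValueError("reference signal is all zeros")
    return err / ref


def peak_abs(reference, approx):
    """Largest absolute sample-by-sample difference."""
    return max(abs(a - r) for r, a in zip(reference, approx))


# Illustrative outputs of a full and a compressed convolution.
full_output = [1.0, 0.5, 0.25, 0.125]
compressed_output = [1.0, 0.5, 0.26, 0.12]

print("rel_rms :", rel_rms(full_output, compressed_output))
print("peak_abs:", peak_abs(full_output, compressed_output))
```

rel_rms is scale-free, while peak_abs is in the units of the signal, which is why the two targets are configured separately.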

A recommendation line appears if a feasible candidate exists:

RECOMMEND: mode=v2 prefix=64 growth=4 rel_rms=8.3e-4 speedup=3.2x

If no candidate meets all constraints:

NO_FEASIBLE: ...

The NO_FEASIBLE line is followed by TOP1..TOP3 fallback rows.

Trace Replay Scan

Instead of a synthetic lowpass, you can scan against a real simulation trace from your application. This produces more representative accuracy metrics.

CSV Input Format

Required columns: t, x0, x1, ..., x{np-1}

  • One sample per row
  • Time column must be strictly increasing
  • Non-uniform intervals are allowed
  • Missing or empty values are rejected

Example for np=3:

t,x0,x1,x2
0.000,1.0,0.0,0.5
0.001,0.95,0.05,0.48
0.002,0.90,0.10,0.46

Trace options:

Option                              Meaning                                            Default
--scan-trace-csv <path>             input CSV file (required with --scan-input trace)  -
--scan-trace-time-col <name>        time column name                                   t
--scan-trace-port-prefix <prefix>   port column prefix                                 x (matches x0..x{np-1})
--scan-trace-has-header 0|1         CSV has header row                                 1
--scan-trace-time-units s|ms|us|ns  time unit conversion                               s
--scan-trace-use-all 0|1            use full trace length                              1
--scan-trace-start-index <k>        start from sample k                                0
--scan-trace-max-steps <n>          cap number of steps                                0 (unbounded)
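
The CSV rules above can be checked before handing a trace to the profiler. The following Python sketch is a hypothetical pre-flight validator, not part of TDSE; the function name and its parameters are illustrative and simply mirror the default column names.

```python
import csv
import io
import math


def validate_trace(text, np_ports, time_col="t", port_prefix="x"):
    """Return parsed rows as lists of floats, or raise ValueError on a rule violation."""
    reader = csv.DictReader(io.StringIO(text))
    required = [time_col] + [f"{port_prefix}{i}" for i in range(np_ports)]
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows, prev_t = [], -math.inf
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        values = []
        for col in required:
            cell = (row[col] or "").strip()
            if not cell:
                raise ValueError(f"line {line_no}: empty value in column {col}")
            values.append(float(cell))  # non-numeric text also raises ValueError
        if values[0] <= prev_t:
            raise ValueError(f"line {line_no}: time is not strictly increasing")
        prev_t = values[0]
        rows.append(values)
    return rows


SAMPLE = """t,x0,x1,x2
0.000,1.0,0.0,0.5
0.001,0.95,0.05,0.48
0.002,0.90,0.10,0.46
"""

rows = validate_trace(SAMPLE, np_ports=3)
print(len(rows), "samples OK")
```

The check deliberately does not require uniform spacing, since non-uniform intervals are allowed.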

Example: Trace Replay with Autotune

tdse profiler irc-scan \
  --np 8 --nh 257 \
  --scan-prefix-counts 32,64,128 \
  --scan-growth-values 2,4 \
  --scan-input trace \
  --scan-trace-csv ./my_trace.csv \
  --scan-trace-time-col t \
  --scan-trace-port-prefix x \
  --scan-trace-has-header 1 \
  --scan-trace-time-units s \
  --scan-trace-use-all 1 \
  --scan-autotune 1 \
  --scan-target-rel-rms 1e-3 \
  --scan-objective speedup \
  --scan-prefer-mode either \
  --out-md ./irc_scan_report.md \
  --out-csv ./irc_scan_results.csv \
  --out-recommend-json ./irc_recommend.json

Autotune

Autotune automatically selects the best IRC parameters subject to accuracy constraints.

Options

Option                              Meaning                          Default
--scan-autotune 0|1                 enable autotune                  0 (off)
--scan-target-rel-rms <float>       maximum acceptable rel_rms       1e-3
--scan-target-peak-abs <float>      maximum acceptable peak_abs      0 (disabled)
--scan-target-spec-rel-rms <float>  maximum acceptable spec_rel_rms  0 (disabled)
--scan-target-spec-max-rel <float>  maximum acceptable spec_max_rel  0 (disabled)
--scan-objective <type>             optimization goal (see below)    speedup
--scan-prefer-mode v1|v2|either     preferred IRC mode               either

Objective Functions

Objective  Selects The Combination That...
speedup    maximizes speedup subject to accuracy constraints
maxspeed   maximizes raw speed regardless of nh reduction
minnh      minimizes effective nh subject to accuracy constraints
minerror   minimizes error subject to a minimum speedup threshold
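
The speedup objective can be pictured as a filter followed by a maximum. The sketch below is an assumption about the selection logic, not the profiler's implementation: the candidate rows are made-up values, only the rel_rms constraint is applied, and ranking fallback rows by rel_rms is a guess.

```python
# Illustrative scan rows; these are not measured results.
CANDIDATES = [
    {"mode": "v1", "prefix": 32, "growth": 2, "rel_rms": 4.0e-3, "speedup": 5.0},
    {"mode": "v2", "prefix": 64, "growth": 4, "rel_rms": 8.0e-4, "speedup": 3.0},
    {"mode": "v2", "prefix": 128, "growth": 4, "rel_rms": 2.0e-4, "speedup": 1.8},
]


def pick_by_speedup(candidates, target_rel_rms):
    """Return (best, fallback): the fastest feasible row, or up to three fallback rows."""
    feasible = [c for c in candidates if c["rel_rms"] <= target_rel_rms]
    if feasible:
        return max(feasible, key=lambda c: c["speedup"]), []
    # Nothing meets the target: report the rows closest to it (assumed ranking).
    fallback = sorted(candidates, key=lambda c: c["rel_rms"])[:3]
    return None, fallback


best, _ = pick_by_speedup(CANDIDATES, target_rel_rms=1e-3)
print("feasible pick:", best)

none_found, top = pick_by_speedup(CANDIDATES, target_rel_rms=1e-5)
print("no feasible pick, closest rows:", [c["prefix"] for c in top])
```

Tightening the target shrinks the feasible set first and only then changes the winner, which is why a small relaxation can produce a large jump in speedup.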

Output

When autotune succeeds:

RECOMMEND: mode=v2 prefix=64 growth=4 rel_rms=8.3e-4 speedup=3.2x

When no combination meets all constraints:

NO_FEASIBLE: target_rel_rms=1e-3, best_rel_rms=2.1e-3
TOP1: mode=v2 prefix=128 growth=4 rel_rms=2.1e-3 speedup=1.8x
TOP2: ...
TOP3: ...

Use the fallback rows to decide whether to relax constraints or accept the best available option.
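
If a script needs to react to these lines, they can be split into key/value pairs. This sketch assumes the line layout matches the examples shown above exactly; the function names are illustrative, and the --out-recommend-json file is likely the more robust source for automation.

```python
import re


def _convert(key, value):
    """Turn '64' into int, '8.3e-4' into float, and leave 'v2' as a string."""
    if key == "speedup":
        value = value.rstrip("x")  # '3.2x' -> '3.2'
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_result_line(line):
    """Split 'TAG: key=value key=value ...' into (tag, dict of fields)."""
    tag, _, rest = line.partition(":")
    pairs = re.findall(r"(\w+)=([^\s,]+)", rest)
    return tag.strip(), {key: _convert(key, value) for key, value in pairs}


tag, fields = parse_result_line(
    "RECOMMEND: mode=v2 prefix=64 growth=4 rel_rms=8.3e-4 speedup=3.2x"
)
print(tag, fields)

tag2, fields2 = parse_result_line(
    "NO_FEASIBLE: target_rel_rms=1e-3, best_rel_rms=2.1e-3"
)
print(tag2, fields2)
```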

Spectral Metrics

Enable frequency-domain consistency checking during scan:

tdse profiler irc-scan \
  --np 8 --nh 257 \
  --scan-prefix-counts 32,64,128 \
  --scan-growth-values 2,4 \
  --scan-input lowpass \
  --scan-lowpass-alpha 0.95 \
  --scan-spectral 1 \
  --scan-spectral-nfreq 128 \
  --scan-spectral-fmin 0 \
  --scan-spectral-fmax 0 \
  --scan-spectral-weighting lowfreq

Spectral Options

Option                                       Meaning                                                   Default
--scan-spectral 0|1                          enable spectral metrics                                   0
--scan-spectral-nfreq <int>                  number of frequency sample points                         128
--scan-spectral-fmin <Hz>                    lower frequency bound                                     0
--scan-spectral-fmax <Hz>                    upper frequency bound (auto-derives from tau if <=fmin)   0
--scan-spectral-weighting flat|lowfreq|band  weighting for spec_rel_rms                                flat
--scan-spectral-band f1,f2                   active frequency band when weighting is band              -

Metric Definitions

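The spectral metrics compare the frequency response of the original impulse response, H(w), against that of the compressed version, H'(w). Here ||·||_F denotes the Frobenius norm and eps is a small constant that keeps the ratio finite where H(w) vanishes: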

  • H(w) = sum_k h[k] * exp(-j*w*tau[k]) (NUDFT/DTFT on the configured grid)
  • rel_f = ||H(w) - H'(w)||_F / (||H(w)||_F + eps)
  • spec_rel_rms = sqrt(mean_w(rel_f^2))
  • spec_max_rel = max_w(rel_f)

The --scan-spectral-weighting option controls how spec_rel_rms aggregates across frequency:

  • flat: equal weight at all frequencies
  • lowfreq: heavier weight at low frequencies (typical for power systems)
  • band: only frequencies in [f1, f2] contribute, with the band set by --scan-spectral-band f1,f2
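
The definitions above can be traced through for a single-port (scalar) response, where the Frobenius norm reduces to a complex magnitude. This is a minimal sketch with flat weighting only; the function names, the example response, and the crude truncation standing in for compression are all assumptions, not TDSE behavior.

```python
import cmath
import math


def freq_response(h, tau, w):
    """H(w) = sum_k h[k] * exp(-j*w*tau[k]) for a scalar impulse response."""
    return sum(hk * cmath.exp(-1j * w * tk) for hk, tk in zip(h, tau))


def spectral_metrics(h, h_comp, tau, omegas, eps=1e-12):
    """Return (spec_rel_rms, spec_max_rel) with equal weight at every frequency."""
    rel = []
    for w in omegas:
        full = freq_response(h, tau, w)
        comp = freq_response(h_comp, tau, w)
        rel.append(abs(full - comp) / (abs(full) + eps))
    spec_rel_rms = math.sqrt(sum(r * r for r in rel) / len(rel))
    return spec_rel_rms, max(rel)


# Illustrative decaying response sampled every millisecond.
dt = 1e-3
h = [0.05 * 0.95 ** k for k in range(64)]
tau = [k * dt for k in range(64)]

# Stand-in for a compressed response: keep 32 taps, zero the tail.
h_trunc = h[:32] + [0.0] * 32

# Angular frequencies for 0, 50, ..., 450 Hz.
omegas = [2 * math.pi * f for f in range(0, 500, 50)]

rms, worst = spectral_metrics(h, h_trunc, tau, omegas)
print("spec_rel_rms:", rms)
print("spec_max_rel:", worst)
```

Because spec_max_rel is the maximum of the same per-frequency values that spec_rel_rms averages, it is never smaller than spec_rel_rms.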

Applying Profiler Output to Runtime

The profiler produces a runtime_plan block compatible with tdse_backend_apply_plan().

Using the Recommendation JSON

  1. Run the profiler with --out-recommend-json plan.json
  2. Load the JSON in your build pipeline
  3. At runtime, apply the plan:

tdse_backend_apply_plan(model, plan_json, plan_len);

The plan JSON contains backend selection, thread count, and precision settings derived from the profiler's benchmarks.

Manual Application

If you prefer to apply individual settings:

tdse_backend_set(model, backend_id);
tdse_local_threads_set(model, thread_count);
tdse_compute_precision_set(model, precision);

See the Backend and Performance chapter and the Runtime API Summary for the full backend and performance API.

CLI Smoke Test

Verify the profiler CLI is working:

tdse profiler irc-scan \
  --np 2 --nh 16 \
  --scan-prefix-counts 4,8 \
  --scan-growth-values 2 \
  --scan-input lowpass \
  --out-md /dev/stdout

This should complete in under a second and print a Markdown table of scan results.