Multi-Model Deployment Patterns
Patterns for deploying multiple TDSE models inside one host application.
Use this section when one model is no longer enough and you need to choose a deployment pattern: many independent models, one reused model for repeated sweeps, parallel copies for throughput, or separate models running at different rates.
Related Chapters
For one-handle ownership and shutdown rules, see Concurrency and Shutdown. For threading and memory scaling details, see Threading and Scaling. For variable time-stepping in multi-rate scenarios, see Variable Time-Step Integration.
Use these patterns when a deployment runs different TDSE models for different subsystems, or multiple copies of the same model for parameter sweeps.
Choose A Deployment Pattern
| Situation | Recommended Pattern | Why |
|---|---|---|
| independent subsystems that can run side by side | N parallel independent models | simplest ownership story and highest operational clarity |
| same pack, repeated sweeps, memory is tight | batch sweep with one reused model | lowest memory footprint |
| same pack, repeated sweeps, throughput matters most | batch sweep with parallel copies | easiest way to scale wall-clock throughput |
| subsystems need different dt values | multi-rate coupling | keeps each model at the rate it actually needs |
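The table can also be read as a small decision procedure. The following is a minimal sketch of that reading, not part of the TDSE API: the enum, the struct, and `choose_pattern` are hypothetical names, and the order in which the conditions are checked is an assumption (the table itself does not rank them).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the four patterns in the table. */
typedef enum {
    PATTERN_PARALLEL_INDEPENDENT,
    PATTERN_SWEEP_ONE_REUSED_MODEL,
    PATTERN_SWEEP_PARALLEL_COPIES,
    PATTERN_MULTI_RATE
} deploy_pattern_t;

/* Hypothetical description of a deployment; not a TDSE type. */
typedef struct {
    bool same_pack_repeated_sweeps;    /* same pack, only inputs change */
    bool memory_is_tight;              /* one model per sweep value does not fit */
    bool subsystems_need_different_dt; /* subsystems run at different rates */
} deploy_traits_t;

/* Map deployment traits to a pattern. The check order is an assumption. */
deploy_pattern_t choose_pattern(const deploy_traits_t* t) {
    assert(t != NULL);
    if (t->subsystems_need_different_dt) {
        return PATTERN_MULTI_RATE;
    }
    if (t->same_pack_repeated_sweeps) {
        return t->memory_is_tight ? PATTERN_SWEEP_ONE_REUSED_MODEL
                                  : PATTERN_SWEEP_PARALLEL_COPIES;
    }
    return PATTERN_PARALLEL_INDEPENDENT;
}
```

Real deployments often mix patterns (for example, a multi-rate pair inside each of several independent subsystems), so treat the result as a starting point rather than a rule.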
N Parallel Independent Models
This is the default production pattern. Each model has its own handle, its own state, and a clear owner.
#define N_MODELS 8
tdse_model_t* models[N_MODELS];
for (int i = 0; i < N_MODELS; i++) {
    tdse_model_create_diagnostics_t diag = tdse_model_create_diagnostics_init();
    tdse_model_create(packs[i], pack_sizes[i], &diag, &models[i]);
}
/* Each worker thread owns one model */
#pragma omp parallel for
for (int i = 0; i < N_MODELS; i++) {
    for (uint64_t n = 0; n < nsteps; ++n) {
        tdse_step_begin(models[i], t[n], dt);
        tdse_step_op(models[i], &op);
        tdse_step_hr(models[i], hr);
        tdse_step_commit(models[i], primary[i]);
    }
}
Rules:
- Apply the one-handle rule from Concurrency and Shutdown: one worker owns one live handle at a time.
- Each handle has its own history buffer and state machine.
- Models may use different packs, different dt, or different backends.
Per-Model Resource Budget
Each model handle consumes its own resources. When planning a deployment:
| Resource | Per Model | Shared |
|---|---|---|
| History ring buffer | yes (nh * nq * sizeof(double)) | no |
| Operator workspace | yes (nq * np * sizeof(double)) | no |
| GPU stream/context | yes (when using CUDA backend) | GPU device memory is shared |
| CPU thread pool | configurable via tdse_local_threads_set | OS thread pool |
| Backend selection | per-model via tdse_backend_set | Backend registry |
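The two per-model formulas in the table are enough for a first host-memory estimate. The sketch below uses the symbols nh, nq, and np exactly as they appear in the table; the function names are hypothetical, and any allocation a handle makes beyond the history buffer and operator workspace is not counted, so the result is a lower bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host bytes for one model, from the two per-model rows of the table:
 *   history ring buffer = nh * nq * sizeof(double)
 *   operator workspace  = nq * np * sizeof(double)
 * Other per-handle allocations are not included. */
uint64_t per_model_bytes(uint64_t nh, uint64_t nq, uint64_t np) {
    uint64_t history = nh * nq * sizeof(double);
    uint64_t workspace = nq * np * sizeof(double);
    return history + workspace;
}

/* Total = N * per-model footprint + shared overhead (caller-supplied). */
uint64_t deployment_bytes(uint64_t n_models, uint64_t nh, uint64_t nq,
                          uint64_t np, uint64_t shared_overhead_bytes) {
    assert(n_models > 0);
    return n_models * per_model_bytes(nh, nq, np) + shared_overhead_bytes;
}
```

For example, with nh = 64, nq = 1000, and np = 200, one model needs 512,000 bytes of history plus 1,600,000 bytes of workspace, and eight such models need about 16.9 MB before shared overhead.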
Use per-model resource controls only after ownership is already stable:
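/* Assign CPU threads per model; never request fewer than one thread */
int threads_per_model = physical_cores / N_MODELS;
if (threads_per_model < 1) {
    threads_per_model = 1; /* more models than cores: cores are oversubscribed */
}
for (int i = 0; i < N_MODELS; i++) {
    tdse_local_threads_set(models[i], threads_per_model);
}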
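Integer division leaves cores idle whenever the core count is not a multiple of the model count. One possible refinement, sketched here with a hypothetical helper `threads_for_model`, hands the leftover cores to the first few models. Whether uneven thread counts are desirable depends on how balanced the models are, which this sketch does not try to judge.

```c
#include <assert.h>

/* Threads for model `index` when `cores` physical cores are split across
 * `n_models` handles. The first (cores % n_models) models get one extra
 * thread. Every model gets at least one thread, so cores are
 * oversubscribed when n_models > cores. */
int threads_for_model(int cores, int n_models, int index) {
    assert(cores > 0 && n_models > 0);
    assert(index >= 0 && index < n_models);
    int base = cores / n_models;
    int extra = (index < cores % n_models) ? 1 : 0;
    int threads = base + extra;
    return threads > 0 ? threads : 1;
}
```

The returned value would be passed as the second argument of tdse_local_threads_set for the corresponding model.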
GPU Sharing Across Models
Multiple models can share the same GPU. Each gets its own CUDA stream, but device memory is shared.
Guidelines:
- Estimate total GPU memory as sum(per_model_gpu_footprint) + overhead (~50-100 MB).
- Monitor with nvidia-smi during initial deployment testing.
- If GPU allocation fails for any model, tdse_model_create returns TDSE_ERR_OUT_OF_MEMORY.
- Prefer the async pipeline mode for concurrent GPU models:
for (int i = 0; i < N_MODELS; i++) {
    tdse_cuda_backend_config_t cuda_cfg;
    tdse_cuda_backend_get_config(models[i], &cuda_cfg);
    cuda_cfg.pipeline_mode = TDSE_CUDA_PIPELINE_ASYNC;
    tdse_cuda_backend_set_config(models[i], &cuda_cfg);
}
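The memory guideline above, sum(per_model_gpu_footprint) + overhead, can be checked before any model is created. This is a rough sketch with hypothetical helper names. It assumes the per-model footprints have already been measured (for example, from nvidia-smi readings during testing) and uses the upper end of the ~50-100 MB overhead range as a conservative default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIB (1024ull * 1024ull)

/* sum(per_model_gpu_footprint) + overhead, all in bytes. */
uint64_t gpu_bytes_needed(const uint64_t* footprints, size_t n_models,
                          uint64_t overhead_bytes) {
    assert(footprints != NULL || n_models == 0);
    uint64_t total = overhead_bytes;
    for (size_t i = 0; i < n_models; i++) {
        total += footprints[i];
    }
    return total;
}

/* True when all models plus 100 MiB of overhead fit in the free memory. */
bool gpu_budget_fits(const uint64_t* footprints, size_t n_models,
                     uint64_t device_free_bytes) {
    return gpu_bytes_needed(footprints, n_models, 100 * MIB) <= device_free_bytes;
}
```

A failed check here is only an estimate. The authoritative signal remains TDSE_ERR_OUT_OF_MEMORY from tdse_model_create.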
Batch Sweep Pattern
Use this pattern when the mathematical model stays the same but inputs, operating points, or sweep values change.
The two options below reuse the same pack structure with different inputs:
tdse_model_create_diagnostics_t diag = tdse_model_create_diagnostics_init();
tdse_model_t* base_model;
tdse_model_create(pack, pack_size, &diag, &base_model);
/* Option A: Sequential reuse with reset between sweeps */
for (int sweep = 0; sweep < N_SWEEPS; sweep++) {
    for (uint64_t n = 0; n < nsteps; ++n) {
        tdse_step_begin(base_model, t[n], dt);
        /* ... solve with sweep-specific primary vectors ... */
        tdse_step_commit(base_model, primary_sweep[sweep]);
    }
    tdse_model_reset(base_model); /* clear committed history for next sweep */
}
/* Option B: Parallel sweep with one model per sweep value */
tdse_model_t* sweep_models[N_SWEEPS];
for (int s = 0; s < N_SWEEPS; s++) {
    tdse_model_create(pack, pack_size, &diag, &sweep_models[s]);
}
Read the tradeoff plainly:
- Option A is memory-efficient and simpler to operate.
- Option B is throughput-efficient and easier to spread across workers or devices.
- If repeated sweeps are frequent but not latency-sensitive, start with Option A.
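The tradeoff between the two options is mostly a memory question: Option A holds one model, Option B holds one per sweep value. The sketch below, using a hypothetical helper `parallel_sweep_copies`, estimates how many copies fit in a given budget. The per-model size is assumed to be known from measurement or from the resource table.

```c
#include <assert.h>
#include <stdint.h>

/* Number of sweep models that can exist at once within a memory budget.
 * Returns 0 if not even one model fits, 1 for pure sequential reuse
 * (Option A), and up to n_sweeps for one model per sweep value (Option B). */
uint64_t parallel_sweep_copies(uint64_t n_sweeps, uint64_t model_bytes,
                               uint64_t budget_bytes) {
    assert(n_sweeps > 0 && model_bytes > 0);
    uint64_t fit = budget_bytes / model_bytes;
    return fit < n_sweeps ? fit : n_sweeps;
}
```

A result strictly between 1 and the sweep count suggests a mix of the two options: a few parallel copies, each reused sequentially with tdse_model_reset between its sweeps.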
Multi-Rate Coupling
When different subsystems require different time resolutions, use separate models with different model_dt values and synchronize at coupling boundaries:
tdse_model_t* fast; /* model_dt = 1 ns */
tdse_model_t* slow; /* model_dt = 10 ns */
for (uint64_t step = 0; step < TOTAL_STEPS; step++) {
    /* Derive times from the integer step count to avoid accumulated rounding drift */
    double t_fast = (double)step * 1e-9;
    tdse_step_begin(fast, t_fast, 1e-9);
    /* ... step fast model ... */
    tdse_step_commit(fast, primary_fast);
    if (step % 10 == 0) {
        /* Compute the slow time before it is used, so each slow step starts at its own boundary */
        double t_slow = (double)(step / 10) * 10e-9;
        /* Extract coupling variables from fast model */
        /* ... */
        tdse_step_begin(slow, t_slow, 10e-9);
        /* ... step slow model with coupled inputs ... */
        tdse_step_commit(slow, primary_slow);
    }
}
See Variable Time-Step Integration for more details on multi-rate patterns.
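The timing arithmetic in the loop above can be isolated and checked without any model handles. The following sketch assumes the slow step is an integer multiple of the fast step. The struct and function names are hypothetical, and both times are derived from the integer step index so that they agree exactly at coupling boundaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-rate schedule; not a TDSE type. */
typedef struct {
    double fast_dt; /* fast model step, e.g. 1e-9 */
    uint64_t ratio; /* slow_dt = ratio * fast_dt, e.g. 10 */
} multirate_t;

/* Start time of fast step `step`. */
double fast_time(const multirate_t* m, uint64_t step) {
    assert(m != NULL);
    return (double)step * m->fast_dt;
}

/* True on fast steps where the slow model also advances. */
bool slow_step_due(const multirate_t* m, uint64_t step) {
    assert(m != NULL && m->ratio > 0);
    return step % m->ratio == 0;
}

/* Start time of the slow step that covers fast step `step`. Rounding the
 * step index down to a multiple of `ratio` first makes this bit-identical
 * to fast_time() at every coupling boundary. */
double slow_time(const multirate_t* m, uint64_t step) {
    assert(m != NULL && m->ratio > 0);
    return (double)((step / m->ratio) * m->ratio) * m->fast_dt;
}
```

With a ratio of 10, the slow model is due on steps 0, 10, 20, and so on, and slow_time for any step in 20 through 29 equals fast_time for step 20.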
Deployment Checklist
- Estimate total memory: N * per-model footprint + shared overhead
- Assign thread resources: divide local_threads across models
- Select backend per model: CPU for small models, GPU for large ones
- Verify GPU memory budget if using CUDA backend
- Choose sweep strategy: sequential with reset vs. parallel copies
- Test scaling: run with 1, 2, 4, 8 models and measure throughput per model
- Monitor guard metrics on each model independently
- Keep shutdown ownership clear: destroy each model on its owning thread or wrapper path
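For the scaling test in the checklist, a simple way to compare the 1, 2, 4, and 8 model runs is to normalize each run to throughput per model and divide by the single-model figure. The helpers below are a hypothetical sketch of that arithmetic. Timing and step counts are assumed to come from the host application's own measurements.

```c
#include <assert.h>

/* Steps per second achieved by each model when n_models run side by side.
 * total_steps is the sum of committed steps over all models in the run. */
double throughput_per_model(double total_steps, double wall_seconds,
                            int n_models) {
    assert(wall_seconds > 0.0 && n_models > 0);
    return total_steps / wall_seconds / (double)n_models;
}

/* 1.0 means each model runs as fast as it does alone; lower values show
 * contention for cores, memory bandwidth, or the GPU. */
double scaling_efficiency(double per_model_throughput,
                          double single_model_throughput) {
    assert(single_model_throughput > 0.0);
    return per_model_throughput / single_model_throughput;
}
```

For example, four models that together commit 8000 steps in 2 seconds achieve 1000 steps per second each. If a single model alone reaches 1000 steps per second, the efficiency is 1.0.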
