
Concurrency and Shutdown

Concurrency, synchronization, cancellation, and orderly runtime shutdown guidance.

Use this section when TDSE Runtime is live in more than one thread, worker, or shutdown path. It explains the one-handle rule, which calls can overlap safely, and how to reason about contention and teardown races.

Related Chapters: For the base lifecycle semantics, see Runtime Lifecycle. For threading and memory scaling with many models, see Threading and Scaling.

Most production Runtime incidents in this area are not numerical defects. They are ownership defects: two threads think they own the same handle, shutdown starts while a step call is still live, or host wrappers treat release as a policy API instead of a finalizer.

Read this chapter together with Lifecycle and Ownership: lifecycle explains what local ownership means; this chapter explains how that ownership behaves under contention and teardown.

The One-Handle Rule

The core runtime concurrency rule is simple:

  • a live tdse_model_t* handle must not be entered by more than one thread at a time through the same-handle runtime APIs

The guarded same-handle surface covers the step APIs and the teardown APIs:

  • tdse_step_begin(...)
  • tdse_step_op(...)
  • tdse_step_hr(...)
  • tdse_step_ir(...)
  • tdse_step_commit(...)
  • tdse_step_dr(...)
  • tdse_model_close(...)
  • tdse_model_destroy(...)
  • tdse_model_release(...)

[Figure: Supported versus unsupported host ownership]

The runtime behavior is rejection, not automatic serialization. That is deliberate: silent serialization would hide ownership bugs and make timing behavior much harder to diagnose.
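To make "rejection, not serialization" concrete, here is a minimal sketch of a try-enter guard built on C11 atomics. It is not the Runtime's implementation; every demo_ name and the status values are hypothetical and exist only for this illustration. The point is the shape: the second entrant gets an immediate error instead of waiting behind a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical status values for this sketch; the real ones come from the TDSE headers. */
typedef enum {
  DEMO_OK = 0,
  DEMO_ERR_CONCURRENT_API_USE = 1
} demo_status_t;

/* Hypothetical stand-in for a per-handle execution guard. */
typedef struct {
  atomic_flag busy;
} demo_guard_t;

static void demo_guard_init(demo_guard_t *g) {
  assert(g != NULL);
  atomic_flag_clear(&g->busy);
}

/* Never blocks: a second entrant is rejected instead of being queued. */
static demo_status_t demo_guard_try_enter(demo_guard_t *g) {
  /* test_and_set returns the previous value: true means someone is already inside. */
  return atomic_flag_test_and_set(&g->busy) ? DEMO_ERR_CONCURRENT_API_USE : DEMO_OK;
}

static void demo_guard_leave(demo_guard_t *g) {
  atomic_flag_clear(&g->busy);
}
```

A mutex in the same position would make the second caller wait and eventually succeed, which is exactly the silent serialization that hides ownership bugs.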

Which APIs Are Guarded Versus Snapshot-Style

Not every API participates in the same-handle execution guard. The most important distinction is:

  • execution APIs and teardown APIs are guard-sensitive
  • metadata and diagnostics snapshot APIs are live-handle reads

Snapshot-style query APIs are:

  • tdse_model_info(...)
  • tdse_model_state_info(...)
  • tdse_model_last_error_info(...)

These queries do not enter the same-handle execution guard. On a live handle they are intended to remain readable even while step traffic is happening on another thread. They stop being valid once close or destroy has started on that handle; from that point they return TDSE_ERR_INVALID_STATE.

Operational consequence:

  • do not use snapshot-query success as evidence that a handle is safe to step from another thread
  • do use snapshot queries to capture diagnostics before teardown begins or after a step failure

Runtime Guarantees Under Conflict

When runtime detects same-handle overlap on the guarded surface:

  • conflicting entrants are rejected rather than queued
  • step conflicts return TDSE_ERR_CONCURRENT_API_USE
  • lifecycle conflicts may return TDSE_ERR_CONCURRENT_API_USE, TDSE_ERR_TIMEOUT, or TDSE_ERR_INVALID_STATE depending on which API raced and whether ownership already moved
  • runtime avoids deadlock as part of the supported behavior

Important nuance:

  • forced overlap does not mean every entrant fails
  • one caller may legitimately acquire the handle first and succeed
  • conflicting entrants are rejected safely and observably

That distinction matters in stress tests. Under intentional race injection, "one winner plus one or more rejected entrants" is the expected shape.
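A stress harness can assert that shape directly. The sketch below assumes the harness has collected one status per entrant from a single forced-overlap round of step calls; the demo_ enum is a hypothetical stand-in for the real status type, and the "at least one winner, at least one rejection, nothing else" criterion is this sketch's reading of the expected shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status values for this sketch; the real ones come from the TDSE headers. */
typedef enum {
  DEMO_OK = 0,
  DEMO_ERR_CONCURRENT_API_USE,
  DEMO_ERR_TIMEOUT,
  DEMO_ERR_INVALID_STATE
} demo_status_t;

/* True when a forced-overlap round has the expected shape: at least one
   winner, at least one rejected entrant, and no other status at all. */
static bool demo_overlap_shape_ok(const demo_status_t *results, size_t n) {
  size_t winners = 0;
  size_t rejected = 0;
  assert(results != NULL || n == 0);
  for (size_t i = 0; i < n; ++i) {
    if (results[i] == DEMO_OK) {
      ++winners;
    } else if (results[i] == DEMO_ERR_CONCURRENT_API_USE) {
      ++rejected;
    } else {
      return false; /* step conflicts should not surface as anything else */
    }
  }
  return winners >= 1 && rejected >= 1;
}
```

A round where every entrant fails, or where an unrelated status appears, is the case worth investigating.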

Safe Parallelism Model

Supported host parallelism looks like this:

  1. one simulation worker owns one runtime handle
  2. different handles may run concurrently
  3. optional internal execution strategy may parallelize work inside one step call
  4. internal parallel execution does not make a single handle concurrently callable

Recommended host rule:

  • assign both execution ownership and shutdown ownership to the same wrapper or thread controller

If those responsibilities are split across components, the ownership handoff protocol must be explicit rather than implied.

Teardown State Model

The shutdown APIs are easiest to reason about as an ownership state machine.

[Figure: Teardown-oriented handle states]

Two rules keep this understandable:

  • once another thread acquires teardown ownership, the handle is no longer locally usable to you
  • TDSE_ERR_TIMEOUT from destroy is the one status that means the handle is still live after return
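One way a host wrapper might track these two rules is a small piece of bookkeeping beside the handle pointer. The sketch below is host-side only and makes no claim about the Runtime's internal states; the state names, the demo_ status enum, and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three destroy outcomes described in this chapter. */
typedef enum {
  DEMO_OK = 0,
  DEMO_ERR_TIMEOUT,
  DEMO_ERR_INVALID_STATE
} demo_destroy_status_t;

/* Host-side view of one handle (illustrative names, not Runtime states). */
typedef enum {
  DEMO_HANDLE_LOCAL,     /* this component may still call into the handle */
  DEMO_HANDLE_ELSEWHERE, /* another thread owns teardown */
  DEMO_HANDLE_GONE       /* teardown completed */
} demo_handle_state_t;

/* Maps a destroy status to the host's next view of the handle. */
static demo_handle_state_t demo_after_destroy(demo_handle_state_t current,
                                              demo_destroy_status_t st) {
  assert(current == DEMO_HANDLE_LOCAL); /* destroy is only attempted by the local owner */
  switch (st) {
    case DEMO_OK:                return DEMO_HANDLE_GONE;
    case DEMO_ERR_TIMEOUT:       return DEMO_HANDLE_LOCAL; /* the one status that keeps it live */
    case DEMO_ERR_INVALID_STATE: return DEMO_HANDLE_ELSEWHERE;
  }
  return DEMO_HANDLE_ELSEWHERE; /* unknown status: assume ownership is lost */
}

/* The host may only issue further same-handle calls while it is the local owner. */
static bool demo_may_call(demo_handle_state_t s) {
  return s == DEMO_HANDLE_LOCAL;
}
```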

Close, Destroy, And Release Under Contention

tdse_model_close(...)

close is the immediate-answer lifecycle API.

Use it when:

  • you want a synchronous non-waiting lifecycle result
  • you want to detect overlap instead of waiting through it

Under contention:

  • if another guarded same-handle API is still in flight, close returns TDSE_ERR_CONCURRENT_API_USE
  • if another thread already started close or destroy, close returns TDSE_ERR_INVALID_STATE

Operational reading:

  • close is not a "fast destroy"
  • it is the right API when overlap itself is the information you need

tdse_model_destroy(...)

destroy is the recommended business-logic shutdown API because it makes wait policy explicit.

Use it when:

  • your host owns lifecycle policy
  • you need bounded wait behavior
  • you want structured wait telemetry

Destroy outcomes:

  • TDSE_OK: teardown completed and storage is gone
  • TDSE_ERR_TIMEOUT: destroy could not acquire teardown ownership within the budget; the handle remains valid
  • TDSE_ERR_INVALID_STATE: another thread already owns close or destroy; local ownership is gone

[Figure: Destroy race interpretation]

tdse_model_release(...)

release is for terminal cleanup, not shutdown policy.

Use it when:

  • a destructor must not become a policy engine
  • a finally or unwind path needs best-effort terminal cleanup

Under contention:

  • once release owns terminal cleanup, it waits for any in-flight same-handle API to leave the guard
  • if another thread already started close, destroy, or release, release returns TDSE_ERR_INVALID_STATE

Operational reading:

  • release is acceptable in finalizers because it is cleanup-oriented
  • release is a poor choice for ordinary host shutdown because it does not carry a bounded-wait policy

Race Matrix

Use this matrix when a shutdown report is unclear about which thread acted first.

| Situation | close sees | destroy sees | release sees | What the caller should assume |
| --- | --- | --- | --- | --- |
| step call still in flight on same handle | TDSE_ERR_CONCURRENT_API_USE | TDSE_OK or TDSE_ERR_TIMEOUT depending on wait budget | waits until it can clean up | handle ownership is still local only if destroy timed out |
| another thread already started close/destroy | TDSE_ERR_INVALID_STATE | TDSE_ERR_INVALID_STATE | TDSE_ERR_INVALID_STATE | local ownership is gone |
| no same-handle activity, caller owns handle | TDSE_OK | TDSE_OK | TDSE_OK | storage is gone after return |
| destroy timed out while waiting | n/a | TDSE_ERR_TIMEOUT | n/a | handle is still live and policy must decide next step |

The support-facing rule is:

  • only TDSE_ERR_TIMEOUT from destroy preserves local ownership after return

Query Behavior During Shutdown

Support incidents often ask whether a host can still query state while shutdown is underway. Use the strict answer:

  • before close or destroy starts, snapshot queries are allowed on a live handle
  • after close or destroy has started, tdse_model_info(...), tdse_model_state_info(...), and tdse_model_last_error_info(...) may return TDSE_ERR_INVALID_STATE
  • after successful close, destroy, or release, no further handle use is valid

That means the host should capture evidence before starting teardown whenever possible.

Recommended diagnostic order on a failing live handle:

  1. tdse_model_info(...)
  2. tdse_model_state_info(...)
  3. tdse_model_last_error_info(...)
  4. chosen shutdown API and timeout policy
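The ordering above can be pinned down in a host wrapper so that no code path starts teardown before evidence is captured. The sketch below records the order in a trace instead of calling the Runtime, because the query signatures are not shown in this chapter; each demo_capture_ function is a hypothetical placeholder for a host wrapper around the named API.

```c
#include <assert.h>
#include <stddef.h>

/* Order markers for the recommended sequence (illustrative names). */
typedef enum {
  DEMO_STEP_INFO,
  DEMO_STEP_STATE_INFO,
  DEMO_STEP_LAST_ERROR_INFO,
  DEMO_STEP_SHUTDOWN
} demo_step_t;

#define DEMO_MAX_STEPS 8

typedef struct {
  demo_step_t steps[DEMO_MAX_STEPS];
  size_t count;
} demo_trace_t;

static void demo_record(demo_trace_t *t, demo_step_t s) {
  assert(t->count < DEMO_MAX_STEPS);
  t->steps[t->count++] = s;
}

/* Hypothetical host wrappers: a real one would call the named Runtime API
   and stash its output in the incident record. */
static void demo_capture_info(demo_trace_t *t)       { demo_record(t, DEMO_STEP_INFO); }
static void demo_capture_state_info(demo_trace_t *t) { demo_record(t, DEMO_STEP_STATE_INFO); }
static void demo_capture_last_error(demo_trace_t *t) { demo_record(t, DEMO_STEP_LAST_ERROR_INFO); }
static void demo_shutdown(demo_trace_t *t)           { demo_record(t, DEMO_STEP_SHUTDOWN); }

/* Captures all three snapshots first, then starts teardown. */
static void demo_capture_then_shutdown(demo_trace_t *t) {
  demo_capture_info(t);
  demo_capture_state_info(t);
  demo_capture_last_error(t);
  demo_shutdown(t); /* chosen shutdown API and timeout policy go here */
}
```

Keeping the sequence in one function means a future edit cannot quietly move the shutdown call ahead of the snapshot queries.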

Bounded Destroy Policy

tdse_model_destroy_options_t.wait_timeout_ms is part of the public behavior, not a tuning footnote.

Interpret the wait budget as:

  • negative value: intentional infinite wait
  • small bounded value: supervisory shutdown that prefers a fast answer
  • medium bounded value: ordinary business-logic cleanup where some overlap is tolerated

Recommended policy questions:

  1. Which thread is allowed to decide the wait budget?
  2. What does the host do on TDSE_ERR_TIMEOUT?
  3. At what point does the host stop retrying and defer to finalizer cleanup?
  4. Which path records the observed wait_ms into logs or crash bundles?

Example bounded-destroy pattern:

tdse_model_destroy_options_t opt = tdse_model_destroy_options_init();
tdse_model_destroy_result_t result = tdse_model_destroy_result_init();
opt.wait_timeout_ms = 250.0;

tdse_status_t st = tdse_model_destroy(model, &opt, &result);
if (st == TDSE_OK) {
  model = NULL;
} else if (st == TDSE_ERR_TIMEOUT) {
  log_warn("destroy timeout wait_ms=%.3f timed_out=%d", result.wait_ms, result.timed_out);
  /* handle remains live; supervisor decides retry or escalation */
} else if (st == TDSE_ERR_INVALID_STATE) {
  model = NULL; /* ownership already moved elsewhere */
}
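The policy questions about TDSE_ERR_TIMEOUT and about when to stop retrying can be answered in code with a capped retry loop. The following is a sketch under stated assumptions: demo_destroy is a mock that stands in for tdse_model_destroy(...), the demo_ types are hypothetical, and the attempt cap is a host policy choice, not a Runtime feature.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three destroy outcomes. */
typedef enum { DEMO_OK = 0, DEMO_ERR_TIMEOUT, DEMO_ERR_INVALID_STATE } demo_status_t;

/* Mock model: times out busy_attempts times, then succeeds. */
typedef struct {
  int busy_attempts;
  bool owned_elsewhere;
  int calls;
} demo_model_t;

/* Mock destroy (hypothetical signature); reports the time it "waited". */
static demo_status_t demo_destroy(demo_model_t *m, double wait_timeout_ms, double *wait_ms) {
  m->calls++;
  *wait_ms = 0.0;
  if (m->owned_elsewhere) return DEMO_ERR_INVALID_STATE;
  if (m->busy_attempts > 0) {
    m->busy_attempts--;
    *wait_ms = wait_timeout_ms; /* the full budget was spent waiting */
    return DEMO_ERR_TIMEOUT;
  }
  return DEMO_OK;
}

typedef enum { DEMO_DONE, DEMO_OWNERSHIP_MOVED, DEMO_DEFER_TO_FINALIZER } demo_outcome_t;

/* Retries bounded destroy up to max_attempts, accumulating observed wait time. */
static demo_outcome_t demo_destroy_with_retries(demo_model_t *m, double budget_ms,
                                                int max_attempts, double *total_wait_ms) {
  assert(max_attempts > 0);
  *total_wait_ms = 0.0;
  for (int i = 0; i < max_attempts; ++i) {
    double wait_ms = 0.0;
    demo_status_t st = demo_destroy(m, budget_ms, &wait_ms);
    *total_wait_ms += wait_ms; /* worth logging on every path */
    if (st == DEMO_OK) return DEMO_DONE;
    if (st == DEMO_ERR_INVALID_STATE) return DEMO_OWNERSHIP_MOVED; /* never retry this */
    /* DEMO_ERR_TIMEOUT: handle is still live, so another attempt is legal. */
  }
  return DEMO_DEFER_TO_FINALIZER;
}
```

Repeated timeouts still point at an ownership bug, so the cap is there to bound shutdown time, not to paper over a missing quiesce step.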

Worked Host Patterns

Pattern A. Worker-Owned Handle With Clean Shutdown

This is the preferred product integration pattern:

  1. worker thread creates the handle
  2. worker thread performs the step loop
  3. worker thread or its owner wrapper initiates destroy
  4. no other thread touches the handle after shutdown starts

Why it works:

  • execution ownership and teardown ownership stay aligned
  • there is no ambiguity about who records diagnostics or clears references

Pattern B. Supervisor Requests Stop, Worker Performs Destroy

This is often better than having the supervisor destroy directly:

  1. supervisor sets a stop request in host code
  2. worker exits its loop at a safe boundary
  3. worker performs tdse_model_destroy(...)
  4. supervisor observes the result through host telemetry

Why it works:

  • it avoids same-handle races between a live step call and a remote destroy
  • it keeps timeout policy near the code that already owns the handle
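A minimal sketch of Pattern B, assuming the stop request is a C11 atomic flag owned by host code. The step cycle and the destroy call are replaced by counters so the control flow stands alone; demo_worker_t and its functions are hypothetical names, and a real worker would run demo_worker_run on its own thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
  atomic_bool stop_requested; /* written by the supervisor, read by the worker */
  int steps_done;             /* stand-in for completed step cycles */
  bool destroyed;             /* stand-in for "worker called destroy" */
} demo_worker_t;

static void demo_worker_init(demo_worker_t *w) {
  assert(w != NULL);
  atomic_init(&w->stop_requested, false);
  w->steps_done = 0;
  w->destroyed = false;
}

/* Supervisor side: never touches the handle, only the flag. */
static void demo_request_stop(demo_worker_t *w) {
  atomic_store(&w->stop_requested, true);
}

/* Worker side: checks the flag only between step cycles, then tears down. */
static void demo_worker_run(demo_worker_t *w, int max_steps) {
  while (!atomic_load(&w->stop_requested) && w->steps_done < max_steps) {
    w->steps_done++; /* one full step cycle would run here */
  }
  w->destroyed = true; /* the worker, as owner, would call destroy here */
}
```

Because the flag is checked only at the loop boundary, no step call is in flight when the worker reaches teardown.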

Pattern C. Destructor-Only Final Cleanup

Use this only when ordinary business shutdown has already failed or is unavailable:

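class ModelGuard {
 public:
  // Takes over terminal-cleanup responsibility for an already-created handle.
  explicit ModelGuard(tdse_model_t* handle) noexcept : handle_(handle) {}
  // Non-copyable: two guards must never release the same handle.
  ModelGuard(const ModelGuard&) = delete;
  ModelGuard& operator=(const ModelGuard&) = delete;
  ~ModelGuard() noexcept {
    if (handle_ != nullptr) {
      // Best-effort terminal cleanup; the status is deliberately ignored here.
      (void)tdse_model_release(handle_);
      handle_ = nullptr;
    }
  }
 private:
  tdse_model_t* handle_ = nullptr;
};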

Why it is acceptable:

  • destructors need a terminal cleanup target
  • they should not be responsible for choosing timeout policy

Troubleshooting Shutdown Symptoms

Symptom: Destroy Times Out Repeatedly

Likely meaning:

  • a same-handle step call is still live when destroy starts
  • the host has no clear quiesce-before-destroy protocol

Collect:

  • destroy wait_timeout_ms
  • destroy wait_ms
  • failing thread identities from host logs
  • whether the worker loop had actually stopped before destroy

First corrective actions:

  1. move destroy to the owning worker or wrapper
  2. add an explicit stop-and-join phase before destroy
  3. keep bounded destroy, but treat repeated timeout as an ownership bug

Symptom: close Returns TDSE_ERR_CONCURRENT_API_USE

Likely meaning:

  • another same-handle API is still active

Correct interpretation:

  • runtime is working as designed
  • the host attempted an immediate-answer close during active execution

Corrective action:

  • use destroy if bounded waiting is desired
  • keep close only when fast overlap detection is the intent

Symptom: destroy Or release Returns TDSE_ERR_INVALID_STATE

Likely meaning:

  • another thread already owns teardown

Corrective action:

  • clear local references
  • stop issuing further same-handle calls
  • repair the host ownership model instead of retrying locally

When a concurrency issue is reported, collect the smallest bundle that explains ownership:

  • handle identity in host logs
  • failing API name
  • thread or worker identity on both sides of the race
  • tdse_model_state_info(...) captured before teardown when available
  • tdse_model_last_error_info(...) captured before teardown when available
  • chosen shutdown API: close, destroy, or release
  • destroy wait budget and returned wait_ms
  • whether the host had already requested worker stop
  • whether the issue reproduced with one handle per thread

This bundle is usually more valuable than a large raw trace with no ownership annotations.
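As an illustration, the host-log part of that bundle fits in one struct and one log line. The field names and the line format below are assumptions made for this sketch, not a TDSE format; the state and last-error snapshots are left out because their layouts are not described in this chapter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical host-side record of the ownership facts around one incident. */
typedef struct {
  const char *handle_id;     /* handle identity as it appears in host logs */
  const char *failing_api;
  const char *thread_a;      /* one side of the race */
  const char *thread_b;      /* the other side */
  const char *shutdown_api;  /* "close", "destroy", or "release" */
  double wait_timeout_ms;    /* destroy wait budget */
  double wait_ms;            /* wait time returned by destroy */
  bool stop_requested;       /* had the host already asked the worker to stop? */
  bool one_handle_per_thread;
} demo_bundle_t;

/* Formats the bundle as a single log line; returns snprintf's result. */
static int demo_format_bundle(const demo_bundle_t *b, char *out, size_t cap) {
  assert(b != NULL && out != NULL);
  return snprintf(out, cap,
                  "handle=%s api=%s threads=%s/%s shutdown=%s "
                  "budget_ms=%.1f wait_ms=%.1f stop_requested=%d one_handle_per_thread=%d",
                  b->handle_id, b->failing_api, b->thread_a, b->thread_b,
                  b->shutdown_api, b->wait_timeout_ms, b->wait_ms,
                  (int)b->stop_requested, (int)b->one_handle_per_thread);
}
```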

Review Checklist

During integration review, ask:

  1. Which component owns each live handle?
  2. Which component is allowed to start close or destroy?
  3. Can a supervisor request stop without directly entering the handle?
  4. Where is the timeout policy for destroy chosen and logged?
  5. What happens to local references after TDSE_ERR_INVALID_STATE?
  6. Which path captures diagnostics before teardown starts?

Anti-Patterns

Avoid these patterns even if they look harmless in local tests:

  1. sharing one handle across workers and assuming Runtime will serialize it
  2. using release as the default business-logic shutdown API
  3. treating close as a faster version of destroy
  4. retrying local teardown after TDSE_ERR_INVALID_STATE
  5. treating TDSE_ERR_TIMEOUT as if the handle were already gone
  6. starting teardown before the host has a stop or quiesce protocol