Streams

A Stream is an asynchronous work queue. Operations enqueued onto the same stream are processed in their enqueue order; operations on different streams may run concurrently.

Every Map::execute() call goes through a stream. If you do not bind one explicitly, the Map uses its parent Job’s default stream.
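To make the ordering guarantee concrete, here is a minimal sketch of an asynchronous work queue in standard C++. It is a toy model, not PXL's implementation: `ToyStream`, `enqueue()`, and `run_in_order()` are hypothetical names introduced only for illustration. A single consumer thread pops operations in the order they were enqueued, which is the property the text above describes.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Toy model only: ToyStream is a hypothetical name, not a PXL type.
class ToyStream {
public:
    ToyStream() : worker_([this] { run(); }) {}

    // Lets the consumer drain any remaining work, then joins it.
    ~ToyStream() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Returns immediately; the operation runs later on the consumer thread.
    void enqueue(std::function<void()> op) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(op));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> op;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and fully drained
                op = std::move(queue_.front());
                queue_.pop();
            }
            op();  // run outside the lock so enqueue() never waits on an operation
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool stopping_ = false;
    std::thread worker_;  // last member, so everything above exists when it starts
};

// Enqueues n operations on one stream and returns the order they ran in.
std::vector<int> run_in_order(int n) {
    assert(n >= 0);
    std::vector<int> order;
    {
        ToyStream stream;
        for (int i = 0; i < n; ++i) {
            stream.enqueue([&order, i] { order.push_back(i); });
        }
    }  // destructor drains the queue and joins the worker
    return order;
}
```

Calling `run_in_order(5)` yields the values 0 through 4 in ascending order, because one consumer serves one FIFO queue. Two separate `ToyStream` objects would each have their own consumer thread, so their operations could overlap in time.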

Default Stream — One per Job

When you create a Job, PXL allocates a private default stream for it. Every Map built from that Job uses this default stream unless the Map explicitly binds another.

flowchart LR
    subgraph JobA[Job A]
        MapA1[Map a1] --> StreamA[default stream]
        MapA2[Map a2] --> StreamA
    end
    subgraph JobB[Job B]
        MapB1[Map b1] --> StreamB[default stream]
    end
    StreamA --> Device((Device))
    StreamB --> Device

Consequences:

Multiple Jobs dispatch independently with no extra setup — each Job has its own default stream.
Maps built from the same Job share that Job’s default stream and are dispatched in enqueue order on the host side.

If two Maps from the same Job need to run concurrently, give at least one of them a dedicated stream (see Custom Streams).
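The resolution rule described in this section can be sketched as a small model. Everything here is hypothetical and written only to illustrate the rule: `ToyJob`, `ToyMap`, `StreamId`, and `may_run_concurrently()` are not PXL types or functions, and integer ids stand in for real stream handles.

```cpp
#include <cassert>

// Toy model only: these types are hypothetical stand-ins, not PXL classes.
using StreamId = int;

struct ToyJob {
    StreamId default_stream;  // allocated once, when the Job is created
};

class ToyMap {
public:
    // The job must outlive the map.
    explicit ToyMap(const ToyJob& job) : job_(&job) {}

    // Bind a custom stream.
    void set_stream(StreamId s) {
        bound_ = s;
        has_bound_ = true;
    }

    // Fall back to the parent Job's default stream.
    void reset_stream() { has_bound_ = false; }

    // The stream an execute() call would be enqueued on.
    StreamId effective_stream() const {
        return has_bound_ ? bound_ : job_->default_stream;
    }

private:
    const ToyJob* job_;
    StreamId bound_ = 0;
    bool has_bound_ = false;
};

// Work on the same stream is ordered; work on different streams may overlap.
bool may_run_concurrently(const ToyMap& a, const ToyMap& b) {
    return a.effective_stream() != b.effective_stream();
}
```

In this model, two Maps built from the same `ToyJob` resolve to the same stream until one of them calls `set_stream()` with a different id, which mirrors the advice to give at least one Map a dedicated stream.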

Custom Streams

Create a stream with pxl::createStream() and bind it to a Map:

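auto stream = pxl::createStream();  // create a custom stream

map->setStream(stream);             // bind the custom stream to this Map
map->execute(/* args */);           // this call is enqueued on the custom stream

// Reset to the Job's default stream
map->setStream();

pxl::destroyStream(stream);         // release the stream once no Map is bound to it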

A stream can be reused: it is not consumed by a single execution. The same stream can also be shared by multiple Maps; in that case, work from those Maps is processed in the order it was enqueued on the shared stream, regardless of which Map issued it.

In-flight Limit and Backpressure

A stream can hold only a bounded number of in-flight operations. When that limit is reached, Map::execute() blocks until a slot frees up. This backpressure is built into the library, so you do not need to throttle submissions yourself.

If the stream is torn down or its consumer thread dies while execute() is blocked waiting for a slot, execute() returns Result::Failure.
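The blocking behavior and the failure path can be illustrated with a bounded queue in standard C++. This is a sketch under stated assumptions, not PXL's implementation: `BoundedStream`, its `shutdown()` method, the local `Result` enum, and `run_with_limit()` are hypothetical names, and the model assumes an operation keeps its slot until it has finished running.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Toy model only: BoundedStream and this Result enum are hypothetical
// stand-ins written in standard C++, not PXL's implementation.
enum class Result { Success, Failure };

class BoundedStream {
public:
    explicit BoundedStream(std::size_t capacity)
        : capacity_(capacity), worker_([this] { consume(); }) {}

    ~BoundedStream() {
        shutdown();
        worker_.join();  // the consumer drains already-accepted work first
    }

    // Blocks while the in-flight limit is reached. Returns Failure if the
    // stream is shut down before a slot becomes available.
    Result execute(std::function<void()> op) {
        std::unique_lock<std::mutex> lock(mutex_);
        slot_free_.wait(lock, [this] { return closed_ || in_flight_.size() < capacity_; });
        if (closed_) return Result::Failure;
        in_flight_.push_back(std::move(op));
        work_ready_.notify_one();
        return Result::Success;
    }

    // Tears the stream down: blocked and later execute() calls fail.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        slot_free_.notify_all();
        work_ready_.notify_all();
    }

private:
    void consume() {
        for (;;) {
            std::function<void()> op;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                work_ready_.wait(lock, [this] { return closed_ || !in_flight_.empty(); });
                if (in_flight_.empty()) return;  // shut down and fully drained
                op = std::move(in_flight_.front());
            }
            op();  // the operation keeps its slot while it runs
            {
                std::lock_guard<std::mutex> lock(mutex_);
                in_flight_.pop_front();  // the slot frees only on completion
            }
            slot_free_.notify_one();
        }
    }

    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable slot_free_;
    std::condition_variable work_ready_;
    std::deque<std::function<void()>> in_flight_;
    bool closed_ = false;
    std::thread worker_;  // last member, so everything above exists when it starts
};

// Submits n operations through a stream with the given in-flight limit and
// returns the order in which they ran.
std::vector<int> run_with_limit(std::size_t capacity, int n) {
    assert(capacity > 0 && n >= 0);
    std::vector<int> order;
    {
        BoundedStream stream(capacity);
        for (int i = 0; i < n; ++i) {
            const Result r = stream.execute([&order, i] { order.push_back(i); });
            if (r != Result::Success) break;
        }
    }  // destructor waits for the accepted operations to finish
    return order;
}
```

With a limit of 2 and six submissions, the producer loop in `run_with_limit()` stalls whenever two operations are outstanding and resumes as each one completes, so no external throttling is needed and the operations still run in submission order. A caller that checks the returned `Result` can stop submitting as soon as the stream has gone away.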

When to Add More Streams

Streams parallelize the host-side dispatch path. They do not multiply device throughput.

A Job’s device throughput is bounded by the number of Subs it owns. Adding more streams to the same Job does not exceed that bound.
Multiple streams help when host-side argument setup or kernel-launch overhead is the bottleneck and there is room for more host concurrency.
To increase device parallelism, give the Job more Subs (Job::subAlloc()) or create more Jobs — not more streams.

A practical guideline:

Streams per Job ≤ Subs per Job. Exceeding that limit tends to flatten or even reduce throughput, because contention on shared dispatch resources begins to dominate.
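One way to apply the guideline is to cap the stream count when configuring a Job. The helper below is a hypothetical sketch, not part of PXL: `recommended_stream_count()` and its parameters are illustrative names, and the lower bound of one stream is an assumption based on every Job having a default stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of PXL: caps a requested stream count at the
// number of Subs the Job owns. Assumes at least one stream is always wanted.
std::size_t recommended_stream_count(std::size_t requested, std::size_t subs_per_job) {
    const std::size_t upper = std::max<std::size_t>(subs_per_job, 1);
    const std::size_t result = std::min(std::max<std::size_t>(requested, 1), upper);
    assert(result >= 1 && result <= upper);  // postcondition of the guideline
    return result;
}
```

For example, a request for 8 streams on a Job with 4 Subs is reduced to 4, while a request for 2 streams is left unchanged. The cap is a starting point for measurement rather than a guarantee of best throughput.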