Copy Queue woes.

Consider this case: you’re launching several async compute jobs. They need some input buffers uploaded to the GPU before they can run, and some output buffers read back to the CPU once they finish.

[Figure: Base scenario]

How do you minimize the time it takes to get all the outputs back to the CPU? I’m focusing on D3D12 and Windows for this article.