@cvi I worked on a project doing just that, except "system" memory was decentralized: each CPU had a few GB of RAM physically connected to it, so a thread running on CPU 1 could "own" a chunk of memory that was physically connected to CPU 7. The address of that memory could also differ for each CPU, differ again for a DMA request from a PCIe peripheral or from a legacy PCI device, and all of those could differ from the RAM's physical address. On top of that, we were intentionally creating high-contention situations, where every device that could conceivably access a region was hitting overlapping chunks repeatedly. Following a transaction through multiple address translations in each direction to figure out why a device got stale data, data it shouldn't have seen yet, or data from the wrong location was ... not fun ... to debug.