Flexible Untyped Memory Regions #1363

Indanz · 2024-12-08T12:04:39Z

Pre-RFC proof-of-concept for Aarch64.

Support for non-power-of-two and unaligned untyped memory regions.

It's fine that seL4 delegates memory management to user space, it's not fine that it makes doing that very hard because of arcane limitations. With arbitrary untyped memory regions there is no need to subdivide non-power-of-two memory regions into multiple, smaller UTs any more. Allocating physical memory with a specific address will also be much easier now, as it needs to allocate at most one UT to get to the right address (would be even easier if GetAddress would work on UT).

Instead of keeping track of free memory with an explicit index stored in the cap, use the address of the last child object its size to find the beginning of unallocated memory. This works because newly created objects are inserted just after the UT node in the derivation list.

Now that we don't need storage to keep track of the free index, we can store the full length in bytes instead of bits.

Also lifts the limitation that caps to UTs with children can't be copied: After copying the original cap will look "full" and only the derived cap can keep allocating further.

This doesn't actually change the user space API yet. Untyped_Retype has been prepared to use the size argument as a byte value, which can easily be made backward compatible with a simple wrapper. But for a real API change the bootinfo also needs to change to pass UTs regions with byte sizes instead of bit sizes, and that would break sel4test, which I didn't want to do yet.

Because UTs now act like a stack and can reallocate over previously deleted objects, it's not possible to only zero memory at UT reset time, so to limit WCET zeroing an alternative solution needed to be found, which I implemented for frames and CNodes (SCs can be ridiculously big too in theory). The current code probably needs to be made more generic so that there is less code duplication when adding more architectures.

Clearing memory just before use should be more cache friendly than zeroing it at UT reset time, and also fairer in CPU time accounting. A big problem of the current code is that the cache flushes do happen in one go without preemption. In practise that means the WCET is equal to the time it takes to more or less flush the whole last level cache. Another problem with zeroing memory at UT reset time is that if UT are allocated, the same memory gets zeroed multiple times unnecessarily.

The UT changes passed sel4test before I added the zeroing changes, I haven't tested those yet.

As this is a major API break, an RFC is needed, but before proceeding with that I wanted to share my implementation and gather verification impact feedback. Considering how easy the code changes were, I suspect verification won't be that hard either, but it will cost time. Except if there is some deep verification reliance on UTs being power-of-two in size anywhere in the proofs, then this won't work.

Indanz · 2024-12-08T12:05:24Z

Relevant discussion about memory zeroing at #1289.

Support for non-power-of-two and unaligned untyped memory regions. Instead of keeping track of free memory with an explicit index stored in the cap, use the address of the last child object its size to find the beginning of unallocated memory. This works because newly created objects are inserted just after the UT node in the derivation list. Now that we don't need storage to keep track of the free index, we can store the full length in bytes instead of bits. Also lifts the limitation that caps to UTs with children can't be copied: After copying the original cap will look "full" and only the derived cap can keep allocating further. Proof-of-concept which doesn't change the user space API yet. For that bootinfo also needs to change. Signed-off-by: Indan Zupancic <[email protected]>

Zero memory before use instead of at UT reset time. This is fairer because the actual user pays the CPU cost and not whoever happens to be the first to allocate from an UT. All memory must have been zeroed before the CNode can be used. However, as zeroing state is kept in the cap, we also need to guarantee that all memory is zeroed before the cap gets copied. This is done by checking for pending zeroing in deriveCap(). Signed-off-by: Indan Zupancic <[email protected]>

Proof-of-concept Aarch64 implementation. Zero memory before use instead of at UT reset time. This is fairer because the actual user pays the CPU cost and not whoever happens to be the first to allocate from an UT. All memory must have been zeroed before the page can be mapped. However, as zeroing state is kept in the cap, we also need to guarantee that all memory is zeroed before the cap gets copied. This is done by checking for pending zeroing in Arch_deriveCap(). The old code zeroed memory in CONFIG_RESET_CHUNK_BITS many times before checking for preemption. CONFIG_MAX_NUM_WORK_UNITS_PER_PREEMPTION << CONFIG_RESET_CHUNK_BITS amounts to 25 kB. Now we zero 16 kB before checking for preemption, which should both give much better performance and lower WCET. Signed-off-by: Indan Zupancic <[email protected]>

Indanz added enhancement verification Needs formal verification input/change, or is motivated by verification labels Dec 8, 2024

Indanz added 2 commits December 8, 2024 18:46

Indanz force-pushed the flexible-untyped branch from 80b0b36 to b71d927 Compare December 8, 2024 22:36

Indanz force-pushed the flexible-untyped branch from b71d927 to 5cfa394 Compare December 11, 2024 11:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flexible Untyped Memory Regions #1363

Flexible Untyped Memory Regions #1363

Indanz commented Dec 8, 2024

Indanz commented Dec 8, 2024

Flexible Untyped Memory Regions #1363

Are you sure you want to change the base?

Flexible Untyped Memory Regions #1363

Conversation

Indanz commented Dec 8, 2024

Indanz commented Dec 8, 2024