[RFC-15] Add experimental CHERI support (hybrid kernel) #1344

heshamelmatary · 2024-11-03T19:05:44Z

This PR adds experimental support for building and running the seL4 kernel in hybrid mode, and userspace in CHERI C (purecap) mode. The tested architectures are Morello (ARMv8) and CHERI-RISC-V, both RV32 and RV64. This is an incremental base port that allows further features and fine-grained security implementations to be implemented on top.

Only builds and runs with fixed default CMake configurations at the moment. For example, features like hypervisor and SMP aren't supported yet. Furthermore, It's not yet ready for others to build and run whole projects on top and requires supporting from other repos (eg build system, elfloader, OpenSBI, userspace libs, etc) that we will push support for later.

Future and ongoing work:

Benchmarking.
Complete sel4test/sel4bench purecap libraries.
Purecap elfloader.
Complete end-to-end purecap sel4test project that runs on all supported platforms.
Further temporal safety and security design reconsiderations and enhancements (e.g., CHERI capability propagation inter-AS

This PR updates RFC-15, is a subset of (#1322), and depends on issue #1185 (just for Morello/Arm, and not CHERI-RISC-V)

src/kernel/boot.c

This change adds a new rword_t type for variables that may hold CHERI capabilities at any point in the kernel. rword_t: newly added to *always* hold capability-width variables in CHERI mode (e.g., for purecap user-space register context, IPC message registers, syscall args, etc). - In non-CHERI mode, this is just an integer and corresponds to unsigned long (word_t). - In any CHERI mode, this is a capability-width type and corresponds to CHERI's __uintcap_t. word_t: is used by seL4 as some type that can hold anything (eg unsigned long, and is the most widely used for mixed use cases such as pointers, integers, registers, etc. It's left it as it is to only represent "integers". vptr_t: conventionally only holds user pointers. Because we only support purecap user-space, any user pointers held in vptr_t need to be capabilities, hence this type is changed to __uintcap_t when in CHERI-mode. seL4_Register: newly added and is used in user-space to hold capability-width values which could be pointers or not. Only purecap CHERI mode is supported for users, so this is always defined to the kernel's rword_t type. Signed-off-by: Hesham Almatary <[email protected]>

Enable seL4's IPC buffer to seemingly hold CHERI-sized words. Message registers in the IPC buffer intuitively correspond to the underlying HW register/word's size/format, not an integer or a pointer. It is used during IPC for various use cases such as: 1) Transferring HW registers (eg read/write/copy TCBRegisterContext) 2) Transferring register context to a fault handler 3) Transferring HW-register-sized data between user threads 4) System call arguments (eg TLSBase, VCPU reg) Hence, it is a deliberate decision to have msg[] the same as rword_t/seL4_Register. This commits adds further required changes to an assumption seL4 is making where it assumes all of the IPC buffer's fields lower to word_t. This is obscure and is no longer true; the IPC buffer holds different types such as seL4_MessageInfo_t, seL4_Register, seL4_Word, and seL4_CPtr; any of which shouldn't be assumed to lower to word_t. Thus, we do the following changed: 1- When a buffer pointer is passed in the kernel without knowing what it's going to be used for or what fields are going to be referenced, make it a generic pointer (void *), and change the current assumption that it is always (word_t *). 2- When the buffer pointer is going to be dominantly used as syscall argument, message register, or HW register (all of which could contain CHERI pointers), make it a pointer to rword_t. 3- When a buffer pointer is going to be used as an integer or CPtr, leave it to point to word_t. 4- Remove obscure offsetting and pointer arithmetics that assume msg[] and all other fields lower to word_t (eg bufferPtr[seL4_MsgMaxLength 2 i] Allowing/prohibiting tagged CHERI capabilities to be passed inter-AS IPC is another separate issue that will need to be addressed separately. Signed-off-by: Hesham Almatary <[email protected]>

HW registers have another format and size when CHERI is enabled. This needs to be able to hold full CHERI HW registers all the time when CHERI is enabled. Signed-off-by: Hesham Almatary <[email protected]>

System call arguments need to be of HW-register size type (whether they are tagged CHERI capabilities or not) in order to fully support purecap user ABI, besides saving/restoring HW registers. Signed-off-by: Hesham Almatary <[email protected]>

This commit adds core files to manipulate CHERI capabilities and does a basic port to architecture-independent code to build and run the kernel in hybrid mode. It also adds shared system support in shared CMake and C files for supported CHERI architectures. In hybrid mode, any pointer created by the kernel and passed to user, needs to be CHERI capabilities for purecap user; this includes the IPC buffer, hence it needs to be manually annotated as a CHERI capability (with the __capability keyword). Signed-off-by: Hesham Almatary <[email protected]>

Bulk porting to save/restore full CHERI registers, hardware init, build system flags, new CHERI PTE bits, and boot the root task (and others) in purecap mode and create CHERI pointers for them. This only support building/running the kernel in hybrid CHERI mode. Signed-off-by: Hesham Almatary <[email protected]>

Signed-off-by: Hesham Almatary <[email protected]>

Morello/CHERI add PTE bits permissions to control capability propagations and further allow revocation techniques. The user should be able to make use of that to implement security policies and user-level revokers. The default is to allow capability loads/stores for purecap user, since we mainly care about spatial memory safety at this point. If the user wants to enforce and implement some security policy (e.g., temporal memory safety and revokers) to control CHERI capability propagations, they can use their own non-default VM rights. Signed-off-by: Hesham Almatary <[email protected]>

Signed-off-by: Hesham Almatary <[email protected]>

CHERI-RISC-V add PTE bits permissions to control capability propagations and further allow revocation techniques. The user should be able to make use of that to implement security policies and user-level revokers. The default is to allow capability loads/stores for purecap user, since we mainly care about spatial memory safety at this point. If the user wants to enforce and implement some security policy (e.g., temporal memory safety and revokers) to control CHERI capability propagations, they can use their own non-default VM rights Signed-off-by: Hesham Almatary <[email protected]>

kent-mcleod

Hi Hesham, thanks for putting up this hybrid PR. I'm excited to continue the discussion and to also try it out.

I've tried to make separate threads for the design-level discussions I wanted to have first. Some meta comments:

I think for deciding new type names and IPC buffer formats we need to try reach some sort of consensus that then potentially produces an RFC for any type changes visible to the API of existing configurations. This RFC then can get approved and implemented and hopefully merged without being blocked by other CHERI related changes.
Defining and deciding allowable behaviors for CHERI capability transfer to a different address space will have the most verification-visible impact I predict and so we need to figure out what to do here quite early in the review process.

kent-mcleod · 2024-11-11T00:19:05Z

include/arch/arm/arch/types.h

+typedef __uintcap_t rword_t;
+typedef __uintcap_t vptr_t;
+#else
+typedef word_t rword_t;


Trying to pick up the discussion from #1322 (comment):

That discussion suggested register_t could be considered for holding register values with seL4_Register potentially being added to the libsel4 types for the public variant. What are the reasons for selecting rword_t now beyond register_t already being used for register IDs?

The reason is trying to make this PR minimum (as register_t is already used by seL4 and needed more changes), but I'd very much like it to be register_t. This is based on our discussion at the seL4 summit that we want to only include the necessary parts for CHERI, so I made it an optional commit and separated this renaming part in a separate commit here: CTSRD-CHERI@1e4608f

I am more than happy to integrate that back if we all agree that's fine.

If there was consensus in using register_t for an actual register value type, then we would want to separately take that separate commit to apply in advance of this PR.

kent-mcleod · 2024-11-11T00:24:24Z

include/arch/arm/arch/types.h

@@ -15,14  17,21 @@ typedef signed long sword_t;
 /* for printf() formatting */
 #define _seL4_word_fmt  l

-typedef word_t vptr_t;


vptr_t is currently used by the kernel as an offset into a virtual address space and at this stage I don't agree that vptr_t variables should hold CHERI values. I'm interested in having the discussion about this.

For example: create_it_frame_cap() calls -> cap_frame_cap_new() and vptr is passed through as the mapping address that's written into the hardware page tables which has no use for CHERI permissions.

vptr_t is currently used as both user pointers (ipcbuf_vptr, v_entry, bi_frame_vptr, all Arm's cache operations, etc) and offsets/integers as you mentioned. When we need a type to be able to hold both pointers and integers, we just a CHERI type, hence I changed vptr_t.

We could split that up into two types if you'd like, one that's always a pointer, and another that's just an integer (though, I wouldn't include the ptr_t part in it because it's not really a pointer). However, that will require changes to existing function definitions, calls, and structures that might be more disruptive than my current changes.

Ok your right that sometimes the kernel uses vptr_t as offset into its own address space. I'm not sure whether this goes beyond the cache maintenance instructions though.

Do the cache maintenance operations that operate by VA change on Arm with CHERI to take a capability as the virtual address argument or is it just the address?

There is a place where the kernel does use a vptr_t value to initialize the initial thread's R0 for the address to the bootinfo struct. But it casts the vptr_t to a pointer first. This seems like a valid point to need to explicitly construct a new CHERI capability for launching this thread.

Ok your right that sometimes the kernel uses vptr_t as offset into its own address space. I'm not sure whether this goes beyond the cache maintenance instructions though.

Do the cache maintenance operations that operate by VA change on Arm with CHERI to take a capability as the virtual address argument or is it just the address?

Yes, they need a valid tagged capability in C64 mode. But in hybrid kernel it's not required.

There is a place where the kernel does use a vptr_t value to initialize the initial thread's R0 for the address to the bootinfo struct. But it casts the vptr_t to a pointer first. This seems like a valid point to need to explicitly construct a new CHERI capability for launching this thread.

Yeah, bootinfo passed to user, IPC buffer pointer (inside bootinfo struct), and user v_entry (written in the root thread's PC register). So 3 places at least where vptr_t is used as a pointer and needs to be a valid CHERI capability in a hybrid kernel.

But because it's the kernel managing CHERI pointers for a remote address space doesn't it make sense to pass around the raw address space offset and only construct the CHERI pointer right at the end when writing it into the user state?

kent-mcleod · 2024-11-11T00:51:24Z

libsel4/include/sel4/constants.h

+#if defined(CONFIG_HAVE_CHERI)
+    /* As CHERI doubles the register size, this affects the size of msg[]
+     * array, so we halve it here to have an overall fixed-size IPC buffer
+     * for both CHERI and non-CHERI builds. We further substract 1 more CHERI
+     * word to meet the alignment constraint set on the IPC buffer struct.
+     */
+    seL4_MsgMaxLength = 59
+#else
    seL4_MsgMaxLength = 120
+#endif


[cheri][ipc] Make IPC's msg[] seL4_Register type

Hence, it is a deliberate decision to have msg[] the same as
rword_t/seL4_Register.

This decision is something that should have more discussion and I think the design choice here likely needs to be presented at the level of the RFC.

Currently the IPC buffer is defined in terms of 120 words for all architectures where a word is the same size for a pointer and for data. A benefit of this choice is the direct mapping of msg registers onto hardware registers for fast IPC transfer.

I argue that seL4_Word msg[seL4_MsgMaxLength]; should be preserved on CHERI configurations to continue supporting programs and interfaces that make use the IPC buffer to transfer messages. When transferring values that aren't CHERI pointers (The common case), halving the number of message registers and doubling their size would add a bunch of porting effort and make it harder to write programs that target both CHERI and non-CHERI hardware. We should consider it's possible to avoid this.

There are other options we could consider such as defining an alternate seL4_IPCBuffer type that can be used for invocations that transfer hardware caps values. Or inserting a union into the storage occupied by the msg field that allows it to be used through a seL4_Word interface or instead through a seL4_HWCap interface. (I think it would be worth considering aligning up the msg field to 128bits in this scenario).

[cheri][ipc] Make IPC's msg[] seL4_Register type

Hence, it is a deliberate decision to have msg[] the same as
rword_t/seL4_Register.

This decision is something that should have more discussion and I think the design choice here likely needs to be presented at the level of the RFC.

I am happy to discuss that, I'll just quote my reasons in the commit message here again:

Enable seL4's IPC buffer to seemingly hold CHERI-sized words.
Message registers in the IPC buffer intuitively correspond to the
underlying HW register/word's size/format, not an integer or a pointer.
It is used during IPC for various use cases such as:

Transferring HW registers (eg read/write/copy TCBRegisterContext)

Transferring register context to a fault handler

Transferring HW-register-sized data between user threads

System call arguments (eg TLSBase, VCPU reg)

Currently the IPC buffer is defined in terms of 120 words for all architectures where a word is the same size for a pointer and for data. A benefit of this choice is the direct mapping of msg registers onto hardware registers for fast IPC transfer.

I argue that seL4_Word msg[seL4_MsgMaxLength]; should be preserved on CHERI configurations to continue supporting programs and interfaces that make use the IPC buffer to transfer messages. When transferring values that aren't CHERI pointers (The common case), halving the number of message registers and doubling their size would add a bunch of porting effort and make it harder to write programs that target both CHERI and non-CHERI hardware. We should consider it's possible to avoid this.

The common case is transferring untagged data "words" or messages, not 64-bit integers. seL4_Word is no longer representing a HW word here, it's just an integer. The name is just left as it is (probably misleadingly) to avoid more disruptive changes, but it's documented in the commit message and it should only be used to hold integers.

I believe that this change has no effect at all on continuing supporting programs and interfaces, on the contrary, it's necessary to:

naturally respect, align with, and intentionally support the purecap ABI (where your words are really no longer seL4_Word integers), and,

avoid extra porting and maintenance headache (e.g., casting, unions, ifdefs, etc). This claim is based on my experimentations with sel4test and sel4bench that worked just fine with that change whether it was 120 words, or 59 words, with no changes at all this part, whether for CHERI or non-CHERI builds.

Perform 2x faster IPC (e.g., doing 128-bit untagged data copies, instead of 64-bit), when transferring fixed message sizes.

I don't mind preserving the seL4_MsgMaxLength to be 120 as it is (and I actually kept it that way in the previous PR), but that will require increasing the IPC buffer size, and I got the impression you don't want that from the discussion on the previous PR, so I changed it here.

There are other options we could consider such as defining an alternate seL4_IPCBuffer type that can be used for invocations that transfer hardware caps values. Or inserting a union into the storage occupied by the msg field that allows it to be used through a seL4_Word interface or instead through a seL4_HWCap interface. (I think it would be worth considering aligning up the msg field to 128bits in this scenario).

Like I said before, I would prohibit any tagged pointer transfer via memory across tasks and address spaces.

I would limit tagged pointer passing only via the register arguments of a select few syscalls, and add new single register read/write ones to avoid any need for tagged pointer passing via the IPC buffer.

seL4 can disallow multiple mappings of tagged pointer pages, as enabled with the new CHERI PTE flag set by:

Only allowing a tagged pointer memory mapping if the page cap used is the only copy of that cap. This can be done by checking whether the parent object is an UT and whether there are no siblings.

Prohibit cap copying of a page cap which is mapped with the CHERI PTE flag.
You need the flag, you really don't want to enable tagged pointers for shared memory.

The above may result in a slightly bigger diff, but the actual behavioural and semantic change would be much smaller.

Hi Indan,

Like I said before, I would prohibit any tagged pointer transfer via memory across tasks and address spaces.

Yeah we agree on the across address spaces part.

I would limit tagged pointer passing only via the register arguments of a select few syscalls, and add new single register read/write ones to avoid any need for tagged pointer passing via the IPC buffer.

I have been thinking about this approach as well; do you think if the user maps the IPC buffer page without CHERI PTE capability load/store permissions, this will yield to that limiting behaviour? This way, only the fast message registers will be able to contain tagged pointers, and those get saved/restored from the TCB's archRegisters, not the IPC buffer.

seL4 can disallow multiple mappings of tagged pointer pages, as enabled with the new CHERI PTE flag set by:

Only allowing a tagged pointer memory mapping if the page cap used is the only copy of that cap. This can be done by checking whether the parent object is an UT and whether there are no siblings.

Prohibit cap copying of a page cap which is mapped with the CHERI PTE flag.
You need the flag, you really don't want to enable tagged pointers for shared memory.

Exactly, we generally disallow tagged pointers for shared memory in CheriBSD (eg mmap with MAP_SHARED).

The above may result in a slightly bigger diff, but the actual behavioural and semantic change would be much smaller.
I will think about this further.

Like I said before, I would prohibit any tagged pointer transfer via memory across tasks and address spaces.

I made a separate thread for having that discussion: #1344 (comment)

I'd like for this thread to be about whether the IPC buffer storage should change it's type.

seL4_Word is no longer representing a HW word here, it's just an integer. The name is just left as it is (probably misleadingly) to avoid more disruptive changes, but it's documented in the commit message and it should only be used to hold integers.

I don't think we agree on what a HW word is. Usually 1, it's the natural unit of data used by the processor which is still 64 bits in an ARM C64 architecture because the natural size of data used by instructions is still 64bits. I can see how a disagreement in this definition leads to downstream disagreements about what the natural size of base types should be things like the IPC buffer.

Follow 2), while trying to be more intentional and use seL4_Register in shared code (non-CHERI and CHERI) to differentiate between using integers, pointers, messages registers, HW registers, etc.

I think option 3 should be considered:

Specializing the seL4_Word to seL4_Register as a typedef is a pattern used elsewhere in libsel4 already,

It does distinguish that the type of the field is "register" based and does map to register transfers by the kernel,

The type change doesn't create any ABI break in non-CHERI architectures,

It shouldn't break existing C client code because all supported architectures have the same original type.

It gives a way for existing C code to be ported to CHERI aware while still being able to work with non-CHERI arch targets.

One possible downside is that non-C languages that generate language bindings via parsing the existing C headers may experience a breakage (eg a Rust bindgen approach), but we could ask about this when proposing the seL4_Register addition (it would need a small RFC).

@kent-mcleod Any preference on that change? I am happy to do either

I think my preference would need to be informed by trying to apply user level code to the changes. I know there's code that exists that uses a single 4k page for IPC buffer and some other thread specific data such as static TLS, and maybe it'd be better to keep the total IPC buffer size the same between C64 and A64.

No, we don't agree on that. Maybe Kent does, but I don't. I think it would increase the attack surface too much in practice if we would allow user space to do that. You don't for CHERI-BSD either, why open this gaping hole for seL4?

I think to merge something back upstream there should be acceptance of at least the design approach/goals and an acceptable implementation. I agree that transferring CHERI pointers across different virtual address spaces breaks a bunch of the monotonicity and confinement properties that CHERI enables. There's no way to use a mark and sweep approach for revoking capabilities if they can move across an address space for example.

@kent-mcleod Any preference on that change? I am happy to do either

I think my preference would need to be informed by trying to apply user level code to the changes. I know there's code that exists that uses a single 4k page for IPC buffer and some other thread specific data such as static TLS, and maybe it'd be better to keep the total IPC buffer size the same between C64 and A64.

Even when we increase the IPC buffer for CHERI, it'd still fit under 4k page. That is, it's only the structure size itself that got changed, but whenever you allocate an IPC buffer, it's still a 4k page as usual. No TLS or API changes were further required for that in sel4test/sel4bench.

I think to merge something back upstream there should be acceptance of at least the design approach/goals and an acceptable implementation. I agree that transferring CHERI pointers across different virtual address spaces breaks a bunch of the monotonicity and confinement properties that CHERI enables.

Yeah I am not disagreeing with that. I thought for this RFC:

We concluded that we globally, by default, forbid passing tagged pointers inter-AS.

We give the config option to the user to allow that for different threat models, given that they know what they're doing.

Even if the user allows 2, it still has the option to have strict CHERI security and avoid pointer aliasing by having non-overlapping address spaces for mutually distrusting (and untrustworthy compartments).

That way, we're making sure both seL4 and CHERI communities make the best use of this core CHERI-seL4 PR. We give the user the option to implement the most strict and/or informed flexible security policy based on their threat models. Where's the security gap here or the disagreement? What are the blockers?

There's no way to use a mark and sweep approach for revoking capabilities if they can move across an address space for example.

There are CHERI approaches to do so and to revoke propagated capabilities, however, this is a temporal safety issue that's beyond the scope of this project and isn't required to still have spatial memory safety. I listed it as future work, and this PR does allow it to be implemented by exporting the relative CHERI PTE bits. CheriBSD didn't have that for quite sometime, so it's totally fine for CHERI-seL4 to offer some spatial memory security and not address all CHERI issues we can think of immediately before this RFC/PR gets accepted; given that we know there are solutions for them and we don't forbid them. As I mentioned, this is just an incremental step to allow sensible CHERI use-cases on top with flexible security policies based on user's threat model.

Follow 2), while trying to be more intentional and use seL4_Register in shared code (non-CHERI and CHERI) to differentiate between using integers, pointers, messages registers, HW registers, etc.

I think option 3 should be considered:

Specializing the seL4_Word to seL4_Register as a typedef is a pattern used elsewhere in libsel4 already,

It does distinguish that the type of the field is "register" based and does map to register transfers by the kernel,

The type change doesn't create any ABI break in non-CHERI architectures,

It shouldn't break existing C client code because all supported architectures have the same original type.

It gives a way for existing C code to be ported to CHERI aware while still being able to work with non-CHERI arch targets.

This PR does all of that.

One possible downside is that non-C languages that generate language bindings via parsing the existing C headers may experience a breakage (eg a Rust bindgen approach), but we could ask about this when proposing the seL4_Register addition (it would need a small RFC).

kent-mcleod · 2024-11-11T01:04:03Z

libsel4/include/interfaces/object-api.xml

@@ -811,7  811,7 @@
             <description>
                An invocation for setting the Thread Local Storage (TLS) base address. This ensures that across all platforms, the TLSBase register is viewed as being completely mutable, just like all of the general purpose registers, even on platforms where modification is a privileged operation.
             </description>
-             <param dir="in" name="tls_base" type="seL4_Word"
+             <param dir="in" name="tls_base" type="seL4_Register"


Can we maintain a list of all the invocation and syscalls that are having their interfaces changed to transfer hardware caps?
My understanding so far is that the following change:

TCBSetTLSBase,

All invocations that take or return a seL4_UserContext

seL4_SetTLSBase

And what isn't expected to change:

VSpace operations that use a virtual address for mapping or cache operations.

seL4_TCB_SetIPCBuffer

(I assume that the seL4_ARM_SMC cap is out of scope at this stage too).

Good idea. I tried to separate such changes into separate "ABI" commits. seL4_ARM_SMC and seL4_ARM_VCPU_WriteRegs, seL4_ARM_VCPU_ReadRegs and future RISC-V hypervisor syscalls, etc are out of scope of this PR, but will need to change when we support hypervisors. Otherwise, what you mentioned are the minimum ones that were required to support passing purecap sel4test with the default configs.

kent-mcleod · 2024-11-11T01:09:50Z

src/kernel/thread.c

@@ -195,9  195,9 @@ void doReplyTransfer(tcb_t *sender, tcb_t *receiver, cte_t *slot, bool_t grant)
 #endif
 }

-void doNormalTransfer(tcb_t *sender, word_t *sendBuffer, endpoint_t *endpoint,


Whether transferring HWCaps in an IPC buffer to a thread of a different VSpace is allowed is also something that probably needs to be specified in the RFC. I think that if the restriction is added, it would also require adding more invocations that allow transferring HWCapabilities across an address space.

My opinion is that conceptually, if transferring HWCaps across an address space is prohibited, then it follows that performing TCBWriteRegisters when the target TCB is in a different address space should also be prohibited unless the caller can present sufficient authority - control of the target VSpace via a Vspace capability, which would introduce new invocation variations.

Whether transferring HWCaps in an IPC buffer to a thread of a different VSpace is allowed is also something that probably needs to be specified in the RFC. I think that if the restriction is added, it would also require adding more invocations that allow transferring HWCapabilities across an address space.

I touched a bit on that in the previous reply.

My opinion is that conceptually, if transferring HWCaps across an address space is prohibited, then it follows that performing TCBWriteRegisters when the target TCB is in a different address space should also be prohibited unless the caller can present sufficient authority - control of the target VSpace via a Vspace capability, which would introduce new invocation variations.

Yeah I agree. This is also discussed in the other PR, and I suggested using the CHERI sealing mechanism for that, which might get us away with not having to introduce new seL4 invocations. This PR is an incremental effort, however. It's not aiming to have the most secure/restrictive CHERI solution yet, but should allow this to happen. I am happy to explore and trial with sealing if you think this (i.e., prohibit TCBWriteRegisters and friends from de-referencing inter-AS CHERI caps) is a blocker and needs to be addressed now. But IMO, it's not too bad for this PR to say, "if you build with CHERI and want secure CHERI behaviour, don't overlap your user processes address spaces, until/unless you/we provide better fine-grained solution, otherwise you should know what you're doing at your own risk"

Are the goals of this PR for merging or to serve as an informal checkpoint of progress? I assume that a goal of the RFC is to achieve consensus on design of CHERI support for seL4. So we eventually need to address this issue before the RFC can produce a result that has code landed in the seL4 repo. But having this PR in its current state is still useful to be able to use experimentally with user level projects.

I guess this PR should maybe be given the draft status too.

Both actually. I made the PR in a good-enough shape to be merged as an incremental core base that can, in its current state, run complete purecap sel4test/sel4bench userspace (to provide spatial memory safety as proposed, but not temporal memory-safety as it's beyond the scope of this project). It doesn't prohibit future seL4 and/or CHERI use cases or experimentations, while allowing for flexible security policies at user-level (e.g., completely disallowing tagged CHERI caps in the IPC buffer, avoid CHERI pointer aliasing by asking the user to not have overlapping AS, having intermediate trusted servers, etc). But that's just my opinion.

What I'd like to get out of this RFC PR and work with you and the community towards is identifying:

Any CHERI design blockers that must be addressed before this RFC/PR gets accepted.

Whether the verification-visible changes in non-CHERI shared code are minimal enough to be tolerated.

Future optional CHERI design trade-offs that aren't blockers for this PR to get merged, but important to think about.

anything else?

I'd put sealing and associating endpoint permissions with CHERI under 3, but happy to discuss.

kent-mcleod · 2024-11-13T22:54:19Z

include/cheri/mode/cheri.h

+typedef __SIZE_TYPE__ ptraddr_t;
+
+void *__capability cheri_seal_cap(void *__capability unsealed_cap, size_t otype);
+void *__capability cheri_unseal_cap(void *__capability unsealed_cap);


How is authority for capability sealing and unsealing expected to be managed in seL4? What authority is needed to be able to seal a capability?

Capability sealing The CSeal and CUnseal instructions seal or unseal capabilities given a
suitable authorizing capability. Sealed capabilities allow software to implement encap-
sulation, such as is required for software compartmentalization.

How is authority for capability sealing and unsealing expected to be managed in seL4? What authority is needed to be able to seal a capability?

It currently isn't expected to be used in seL4 kernel itself (similar to CheriBSD). But it's just provided as CHERI helpers here in case we do in the future. The user, however, can use sealing as they wish (and there are many user-level compartmentalisation use-cases for it).

To answer your question, in order to seal/unseal a capability, you'd have to have a "sealer/unsealer" capability that:

Has sealing/unsealing permissions.

Has the same otype as the capability you're sealing/unsealing.

Capability sealing The CSeal and CUnseal instructions seal or unseal capabilities given a
suitable authorizing capability. Sealed capabilities allow software to implement encap-
sulation, such as is required for software compartmentalization.

I probably need to look more at some examples. Does a CHERI cap with sealing/unsealing permissions and no otype set represent authority to craft sealing/unsealing caps for any otype? Is that something that is controlled by the OS in cheriBSD or would it be given to the program's runtime? Thinking about how this could work on seL4. I think it's only relevant at this stage if there's suggestions of using sealing/unsealing to transfer CHERI pointers via IPC buffers.

I probably need to look more at some examples. Does a CHERI cap with sealing/unsealing permissions and no otype set represent authority to craft sealing/unsealing caps for any otype?

Yes

Is that something that is controlled by the OS in cheriBSD or would it be given to the program's runtime?

It's given to the program. CheriBSD doesn't manage or use that.

Thinking about how this could work on seL4. I think it's only relevant at this stage if there's suggestions of using sealing/unsealing to transfer CHERI pointers via IPC buffers.

Sealing and their invocations is still pretty much researchy. CheriBSD just lets the user do what they want with it, but we don't have production-ready uses for it AFAIK, and it's not used by default apart from sentries (just a special case for CFI). The sealing mechanism itself (e.g., otypes, invocations, unsealing, etc) also differs between Morello, our CHERI-RISC-V, CHERIoT, and the to-be-standardised CHERI-RISC-V draft. I wouldn't strongly bound this PR to it, but we could keep thinking about solutions using it for future use, and probably make suggestions to something like CHERI-RISC-V ISA draft to better accommodate CHERI-seL4 needs if we think of any.

Indanz · 2024-11-14T13:02:02Z

So I think we should try to have high-level discussions on seL4/rfcs#21 and only discuss implementation details here. We have unresolved high-level decisions that influence the implementation a lot.

One question I have is how this PR compares to Sid's work 2 years ago. The diff seems much smaller there:

https://gitlab.com/icecap-project/seL4/-/compare/pre-morello-upstream-sel4...morello?from_project_id=22234764

heshamelmatary · 2024-11-14T14:37:43Z

So I think we should try to have high-level discussions on seL4/rfcs#21 and only discuss implementation details here. We have unresolved high-level decisions that influence the implementation a lot.

Yeah I don't really mind discussing here or there. So we could just lift the 4 points or a subset of them to have there.

One question I have is how this PR compares to Sid's work 2 years ago. The diff seems much smaller there:

https://gitlab.com/icecap-project/seL4/-/compare/pre-morello-upstream-sel4...morello?from_project_id=22234764

To give a simple answer, Sid's work is doing the bare minimum that's only allowing hybrid userspace. It was a great first step towards what we have now during a 3 month internship, as indicated by him [1]. We've investigated whether we can build on it to provide what we wanted/achieved in this PR(clean generic support for all-purecap userspace), and it wasn't sufficient. I've been in contact with Sid as well and he himself even wants to use this PR along with our purecap kernel and userspace.

[1] https://sid-agrawal.ca/sel4,/cheri,/morello,/aarch64,/cheribsd/2023/01/01/seL4-CHERI.html

kent-mcleod · 2024-11-15T06:45:59Z

Yeah I don't really mind discussing here or there. So we could just lift the 4 points or a subset of them to have there.

I'm happy to redirect design-level discussions back to the RFC, but when we last caught up, we concluded that we wanted to see this PR before discussing the RFC more, so it's natural that we're using the current PR as the basis for discussing the issues.

arichardson · 2024-11-19T23:58:13Z

src/machine/cheri.c

+ */
+
+/*
+ * This file is copied from CHERI-LLVM's cheri_init_globals.h and modified to


Why does this need to be modified? Can't you use the compiler-provided header instead. It works for all other usecases? Adding a copy of this code seems like a bad idea.

If the compiler provided header does not work, we should fix that in the compiler and just require a new enough compiler for CHERI.

@arichardson It's because seL4 builds standalone and doesn't use or rely on any compiler headers (or standard libraries)

It uses -nostdinc -ffreestanding -fno-common -nostdlib when building the kernel and doesn't link to the compiler library. But as long as seL4 CHERI support is unverified, tweaking that a bit is preferred to duplicating compiler code.

You did add seL4 specific code to that file though, so it's not a pure copy either.

Yeah I added some helper functions along with some bootstrapping to initialise the CHERI-seL4 caps to that file. I tried to only contain such CHERI helpers in just two self-contained files (cheri.h and cheri.c). Also I added the option to do either DYNAMIC or capable relocs for Morello, which isn't there in the compiler toolchain file.

What FreeBSD does to still allow access to the compiler-provided headers is add Clang's resource dir as a explicit include directory, which is quite easy in CMake compared to the complicated makefile logic FreeBSD uses (https://github.com/freebsd/freebsd-src/blob/4b65481ac68a99122991fd73583d071c5e27c65a/share/mk/bsd.sys.mk#L285).

Something like

if (CHERI) execute_process(COMMAND ${CMAKE_C_COMPILER} --print-resource-directory OUTPUT_VARIABLE _clang_resource_dir) add_compile_options(-idirafter ${_clang_resource_dir}/include) endif()

heshamelmatary force-pushed the cheri_hybrid_kernel branch 2 times, most recently from 21edfd8 to bde9f25 Compare November 4, 2024 14:53

arichardson reviewed Nov 4, 2024

View reviewed changes

src/kernel/boot.c Outdated Show resolved Hide resolved

heshamelmatary added 13 commits November 4, 2024 16:17

[cheri][registers] retype HW reg variables

15758aa

HW registers have another format and size when CHERI is enabled. This needs to be able to hold full CHERI HW registers all the time when CHERI is enabled. Signed-off-by: Hesham Almatary <[email protected]>

[cheri][morello][ipc] Retype buffer arg to rword_t

5e9f635

Signed-off-by: Hesham Almatary <[email protected]>

[cheri][morello][abi] Enforce user purecap ABI

fac7d8f

Signed-off-by: Hesham Almatary <[email protected]>

[cheri][riscv] low-level architecture port

ec31db3

Signed-off-by: Hesham Almatary <[email protected]>

[cheri][riscv][ipc] Retype buffer arg to rword_t

74484b6

Signed-off-by: Hesham Almatary <[email protected]>

[cheri][riscv][abi] Enforce user purecap ABI

ff80131

Signed-off-by: Hesham Almatary <[email protected]>

heshamelmatary force-pushed the cheri_hybrid_kernel branch from bde9f25 to 73bbaea Compare November 4, 2024 16:17

kent-mcleod reviewed Nov 11, 2024

View reviewed changes

kent-mcleod reviewed Nov 13, 2024

View reviewed changes

heshamelmatary mentioned this pull request Nov 19, 2024

RFC-15: Support CHERI/Morello in seL4 seL4/rfcs#21

Open

arichardson reviewed Nov 19, 2024

View reviewed changes

lsf37 added the new-platform platform ports label Dec 4, 2024

[RFC-15] Add experimental CHERI support (hybrid kernel) #1344

Are you sure you want to change the base?

[RFC-15] Add experimental CHERI support (hybrid kernel) #1344

Conversation

heshamelmatary commented Nov 3, 2024 • edited by jira bot Loading

kent-mcleod left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

heshamelmatary Nov 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

heshamelmatary Nov 15, 2024 • edited Loading

Choose a reason for hiding this comment

Indanz commented Nov 14, 2024

heshamelmatary commented Nov 14, 2024

kent-mcleod commented Nov 15, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

heshamelmatary commented Nov 3, 2024 •

edited by jira bot

Loading

heshamelmatary Nov 14, 2024 •

edited

Loading

heshamelmatary Nov 15, 2024 •

edited

Loading