Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC-15] Add experimental CHERI support (hybrid kernel) #1344

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

heshamelmatary
Copy link
Contributor

@heshamelmatary heshamelmatary commented Nov 3, 2024

This PR adds experimental support for building and running the seL4 kernel in hybrid mode, and userspace in CHERI C (purecap) mode. The tested architectures are Morello (ARMv8) and CHERI-RISC-V, both RV32 and RV64. This is an incremental base port that allows further features and fine-grained security implementations to be implemented on top.

Only builds and runs with fixed default CMake configurations at the moment. For example, features like hypervisor and SMP aren't supported yet. Furthermore, It's not yet ready for others to build and run whole projects on top and requires supporting from other repos (eg build system, elfloader, OpenSBI, userspace libs, etc) that we will push support for later.
 
Future and ongoing work:

  1. Benchmarking.
  2. Complete sel4test/sel4bench purecap libraries.
  3. Purecap elfloader.
  4. Complete end-to-end purecap sel4test project that runs on all supported platforms.
  5. Further temporal safety and security design reconsiderations and enhancements (e.g., CHERI capability propagation inter-AS

This PR updates RFC-15, is a subset of (#1322), and depends on issue #1185 (just for Morello/Arm, and not CHERI-RISC-V)

@heshamelmatary heshamelmatary force-pushed the cheri_hybrid_kernel branch 2 times, most recently from 21edfd8 to bde9f25 Compare November 4, 2024 14:53
src/kernel/boot.c Outdated Show resolved Hide resolved
This change adds a new rword_t type for variables that may hold CHERI
capabilities at any point in the kernel.

rword_t: newly added to *always* hold capability-width variables in
CHERI mode (e.g., for purecap user-space register context, IPC message
registers, syscall args, etc).
- In non-CHERI mode, this is just an integer and corresponds to
unsigned long (word_t).
- In any CHERI mode,  this is a capability-width type and
corresponds to CHERI's __uintcap_t.

word_t: is used by seL4 as some type that can hold anything (eg
unsigned long, and is the most widely used for mixed use cases such as
pointers, integers, registers, etc. It's left it as it is to only
represent "integers".

vptr_t: conventionally only holds user pointers. Because we only
support purecap user-space, any user pointers held in vptr_t need to
be capabilities, hence this type is changed to __uintcap_t when in
CHERI-mode.

seL4_Register: newly added and is used in user-space to hold
capability-width values which could be pointers or not. Only purecap
CHERI mode is supported for users, so this is always defined to the
kernel's rword_t type.

Signed-off-by: Hesham Almatary <[email protected]>
Enable seL4's IPC buffer to seemingly hold CHERI-sized words.
Message registers in the IPC buffer intuitively correspond to the
underlying HW register/word's size/format, not an integer or a pointer.
It is used during IPC for various use cases such as:
1) Transferring HW registers (eg read/write/copy TCBRegisterContext)
2) Transferring register context to a fault handler
3) Transferring HW-register-sized data between user threads
4) System call arguments (eg TLSBase, VCPU reg)

Hence, it is a deliberate decision to have msg[] the same as
rword_t/seL4_Register.

This commits adds further required changes to an assumption seL4 is
making where it assumes all of the IPC buffer's fields lower to word_t.
This is obscure and is no longer true; the IPC buffer holds different
types such as seL4_MessageInfo_t, seL4_Register, seL4_Word, and
seL4_CPtr; any of which shouldn't be assumed to lower to word_t.
Thus, we do the following changed:
1- When a buffer pointer is passed in the kernel without knowing what
it's going to be used for or what fields are going to be referenced,
make it a generic pointer (void *), and change the current assumption
that it is always (word_t *).
2- When the buffer pointer is going to be dominantly used as syscall
argument, message register, or HW register (all of which could contain
CHERI pointers), make it a pointer to rword_t.
3- When a buffer pointer is going to be used as an integer or CPtr,
leave it to point to word_t.
4- Remove obscure offsetting and pointer arithmetics that assume msg[]
and all other fields lower to word_t (eg
bufferPtr[seL4_MsgMaxLength   2   i]

Allowing/prohibiting tagged CHERI capabilities to be passed inter-AS IPC
is another separate issue that will need to be addressed separately.

Signed-off-by: Hesham Almatary <[email protected]>
HW registers have another format and size when CHERI is enabled. This
needs to be able to hold full CHERI HW registers all the time when CHERI
is enabled.

Signed-off-by: Hesham Almatary <[email protected]>
System call arguments need to be of HW-register size type (whether
they are tagged CHERI capabilities or not) in order to fully support
purecap user ABI, besides saving/restoring HW registers.

Signed-off-by: Hesham Almatary <[email protected]>
This commit adds core files to manipulate CHERI capabilities and
does a basic port to architecture-independent code to build and run
the kernel in hybrid mode. It also adds shared system support in shared
CMake and C files for supported CHERI architectures.

In hybrid mode, any pointer created by the kernel and passed to user,
needs to be CHERI capabilities for purecap user; this includes the
IPC buffer, hence it needs to be manually annotated as a CHERI
capability (with the __capability keyword).

Signed-off-by: Hesham Almatary <[email protected]>
Bulk porting to save/restore full CHERI registers, hardware init,
build system flags, new CHERI PTE bits, and boot the root task
(and others) in purecap mode and create CHERI pointers for them.
This only support building/running the kernel in hybrid CHERI mode.

Signed-off-by: Hesham Almatary <[email protected]>
Morello/CHERI add PTE bits permissions to control capability
propagations and further allow revocation techniques. The user should
be able to make use of that to implement security policies and
user-level revokers.

The default is to allow capability loads/stores for purecap user, since
we mainly care about spatial memory safety at this point. If the user
wants to enforce and implement some security policy (e.g., temporal
memory safety and revokers) to control CHERI capability propagations,
they can use their own non-default VM rights.

Signed-off-by: Hesham Almatary <[email protected]>
CHERI-RISC-V add PTE bits permissions to control capability
propagations and further allow revocation techniques. The user should
be able to make use of that to implement security policies and
user-level revokers.

The default is to allow capability loads/stores for purecap user, since
we mainly care about spatial memory safety at this point. If the user
wants to enforce and implement some security policy (e.g., temporal
memory safety and revokers) to control CHERI capability propagations,
they can use their own non-default VM rights

Signed-off-by: Hesham Almatary <[email protected]>
Copy link
Member

@kent-mcleod kent-mcleod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Hesham, thanks for putting up this hybrid PR. I'm excited to continue the discussion and to also try it out.

I've tried to make separate threads for the design-level discussions I wanted to have first. Some meta comments:

  1. I think for deciding new type names and IPC buffer formats we need to try reach some sort of consensus that then potentially produces an RFC for any type changes visible to the API of existing configurations. This RFC then can get approved and implemented and hopefully merged without being blocked by other CHERI related changes.
  2. Defining and deciding allowable behaviors for CHERI capability transfer to a different address space will have the most verification-visible impact I predict and so we need to figure out what to do here quite early in the review process.

typedef __uintcap_t rword_t;
typedef __uintcap_t vptr_t;
#else
typedef word_t rword_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to pick up the discussion from #1322 (comment):

That discussion suggested register_t could be considered for holding register values with seL4_Register potentially being added to the libsel4 types for the public variant. What are the reasons for selecting rword_t now beyond register_t already being used for register IDs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is trying to make this PR minimum (as register_t is already used by seL4 and needed more changes), but I'd very much like it to be register_t. This is based on our discussion at the seL4 summit that we want to only include the necessary parts for CHERI, so I made it an optional commit and separated this renaming part in a separate commit here: CTSRD-CHERI@1e4608f

I am more than happy to integrate that back if we all agree that's fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there was consensus in using register_t for an actual register value type, then we would want to separately take that separate commit to apply in advance of this PR.

@@ -15,14 17,21 @@ typedef signed long sword_t;
/* for printf() formatting */
#define _seL4_word_fmt l

typedef word_t vptr_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vptr_t is currently used by the kernel as an offset into a virtual address space and at this stage I don't agree that vptr_t variables should hold CHERI values. I'm interested in having the discussion about this.

For example: create_it_frame_cap() calls -> cap_frame_cap_new() and vptr is passed through as the mapping address that's written into the hardware page tables which has no use for CHERI permissions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vptr_t is currently used as both user pointers (ipcbuf_vptr, v_entry, bi_frame_vptr, all Arm's cache operations, etc) and offsets/integers as you mentioned. When we need a type to be able to hold both pointers and integers, we just a CHERI type, hence I changed vptr_t.

We could split that up into two types if you'd like, one that's always a pointer, and another that's just an integer (though, I wouldn't include the ptr_t part in it because it's not really a pointer). However, that will require changes to existing function definitions, calls, and structures that might be more disruptive than my current changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok your right that sometimes the kernel uses vptr_t as offset into its own address space. I'm not sure whether this goes beyond the cache maintenance instructions though.

Do the cache maintenance operations that operate by VA change on Arm with CHERI to take a capability as the virtual address argument or is it just the address?

There is a place where the kernel does use a vptr_t value to initialize the initial thread's R0 for the address to the bootinfo struct. But it casts the vptr_t to a pointer first. This seems like a valid point to need to explicitly construct a new CHERI capability for launching this thread.

Copy link
Contributor Author

@heshamelmatary heshamelmatary Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok your right that sometimes the kernel uses vptr_t as offset into its own address space. I'm not sure whether this goes beyond the cache maintenance instructions though.

Do the cache maintenance operations that operate by VA change on Arm with CHERI to take a capability as the virtual address argument or is it just the address?

Yes, they need a valid tagged capability in C64 mode. But in hybrid kernel it's not required.

There is a place where the kernel does use a vptr_t value to initialize the initial thread's R0 for the address to the bootinfo struct. But it casts the vptr_t to a pointer first. This seems like a valid point to need to explicitly construct a new CHERI capability for launching this thread.

Yeah, bootinfo passed to user, IPC buffer pointer (inside bootinfo struct), and user v_entry (written in the root thread's PC register). So 3 places at least where vptr_t is used as a pointer and needs to be a valid CHERI capability in a hybrid kernel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But because it's the kernel managing CHERI pointers for a remote address space doesn't it make sense to pass around the raw address space offset and only construct the CHERI pointer right at the end when writing it into the user state?

Comment on lines 55 to 64
#if defined(CONFIG_HAVE_CHERI)
/* As CHERI doubles the register size, this affects the size of msg[]
* array, so we halve it here to have an overall fixed-size IPC buffer
* for both CHERI and non-CHERI builds. We further substract 1 more CHERI
* word to meet the alignment constraint set on the IPC buffer struct.
*/
seL4_MsgMaxLength = 59
#else
seL4_MsgMaxLength = 120
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[cheri][ipc] Make IPC's msg[] seL4_Register type

Hence, it is a deliberate decision to have msg[] the same as
rword_t/seL4_Register.

This decision is something that should have more discussion and I think the design choice here likely needs to be presented at the level of the RFC.

Currently the IPC buffer is defined in terms of 120 words for all architectures where a word is the same size for a pointer and for data. A benefit of this choice is the direct mapping of msg registers onto hardware registers for fast IPC transfer.

I argue that seL4_Word msg[seL4_MsgMaxLength]; should be preserved on CHERI configurations to continue supporting programs and interfaces that make use the IPC buffer to transfer messages. When transferring values that aren't CHERI pointers (The common case), halving the number of message registers and doubling their size would add a bunch of porting effort and make it harder to write programs that target both CHERI and non-CHERI hardware. We should consider it's possible to avoid this.

There are other options we could consider such as defining an alternate seL4_IPCBuffer type that can be used for invocations that transfer hardware caps values. Or inserting a union into the storage occupied by the msg field that allows it to be used through a seL4_Word interface or instead through a seL4_HWCap interface. (I think it would be worth considering aligning up the msg field to 128bits in this scenario).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[cheri][ipc] Make IPC's msg[] seL4_Register type

Hence, it is a deliberate decision to have msg[] the same as
rword_t/seL4_Register.

This decision is something that should have more discussion and I think the design choice here likely needs to be presented at the level of the RFC.

I am happy to discuss that, I'll just quote my reasons in the commit message here again:

Enable seL4's IPC buffer to seemingly hold CHERI-sized words.
Message registers in the IPC buffer intuitively correspond to the
underlying HW register/word's size/format, not an integer or a pointer.
It is used during IPC for various use cases such as:

  1. Transferring HW registers (eg read/write/copy TCBRegisterContext)
  2. Transferring register context to a fault handler
  3. Transferring HW-register-sized data between user threads
  4. System call arguments (eg TLSBase, VCPU reg)

Currently the IPC buffer is defined in terms of 120 words for all architectures where a word is the same size for a pointer and for data. A benefit of this choice is the direct mapping of msg registers onto hardware registers for fast IPC transfer.

I argue that seL4_Word msg[seL4_MsgMaxLength]; should be preserved on CHERI configurations to continue supporting programs and interfaces that make use the IPC buffer to transfer messages. When transferring values that aren't CHERI pointers (The common case), halving the number of message registers and doubling their size would add a bunch of porting effort and make it harder to write programs that target both CHERI and non-CHERI hardware. We should consider it's possible to avoid this.

The common case is transferring untagged data "words" or messages, not 64-bit integers. seL4_Word is no longer representing a HW word here, it's just an integer. The name is just left as it is (probably misleadingly) to avoid more disruptive changes, but it's documented in the commit message and it should only be used to hold integers.

I believe that this change has no effect at all on continuing supporting programs and interfaces, on the contrary, it's necessary to:

  1. naturally respect, align with, and intentionally support the purecap ABI (where your words are really no longer seL4_Word integers), and,
  2. avoid extra porting and maintenance headache (e.g., casting, unions, ifdefs, etc). This claim is based on my experimentations with sel4test and sel4bench that worked just fine with that change whether it was 120 words, or 59 words, with no changes at all this part, whether for CHERI or non-CHERI builds.
  3. Perform 2x faster IPC (e.g., doing 128-bit untagged data copies, instead of 64-bit), when transferring fixed message sizes.

I don't mind preserving the seL4_MsgMaxLength to be 120 as it is (and I actually kept it that way in the previous PR), but that will require increasing the IPC buffer size, and I got the impression you don't want that from the discussion on the previous PR, so I changed it here.

There are other options we could consider such as defining an alternate seL4_IPCBuffer type that can be used for invocations that transfer hardware caps values. Or inserting a union into the storage occupied by the msg field that allows it to be used through a seL4_Word interface or instead through a seL4_HWCap interface. (I think it would be worth considering aligning up the msg field to 128bits in this scenario).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said before, I would prohibit any tagged pointer transfer via memory across tasks and address spaces.

I would limit tagged pointer passing only via the register arguments of a select few syscalls, and add new single register read/write ones to avoid any need for tagged pointer passing via the IPC buffer.

seL4 can disallow multiple mappings of tagged pointer pages, as enabled with the new CHERI PTE flag set by:

  • Only allowing a tagged pointer memory mapping if the page cap used is the only copy of that cap. This can be done by checking whether the parent object is an UT and whether there are no siblings.
  • Prohibit cap copying of a page cap which is mapped with the CHERI PTE flag.
    You need the flag, you really don't want to enable tagged pointers for shared memory.

The above may result in a slightly bigger diff, but the actual behavioural and semantic change would be much smaller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Indan,

Like I said before, I would prohibit any tagged pointer transfer via memory across tasks and address spaces.

Yeah we agree on the across address spaces part.

I would limit tagged pointer passing only via the register arguments of a select few syscalls, and add new single register read/write ones to avoid any need for tagged pointer passing via the IPC buffer.

I have been thinking about this approach as well; do you think if the user maps the IPC buffer page without CHERI PTE capability load/store permissions, this will yield to that limiting behaviour? This way, only the fast message registers will be able to contain tagged pointers, and those get saved/restored from the TCB's archRegisters, not the IPC buffer.

seL4 can disallow multiple mappings of tagged pointer pages, as enabled with the new CHERI PTE flag set by:

  • Only allowing a tagged pointer memory mapping if the page cap used is the only copy of that cap. This can be done by checking whether the parent object is an UT and whether there are no siblings.
  • Prohibit cap copying of a page cap which is mapped with the CHERI PTE flag.
    You need the flag, you really don't want to enable tagged pointers for shared memory.

Exactly, we generally disallow tagged pointers for shared memory in CheriBSD (eg mmap with MAP_SHARED).

The above may result in a slightly bigger diff, but the actual behavioural and semantic change would be much smaller.
I will think about this further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said before, I would prohibit any tagged pointer transfer via memory across tasks and address spaces.

I made a separate thread for having that discussion: #1344 (comment)

I'd like for this thread to be about whether the IPC buffer storage should change it's type.

seL4_Word is no longer representing a HW word here, it's just an integer. The name is just left as it is (probably misleadingly) to avoid more disruptive changes, but it's documented in the commit message and it should only be used to hold integers.

I don't think we agree on what a HW word is. Usually 1, it's the natural unit of data used by the processor which is still 64 bits in an ARM C64 architecture because the natural size of data used by instructions is still 64bits. I can see how a disagreement in this definition leads to downstream disagreements about what the natural size of base types should be things like the IPC buffer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Follow 2), while trying to be more intentional and use seL4_Register in shared code (non-CHERI and CHERI) to differentiate between using integers, pointers, messages registers, HW registers, etc.

I think option 3 should be considered:

  • Specializing the seL4_Word to seL4_Register as a typedef is a pattern used elsewhere in libsel4 already,
  • It does distinguish that the type of the field is "register" based and does map to register transfers by the kernel,
  • The type change doesn't create any ABI break in non-CHERI architectures,
  • It shouldn't break existing C client code because all supported architectures have the same original type.
  • It gives a way for existing C code to be ported to CHERI aware while still being able to work with non-CHERI arch targets.

One possible downside is that non-C languages that generate language bindings via parsing the existing C headers may experience a breakage (eg a Rust bindgen approach), but we could ask about this when proposing the seL4_Register addition (it would need a small RFC).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kent-mcleod Any preference on that change? I am happy to do either

I think my preference would need to be informed by trying to apply user level code to the changes. I know there's code that exists that uses a single 4k page for IPC buffer and some other thread specific data such as static TLS, and maybe it'd be better to keep the total IPC buffer size the same between C64 and A64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't agree on that. Maybe Kent does, but I don't. I think it would increase the attack surface too much in practice if we would allow user space to do that. You don't for CHERI-BSD either, why open this gaping hole for seL4?

I think to merge something back upstream there should be acceptance of at least the design approach/goals and an acceptable implementation. I agree that transferring CHERI pointers across different virtual address spaces breaks a bunch of the monotonicity and confinement properties that CHERI enables. There's no way to use a mark and sweep approach for revoking capabilities if they can move across an address space for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kent-mcleod Any preference on that change? I am happy to do either

I think my preference would need to be informed by trying to apply user level code to the changes. I know there's code that exists that uses a single 4k page for IPC buffer and some other thread specific data such as static TLS, and maybe it'd be better to keep the total IPC buffer size the same between C64 and A64.

Even when we increase the IPC buffer for CHERI, it'd still fit under 4k page. That is, it's only the structure size itself that got changed, but whenever you allocate an IPC buffer, it's still a 4k page as usual. No TLS or API changes were further required for that in sel4test/sel4bench.

I think to merge something back upstream there should be acceptance of at least the design approach/goals and an acceptable implementation. I agree that transferring CHERI pointers across different virtual address spaces breaks a bunch of the monotonicity and confinement properties that CHERI enables.

Yeah I am not disagreeing with that. I thought for this RFC:

  1. We concluded that we globally, by default, forbid passing tagged pointers inter-AS.
  2. We give the config option to the user to allow that for different threat models, given that they know what they're doing.
  3. Even if the user allows 2, it still has the option to have strict CHERI security and avoid pointer aliasing by having non-overlapping address spaces for mutually distrusting (and untrustworthy compartments).

That way, we're making sure both seL4 and CHERI communities make the best use of this core CHERI-seL4 PR. We give the user the option to implement the most strict and/or informed flexible security policy based on their threat models. Where's the security gap here or the disagreement? What are the blockers?

There's no way to use a mark and sweep approach for revoking capabilities if they can move across an address space for example.

There are CHERI approaches to do so and to revoke propagated capabilities, however, this is a temporal safety issue that's beyond the scope of this project and isn't required to still have spatial memory safety. I listed it as future work, and this PR does allow it to be implemented by exporting the relative CHERI PTE bits. CheriBSD didn't have that for quite sometime, so it's totally fine for CHERI-seL4 to offer some spatial memory security and not address all CHERI issues we can think of immediately before this RFC/PR gets accepted; given that we know there are solutions for them and we don't forbid them. As I mentioned, this is just an incremental step to allow sensible CHERI use-cases on top with flexible security policies based on user's threat model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Follow 2), while trying to be more intentional and use seL4_Register in shared code (non-CHERI and CHERI) to differentiate between using integers, pointers, messages registers, HW registers, etc.

I think option 3 should be considered:

  • Specializing the seL4_Word to seL4_Register as a typedef is a pattern used elsewhere in libsel4 already,
  • It does distinguish that the type of the field is "register" based and does map to register transfers by the kernel,
  • The type change doesn't create any ABI break in non-CHERI architectures,
  • It shouldn't break existing C client code because all supported architectures have the same original type.
  • It gives a way for existing C code to be ported to CHERI aware while still being able to work with non-CHERI arch targets.

This PR does all of that.

One possible downside is that non-C languages that generate language bindings via parsing the existing C headers may experience a breakage (eg a Rust bindgen approach), but we could ask about this when proposing the seL4_Register addition (it would need a small RFC).

@@ -811,7 811,7 @@
<description>
An invocation for setting the Thread Local Storage (TLS) base address. This ensures that across all platforms, the TLSBase register is viewed as being completely mutable, just like all of the general purpose registers, even on platforms where modification is a privileged operation.
</description>
<param dir="in" name="tls_base" type="seL4_Word"
<param dir="in" name="tls_base" type="seL4_Register"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maintain a list of all the invocation and syscalls that are having their interfaces changed to transfer hardware caps?
My understanding so far is that the following change:

  • TCBSetTLSBase,
  • All invocations that take or return a seL4_UserContext
  • seL4_SetTLSBase

And what isn't expected to change:

  • VSpace operations that use a virtual address for mapping or cache operations.
  • seL4_TCB_SetIPCBuffer

(I assume that the seL4_ARM_SMC cap is out of scope at this stage too).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I tried to separate such changes into separate "ABI" commits. seL4_ARM_SMC and seL4_ARM_VCPU_WriteRegs, seL4_ARM_VCPU_ReadRegs and future RISC-V hypervisor syscalls, etc are out of scope of this PR, but will need to change when we support hypervisors. Otherwise, what you mentioned are the minimum ones that were required to support passing purecap sel4test with the default configs.

@@ -195,9 195,9 @@ void doReplyTransfer(tcb_t *sender, tcb_t *receiver, cte_t *slot, bool_t grant)
#endif
}

void doNormalTransfer(tcb_t *sender, word_t *sendBuffer, endpoint_t *endpoint,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether transferring HWCaps in an IPC buffer to a thread of a different VSpace is allowed is also something that probably needs to be specified in the RFC. I think that if the restriction is added, it would also require adding more invocations that allow transferring HWCapabilities across an address space.

My opinion is that conceptually, if transferring HWCaps across an address space is prohibited, then it follows that performing TCBWriteRegisters when the target TCB is in a different address space should also be prohibited unless the caller can present sufficient authority - control of the target VSpace via a Vspace capability, which would introduce new invocation variations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether transferring HWCaps in an IPC buffer to a thread of a different VSpace is allowed is also something that probably needs to be specified in the RFC. I think that if the restriction is added, it would also require adding more invocations that allow transferring HWCapabilities across an address space.

I touched a bit on that in the previous reply.

My opinion is that conceptually, if transferring HWCaps across an address space is prohibited, then it follows that performing TCBWriteRegisters when the target TCB is in a different address space should also be prohibited unless the caller can present sufficient authority - control of the target VSpace via a Vspace capability, which would introduce new invocation variations.

Yeah I agree. This is also discussed in the other PR, and I suggested using the CHERI sealing mechanism for that, which might get us away with not having to introduce new seL4 invocations. This PR is an incremental effort, however. It's not aiming to have the most secure/restrictive CHERI solution yet, but should allow this to happen. I am happy to explore and trial with sealing if you think this (i.e., prohibit TCBWriteRegisters and friends from de-referencing inter-AS CHERI caps) is a blocker and needs to be addressed now. But IMO, it's not too bad for this PR to say, "if you build with CHERI and want secure CHERI behaviour, don't overlap your user processes address spaces, until/unless you/we provide better fine-grained solution, otherwise you should know what you're doing at your own risk"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the goals of this PR for merging or to serve as an informal checkpoint of progress? I assume that a goal of the RFC is to achieve consensus on design of CHERI support for seL4. So we eventually need to address this issue before the RFC can produce a result that has code landed in the seL4 repo. But having this PR in its current state is still useful to be able to use experimentally with user level projects.

I guess this PR should maybe be given the draft status too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both actually. I made the PR in a good-enough shape to be merged as an incremental core base that can, in its current state, run complete purecap sel4test/sel4bench userspace (to provide spatial memory safety as proposed, but not temporal memory-safety as it's beyond the scope of this project). It doesn't prohibit future seL4 and/or CHERI use cases or experimentations, while allowing for flexible security policies at user-level (e.g., completely disallowing tagged CHERI caps in the IPC buffer, avoid CHERI pointer aliasing by asking the user to not have overlapping AS, having intermediate trusted servers, etc). But that's just my opinion.

What I'd like to get out of this RFC PR and work with you and the community towards is identifying:

  1. Any CHERI design blockers that must be addressed before this RFC/PR gets accepted.
  2. Whether the verification-visible changes in non-CHERI shared code are minimal enough to be tolerated.
  3. Future optional CHERI design trade-offs that aren't blockers for this PR to get merged, but important to think about.
  4. anything else?

I'd put sealing and associating endpoint permissions with CHERI under 3, but happy to discuss.

typedef __SIZE_TYPE__ ptraddr_t;

void *__capability cheri_seal_cap(void *__capability unsealed_cap, size_t otype);
void *__capability cheri_unseal_cap(void *__capability unsealed_cap);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is authority for capability sealing and unsealing expected to be managed in seL4? What authority is needed to be able to seal a capability?

Capability sealing The CSeal and CUnseal instructions seal or unseal capabilities given a
suitable authorizing capability. Sealed capabilities allow software to implement encap-
sulation, such as is required for software compartmentalization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is authority for capability sealing and unsealing expected to be managed in seL4? What authority is needed to be able to seal a capability?

It currently isn't expected to be used in seL4 kernel itself (similar to CheriBSD). But it's just provided as CHERI helpers here in case we do in the future. The user, however, can use sealing as they wish (and there are many user-level compartmentalisation use-cases for it).

To answer your question, in order to seal/unseal a capability, you'd have to have a "sealer/unsealer" capability that:

  1. Has sealing/unsealing permissions.
  2. Has the same otype as the capability you're sealing/unsealing.

Capability sealing The CSeal and CUnseal instructions seal or unseal capabilities given a
suitable authorizing capability. Sealed capabilities allow software to implement encap-
sulation, such as is required for software compartmentalization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably need to look more at some examples. Does a CHERI cap with sealing/unsealing permissions and no otype set represent authority to craft sealing/unsealing caps for any otype? Is that something that is controlled by the OS in cheriBSD or would it be given to the program's runtime? Thinking about how this could work on seL4. I think it's only relevant at this stage if there's suggestions of using sealing/unsealing to transfer CHERI pointers via IPC buffers.

Copy link
Contributor Author

@heshamelmatary heshamelmatary Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably need to look more at some examples. Does a CHERI cap with sealing/unsealing permissions and no otype set represent authority to craft sealing/unsealing caps for any otype?

Yes

Is that something that is controlled by the OS in cheriBSD or would it be given to the program's runtime?

It's given to the program. CheriBSD doesn't manage or use that.

Thinking about how this could work on seL4. I think it's only relevant at this stage if there's suggestions of using sealing/unsealing to transfer CHERI pointers via IPC buffers.

Sealing and their invocations is still pretty much researchy. CheriBSD just lets the user do what they want with it, but we don't have production-ready uses for it AFAIK, and it's not used by default apart from sentries (just a special case for CFI). The sealing mechanism itself (e.g., otypes, invocations, unsealing, etc) also differs between Morello, our CHERI-RISC-V, CHERIoT, and the to-be-standardised CHERI-RISC-V draft. I wouldn't strongly bound this PR to it, but we could keep thinking about solutions using it for future use, and probably make suggestions to something like CHERI-RISC-V ISA draft to better accommodate CHERI-seL4 needs if we think of any.

@Indanz
Copy link
Contributor

Indanz commented Nov 14, 2024

So I think we should try to have high-level discussions on seL4/rfcs#21 and only discuss implementation details here. We have unresolved high-level decisions that influence the implementation a lot.

One question I have is how this PR compares to Sid's work 2 years ago. The diff seems much smaller there:

https://gitlab.com/icecap-project/seL4/-/compare/pre-morello-upstream-sel4...morello?from_project_id=22234764

@heshamelmatary
Copy link
Contributor Author

So I think we should try to have high-level discussions on seL4/rfcs#21 and only discuss implementation details here. We have unresolved high-level decisions that influence the implementation a lot.

Yeah I don't really mind discussing here or there. So we could just lift the 4 points or a subset of them to have there.

One question I have is how this PR compares to Sid's work 2 years ago. The diff seems much smaller there:

https://gitlab.com/icecap-project/seL4/-/compare/pre-morello-upstream-sel4...morello?from_project_id=22234764

To give a simple answer, Sid's work is doing the bare minimum that's only allowing hybrid userspace. It was a great first step towards what we have now during a 3 month internship, as indicated by him [1]. We've investigated whether we can build on it to provide what we wanted/achieved in this PR(clean generic support for all-purecap userspace), and it wasn't sufficient. I've been in contact with Sid as well and he himself even wants to use this PR along with our purecap kernel and userspace.

[1] https://sid-agrawal.ca/sel4,/cheri,/morello,/aarch64,/cheribsd/2023/01/01/seL4-CHERI.html

@kent-mcleod
Copy link
Member

Yeah I don't really mind discussing here or there. So we could just lift the 4 points or a subset of them to have there.

I'm happy to redirect design-level discussions back to the RFC, but when we last caught up, we concluded that we wanted to see this PR before discussing the RFC more, so it's natural that we're using the current PR as the basis for discussing the issues.

*/

/*
* This file is copied from CHERI-LLVM's cheri_init_globals.h and modified to

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be modified? Can't you use the compiler-provided header instead. It works for all other usecases? Adding a copy of this code seems like a bad idea.

If the compiler provided header does not work, we should fix that in the compiler and just require a new enough compiler for CHERI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arichardson It's because seL4 builds standalone and doesn't use or rely on any compiler headers (or standard libraries)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses -nostdinc -ffreestanding -fno-common -nostdlib when building the kernel and doesn't link to the compiler library. But as long as seL4 CHERI support is unverified, tweaking that a bit is preferred to duplicating compiler code.

You did add seL4 specific code to that file though, so it's not a pure copy either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I added some helper functions along with some bootstrapping to initialise the CHERI-seL4 caps to that file. I tried to only contain such CHERI helpers in just two self-contained files (cheri.h and cheri.c). Also I added the option to do either DYNAMIC or capable relocs for Morello, which isn't there in the compiler toolchain file.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What FreeBSD does to still allow access to the compiler-provided headers is add Clang's resource dir as a explicit include directory, which is quite easy in CMake compared to the complicated makefile logic FreeBSD uses (https://github.com/freebsd/freebsd-src/blob/4b65481ac68a99122991fd73583d071c5e27c65a/share/mk/bsd.sys.mk#L285).

Something like

if (CHERI)
  execute_process(COMMAND ${CMAKE_C_COMPILER} --print-resource-directory OUTPUT_VARIABLE _clang_resource_dir)
  add_compile_options(-idirafter ${_clang_resource_dir}/include)
endif()

@lsf37 lsf37 added the new-platform platform ports label Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new-platform platform ports
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants