Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACP: add slice::split_once #102

Closed
Benjamin-L opened this issue Sep 11, 2022 · 5 comments
Closed

ACP: add slice::split_once #102

Benjamin-L opened this issue Sep 11, 2022 · 5 comments
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@Benjamin-L
Copy link

Benjamin-L commented Sep 11, 2022

Proposal

Add slice::split_once and slice::rsplit_once methods, analogous to the existing str::split_once and str::rsplit_once methods.

Problem statement

When doing ad-hoc parsing of a format that isn't guaranteed to be valid unicode, it's often useful to split byte slices on the first occurrence a specific delimiter. There isn't currently an API that expresses this directly for byte slices, although there is for strings.

Motivation, use-cases

There are some examples in aprs-parser-rs crate, which is being refactored from treating its input data as strings to using byte-slices. APRS packets consist of a header and a body, separated by a b':' byte. This is currently being parsed like this:

let header_delimiter = s
    .iter()
    .position(|x| *x == b':')
    .ok_or_else(|| AprsError::InvalidPacket(s.to_owned()))?;
let (header, rest) = s.split_at(header_delimiter);
let body = &rest[1..];

Solution sketches

There are currently two options to do this in stable rust.

Using position to find the delimiter's index, then splitting it on that index and explicitly rejecting the first byte of the second slice (which contains the delimiter):

let v = b"first:second";
let split_index = v.iter().position(|&x| x == b':')?;
let (first, second) = v.split_at(split_index);
let second = &second[1..];

Using splitn:

let v = b"first:second";
let split = v.splitn(2, |&x| x == b':');
let first = split.next()?;
let second = split.next()?;

These options are okay, but not great. They're both relatively verbose and don't express the actual intention very directly. They also have the issue that mistakes aren't necessarily going to show up in the type system.

With strings, there is currently a split_once method, that handles this exact use case:

let v = "first:second";
let (first, second) = v.split_once(':')?;

A similar method could be added for slices:

pub fn split_once<F>(&self, pred: F) -> Option<(&[T], &[T])>
    where F: FnMut(&T) -> bool
{
    let index = self.iter().position(pred)?;
    Some((&self[..index], &self[index 1..]))
}

Along with an rsplit_once equivalent. I also think it might make sense to add split_once_mut and rsplit_once_mut, however those don't currently exist for str.

Links and related work

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

@Benjamin-L Benjamin-L added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Sep 11, 2022
@pitaj
Copy link

pitaj commented Oct 20, 2022

What about a more general form?

pub fn array_split<const N: usize>(&'a self, pred: F) -> Option<[&[T]; N]>
    where F: FnMut(&T) -> bool

Could be used the same way by requesting an array of length 2, but would allow for usage in cases where one wants to split exactly twice, or exactly thrice, etc

@Benjamin-L
Copy link
Author

I wouldn't necessarily be opposed to that, but I still think slice::split_once makes sense for symmetry with str::split_once if nothing else. When str::split_once was introduced, this kind of option was discussed, although at the time const generics weren't stable:

In theory, once const-generics are done, this functionality could be achieved without a dedicated method with

match s.splitn(delimier, 2).collect_array::<2>() {
  Some([prefix, suffix]) => todo!(),
  None => todo!(),
}

Even in that world, having a dedicated method seems clearer on the intention.

I think I agree that, even now that we have const generics, splitting into exactly two slices is often a special case, and a dedicated method for it is more clear.

@the8472
Copy link
Member

the8472 commented Oct 20, 2022

How does this differ from the things proposed in #69?

@Benjamin-L
Copy link
Author

How does this differ from the things proposed in #69?

#69 is about splitting a slice on a constant index to produce arrays. This proposal is for splitting on a delimiter.

@m-ou-se
Copy link
Member

m-ou-se commented May 18, 2023

This seems fine. Feel free to open a tracking issue and open a PR to rust-lang/rust to add it as an unstable feature.

@m-ou-se m-ou-se closed this as completed May 18, 2023
@m-ou-se m-ou-se added the ACP-accepted API Change Proposal is accepted (seconded with no objections) label May 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

4 participants