feat(core/str): implement str::first_char, last_char, split_first_char, split_last_char#154544

Open
F0RREALTHO wants to merge 2 commits into rust-lang:main from F0RREALTHO:feat/str-first-last-char

@F0RREALTHO F0RREALTHO commented Mar 29, 2026

Tracking issue: #154393

What this PR does

Implements the four str methods proposed in the tracking issue:

  • str::first_char(&self) -> Option<char>
  • str::last_char(&self) -> Option<char>
  • str::split_first_char(&self) -> Option<(char, &str)>
  • str::split_last_char(&self) -> Option<(char, &str)>

All methods are const fn and correctly handle all UTF-8 character widths (1, 2, 3, and 4 bytes).

Implementation notes

  • UTF-8 decoding is done manually using utf8_char_width to stay const-compatible
  • split_first_char and split_last_char are the core methods; first_char and last_char delegate to them
  • All methods have #[must_use] and #[inline]
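
For readers following along, the intended behaviour and the delegation described above can be sketched on stable Rust. This is a minimal illustration, not the PR's implementation: it uses free functions (hypothetical stand-ins for the proposed methods), is not const, and relies on the existing chars iterator instead of manual decoding.

```rust
// Hypothetical free-function stand-ins for the proposed `str` methods.
// Assumption: these are NOT const and NOT the PR's code; they use `Chars`.

/// Returns the first char and the rest of the string, or `None` if empty.
fn split_first_char(s: &str) -> Option<(char, &str)> {
    let mut chars = s.chars();
    let first = chars.next()?;
    // `Chars::as_str` yields the part of the string not yet consumed.
    Some((first, chars.as_str()))
}

/// Returns the last char and everything before it, or `None` if empty.
fn split_last_char(s: &str) -> Option<(char, &str)> {
    let mut chars = s.chars();
    let last = chars.next_back()?;
    Some((last, chars.as_str()))
}

/// Delegates to `split_first_char`, mirroring the structure in the notes.
fn first_char(s: &str) -> Option<char> {
    split_first_char(s).map(|(c, _)| c)
}

/// Delegates to `split_last_char`.
fn last_char(s: &str) -> Option<char> {
    split_last_char(s).map(|(c, _)| c)
}
```

With these stand-ins, `split_first_char("héllo")` gives `Some(('h', "éllo"))` and `split_last_char("日本語")` gives `Some(('語', "日本"))`; every function returns `None` for an empty string.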

Tests

  • Runtime tests covering empty strings, ASCII, 2-byte, 3-byte, and 4-byte characters
  • Compile-time const evaluation tests proving these work as const fn
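
A compile-time test of this kind can be sketched as follows. The helper names utf8_width and first_char_width are hypothetical; utf8_width only approximates what core's internal utf8_char_width does, and the snippet illustrates the const-assertion pattern, not the actual tests in this PR.

```rust
// Hypothetical approximation of core's internal `utf8_char_width`:
// maps a lead byte to the length of its UTF-8 sequence (0 if not a lead byte).
const fn utf8_width(lead: u8) -> usize {
    match lead {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => 0, // continuation byte or invalid lead byte
    }
}

// Hypothetical helper: byte width of the first char, `None` for an empty string.
const fn first_char_width(s: &str) -> Option<usize> {
    match s.as_bytes() {
        [] => None,
        [lead, ..] => Some(utf8_width(*lead)),
    }
}

// Evaluated at compile time: a failing assertion is a build error.
// `matches!` is used because `==` on `Option` is not callable in const context.
const _: () = {
    assert!(matches!(first_char_width(""), None));
    assert!(matches!(first_char_width("a"), Some(1)));
    assert!(matches!(first_char_width("é"), Some(2)));
    assert!(matches!(first_char_width("€"), Some(3)));
    assert!(matches!(first_char_width("😀"), Some(4)));
};
```

If any assertion inside the `const _` block were false, the crate would fail to compile, which is what makes this form useful for proving that a function really is usable as a const fn.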

Open question

The tracking issue has an unresolved question about the _char suffix. This PR uses _char as proposed in the ACP, but I'm happy to rename if the team decides otherwise.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 29, 2026
@rustbot
Collaborator

rustbot commented Mar 29, 2026

r? @jhpratt

rustbot has assigned @jhpratt.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @scottmcm, libs
  • @scottmcm, libs expanded to 8 candidates
  • Random selection from Mark-Simulacrum, jhpratt, scottmcm

@rustbot

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@F0RREALTHO F0RREALTHO force-pushed the feat/str-first-last-char branch from 504171b to e3b58c0 Compare March 29, 2026 09:58
@rustbot

This comment has been minimized.

@F0RREALTHO F0RREALTHO force-pushed the feat/str-first-last-char branch from e3b58c0 to 8a6c194 Compare March 29, 2026 10:05
@rust-log-analyzer

This comment has been minimized.

@F0RREALTHO F0RREALTHO force-pushed the feat/str-first-last-char branch from 8a6c194 to 3b44c45 Compare March 29, 2026 11:44
@GrigorenkoPV
Contributor

Not sure if the split_* methods should actually be #[inline]

@F0RREALTHO
Author

Not sure if the split_* methods should actually be #[inline]

That's a fair point. I included #[inline] to match the existing split_at and chars methods, but I agree these have more logic branches. I'll leave them for now and am happy to remove them if @jhpratt thinks the binary bloat outweighs the inlining benefit here.

@GrigorenkoPV
Contributor

GrigorenkoPV commented Mar 29, 2026

Fixes #154393

This will make GitHub close the tracking issue after the PR gets merged, I think it should be "Tracking issue: #154393" instead.

Anyways, congrats on the first contribution and welcome!

@F0RREALTHO F0RREALTHO force-pushed the feat/str-first-last-char branch from 3b44c45 to 5622b1d Compare March 29, 2026 12:48
@F0RREALTHO
Author

This will make GitHub close the tracking issue after the PR gets merged, I think it should be "Tracking issue: #154393" instead.

Anyways, congrats on the first contribution and welcome!

Thank you for the warm welcome! That makes total sense. I’ve updated the PR description to use Tracking issue: #154393 so it stays open as intended.

I've also just pushed a refactor that incorporates your suggestions for the decode_utf8_char helper, slice pattern matching, and bytes.first() usage. Really appreciate the guidance on making this more idiomatic!

@rust-log-analyzer

This comment has been minimized.

…st_char

Implements the API proposed in the tracking issue.
@F0RREALTHO F0RREALTHO force-pushed the feat/str-first-last-char branch from 5622b1d to 9fe51b5 Compare March 29, 2026 12:59
@krtab
Contributor

krtab commented Mar 29, 2026

I'm sorry to ask but are you using an LLM to generate code and/or interact in this PR? Both your messages and code seem possibly LLM generated to me, so I'm just asking to be sure.

In any case, there is plenty of UTF-8 decoding machinery in core already, as well as the chars iterator. May I ask why you are not using those? If it is because of constness, have you tried extracting the parts of them that can be made const (for example, the logic of next_code_point) into a common function?

}

#[inline]
const fn decode_utf8_char(bytes: &[u8]) -> char {
Contributor

This function must be marked unsafe, as it is unsafe to call on anything but the UTF-8 encoding of a single char.

@F0RREALTHO
Author

Thank you for contacting me, @krtab. I would like to make it very clear that I am a second-year math student who is new to Rust and contributing to Open Source. I also used some AI support to learn how to work with the code base. But I've run every test locally, so I know what didn't work.

Thank you for your technical advice on:

  1. Creating an unsafe decode_utf8_char function
  2. Comparing the current implementation of next_code_point with a const-compatible version

I'll resolve these problems and upload again tomorrow. Finally, I want to thank @GrigorenkoPV for his previous comments!

@krtab
Contributor

krtab commented Mar 29, 2026

I am not against AI per se (I myself use it extensively to understand and dive into new codebases) and I commend you for being willing to contribute to open source, especially so early in your career. If however, as I suspect, your PR is mostly vibe coded (and your comment has not really convinced me otherwise), I think you'll have a hard time finding someone to review and accept it given that -- to the best of my knowledge -- most reviewers here find interacting with someone who merely serves as a proxy to an LLM agent rather unpleasant.

#[must_use]
#[unstable(feature = "str_first_last_char", issue = "154393")]
#[rustc_const_unstable(feature = "str_first_last_char", issue = "154393")]
pub const fn split_last_char(&self) -> Option<(char, &str)> {
Contributor

Is there a particular reason for returning (char, &str) instead of (&str, char) here, since the latter would mirror the order in the original string?

Author

The (char, &str) ordering was used to provide symmetry with split_first_char, so that both methods return the character followed by the rest of the string. One could also argue that (&str, char) is better for split_last_char because it preserves the natural order of the string.

Member

Let's keep it as-is for now. I agree that it probably should be swapped, but that's not what the ACP was for.
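
To make the two orderings concrete, here is a small sketch on stable Rust. Both function names are hypothetical free-function stand-ins built on the chars iterator; only the (char, &str) ordering is what the ACP and this PR use, and the swapped variant is shown purely for comparison.

```rust
// Ordering used by the ACP and this PR: the char comes first, then the remainder.
// Hypothetical stand-in, not the PR's const implementation.
fn split_last_char(s: &str) -> Option<(char, &str)> {
    let mut chars = s.chars();
    let last = chars.next_back()?;
    Some((last, chars.as_str()))
}

// Alternative raised in review (not implemented by the PR):
// remainder first, then the char, mirroring the order in the original string.
fn split_last_char_swapped(s: &str) -> Option<(&str, char)> {
    split_last_char(s).map(|(last, rest)| (rest, last))
}
```

For the input `"abc"`, the first form yields `Some(('c', "ab"))` and the swapped form yields `Some(("ab", 'c'))`.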

@F0RREALTHO F0RREALTHO requested a review from krtab March 30, 2026 10:00
    // above len
    check_many("hello", 5..=10, 5);
}
const _: () = {
Member

Why is this done as a const assertion?

Comment on lines +1066 to +1086
/// # Safety
///
/// `bytes` must be the UTF-8 encoding of exactly one valid Unicode scalar value.
#[inline]
const unsafe fn decode_utf8_char(bytes: &[u8]) -> char {
    let ch = match bytes {
        &[a] => a as u32,
        &[a, b] => ((a & 0x1F) as u32) << 6 | (b & 0x3F) as u32,
        &[a, b, c] => ((a & 0x0F) as u32) << 12 | ((b & 0x3F) as u32) << 6 | (c & 0x3F) as u32,
        &[a, b, c, d] => {
            ((a & 0x07) as u32) << 18
                | ((b & 0x3F) as u32) << 12
                | ((c & 0x3F) as u32) << 6
                | (d & 0x3F) as u32
        }
        // SAFETY: All valid UTF-8 sequences are covered above; this arm is unreachable for valid input.
        _ => unsafe { crate::hint::unreachable_unchecked() },
    };
    // SAFETY: the caller must ensure `bytes` contains a valid UTF-8 sequence.
    unsafe { char::from_u32_unchecked(ch) }
}
Member

Can this be moved to either a standalone method or (ideally) a method on char? It would be pub(crate), but this doesn't feel like the best of places for such a method.

/// # Safety
///
/// `bytes` must be the UTF-8 encoding of exactly one valid Unicode scalar value.
#[inline]
Member

I do agree that it is probably best not to declare this #[inline], as it's not something that can trivially be optimized out and it is shared between a few methods.

const unsafe fn decode_utf8_char(bytes: &[u8]) -> char {
    let ch = match bytes {
        &[a] => a as u32,
        &[a, b] => ((a & 0x1F) as u32) << 6 | (b & 0x3F) as u32,
Member

Where are these hex numbers coming from? The way UTF-8 is encoded?
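
For context, the masks do follow from the UTF-8 bit layout. A lead byte carries 7, 5, 4, or 3 payload bits, so 0x1F, 0x0F, and 0x07 strip the 110, 1110, and 11110 prefixes of 2-, 3-, and 4-byte sequences. Each continuation byte has the form 10xxxxxx and carries 6 payload bits, which 0x3F keeps. A safe sketch of the same arithmetic, using the hypothetical name decode_one and assuming the input is already a well-formed single sequence, might look like this:

```rust
// Safe illustration of the masking arithmetic (hypothetical helper, not the PR's code).
// Assumption: `bytes` is one well-formed UTF-8 sequence; continuation bytes are
// not validated here, only the resulting code point is checked by `char::from_u32`.
fn decode_one(bytes: &[u8]) -> Option<char> {
    let code_point = match bytes {
        // 0xxxxxxx: ASCII, all 7 bits are payload.
        &[a] => a as u32,
        // 110xxxxx 10xxxxxx: 5 + 6 payload bits.
        &[a, b] => ((a & 0x1F) as u32) << 6 | (b & 0x3F) as u32,
        // 1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 payload bits.
        &[a, b, c] => {
            ((a & 0x0F) as u32) << 12 | ((b & 0x3F) as u32) << 6 | (c & 0x3F) as u32
        }
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 3 + 6 + 6 + 6 payload bits.
        &[a, b, c, d] => {
            ((a & 0x07) as u32) << 18
                | ((b & 0x3F) as u32) << 12
                | ((c & 0x3F) as u32) << 6
                | (d & 0x3F) as u32
        }
        // Empty input or more than four bytes cannot be a single char.
        _ => return None,
    };
    char::from_u32(code_point)
}
```

As a worked case, "é" is encoded as 0xC3 0xA9: 0xC3 & 0x1F is 0x03, shifted left by 6 gives 0xC0, and 0xA9 & 0x3F is 0x29, so the code point is 0xE9 (U+00E9).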

@jhpratt
Member

jhpratt commented Apr 1, 2026

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 1, 2026
@rustbot
Collaborator

rustbot commented Apr 1, 2026

Reminder, once the PR becomes ready for a review, use @rustbot ready.

Labels

S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.

7 participants