feat(core/str): implement str::first_char, last_char, split_first_char, split_last_char by F0RREALTHO · Pull Request #154544 · rust-lang/rust

F0RREALTHO · 2026-03-29T09:37:59Z

Tracking issue: #154393

What this PR does

Implements the four str methods proposed in the tracking issue:

str::first_char(&self) -> Option<char>
str::last_char(&self) -> Option<char>
str::split_first_char(&self) -> Option<(char, &str)>
str::split_last_char(&self) -> Option<(char, &str)>

All methods are const fn and correctly handle all UTF-8 character widths (1, 2, 3, and 4 bytes).

Implementation notes

UTF-8 decoding is done manually using utf8_char_width to stay const-compatible
split_first_char and split_last_char are the core methods; first_char and last_char delegate to them
All methods have #[must_use] and #[inline]
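
For readers unfamiliar with the approach, here is a minimal sketch of const-compatible manual decoding on stable Rust. It is not the code from this PR: `width_from_lead_byte` is a hypothetical stand-in for core's internal `utf8_char_width`, and `first_char_and_width` is an illustrative name that returns the decoded character and its byte width rather than a `(char, &str)` pair.

```rust
/// Hypothetical stand-in for core's internal `utf8_char_width`:
/// maps a lead byte to the length of its UTF-8 sequence (0 = not a lead byte).
const fn width_from_lead_byte(b: u8) -> usize {
    match b {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => 0,
    }
}

/// Decodes the first scalar value of `s` and reports how many bytes it used.
/// Indexing past `b[0]` relies on `&str` always holding valid UTF-8.
const fn first_char_and_width(s: &str) -> Option<(char, usize)> {
    let b = s.as_bytes();
    if b.is_empty() {
        return None;
    }
    let width = width_from_lead_byte(b[0]);
    let code = match width {
        1 => b[0] as u32,
        2 => ((b[0] & 0x1F) as u32) << 6 | (b[1] & 0x3F) as u32,
        3 => ((b[0] & 0x0F) as u32) << 12 | ((b[1] & 0x3F) as u32) << 6 | (b[2] & 0x3F) as u32,
        4 => {
            ((b[0] & 0x07) as u32) << 18
                | ((b[1] & 0x3F) as u32) << 12
                | ((b[2] & 0x3F) as u32) << 6
                | (b[3] & 0x3F) as u32
        }
        _ => return None,
    };
    // Checked conversion keeps the sketch free of `unsafe`.
    match char::from_u32(code) {
        Some(c) => Some((c, width)),
        None => None,
    }
}

// Usable in a const context, which is the property the notes above care about.
const FIRST: Option<(char, usize)> = first_char_and_width("€uro");
```

The sketch uses the checked `char::from_u32` instead of an unchecked conversion, trading a redundant validity check for not needing an `unsafe` block.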

Tests

Runtime tests covering empty strings, ASCII, 2-byte, 3-byte, and 4-byte characters
Compile-time const evaluation tests proving these work as const fn
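
As an illustration of what a compile-time evaluation test can look like, the sketch below uses an anonymous `const` item so that the assertions run during compilation. The helper `first_ascii_byte` is hypothetical and exists only for this example; it is not part of the PR.

```rust
/// Hypothetical const helper used only for this illustration:
/// returns the first byte of `s` if it is ASCII.
const fn first_ascii_byte(s: &str) -> Option<u8> {
    let b = s.as_bytes();
    if !b.is_empty() && b[0] < 0x80 {
        Some(b[0])
    } else {
        None
    }
}

// Evaluated during compilation: a failing `assert!` here is a build error,
// so the item also proves the helper is callable in a const context.
const _: () = {
    assert!(matches!(first_ascii_byte("hello"), Some(b'h')));
    assert!(first_ascii_byte("").is_none());
};
```

A runtime `#[test]` would check the same values, but only the `const` form fails the build if the function stops being const-evaluable.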

Open question

The tracking issue has an unresolved question about the _char suffix. This PR uses _char as proposed in the ACP, but I'm happy to rename if the team decides otherwise.

rustbot · 2026-03-29T09:38:04Z

r? @jhpratt

rustbot has assigned @jhpratt.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

Owners of files modified in this PR: @scottmcm, libs
@scottmcm, libs expanded to 8 candidates
Random selection from Mark-Simulacrum, jhpratt, scottmcm

GrigorenkoPV · 2026-03-29T12:11:47Z

Not sure if the split_* methods should actually be #[inline]

F0RREALTHO · 2026-03-29T12:18:57Z

Not sure if the split_* methods should actually be #[inline]

That's a fair point. I included #[inline] to match the existing split_at and chars methods, but I agree these have more logic branches. I'll leave them for now and am happy to remove them if @jhpratt thinks the binary bloat outweighs the inlining benefit here.

library/core/src/str/mod.rs

GrigorenkoPV · 2026-03-29T12:23:36Z

Fixes #154393

This will make GitHub close the tracking issue after the PR gets merged, I think it should be "Tracking issue: #154393" instead.

Anyways, congrats on the first contribution and welcome!

F0RREALTHO · 2026-03-29T12:52:40Z

This will make GitHub close the tracking issue after the PR gets merged, I think it should be "Tracking issue: #154393" instead.

Anyways, congrats on the first contribution and welcome!

Thank you for the warm welcome! That makes total sense. I’ve updated the PR description to use Tracking issue: #154393 so it stays open as intended.

I've also just pushed a refactor that incorporates your suggestions for the decode_utf8_char helper, slice pattern matching, and bytes.first() usage. Really appreciate the guidance on making this more idiomatic!

krtab · 2026-03-29T18:06:59Z

I'm sorry to ask but are you using an LLM to generate code and/or interact in this PR? Both your messages and code seem possibly LLM generated to me, so I'm just asking to be sure.

In any case, there is plenty of UTF8 decoding machinery in core already, and the chars iterator, may I ask why you are not using those? If it is because of constness have you tried extracting the part of them (for example the logic of next_code_point) that can be made const into a common function?
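
For reference, a minimal sketch of the iterator-based alternative, written as free functions on stable Rust. The names mirror the proposed methods but these are illustrative stand-ins, not the PR's code. As far as I know, `Iterator` methods cannot be called in a `const fn` on stable, so this version is not const, which is the constraint raised in the question above.

```rust
/// Illustrative free-function version of the proposed `str::split_first_char`.
fn split_first_char(s: &str) -> Option<(char, &str)> {
    let mut chars = s.chars();
    let c = chars.next()?;
    // `Chars::as_str` returns the part of the string not yet consumed.
    Some((c, chars.as_str()))
}

/// Illustrative free-function version of the proposed `str::split_last_char`.
fn split_last_char(s: &str) -> Option<(char, &str)> {
    let mut chars = s.chars();
    let c = chars.next_back()?;
    // After `next_back`, the remainder is everything before the last char.
    Some((c, chars.as_str()))
}
```

Both functions lean on the existing decoder behind `Chars`, so no byte-level masking is needed.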

krtab · 2026-03-29T18:25:59Z

library/core/src/str/mod.rs

    }

+    #[inline]
+    const fn decode_utf8_char(bytes: &[u8]) -> char {


This function must be marked unsafe as it is unsafe to call on anything but the UTF8 encoding of a single char

F0RREALTHO · 2026-03-29T19:46:28Z

Thank you for contacting me, @krtab. I would like to make it very clear that I am a second-year math student who is new to Rust and contributing to Open Source. I also used some AI support to learn how to work with the code base. But I've run every test locally, so I know what didn't work.

Thank you for your technical advice on:

Creating an unsafe decode_utf8_char function
Comparing the current implementation of next_code_point with a const-compatible version

I'll resolve these problems and upload again tomorrow. Finally, I want to thank @GrigorenkoPV for his previous comments!

krtab · 2026-03-29T20:34:33Z

I am not against AI per se (I myself use it extensively to understand and dive into new codebases), and I commend you for being willing to contribute to open source, especially so early in your career. If, however, as I suspect, your PR is mostly vibe coded (and your comment has not really convinced me otherwise), I think you'll have a hard time finding someone to review and accept it, given that, to the best of my knowledge, most reviewers here find interacting with someone who merely serves as a proxy to an LLM agent rather unpleasant.

juntyr · 2026-03-30T05:19:17Z

library/core/src/str/mod.rs

+    #[must_use]
+    #[unstable(feature = "str_first_last_char", issue = "154393")]
+    #[rustc_const_unstable(feature = "str_first_last_char", issue = "154393")]
+    pub const fn split_last_char(&self) -> Option<(char, &str)> {


Is there a particular reason for returning (char, &str) instead of (&str, char) here, since the latter would mirror the order in the original string?

The (char, &str) ordering was used to provide symmetry with split_first_char, so that both methods return the character followed by the rest of the string. One could also argue that (&str, char) is better for split_last_char because it preserves the natural order of the string.

Let's keep it as-is for now. I agree that it probably should be swapped, but that's not what the ACP was for.
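
To make the ordering question concrete, here is a small sketch of the swapped variant discussed above. The name `split_last_char_swapped` is hypothetical and is neither part of the ACP nor of this PR.

```rust
/// Hypothetical variant returning `(&str, char)`, so the tuple mirrors
/// the left-to-right order of the original string.
fn split_last_char_swapped(s: &str) -> Option<(&str, char)> {
    // `char_indices` yields the byte offset at which each char starts.
    let (idx, c) = s.char_indices().next_back()?;
    Some((&s[..idx], c))
}
```

With this shape, a caller would write `let (rest, last) = ...`, reading in string order, whereas the PR's current signature yields `(last, rest)`.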

jhpratt · 2026-04-01T02:32:36Z

library/alloctests/tests/str.rs

    // above len
    check_many("hello", 5..=10, 5);
 }
+const _: () = {


Why is this done as a const assertion?

jhpratt · 2026-04-01T02:43:31Z

library/core/src/str/mod.rs

+    /// # Safety
+    ///
+    /// `bytes` must be the UTF-8 encoding of exactly one valid Unicode scalar value.
+    #[inline]
+    const unsafe fn decode_utf8_char(bytes: &[u8]) -> char {
+        let ch = match bytes {
+            &[a] => a as u32,
+            &[a, b] => ((a & 0x1F) as u32) << 6 | (b & 0x3F) as u32,
+            &[a, b, c] => ((a & 0x0F) as u32) << 12 | ((b & 0x3F) as u32) << 6 | (c & 0x3F) as u32,
+            &[a, b, c, d] => {
+                ((a & 0x07) as u32) << 18
+                    | ((b & 0x3F) as u32) << 12
+                    | ((c & 0x3F) as u32) << 6
+                    | (d & 0x3F) as u32
+            }
+            // SAFETY: All valid UTF-8 sequences are covered above; this arm is unreachable for valid input.
+            _ => unsafe { crate::hint::unreachable_unchecked() },
+        };
+        // SAFETY: the caller must ensure `bytes` contains a valid UTF-8 sequence.
+        unsafe { char::from_u32_unchecked(ch) }
+    }


Can this be moved to either a standalone method or (ideally) a method on char? It would be pub(crate), but this doesn't feel like the best of places for such a method.

jhpratt · 2026-04-01T02:46:53Z

library/core/src/str/mod.rs

+    /// # Safety
+    ///
+    /// `bytes` must be the UTF-8 encoding of exactly one valid Unicode scalar value.
+    #[inline]


I do agree that it is probably best to not be declared #[inline], as it's not something that can trivially be optimized out and is shared between a few methods.

jhpratt · 2026-04-01T02:47:37Z

library/core/src/str/mod.rs

+    const unsafe fn decode_utf8_char(bytes: &[u8]) -> char {
+        let ch = match bytes {
+            &[a] => a as u32,
+            &[a, b] => ((a & 0x1F) as u32) << 6 | (b & 0x3F) as u32,


Where are these hex numbers coming from? The way UTF-8 is encoded?
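
As background on the constants: in UTF-8, lead bytes have the forms `110xxxxx`, `1110xxxx`, and `11110xxx`, and continuation bytes have the form `10xxxxxx`. The masks keep only the `x` payload bits. The sketch below spells that out with named constants and illustrative helper functions; these names are assumptions for the example, not identifiers from the PR.

```rust
const LEAD_2: u8 = 0x1F; // 0b0001_1111: 5 payload bits of 110xxxxx
const LEAD_3: u8 = 0x0F; // 0b0000_1111: 4 payload bits of 1110xxxx
const LEAD_4: u8 = 0x07; // 0b0000_0111: 3 payload bits of 11110xxx
const CONT: u8 = 0x3F; //   0b0011_1111: 6 payload bits of 10xxxxxx

/// Combines a 2-byte sequence into a code point (no validation performed).
fn decode_two(a: u8, b: u8) -> u32 {
    ((a & LEAD_2) as u32) << 6 | (b & CONT) as u32
}

/// Combines a 3-byte sequence into a code point (no validation performed).
fn decode_three(a: u8, b: u8, c: u8) -> u32 {
    ((a & LEAD_3) as u32) << 12 | ((b & CONT) as u32) << 6 | (c & CONT) as u32
}

/// Combines a 4-byte sequence into a code point (no validation performed).
fn decode_four(a: u8, b: u8, c: u8, d: u8) -> u32 {
    ((a & LEAD_4) as u32) << 18
        | ((b & CONT) as u32) << 12
        | ((c & CONT) as u32) << 6
        | (d & CONT) as u32
}
```

For example, "é" is encoded as the bytes `0xC3 0xA9`: `0xC3 & 0x1F` is `0x03`, shifted left by 6 gives `0xC0`, and `0xA9 & 0x3F` is `0x29`, which combine to `0xE9`, that is U+00E9.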

jhpratt · 2026-04-01T02:49:09Z

library/core/src/str/mod.rs

+    #[must_use]
+    #[unstable(feature = "str_first_last_char", issue = "154393")]
+    #[rustc_const_unstable(feature = "str_first_last_char", issue = "154393")]
+    pub const fn split_last_char(&self) -> Option<(char, &str)> {


Let's keep it as-is for now. I agree that it probably should be swapped, but that's not what the ACP was for.

jhpratt · 2026-04-01T02:49:56Z

@rustbot author

rustbot · 2026-04-01T02:50:01Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

rustbot assigned jhpratt Mar 29, 2026

rustbot added the labels S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-libs (Relevant to the library team, which will review and decide on the PR/issue.) Mar 29, 2026

F0RREALTHO force-pushed the feat/str-first-last-char branch from 504171b to e3b58c0 on March 29, 2026 09:58

F0RREALTHO force-pushed the feat/str-first-last-char branch from e3b58c0 to 8a6c194 on March 29, 2026 10:05

F0RREALTHO force-pushed the feat/str-first-last-char branch from 8a6c194 to 3b44c45 on March 29, 2026 11:44

GrigorenkoPV reviewed Mar 29, 2026

GrigorenkoPV reviewed Mar 29, 2026

F0RREALTHO force-pushed the feat/str-first-last-char branch from 3b44c45 to 5622b1d on March 29, 2026 12:48

feat(core/str): add first_char, last_char, split_first_char, split_la…

9fe51b5

…st_char Implements the API proposed in the tracking issue.

F0RREALTHO force-pushed the feat/str-first-last-char branch from 5622b1d to 9fe51b5 on March 29, 2026 12:59

krtab suggested changes Mar 29, 2026

juntyr reviewed Mar 30, 2026

fix: mark decode_utf8_char as unsafe with safety docs

7d12d86

F0RREALTHO requested a review from krtab March 30, 2026 10:00

jhpratt reviewed Apr 1, 2026

rustbot added the label S-waiting-on-author (Status: This is awaiting some action (such as code changes or more information) from the author.) and removed the label S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) Apr 1, 2026

7 participants