
Conversation

@seizethedave (Contributor) commented Jun 23, 2024

This adds digit separators (1_000) to Jsonnet's numeric literals.

Companion to the same PR in C++ repo: google/jsonnet#1160

Reference issue with spec proposal: google/jsonnet#1155

@coveralls

Coverage Status

coverage: 68.206% (+0.06%) from 68.143%
when pulling f10caa0 on seizethedave:digitsep
into 2b4d753 on google:master.

@seizethedave seizethedave marked this pull request as ready for review July 6, 2024 20:26
@johnbartholomew johnbartholomew marked this pull request as draft January 26, 2026 19:54
@johnbartholomew johnbartholomew marked this pull request as ready for review January 26, 2026 19:54
@coveralls commented Jan 26, 2026

Coverage Status

coverage: 44.297% (+0.1%) from 44.168%
when pulling a52ac8d on seizethedave:digitsep
into 6a5c085 on google:master.

// Run the postprocessor if the token kind has one defined.
if pp, ok := tokenKindPostprocessors[kind]; ok {
data = pp(data)
}
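For context, here is a self-contained sketch (an editor's illustration, not the PR's actual code) of how the map-based postprocessor hook above operates; the `tokenKind` type, the `tokenNumber` constant, and the map contents are assumptions standing in for the lexer's internals:

```go
package main

import (
	"fmt"
	"strings"
)

// tokenKind stands in for the lexer's token-kind type; only the
// number kind is needed for this sketch.
type tokenKind int

const tokenNumber tokenKind = iota

// tokenKindPostprocessors maps a token kind to a function that
// rewrites the lexed token data after the fact -- here, stripping
// digit separators from a numeric literal.
var tokenKindPostprocessors = map[tokenKind]func(string) string{
	tokenNumber: func(data string) string {
		return strings.ReplaceAll(data, "_", "")
	},
}

func main() {
	kind, data := tokenNumber, "1_000"
	// Run the postprocessor if the token kind has one defined.
	if pp, ok := tokenKindPostprocessors[kind]; ok {
		data = pp(data)
	}
	fmt.Println(data) // 1000
}
```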
Collaborator
To be honest I think this is an unnecessary generalisation. There are various other tokens that already put edited/processed content into the token data field, but they just have the processing inline in the lexer code, and call emitFullToken directly.

Examples:

  • Chomping newlines from a text block:

        var str string = cb.String()
        if chompTrailingNl {
            str = str[:len(str)-1]
        }
        l.emitFullToken(tokenStringBlock, str,
            stringBlockIndent, stringBlockTermIndent)
        l.resetTokenStart()
        return nil

  • Removing the quotes from a string literal:

        if r == '"' {
            // Don't include the quotes in the token data
            l.emitFullToken(tokenStringDouble, l.input[l.tokenStart+1:l.pos.byteNo-1], "", "")
            l.resetTokenStart()
            break
        }

I think we can just do the same for lexNumber - currently it calls emitToken just before returning; it can process the token data and call emitFullToken instead.
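A hedged sketch of what that inline shape could look like (the `numberTokenData` and `endNumber` names are invented for illustration, and the `lexer` type here is a minimal stand-in for the real one, whose fields and `emitFullToken` signature are assumed from the snippets above):

```go
package main

import (
	"fmt"
	"strings"
)

// numberTokenData strips digit separators from a raw number lexeme,
// e.g. "1_000" -> "1000".
func numberTokenData(raw string) string {
	return strings.ReplaceAll(raw, "_", "")
}

// lexer is a minimal stand-in for the real lexer type.
type lexer struct {
	input      string
	tokenStart int
	pos        int
}

func (l *lexer) emitFullToken(kind, data, a, b string) {
	fmt.Println(kind, data)
}

// At the end of lexNumber, instead of calling emitToken, process the
// raw lexeme and call emitFullToken directly, mirroring the inline
// handling of the string-token cases.
func (l *lexer) endNumber() {
	raw := l.input[l.tokenStart:l.pos]
	l.emitFullToken("NUMBER", numberTokenData(raw), "", "")
}

func main() {
	l := &lexer{input: "1_2.3_4", tokenStart: 0, pos: 7}
	l.endNumber() // prints: NUMBER 12.34
}
```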

Contributor Author
Sounds good. I remember being surprised that this would be the first instance of needing to post-process a lexed token. I guess I didn't look hard enough.

@johnbartholomew
Collaborator

Ok, I can go ahead and rebase+add-adjustments+merge this, unless you have further changes in flight.

@seizethedave
Contributor Author

👍 Thanks John, I welcome the assist. I have nothing in flight.

seizethedave and others added 10 commits January 27, 2026 18:43
…onents

See also the corresponding C++ jsonnet commit:
google/jsonnet@82ebe7d

There are some cases which are a little strange but lexically valid.

- `1.2.3.4` lexically tokenises as `1.2` DOT `3.4`, because a dot
  in the fractional or exponent part of a number is simply treated the
  same as any other possible terminating character (any character that
  isn't part of the valid number lexical syntax)
- `1e2.34` lexically is `1e2` DOT `34` (same as the first case)
- `1e2e34` lexically is `1e2` (number) `e34` (identifier)

These behaviours are basically preserved/extrapolated in the case of
digit separators, so for example `1_2.3_4.5_6` is lexically parsed
as `12.34` DOT `56`. And `1e2_3e4` is lexically parsed as
`1e23` (number), `e4` (identifier). These both look very confusing,
but it probably doesn't matter because those token sequences are,
I think, not valid syntactically so they'll just be rejected by
the parser.

Note that in JSON (and jsonnet), leading zeros are not allowed in
numeric literals. This behaviour is explicitly kept with digit
separators, so `0_5` is explicitly rejected. The alternatives are:

- Treat underscore after an initial zero the same as any terminator
  character, so `0_5` lexes as tokens `0` followed by identifier `_5`.
- Allow underscore, thereby breaking the no-leading-zeros rule, so
  `0_5` tokenises as `05`.

Either option seems confusing, hence it seems better to explicitly
reject an underscore after an initial zero.
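To make the boundary behaviour described above concrete, here is a self-contained sketch of a number scanner following those rules (an editor's simplified illustration, not the actual lexer code: it assumes the caller only invokes it when the input starts with a digit, and it does not attempt to reject trailing separators):

```go
package main

import (
	"fmt"
	"strings"
)

// scanNumber consumes the longest prefix of s matching the number
// syntax with digit separators. It returns the token data with
// separators stripped and the unconsumed remainder. An underscore
// immediately after an initial zero is explicitly rejected.
func scanNumber(s string) (data, rest string, err error) {
	i := 0
	digits := func() {
		for i < len(s) && (s[i] == '_' || (s[i] >= '0' && s[i] <= '9')) {
			i++
		}
	}
	// Integer part: a lone '0', or digits with optional separators.
	if i < len(s) && s[i] == '0' {
		i++
		if i < len(s) && s[i] == '_' {
			return "", s, fmt.Errorf("separator not allowed after leading zero")
		}
	} else {
		digits()
	}
	// Optional fractional part: '.' must be followed by a digit,
	// otherwise the dot terminates the number.
	if i+1 < len(s) && s[i] == '.' && s[i+1] >= '0' && s[i+1] <= '9' {
		i += 2
		digits()
	}
	// Optional exponent: 'e'/'E', optional sign, then a digit.
	if i < len(s) && (s[i] == 'e' || s[i] == 'E') {
		j := i + 1
		if j < len(s) && (s[j] == '+' || s[j] == '-') {
			j++
		}
		if j < len(s) && s[j] >= '0' && s[j] <= '9' {
			i = j + 1
			digits()
		}
	}
	return strings.ReplaceAll(s[:i], "_", ""), s[i:], nil
}

func main() {
	data, rest, _ := scanNumber("1_2.3_4.5_6")
	fmt.Println(data, rest) // 12.34 .5_6
	data, rest, _ = scanNumber("1e2_3e4")
	fmt.Println(data, rest) // 1e23 e4
	_, _, err := scanNumber("0_5")
	fmt.Println(err != nil) // true
}
```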
@johnbartholomew johnbartholomew merged commit a52ac8d into google:master Jan 27, 2026
9 checks passed