Issue #19617: added Checks to cover special characters OpenJDK Style §2 - Java Source Files #19715

Open
Anushreebasics wants to merge 1 commit into checkstyle:master from Anushreebasics:special

Conversation

@Anushreebasics
Contributor

fixes #19617

Summary

  • Added special-character checks to OpenJDK style config: [openjdk_checks.xml:66]

  • Added IllegalTokenText with the OpenJDK escape-preference pattern

  • Added AvoidEscapedUnicodeCharacters

  • Updated OpenJDK style coverage page so mapping is accurate: [openjdk_style.xml:141]

  • Section 2 now explicitly includes charset=US-ASCII

  • Section 2.1 now references IllegalTokenText and AvoidEscapedUnicodeCharacters

  • Added new OpenJDK Chapter 2 integration tests:
    [SpecialCharactersTest.java:1]
    [InputSpecialCharactersValid.java:1]
    [InputSpecialCharactersInvalid.java:1]
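
For illustration, the escape-preference idea behind an IllegalTokenText pattern can be sketched with plain java.util.regex. The class name, the method name, and the pattern itself are assumptions made for this sketch; they are not the format value used in openjdk_checks.xml.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Checkstyle or of this PR. */
final class EscapePreferenceSketch {

    // Assumed pattern: octal or Unicode escapes for which a short form
    // exists (\b, \t, \n, \f, \r, \", \', \\). It is applied to the raw
    // source text of a literal, not to the compiled string value.
    private static final Pattern LONG_FORM = Pattern.compile(
        "\\\\u+00(?:08|09|0[aA]|0[cC]|0[dD]|22|27|5[cC])"
            + "|\\\\(?:010|011|012|014|015|042|047|134)");

    private EscapePreferenceSketch() {
    }

    /** Returns true if the raw literal text contains a long-form escape. */
    static boolean hasLongFormEscape(String rawLiteralText) {
        return LONG_FORM.matcher(rawLiteralText).find();
    }
}
```

With this sketch, the raw text "\011" is reported, while "\t" and "\u0041" are not. It does not account for a preceding escaped backslash (source text such as \\011), which a real check would have to consider.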

@Anushreebasics Anushreebasics force-pushed the special branch 3 times, most recently from 22a933c to 936f6fc Compare April 18, 2026 11:59
@Anushreebasics
Contributor Author

@romani @vivek-0509 please review

@Anushreebasics
Contributor Author

@vivek-0509 please review

@Anushreebasics Anushreebasics changed the title from "Issue #19617: added Checks to cover special characters" to "Issue #19617: added Checks to cover special characters OpenJDK Style §2 - Java Source Files" Apr 21, 2026
@Anushreebasics
Contributor Author

@romani please review

Member

@romani romani left a comment


Items

// violation above 'Consider using special escape sequence.'

private final String escapedLetter = "\u0041";
// violation above 'Unicode escape(s) usage should be avoided.'
Member


Not sure what is wrong with this; it is a valid ASCII file.

Contributor Author


The file being ASCII-encoded is not the issue here. The OpenJDK rule in section 2.1 also forbids escaped Unicode sequences in Java source, so "\u0041" is still a violation even though the file itself contains only ASCII bytes. I added a clarifying comment to make that intent explicit.

Member


Please point to where in the JDK spec it is written that escapes are forbidden.

Contributor Author

@Anushreebasics Anushreebasics May 5, 2026


Thanks. Per the Java Language Specification (https://docs.oracle.com/javase/specs/jls/se17/html/jls-3.html#jls-3.3), Unicode escapes are part of the language's lexical rules and are permitted; the JLS only documents how they are processed and which cases are compile-time errors (e.g. \u000A inside a literal). The rule I added is an OpenJDK code-style guideline to avoid escaped characters in source for readability and portability, not a language prohibition. If you'd like, I can add a link in the PR to the OpenJDK coding-style page that recommends avoiding escaped Unicode in source.
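
To make the distinction concrete, here is a minimal sketch (the class and method names are hypothetical) showing that the compiler translates a Unicode escape before the literal is formed, so the escaped spelling and the plain spelling yield the same string.

```java
/** Hypothetical demo class; names are illustrative only. */
final class UnicodeEscapeDemo {

    private UnicodeEscapeDemo() {
    }

    /** Spelled with a Unicode escape for LATIN CAPITAL LETTER A. */
    static String escapedLetter() {
        return "\u0041";
    }

    /** Spelled with the plain ASCII character. */
    static String plainLetter() {
        return "A";
    }

    public static void main(String[] args) {
        // Both spellings compile to the same one-character string.
        System.out.println(escapedLetter().equals(plainLetter())); // prints "true"
        System.out.println(escapedLetter().length());              // prints "1"
    }
}
```

The language accepts both spellings, so any objection to the escaped form is a matter of style rather than correctness.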

Member


I can add a link in the PR to the OpenJDK coding-style page that recommends avoiding escaped Unicode in source

yes.

Member


Note that this implies that other white space characters (in, for instance, string and character literals) must be written in escaped form.
\', \", \\, \t, \b, \r, \f, and \n should be preferred over corresponding octal (e.g. \047) or Unicode (e.g. \u0027) escaped characters.

So only a very specific set of Unicode escapes is forbidden. Please look at the Google style config; it has the same or a similar rule.
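
As a small sketch of what "corresponding" means in the quoted rule, the octal forms below denote exactly the same characters as the short forms. The class and method names are hypothetical. Unicode spellings are left out on purpose, since several of them (the line terminators, the apostrophe, and the backslash) do not compile inside a character literal.

```java
/** Hypothetical demo class; names are illustrative only. */
final class ShortFormEquivalence {

    private ShortFormEquivalence() {
    }

    /** Returns true if every octal spelling equals its short form. */
    static boolean octalMatchesShortForm() {
        return '\b' == '\010'    // backspace
            && '\t' == '\011'    // horizontal tab
            && '\n' == '\012'    // line feed
            && '\f' == '\014'    // form feed
            && '\r' == '\015'    // carriage return
            && '\"' == '\042'    // double quote
            && '\'' == '\047'    // apostrophe
            && '\\' == '\134';   // backslash
    }
}
```

Because each pair is the same character, preferring the short form changes readability only, never behavior.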

@Anushreebasics Anushreebasics force-pushed the special branch 2 times, most recently from 7b72fd8 to 02a55cb Compare May 3, 2026 06:46
@Anushreebasics
Contributor Author

@romani please review

Member

@romani romani left a comment


item:

Comment on lines +5 to +12
private final String escapedTab = "\011";
// violation above 'Consider using special escape sequence.'

private final String escapedLetter = "\u0041";
// ASCII bytes, but the unicode escape below is still a violation.
// violation above 'Unicode escape(s) usage should be avoided.'

}
Member


Please add a few more examples of escaped Unicode characters to the Correct file, to show that not all Unicode escapes are forbidden.


private final String escapedLetter = "\u0041";
// ASCII bytes, but the unicode escape below is still a violation.
// violation above 'Unicode escape(s) usage should be avoided.'
Member



    private final String escapedLetter = "\u0041";
    // ASCII bytes, but the unicode escape below is still a violation.
    // violation above 'Unicode escape(s) usage should be avoided.'

// The 'violation above' comment should come right after the Java code, and the message should be different. Please investigate why the test is not failing.

@romani
Member

romani commented May 5, 2026

The short forms (e.g. \t) are commonly used and easier to recognize than the corresponding longer forms (\011, \u0009).

Add exactly these tests.

\', \", \\, \t, \b, \r, \f, and \n should be preferred over corresponding octal (e.g. \047) or Unicode (e.g. \u0027) escaped characters.

Add all of these to the test code.

@Anushreebasics Anushreebasics force-pushed the special branch 2 times, most recently from c5f4575 to 4cf0f87 Compare May 5, 2026 14:50
@Anushreebasics
Contributor Author

@romani please review

Member

@romani romani left a comment


Items

private final char formFeed = '\f';
private final char carriageReturn = '\r';
private final char newLine = '\n';

Member


Please put valid Unicode escapes here.

// violation above 'special escape sequence'

private final char newLineOctal = '\012';
// violation above 'special escape sequence'
Member


Please add an extra comment to explain what should be used instead.

Member


Do this for all invalid symbols.
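
One way to keep such explanatory comments consistent is to derive the preferred spelling from the flagged one. The sketch below is a hypothetical lookup (the class, method, and map are not part of Checkstyle); it covers only the octal spellings that match the short forms listed in the OpenJDK guide, and Unicode spellings are omitted for brevity.

```java
import java.util.Map;

/** Hypothetical helper; not part of Checkstyle or of this PR. */
final class PreferredEscape {

    // Keys are raw source spellings of octal escapes; values are the
    // short forms that should be used instead.
    private static final Map<String, String> SHORT_FORMS = Map.ofEntries(
        Map.entry("\\010", "\\b"),
        Map.entry("\\011", "\\t"),
        Map.entry("\\012", "\\n"),
        Map.entry("\\014", "\\f"),
        Map.entry("\\015", "\\r"),
        Map.entry("\\042", "\\\""),
        Map.entry("\\047", "\\'"),
        Map.entry("\\134", "\\\\"));

    private PreferredEscape() {
    }

    /** Returns the short form for a known octal spelling, else the input. */
    static String suggest(String rawEscape) {
        return SHORT_FORMS.getOrDefault(rawEscape, rawEscape);
    }
}
```

For instance, suggest applied to the raw text \012 yields \n, which could be quoted in a comment next to the newLineOctal field.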

@romani
Member

romani commented May 5, 2026

#19715 (comment)

All codes that are referenced in the JDK style guide should be covered by tests.

@Anushreebasics Anushreebasics force-pushed the special branch 2 times, most recently from 90c20b0 to d97208d Compare May 5, 2026 16:42
@Anushreebasics
Contributor Author

@romani please review


Development

Successfully merging this pull request may close these issues.

Add checks for OpenJDK Style §2 - Java Source Files

2 participants