Skip to content

Parse HTML properly in Scaladoc#25681

Open
SolalPirelli wants to merge 4 commits intoscala:mainfrom
dotty-staging:s/m6
Open

Parse HTML properly in Scaladoc#25681
SolalPirelli wants to merge 4 commits intoscala:mainfrom
dotty-staging:s/m6

Conversation

@SolalPirelli
Copy link
Copy Markdown
Contributor

Avoid basic stuff like letting <script> through.

Obviously Scaladoc content is in the hands of the user, but we should at least not do HTML with regexes.

Obligatory SO post: https://stackoverflow.com/a/1732454

How much have you relied on LLM-based tools in this contribution?

not

How was the solution tested?

new tests

lazy val commonBootstrappedSettings = commonDottySettings ++ Seq(
// To enable support of scaladoc and language-server projects you need to change this to true
bspEnabled := false,
bspEnabled := enableBspAllProjects,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no reason not to do this IMHO

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a very strong reason: you don't want your IDE to recompile every bootstrapped project from scratch every time you change one line in the compiler!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. If you deliberately enable this off-by-default option, why not recompile whatever's necessary in that case?

If there's an issue we should at the very least put it on the same plan as enableBspAllProjects and make it a variable that's easy to toggle without editing the build file.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah never mind, I saw the removal of false and panicked. Using enableBspAllProjects seems accurate.

*/
class CaretTest extends BaseHtmlTest:

private def docHtml(cls: String, syntax: String = "markdown"): String =
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved one level up so the logic can be shared with the new tests

class ScriptWithSpaces

/** <script>alert('hello')</script> */
class FakeSafeScript
Copy link
Copy Markdown
Contributor Author

@SolalPirelli SolalPirelli Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the character in the comment that your browser probably doesn't render very well is the same char we internally use to mark "safe" HTML tags (why do we have such a char? because IMHO the entire scaladoc parsing stack is too hacky, and should be rewritten to be a single-pass parser... but that's for another day)

val CleanCommentLine =
new Regex("""(?:\s*\*\s?\s?)?(.*)""")

/** Dangerous HTML tags that should be replaced by something safer,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a dangerous use of dangerous, pun intended. Those tags were not the kind of tags one should call dangerous.

/** Safe HTML tags that can be kept. */
val SafeTags =
new Regex("""((&\w+;)|(&#\d+;)|(</?(abbr|acronym|address|area|a|bdo|big|blockquote|br|button|b|caption|cite|code|col|colgroup|dd|del|dfn|em|fieldset|form|hr|img|input|ins|i|kbd|label|legend|link|map|object|optgroup|option|param|pre|q|samp|select|small|span|strong|sub|sup|table|tbody|td|textarea|tfoot|th|thead|tr|tt|var)( [^>]*)?/?>))""")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed "safe" tags that are related to user input (why would anyone do this?) or considered terminally deprecated (e.g., acronym)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants