Describe the bug
A clear and concise description of what the bug is.
To Reproduce
Our dataset pipelines take raw HTML from the descriptions of some datasets, so the descriptions are often littered with tags that interfere with the styling when output on opendata.scot (e.g. <h1>, or any tag with a style property). This can produce unexpected results, such as oversized text rendered from header tags.
Expected behavior
Some of these styles or tags could be simplified (e.g. we could convert all header tags to plain bold, underlined text).
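As a rough illustration of the simplification suggested above, here is a minimal sketch using only Python's standard library. It assumes the descriptions are available as strings somewhere in the pipeline; the names DescriptionSimplifier and simplify_description are hypothetical and do not exist in the codebase. It rewrites h1 to h6 as bold, underlined paragraphs and drops style attributes from every other tag. It is not a security sanitiser and leaves all other tags in place.

```python
from html import escape
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class DescriptionSimplifier(HTMLParser):
    """Hypothetical helper: flattens headers and removes inline styles."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            # Replace any header with a bold, underlined paragraph.
            self.parts.append("<p><b><u>")
            return
        # Keep every attribute except inline styles.
        kept = [(k, v) for k, v in attrs if k != "style"]
        rendered = "".join(
            f' {k}="{escape(v, quote=True)}"' if v is not None else f" {k}"
            for k, v in kept
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, with no end tag.
        if tag not in HEADER_TAGS:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in HEADER_TAGS:
            self.parts.append("</u></b></p>")
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text was unescaped by the parser, so escape it again on output.
        self.parts.append(escape(data, quote=False))


def simplify_description(raw_html):
    parser = DescriptionSimplifier()
    parser.feed(raw_html)
    parser.close()
    return "".join(parser.parts)


example = '<h1 style="font-size:40px">Housing</h1><p style="color:red">Available now</p>'
print(simplify_description(example))
# prints: <p><b><u>Housing</u></b></p><p>Available now</p>
```

Doing this at pipeline time, rather than with CSS overrides on the site, would mean the stored descriptions are already clean for any other consumer of the data.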
Screenshots

Example from https://opendata.scot/datasets/dundee+city+council-housing+available+now/
Hardware and software used
N/A
Additional context
Whilst unlikely to happen, I have concerns that this could leave us vulnerable to XSS (cross-site scripting) attacks if we ended up rendering JavaScript <script> tags from the descriptions of datasets we pull from other websites. See this relevant article, where someone registered an XSS attack payload as a company name on Companies House, which had the knock-on effect of XSSing websites that consumed data from the Companies House API: https://www.theregister.com/2020/10/30/companies_house_xss_silliness/
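One possible mitigation for the XSS concern is to pass descriptions through an allowlist before they are rendered. The sketch below uses only Python's standard library and is an assumption about how this could be done, not existing project code: AllowlistSanitizer, sanitize_description and the ALLOWED_TAGS set are all hypothetical. It drops script and style elements together with their contents, removes every attribute (including event handlers such as onclick), keeps href only for http and https links, and strips any tag not on the allowlist while keeping its text.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist; adjust to whatever the site should render.
ALLOWED_TAGS = {"p", "br", "b", "i", "u", "em", "strong", "ul", "ol", "li", "a"}
# Elements whose contents are discarded entirely.
DROP_WITH_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text is kept
        href = dict(attrs).get("href") if tag == "a" else None
        href = href.strip() if href else ""
        if href.lower().startswith(("http://", "https://")):
            self.parts.append(f'<a href="{escape(href, quote=True)}">')
        else:
            # All other attributes, including on* handlers, are discarded.
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag != "br":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data, quote=False))


def sanitize_description(raw_html):
    parser = AllowlistSanitizer()
    parser.feed(raw_html)
    parser.close()
    return "".join(parser.parts)


payload = '<p onclick="steal()">Hi</p><script>alert(1)</script><a href="javascript:alert(2)">x</a>'
print(sanitize_description(payload))
# prints: <p>Hi</p><a>x</a>
```

This is only meant to show the allowlist idea. For production use, a maintained HTML sanitisation library is likely a safer choice than a hand-rolled parser, since browsers tolerate many malformed inputs that a simple sketch like this has not been tested against.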