Architecture changes for enabling future development #1252

@mre

Here's our current architecture:
(image: diagram of the current architecture)

In #185 we added preliminary support for anchor tags / fragments.

We discussed that supporting fragments in URLs (e.g. https://foo/bar.html#frag as in option 3 in the link types) won't be a small change.

We'd probably have to make a few changes to the architecture, because right now the check step has no way to "ask" for all links of a given input, or even to ask whether a link occurs in a given input (e.g. "is https://foo/bar.html#frag valid?").

One way to go about it might be to fully decouple input handling from link checking (where inputs can be files or websites).

The fragment cache is a basic version of that, but it's limited to fragments. I think we need a bigger cache for all inputs we encounter and a central entity that manages this cache. We can think of it as an abstraction on top of the network and the file system, purpose-built for our use case.

It could lazy-load resources on demand and store the parsed information from inputs for use by the rest of the system, so our parsed representation would be the ground truth for the rest of the link checking. It would hold a big map from the URI of each input (i.e. its path or URL) to that input's parsed links/fragments.
It should be fully async, and we will need read/write access throughout the program's runtime.
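
To make the idea a bit more concrete, here is a minimal sketch of such a cache. It is written in Python with `asyncio` purely for illustration and is not a proposal for the actual implementation. The names `ParsedInput`, `InputCache`, `Loader`, `has_link` and `has_fragment` are all hypothetical, and the loader that fetches and parses an input is assumed to be supplied from outside.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class ParsedInput:
    """Parsed view of one input (hypothetical shape)."""
    links: set = field(default_factory=set)      # URIs found in the input
    fragments: set = field(default_factory=set)  # anchor names defined by the input


# A loader fetches and parses one input, given its path or URL (assumed to exist).
Loader = Callable[[str], Awaitable[ParsedInput]]


class InputCache:
    """Maps the URI of an input to its parsed links/fragments, loading lazily."""

    def __init__(self, loader: Loader) -> None:
        self._loader = loader
        self._tasks: dict = {}  # URI -> asyncio.Task producing a ParsedInput

    async def get(self, uri: str) -> ParsedInput:
        # Cache the task rather than the result, so concurrent callers
        # asking for the same URI share one load instead of fetching twice.
        # Note: a failed load stays cached and re-raises on later calls.
        task = self._tasks.get(uri)
        if task is None:
            task = asyncio.create_task(self._loader(uri))
            self._tasks[uri] = task
        return await task

    async def has_link(self, uri: str, link: str) -> bool:
        return link in (await self.get(uri)).links

    async def has_fragment(self, uri: str, fragment: str) -> bool:
        return fragment in (await self.get(uri)).fragments
```

Storing the in-flight task instead of the finished result is one way to get "load at most once" behaviour without explicit locking. Whether that is the right trade-off here (for example, how to handle retries or eviction) is an open question.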

Maybe this is even a graph problem, but I don't feel comfortable going down that route.
In any case, we will need a lot of discussions to come up with a solid design.

However we model it, a check to see if an input contains a link or fragment should be trivial from other parts of the program. We should not deal with ad-hoc resource-fetching within the checking code.
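
As a rough illustration of what "trivial" could look like from the checker's side, the sketch below splits a URL into its base and fragment and asks a store whether the fragment exists, without doing any fetching itself. It is Python for illustration only. `InputStore`, `InMemoryStore` and `fragment_is_valid` are hypothetical names, and the in-memory store merely stands in for whatever central entity ends up owning the parsed inputs.

```python
import asyncio
from typing import Protocol
from urllib.parse import urldefrag


class InputStore(Protocol):
    """Hypothetical interface the checking code would depend on."""

    async def has_fragment(self, uri: str, fragment: str) -> bool: ...


async def fragment_is_valid(store: InputStore, url: str) -> bool:
    """Answer "is https://foo/bar.html#frag valid?" by asking the store."""
    base, fragment = urldefrag(url)
    if not fragment:
        # No fragment to verify; checking the resource itself is
        # out of scope for this sketch.
        return True
    return await store.has_fragment(base, fragment)


class InMemoryStore:
    """Stand-in store backed by a plain dict, for demonstration only."""

    def __init__(self, fragments: dict) -> None:
        self._fragments = fragments  # URI -> set of fragment names

    async def has_fragment(self, uri: str, fragment: str) -> bool:
        return fragment in self._fragments.get(uri, set())


if __name__ == "__main__":
    store = InMemoryStore({"https://foo/bar.html": {"frag"}})
    print(asyncio.run(fragment_is_valid(store, "https://foo/bar.html#frag")))
```

The point of the interface is that the checker only depends on the question it wants answered. How the answer is produced (network, file system, cache) stays behind the store.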

I'd be happy for any design feedback.
