Architecture changes for enabling future development

Here's our current architecture:
<img width="1055" alt="image" src="https://github.com/lycheeverse/lychee/assets/175809/c79d4506-a0af-466c-93d7-3640c3158000">

In https://github.com/lycheeverse/lychee/issues/185 we added preliminary support for anchor tags / fragments.

We discussed that supporting fragments in URLs (e.g. `https://foo/bar.html#frag` as in option 3 in the link types) won't be a small change. 

We'd probably have to make a few changes to the architecture, because right now there's no way for the `check` step to "ask" for all links of a given input, or even just to ask whether a link occurs in a given input (e.g. "is `https://foo/bar.html#frag` valid?").

One way to go about it might be to fully decouple input handling from link checking (where inputs can be files or websites).

The fragment cache is a basic version of that, but it's limited to fragments. I think we need a bigger cache for all inputs we encounter, and a central entity that manages this cache. We can think of it as an abstraction on top of the network and the file system, purpose-built for our use case.

It could lazy-load resources and store the parsed information from inputs, which would be used by the rest of the system; our parsed representation would then be the ground truth for the rest of the link checking. It would contain a map from the URI of each input (i.e. the path or URL) to that input's parsed links and fragments.
It should be fully async, and we will need read/write access throughout the program's runtime.
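
To make the idea a bit more concrete, here is a rough sketch of what such a store could look like. All names (`ParsedInput`, `InputStore`, `has_fragment`, `has_link`) are hypothetical and not existing lychee types, the loader is just a closure standing in for "fetch and parse this file or URL", and the sketch is synchronous and single-threaded for brevity; a real version would have to be async and shareable across tasks.

```rust
use std::collections::{HashMap, HashSet};

/// Parsed view of a single input (hypothetical type, not part of lychee).
#[derive(Debug, Default, Clone)]
pub struct ParsedInput {
    /// Links found in the input.
    pub links: HashSet<String>,
    /// Fragments (anchors) defined by the input.
    pub fragments: HashSet<String>,
}

/// Hypothetical central store: maps the URI of an input (path or URL)
/// to its parsed contents, loading each input at most once.
pub struct InputStore<F> {
    /// Stand-in for "fetch and parse"; returns `None` if loading fails.
    loader: F,
    /// `None` values record failed loads so they are not retried.
    cache: HashMap<String, Option<ParsedInput>>,
}

impl<F> InputStore<F>
where
    F: Fn(&str) -> Option<ParsedInput>,
{
    pub fn new(loader: F) -> Self {
        Self {
            loader,
            cache: HashMap::new(),
        }
    }

    /// Load the input on first access, then serve it from the cache.
    pub fn get(&mut self, uri: &str) -> Option<&ParsedInput> {
        if !self.cache.contains_key(uri) {
            let parsed = (self.loader)(uri);
            self.cache.insert(uri.to_string(), parsed);
        }
        self.cache.get(uri).and_then(|entry| entry.as_ref())
    }

    /// Does the input at `uri` define the given fragment?
    pub fn has_fragment(&mut self, uri: &str, fragment: &str) -> bool {
        self.get(uri)
            .map_or(false, |input| input.fragments.contains(fragment))
    }

    /// Does the input at `uri` contain the given link?
    pub fn has_link(&mut self, uri: &str, link: &str) -> bool {
        self.get(uri)
            .map_or(false, |input| input.links.contains(link))
    }
}
```

With something like this, a question such as "does `https://foo/bar.html` define `frag`?" becomes `store.has_fragment("https://foo/bar.html", "frag")`, and repeated questions about the same input hit the cache instead of the network or file system.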

Maybe this is even a graph problem, but I don't feel comfortable going down that route.
In any case, we will need a lot of discussions to come up with a solid design.

However we model it, checking whether an input contains a link or fragment should be trivial from any other part of the program. The checking code should not have to deal with ad-hoc resource fetching.
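
From the checker's side, that could look roughly like the sketch below. Everything here is an assumption for illustration: the `InputIndex` trait, the `FragmentStatus` enum, and the helper names are made up and not part of lychee today, and the in-memory `MemoryIndex` only stands in for whatever central entity ends up owning the parsed inputs.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical query interface the checking code would depend on.
/// Returns `None` if the input itself is unknown or could not be loaded.
pub trait InputIndex {
    fn contains_fragment(&self, input: &str, fragment: &str) -> Option<bool>;
}

/// Split a URL at the first `#` into the input part and the fragment.
pub fn split_fragment(url: &str) -> (&str, Option<&str>) {
    match url.split_once('#') {
        Some((base, fragment)) => (base, Some(fragment)),
        None => (url, None),
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum FragmentStatus {
    /// The URL has no (or an empty) fragment, so there is nothing to check.
    NoFragment,
    Found,
    Missing,
    UnknownInput,
}

/// The checker only asks a question; it never fetches anything itself.
pub fn check_fragment(index: &impl InputIndex, url: &str) -> FragmentStatus {
    let (base, fragment) = split_fragment(url);
    match fragment {
        None | Some("") => FragmentStatus::NoFragment,
        Some(fragment) => match index.contains_fragment(base, fragment) {
            Some(true) => FragmentStatus::Found,
            Some(false) => FragmentStatus::Missing,
            None => FragmentStatus::UnknownInput,
        },
    }
}

/// Trivial in-memory index (input URI -> fragments), for illustration only.
pub struct MemoryIndex(pub HashMap<String, HashSet<String>>);

impl InputIndex for MemoryIndex {
    fn contains_fragment(&self, input: &str, fragment: &str) -> Option<bool> {
        self.0
            .get(input)
            .map(|fragments| fragments.contains(fragment))
    }
}
```

The point of the trait is only to show the direction of the dependency: the checking code states what it wants to know, and how the input is fetched, parsed, and cached stays behind the interface.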

I'd be happy to get any design feedback.
            

Architecture changes for enabling future development #1252
