Skip to content

Conversation

@ErichDonGubler
Copy link

This (currently WIP) PR proposes switching to ignore's parallel walker API. It's not complete yet (threads are still only set to 1), but I figured that I should start discussion with y'all here upstream before I got too carried away, since there's actually quite a few design considerations here, and I haven't even validated that this is something y'all want in the first place! :)

See individual commits for more details on changes. Note that this is based on #93, and commits specifically for this PR start with the commit titled refactor: make `Stats` multithread-friendly.

Do our best to limit visiting paths to at most once, in preparation for
allowing multiple paths to be specified in a search-and-replace
operation. If we didn't do this, multiple path cases like this would be
handled incorrectly:

```
$ cat a.txt
foo

$ # We want to replace `foo` with `foobar`...
ruplacer foo foobar a.txt a.txt
<snip>

$ # ...but we got `foobarbar` instead:
$ cat a.txt
foobarbar
```

It's important to note what the trade-off for deduplication is. For each
path, we now call `std::fs::canonicalize`, and put the canonicalized
path in a set built using the the [`patricia_tree` crate] for somewhat
efficiently storage.

Interestingly, the above is also a potential issue with following
symlinks, which has never been allowed before. Perhaps this change could
make it sufficiently safe to add an option for following symlinks?

[`patricia_tree` crate]: https://docs.rs/patricia_tree/0.3.1/
This will be necessary for when we shortly use the multi-threaded API
for `walkdir`, where we will need to use `&self` instead of `&mut self`
as the receiver for methods.
…h_file`

We need this for two interconnected reasons:

1. We won't be able to take by `&mut` _anything_ when using this method
   in the `walkdir` parallel API. This means at least changing the `&mut
   self` receiver to `&self`.
2. We will be adding a field in the next commit that won't be `Sync`.
   This only leaves field splitting as the minimum viable path for
   refactoring `patch_file`.
This takes care of one part of migrating to multiple threads but not
everything. See the large `NOTE` comment for further details on what
needs to be done still.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant