
feat: Reprovide Sweep#1082

Closed
guillaumemichel wants to merge 145 commits into master from reprovide-sweep

Conversation

@guillaumemichel
Collaborator

@guillaumemichel guillaumemichel commented May 6, 2025

Note

This PR may be replaced by

Summary

Problem

Reproviding many keys to the DHT one by one is inefficient, because it requires a separate GetClosestPeers (GCP) request for every key.

Current state

Currently, reprovides are managed in boxo/provider. Every ReprovideInterval (22 h in the Amino DHT), all keys matching the reprovide strategy are reprovided at once. The process differs slightly depending on whether the accelerated DHT client is enabled.

Default DHT client

All the keys are reprovided sequentially, using the go-libp2p-kad-dht Provide() method. This operation consists of finding the k closest peers to the given key, and then requesting each of them to store the associated provider record.

The process is expensive because it requires a GCP for each key (opening approx. 20-30 connections). Timeouts due to unreachable peers make this process very long, resulting in a mean provide time of ~10 s (source: probelab.io 2025-06-13).

(Figure: dht-publish-performance-overall — DHT publish performance measurements)

With 10 seconds per provide, a single-threaded node using this process can reprovide fewer than 8,000 keys over the 22 h reprovide interval (22 × 3,600 s ÷ 10 s ≈ 7,920 keys).

Accelerated DHT client (fullrt)

The accelerated DHT client periodically (every 1 h) crawls the DHT swarm to cache the addresses of all discovered peers. This allows it to skip the GCP step during a provide request, since it already knows the k closest peers and their multiaddrs.

Hence, the accelerated DHT client is able to provide many more keys during the reprovide interval compared with the default DHT client. However, crawling the DHT swarm is an expensive operation (networking, memory), and since all the keys are reprovided at once, the node experiences a burst of activity until all keys are reprovided.

Ideally, nodes wouldn't have to crawl the swarm to reprovide content, and the reprovide operation could be smoothed over time to avoid a burst during which the libp2p node is unable to perform other actions.

Pooling Reprovides

If there are more keys to reprovide than the number of nodes in the DHT swarm divided by the replication factor (k), then by the pigeonhole principle at least two keys will be provided to the exact same set of peers. In that case, the number of GCPs needed is smaller than the number of keys to reprovide.

For the Amino DHT, containing ~10k DHT servers and having a replication factor of 20, pooling reprovides becomes efficient starting from 500 keys.
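The break-even point above follows from simple arithmetic. A minimal sketch in Go (the repository's language) of the pigeonhole argument, using the Amino DHT figures from the text (`poolingThreshold` is an illustrative helper, not part of the PR):

```go
package main

import "fmt"

// poolingThreshold returns the number of keys above which at least two
// keys must share the exact same set of k closest peers: with n DHT
// servers there are at most n/k disjoint sets of k peers, so any larger
// number of keys forces an overlap (pigeonhole principle).
func poolingThreshold(networkSize, replicationFactor int) int {
	return networkSize / replicationFactor
}

func main() {
	// Amino DHT figures from the text: ~10k servers, k = 20.
	fmt.Println(poolingThreshold(10000, 20)) // 500
}
```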

Reprovide Sweep

The current process of reproviding all keys at once is bad because it creates a burst of load. To smooth the reprovide process, we can sweep the keyspace from left to right so as to cover all peers over time. This consists of exploring keyspace regions, each corresponding to a set of peers that are close to each other under the Kademlia XOR distance metric.

⚠️ The Kademlia keyspace is NOT linear

A keyspace region is explored using a few GCPs (typically 2-4) to discover all the peers it contains. A keyspace region is identified by a Kademlia identifier prefix: the Kademlia identifiers of all peers within the region start with the region's prefix.

Once a region is fully explored, all the keys matching the keyspace region's prefix can be allocated to this set of peers. No additional GCP is needed.
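To illustrate what "matching the region's prefix" means, here is a hedged sketch of bitwise prefix matching on Kademlia identifiers; `inRegion` is a hypothetical helper for illustration, not the PR's API:

```go
package main

import "fmt"

// inRegion reports whether the first len(prefix) bits of id match the
// region prefix. prefix is written as a string of '0'/'1' bits; id is a
// Kademlia identifier (e.g. the SHA-256 of a peer ID or multihash).
// Panics if prefix is longer than 8*len(id) bits.
func inRegion(id []byte, prefix string) bool {
	for i, b := range prefix {
		byteIdx, bitIdx := i/8, uint(7-i%8) // most-significant bit first
		bit := (id[byteIdx] >> bitIdx) & 1
		if (b == '1') != (bit == 1) {
			return false
		}
	}
	return true
}

func main() {
	id := []byte{0b10110000} // kademlia ID starting with bits 1011...
	fmt.Println(inRegion(id, "101")) // true
	fmt.Println(inRegion(id, "100")) // false
}
```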

Implementation

This PR contains an implementation of the Reprovide Sweep strategy. The SweepingReprovider basically does the following:

  • Expose Provide() and ProvideMany() methods. All cids passed through these methods are provided to the DHT as expected.
  • All cids given through the above methods are stored in a trie. The reprovider keeps track of all cids it is responsible for reproviding.
  • The reprovider schedules when the reprovide should happen for each keyspace region containing at least 1 cid. Region reprovides are spread evenly over the reprovide interval.
  • Once the time to reprovide a region has come, the reprovider explores the region and allocates the provider records for the cids belonging to that region to the appropriate peers.

Features

  • Concurrency limit
    • Ability to configure the number of workers, both for i) initial provide operations and ii) regular reprovides
    • Limit the number of connections that a worker can open
  • Parallel reprovide
    • If a reprovide isn't complete when it is time for the next one, the next one can start immediately, provided there are available workers
  • Error handling
    • If a cid or a complete region couldn't be provided, the operation is retried later until it succeeds
  • Connectivity checker
    • The reprovider will check connectivity on provide failure, and won't try to provide as long as the node is offline.
    • When the node comes back online, the activity resumes with (re)providing the cids/regions that should have been provided during the down time.
  • Dynamic prefix length estimation
    • When starting up, the reprovider doesn't know how many peers a region contains. It therefore makes a few GCP requests to estimate the initial prefix length for exploring regions.
  • Reset reprovided cids
    • Offer a ResetReprovideSet method to replace the cids that must be reprovided.
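The prefix-length estimation feature above amounts to finding a prefix length whose regions hold roughly k peers. A hedged sketch under a uniform-keyspace assumption (`initialPrefixLen` is hypothetical; the PR estimates this empirically via GCP requests):

```go
package main

import (
	"fmt"
	"math"
)

// initialPrefixLen estimates how many prefix bits identify a region
// containing roughly replicationFactor peers: with networkSize peers
// spread uniformly over the keyspace, a prefix of length l covers about
// networkSize / 2^l peers, so we want the smallest l with
// networkSize / 2^l <= replicationFactor.
func initialPrefixLen(networkSize, replicationFactor int) int {
	if networkSize <= replicationFactor {
		return 0
	}
	return int(math.Ceil(math.Log2(float64(networkSize) / float64(replicationFactor))))
}

func main() {
	// ~10k servers, k = 20: 9-bit prefixes give regions of ~10000/512 ≈ 20 peers.
	fmt.Println(initialPrefixLen(10000, 20)) // 9
}
```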

Missing features

  • Store keys to reprovide in Datastore instead of memory.
    • Currently a trie.Trie in memory containing all cids to be reprovided
    • Ideally move the trie to Datastore
    • Keys can be grouped by region/prefix if it helps
      • They will be loaded by region anyway
      • Not sure whether adding just 1 key to a group is easy
  • (optional) Persist to the datastore when a region is reprovided (region prefix, timestamp; e.g. prefix -> timestamp).
    • Allows resuming reprovides after a crash/shutdown, starting by catching up on regions that should have been provided during the down time.
      • For this it may be useful to save the last reprovided region (e.g. lastProvided -> [prefix, timestamp])
    • Only store the last time a region was reprovided; every time the region is reprovided we can overwrite the older timestamp.
    • Storing timestamps for individual provides would help kubo users know the last time a cid was provided (e.g. cid -> timestamp). These can {expire, be garbage collected} after reprovideInterval.
  • (optional) Persist provide and reprovide queues to datastore
    • Don't lose pending cids on restart
  • Refactor pending cids queue
    • Mix failed cids with cids that were just added using Provide()
    • Allows grouping close cids together to provide more efficiently
    • We may lose prioritization (e.g. calling Provide(cidA) before Provide(cidB) doesn't mean that cidA will be provided before cidB)
  • [ ] The Dual DHT (used by Kubo) currently has 1 SweepingReprovider for each DHT (LAN and WAN)
    • Allow the SweepingReprovider to (re)provide content to multiple DHT swarms with a single scheduler and cids store (trie)
    • This means pending regions/cids have to be tracked separately for each swarm, since a provide could succeed on one swarm but fail on another
    • It will probably require multiple ConnectivityCheckers, one for each DHT swarm.
    • Sharing a single schedule isn't useful, since the schedule depends on the network size; each network should have its own schedule.
    • The only thing that can be shared between the 2 SweepingReprovider instances is the set of cids that need to be reprovided (datastore).
  • If we decide to change the routing/provide interfaces in kubo, get rid of the boxo/provider.System implementation in go-libp2p-kad-dht/dual/reprovider.go
  • (optional) Provide status provider: ProvideStatus interface #1110

TODO

  • Complete implementation with missing mandatory features
  • Implementation review Reprovide Sweep #1095
  • (optional) Increase unit & integration test coverage
  • (optional) Increase amino DHT test coverage
  • Benchmark performance vs default and accelerated DHT clients
  • (optional) High level documentation in go-libp2p-kad-dht/reprovider/README.md
  • Integration in kubo feat: opt-in new Sweep provide system ipfs/kubo#10834
    • This one is going to be long and painful 😢

Admin

Depends on:

Need new release of:

Closes #824

Part of ipshipyard/roadmaps#6, ipshipyard/roadmaps#7, ipshipyard/roadmaps#8

@guillaumemichel guillaumemichel force-pushed the reprovide-sweep branch 2 times, most recently from 6cad84d to 9bd54ba on May 21, 2025 14:54
@lidel lidel mentioned this pull request May 21, 2025
@guillaumemichel guillaumemichel force-pushed the reprovide-sweep branch 2 times, most recently from 18b6384 to 3955aa1 on August 6, 2025 12:59
when claiming a region to reprovide, if a superstring of the prefix is
already claimed, adding to the trie will panic. The fix is to prune any
potential superstrings from the trie before adding.
Contributor

@gammazero gammazero left a comment


see comments

Comment on lines +211 to +212
// ResetCids purges the KeyStore and repopulates it with the provided cids.
func (s *KeyStore) ResetCids(ctx context.Context, keysChan <-chan cid.Cid) error {
Contributor


Let's name it Purge

Suggested change
// ResetCids purges the KeyStore and repopulates it with the provided cids.
func (s *KeyStore) ResetCids(ctx context.Context, keysChan <-chan cid.Cid) error {
// Purge removes all cids from the KeyStore and repopulates it with the provided cids.
func (s *KeyStore) Purge(ctx context.Context, keysChan <-chan cid.Cid) error {

Collaborator Author


I would like to reserve Purge for the following function

func (s *KeyStore) Purge(ctx context.Context, keysChan <-chan mh.Multihash) error { }

The system works with multihashes only; this is the only place where we deal with cids. For now it is easier to consume cids because we need to keep compatibility with the old system, which consumes cids. However, once we deprecate the old system, we should be able to switch to using multihashes only.

Reserving the name would allow a smoother transition.

for prefix, hs := range groups {
dsKey := s.dsKey(prefix)
var stored []mh.Multihash
data, err := s.ds.Get(ctx, dsKey)
Contributor


Is it really worth checking if the key is already stored, instead of just overwriting any existing key?

Collaborator Author


Yes, because in the current state we read the stored values, append the new values to the existing ones, and write everything back to the datastore.

It would probably be better to store keys individually though (instead of slices of keys).


for prefix, toDel := range groups {
dsKey := s.dsKey(prefix)
data, err := s.ds.Get(ctx, dsKey)
Contributor


Is it necessary to Get before Delete? It seems better to just delete, whether or not the item is in the datastore.

Collaborator Author


Yes, we need to get the group of keys to which the target belongs, remove it from the group, and write the group back to the datastore.

@guillaumemichel
Collaborator Author

The review was addressed in #1133. This PR will soon be closed, superseded by #1095.

@guillaumemichel
Collaborator Author

Closing in favor of #1095

