
Speeding up for pages with large histories #8

@tingofurro

Description

Hey,

First of all, great work — this is going to be a tremendous resource and save a lot of time fetching and organizing Wikipedia revision histories.
I've noticed that for some "larger" pages, loading is very slow, presumably because of the large number of revisions to fetch.
For example when I run:

import wikipedia_histories
histories = wikipedia_histories.get_history("Paris")

It has been running for 5+ minutes without any end in sight.

Do you have an idea of what we could do to speed up the process? Perhaps batch retrieval? It could also be useful to have a progress bar (tqdm) or a "limit" option to restrict the query, for example to edits within a given time range.
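To illustrate the batching idea: the MediaWiki API can return up to 500 revisions per request (`rvlimit=max`) along with a continuation token, so the history could be fetched in a handful of requests instead of one per revision. Here is a rough sketch of that loop, with a stubbed-out fetch function since I don't know the library's internals — all names here are illustrative, not the actual API of this project:

```python
from typing import Callable, Iterator, Optional

def iter_revisions(
    fetch_page: Callable[[Optional[str]], dict],
    limit: Optional[int] = None,
) -> Iterator[dict]:
    """Yield revisions batch by batch, stopping early once `limit` is hit.

    `fetch_page(token)` stands in for one HTTP call to the MediaWiki API
    (e.g. action=query&prop=revisions&rvlimit=max), returning a batch of
    revisions plus a continuation token, or None when the history ends.
    """
    token = None
    count = 0
    while True:
        data = fetch_page(token)  # one request returns a whole batch
        for rev in data["revisions"]:
            yield rev
            count += 1
            if limit is not None and count >= limit:
                return  # "limit" option: stop without fetching more batches
        token = data.get("continue")
        if token is None:  # history exhausted
            return

# Offline demonstration with a fake two-batch history:
def fake_fetch(token):
    if token is None:
        return {"revisions": [{"revid": 1}, {"revid": 2}], "continue": "t1"}
    return {"revisions": [{"revid": 3}], "continue": None}

print([r["revid"] for r in iter_revisions(fake_fetch)])      # -> [1, 2, 3]
print(len(list(iter_revisions(fake_fetch, limit=2))))        # -> 2
```

Wrapping the outer loop in tqdm would then give a per-batch progress bar almost for free.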

Thanks again,
Philippe
