Description
By default, when snakemake is re-run it re-executes any step whose upstream data file has been updated (i.e. its timestamp is newer than the outputs that depend on it). This means that if a data CSV from several months ago is "touched", or some of the FTPS sentinel files are deleted in a clearout, snakemake might well decide to re-process several months' worth of data, which we don't want to happen unless we really mean it!
Therefore, we need to pass an explicit date as a config parameter to snakemake, so it will only include files with that date in the filename in its processing. The date would normally be yesterday's date, since we intend to run the script daily in the early hours of the morning.
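As a rough illustration of the idea above, the sketch below computes the default run date (yesterday) and keeps only the files whose names contain that date. It assumes filenames embed an ISO `YYYY-MM-DD` date, which this issue does not specify; the helper names `default_run_date` and `files_for_date` are hypothetical. In a Snakefile the chosen date would typically arrive through the `config` dictionary, for example from `snakemake --config date=2024-02-29`, where the key name `date` is likewise an assumption.

```python
import re
from datetime import date, timedelta

# Assumption: filenames embed an ISO date such as "2024-02-29".
DATE_IN_NAME = re.compile(r"\d{4}-\d{2}-\d{2}")


def default_run_date(today=None):
    """Return yesterday's date, the normal target for the daily run."""
    today = today or date.today()
    return today - timedelta(days=1)


def files_for_date(filenames, run_date):
    """Keep only the filenames that contain run_date as an ISO date."""
    stamp = run_date.isoformat()
    return [name for name in filenames if stamp in DATE_IN_NAME.findall(name)]
```

Because the selection depends only on the filename, touching or deleting an older file does not change which inputs are selected for a given date.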
Definition of done
There is no easy mechanism by which more than one day of data can be processed at a time. It must still be possible to do so in dev, though.
Corollary: Deliberate re-processing will have to be manually invoked, so there must exist documentation for this process, and a wrapper script if needed to make it easy (but still explicit).
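One possible shape for such a wrapper is sketched below. It is only an illustration: the function names, the config key `date`, and the choice of a single core are assumptions, not decisions recorded in this issue. The date is a required argument with no default, so re-processing a past day cannot happen by accident.

```python
import argparse
from datetime import date


def build_command(run_date, cores=1):
    """Build the snakemake command line for exactly one day of data."""
    return [
        "snakemake",
        "--cores", str(cores),
        # Assumption: the Snakefile reads the date from config["date"].
        "--config", f"date={run_date.isoformat()}",
    ]


def parse_args(argv):
    """Parse arguments; --date is mandatory so the choice is always explicit."""
    parser = argparse.ArgumentParser(
        description="Re-process a single day of data (hypothetical wrapper)."
    )
    parser.add_argument(
        "--date",
        required=True,
        type=date.fromisoformat,  # rejects anything that is not YYYY-MM-DD
        help="day to re-process, in YYYY-MM-DD form",
    )
    return parser.parse_args(argv)


def main(argv=None):
    """Print the command that would be run, and return it as a list."""
    args = parse_args(argv)
    command = build_command(args.date)
    print(" ".join(command))
    return command
```

Calling `main(["--date", "2024-02-29"])` prints the command without running it; a real wrapper would hand the returned list to something like `subprocess.run(command, check=True)`.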