Open
Description
Used naively, snakemake re-executes a step on every re-run whenever one of its upstream data files has been updated (i.e. its timestamp is newer than the step's output). This means that if a data CSV from several months ago is "touched", or some of the FTPS sentinel files are deleted in a clearout, snakemake might well decide to re-process several months' worth of data, which we don't want to happen unless we really mean it!
Therefore, we need to pass an explicit date as a config parameter to snakemake, so that it only processes files with that date in the filename. The date would normally be yesterday's date, since we intend to run the script daily in the early hours of the morning.
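As a rough illustration of the idea, the date handling could look something like the sketch below. The config key `date`, the ISO `YYYY-MM-DD` format in filenames, and the helper names `resolve_run_date` and `files_for_date` are all assumptions made for the example, not existing code in this repo.

```python
from datetime import date, timedelta


def resolve_run_date(config, today=None):
    """Return the single date to process.

    Uses config["date"] when given (hypothetical key), otherwise
    falls back to yesterday relative to `today`.
    """
    today = today or date.today()
    raw = config.get("date")
    if raw is None:
        return today - timedelta(days=1)
    # str() in case the value arrives already parsed as a date object.
    # Raises ValueError if the value is not a valid ISO date.
    return date.fromisoformat(str(raw))


def files_for_date(filenames, run_date):
    """Keep only filenames containing the run date (assumed YYYY-MM-DD)."""
    stamp = run_date.isoformat()
    return [name for name in filenames if stamp in name]
```

Inside a Snakefile, something like `files_for_date(all_inputs, resolve_run_date(config))` could then be used to build the input list, so that files from other dates are never part of the DAG regardless of their timestamps.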
Definition of done
- It will not be automatic or easy to accidentally process more than a day of data at a time. It must still be possible in dev though.
- Corollary: Deliberate re-processing will have to be manually invoked, so there must exist documentation for this process, and a wrapper script if needed to make it easy (but still explicit).
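One possible shape for the wrapper script is sketched below. It accepts exactly one date, validates it, and builds the snakemake command line for it. The function names and the config key `date` are hypothetical, and using `--forceall` to force re-execution of already-processed steps is an assumption about how re-processing would be triggered, not a decision made in this issue.

```python
import subprocess
from datetime import date


def build_reprocess_command(run_date, cores=1):
    """Build a snakemake command that re-processes exactly one date.

    `run_date` must be an ISO date string (YYYY-MM-DD); anything else
    raises ValueError, so a range or wildcard cannot slip through.
    """
    parsed = date.fromisoformat(run_date)
    return [
        "snakemake",
        "--cores", str(cores),
        "--forceall",  # assumption: force re-run even if outputs exist
        "--config", f"date={parsed.isoformat()}",  # hypothetical config key
    ]


def reprocess(run_date, cores=1):
    """Run the command; raises CalledProcessError if snakemake fails."""
    subprocess.run(build_reprocess_command(run_date, cores), check=True)
```

Re-processing several days would then mean calling `reprocess` once per date, which keeps each invocation explicit.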