Open
Labels
documentation (Improvements or additions to documentation), enhancement (New feature or request)
Description
Objective
Establish a standardized folder structure and file naming convention for new data ingest processes, ensuring compatibility with the latest release schema and efficient storage/validation practices.
Requirements
- Create a new ingest folder in the repository.
- Within the ingest folder, create a subfolder for each data provider.
- All ingests must support the latest release schema.
- When a provider's total data size exceeds ~25 MB, split the data into multiple files so that each file stays within ~25 MB.
- Do not split records between files: each file must contain only complete records so that validation can be performed independently.
- All data files are to be formatted as JSON lists (a single top-level array enclosed in square brackets). Consider JSON Lines (https://jsonlines.org/) as an alternative approach if more appropriate for downstream usage.
- File naming convention: <data provider>_<zero-padded 5-digit number>.json (e.g., emsl_00001.json).
- Future: explore the JSON Lines format.
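The splitting and naming requirements above can be sketched as follows. This is a minimal illustration, not a settled implementation: the function names `chunk_records` and `write_ingest_files`, the `ingest` root path, and the reading of "~25 MB" as 25 * 1024 * 1024 bytes are all assumptions made for the example.

```python
import json
from pathlib import Path

# Assumption: "~25 MB" is interpreted as 25 MiB; adjust if the project decides otherwise.
MAX_BYTES = 25 * 1024 * 1024


def chunk_records(records, max_bytes=MAX_BYTES):
    """Yield lists of complete records whose serialized JSON list fits in max_bytes.

    A single record larger than max_bytes is still emitted on its own,
    because records are never split between files.
    """
    chunk, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for record in records:
        encoded = len(json.dumps(record).encode("utf-8"))
        extra = encoded + (2 if chunk else 0)  # ", " separator used by json.dumps
        if chunk and size + extra > max_bytes:
            yield chunk
            chunk, size = [], 2
            extra = encoded
        chunk.append(record)
        size += extra
    if chunk:
        yield chunk


def write_ingest_files(provider, records, root="ingest", max_bytes=MAX_BYTES):
    """Write records to <root>/<provider>/<provider>_<00001...>.json (hypothetical helper)."""
    out_dir = Path(root) / provider
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunk_records(records, max_bytes), start=1):
        path = out_dir / f"{provider}_{index:05d}.json"
        path.write_text(json.dumps(chunk), encoding="utf-8")
        paths.append(path)
    return paths
```

Because each chunk is serialized as its own JSON list, every output file can be parsed and validated without reading any other file.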
Acceptance Criteria
- New ingest folder structure is documented and implemented.
- Each data provider has its own subfolder.
- All files conform to the current release schema.
- No file exceeds ~25 MB; splitting strategy is documented.
- No records are split between files; each file is independently valid.
- Naming convention is followed for all new files.
- JSON format (list or dict) is clearly specified and documented.
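Several of the criteria above (naming, size limit, independently parseable JSON list) could be checked mechanically. The sketch below is one possible approach under stated assumptions: `check_ingest_file` is a hypothetical helper, the provider prefix is assumed to match the subfolder name, and "~25 MB" is taken as 25 * 1024 * 1024 bytes. It does not validate records against the release schema, since the schema is not described here.

```python
import json
import re
from pathlib import Path

# Assumption: "~25 MB" is interpreted as 25 MiB.
MAX_BYTES = 25 * 1024 * 1024
# <data provider>_<zero-padded 5-digit number>.json, e.g. emsl_00001.json
NAME_PATTERN = re.compile(r"^(?P<provider>.+)_(?P<index>\d{5})\.json$")


def check_ingest_file(path, max_bytes=MAX_BYTES):
    """Return a list of problems found for one ingest file (empty if none)."""
    path = Path(path)
    problems = []

    match = NAME_PATTERN.match(path.name)
    if match is None:
        problems.append("name does not match <data provider>_<5 digits>.json")
    elif match.group("provider") != path.parent.name:
        # Assumption: each provider subfolder is named after the provider prefix.
        problems.append("provider prefix differs from subfolder name")

    if path.stat().st_size > max_bytes:
        problems.append("file exceeds size limit")

    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        problems.append(f"not valid JSON: {exc}")
    else:
        if not isinstance(data, list):
            problems.append("top-level JSON value is not a list")

    return problems
```

Running this over every file in a provider subfolder, each in isolation, would confirm that no file depends on another to parse.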