Efficiency of catalogue compression script #129

@robjmcgibbon

Description

We have a script for compressing SOAP catalogues. It consists of two parts. First, all the datasets in the catalogue are compressed and written to temporary output files; this part runs in parallel. Then a single process copies all the datasets from the temporary files into the final output file.
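
For reference, a minimal sketch of that two-stage layout, assuming h5py and mpi4py. The file names, dataset list and compression settings here are hypothetical stand-ins (the real script applies the SWIFT lossy filters rather than plain gzip):

```python
# Hypothetical sketch of the two-stage layout; not the actual SOAP script.
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD


def compress_datasets(input_file, tmp_file, dataset_names):
    """Stage 1: each MPI rank recompresses its share of datasets into its own
    temporary file. gzip+shuffle is just a placeholder for the real filters."""
    my_names = dataset_names[comm.Get_rank()::comm.Get_size()]
    with h5py.File(input_file, "r") as fin, h5py.File(tmp_file, "w") as fout:
        for name in my_names:
            data = fin[name][...]
            fout.create_dataset(name, data=data, compression="gzip", shuffle=True)


def merge_temporary_files(tmp_files, output_file):
    """Stage 2: a single process copies every dataset from the temporary files
    into the final output file."""
    with h5py.File(output_file, "w") as fout:
        for tmp in tmp_files:
            with h5py.File(tmp, "r") as fin:
                for name in fin:
                    # Group.copy duplicates the object together with its
                    # existing filter pipeline.
                    fin.copy(name, fout)
```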

The script is written with MPI. I did try to get multiple ranks to write to the output file directly, but there were complications because the datasets use the custom SWIFT lossy compression filters. Option one would be to try to get this to work.
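
For what it's worth, the collective-write route would look roughly like the sketch below, assuming h5py built against parallel HDF5 (filtered collective writes need a reasonably recent HDF5). It sidesteps the real problem, which is getting the SWIFT lossy filters to work in this mode; the helper name `read_dataset` and the gzip placeholder are hypothetical:

```python
# Hypothetical sketch of option one; assumes an MPI-enabled h5py/HDF5 build.
from mpi4py import MPI
import h5py
import numpy as np

comm = MPI.COMM_WORLD


def parallel_write(output_file, dataset_names, read_dataset):
    """Every rank writes its slice of each dataset straight into the shared
    output file. `read_dataset` is a hypothetical callable returning the full
    array for a given dataset name."""
    with h5py.File(output_file, "w", driver="mpio", comm=comm) as fout:
        for name in dataset_names:
            data = read_dataset(name)
            # Dataset creation must be collective: every rank makes this call.
            dset = fout.create_dataset(name, shape=data.shape, dtype=data.dtype,
                                       compression="gzip")
            # Split the first axis over ranks.
            bounds = np.linspace(0, data.shape[0], comm.Get_size() + 1).astype(int)
            start, stop = bounds[comm.Get_rank()], bounds[comm.Get_rank() + 1]
            # Writes to filtered datasets have to be collective as well.
            with dset.collective:
                dset[start:stop] = data[start:stop]
```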

A second option would be to run the first part of the script on a compute node (where we can make use of all the cores), and then run the second part on a login node. If we went for this option I would add an argument to the compression script specifying whether to run the first part, the second part, or both. I'm not sure how many people would actually bother to use this option though, which is why I haven't implemented it.
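
If option two goes ahead, the switch could be a simple argparse choice. A minimal sketch, with the flag name and the stage functions purely illustrative (they stand in for the two stages in the first sketch above); defaulting to "both" keeps the current behaviour for anyone who ignores the flag:

```python
# Hypothetical sketch of the proposed command-line switch.
import argparse


def compress_datasets(args):
    """Placeholder for the parallel compression stage (run on a compute node)."""


def merge_temporary_files(args):
    """Placeholder for the single-process merge stage (can run on a login node)."""


def main():
    parser = argparse.ArgumentParser(description="Compress a SOAP catalogue")
    parser.add_argument("--stage", choices=["compress", "merge", "both"],
                        default="both",
                        help="run only the parallel compression, only the "
                             "serial merge, or both")
    args = parser.parse_args()
    if args.stage in ("compress", "both"):
        compress_datasets(args)
    if args.stage in ("merge", "both"):
        merge_temporary_files(args)


if __name__ == "__main__":
    main()
```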
