Chop up ancestors along the genome, for better parallelization #982

@hyanwong

Chatting to @nspope, we just had an interesting idea for tsinfer. If you just want to get some basic site times out, and don't mind too much about contiguity of the nodes, you could chop up the older ancestors at arbitrary positions, using the same positions for every chopped ancestor (e.g. at 1 Mb intervals). This would be a bit like running inference separately on 1 Mb chunks of the genome, but with the advantage that you don't need to chop up the long, young ancestors. I think @benjeffery's linesweep algorithm would then see all the chunks as parallelizable.
This could be a good way of doing a fast first pass to get site times for later reinference.

I was imagining a method on an ancestors instance, like ancestors.truncate_ancestors: perhaps ancestors.chop(min_time, chop_positions). Here chop_positions could be either an integer giving the number of regularly spaced chop positions, or an array of floats specifying the positions to use, and only ancestors older than min_time would be chopped. I think this could be quite easy to implement.
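
To make the proposed semantics concrete, here is a minimal sketch in plain Python. It does not use the tsinfer API: ancestors are modelled as hypothetical (start, end, time) tuples, the function names and the sequence_length argument are assumptions for illustration, and an integer chop_positions is assumed to mean that many evenly spaced interior positions.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical stand-in for an ancestor: (start, end, time) in genome coordinates.
Ancestor = Tuple[float, float, float]


def resolve_chop_positions(
    chop_positions: Union[int, Sequence[float]], sequence_length: float
) -> List[float]:
    """Turn the chop_positions argument into a sorted list of positions."""
    if isinstance(chop_positions, int):
        n = chop_positions
        # Assumption: an integer means n evenly spaced interior positions.
        return [sequence_length * (i + 1) / (n + 1) for i in range(n)]
    return sorted(float(p) for p in chop_positions)


def chop(
    ancestors: Sequence[Ancestor],
    min_time: float,
    chop_positions: Union[int, Sequence[float]],
    sequence_length: float,
) -> List[Ancestor]:
    """Split ancestors older than min_time at the shared chop positions."""
    positions = resolve_chop_positions(chop_positions, sequence_length)
    chopped = []
    for start, end, time in ancestors:
        if time <= min_time:
            # Young ancestors are kept whole.
            chopped.append((start, end, time))
            continue
        left = start
        for p in positions:
            # Only positions strictly inside the ancestor cut it.
            if left < p < end:
                chopped.append((left, p, time))
                left = p
        chopped.append((left, end, time))
    return chopped


example = [(0, 3_000_000, 0.9), (500_000, 2_500_000, 0.1)]
pieces = chop(example, min_time=0.5, chop_positions=2, sequence_length=3_000_000)
print(pieces)
```

In this example the old ancestor (time 0.9) is split into three 1 Mb pieces, while the young ancestor (time 0.1) is left intact. Because every old ancestor is cut at the same positions, no chopped piece spans a chunk boundary.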
