Description
In AnyVar, we want to receive VRS objects, but they need IDs to be stored. We'd like some kind of function that will ensure IDs get added to received objects if they aren't there already. This needs to recurse down through any contained objects.
I had assumed this existed somewhere in VRS-Python, but I haven't found a working solution yet. Granted, this is a very simple problem: I could just write something that manually checks whether an object is an allele or some other type and handles each case. Still, I felt like there should be a better option, especially since we already have some functions that get close:
ga4gh_identify() returns the correct ID, but doesn't completely update the contained object
It'll add the outermost .id property if you set in_place="always", but not a contained object's ID
In [1]: from ga4gh.vrs import models, normalize; from ga4gh.core import ga4gh_identify; from ga4gh.vrs.enderef import vrs_deref, vrs_enref
In [2]: input_data = {"location": {"end": 87894077, "start": 87894076, "sequenceReference": { "refgetAccession": "SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB", "type": "SequenceReference"},},"state": {"sequence": "T"}}
In [3]: allele1 = models.Allele(**input_data)
In [4]: ga4gh_identify(allele1, in_place="always")
Out[4]: 'ga4gh:VA.K7akyz9PHB0wg8wBNVlWAAdvMbJUJJfU'
In [5]: allele1.id
Out[5]: 'ga4gh:VA.K7akyz9PHB0wg8wBNVlWAAdvMbJUJJfU'
In [6]: allele1.location.id is None
Out[6]: True
vrs_enref()/vrs_deref() will fill in IDs on the object and its contained objects, but the allele ID they produce is wrong
This one seems bad. I mean, maybe I don't understand how these methods are supposed to work, but this is troubling. Regardless, this is not a solution for my problem.
In [7]: storage = {}
In [8]: enreffed = vrs_enref(models.Allele(**input_data), storage)
In [9]: dereffed = vrs_deref(enreffed, storage)
In [10]: dereffed.id
Out[10]: 'ga4gh:VA.UBp6cO0u3i286SZhHhfUo1uFft259YyC'
In [11]: dereffed.location.id
Out[11]: 'ga4gh:SL.01EH5o6V6VEyNUq68gpeTwKE7xOo-WAy'
Note: that location ID is correct, as far as I can tell. I don't understand why the allele ID is wrong (it differs from the ga4gh_identify() result above).
vrs_enref() -> vrs_deref() -> clear allele ID and digest -> ga4gh_identify() works, but this is way more complicated than it should be
This is what I put into AnyVar as a temporary measure
def recursive_identify(vrs_object: Type_VrsObject) -> Type_VrsObject:
    """Add GA4GH IDs to an object and all GA4GH-identifiable objects contained within.

    :param vrs_object: AnyVar-supported variation object
    :return: same object, with any missing ID fields filled in
    """
    storage = {}
    # Round-trip through enref/deref so that contained objects get their IDs
    enreffed = vrs_enref(vrs_object, storage)
    dereffed = vrs_deref(enreffed, storage)
    # The outermost ID from the round trip is wrong, so clear it and recompute
    dereffed.id = None  # type: ignore[reportAttributeAccessIssue]
    dereffed.digest = None  # type: ignore[reportAttributeAccessIssue]
    ga4gh_identify(dereffed, in_place="always")
    return dereffed  # type: ignore[reportReturnType]

I cannot imagine this is the best possible solution to this problem, or that I'm the only person who's ever needed something like this before. I think it'd be nice to either update the behavior of the existing functions or add something new that does this efficiently.
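To make the request concrete, here is a minimal sketch of the traversal I have in mind: visit contained objects first, fill in any missing IDs, then identify the parent. It deliberately does not use VRS-Python. `ToyLocation`, `ToyAllele`, `toy_digest`, and `recursive_identify_sketch` are hypothetical stand-ins, and the serialization is a toy, so the IDs it produces will not match real GA4GH identifiers. It also only handles directly nested objects, not lists of them.

```python
import base64
import hashlib
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, ClassVar, Optional


def toy_digest(payload: dict) -> str:
    """Truncated SHA-512 over a toy JSON serialization (NOT the VRS serialization)."""
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(hashlib.sha512(blob).digest()[:24]).decode()


@dataclass
class ToyLocation:
    """Hypothetical stand-in for an identifiable contained object."""
    prefix: ClassVar[str] = "SL"
    start: int
    end: int
    refget_accession: str
    id: Optional[str] = None


@dataclass
class ToyAllele:
    """Hypothetical stand-in for an identifiable outer object."""
    prefix: ClassVar[str] = "VA"
    location: ToyLocation
    sequence: str
    id: Optional[str] = None


def recursive_identify_sketch(obj: Any) -> Any:
    """Fill in missing ids bottom-up, mutating obj in place, and return it."""
    if not is_dataclass(obj) or isinstance(obj, type):
        return obj  # plain values (int, str, ...) are left alone
    payload = {}
    for f in fields(obj):
        if f.name == "id":
            continue
        # Children are identified before the parent is digested
        value = recursive_identify_sketch(getattr(obj, f.name))
        # An identifiable child is represented by its id in the parent's payload
        payload[f.name] = getattr(value, "id", value)
    if obj.id is None:  # only add an id if one is not there already
        obj.id = f"toy:{obj.prefix}.{toy_digest(payload)}"
    return obj


allele = ToyAllele(
    location=ToyLocation(
        start=87894076,
        end=87894077,
        refget_accession="SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB",
    ),
    sequence="T",
)
recursive_identify_sketch(allele)
print(allele.id, allele.location.id)
```

The point is only the shape: a single pass, no storage dict, and no need to clear and recompute the outer ID afterwards. A real version would have to use the library's own digest and serialization rules.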