
vcf2smc #264

@Risingsun93

Hi @terhorst @willright28, I'm facing the same issue, "RuntimeError: Distinguished lineages not found in data?", when using the example data from this GitHub repository: https://github.com/popgenmethods/smcpp/blob/master/example/example.vcf.gz

These are the commands I ran:

smc++ vcf2smc example.vcf.gz chr1.smc.gz chr1 CEU:NA12878,NA12879
smc++ vcf2smc -d NA12878 NA12879 example.vcf.gz chr1.smc.gz chr1 CEU:NA12878,NA12879
for i in {7..9};
do smc++ vcf2smc -d NA1287$i NA1287$i example.vcf.gz out.$i.txt chr1 NA12877 NA12878 NA12890;
done
smc++ estimate -o output/ 0.1 out1.txt

Kindly help me solve this.
Please check the header of this file and the sample and population info, and suggest what changes I should make.

###########
mylinux@ChiragsPC:~/smcppdata$ smc++ vcf2smc example.vcf.gz chr1.smc.gz chr1 CEU:NA12878,NA12879
2016 smcpp.commands.vcf2smc WARNING Neither missing cutoff (-c) or mask (-m) has been specified. This means that stretches of the chromosome that do not have any VCF entries (for example, centromeres) will be interpreted as homozygous recessive.
2020 smcpp.commands.vcf2smc INFO Population 1:
2020 smcpp.commands.vcf2smc INFO Distinguished lineages: NA12878:0, NA12878:1
2021 smcpp.commands.vcf2smc INFO Undistinguished lineages: NA12879:0, NA12879:1
[E::idx_find_and_load] Could not retrieve index file for 'example.vcf.gz'
Traceback (most recent call last):
File "/home/mylinux/.local/bin/smc++", line 8, in <module>
sys.exit(main())
File "/home/mylinux/.local/lib/python3.10/site-packages/smcpp/frontend/console.py", line 28, in main
cmds[args.command].main(args)
File "/home/mylinux/.local/lib/python3.10/site-packages/smcpp/commands/vcf2smc.py", line 134, in main
raise RuntimeError("Distinguished lineages not found in data?")
RuntimeError: Distinguished lineages not found in data?

mylinux@ChiragsPC:~/smcppdata$ smc++ vcf2smc -d NA12878 NA12879 example.vcf.gz chr1.smc.gz chr1 CEU:NA12878,NA12879
2028 smcpp.commands.vcf2smc WARNING Neither missing cutoff (-c) or mask (-m) has been specified. This means that stretches of the chromosome that do not have any VCF entries (for example, centromeres) will be interpreted as homozygous recessive.
2029 smcpp.commands.vcf2smc INFO Population 1:
2029 smcpp.commands.vcf2smc INFO Distinguished lineages: NA12878:0, NA12879:1
2029 smcpp.commands.vcf2smc INFO Undistinguished lineages: NA12878:1, NA12879:0
[E::idx_find_and_load] Could not retrieve index file for 'example.vcf.gz'
Traceback (most recent call last):
File "/home/mylinux/.local/bin/smc++", line 8, in <module>
sys.exit(main())
File "/home/mylinux/.local/lib/python3.10/site-packages/smcpp/frontend/console.py", line 28, in main
cmds[args.command].main(args)
File "/home/mylinux/.local/lib/python3.10/site-packages/smcpp/commands/vcf2smc.py", line 134, in main
raise RuntimeError("Distinguished lineages not found in data?")
RuntimeError: Distinguished lineages not found in data?
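Since the error says the distinguished lineages were not found in the data, one quick check is whether the sample IDs passed on the command line (NA12878, NA12879) actually appear on the #CHROM header line of the VCF. The helpers below are only a sketch: vcf_samples and vcf_has_sample are hypothetical names, not part of smc++, and they assume a gzip-readable VCF whose sample columns start at column 10.

```shell
# List the sample IDs declared on the #CHROM header line of a gzipped VCF.
# vcf_samples is a hypothetical helper name, not an smc++ command.
vcf_samples() {
    gzip -dc "$1" |
        awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Succeed only if the given sample ID appears in that header line.
vcf_has_sample() {
    vcf_samples "$1" | grep -qxF -- "$2"
}
```

For example, `vcf_has_sample example.vcf.gz NA12878 && echo found || echo missing` reports whether that ID is present. If it is missing, the names printed by `vcf_samples example.vcf.gz` are the ones to try in the population argument.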

mylinux@ChiragsPC:~/smcppdata$ for i in {7..9};
do smc++ vcf2smc -d NA1287$i NA1287$i example.vcf.gz out.$i.txt chr1 NA12877 NA12878 NA12890;
done
usage: smc++ vcf2smc [-h] [-v] [--cores CORES] [-d sample_id sample_id] [--length LENGTH] [--ignore-missing] [--missing-cutoff c] [--mask MASK] [--drop-first-last] vcf.gz out[.gz] contig pop1 [pop2]
smc++ vcf2smc: error: argument pop1: 'NA12877' should be a comma-separated list of sample ids preceded by a population identifier. See 'smc++ vcf2smc -h'.
usage: smc++ vcf2smc [-h] [-v] [--cores CORES] [-d sample_id sample_id] [--length LENGTH] [--ignore-missing] [--missing-cutoff c] [--mask MASK] [--drop-first-last] vcf.gz out[.gz] contig pop1 [pop2]
smc++ vcf2smc: error: argument pop1: 'NA12877' should be a comma-separated list of sample ids preceded by a population identifier. See 'smc++ vcf2smc -h'.
usage: smc++ vcf2smc [-h] [-v] [--cores CORES] [-d sample_id sample_id] [--length LENGTH] [--ignore-missing] [--missing-cutoff c] [--mask MASK] [--drop-first-last] vcf.gz out[.gz] contig pop1 [pop2]
smc++ vcf2smc: error: argument pop1: 'NA12877' should be a comma-separated list of sample ids preceded by a population identifier. See 'smc++ vcf2smc -h'.
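The usage error above says pop1 must be a comma-separated list of sample IDs preceded by a population identifier, whereas the loop passes three separate bare IDs. Going by that message, a single argument shaped like CEU:NA12877,NA12878,NA12890 (reusing the CEU label from the earlier commands) would match the expected form. The check below is a rough sketch of that shape only; is_pop_spec is a hypothetical helper and the pattern is an assumption based on the error text, not smc++'s actual parser.

```shell
# Hypothetical helper: accept strings shaped like POP:id1,id2,...
# The pattern is a guess derived from the error message, not smc++'s own rule.
is_pop_spec() {
    printf '%s\n' "$1" |
        grep -Eq '^[^:,[:space:]]+:[^:,[:space:]]+(,[^:,[:space:]]+)*$'
}
```

For example, `is_pop_spec 'CEU:NA12877,NA12878,NA12890'` succeeds, while `is_pop_spec 'NA12877'` fails because the population label and colon are missing.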

smc++ vcf2smc example.vcf.gz chr1.smc.gz chr1 CEU:NA1885,NA3861
827 smcpp.commands.vcf2smc WARNING Neither missing cutoff (-c) or mask (-m) has been specified. This means that stretches of the chromosome that do not have any VCF entries (for example, centromeres) will be interpreted as homozygous recessive.
827 smcpp.commands.vcf2smc INFO Population 1:
827 smcpp.commands.vcf2smc INFO Distinguished lineages: NA1885:0, NA1885:1
827 smcpp.commands.vcf2smc INFO Undistinguished lineages: NA3861:0, NA3861:1
Traceback (most recent call last):
File "/home/exouser/.local/bin/smc++", line 8, in <module>
sys.exit(main())
File "/home/exouser/.local/lib/python3.8/site-packages/smcpp/frontend/console.py", line 28, in main
cmds[args.command].main(args)
File "/home/exouser/.local/lib/python3.8/site-packages/smcpp/commands/vcf2smc.py", line 128, in main
vcf = VariantFile(args.vcf)
File "pysam/libcbcf.pyx", line 4117, in pysam.libcbcf.VariantFile.__init__
File "pysam/libcbcf.pyx", line 4347, in pysam.libcbcf.VariantFile.open
ValueError: invalid file b'example.vcf.gz' (mode=b'r') - is it VCF/BCF format?
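The ValueError in this last run asks whether the file is really VCF/BCF. One possible cause, and this is only an assumption, is that the downloaded example.vcf.gz is not the compressed VCF itself (for instance a saved web page). A minimal sanity check is sketched below; looks_like_vcf_gz is a hypothetical helper that only confirms the file decompresses as gzip and starts with a ##fileformat=VCF line. It does not verify bgzip compression or the presence of an index.

```shell
# Hypothetical helper: succeed only if the file is valid gzip and its first
# decompressed line is a "##fileformat=VCF..." header line.
looks_like_vcf_gz() {
    gzip -t "$1" 2>/dev/null || return 1
    gzip -dc "$1" | head -n 1 | grep -q '^##fileformat=VCF'
}
```

For example, `looks_like_vcf_gz example.vcf.gz || echo "not a gzipped VCF"` prints the message when the check fails.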

@willright28, could you kindly send me the header of your vcf.gz file and, if possible, an example subset of your original data,
so that I can make the necessary changes accordingly?

@terhorst @willright28 I'm using the Ubuntu Linux application on Windows 10.

Regards,
Thank you
