Hi,
I used the -C option to compare two eigenstrat databases and found a few duplicates.
So I ran the following to remove the duplicates from one of the databases:
eigenstrat_database_tools.py -g v54.1_1240K_public_Olalde2019.geno -s v54.1_1240K_public_Olalde2019.snp -i v54.1_1240K_public_Olalde2019.ind -o v54.1_1240K_public_Olalde2019_no_duplicates -L Olalde2019_duplicates.txt -R
and got the following error:
Traceback (most recent call last):
  File "/home/psonis/software/EigenStratDatabaseTools/eigenstrat_database_tools.py", line 86, in <module>
    validate_eigenstrat(args.genoFn, args.snpFn, args.indFn)
  File "/home/psonis/software/EigenStratDatabaseTools/eigenstrat_database_tools.py", line 21, in validate_eigenstrat
    dimsGeno = [file_len(genof), file_width(genof)]
  File "/home/psonis/software/EigenStratDatabaseTools/eigenstrat_database_tools.py", line 8, in file_len
    for i, l in enumerate(f):
  File "/home/psonis/miniconda3/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 90: invalid start byte
Any thoughts on how to resolve this?
Nikos
I just figured out that the geno files in the Reich dataset are in PACKEDANCESTRYMAP (binary) format, while your tool needs unpacked EIGENSTRAT, so I converted the file with convertf and that resolved the error. I think you should either warn the user that a file with the .geno extension may not be EIGENSTRAT, or allow the tool to accept binary files.
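
For the first option, here is a rough sketch of a check that could run before the file is read as text. It is not taken from the tool's code: the function names `looks_like_text_geno` and `check_geno_is_text` and the `probe_size` parameter are hypothetical, and it assumes that a plain-text EIGENSTRAT .geno file contains only the genotype characters 0, 1, 2 and 9 plus line endings.

```python
# Bytes expected in a plain-text EIGENSTRAT .geno file (assumption):
# genotype codes 0, 1, 2, 9 (missing) and line endings.
TEXT_GENO_BYTES = frozenset(b"0129\r\n")


def looks_like_text_geno(path, probe_size=4096):
    """Return True if the first probe_size bytes look like text EIGENSTRAT."""
    # Open in binary mode so that packed files cannot raise UnicodeDecodeError.
    with open(path, "rb") as fh:
        chunk = fh.read(probe_size)
    # An empty file is not a valid genotype matrix either.
    return len(chunk) > 0 and all(byte in TEXT_GENO_BYTES for byte in chunk)


def check_geno_is_text(path):
    """Raise a readable error instead of a UnicodeDecodeError traceback."""
    if not looks_like_text_geno(path):
        raise ValueError(
            f"{path} does not look like a plain-text EIGENSTRAT .geno file. "
            "It may be in PACKEDANCESTRYMAP (binary) format; "
            "convert it to EIGENSTRAT with convertf first."
        )
```

Calling something like `check_geno_is_text(args.genoFn)` before `validate_eigenstrat` would then give the user a clear message. It only inspects the start of the file, so it is a quick sanity check rather than a full validation.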