Search for tandem repeats in a given folder of fasta files:
python parallel_trf.py input_folder output_folder mask threads

Example:
python parallel_trf.py ~/human_genome/fasta ~/human_genome/trf fa 20

Compute and draw the distribution of paired-end (PE) fragment lengths:
python fragments_length_from_sam.py -o image_file -i sam_file

Count unmapped reads:
from PyBioSnippets.sam.sam_functions import count_unmapped
(mapped, unmapped) = count_unmapped(sam_file)

Save unmapped reads from a SAM file to a fasta file:
from PyBioSnippets.sam.sam_functions import save_unmapped_to_fasta
save_unmapped_to_fasta(sam_file, fasta_file)

Compute fragment length statistics for the first l lines:
python fragments_length_from_sam.py -o stat.png -i data.sam -l 100000

Count FLAG values for a given SAM file:
python hiseq/sam_stats.py -i data.sam

Join split HiSeq files:
python hiseq/join_fastq.py --remove False --input some_folder --mask read_L001_R1

Fix overly long quality scores in corrupted HiSeq files:
fix_uncorrect_long_quality(fastq_file, corrected_fastq_output)

Iterator for paired-end files:
for read_obj1, read_obj2 in iter_pe_data(fastq_file1, fastq_file2):
    do_something()

Convert fastq to fasta:
python hiseq/fastq_to_fasta.py -i data.fastq -o data.fasta

Compute k-mer frequency percentages for a coverage plot:
python compute_kmer_coverage.py input_file output_file

Convert bax.h5 files into fasta and fastq files:
ls | grep bax.h5 | xargs -n 1 --max-procs 64 python baxh5_to_fastq.py
cat *fasta > pacbio.fasta
cat *fastq > pacbio.fastq

Get a dictionary of chromosome lengths:
chr2length = get_chromosome_lengths(reference_multifasta)
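
How get_chromosome_lengths is implemented is not shown here. As a rough sketch of the idea only, assuming the input is a plain-text multi-fasta file and that the first whitespace-delimited token of each header line is the chromosome name, the lengths could be computed as below. The name chromosome_lengths_sketch is hypothetical and is not part of PyBioSnippets:

```python
def chromosome_lengths_sketch(fasta_path):
    """Return a dict mapping each sequence name to its length.

    Hypothetical helper for illustration; assumes an uncompressed
    multi-fasta file with '>' header lines.
    """
    lengths = {}
    name = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                # Skip blank lines between records.
                continue
            if line.startswith(">"):
                # Assumption: the first token after '>' is the name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                # Sequence lines may be wrapped, so accumulate per record.
                lengths[name] += len(line)
    return lengths
```

The result can be used the same way as chr2length above, for example lengths["chr1"] for a record whose header starts with ">chr1". Duplicate header names would reset the count in this sketch, so it assumes names are unique.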