An unofficial re-implementation of FoldingDiff, a diffusion-based generative model for protein backbone structure generation. The official implementation of FoldingDiff can be found here.
Install through pip.
$ pip install foldingdiff-pytorch
Train a model on the preprocessed data.
$ python -m foldingdiff_pytorch.train --meta data/meta.csv \
    --data-dir data/npy --batch-size 64
Sample backbones from a trained checkpoint.
$ python -m foldingdiff_pytorch.sample --ckpt [CHECKPOINT_PATH] \
    --timepoints 1000 --out [OUTPUT_PATH]
With the snakemake command below, you can run the unconditional protein backbone generation pipeline to obtain .pt files containing backbone coordinates and .gif files showing the whole denoising process.
$ snakemake -s sample.smk -j1
Download non-redundant protein backbone structure data (40% similarity cutoff) from CATH.
$ wget ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/non-redundant-data-sets/cath-dataset-nonredundant-S40.pdb.tgz
Extract the downloaded file and append the .pdb extension to each extracted file.
$ tar xvf cath-dataset-nonredundant-S40.pdb.tgz && cd dompdb
$ for f in *; do mv "$f" "$f.pdb"; done
Run the snakemake pipeline to convert the .pdb files to .npy files, each containing an array of angle information of shape (n, 6).
$ snakemake -s preprocess.smk -prq -j [CORES] --keep-going
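As a quick check on the preprocessing output, the sketch below loads an .npy file and verifies that it holds a 2-D array with six columns, as described above. It is only an illustration under stated assumptions: NumPy is installed, the helper name `check_angle_array` is hypothetical and not part of foldingdiff-pytorch, and `example.npy` is a synthetic stand-in written by the snippet itself rather than real pipeline output. The README does not document the column order or whether undefined angles are stored as NaN, so the code only counts non-finite entries instead of rejecting them.

```python
import tempfile
from pathlib import Path

import numpy as np


def check_angle_array(path):
    """Load an .npy file and verify it is a 2-D array with six columns.

    Hypothetical helper for illustration; not part of foldingdiff-pytorch.
    """
    angles = np.load(path)
    if angles.ndim != 2 or angles.shape[1] != 6:
        raise ValueError(f"expected shape (n, 6), got {angles.shape}")
    return angles


# Synthetic stand-in for one preprocessed file: 64 rows, 6 angles per row.
rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    example_path = Path(tmp) / "example.npy"
    np.save(example_path, rng.uniform(-np.pi, np.pi, size=(64, 6)))
    angles = check_angle_array(example_path)

# Count entries that are NaN or infinite (none in this synthetic example).
n_undefined = int((~np.isfinite(angles)).sum())
print(angles.shape, n_undefined)
```

To check real output, point `check_angle_array` at any file under the directory passed as `--data-dir`.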
Model training for reproduction is currently running. The live training log is available here.
A Ramachandran plot of 10 samples of length 64, generated as a sanity check during training. The model appears to be learning to produce reasonable secondary structures.
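A Ramachandran plot shows the backbone dihedral angle phi on the x-axis against psi on the y-axis. The sketch below bins (phi, psi) pairs into a 2-D histogram with NumPy, which is the data behind such a plot. Several things here are assumptions rather than facts about this repository: the angles are taken to be in radians, the column indices `PHI_COL` and `PSI_COL` are hypothetical because the README does not state the column order, and the input is random synthetic data, not model output.

```python
import numpy as np

# Hypothetical column indices; check the preprocessing code for the real order.
PHI_COL, PSI_COL = 0, 1


def ramachandran_histogram(angles, bins=36):
    """Bin (phi, psi) pairs, assumed to be in radians, into a bins x bins grid."""
    phi = angles[:, PHI_COL]
    psi = angles[:, PSI_COL]
    # Drop rows where either dihedral is undefined (NaN or infinite).
    keep = np.isfinite(phi) & np.isfinite(psi)
    counts, phi_edges, psi_edges = np.histogram2d(
        phi[keep],
        psi[keep],
        bins=bins,
        range=[[-np.pi, np.pi], [-np.pi, np.pi]],
    )
    return counts, phi_edges, psi_edges


# Synthetic stand-in for 10 samples of length 64, stacked row-wise.
rng = np.random.default_rng(0)
samples = rng.uniform(-np.pi, np.pi, size=(10 * 64, 6))
samples[0, PHI_COL] = np.nan  # simulate one undefined angle

counts, phi_edges, psi_edges = ramachandran_histogram(samples)
print(counts.shape, int(counts.sum()))
```

With real samples, secondary structure shows up as dense regions in `counts`. The grid can be drawn with any plotting library, for example by passing `counts.T` to matplotlib's `imshow` with `origin="lower"`.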
@misc{wu2022protein,
title={Protein structure generation via folding diffusion},
author={Kevin E. Wu and Kevin K. Yang and Rianne van den Berg and James Y. Zou and Alex X. Lu and Ava P. Amini},
year={2022},
eprint={2209.15611},
archivePrefix={arXiv},
primaryClass={q-bio.BM}
}





