Using AlphaFold2 locally

AlphaFold2 has been installed locally on the bioinformatics infrastructure. As of May 16, 2022, version 2.1.0 is available and includes the multimer module. To run AlphaFold2 locally, connect to the jacquere host (either via SSH or physically). Once connected, open a terminal window and change to the folder where you want the data to be saved. There, create a text file in FASTA format containing your amino acid sequence of interest.

Code block |
---|
| touch NSs.fasta
echo ">RVFV NSs
MDYFPVISVDLQSGRRVVSVEYFRGDGPPRIPYSMVGPCCVFLMHHRPSHEVRLRFSDFY
NVGEFPYRVGLGDFASNVAPPPAKPFQRLIDLIGHMTLSDFTRFPNLKEAISWPLGEPSL
AFFDLSSTRVHRNDDIRRDQIATLAMRSCKITNDLEDSFVGLHRMIATEAILRGIDLCLL
PGFDLMYEVAHVQCVRLLQAAKEDISNAVVPNSALIVLMEESLMLRSSLPSMMGRNNWIP
VIPPIPDVEMESEEESDDDGFVEVD" > NSs.fasta |
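Before launching a long prediction, it can be worth checking that the FASTA file was written as intended. The snippet below is a minimal sketch: `fasta_summary` is a hypothetical helper (not part of AlphaFold) that counts header lines and residues with `awk`.

```shell
#!/bin/bash
# fasta_summary FILE: print how many records and residues a FASTA file holds.
# Hypothetical helper for illustration, not part of AlphaFold.
fasta_summary() {
    awk '
        /^>/ { records++; next }       # header line starts a new record
        {
            gsub(/[[:space:]]/, "")    # drop whitespace, including stray CRs
            residues += length($0)     # count residues on this sequence line
        }
        END { printf "records=%d residues=%d\n", records, residues }
    ' "$1"
}

# Usage: fasta_summary NSs.fasta
```

For a monomer run, the file should contain a single record.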
Then use the wrapper script run.sh, whose content is listed further down this page, to run AlphaFold2. You may want to adjust its parameters to your specific case. Make the file executable and run it.

Code block |
---|
| chmod +x run.sh
./run.sh NSs.fasta |
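Because a prediction can run for hours, a dropped SSH session would normally stop it. One possible way around this, assuming `nohup` is available on the host, is to start the wrapper detached from the terminal and write its output to a log file. `run_detached` and the log file name are hypothetical names used only for this sketch.

```shell
#!/bin/bash
# run_detached LOGFILE COMMAND [ARGS...]
# Start COMMAND immune to hangups, send stdout and stderr to LOGFILE,
# and print the PID of the background job.
# Hypothetical helper for illustration.
run_detached() {
    local log="$1"
    shift
    nohup "$@" > "$log" 2>&1 &
    echo "$!"
}

# Usage:
#   run_detached alphafold.log ./run.sh NSs.fasta
#   tail -f alphafold.log     # follow the progress
```

A terminal multiplexer such as screen or tmux would serve the same purpose if one is installed.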
For reference, predicting 5 models for RVFV NSs (the example sequence above) took about 2 h 50 min on jacquere. Once the run is done, go to the results folder and analyze your results. The files produced by the program are described on the AlphaFold GitHub page. A wiki page is also dedicated to the visualization of the results using PyMOL. Below is the content of the script.

Code block |
---|
| #!/bin/bash
#
# Alphafold 2, version 2.1.0
# wrapper script to run Alphafold2 for a monomer
#
# usage:
# ./run.sh sequence.fasta
#
# adjust parameters as required
DATA_DIR="/home/sbio/afdb/"
OUTPUT_DIR=$(pwd)
MAX_TEMPLATE_DATE=$(date -I)
DB_PRESET="full_dbs"
BFD_DB="${DATA_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
UNICLUST30_DB="${DATA_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
UNIREF90_DB="${DATA_DIR}/uniref90/uniref90.fasta"
MGNIFY_DB="${DATA_DIR}/mgnify/mgy_clusters_2018_12.fa"
TEMPLATE_MMCIF_DIR="${DATA_DIR}/pdb_mmcif/mmcif_files"
PDB70_DB="${DATA_DIR}/pdb70/pdb70"
OBSOLETE_PATH="${DATA_DIR}/pdb_mmcif/obsolete.dat"
python3 /usr/local/alphafold/run_alphafold.py \
--data_dir=$DATA_DIR \
--output_dir=$OUTPUT_DIR \
--max_template_date=$MAX_TEMPLATE_DATE \
--db_preset=$DB_PRESET \
--bfd_database_path=$BFD_DB \
--uniclust30_database_path=$UNICLUST30_DB \
--uniref90_database_path=$UNIREF90_DB \
--mgnify_database_path=$MGNIFY_DB \
--template_mmcif_dir=$TEMPLATE_MMCIF_DIR \
--pdb70_database_path=$PDB70_DB \
--obsolete_pdbs_path=$OBSOLETE_PATH \
--use_gpu_relax="true" \
--fasta_paths="${1}"
|
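The wrapper passes its first argument straight to run_alphafold.py, so a missing or mistyped file name is only noticed once the program has started. A guard along the following lines could be added near the top of run.sh. This is a sketch: `check_fasta_arg` is a hypothetical helper, and the test for a leading `>` is only a rough sanity check, not a full FASTA validation.

```shell
#!/bin/bash
# check_fasta_arg "$@": return non-zero unless exactly one readable file
# that starts with a FASTA header is given.
# Hypothetical helper for illustration.
check_fasta_arg() {
    if [ "$#" -ne 1 ]; then
        echo "usage: ./run.sh sequence.fasta" >&2
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "error: cannot read '$1'" >&2
        return 1
    fi
    # A FASTA file is expected to start with a '>' header line.
    if ! head -n 1 "$1" | grep -q '^>'; then
        echo "error: '$1' does not start with a FASTA header" >&2
        return 1
    fi
}

# In run.sh, before the python3 call:
#   check_fasta_arg "$@" || exit 1
```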
Using the multimer option

AlphaFold2 can now be run in multimer mode (AlphaFold-Multimer), which predicts the structure of a protein complex (more information can be found in the original publication describing the method). To do this locally, use a modified wrapper script in which the multimer flag is set. An example can be downloaded here. Its content is the following:

Code block |
---|
| #!/bin/bash
# Alphafold 2, version 2.1.0
# wrapper script to run Alphafold for a multimer
#
# usage:
# ./run_alphafold_tempate_multimer.sh multiple_sequences.fasta
#
# adjust parameters as required
DATA_DIR="/home/sbio/afdb"
OUTPUT_DIR=$(pwd)
MAX_TEMPLATE_DATE=$(date -I)
DB_PRESET="full_dbs"
BFD_DB="${DATA_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
UNICLUST30_DB="${DATA_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
UNIPROT_DB="${DATA_DIR}/uniprot/uniprot.fasta"
UNIREF90_DB="${DATA_DIR}/uniref90/uniref90.fasta"
MGNIFY_DB="${DATA_DIR}/mgnify/mgy_clusters_2018_12.fa"
TEMPLATE_MMCIF_DIR="${DATA_DIR}/pdb_mmcif/mmcif_files"
PDB_SEQRES_DB="${DATA_DIR}/pdb_seqres/pdb_seqres.txt"
PDB70_DB="${DATA_DIR}/pdb70/pdb70"
OBSOLETE_PATH="${DATA_DIR}/pdb_mmcif/obsolete.dat"
python3 /usr/local/alphafold/run_alphafold.py \
--data_dir=$DATA_DIR \
--output_dir=$OUTPUT_DIR \
--max_template_date=$MAX_TEMPLATE_DATE \
--db_preset=$DB_PRESET \
--bfd_database_path=$BFD_DB \
--uniclust30_database_path=$UNICLUST30_DB \
--uniprot_database_path=$UNIPROT_DB \
--uniref90_database_path=$UNIREF90_DB \
--mgnify_database_path=$MGNIFY_DB \
--template_mmcif_dir=$TEMPLATE_MMCIF_DIR \
--pdb_seqres_database_path=$PDB_SEQRES_DB \
--obsolete_pdbs_path=$OBSOLETE_PATH \
--use_gpu_relax="true" \
--model_preset=multimer \
--fasta_paths="${1}" |
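Both wrappers depend on database locations under DATA_DIR, and a wrong path is easier to catch before a multi-day run than during it. The sketch below assumes that some entries (for example the BFD, Uniclust30 and PDB70 variables) are file-name prefixes rather than single files, so it accepts any path that exists or that is the start of an existing file name. `path_or_prefix_exists` and `report_missing` are hypothetical helpers.

```shell
#!/bin/bash
# path_or_prefix_exists PATH: succeed if PATH exists, or if at least one
# file name starts with PATH (as for database prefixes).
# Hypothetical helper for illustration.
path_or_prefix_exists() {
    local candidate
    for candidate in "$1"*; do
        [ -e "$candidate" ] && return 0
    done
    return 1
}

# report_missing PATH...: print every path that fails the check and
# return non-zero if any of them did.
report_missing() {
    local missing=0 p
    for p in "$@"; do
        if ! path_or_prefix_exists "$p"; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, after the variable definitions in the wrapper:
#   report_missing "$UNIREF90_DB" "$MGNIFY_DB" "$BFD_DB" "$TEMPLATE_MMCIF_DIR" || exit 1
```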
Beware that this will likely take a long time (expect run times of over 48 hours), depending on the complexity of your submission. To give an idea, a recent calculation for a homohexamer of 250 amino acids per monomer took about 65 hours. It is also important to format the FASTA file properly. For a homomer, use a single FASTA file in which the same sequence is repeated once per chain:

Code block |
---|
| >sequence_1
<SEQUENCE>
>sequence_2
<SEQUENCE>
>sequence_3
<SEQUENCE> |
For a heteromer, list every copy of every chain (here, two copies of chain A and three copies of chain B):

Code block |
---|
| >sequence_1
<SEQUENCE A>
>sequence_2
<SEQUENCE A>
>sequence_3
<SEQUENCE B>
>sequence_4
<SEQUENCE B>
>sequence_5
<SEQUENCE B> |
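Writing these files by hand becomes error-prone for larger complexes. The layouts above can be generated from a list of chains and copy numbers, as in the following sketch. `make_multimer_fasta` is a hypothetical helper, and the sequences in the usage line are placeholders, not real proteins.

```shell
#!/bin/bash
# make_multimer_fasta COPIES:SEQUENCE...
# Write one FASTA record per chain copy, numbered sequence_1, sequence_2, ...
# Hypothetical helper for illustration.
make_multimer_fasta() {
    local n=0 spec copies seq i
    for spec in "$@"; do
        copies="${spec%%:*}"   # text before the first colon
        seq="${spec#*:}"       # text after the first colon
        i=0
        while [ "$i" -lt "$copies" ]; do
            n=$((n + 1))
            printf '>sequence_%d\n%s\n' "$n" "$seq"
            i=$((i + 1))
        done
    done
}

# Usage (two copies of chain A, three copies of chain B; placeholder sequences):
#   make_multimer_fasta 2:MKTAYIAK 3:GSHMLEDP > complex.fasta
```

A homomer is the special case with a single argument, for example `6:<SEQUENCE>` for a homohexamer.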
Visualizing the results with PyMOL

Once the calculation is done, a subdirectory is created that contains your results. The description of each file can be found here. You can then open the ranked_*.pdb files (the relaxed predicted structures) in your preferred molecular visualization software. The per-residue confidence score (pLDDT) is stored in the B-factor field of the PDB file, and the values can be used to colour the structure with the command spectrum b in PyMOL. A higher score means higher confidence. A description can be found here. The PyMOLWiki website describes the spectrum command well, along with its options, including the available colour palettes. |
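Since pLDDT sits in the B-factor field, a quick comparison of the ranked models is possible from the command line before opening PyMOL. The sketch below averages the B-factor over C-alpha atoms, relying on the fixed column layout of PDB ATOM records (atom name in columns 13 to 16, B-factor in columns 61 to 66). `mean_plddt` is a hypothetical helper, not part of AlphaFold or PyMOL.

```shell
#!/bin/bash
# mean_plddt FILE.pdb: average the B-factor column (pLDDT in AlphaFold
# output) over C-alpha atoms, using the fixed PDB column layout.
# Hypothetical helper for illustration.
mean_plddt() {
    awk '
        # atom name is in columns 13-16, B-factor in columns 61-66
        /^ATOM/ && substr($0, 13, 4) == " CA " {
            sum += substr($0, 61, 6)
            n++
        }
        END {
            if (n == 0) { print "no CA atoms found" > "/dev/stderr"; exit 1 }
            printf "%.2f\n", sum / n
        }
    ' "$1"
}

# Usage:
#   for f in ranked_*.pdb; do printf '%s ' "$f"; mean_plddt "$f"; done
```

The mean is only a coarse summary. Colouring by B-factor in PyMOL shows which regions of the model carry the low scores.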