Using AlphaFold2 via SBGrid
Warning |
---|
SBGrid is a software infrastructure that includes a library of over 400 structural biology applications, including AlphaFold. The SBGrid software collection is restricted to SBGrid member laboratories. |
The current version installed on our server is AlphaFold2 2.1.1. More details are given on the SBGrid Wiki page for AlphaFold.
Info |
---|
A username and password are required to access the infrastructure. They can be requested by email to Normand Cyr or Ryan Richter. |
Warning |
---|
You need to log into the host named mazuelo. This computer has a GPU card powerful enough for the calculations required by AlphaFold; the program will not work on the other computers of the network. |
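After logging in, a quick check can confirm that you are on the right machine before starting a long job. The sketch below assumes that the short host name reported by `uname -n` is `mazuelo` and that the NVIDIA driver tool `nvidia-smi` is installed there; both are assumptions about the local setup, and `on_expected_host` is a hypothetical helper name.

```shell
# Pre-flight check (sketch): confirm the host name and look for a visible GPU.
# Assumption: the GPU machine reports "mazuelo" as its short host name.
on_expected_host() {
    current=$(uname -n)
    # Strip any domain suffix so "mazuelo.example.org" also matches "mazuelo".
    [ "${current%%.*}" = "$1" ]
}

if on_expected_host mazuelo; then
    echo "Running on mazuelo"
else
    echo "Warning: this is $(uname -n), not mazuelo" >&2
fi

# Assumption: nvidia-smi is installed wherever the NVIDIA driver is present.
if command -v nvidia-smi >/dev/null 2>&1; then
    # -L lists the GPUs that the driver can see.
    nvidia-smi -L || echo "nvidia-smi is installed but reported an error" >&2
else
    echo "nvidia-smi not found; no NVIDIA GPU tools on this machine" >&2
fi
```

If the first message is a warning, log out and connect to mazuelo before continuing.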
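Once logged in, open a terminal window and, at the command prompt, type the command that initializes the SBGrid environment. As the error message shown further down indicates, this means sourcing /programs/sbgrid.shrc (bash, sh) or /programs/sbgrid.cshrc (csh, tcsh). This will activate the SBGrid environment and allow you to access the various applications. You should see the following welcome message:
Code block |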
---|
| ********************************************************************************
Software Support by SBGrid (www.sbgrid.org)
********************************************************************************
Your use of the applications contained in the /programs directory constitutes
acceptance of the terms of the SBGrid License Agreement included in the file
/programs/share/LICENSE. The applications distributed by SBGrid are licensed
exclusively to member laboratories of the SBGrid Consortium.
Run sbgrid-accept-license to remove the above message.
********************************************************************************
SBGrid was developed with support from its members, Harvard Medical School,
HHMI, and NSF. If use of SBGrid compiled software was an important element
in your publication, please include the following reference in your work:
Software used in the project was installed and configured by SBGrid.
cite: eLife 2013;2:e01456, Collaboration gets the most out of software.
********************************************************************************
SBGrid installation last updated: 2022-01-16
Please submit bug reports and help requests to: <bugs@sbgrid.org> or
<http://sbgrid.org/bugs>
For additional information visit https://sbgrid.org/wiki
******************************************************************************** |
If you do not activate the SBGrid environment, you will get the following error message:
Code block |
---|
| SBGrid shell environment is not initialized! Please source
/programs/sbgrid.shrc or /programs/sbgrid.cshrc to use the
software. |
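A session or script can guard against this error by sourcing the initialization file itself and then confirming that `run_alphafold.py` is on the `PATH`. The sketch below uses the Bourne-shell file named in the error message; the function name `init_sbgrid` is hypothetical, and its optional argument exists only so the function can be tried out with another file.

```shell
# Sketch: initialize the SBGrid environment for bash/sh sessions.
# /programs/sbgrid.shrc is the file named in the error message above;
# init_sbgrid is a hypothetical helper name.
init_sbgrid() {
    rc_file="${1:-/programs/sbgrid.shrc}"
    if [ ! -r "$rc_file" ]; then
        echo "Cannot read $rc_file; is SBGrid installed on this machine?" >&2
        return 1
    fi
    # Source the file so that its settings apply to the current shell.
    . "$rc_file"
}

if init_sbgrid && command -v run_alphafold.py >/dev/null 2>&1; then
    echo "SBGrid environment ready: $(command -v run_alphafold.py)"
else
    echo "SBGrid environment not available in this shell" >&2
fi
```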
In order to run AlphaFold2, you will need at least two text files:
- A modified version of the AlphaFold2 wrapper provided by SBGrid that can be downloaded here.
- Your protein sequence in FASTA format (a complete description of the FASTA format can be found here). The description line is mandatory. An example file can be downloaded here.
Example FASTA file:
Code block |
---|
| > Rift Valley Fever Virus NSs protein
MDYFPVISVDLQSGRRVVSVEYFRGDGPPRIPYSMVGPCCVFLMHHRPSHEVRLRFSDFY
NVGEFPYRVGLGDFASNVAPPPAKPFQRLIDLIGHMTLSDFTRFPNLKEAISWPLGEPSL
AFFDLSSTRVHRNDDIRRDQIATLAMRSCKITNDLEDSFVGLHRMIATEAILRGIDLCLL
PGFDLMYEVAHVQCVRLLQAAKEDISNAVVPNSALIVLMEESLMLRSSLPSMMGRNNWIP
VIPPIPDVEMESEEESDDDGFVEVD |
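Since the description line is mandatory, it can be worth checking a FASTA file before committing to a long run. The sketch below is a minimal sanity check, not a full FASTA validator; `check_fasta` is a hypothetical helper and is not part of AlphaFold or SBGrid.

```shell
# Sketch: basic sanity checks on a FASTA file before a long prediction run.
# check_fasta is a hypothetical helper; usage: check_fasta FILE
check_fasta() {
    fasta="$1"
    [ -r "$fasta" ] || { echo "Cannot read $fasta" >&2; return 1; }

    # The first non-empty line must be a description line starting with ">".
    first=$(grep -v '^[[:space:]]*$' "$fasta" | head -n 1)
    case "$first" in
        ">"*) ;;
        *) echo "Missing description line in $fasta" >&2; return 1 ;;
    esac

    # Count description lines, then count every non-whitespace character
    # found on the remaining lines as one residue.
    records=$(grep -c '^>' "$fasta")
    residues=$(grep -v '^>' "$fasta" | tr -d '[:space:]' | wc -c | tr -d ' ')
    echo "$records sequence(s), $residues residue(s)"
}

# Example with a hypothetical file name:
# check_fasta sequence.fasta
```

For a monomer run, the report should show exactly one sequence.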
Content of the run_alphafold_template.sh wrapper:
Code block |
---|
| # AlphaFold2, version 2.1.1
# wrapper script to run AlphaFold2 for a monomer
#
# usage:
# ./run_alphafold_template.sh sequence.fasta
#
# adjust parameters as required
DATA_DIR="/home/sbio/afdb/"
OUTPUT_DIR=$(pwd)
MAX_TEMPLATE_DATE=$(date -I)
DB_PRESET="full_dbs"
BFD_DB="${DATA_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
UNICLUST30_DB="${DATA_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
UNIREF90_DB="${DATA_DIR}/uniref90/uniref90.fasta"
MGNIFY_DB="${DATA_DIR}/mgnify/mgy_clusters_2018_12.fa"
TEMPLATE_MMCIF_DIR="${DATA_DIR}/pdb_mmcif/mmcif_files"
PDB70_DB="${DATA_DIR}/pdb70/pdb70"
OBSOLETE_PATH="${DATA_DIR}/pdb_mmcif/obsolete.dat"
run_alphafold.py \
--data_dir=$DATA_DIR \
--output_dir=$OUTPUT_DIR \
--max_template_date=$MAX_TEMPLATE_DATE \
--db_preset=$DB_PRESET \
--bfd_database_path=$BFD_DB \
--uniclust30_database_path=$UNICLUST30_DB \
--uniref90_database_path=$UNIREF90_DB \
--mgnify_database_path=$MGNIFY_DB \
--template_mmcif_dir=$TEMPLATE_MMCIF_DIR \
--pdb70_database_path=$PDB70_DB \
--obsolete_pdbs_path=$OBSOLETE_PATH \
--fasta_paths=${1} |
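The wrapper passes its first command-line argument (${1}) to --fasta_paths, so the FASTA file can be given on the command line; alternatively, replace ${1} in run_alphafold_template.sh with the path to your FASTA file. In both cases, make the wrapper script executable first:
Code block |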
---|
| chmod +x run_alphafold_template.sh |
Then, to run AlphaFold, enter the following command at the terminal prompt:
Code block |
---|
| ./run_alphafold_template.sh [FASTA file location] |
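Because a prediction can run for a long time, it can be convenient to start the wrapper in the background, keep a log, and let the job survive a closed terminal. The sketch below relies on the standard `nohup` utility; `run_logged` and the log file name `alphafold.log` are hypothetical choices and are not part of the wrapper.

```shell
# Sketch: start a long-running command in the background with a log file.
# run_logged is a hypothetical helper; usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log="$1"
    shift
    # nohup keeps the job alive after logout; stdout and stderr go to the log.
    nohup "$@" > "$log" 2>&1 &
    echo "Started PID $! (log: $log)"
}

# Example (uncomment on the GPU host, with your own FASTA file):
# run_logged alphafold.log ./run_alphafold_template.sh sequence.fasta
# tail -f alphafold.log    # follow progress; Ctrl-C stops tail, not the job
```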
As a reference, predicting 5 models for RVFV NSs (the example sequence given above) took about 60 minutes on mazuelo. Once the run is done, you can go to the results folder and analyze your results. A description of the various files produced by the program is provided on the GitHub page of AlphaFold.
Using pTM models
In order to use pTM models, you need to set the ALPHAFOLD_PTM environment variable before running AlphaFold:
Code block |
---|
| export ALPHAFOLD_PTM=true |
Using the multimer option
It is now possible to run predictions with multimers. A wrapper script can be downloaded here.
Content of the run_alphafold_template_multimer.sh wrapper:
Code block |
---|
| # AlphaFold2, version 2.1.1
# wrapper script to run AlphaFold2 for a multimer
#
# usage:
# ./run_alphafold_template_multimer.sh multiple_sequences.fasta
#
# adjust parameters as required
DATA_DIR="/home/sbio/afdb/"
OUTPUT_DIR=$(pwd)
MAX_TEMPLATE_DATE=$(date -I)
DB_PRESET="full_dbs"
BFD_DB="${DATA_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
UNICLUST30_DB="${DATA_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
UNIREF90_DB="${DATA_DIR}/uniref90/uniref90.fasta"
MGNIFY_DB="${DATA_DIR}/mgnify/mgy_clusters_2018_12.fa"
TEMPLATE_MMCIF_DIR="${DATA_DIR}/pdb_mmcif/mmcif_files"
PDB70_DB="${DATA_DIR}/pdb70/pdb70"
OBSOLETE_PATH="${DATA_DIR}/pdb_mmcif/obsolete.dat"
PDB_SEQRES_DATABASE_PATH="${DATA_DIR}/pdb_seqres/pdb_seqres.txt"
UNIPROT_DATABASE_PATH="${DATA_DIR}/uniprot/uniprot.fasta"
run_alphafold.py \
--data_dir=$DATA_DIR \
--output_dir=$OUTPUT_DIR \
--max_template_date=$MAX_TEMPLATE_DATE \
--db_preset=$DB_PRESET \
--bfd_database_path=$BFD_DB \
--uniclust30_database_path=$UNICLUST30_DB \
--uniref90_database_path=$UNIREF90_DB \
--mgnify_database_path=$MGNIFY_DB \
--template_mmcif_dir=$TEMPLATE_MMCIF_DIR \
--pdb_seqres_database_path=$PDB_SEQRES_DATABASE_PATH \
--uniprot_database_path=$UNIPROT_DATABASE_PATH \
--obsolete_pdbs_path=$OBSOLETE_PATH \
--model_preset=multimer \
--fasta_paths=${1} |
All protein sequences have to be in the same FASTA file. |
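For example, the input for a heterodimer can be prepared by concatenating one FASTA file per chain; for a homo-oligomer, the same sequence is repeated once per copy, each with its own description line. The sketch below uses hypothetical file names (`chain_A.fasta`, `chain_B.fasta`, `complex.fasta`) and a hypothetical helper, `combine_fasta`.

```shell
# Sketch: merge per-chain FASTA files into one multi-sequence FASTA file.
# combine_fasta is a hypothetical helper; usage: combine_fasta OUTPUT INPUT...
combine_fasta() {
    out="$1"
    shift
    # Start from an empty output file.
    : > "$out"
    for f in "$@"; do
        cat "$f" >> "$out"
        # Add a newline if the input did not end with one, so that the next
        # description line starts on its own line.
        [ -z "$(tail -c 1 "$f")" ] || echo >> "$out"
    done
    echo "$(grep -c '^>' "$out") sequence(s) written to $out"
}

# Example with hypothetical file names:
# combine_fasta complex.fasta chain_A.fasta chain_B.fasta
# ./run_alphafold_template_multimer.sh complex.fasta
```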