Tabs Container
directionhorizontal


Tabs Page
idaf2_local
titleLocally

Using AlphaFold2 locally

AlphaFold2 has been installed locally on the bioinformatics infrastructure. As of May 16, 2022, version 2.1.0 is available and includes the multimer module (see the section on calculating multimeric structures below).

To run AlphaFold2 locally, connect to the jacquere host (either via SSH or physically). Once connected, open a terminal window and change to the folder where you want the data to be saved. There, create a text file in FASTA format and paste your amino acid sequence of interest into it.

Bloc de code
languagebash
touch NSs.fasta
echo ">RVFV NSs
MDYFPVISVDLQSGRRVVSVEYFRGDGPPRIPYSMVGPCCVFLMHHRPSHEVRLRFSDFY
NVGEFPYRVGLGDFASNVAPPPAKPFQRLIDLIGHMTLSDFTRFPNLKEAISWPLGEPSL
AFFDLSSTRVHRNDDIRRDQIATLAMRSCKITNDLEDSFVGLHRMIATEAILRGIDLCLL
PGFDLMYEVAHVQCVRLLQAAKEDISNAVVPNSALIVLMEESLMLRSSLPSMMGRNNWIP
VIPPIPDVEMESEEESDDDGFVEVD" > NSs.fasta
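
Because a run takes hours, it can be worth checking the FASTA file before launching it. The following is a minimal sketch, assuming a protein sequence written with the 20 standard uppercase one-letter codes; `check_fasta` is a hypothetical helper name and is not part of AlphaFold.

```shell
# check_fasta FILE: hypothetical helper, not part of AlphaFold.
# Succeeds if every sequence line follows a ">" header line and contains
# only the 20 standard uppercase one-letter amino acid codes.
check_fasta() (
  awk '
    /^[ \t\r]*$/ { next }                 # ignore blank lines
    /^>/         { headers++; next }      # description line
                 { seqlines++ }
    headers == 0 { print "line " NR ": sequence before any header line"; bad = 1; next }
    /[^ACDEFGHIKLMNPQRSTVWY]/ { print "line " NR ": unexpected character"; bad = 1; next }
                 { residues += length($0) }
    END {
      if (seqlines == 0) { print "no sequence found"; bad = 1 }
      if (!bad) print headers " sequence(s), " residues " residues"
      exit (bad ? 1 : 0)
    }
  ' "$1"
)

# Example (commented out): check_fasta NSs.fasta && ./run.sh NSs.fasta
```

Windows line endings and lowercase letters are reported as unexpected characters, which is deliberate in this sketch: both are easy to introduce when pasting a sequence.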

Then use the wrapper script run.sh to run AlphaFold2 (its content is listed further down). You may want to adjust the parameters to your specific case. Make the file executable and run it:

Bloc de code
languagebash
chmod +x run.sh
./run.sh NSs.fasta

As a reference, predicting 5 models for RVFV NSs (the example sequence above) took about 2 h 50 min on jacquere. Once the run is done, you can go to the results folder and analyze your results. A description of the various files produced by the program is provided on the GitHub page of AlphaFold.
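
With run times of several hours, a job started from an SSH session is normally lost if the connection drops. One common way around this is to start it with `nohup` in the background. The sketch below assumes `nohup` is available on the host; `run_detached` and the log file name are illustrative, not part of the wrapper.

```shell
# run_detached LOGFILE COMMAND [ARGS...]: hypothetical helper.
# Starts COMMAND in the background, immune to hangups (nohup), with standard
# output and standard error collected in LOGFILE. The PID stays in $!.
run_detached() {
  log_file=$1
  shift
  nohup "$@" > "$log_file" 2>&1 &
  echo "started PID $! (log: $log_file)"
}

# Example (commented out so that nothing is launched by accident):
# run_detached alphafold_NSs.log ./run.sh NSs.fasta
# tail -f alphafold_NSs.log    # follow progress; Ctrl-C stops tail, not the job
```

A terminal multiplexer such as tmux or screen achieves the same goal if one is installed on the host.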

Below is the content of the script.

Bloc de code
languagebash
#!/bin/bash
#
# Alphafold 2, version 2.1.0
# wrapper script to run Alphafold2 for a monomer
#
# usage:
# ./run.sh sequence.fasta
#
# adjust parameters as required

DATA_DIR="/home/sbio/afdb/"
OUTPUT_DIR=$(pwd)
MAX_TEMPLATE_DATE=$(date -I)
DB_PRESET="full_dbs"
BFD_DB="${DATA_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
UNICLUST30_DB="${DATA_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
UNIREF90_DB="${DATA_DIR}/uniref90/uniref90.fasta"
MGNIFY_DB="${DATA_DIR}/mgnify/mgy_clusters_2018_12.fa"
TEMPLATE_MMCIF_DIR="${DATA_DIR}/pdb_mmcif/mmcif_files"
PDB70_DB="${DATA_DIR}/pdb70/pdb70"
OBSOLETE_PATH="${DATA_DIR}/pdb_mmcif/obsolete.dat"

python3 /usr/local/alphafold/run_alphafold.py \
--data_dir=$DATA_DIR \
--output_dir=$OUTPUT_DIR \
--max_template_date=$MAX_TEMPLATE_DATE \
--db_preset=$DB_PRESET \
--bfd_database_path=$BFD_DB \
--uniclust30_database_path=$UNICLUST30_DB \
--uniref90_database_path=$UNIREF90_DB \
--mgnify_database_path=$MGNIFY_DB \
--template_mmcif_dir=$TEMPLATE_MMCIF_DIR \
--pdb70_database_path=$PDB70_DB \
--obsolete_pdbs_path=$OBSOLETE_PATH \
--use_gpu_relax="true" \
--fasta_paths=${1}
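
A mistyped database path is usually only reported once the corresponding search step starts. A pre-flight check can catch it immediately. This is a sketch under one assumption that should be verified locally: some of the variables above (BFD_DB, UNICLUST30_DB, PDB70_DB) are taken to be file name prefixes rather than files, so the check accepts either an existing path or a prefix of existing files. Both function names are hypothetical.

```shell
# path_or_prefix_exists P: true if P exists, or if P is the prefix of at
# least one existing file (assumed layout of the HH-suite style databases).
path_or_prefix_exists() (
  [ -e "$1" ] && exit 0
  for match in "$1"*; do
    [ -e "$match" ] && exit 0
  done
  exit 1
)

# check_paths P1 [P2 ...]: report every missing entry on standard error and
# fail if at least one is missing.
check_paths() (
  missing=0
  for p in "$@"; do
    if ! path_or_prefix_exists "$p"; then
      echo "missing: $p" >&2
      missing=1
    fi
  done
  exit "$missing"
)

# Example, to be placed in the wrapper before the python3 call (commented out):
# check_paths "$BFD_DB" "$UNICLUST30_DB" "$UNIREF90_DB" "$MGNIFY_DB" \
#             "$TEMPLATE_MMCIF_DIR" "$PDB70_DB" "$OBSOLETE_PATH" "$1" || exit 1
```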

Using the multimer option

AlphaFold2 can now be run in multimer mode (AlphaFold-Multimer), which predicts the structure of a complex (more information can be found in the original publication describing the process). To do this locally, use a modified wrapper script in which the multimer flag is set. An example can be downloaded here. Its content is the following:

Bloc de code
languagebash
titleMultimer
#!/bin/bash
# Alphafold 2, version 2.1.0
# wrapper script to run Alphafold for a multimer
#
# usage:
# ./run_alphafold_template_multimer.sh multiple_sequences.fasta
#
# adjust parameters as required

DATA_DIR="/home/sbio/afdb"
OUTPUT_DIR=$(pwd)
MAX_TEMPLATE_DATE=$(date -I)
DB_PRESET="full_dbs"
BFD_DB="${DATA_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
UNICLUST30_DB="${DATA_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
UNIPROT_DB="${DATA_DIR}/uniprot/uniprot.fasta"
UNIREF90_DB="${DATA_DIR}/uniref90/uniref90.fasta"
MGNIFY_DB="${DATA_DIR}/mgnify/mgy_clusters_2018_12.fa"
TEMPLATE_MMCIF_DIR="${DATA_DIR}/pdb_mmcif/mmcif_files"
PDB_SEQRES_DB="${DATA_DIR}/pdb_seqres/pdb_seqres.txt"
PDB70_DB="${DATA_DIR}/pdb70/pdb70"
OBSOLETE_PATH="${DATA_DIR}/pdb_mmcif/obsolete.dat"

python3 /usr/local/alphafold/run_alphafold.py \
--data_dir=$DATA_DIR \
--output_dir=$OUTPUT_DIR \
--max_template_date=$MAX_TEMPLATE_DATE \
--db_preset=$DB_PRESET \
--bfd_database_path=$BFD_DB \
--uniclust30_database_path=$UNICLUST30_DB \
--uniprot_database_path=$UNIPROT_DB \
--uniref90_database_path=$UNIREF90_DB \
--mgnify_database_path=$MGNIFY_DB \
--template_mmcif_dir=$TEMPLATE_MMCIF_DIR \
--pdb_seqres_database_path=$PDB_SEQRES_DB \
--obsolete_pdbs_path=$OBSOLETE_PATH \
--use_gpu_relax="true" \
--model_preset=multimer \
--fasta_paths=${1}

Be aware that multimer runs are slow: expect run times over 48 hours, depending on the complexity of your submission. To give an idea, a recent calculation for a homohexamer of 250 amino acids per monomer took about 65 hours.

It is also important to format the FASTA file properly. For a homomer, use a single FASTA file in which the same sequence is repeated once per chain (here, a homotrimer):

Bloc de code
languagebash
>sequence_1
<SEQUENCE>
>sequence_2
<SEQUENCE>
>sequence_3
<SEQUENCE>
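
Rather than pasting the same sequence several times by hand, the homomer file can be generated from a single-sequence FASTA file. A minimal sketch: `make_homomer` is a hypothetical helper, and the header names simply follow the sequence_1, sequence_2, ... pattern shown above.

```shell
# make_homomer FASTA COPIES: hypothetical helper.
# Reads the first record of FASTA and prints it COPIES times, each copy
# under its own header (sequence_1, sequence_2, ...).
make_homomer() (
  awk -v copies="$2" '
    /^>/         { records++; next }                         # count records, drop headers
    records == 1 { gsub(/[ \t\r]/, ""); seq = seq $0 }       # join lines of the first record
    END {
      for (i = 1; i <= copies; i++) {
        print ">sequence_" i
        print seq
      }
    }
  ' "$1"
)

# Example (commented out): make_homomer NSs.fasta 3 > NSs_trimer.fasta
```

The sequence is written on a single line, which is valid FASTA.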

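And for a heteromer, list every chain of every component (here, two copies of sequence A and three copies of sequence B):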

Bloc de code
languagebash
>sequence_1
<SEQUENCE A>
>sequence_2
<SEQUENCE A>
>sequence_3
<SEQUENCE B>
>sequence_4
<SEQUENCE B>
>sequence_5
<SEQUENCE B>
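
Since a multimer run can take days, it is cheap insurance to confirm that the file describes the intended stoichiometry before submitting it. The sketch below groups identical sequences and prints their copy number; `chain_summary` is a hypothetical helper name.

```shell
# chain_summary FASTA: hypothetical helper.
# Prints one line per distinct sequence: copy number, length, and the
# header under which the sequence first appears.
chain_summary() (
  awk '
    function store() {
      if (seq == "") return
      if (!(seq in copies)) { order[++n] = seq; first[seq] = header }
      copies[seq]++
      seq = ""
    }
    /^>/ { store(); header = substr($0, 2); next }
         { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END {
      store()
      for (i = 1; i <= n; i++)
        print copies[order[i]] " x " length(order[i]) " residues (first seen as " first[order[i]] ")"
    }
  ' "$1"
)

# Example (commented out): chain_summary multiple_sequences.fasta
```

For the heteromer layout above, this would report two lines: one for sequence A with 2 copies and one for sequence B with 3 copies.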

Visualizing the results with PyMOL

Once the calculation is done, your results are in a subdirectory created by AlphaFold2. The description of each file can be found here. You can then open the ranked_*.pdb files (the relaxed predicted structures, with ranked_0.pdb being the most confident) in your preferred molecular visualization software.
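
To locate the models from the command line, something like the following can be used. It is a sketch that assumes the results subdirectory is named after the FASTA file without its extension (NSs.fasta gives NSs/) and sits in the output directory, which the wrappers above set to the current directory. `list_ranked_models` is a hypothetical helper.

```shell
# list_ranked_models FASTA [OUTPUT_DIR]: hypothetical helper.
# Lists the ranked_*.pdb files of a finished run, best model first.
# Assumes the results are in OUTPUT_DIR/<FASTA name without extension>/.
list_ranked_models() (
  name=$(basename "$1")
  name=${name%.*}                       # NSs.fasta -> NSs
  results_dir="${2:-$PWD}/$name"
  ls "$results_dir"/ranked_*.pdb 2> /dev/null || {
    echo "no ranked_*.pdb files in $results_dir (run still in progress?)" >&2
    exit 1
  }
)

# Example (commented out): list_ranked_models NSs.fasta
```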

The per-residue confidence score (pLDDT) is stored in the B-factor field of the PDB file, so the structure can be coloured by confidence with the command spectrum b in PyMOL. A higher score means higher confidence. A description can be found here. The PyMOLWiki website describes the spectrum command and its options well, including the available colour palettes.
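
Because pLDDT sits in the B-factor field, a quick summary can also be computed without opening a viewer. The sketch below averages the B-factor column (columns 61 to 66 of the fixed-width PDB format) over C-alpha atoms, which gives one value per residue. `mean_plddt` is a hypothetical helper, and it assumes a standard fixed-width PDB file such as the ranked_*.pdb files.

```shell
# mean_plddt PDB: hypothetical helper.
# Averages the B-factor field (columns 61-66) over C-alpha ATOM records,
# i.e. one pLDDT value per residue.
mean_plddt() (
  awk '
    substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
      sum += substr($0, 61, 6)
      n++
    }
    END {
      if (n == 0) { print "no C-alpha atoms found" > "/dev/stderr"; exit 1 }
      printf "%d residues, mean pLDDT %.2f\n", n, sum / n
    }
  ' "$1"
)

# Example (commented out): mean_plddt NSs/ranked_0.pdb
```

A single average hides local variation, so it complements rather than replaces colouring the structure by pLDDT.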


Tabs Page
idaf2_sbgrid
titleSBGrid

Using AlphaFold2 via SBGrid

Warning

SBGrid is a software infrastructure that includes a library of over 400 structural biology applications, including AlphaFold. The SBGrid software collection is restricted to SBGrid member laboratories.

The current version installed on our server is AlphaFold2 2.1.1. More details are given on the SBGrid Wiki page for AlphaFold.

Info

A username and password are required to access the infrastructure. They can be requested by email to Normand Cyr or Ryan Richter.


Warning

You need to log into the host named mazuelo. This computer has a GPU card powerful enough for the calculations required by AlphaFold. It will not work on the other computers of the network.
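
When several machines share the same home directory, it is easy to start a run on the wrong one. A small guard can make this visible. The sketch assumes that `uname -n` on the target machine reports mazuelo, possibly followed by a domain name; `on_host` is a hypothetical helper.

```shell
# on_host NAME: hypothetical helper; true if the short host name of the
# current machine equals NAME.
on_host() (
  current=$(uname -n)
  [ "${current%%.*}" = "$1" ]      # strip any domain part before comparing
)

if ! on_host mazuelo; then
  echo "warning: this is $(uname -n), not mazuelo; AlphaFold is not expected to work here" >&2
fi
```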

Once logged in, open a terminal window and, at the command prompt, type:

Bloc de code
languagebash
sbgrid

This will activate the SBGrid environment and allow you to access the various applications. You should see the following welcome message:

Bloc de code
languagebash
********************************************************************************
                  Software Support by SBGrid (www.sbgrid.org)
********************************************************************************
 Your use of the applications contained in the /programs  directory constitutes
 acceptance of  the terms of the SBGrid License Agreement included  in the file
 /programs/share/LICENSE.  The applications  distributed by SBGrid are licensed
 exclusively to member laboratories of the SBGrid Consortium.
              Run sbgrid-accept-license to remove the above message.  
********************************************************************************
 SBGrid was developed with support from its members, Harvard Medical School,    
 HHMI, and NSF. If use of SBGrid compiled software was an important element     
 in your publication, please include the following reference in your work:      
                                                                                      
 Software used in the project was installed and configured by SBGrid.                   
 cite: eLife 2013;2:e01456, Collaboration gets the most out of software.                
********************************************************************************
 SBGrid installation last updated: 2022-01-16
 Please submit bug reports and help requests to:       <bugs@sbgrid.org>  or
                                                       <http://sbgrid.org/bugs>
            For additional information visit https://sbgrid.org/wiki
********************************************************************************

If you do not activate the SBGrid environment, you will get the following error message:

Bloc de code
languagebash
SBGrid shell environment is not initialized! Please source
/programs/sbgrid.shrc or /programs/sbgrid.cshrc to use the
software.

In order to run AlphaFold2, you will need at least two text files:

  1. A modified version of the AlphaFold2 wrapper provided by SBGrid that can be downloaded here.
  2. Your protein sequence in FASTA format (a complete description of the FASTA format can be found here). The description line is mandatory. An example file can be downloaded here.

Example FASTA file:

Bloc de code
languagebash
> Rift Valley Fever Virus NSs protein
MDYFPVISVDLQSGRRVVSVEYFRGDGPPRIPYSMVGPCCVFLMHHRPSHEVRLRFSDFY
NVGEFPYRVGLGDFASNVAPPPAKPFQRLIDLIGHMTLSDFTRFPNLKEAISWPLGEPSL
AFFDLSSTRVHRNDDIRRDQIATLAMRSCKITNDLEDSFVGLHRMIATEAILRGIDLCLL
PGFDLMYEVAHVQCVRLLQAAKEDISNAVVPNSALIVLMEESLMLRSSLPSMMGRNNWIP
VIPPIPDVEMESEEESDDDGFVEVD

Content of the run_alphafold_template.sh wrapper:

Bloc de code
languagebash
#!/bin/bash
# AlphaFold2, version 2.1.1
# wrapper script to run AlphaFold2 for a monomer
#
# usage:
# ./run_alphafold_template.sh sequence.fasta
#
# adjust parameters as required

DATA_DIR="/home/sbio/afdb/"
OUTPUT_DIR=$(pwd)
MAX_TEMPLATE_DATE=$(date -I)
DB_PRESET="full_dbs"
BFD_DB="${DATA_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
UNICLUST30_DB="${DATA_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
UNIREF90_DB="${DATA_DIR}/uniref90/uniref90.fasta"
MGNIFY_DB="${DATA_DIR}/mgnify/mgy_clusters_2018_12.fa"
TEMPLATE_MMCIF_DIR="${DATA_DIR}/pdb_mmcif/mmcif_files"
PDB70_DB="${DATA_DIR}/pdb70/pdb70"
OBSOLETE_PATH="${DATA_DIR}/pdb_mmcif/obsolete.dat"

run_alphafold.py \
--data_dir=$DATA_DIR \
--output_dir=$OUTPUT_DIR \
--max_template_date=$MAX_TEMPLATE_DATE \
--db_preset=$DB_PRESET \
--bfd_database_path=$BFD_DB \
--uniclust30_database_path=$UNICLUST30_DB \
--uniref90_database_path=$UNIREF90_DB \
--mgnify_database_path=$MGNIFY_DB \
--template_mmcif_dir=$TEMPLATE_MMCIF_DIR \
--pdb70_database_path=$PDB70_DB \
--obsolete_pdbs_path=$OBSOLETE_PATH \
--fasta_paths=${1}

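The run_alphafold_template.sh wrapper passes its first command-line argument to --fasta_paths, so the location of your FASTA file is given when the script is run and the wrapper itself does not need to be edited for it. First, make the wrapper script executable.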

Bloc de code
languagebash
chmod +x run_alphafold_template.sh

Then, to run AlphaFold, enter the following command in the terminal prompt:

Bloc de code
languagebash
./run_alphafold_template.sh [FASTA file location]

As a reference, predicting 5 models for RVFV NSs (the example sequence above) took about 60 minutes on mazuelo. Once the run is done, you can go to the results folder and analyze your results. A description of the various files produced by the program is provided on the GitHub page of AlphaFold.

Using pTM models

In order to use pTM models, you need to set the ALPHAFOLD_PTM environment variable before running AlphaFold:

Bloc de code
languagebash
export ALPHAFOLD_PTM=true
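
Note that `export` keeps the variable set for the rest of the shell session, so later runs in the same terminal would also use pTM models. If that is not wanted, standard shell syntax allows the variable to be set for a single command only. The sketch below demonstrates the scoping with a stand-in child process; the commented line shows how it would apply to the wrapper.

```shell
unset ALPHAFOLD_PTM

# Per-command assignment: only this one child process sees the variable.
ALPHAFOLD_PTM=true sh -c 'echo "during: ${ALPHAFOLD_PTM:-unset}"'

# The next command no longer sees it.
sh -c 'echo "after: ${ALPHAFOLD_PTM:-unset}"'

# Applied to the wrapper (commented out so that nothing is launched):
# ALPHAFOLD_PTM=true ./run_alphafold_template.sh sequence.fasta
```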

Using the multimer option

It is now possible to run predictions with multimers. A wrapper script can be downloaded here. Content of the run_alphafold_template_multimer.sh wrapper:

Bloc de code
languagebash
#!/bin/bash
# AlphaFold2, version 2.1.1
# wrapper script to run AlphaFold2 for a multimer
#
# usage:
# ./run_alphafold_template_multimer.sh multiple_sequences.fasta
#
# adjust parameters as required

DATA_DIR="/home/sbio/afdb/"
OUTPUT_DIR=$(pwd)
MAX_TEMPLATE_DATE=$(date -I)
DB_PRESET="full_dbs"
BFD_DB="${DATA_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
UNICLUST30_DB="${DATA_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
UNIREF90_DB="${DATA_DIR}/uniref90/uniref90.fasta"
MGNIFY_DB="${DATA_DIR}/mgnify/mgy_clusters_2018_12.fa"
TEMPLATE_MMCIF_DIR="${DATA_DIR}/pdb_mmcif/mmcif_files"
PDB70_DB="${DATA_DIR}/pdb70/pdb70"
OBSOLETE_PATH="${DATA_DIR}/pdb_mmcif/obsolete.dat"
PDB_SEQRES_DATABASE_PATH="${DATA_DIR}/pdb_seqres/pdb_seqres.txt"
UNIPROT_DATABASE_PATH="${DATA_DIR}/uniprot/uniprot.fasta"

run_alphafold.py \
--data_dir=$DATA_DIR \
--output_dir=$OUTPUT_DIR \
--max_template_date=$MAX_TEMPLATE_DATE \
--db_preset=$DB_PRESET \
--bfd_database_path=$BFD_DB \
--uniclust30_database_path=$UNICLUST30_DB \
--uniref90_database_path=$UNIREF90_DB \
--mgnify_database_path=$MGNIFY_DB \
--template_mmcif_dir=$TEMPLATE_MMCIF_DIR \
--pdb_seqres_database_path=$PDB_SEQRES_DATABASE_PATH \
--uniprot_database_path=$UNIPROT_DATABASE_PATH \
--obsolete_pdbs_path=$OBSOLETE_PATH \
--model_preset=multimer \
--fasta_paths=${1}

All protein sequences have to be in the same FASTA file.
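
If the chains are stored in separate FASTA files, they can be combined into one. A plain `cat` works only when every file ends with a newline, so the sketch below uses awk to guarantee that each record starts on its own line. It also drops blank lines and Windows line endings. `merge_fasta` is a hypothetical helper.

```shell
# merge_fasta OUT IN1 [IN2 ...]: hypothetical helper.
# Concatenates FASTA files into OUT, one record after another, removing
# blank lines and carriage returns. OUT must not be one of the inputs.
merge_fasta() (
  out=$1
  shift
  awk '{ sub(/\r$/, "") } NF' "$@" > "$out"
)

# Example (commented out):
# merge_fasta multiple_sequences.fasta chain_A.fasta chain_B.fasta
```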


Tabs Page
idaf2_googlecolab
titleGoogle Colab

Using AlphaFold2 via Google Colab

DeepMind, in collaboration with Google, provides a Colab notebook to run AlphaFold2 (currently version 2.1.0). The notebook can be opened at the address below; instructions on how to proceed are provided in the notebook:

https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb

Warning

As stated in the Colab notebook:

"In comparison to AlphaFold2 v2.0, this Colab notebook uses no templates (homologous structures) and a selected portion of the BFD database. We have validated these changes on several thousand recent PDB structures. While accuracy will be near-identical to the full AlphaFold2 system on many targets, a small fraction have a large drop in accuracy due to the smaller MSA and lack of templates. For best reliability, we recommend instead using the full open source AlphaFold, or the AlphaFold2 Protein Structure Database."

A faster approach to the multiple sequence alignment step of AlphaFold2 has been developed by the group of Martin Steinegger (reference to the preprint paper). It is implemented in a Google Colab notebook similar to the one above. A prediction should take about 10 minutes to run (vs. 60+ minutes with the DeepMind implementation above).

https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb


Tabs Page
idaf2_nmrbox
titleNMRbox

Using AlphaFold2 in NMRbox

"NMRbox is a resource for biomolecular NMR (Nuclear Magnetic Resonance) software. It provides tools for finding the software you need, documentation and tutorials for getting the most out of the software, and cloud-based virtual machines for executing the software."

After registering, it is possible to log into a virtual machine and run AlphaFold2 predictions (version 2.1.1). At the moment, only the T4-equipped VMs are available for this task: argon.nmrbox.org, oxygen.nmrbox.org, rubidium.nmrbox.org, and zirconium.nmrbox.org. Once logged in:

  • Open a terminal and go to the desired output directory
  • Create a FASTA file containing the protein sequence of interest
  • Run AlphaFold2 using the AlphaFold2 FASTA_file.fasta command. Other options are available; see AlphaFold2 -h.

