...

  1. Locally
  2. SBGrid
  3. Google Colab
  4. NMRbox


Using AlphaFold2 locally

AlphaFold2 has been installed locally on the bioinformatics infrastructure. As of March 4, 2022, version 2.1.0 is available and includes access to the multimer module.

To run AlphaFold2 locally, connect to the jacquere host. Once connected, open a terminal window and browse to the folder where you want the data to be saved. From there, create a text file in FASTA format containing your amino acid sequence of interest.

Code block
languagebash
touch NSs.fasta
echo ">RVFV NSs
MDYFPVISVDLQSGRRVVSVEYFRGDGPPRIPYSMVGPCCVFLMHHRPSHEVRLRFSDFY
NVGEFPYRVGLGDFASNVAPPPAKPFQRLIDLIGHMTLSDFTRFPNLKEAISWPLGEPSL
AFFDLSSTRVHRNDDIRRDQIATLAMRSCKITNDLEDSFVGLHRMIATEAILRGIDLCLL
PGFDLMYEVAHVQCVRLLQAAKEDISNAVVPNSALIVLMEESLMLRSSLPSMMGRNNWIP
VIPPIPDVEMESEEESDDDGFVEVD" > NSs.fasta
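
Before starting a long run, it can help to confirm that the whole sequence was pasted. The following is a minimal sketch, not part of AlphaFold: the helper name fasta_length is hypothetical, and it assumes a FASTA file holding a single sequence.

```shell
# Hypothetical helper (not part of AlphaFold): print the number of residues
# in a single-sequence FASTA file, ignoring the description line.
fasta_length() {
    # Skip lines starting with ">", strip blanks and carriage returns,
    # then sum the length of the remaining sequence lines.
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}
```

For example, fasta_length NSs.fasta prints the residue count, which you can compare with the length reported in your sequence database entry.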

Then use the wrapper script run.sh, whose content is given further down, to run AlphaFold2. You may want to adjust the parameters to your specific case. Make the file executable and run it.

Code block
languagebash
chmod +x run.sh
./run.sh NSs.fasta
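
Since a prediction can run for hours, you may prefer to start it in the background so that it survives a closed terminal or a dropped connection. The sketch below is one way to do this with nohup. The helper name run_in_background and the log file name are assumptions, not part of the wrapper.

```shell
# Hypothetical helper: start a command detached from the terminal, send its
# output (stdout and stderr) to a log file, and print the background process ID.
run_in_background() {
    local log_file="$1"
    shift
    nohup "$@" > "$log_file" 2>&1 &
    echo "$!"
}
```

A possible use is run_in_background alphafold.log ./run.sh NSs.fasta, followed by tail -f alphafold.log to watch the progress.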

As a reference, predicting 5 models for RVFV NSs (the example sequence above) took about 2 h 50 min on jacquere. Once the run is done, you can go to the results folder and analyze your results. A description of the various files produced by the program is provided on the GitHub page of AlphaFold. A wiki page is also dedicated to the visualization of the results using PyMOL.
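
As a quick check that a run completed, you can verify that the five ranked models were written. This is a minimal sketch: the helper name check_ranked_models is hypothetical, and the ranked_N.pdb file names and the results folder being named after the FASTA file (here NSs) are taken from the AlphaFold README, so confirm them against your installed version.

```shell
# Hypothetical helper: report which of the five ranked models are present
# in an AlphaFold2 results folder (file names as described in the AlphaFold README).
check_ranked_models() {
    local results_dir="$1"
    local missing=0
    local i
    for i in 0 1 2 3 4; do
        if [ -f "$results_dir/ranked_$i.pdb" ]; then
            echo "found:   ranked_$i.pdb"
        else
            echo "missing: ranked_$i.pdb"
            missing=1
        fi
    done
    # Non-zero exit status when at least one model is missing.
    return "$missing"
}
```

For the example above, this would be called as check_ranked_models NSs from the folder where the wrapper was run.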

Below is the content of the script.

Code block
languagebash
#!/bin/bash
#
# AlphaFold2, version 2.1.0
# wrapper script to run AlphaFold2 for a monomer
#
# usage:
# ./run.sh sequence.fasta
#
# adjust parameters as required

DATA_DIR="/home/sbio/afdb/"
OUTPUT_DIR=$(pwd)
MAX_TEMPLATE_DATE=$(date -I)
DB_PRESET="full_dbs"
BFD_DB="${DATA_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
UNICLUST30_DB="${DATA_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
UNIREF90_DB="${DATA_DIR}/uniref90/uniref90.fasta"
MGNIFY_DB="${DATA_DIR}/mgnify/mgy_clusters_2018_12.fa"
TEMPLATE_MMCIF_DIR="${DATA_DIR}/pdb_mmcif/mmcif_files"
PDB70_DB="${DATA_DIR}/pdb70/pdb70"
OBSOLETE_PATH="${DATA_DIR}/pdb_mmcif/obsolete.dat"

python3 /usr/local/alphafold/run_alphafold.py \
--data_dir=$DATA_DIR \
--output_dir=$OUTPUT_DIR \
--max_template_date=$MAX_TEMPLATE_DATE \
--db_preset=$DB_PRESET \
--bfd_database_path=$BFD_DB \
--uniclust30_database_path=$UNICLUST30_DB \
--uniref90_database_path=$UNIREF90_DB \
--mgnify_database_path=$MGNIFY_DB \
--template_mmcif_dir=$TEMPLATE_MMCIF_DIR \
--pdb70_database_path=$PDB70_DB \
--obsolete_pdbs_path=$OBSOLETE_PATH \
--use_gpu_relax="true" \
--fasta_paths=${1}
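
If you adjust the paths in the wrapper, a pre-flight check can catch a typo before the run starts. The sketch below is an assumption about how you might do this, with a hypothetical helper named check_paths. Note that BFD_DB, UNICLUST30_DB and PDB70_DB are database prefixes rather than single files, so only the other variables and the FASTA file are suitable arguments.

```shell
# Hypothetical pre-flight check: print every path that does not exist and
# return a non-zero status if at least one is missing.
check_paths() {
    local status=0
    local path
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```

Inside the wrapper, this could be called before python3 as check_paths "$DATA_DIR" "$UNIREF90_DB" "$MGNIFY_DB" "$TEMPLATE_MMCIF_DIR" "$OBSOLETE_PATH" "$1" || exit 1.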




Using AlphaFold2 via SBGrid

Warning

SBGrid is a software infrastructure that includes a library of over 400 structural biology applications, including AlphaFold. The SBGrid software collection is restricted to SBGrid member laboratories.

The current version installed on our server is AlphaFold2 2.1.1. More details are given on the SBGrid Wiki page for AlphaFold.

Info

A username and password are required to access the infrastructure. They can be requested by email to Normand Cyr or Ryan Richter.


Warning

You need to log into the host named mazuelo. This computer has a GPU card powerful enough for the calculations required by AlphaFold. AlphaFold will not work on the other computers of the network.
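
To avoid starting a run on the wrong machine, you can test the host name first. This is a minimal sketch with a hypothetical helper named on_host. It assumes that the short host name reported by the system is mazuelo.

```shell
# Hypothetical helper: succeed only when the short name of the current
# machine (the part before the first dot) matches the expected host name.
on_host() {
    [ "$(uname -n | cut -d. -f1)" = "$1" ]
}
```

For example: on_host mazuelo || echo "Please log into mazuelo first" >&2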

Once logged in, open a terminal window and, at the command prompt, type:

Code block
languagebash
sbgrid

This will activate the SBGrid environment and allow you to access the various applications. You should see the following welcome message:

Code block
languagebash
********************************************************************************
                  Software Support by SBGrid (www.sbgrid.org)
********************************************************************************
 Your use of the applications contained in the /programs  directory constitutes
 acceptance of  the terms of the SBGrid License Agreement included  in the file
 /programs/share/LICENSE.  The applications  distributed by SBGrid are licensed
 exclusively to member laboratories of the SBGrid Consortium.
              Run sbgrid-accept-license to remove the above message.  
********************************************************************************
 SBGrid was developed with support from its members, Harvard Medical School,    
 HHMI, and NSF. If use of SBGrid compiled software was an important element     
 in your publication, please include the following reference in your work:      
                                                                                      
 Software used in the project was installed and configured by SBGrid.                   
 cite: eLife 2013;2:e01456, Collaboration gets the most out of software.                
********************************************************************************
 SBGrid installation last updated: 2022-01-16
 Please submit bug reports and help requests to:       <bugs@sbgrid.org>  or
                                                       <http://sbgrid.org/bugs>
            For additional information visit https://sbgrid.org/wiki
********************************************************************************

If you do not activate the SBGrid environment, you will get the following error message:

Code block
languagebash
SBGrid shell environment is not initialized! Please source
/programs/sbgrid.shrc or /programs/sbgrid.cshrc to use the
software.
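
A script can detect this situation before it starts by checking that the program is reachable. The sketch below uses a hypothetical helper named require_command and assumes that the active SBGrid environment puts run_alphafold.py on the PATH, as the wrapper scripts on this page imply.

```shell
# Hypothetical helper: fail with a readable message when a command is not
# available on the PATH (for example because the environment is not active).
require_command() {
    if ! command -v "$1" > /dev/null 2>&1; then
        echo "error: '$1' not found in PATH; is the SBGrid environment active?" >&2
        return 1
    fi
}
```

For example: require_command run_alphafold.py || exit 1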

In order to run AlphaFold2, you will need at least two text files:

  1. A modified version of the AlphaFold2 wrapper provided by SBGrid that can be downloaded here.
  2. Your protein sequence in FASTA format (a complete description of the FASTA format can be found here). The description line is mandatory. An example file can be downloaded here.

Example FASTA file:

Code block
languagebash
> Rift Valley Fever Virus NSs protein
MDYFPVISVDLQSGRRVVSVEYFRGDGPPRIPYSMVGPCCVFLMHHRPSHEVRLRFSDFY
NVGEFPYRVGLGDFASNVAPPPAKPFQRLIDLIGHMTLSDFTRFPNLKEAISWPLGEPSL
AFFDLSSTRVHRNDDIRRDQIATLAMRSCKITNDLEDSFVGLHRMIATEAILRGIDLCLL
PGFDLMYEVAHVQCVRLLQAAKEDISNAVVPNSALIVLMEESLMLRSSLPSMMGRNNWIP
VIPPIPDVEMESEEESDDDGFVEVD
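
Because the description line is mandatory, a basic check of the file can save a failed run. This is a minimal sketch with a hypothetical helper named validate_fasta. It only covers single-sequence files and only tests that the sequence lines contain letters, not that every letter is a valid amino acid code.

```shell
# Hypothetical helper: basic sanity checks for a single-sequence protein FASTA file.
validate_fasta() {
    local fasta="$1"
    # The first line must be a description line starting with ">".
    if ! head -n 1 "$fasta" | grep -q '^>'; then
        echo "error: first line of $fasta is not a description line" >&2
        return 1
    fi
    # Every remaining line may only contain letters (carriage returns are ignored).
    if tail -n +2 "$fasta" | tr -d '\r' | grep -q '[^A-Za-z]'; then
        echo "error: $fasta contains non-letter characters in the sequence" >&2
        return 1
    fi
}
```

For example: validate_fasta NSs.fasta && echo "FASTA file looks fine"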

Content of the run_alphafold_template.sh wrapper:

Code block
languagebash
# AlphaFold2, version 2.1.1
# wrapper script to run AlphaFold2 for a monomer
#
# usage:
# ./run_alphafold_template.sh sequence.fasta
#
# adjust parameters as required

DATA_DIR="/home/sbio/afdb/"
OUTPUT_DIR=$(pwd)
MAX_TEMPLATE_DATE=$(date -I)
DB_PRESET="full_dbs"
BFD_DB="${DATA_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
UNICLUST30_DB="${DATA_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
UNIREF90_DB="${DATA_DIR}/uniref90/uniref90.fasta"
MGNIFY_DB="${DATA_DIR}/mgnify/mgy_clusters_2018_12.fa"
TEMPLATE_MMCIF_DIR="${DATA_DIR}/pdb_mmcif/mmcif_files"
PDB70_DB="${DATA_DIR}/pdb70/pdb70"
OBSOLETE_PATH="${DATA_DIR}/pdb_mmcif/obsolete.dat"

run_alphafold.py \
--data_dir=$DATA_DIR \
--output_dir=$OUTPUT_DIR \
--max_template_date=$MAX_TEMPLATE_DATE \
--db_preset=$DB_PRESET \
--bfd_database_path=$BFD_DB \
--uniclust30_database_path=$UNICLUST30_DB \
--uniref90_database_path=$UNIREF90_DB \
--mgnify_database_path=$MGNIFY_DB \
--template_mmcif_dir=$TEMPLATE_MMCIF_DIR \
--pdb70_database_path=$PDB70_DB \
--obsolete_pdbs_path=$OBSOLETE_PATH \
--fasta_paths=${1}

The wrapper passes its first argument to --fasta_paths, so you do not need to edit that value in run_alphafold_template.sh. Adjust the other parameters if required, then make the wrapper script executable:

Code block
languagebash
chmod +x run_alphafold_template.sh

Then, to run AlphaFold, enter the following command at the terminal prompt:

Code block
languagebash
./run_alphafold_template.sh [FASTA file location]

As a reference, predicting 5 models for RVFV NSs (the example sequence above) took about 60 minutes on mazuelo. Once the run is done, you can go to the results folder and analyze your results. A description of the various files produced by the program is provided on the GitHub page of AlphaFold.

Using pTM models

In order to use pTM models, you need to set the ALPHAFOLD_PTM environment variable before running AlphaFold:

Code block
languagebash
export ALPHAFOLD_PTM=true
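
Note that export keeps the variable set for the rest of the terminal session. If you only want pTM models for a single run, one option is to set the variable for that command alone. The helper name with_ptm below is hypothetical and the sketch relies only on the standard env utility.

```shell
# Hypothetical helper: run a command with ALPHAFOLD_PTM=true set for that
# command only, leaving the current shell environment unchanged.
with_ptm() {
    env ALPHAFOLD_PTM=true "$@"
}
```

For example: with_ptm ./run_alphafold_template.sh NSs.fasta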

Using the multimer option

It is now possible to run predictions with multimers. A wrapper script can be downloaded here. Content of the run_alphafold_template_multimer.sh wrapper:

Code block
languagebash
# AlphaFold2, version 2.1.1
# wrapper script to run AlphaFold2 for a multimer
#
# usage:
# ./run_alphafold_template_multimer.sh multiple_sequences.fasta
#
# adjust parameters as required

DATA_DIR="/home/sbio/afdb/"
OUTPUT_DIR=$(pwd)
MAX_TEMPLATE_DATE=$(date -I)
DB_PRESET="full_dbs"
BFD_DB="${DATA_DIR}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
UNICLUST30_DB="${DATA_DIR}/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
UNIREF90_DB="${DATA_DIR}/uniref90/uniref90.fasta"
MGNIFY_DB="${DATA_DIR}/mgnify/mgy_clusters_2018_12.fa"
TEMPLATE_MMCIF_DIR="${DATA_DIR}/pdb_mmcif/mmcif_files"
PDB70_DB="${DATA_DIR}/pdb70/pdb70"
OBSOLETE_PATH="${DATA_DIR}/pdb_mmcif/obsolete.dat"
PDB_SEQRES_DATABASE_PATH="${DATA_DIR}/pdb_seqres/pdb_seqres.txt"
UNIPROT_DATABASE_PATH="${DATA_DIR}/uniprot/uniprot.fasta"

run_alphafold.py \
--data_dir=$DATA_DIR \
--output_dir=$OUTPUT_DIR \
--max_template_date=$MAX_TEMPLATE_DATE \
--db_preset=$DB_PRESET \
--bfd_database_path=$BFD_DB \
--uniclust30_database_path=$UNICLUST30_DB \
--uniref90_database_path=$UNIREF90_DB \
--mgnify_database_path=$MGNIFY_DB \
--template_mmcif_dir=$TEMPLATE_MMCIF_DIR \
--pdb_seqres_database_path=$PDB_SEQRES_DATABASE_PATH \
--uniprot_database_path=$UNIPROT_DATABASE_PATH \
--obsolete_pdbs_path=$OBSOLETE_PATH \
--model_preset=multimer \
--fasta_paths=${1}

All protein sequences of the complex have to be in the same FASTA file, each with its own description line.
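
If each chain is already stored in its own FASTA file, the combined file can be built by concatenation. The following is a minimal sketch. The helper names combine_fasta and count_sequences and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: concatenate several FASTA files to standard output.
# "awk 1" prints every line and ends each one with a newline, so records do
# not run together when an input file lacks a final newline.
combine_fasta() {
    awk 1 "$@"
}

# Hypothetical helper: count the description lines (one per sequence) in a FASTA file.
count_sequences() {
    grep -c '^>' "$1"
}
```

For example, combine_fasta chainA.fasta chainB.fasta > complex.fasta creates the input for the multimer wrapper, and count_sequences complex.fasta should then report the number of chains. According to the AlphaFold README, a homo-oligomer is given as repeated copies of the same sequence.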




Using AlphaFold2 via Google Colab

...