NCBI BLAST+ command line applications in a Docker image.
- What is NCBI BLAST?
- How to use this image?
- Support
- License
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
With this Docker image you can run BLAST+ in an isolated container, facilitating reproducibility of BLAST results. As a user of this Docker image, you are expected to provide BLAST databases and query sequence(s) to run BLAST as well as a location outside the container to save the results.
NOTE: The commands on this page will work on modern Linux distributions and macOS.
One way to provide data for the container is to make it available on the local host and use Docker bind mounts to make these accessible from the container. In the examples below, it is assumed that the following directories exist and are writable by the user.
To create them, please run the following command:
cd ; mkdir blastdb queries fasta results blastdb_custom
To populate these directories with sample data used in these examples, please run the commands below:
docker run --rm ncbi/blast efetch -db protein -format fasta \
-id P01349 > $HOME/queries/P01349.fsa
docker run --rm ncbi/blast efetch -db protein -format fasta \
-id Q90523,P80049,P83981,P83982,P83983,P83977,P83984,P83985,P27950 \
> $HOME/fasta/nurse-shark-proteins.fsa
For additional documentation on the docker run command, please see its documentation.
| Directory | Purpose | Notes |
|---|---|---|
| $HOME/blastdb | Stores NCBI-provided BLAST databases | If set to a single, absolute path, the $BLASTDB environment variable could be used instead (see Configuring BLAST via environment variables). |
| $HOME/queries | Stores user-provided query sequence(s) | |
| $HOME/fasta | Stores user-provided FASTA sequences to create BLAST database(s) | |
| $HOME/results | Stores BLAST results | Mount with rw permissions |
| $HOME/blastdb_custom | Stores user-provided BLAST databases | |
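The directory layout in the table above can be created and checked in one step. The following is a minimal sketch, not part of the image: the make_blast_dirs function and the BLAST_ROOT variable are hypothetical conveniences (the root defaults to $HOME, matching the paths used on this page), and mkdir -p is used so that re-running the snippet does not fail on directories that already exist.

```shell
# Hypothetical helper: create the five directories under a root directory
# and report whether each one exists and is writable.
make_blast_dirs() {
    root="$1"
    status=0
    for dir in blastdb queries fasta results blastdb_custom; do
        # -p: create the directory if missing, do nothing if it exists
        mkdir -p "$root/$dir" 2>/dev/null
        if [ -d "$root/$dir" ] && [ -w "$root/$dir" ]; then
            echo "ok: $root/$dir"
        else
            echo "not writable: $root/$dir" >&2
            status=1
        fi
    done
    return "$status"
}

# BLAST_ROOT is an assumed, illustrative variable; it defaults to $HOME.
make_blast_dirs "${BLAST_ROOT:-$HOME}"
```

The function returns a non-zero status if any directory could not be created or is not writable, so it can be used as a guard before starting a container.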
The following command will download the swissprot_v5 BLAST database from
Google Cloud Platform (GCP) into $HOME/blastdb (notice the -w argument,
which sets the working directory for that command):
docker run --rm \
-v $HOME/blastdb:/blast/blastdb:rw \
-w /blast/blastdb \
ncbi/blast \
update_blastdb.pl --source gcp swissprot_v5
If you have your own sequence data in a FASTA file and want to make a BLAST database, please run the
command below (this example uses the $HOME/fasta/nurse-shark-proteins.fsa file created above):
docker run --rm \
-v $HOME/blastdb_custom:/blast/blastdb_custom:rw \
-v $HOME/fasta:/blast/fasta:ro \
-w /blast/blastdb_custom \
ncbi/blast \
makeblastdb -in /blast/fasta/nurse-shark-proteins.fsa -dbtype prot \
-parse_seqids -out nurse-shark-proteins -title "Nurse shark proteins" \
-taxid 7801 -blastdb_version 5
To verify the newly created BLAST database above, one can run the command below to display the accessions, sequence lengths and common names of the sequences in the database:
docker run --rm \
-v $HOME/blastdb:/blast/blastdb:ro \
-v $HOME/blastdb_custom:/blast/blastdb_custom:ro \
ncbi/blast \
blastdbcmd -entry all -db nurse-shark-proteins -outfmt "%a %l %C"
One way to make the query sequence data accessible in the container is to use
Docker bind mounts. For instance, assuming your query
sequences are stored in the $HOME/queries directory on the local host, you
can pass the following parameter to docker run to make that directory
accessible inside the container in /blast/queries as a read-only directory:
-v $HOME/queries:/blast/queries:ro
The command below mounts the $HOME/blastdb path on the local machine as
/blast/blastdb in the container, and blastdbcmd lists the BLAST
databases available at this location.
docker run --rm \
-v $HOME/blastdb:/blast/blastdb:ro \
ncbi/blast \
blastdbcmd -list /blast/blastdb -remove_redundant_dbs
The command below lists the BLAST databases available for download from NCBI:
docker run --rm ncbi/blast update_blastdb.pl --showall --source ncbi
For instructions on how to download them, please see the documentation for update_blastdb.pl.
The command below lists the BLAST databases available on the Google Cloud Platform (GCP). This feature is experimental.
docker run --rm ncbi/blast update_blastdb.pl --showall pretty --source gcp
For instructions on how to download them, please see the documentation for update_blastdb.pl.
When running BLAST in a Docker container, note the mounts specified to the
docker run command to make the inputs and outputs accessible. In the examples
below, the first two mounts provide access to the BLAST databases, the third
mount provides access to the query sequence(s) and the fourth mount provides a
directory to save the results (notice the :ro and :rw options, which mount
the directories as read-only and read-write respectively).
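The four mounts described above can be kept in one place and reused across commands. The sketch below only assembles and prints the docker run command line; it does not start a container. The blast_cmd function name is a hypothetical helper, not part of the image.

```shell
# Hypothetical helper: print the docker run command line for a BLAST+ program
# with the two database mounts, the query mount and the results mount.
# This is a dry run: the command is printed, not executed.
blast_cmd() {
    echo docker run --rm \
        -v "$HOME/blastdb:/blast/blastdb:ro" \
        -v "$HOME/blastdb_custom:/blast/blastdb_custom:ro" \
        -v "$HOME/queries:/blast/queries:ro" \
        -v "$HOME/results:/blast/results:rw" \
        ncbi/blast "$@"
}

# Example: show the command for a blastp search against the custom database
blast_cmd blastp -query /blast/queries/P01349.fsa -db nurse-shark-proteins \
    -out /blast/results/blastp.out
```

Removing the leading echo inside the function turns the dry run into a real invocation.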
When to use: This is useful for running a few (e.g., fewer than 5-10) BLAST searches on small BLAST databases where one expects each search to run in a few seconds.
In this case one can log in to the container and run BLAST commands inside it:
docker run --rm -it \
-v $HOME/blastdb:/blast/blastdb:ro -v $HOME/blastdb_custom:/blast/blastdb_custom:ro \
-v $HOME/queries:/blast/queries:ro \
-v $HOME/results:/blast/results:rw \
ncbi/blast \
/bin/bash
This will open an interactive shell in the container, and one can run BLAST+ as if it were locally installed.
When to use: This is a more practical approach if one has many (e.g., 10 or more) BLAST searches to run or if they take a long time to execute.
In this case it may be better to start the blast container in detached mode and execute commands in it.
NOTE: Be sure to mount all required directories, as these need to be specified when the container is started.
# Start a container named 'blast' in detached mode
docker run --rm -dit --name blast \
-v $HOME/blastdb:/blast/blastdb:ro -v $HOME/blastdb_custom:/blast/blastdb_custom:ro \
-v $HOME/queries:/blast/queries:ro \
-v $HOME/results:/blast/results:rw \
ncbi/blast \
sleep infinity
# Check the container is running
docker ps
To run a BLAST search in this container, one can issue the following command:
docker exec blast blastp -query /blast/queries/P01349.fsa -db nurse-shark-proteins -out /blast/results/blastp.out
The results will be stored in the local host's $HOME/results directory.
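When many queries are stored in $HOME/queries, the docker exec call above can be wrapped in a loop. The following is a sketch under assumptions: the function name blast_all and the RUN variable are hypothetical, every query is assumed to be a protein FASTA file ending in .fsa, and the container named blast is assumed to be running with the mounts shown above. With RUN=echo the commands are printed rather than executed, which makes it possible to review them first.

```shell
# Hypothetical helper: run blastp in the 'blast' container for every *.fsa
# file in a host directory that is mounted at /blast/queries in the container.
# Usage: blast_all HOST_QUERY_DIR DATABASE
blast_all() {
    query_dir="$1"
    db="$2"
    for path in "$query_dir"/*.fsa; do
        # Skip the unexpanded pattern when the directory has no .fsa files
        [ -e "$path" ] || continue
        name="$(basename "$path" .fsa)"
        # RUN is empty by default, so the docker command is executed;
        # set RUN=echo to print each command instead.
        ${RUN:-} docker exec blast blastp \
            -query "/blast/queries/$name.fsa" \
            -db "$db" \
            -out "/blast/results/$name.blastp.out"
    done
}

# Dry run: print one command per query file found in $HOME/queries
RUN=echo blast_all "$HOME/queries" nurse-shark-proteins
```

Each search writes to its own output file, named after the query, so results in $HOME/results are not overwritten by later searches.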
To stop the container started in these examples run the command below:
docker stop blast
For additional documentation on the docker run command, please see its documentation.
The command below shows how to display the latest BLAST version:
# Create a new container, run the `blastn -version` command and immediately
# remove the container
docker run --rm ncbi/blast blastn -version
Appending a tag to the image name (ncbi/blast) allows you to use a
different version of BLAST+ (see below for supported versions). For
example:
docker run --rm ncbi/blast:2.7.1 blastn -version
If Docker is not installed, you may have to check with your local system administrator or install Docker yourself. On Ubuntu Linux, you can run the commands below to do that:
sudo snap install docker
sudo apt install -y docker.io
sudo usermod -aG docker $USER
# Log out and log back in
To verify the installation, run the following command: docker run --rm hello-world.
If this command fails, you may not have docker installed or permissions to run docker.
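The two failure causes mentioned above can be told apart with a small check. This is a sketch with a hypothetical function name, check_docker; it uses only the hello-world test from this page plus the shell's command -v lookup, and the messages it prints are illustrative.

```shell
# Hypothetical helper: distinguish a missing docker binary from a docker
# installation that cannot run containers (daemon down or no permissions).
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker: not found on PATH"
        return 1
    fi
    if docker run --rm hello-world >/dev/null 2>&1; then
        echo "docker: ok"
    else
        echo "docker: installed, but the test container failed (check the daemon and group permissions)"
        return 1
    fi
}

check_docker || echo "Docker is not ready to run the commands on this page."
```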
- BLAST: Check out the BLAST+ Cookbook, consult the BLAST Knowledge Base, or email us at blast-help@ncbi.nlm.nih.gov.
- Docker: See the Docker Community Forums, the Docker Community Slack, or Stack Overflow.
Please email us at blast-help@ncbi.nlm.nih.gov.
National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH)
amd64
View the license and copyright information for the software contained in this image.
As with all Docker images, these likely also contain other software which may be under other licenses (such as Bash, etc from the base distribution, along with any direct or indirect dependencies of the primary software being contained).
As for any pre-built image usage, it is the image user's responsibility to ensure that any use of this image complies with any relevant licenses for all software contained within.
