Installing the bioBakery Docker container and downloading its required databases

Here is a step-by-step guide to installing the bioBakery Docker container and downloading its required databases.

Step 1: Install Docker [1]

Ensure Docker is installed and running on your system.

  • Linux/Mac/Windows: Download from the official Docker website.
  • Verification: Open your terminal and run docker --version to confirm installation. [2, 3, 4]
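The verification step above can be scripted as a quick sanity check. This sketch records whether the Docker CLI is on your PATH (the DOCKER_OK variable is only for illustration):

```shell
# Sketch: check that the Docker CLI is installed and report its version.
if command -v docker >/dev/null 2>&1; then
  DOCKER_OK="yes"
  docker --version
else
  DOCKER_OK="no"
  echo "Docker CLI not found on PATH; install it before continuing"
fi
```

If the CLI is present but commands hang, also confirm the Docker daemon is running (e.g., start Docker Desktop on Mac/Windows).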

Step 2: Pull the bioBakery Image [5]

Download the official bioBakery workflows image from Docker Hub. This image contains all of the tools, but it does not contain the large reference databases (tens of gigabytes), which are downloaded separately in Step 4.

docker pull biobakery/biobakery_workflows:latest
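After pulling, you can confirm the image is available locally before moving on. A small sketch (the IMAGE and STATUS variables are just for illustration):

```shell
# Sketch: confirm the bioBakery image exists in the local Docker image cache.
IMAGE="biobakery/biobakery_workflows:latest"
if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
  STATUS="present"
else
  STATUS="missing"
fi
echo "image $IMAGE: $STATUS"
```

If the status is "missing", re-run the docker pull command above and check your network connection.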

Step 3: Create a Local Database Directory

Create a folder on your host computer. You will mount this folder inside the Docker container so that the downloaded databases persist on your hard drive after the container closes.

mkdir -p /path/to/local/biobakery_db

(Replace /path/to/local/biobakery_db with your actual desired local path).
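One convenient pattern is to keep the path in a shell variable so the later docker commands stay consistent. A sketch, where DB_DIR and the path under $HOME are hypothetical choices you should adapt:

```shell
# Sketch: store the host database path once and reuse it in later commands.
DB_DIR="$HOME/biobakery_db"   # hypothetical location; pick any path with enough free space
mkdir -p "$DB_DIR"            # -p: create parent directories, no error if it already exists
echo "databases will persist in: $DB_DIR"
```

The wmgx databases are large, so choose a disk with ample free space before downloading.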

Step 4: Run the Container and Download Databases [6]

Run the container interactively while mounting your local directory. Then, use the internal bioBakery utility to download the databases.

  1. Start the container:

docker run -it -v /path/to/local/biobakery_db:/tmp/databases biobakery/biobakery_workflows:latest /bin/bash

  2. Download all workflow databases inside the container:

biobakery_workflows_databases --install wmgx --location /tmp/databases

(Note: wmgx installs the shotgun metagenomics databases. Change to wmtx for metatranscriptomics or 16s for amplicon sequencing.)

  3. Exit the container once the download finishes:

exit
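Back on the host, it is worth confirming that the download actually persisted outside the container before deleting anything or moving on. A sketch, assuming DB_DIR is the host directory you mounted at /tmp/databases (the RESULT variable is only for illustration):

```shell
# Sketch: verify the databases landed on the host filesystem via the bind mount.
DB_DIR="${DB_DIR:-$HOME/biobakery_db}"   # hypothetical default; use your actual mount path
if [ -d "$DB_DIR" ]; then
  RESULT="present"
  du -sh "$DB_DIR"   # total on-disk size of the downloaded databases
else
  RESULT="missing"
fi
echo "host database directory: $RESULT"
```

If the directory is empty or missing, the -v mount path in Step 4 likely did not match the --location path used inside the container.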

Step 5: Run Analyses with the Mounted Databases

Now that the databases are saved locally on your host machine, you can run bioBakery workflows on your data by mounting both your data folder and your database folder.

docker run -it \
  -v /path/to/local/biobakery_db:/biobakery_databases \
  -v /path/to/your/input_data:/data \
  biobakery/biobakery_workflows:latest \
  biobakery_workflows wmgx --input /data --output /data/output --databases /biobakery_databases
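Because this command is long and the two mount paths must stay in sync with the flags, a small launcher script can help. This sketch only builds and prints the command (a dry run); DB_DIR, DATA_DIR, and IMAGE are hypothetical names, and the workflow invocation mirrors the one shown above:

```shell
# Sketch: assemble the full docker command once, inspect it, then run it.
DB_DIR="$HOME/biobakery_db"   # host database directory from Step 3 (hypothetical path)
DATA_DIR="$HOME/my_reads"     # host folder containing your input files (hypothetical path)
IMAGE="biobakery/biobakery_workflows:latest"

CMD="docker run -it \
  -v $DB_DIR:/biobakery_databases \
  -v $DATA_DIR:/data \
  $IMAGE \
  biobakery_workflows wmgx --input /data --output /data/output --databases /biobakery_databases"

echo "$CMD"      # dry run: review the assembled command before launching
# eval "$CMD"    # uncomment to actually start the workflow
```

Printing before executing makes it easy to spot a mistyped path, since a wrong -v mount fails silently by producing an empty directory inside the container.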


If you want to customize this setup, let me know:

  • Which specific tool you plan to use most (e.g., MetaPhlAn, HUMAnN, or the full wmgx pipeline).
  • Your operating system (Linux, Mac, or Windows).
  • The volume of data you need to process.

I can provide the exact optimized execution command for your specific environment.

[1] https://ecogenomics.github.io
[2] https://www.honeybadger.io
[3] https://dev.to
[4] https://adjackp.github.io
[5] https://cyverse-foundational-open-science-skills-2019.readthedocs-hosted.com
[6] https://www.commandprompt.com
