Installation and Quickstart

Installation

Using pip

inStrain is written in Python. To install it from the PyPI package repository, run

$ pip install instrain

Or to install from GitHub run

$ git clone https://github.com/MrOlm/instrain.git

$ cd instrain

$ pip install .

That’s it!

pip has many options for changing how a package is installed. For details, see the pip documentation.
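To confirm that the installation worked, a small check along the following lines may help. This is a minimal sketch, not part of inStrain itself: it assumes the distribution is registered under the name used in `pip install instrain` and that the command-line entry point is spelled `inStrain`, as in the examples in this document.

```python
import shutil
from importlib import metadata


def instrain_install_status():
    """Return (package_version, executable_path); either may be None."""
    try:
        # Assumption: the installed distribution is named "instrain".
        version = metadata.version("instrain")
    except metadata.PackageNotFoundError:
        version = None
    # Assumption: the console command is spelled "inStrain".
    executable = shutil.which("inStrain")
    return version, executable


if __name__ == "__main__":
    version, executable = instrain_install_status()
    print("package version:", version or "not installed")
    print("executable:", executable or "not found on PATH")
```

If the package is reported as installed but the executable is not found, the directory where pip places scripts is probably not on your PATH.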

Dependencies

inStrain requires a few other programs to run, though not all dependencies are needed for all operations. It also has a number of Python package dependencies, but those should install automatically when inStrain is installed using pip.

Essential

Optional

  • coverM: needed for the quick_profile operation
  • Prodigal: needed to profile on a gene-by-gene level
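Because the optional programs are only needed for some operations, it can be handy to check which ones are available before starting. The sketch below looks for them on the PATH. The executable names `coverm` and `prodigal` are assumptions about how these tools are typically installed; adjust them if your system uses different names.

```python
import shutil

# Assumed executable names mapped to the feature that needs them.
OPTIONAL_TOOLS = {
    "coverm": "quick_profile",
    "prodigal": "gene-level profiling",
}


def missing_optional_tools(tools=OPTIONAL_TOOLS):
    """Return the subset of `tools` whose executable is not on the PATH."""
    return {exe: purpose for exe, purpose in tools.items() if shutil.which(exe) is None}


if __name__ == "__main__":
    for exe, purpose in missing_optional_tools().items():
        print(f"{exe} not found; {purpose} will be unavailable")
```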

Quick Start

The functionality of inStrain is broken up into several core modules, which are listed in the program's help output. For more details on these modules, see module_descriptions:

$ inStrain -h

              ...::: inStrain v1.0.0 :::...

Matt Olm and Alex Crits-Christoph. MIT License. Banfield Lab, UC Berkeley. 2019

Choose one of the operations below for more detailed help. See https://instrain.readthedocs.io for documentation.
Example: inStrain profile -h

  profile         -> Create an inStrain profile (microdiversity analysis) from a mapping.
  compare         -> Compare multiple inStrain profiles (popANI, coverage_overlap, etc.)
  profile_genes   -> Calculate gene-level metrics on an inStrain profile
  genome_wide     -> Calculate genome-level metrics on an inStrain profile
  quick_profile   -> Quickly calculate coverage and breadth of a mapping using coverM
  filter_reads    -> Commands related to filtering reads from .bam files
  plot            -> Make figures from the results of "profile" or "compare"
  other           -> Other miscellaneous operations

Below are brief descriptions of each module. For more information, see module_descriptions; for help understanding the output, see Example output and explanations; and to change the parameters, see choosing_parameters.

See also

module_descriptions
for more information on the modules
Example output and explanations
to view example output
choosing_parameters
for guidance changing parameters
preparing_input
for information on how to prepare data for inStrain

profile

inStrain profile is the main method of the program. It takes a .fasta file and a .bam file (consisting of reads mapped to the .fasta file) and runs a series of steps to characterize the microdiversity, SNPs, linkage, etc. Details on how to generate the mapping, how the profiling is done, how to interpret the output, and how to choose the parameters can be found at preparing_input and module_descriptions.

To run inStrain on a mapping, run the following command:

$ inStrain profile .bam_file .fasta_file -o IS_output_name
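When profiling is launched from a script, the same command can be assembled and run with Python's `subprocess` module. The following is a sketch only: it uses just the arguments shown in the command above, and the function names are hypothetical helpers, not part of inStrain.

```python
import subprocess
from pathlib import Path


def build_profile_command(bam, fasta, output):
    """Assemble the argv list for `inStrain profile`, mirroring the command above."""
    return ["inStrain", "profile", str(bam), str(fasta), "-o", str(output)]


def run_profile(bam, fasta, output):
    """Check that both inputs exist, then run the profile command."""
    for path in (bam, fasta):
        if not Path(path).is_file():
            raise FileNotFoundError(f"input file not found: {path}")
    # check=True raises CalledProcessError if inStrain exits with a non-zero status.
    subprocess.run(build_profile_command(bam, fasta, output), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.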

compare

inStrain can compare multiple read mappings to the same .fasta file. Each mapping file must first be made into an inStrain profile using the above command. The coverage overlap and popANI between all pairs are then calculated:

$ inStrain compare -i IS_output_1 IS_output_2 IS_output_3
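Since every mapping has to be profiled before it can be compared, the full sequence of commands for several samples can be planned up front. The sketch below only builds the commands and does not run them; the output naming scheme (`sample1.bam` becomes `sample1.IS`) is a hypothetical convention chosen for illustration.

```python
from pathlib import Path


def plan_profile_and_compare(bam_files, fasta):
    """Return argv lists: one `profile` run per mapping, then one `compare` run."""
    if len(bam_files) < 2:
        raise ValueError("compare needs at least two mappings")
    # Hypothetical naming scheme: sample1.bam -> sample1.IS
    outputs = [str(Path(bam).with_suffix(".IS")) for bam in bam_files]
    commands = [
        ["inStrain", "profile", str(bam), str(fasta), "-o", out]
        for bam, out in zip(bam_files, outputs)
    ]
    # All profiles must come from mappings to the same .fasta file.
    commands.append(["inStrain", "compare", "-i", *outputs])
    return commands


if __name__ == "__main__":
    for cmd in plan_profile_and_compare(["a.bam", "b.bam", "c.bam"], "genomes.fasta"):
        print(" ".join(cmd))
```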

profile_genes

Once you’ve run inStrain profile, you can also calculate gene-wise microdiversity, coverage, and SNP functions using this command. It relies on gene calls in the .fna format produced by the program Prodigal:

$ inStrain profile_genes -i IS_output -g called_genes.fna

genome_wide

This module translates scaffold-level results to genome-level results. If the .fasta file you mapped to consists of a single genome, running this module on its own will average the results among all scaffolds. If it consists of several genomes, you can get summary results for each individual genome by providing either a scaffold-to-bin file or a list of the individual .fasta files that make up the combined .fasta file. Running this module is also required before generating plots.

$ inStrain genome_wide -i IS_output -s genome1.fasta genome2.fasta genome3.fasta
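If you prefer to supply a scaffold-to-bin file rather than the individual .fasta files, one can be derived from those files. The sketch below assumes the file is plain text with two tab-separated columns, scaffold name then genome name, and that the scaffold name is the FASTA header up to the first whitespace. Both are assumptions about the expected format, so check them against preparing_input before relying on the result.

```python
from pathlib import Path


def scaffold_to_bin_lines(fasta_paths):
    """Yield 'scaffold<TAB>genome' lines, using each file name as the genome name."""
    for fasta in map(Path, fasta_paths):
        with fasta.open() as handle:
            for line in handle:
                if line.startswith(">"):
                    fields = line[1:].split()
                    if fields:  # skip headers that carry no identifier
                        yield f"{fields[0]}\t{fasta.name}"


def write_scaffold_to_bin(fasta_paths, out_path):
    """Write the scaffold-to-bin table and return the number of scaffolds written."""
    lines = list(scaffold_to_bin_lines(fasta_paths))
    Path(out_path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```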

quick_profile

This auxiliary module is a quick way to calculate coverage and breadth using the fast program coverM. It is useful for finding out which scaffolds have any coverage, so that you can generate a list of those scaffolds to profile with inStrain profile, making it run faster:

$ inStrain quick_profile -b .bam_file -f .fasta_file -s scaffold_to_bin_file -o output_name

filter_reads

This auxiliary module lets you do various tasks to filter and/or characterize a mapping file, and then generate a new mapping file with those filters applied:

$ inStrain filter_reads .bam_file .fasta_file -g new_sam_file_location

plot

This module makes a number of plots from an inStrain object. You must run genome_wide before running this module:

$ inStrain plot -i IS_output
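Because plotting depends on genome_wide having been run first, a script can enforce that order. This is a minimal sketch using only the arguments shown in this document; the helper names are hypothetical.

```python
import subprocess


def plotting_commands(profile_dir, genome_fastas):
    """Return the genome_wide and plot commands in the order they must run."""
    genome_wide = [
        "inStrain", "genome_wide", "-i", str(profile_dir),
        "-s", *map(str, genome_fastas),
    ]
    plot = ["inStrain", "plot", "-i", str(profile_dir)]
    return [genome_wide, plot]


def run_in_order(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError, so plot never runs if genome_wide fails.
        subprocess.run(cmd, check=True)
```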

other

This module handles miscellaneous small tasks, such as converting IS_profile objects from an old format to the newest format.