Benchmarks

This page contains data from tests performed to evaluate the accuracy of inStrain. In most cases, similar tests were performed to compare inStrain’s accuracy to that of other leading tools. Most of these benchmarks are adapted from the inStrain publication, where you can find more details on how they were performed.

Strain-level comparisons

This section contains a series of benchmarks evaluating the ability of inStrain to perform detailed strain-level comparisons. In all cases inStrain is benchmarked against three leading tools:

MIDAS - an integrated pipeline to estimate bacterial species abundance and strain-level genomic variation. Strain-level comparisons are based on consensus alleles called on whole genomes. The script strain_tracking.py was used for benchmarks.

StrainPhlAn - a tool for strain-level resolution of species across large sample sets, based on consensus single nucleotide polymorphisms (SNPs) called on species marker genes. Based on the MetaPhlAn2 db_v20 database.

dRep - a genome alignment program. dRep does not purport to have accuracy over 99.9% ANI and was just used for comparison purposes.

Benchmark with synthetic data

A straightforward in silico test. A randomly selected E. coli genome was downloaded and mutated to various chosen ANI levels using SNP Mutator. The original genome was compared to the mutated genome, and we looked for the difference between the actual ANI and the calculated ANI (the ANI reported by each program).
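The idea behind this test can be illustrated with a minimal sketch. It is not SNP Mutator's algorithm; it is a simplified stand-in that introduces random substitutions into a random sequence until a chosen identity is reached, then measures the discrepancy between the true ANI and a reported ANI. The function names, the random reference sequence, and the example "reported" value are all hypothetical.

```python
import random


def mutate_to_ani(sequence, target_ani, seed=0):
    """Return a copy of `sequence` with enough substitutions to reach ~target_ani (a fraction)."""
    rng = random.Random(seed)
    n_mutations = round(len(sequence) * (1.0 - target_ani))
    # Choose distinct positions so that each mutation lowers identity by exactly one site.
    positions = rng.sample(range(len(sequence)), n_mutations)
    bases = list(sequence)
    for pos in positions:
        alternatives = [b for b in "ACGT" if b != bases[pos]]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases)


def identity(seq_a, seq_b):
    """Fraction of identical positions between two equal-length sequences."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def ani_discrepancy(true_ani, calculated_ani):
    """Absolute difference between the true ANI and the ANI reported by a program."""
    return abs(true_ani - calculated_ani)


# Hypothetical 100 kb random "genome" mutated to 99.9% ANI.
_rng = random.Random(42)
reference = "".join(_rng.choice("ACGT") for _ in range(100_000))
mutated = mutate_to_ani(reference, target_ani=0.999)
true_ani = identity(reference, mutated)

# A made-up reported value, only to show how the discrepancy is computed.
print(true_ani, ani_discrepancy(true_ani, calculated_ani=0.99898))
```

In the actual benchmark, the calculated ANI comes from each program's output rather than from a direct sequence comparison.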

_images/Figure2_b.png

All four methods performed well on this test, although dRep, inStrain, and MIDAS had lower errors in the ANI calculation than StrainPhlAn overall (0.00001%, 0.002%, 0.006% and 0.03%, respectively; average discrepancy between the true and calculated ANI). This is likely because dRep, inStrain, and MIDAS compare positions from across the entire genome (99.99998%, 99.7%, and 85.8% of the genome, respectively) and StrainPhlAn does not (0.3% of the genome).

Methods for synthetic data benchmark:

For dRep, mutated genomes were compared to the reference genome using dRep on default settings. For inStrain, MIDAS, and StrainPhlAn, Illumina reads were simulated for all genomes at 20x coverage using pIRS.

For inStrain, synthetic reads were mapped back to the reference genome using Bowtie 2, profiled using “inStrain profile” under default settings, and compared using “inStrain compare” under default settings.

For StrainPhlAn, synthetic reads were profiled with MetaPhlAn2, the resulting marker genes were aligned using StrainPhlAn, and the ANI of the resulting nucleotide alignments was calculated using the class “Bio.Phylo.TreeConstruction.DistanceCalculator(‘identity’)” from the Biopython Python package.

For MIDAS, synthetic reads were provided to the program directly using the “run_midas.py species” command, and compared using the “run_midas.py snps” command. The ANI of the resulting comparisons was calculated as “[mean(sample1_bases, sample2_bases) - count_either] / mean(sample1_bases, sample2_bases)”.
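The MIDAS ANI formula quoted above can be written as a small function. The argument names are taken directly from the formula; their interpretation (bases compared in each sample, and the number of sites that differ between the two) is an assumption, and the input values are invented for illustration.

```python
from statistics import mean


def midas_ani(sample1_bases, sample2_bases, count_either):
    """ANI as [mean(sample1_bases, sample2_bases) - count_either] / mean(sample1_bases, sample2_bases)."""
    compared = mean([sample1_bases, sample2_bases])
    return (compared - count_either) / compared


# Invented numbers: 1,000,000 and 900,000 bases, 95 differing sites.
print(midas_ani(1_000_000, 900_000, 95))
```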

Benchmark with defined microbial communities

_images/ZymoExpt_1.png

This test (schematic above) involved comparing real metagenomes derived from defined bacterial communities. The ZymoBIOMICS Microbial Community Standard, which contains cells from eight bacterial species at defined abundances, was divided into three aliquots and subjected to DNA extraction, library preparation, and metagenomic sequencing. The same community of 8 bacterial species was sequenced 3 times, so each program should report 100% ANI for all species comparisons. Deviations from this ideal represent errors in sequence alignment, the presence of microdiversity consistent with maintenance of cultures in the laboratory, or the inability of programs to handle errors and biases introduced during routine DNA extraction, library preparation, and sequencing with Illumina.

_images/Figure2_c.png

MIDAS, dRep, StrainPhlAn, and inStrain reported average ANI values of 99.97%, 99.98%, 99.990% and 99.999998%, respectively, with inStrain reporting average popANI values of 100% for 23 of the 24 comparisons and 99.99996% for one comparison. The difference in performance likely arises because the Zymo cultures contain non-fixed nucleotide variants that inStrain uses to confirm population overlap but that confuse the consensus sequences reported by dRep, StrainPhlAn, and MIDAS (conANI). We also used this data to establish a threshold for the detection of “same” versus “different” strains. The thresholds for MIDAS, dRep, StrainPhlAn, and inStrain, calculated based on the comparison with the lowest average ANI across all 24 sequence comparisons, are shown in the table below.

Program        Minimum reported ANI    Years divergence
MIDAS          99.92%                  3771
dRep           99.94%                  2528
StrainPhlAn    99.97%                  1307
InStrain       99.99996%               2.2

Years divergence was calculated from “minimum reported ANI” using the previously reported rate of 0.9 SNSs accumulated per genome per year in the gut microbiome of healthy human adults (Zhao 2019). This benchmark demonstrates that inStrain can be used for detection of identical microbial strains with a stringency that is substantially higher than other tools. Stringent thresholds are useful for strain tracking, as strains that have diverged for hundreds to thousands of years are clearly not linked by a recent transmission event.
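A rough sketch of this kind of conversion follows. The exact calculation used for the table is not spelled out on this page, so the formula here is an assumption: the number of divergent sites is taken as (1 - ANI) multiplied by a genome length, and that count is divided by the substitution rate. The genome length is a hypothetical parameter, and the table values above are not expected to be reproduced exactly from the rounded ANI values.

```python
def years_of_divergence(ani_percent, genome_length_bp, snss_per_genome_per_year=0.9):
    """Estimate years of divergence from an ANI value (assumed formula, see text)."""
    # Assumption: divergent sites scale with genome length.
    divergent_sites = (1.0 - ani_percent / 100.0) * genome_length_bp
    return divergent_sites / snss_per_genome_per_year


# Hypothetical example: 99.99% ANI over a 4.5 Mb genome gives about 450 divergent
# sites, or roughly 500 years at 0.9 SNSs per genome per year.
print(round(years_of_divergence(99.99, 4_500_000)))
```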

We also performed an additional benchmark with this data on inStrain only. InStrain relies on representative genomes to calculate popANI, so we wanted to know whether using non-ideal reference genomes would impact its accuracy. By mapping reads to all 4,644 representative genomes in the Unified Human Gastrointestinal Genome (UHGG) collection we identified the 8 relevant representative genomes. These genomes had between 93.9% and 99.6% ANI to the organisms present in the Zymo samples. InStrain comparisons based on these genomes were still highly accurate (average 99.9998% ANI, lowest 99.9995% ANI, limit of detection 32.2 years), highlighting that inStrain can be used with reference genomes from databases when sample-specific reference genomes cannot be assembled.

Methods for defined microbial community benchmark:

Reads from Zymo samples are available under BioProject PRJNA648136.

For dRep, reads from each sample were assembled independently using IDBA_UD, binned into genomes based on alignment to the provided reference genomes using nucmer, and compared using dRep on default settings.

For StrainPhlAn, reads from Zymo samples were profiled with MetaPhlAn2, the resulting marker genes were aligned using StrainPhlAn, and the ANI of the resulting nucleotide alignments was calculated as described in the synthetic benchmark above.

For MIDAS, reads from Zymo samples were provided to MIDAS directly and the ANI of sample comparisons was calculated as described in the synthetic benchmark above.

For inStrain, reads from Zymo samples were aligned to the provided reference genomes using Bowtie 2, profiled using “inStrain profile” under default settings, and compared using “inStrain compare” under default settings. popANI values were used for inStrain.

Eukaryotic genomes were excluded from this analysis, and raw values are available in Supplemental Table S1 of the inStrain manuscript. To evaluate inStrain when using genomes from public databases, all reference genomes from the UHGG collection were downloaded and concatenated into a single .fasta file. Reads from the Zymo sample were mapped against this database and processed with inStrain as described above. The ability of each method to detect genomes was evaluated using all Zymo reads concatenated together.
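The inStrain steps in these methods (Bowtie 2 mapping, “inStrain profile”, then “inStrain compare”) can be laid out as a list of commands. The sketch below only builds and prints the command lines; it does not run them. All file names and the output directory are hypothetical, and the options shown are the basic ones we believe each tool accepts, so check each tool's help output before using them.

```python
import shlex


def instrain_commands(reference_fasta, samples, out_dir="instrain_out"):
    """Build command lines for mapping, profiling, and comparing samples.

    `samples` maps a sample name to a (forward_reads, reverse_reads) tuple.
    """
    index_prefix = f"{out_dir}/reference_index"
    commands = [["bowtie2-build", reference_fasta, index_prefix]]
    profiles = []
    for name, (reads_1, reads_2) in samples.items():
        sam = f"{out_dir}/{name}.sam"
        profile = f"{out_dir}/{name}.IS"
        # Map reads back to the reference genome(s) with Bowtie 2.
        commands.append(["bowtie2", "-x", index_prefix, "-1", reads_1, "-2", reads_2, "-S", sam])
        # Profile each mapping under default settings.
        commands.append(["inStrain", "profile", sam, reference_fasta, "-o", profile])
        profiles.append(profile)
    # Compare all profiles under default settings.
    commands.append(["inStrain", "compare", "-i", *profiles, "-o", f"{out_dir}/compare"])
    return commands


# Hypothetical file names for the three Zymo aliquots.
zymo_samples = {
    f"zymo_{i}": (f"zymo_{i}_R1.fastq.gz", f"zymo_{i}_R2.fastq.gz") for i in (1, 2, 3)
}
for command in instrain_commands("zymo_references.fasta", zymo_samples):
    print(shlex.join(command))
```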

Benchmark with true microbial communities

This test evaluated the stringency with which each tool can detect shared strains in genuine microbial communities. Tests like this are hard to perform because the ground truth is difficult to establish: we can never really know whether two genuine communities actually share strains. For this test we leveraged the fact that newborn siblings share far more strains than unrelated newborns. We compared the ability of the programs to detect strains shared by twin premature infants (presumably True Positives) vs. their detection of strains shared by unrelated infants (presumably False Positives).

_images/Figure2_d.png

All methods identified significantly more strain sharing among twin pairs than pairs of unrelated infants, as expected, and inStrain remained sensitive at substantially higher ANI thresholds than the other tools. The reduced ability of StrainPhlAn and MIDAS to identify shared strains is likely based on their reliance on consensus-based ANI (conANI) measurements. We know that microbiomes can contain multiple coexisting strains, and when two or more strains of a species are in a sample at similar abundance levels it can lead to pileups of reads from multiple strains and chimeric sequences. The popANI metric is designed to account for this complexity.
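The difference between the two metrics can be illustrated with a toy example. This is a simplified sketch, not inStrain's implementation: it assumes that conANI counts a substitution wherever the two samples have different consensus (majority) bases, while popANI counts a substitution only where the two samples share no alleles at all. Real comparisons also apply coverage and minor-allele filters, which are ignored here, and the allele counts are invented.

```python
def con_and_pop_ani(sample1, sample2):
    """Return (conANI, popANI) for two samples.

    Each sample is a list with one dict per position, mapping base -> read count.
    """
    con_diffs = 0
    pop_diffs = 0
    for counts1, counts2 in zip(sample1, sample2):
        consensus1 = max(counts1, key=counts1.get)
        consensus2 = max(counts2, key=counts2.get)
        if consensus1 != consensus2:
            con_diffs += 1  # consensus bases disagree
        if not set(counts1) & set(counts2):
            pop_diffs += 1  # no allele is shared between the populations
    n_positions = len(sample1)
    return 1 - con_diffs / n_positions, 1 - pop_diffs / n_positions


# Position 3 holds two coexisting alleles whose majority flips between samples;
# position 4 is a fixed difference.
sample_a = [{"A": 20}, {"C": 20}, {"A": 12, "G": 8}, {"C": 20}]
sample_b = [{"A": 20}, {"C": 20}, {"A": 9, "G": 11}, {"T": 20}]
con_ani, pop_ani = con_and_pop_ani(sample_a, sample_b)
print(con_ani, pop_ani)
```

In this toy case the flipped majority at position 3 lowers conANI but not popANI, which is the situation described above for samples containing multiple coexisting strains.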

It is also worth discussing Supplemental Figure S5 from the inStrain manuscript here.

_images/SupplementalFigure_FS5_v1.1.png

This figure was generated from genomic comparisons between genomes present in the same infant over time (longitudinal data). In cases where the same genome was detected in multiple time-points over the time-series sampling of an infant, the percentage of comparisons between genomes that exceed various popANI (a) and conANI (b) thresholds is plotted. This figure shows that the use of popANI allows greater stringency than conANI.

Note

Based on the data presented in the above tests, a threshold of 99.999% popANI was chosen as the threshold to define bacterial, bacteriophage, and plasmid strains for the work presented in the inStrain manuscript. This is likely a good threshold for a variety of communities.
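Applying this threshold to a set of pairwise comparisons is straightforward. In the sketch below the sample names and popANI values are hypothetical, and treating the threshold as inclusive (greater than or equal to 99.999%) is an assumption.

```python
POPANI_STRAIN_THRESHOLD = 0.99999  # 99.999% popANI, expressed as a fraction


def share_strain(pop_ani, threshold=POPANI_STRAIN_THRESHOLD):
    """Return True if a comparison meets the popANI threshold for 'same strain'."""
    return pop_ani >= threshold


# Hypothetical pairwise popANI values for one genome across three samples.
comparisons = {
    ("infant_A", "infant_B"): 0.9999996,
    ("infant_A", "infant_C"): 0.9997,
    ("infant_B", "infant_C"): 0.99995,
}
shared = [pair for pair, pop_ani in comparisons.items() if share_strain(pop_ani)]
print(shared)
```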

Methods for true microbial community benchmark:

Twin-based comparisons were performed on three randomly chosen sets of twins that were sequenced during a previous study (Olm 2019). Reads can be found under BioProject PRJNA294605.

For StrainPhlAn, all reads sequenced from each infant were concatenated and profiled using Metaphlan2, compared using StrainPhlAn, and the ANI of resulting nucleotide alignments was calculated as described for the synthetic benchmark.

For MIDAS, all reads sequenced from each infant were concatenated and profiled with MIDAS, and the ANI of species profiled in multiple infants was calculated as described for the synthetic benchmark.

For dRep, all de-replicated bacterial genomes assembled and binned from each infant (available from (Olm 2019)) were compared in a pairwise manner using dRep under default settings.

For inStrain, strain-sharing from these six infants was determined using the methods described below.

ANI values from all compared genomes and the number of genomes shared at a number of ANI thresholds are available for all methods in Supplemental Table S1 of the inStrain publication.

Species-level community profiling

This section contains tests evaluating the ability of inStrain and other tools to accurately profile microbial communities. Here inStrain is benchmarked against two other tools:

MIDAS - an integrated pipeline to estimate bacterial species abundance and strain-level genomic variation.

MetaPhlAn 2 - a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn 2 uses unique clade-specific marker genes.

Benchmark with defined microbial communities

This test evaluated the ability of programs to identify the microbial species present in metagenomes of defined bacterial communities. For this test we purchased, extracted DNA from, and sequenced a ZymoBIOMICS Microbial Community Standard. The reads used for this test are available here. This community contains 8 defined bacterial species, and we simply evaluated the ability of each program to identify those and only those 8 bacterial species. Results are shown in the table below.

Program        True species detected    False species detected    Accuracy
MIDAS          8                        15                        35%
MetaPhlAn 2    8                        11                        42%
InStrain       8                        0                         100%
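The accuracy values in this table are consistent with dividing the number of true species detected by the total number of species detected. The page does not state the formula explicitly, so the sketch below treats it as an assumption; the counts are taken from the table.

```python
def detection_accuracy(true_detected, false_detected):
    """Fraction of all detected species that are genuinely present (assumed definition)."""
    return true_detected / (true_detected + false_detected)


# (true species detected, false species detected) from the table above.
results = {"MIDAS": (8, 15), "MetaPhlAn 2": (8, 11), "InStrain": (8, 0)}
for program, (true_count, false_count) in results.items():
    print(f"{program}: {detection_accuracy(true_count, false_count):.0%}")
```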

All programs successfully identified the 8 bacteria present in the metagenome, but MIDAS and MetaPhlAn 2 detected an additional 15 and 11 bacterial species, respectively. The raw tables produced by each tool are available at the bottom of this section. Looking at these tables, you’ll notice that many of the False Positive species detected are related to species that are actually present in the community. For example, MetaPhlAn 2 reported the detection of Bacillus cereus thuringiensis (False Positive) as well as the detection of Bacillus subtilis (True Positive). Similarly, MIDAS reported the detection of Escherichia fergusonii (related to True Positive Escherichia coli) and Bacillus anthracis (related to True Positive Bacillus subtilis).

Importantly, inStrain detected many of these same False Positives as well. However, inStrain also provides a set of other metrics that make it possible to filter out erroneous detections. Taking a look at the information reported by inStrain (at the very bottom of this page) shows that many genomes besides the 8 True Positives were detected. When using the recommended genome breadth cutoff of 50%, only the 8 True Positive genomes remain (see section “Detecting organisms in metagenomic data” in Important concepts for more info). You’ll notice that no such info is reported with MIDAS or MetaPhlAn 2. While relative abundance could conceivably be used to filter out erroneous taxa with these tools, doing so would severely limit their ability to detect genuine low-abundance taxa.
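The breadth filter can be sketched as follows. The genome, species, and breadth values are copied from the first 14 rows of the inStrain genomeInfo table at the bottom of this page (the remaining rows all have breadth below 2%). Applying the cutoff as "greater than or equal to 0.5" is an assumption about how the 50% cutoff is interpreted.

```python
import csv
import io

# Subset of S3_CON_017Z2.genomeInfo.csv (genome, species, breadth columns only).
GENOME_INFO = """genome,species,breadth
GUT_GENOME142031.fna.gz,Salmonella enterica,0.890900711
GUT_GENOME143383.fna.gz,Pseudomonas aeruginosa,0.894646033
GUT_GENOME144544.fna.gz,Escherichia coli,0.777878058
GUT_GENOME000862.fna.gz,Lactobacillus fermentum,0.862275194
GUT_GENOME103721.fna.gz,Enterococcus faecalis,0.890365837
GUT_GENOME141183.fna.gz,Staphylococcus aureus,0.941567947
GUT_GENOME145983.fna.gz,Escherichia fergusonii,0.336139906
GUT_GENOME141005.fna.gz,Listeria monocytogenes,0.924613578
GUT_GENOME000031.fna.gz,Bacillus subtilis,0.796819131
GUT_GENOME140826.fna.gz,Escherichia sp000208585,0.278930014
GUT_GENOME145378.fna.gz,Escherichia albertii,0.122179339
GUT_GENOME143726.fna.gz,Salmonella bongori,0.070875018
GUT_GENOME140808.fna.gz,Escherichia marmotae,0.120972135
GUT_GENOME142492.fna.gz,Listeria monocytogenes,0.126213955
"""

BREADTH_CUTOFF = 0.5  # recommended genome breadth cutoff of 50%


def detected_genomes(csv_text, cutoff=BREADTH_CUTOFF):
    """Return (genome, species) pairs whose breadth meets the cutoff."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(r["genome"], r["species"]) for r in rows if float(r["breadth"]) >= cutoff]


detected = detected_genomes(GENOME_INFO)
for genome, species in detected:
    print(genome, species)
print(len(detected), "genomes pass the breadth cutoff")
```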

It’s also worth noting that if one is just interested in measuring community presence / absence, as in this test, any program that accurately reports breadth should give similar results to inStrain when mapped against the UHGG genome set. One such program is coverM, a fast program for calculating genome coverage and breadth that can be run on its own or through inStrain using the command inStrain quick_profile.

Methods for defined microbial community profiling experiment:

For inStrain, all reference genomes from the UHGG collection were downloaded and concatenated into a single .fasta file, reads from the Zymo sample were mapped against this database, and inStrain profile was run on default settings.

Note

The UHGG genome database used for this section is available for download in the Tutorial section.

For MIDAS, the command run_midas.py species was used along with the default database. In cases where the same species was detected multiple times as part of multiple genomes, the species was only counted once.

For MetaPhlAn 2, the command metaphlan2.py was used along with the MetaPhlAn2 db_v20 database.

Eukaryotic genomes were excluded from this analysis.
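The rule of counting each species only once for MIDAS can be sketched with the species_id values from the MIDAS species_profile.txt table on this page. Stripping the trailing numeric identifier to obtain the species name is an assumption about how the IDs are structured, based on how they appear in that table.

```python
# species_id column from S3_CON_017Z2_MIDAS/species/species_profile.txt
MIDAS_SPECIES_IDS = [
    "Lactobacillus_fermentum_54035", "Salmonella_enterica_58156", "Escherichia_coli_58110",
    "Pseudomonas_aeruginosa_57148", "Enterococcus_faecalis_56297", "Staphylococcus_aureus_56630",
    "Bacillus_subtilis_57806", "Salmonella_enterica_58266", "Listeria_monocytogenes_53478",
    "Escherichia_fergusonii_56914", "Pseudomonas_aeruginosa_55861", "Salmonella_enterica_53987",
    "Escherichia_coli_57907", "Escherichia_albertii_56276", "Citrobacter_youngae_61659",
    "Salmonella_bongori_55351", "Staphylococcus_aureus_37016", "Klebsiella_oxytoca_54123",
    "Bacillus_sp_58480", "Clostridium_perfringens_56840", "Listeria_monocytogenes_56337",
    "Bacillus_subtilis_55718", "Bacillus_anthracis_57688", "Bacillus_cereus_58113",
    "Enterococcus_faecium_56947", "Klebsiella_pneumoniae_54788", "Veillonella_parvula_57794",
    "Haemophilus_haemolyticus_58350", "Veillonella_parvula_58184", "Enterobacter_sp_59441",
    "Pseudomonas_sp_59807",
]


def species_name(species_id):
    """Drop the trailing numeric identifier and return a readable species name."""
    return species_id.rsplit("_", 1)[0].replace("_", " ")


unique_species = sorted({species_name(sid) for sid in MIDAS_SPECIES_IDS})
print(len(MIDAS_SPECIES_IDS), "rows,", len(unique_species), "unique species")
```

The 31 rows collapse to 23 unique species, which matches the 8 true and 15 false species reported for MIDAS in the results table.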

Raw data for defined microbial community profiling experiment:

MetaPhlAn 2:

metaphlan2/S3_CON_017Z2_profile.txt
species abundance Metaphlan2_species
Lactobacillus fermentum 23.1133 k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae|g__Lactobacillus|s__Lactobacillus_fermentum
Escherichia coli 20.0587 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli
Salmonella enterica 18.44954 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Salmonella|s__Salmonella_enterica
Pseudomonas aeruginosa 14.42109 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Pseudomonadaceae|g__Pseudomonas|s__Pseudomonas_aeruginosa
Enterococcus faecalis 12.21137 k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Enterococcaceae|g__Enterococcus|s__Enterococcus_faecalis
Staphylococcus aureus 6.36267 k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Staphylococcaceae|g__Staphylococcus|s__Staphylococcus_aureus
Bacillus subtilis 2.44228 k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Bacillaceae|g__Bacillus|s__Bacillus_subtilis
Listeria monocytogenes 1.8644 k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Listeriaceae|g__Listeria|s__Listeria_monocytogenes
Salmonella unclassified 0.67363 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Salmonella|s__Salmonella_unclassified
Saccharomyces cerevisiae 0.20426 k__Eukaryota|p__Ascomycota|c__Saccharomycetes|o__Saccharomycetales|f__Saccharomycetaceae|g__Saccharomyces|s__Saccharomyces_cerevisiae
Cryptococcus neoformans 0.05417 k__Eukaryota|p__Basidiomycota|c__Tremellomycetes|o__Tremellales|f__Tremellaceae|g__Filobasidiella|s__Cryptococcus_neoformans
Listeria unclassified 0.02341 k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Listeriaceae|g__Listeria|s__Listeria_unclassified
Klebsiella oxytoca 0.0165 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Klebsiella|s__Klebsiella_oxytoca
Naumovozyma unclassified 0.01337 k__Eukaryota|p__Ascomycota|c__Saccharomycetes|o__Saccharomycetales|f__Saccharomycetaceae|g__Naumovozyma|s__Naumovozyma_unclassified
Klebsiella unclassified 0.01307 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Klebsiella|s__Klebsiella_unclassified
Bacillus cereus thuringiensis 0.00809 k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Bacillaceae|g__Bacillus|s__Bacillus_cereus_thuringiensis
Clostridium perfringens 0.00554 k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Clostridiaceae|g__Clostridium|s__Clostridium_perfringens
Eremothecium unclassified 0.00319 k__Eukaryota|p__Ascomycota|c__Saccharomycetes|o__Saccharomycetales|f__Saccharomycetaceae|g__Eremothecium|s__Eremothecium_unclassified
Veillonella parvula 0.0015 k__Bacteria|p__Firmicutes|c__Negativicutes|o__Selenomonadales|f__Veillonellaceae|g__Veillonella|s__Veillonella_parvula
Clostridium butyricum 0.00054 k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Clostridiaceae|g__Clostridium|s__Clostridium_butyricum
Enterobacter cloacae 0.00051 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Enterobacter|s__Enterobacter_cloacae

MIDAS:

S3_CON_017Z2_MIDAS/species/species_profile.txt
species_id count_reads coverage relative_abundance species
Lactobacillus_fermentum_54035 22305 322.661072 0.202032 Lactobacillus fermentum
Salmonella_enterica_58156 18045 296.117276 0.185412 Salmonella enterica
Escherichia_coli_58110 19262 286.702733 0.179517 Escherichia coli
Pseudomonas_aeruginosa_57148 14214 214.266462 0.134162 Pseudomonas aeruginosa
Enterococcus_faecalis_56297 12382 183.37939 0.114822 Enterococcus faecalis
Staphylococcus_aureus_56630 6146 89.116402 0.0558 Staphylococcus aureus
Bacillus_subtilis_57806 3029 44.275375 0.027723 Bacillus subtilis
Salmonella_enterica_58266 3027 41.774295 0.026157 Salmonella enterica
Listeria_monocytogenes_53478 2250 33.367947 0.020893 Listeria monocytogenes
Escherichia_fergusonii_56914 2361 33.034998 0.020685 Escherichia fergusonii
Pseudomonas_aeruginosa_55861 927 12.402473 0.007766 Pseudomonas aeruginosa
Salmonella_enterica_53987 791 10.982231 0.006876 Salmonella enterica
Escherichia_coli_57907 713 9.860496 0.006174 Escherichia coli
Escherichia_albertii_56276 457 6.543769 0.004097 Escherichia albertii
Citrobacter_youngae_61659 455 6.248948 0.003913 Citrobacter youngae
Salmonella_bongori_55351 314 4.187424 0.002622 Salmonella bongori
Staphylococcus_aureus_37016 62 0.907418 0.000568 Staphylococcus aureus
Klebsiella_oxytoca_54123 29 0.418764 0.000262 Klebsiella oxytoca
Bacillus_sp_58480 17 0.233451 0.000146 Bacillus sp
Clostridium_perfringens_56840 12 0.182686 0.000114 Clostridium perfringens
Listeria_monocytogenes_56337 11 0.162597 0.000102 Listeria monocytogenes
Bacillus_subtilis_55718 9 0.127828 0.00008 Bacillus subtilis
Bacillus_anthracis_57688 2 0.031576 0.00002 Bacillus anthracis
Bacillus_cereus_58113 1 0.014684 0.000009 Bacillus cereus
Enterococcus_faecium_56947 1 0.014791 0.000009 Enterococcus faecium
Klebsiella_pneumoniae_54788 1 0.014852 0.000009 Klebsiella pneumoniae
Veillonella_parvula_57794 1 0.014925 0.000009 Veillonella parvula
Haemophilus_haemolyticus_58350 1 0.01351 0.000008 Haemophilus haemolyticus
Veillonella_parvula_58184 1 0.012646 0.000008 Veillonella parvula
Enterobacter_sp_59441 1 0.003478 0.000002 Enterobacter sp
Pseudomonas_sp_59807 1 0.003203 0.000002 Pseudomonas sp

InStrain:

S3_CON_017Z2.genomeInfo.csv
genome species breadth relative_abundance coverage nucl_diversity length true_scaffolds detected_scaffolds coverage_median coverage_std coverage_SEM breadth_minCov breadth_expected nucl_diversity_rarefied conANI_reference popANI_reference iRep iRep_GC_corrected linked_SNV_count SNV_distance_mean r2_mean d_prime_mean consensus_divergent_sites population_divergent_sites SNS_count SNV_count filtered_read_pair_count reads_unfiltered_pairs reads_mean_PID reads_unfiltered_reads divergent_site_count Genome lineage genus
GUT_GENOME142031.fna.gz Salmonella enterica 0.890900711 0.192920699 418.6273152 0.001575425 4955431 2 1 470 167.6327653 0.075307073 0.889570251 1 0.0012784 0.988936764 0.989338515   FALSE 29019 66.11144423 0.579492141 0.962855588 48769 46998 46575 5776 7448284 7508196 0.988009952 15332041 52351 GUT_GENOME142031 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella enterica Salmonella
GUT_GENOME143383.fna.gz Pseudomonas aeruginosa 0.894646033 0.176078828 280.4839772 0.001768407 6750396 80 74 312 110.1119447 0.042431183 0.892788808 1 0.001509706 0.99020306 0.990485637   FALSE 38085 95.96318761 0.622403987 0.959864075 59043 57340 56922 6330 6780948 6815597 0.985681579 13897895 63252 GUT_GENOME143383 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas;s__Pseudomonas aeruginosa Pseudomonas
GUT_GENOME144544.fna.gz Escherichia coli 0.777878058 0.138653747 279.2311026 0.001677588 5339468 2 2 358 191.117164 0.082711711 0.772478831 1 0.001249049 0.976197599 0.976755468   FALSE 38443 87.25424655 0.594190893 0.979230955 98176 95875 95304 7777 5341780 5396883 0.949021155 11455710 103081 GUT_GENOME144544 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli_D Escherichia
GUT_GENOME000862.fna.gz Lactobacillus fermentum 0.862275194 0.096747331 528.5687445 0.002324426 1968193 80 71 512 531.1341113 0.380139439 0.861009566 1 0.001853701 0.992704615 0.99333307   FALSE 24523 43.65012437 0.472475022 0.941868743 12363 11298 10943 4409 3897780 3915245 0.987724733 8141064 15352 GUT_GENOME000862 d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus_H;s__Lactobacillus_H fermentum Lactobacillus_H
GUT_GENOME103721.fna.gz Enterococcus faecalis 0.890365837 0.064796999 247.8988054 0.001359966 2810675 1 1 263 106.7589636 0.063681687 0.889176088 1 0.001009117 0.992572379 0.992928895   FALSE 16206 70.9866099 0.521939243 0.942993799 18563 17672 17443 3088 2521061 2542269 0.991283432 5173192 20531 GUT_GENOME103721 d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus;s__Enterococcus faecalis Enterococcus
GUT_GENOME141183.fna.gz Staphylococcus aureus 0.941567947 0.03353288 131.1378903 0.001375635 2749621 2 2 142 41.45182417 0.024999936 0.93883157 1 0.000914462 0.99279082 0.993222751   FALSE 22344 72.09188149 0.540395915 0.915284188 18610 17495 17253 4097 1305000 1316045 0.959624337 2691675 21350 GUT_GENOME141183 d__Bacteria;p__Firmicutes;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus;s__Staphylococcus aureus Staphylococcus
GUT_GENOME145983.fna.gz Escherichia fergusonii 0.336139906 0.013061031 30.24324694 0.001418115 4643861 2 1 0 73.62873041 0.034168543 0.286148746 1 0.00110964 0.96878243 0.969196326   FALSE 5682 86.23970433 0.70271658 0.99040633 41483 40933 40787 1849 507148 517498 0.971752174 1312986 42636 GUT_GENOME145983 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia fergusonii Escherichia
GUT_GENOME141005.fna.gz Listeria monocytogenes 0.924613578 0.012249816 43.6464116 0.000913049 3017944 1 1 47 15.93616519 0.009173661 0.920719868 1 0.000838075 0.995876101 0.995969311   FALSE 4390 69.92277904 0.693771377 0.988098539 11459 11200 11171 1341 475888 477867 0.994961092 980123 12512 GUT_GENOME141005 d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Listeriaceae;g__Listeria;s__Listeria monocytogenes_B Listeria
GUT_GENOME000031.fna.gz Bacillus subtilis 0.796819131 0.011738601 31.12210212 0.00098138 4055810 14 13 28 32.20366662 0.015996189 0.729697397 1 0.000859228 0.939115003 0.93935457   FALSE 1163 21.13241617 0.702766252 0.973300401 180190 179481 179281 2155 454156 593524 0.940236231 1292915 181436 GUT_GENOME000031 d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus;s__Bacillus subtilis Bacillus
GUT_GENOME140826.fna.gz Escherichia sp000208585 0.278930014 0.007440286 17.72017163 0.001540747 4514939 29 17 0 56.92691685 0.0268084 0.206155831 0.99999984 0.001388545 0.967896852 0.968409325   FALSE 5947 105.5668404 0.766140483 0.983050761 29881 29404 29302 1691 290320 305163 0.960887088 820252 30993 GUT_GENOME140826 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia sp000208585 Escherichia
GUT_GENOME145378.fna.gz Escherichia albertii 0.122179339 0.003167813 6.860465646 0.00235464 4965193 164 83 0 39.5967056 0.017829135 0.081533789 0.997660437 0.00218204 0.961660545 0.962826463   FALSE 12166 157.8004274 0.794119571 0.973439113 15521 15049 14939 1495 123735 140151 0.953424167 407112 16434 GUT_GENOME145378 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia albertii Escherichia
GUT_GENOME143726.fna.gz Salmonella bongori 0.070875018 0.003160648 7.43337583 0.001600002 4572147 84 48 0 96.31442795 0.045126398 0.038681827 0.998589302 0.001128128 0.972543099 0.973068942   FALSE 552 38.54710145 0.364521691 0.826322255 4856 4763 4731 305 122896 130428 0.968771686 341983 5036 GUT_GENOME143726 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella bongori Salmonella
GUT_GENOME140808.fna.gz Escherichia marmotae 0.120972135 0.002155978 5.167057447 0.001726413 4486744 47 38 0 33.4691403 0.015817374 0.074067966 0.989564186 0.001588357 0.965091296 0.965708164   FALSE 2054 74.47565725 0.741056343 0.989146608 11601 11396 11330 792 84663 96297 0.955863071 277452 12122 GUT_GENOME140808 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia marmotae Escherichia
GUT_GENOME142492.fna.gz Listeria monocytogenes 0.126213955 0.001176886 4.302069537 0.001059694 2941624 14 11 0 34.20030999 0.01995002 0.059155079 0.977600741 0.000372324 0.982937958 0.983231042   FALSE 206 9.058252427 0.844509747 1 2969 2918 2910 151 45956 47236 0.983439086 109110 3061 GUT_GENOME142492 d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Listeriaceae;g__Listeria;s__Listeria monocytogenes Listeria
GUT_GENOME146010.fna.gz Metakosakonia intermedia 0.011269041 0.0007257 1.265470484 0.001803324 6166452 5 4 0 20.17351043 0.008124545 0.0086103 0.672874189 0.001151362 0.987399943 0.988096808   FALSE 68 6.632352941 0.519849079 0.971330957 669 632 623 100 28027 28249 0.990376613 65337 723 GUT_GENOME146010 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Metakosakonia;s__Metakosakonia intermedia Metakosakonia
GUT_GENOME143527.fna.gz Cronobacter malonaticus 0.0056892 0.000591325 1.422193205 0.001895836 4470927 309 24 0 24.52026825 0.011677475 0.004481621 0.715151153 0.000863784 0.991316065 0.99186505   FALSE 43 3.348837209 0.323536849 0.895980567 174 163 157 52 22952 23584 0.95668831 51969 209 GUT_GENOME143527 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Cronobacter;s__Cronobacter malonaticus Cronobacter
GUT_GENOME147796.fna.gz Staphylococcus argenteus 0.072414548 0.000549275 2.122002981 0.002476746 2783391 89 66 0 13.35139068 0.008028467 0.046540353 0.846449939 0.001206762 0.96824147 0.969777675   FALSE 1626 68.76383764 0.880724477 0.994276953 4114 3915 3871 560 21703 24408 0.955482052 67154 4431 GUT_GENOME147796 d__Bacteria;p__Firmicutes;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus;s__Staphylococcus argenteus Staphylococcus
GUT_GENOME095995.fna.gz Citrobacter portucalensis_A 0.009471768 0.000474738 0.990436461 0.001709209 5154159 10 8 0 19.06540862 0.008399463 0.005825781 0.5829526 0.001430662 0.953575116 0.954008059   FALSE 36 67.25 0.565195465 1 1394 1381 1381 49 18414 20414 0.952624755 48144 1430 GUT_GENOME095995 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Citrobacter;s__Citrobacter portucalensis_A Citrobacter
GUT_GENOME000024.fna.gz Lactobacillus_B murinus 0.001870138 0.000441384 2.136753757 0.014148006 2221226 144 7 0 107.2864275 0.072457344 0.001074182 0.84843695 0.009892552 0.959346186 0.964375524   FALSE 542 135.7435424 0.169030486 0.84185153 97 85 84 72 17418 17595 0.941170306 40278 156 GUT_GENOME000024 d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus_B;s__Lactobacillus_B murinus Lactobacillus_B
GUT_GENOME078306.fna.gz Lactobacillus_H oris 0.001220271 0.000399444 2.141072453 0.012295828 2006111 94 4 0 77.23755615 0.054789295 0.000866353 0.849013821 0.010541041 0.995972382 0.998849252   FALSE 398 16.92713568 0.302731013 0.826865848 7 2 2 40 16099 16158 0.936115413 38348 42 GUT_GENOME078306 d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus_H;s__Lactobacillus_H oris Lactobacillus_H
GUT_GENOME225144.fna.gz   0.009928145 0.000347535 1.549658971 0.002163833 2411528 1340 22 0 23.45474186 0.016020135 0.009063133 0.745473131 0.000727385 0.979730966 0.980646047   FALSE 66 3.333333333 0.700603722 0.963204482 443 423 414 70 13562 13757 0.97127631 33858 484 GUT_GENOME225144 d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Faecalicatena;s__ Faecalicatena
GUT_GENOME038289.fna.gz   0.00099285 0.000296215 1.742384732 0.010683495 1828071 235 2 0 74.35939765 0.055717981 0.000973157 0.785302607 0.008367741 0.983136594 0.988757729   FALSE 240 68.54166667 0.330322272 0.915762753 30 20 20 36 11886 11920 0.982379242 28904 56 GUT_GENOME038289 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Pasteurellaceae;g__Haemophilus_D;s__ Haemophilus_D
GUT_GENOME143493.fna.gz Lactobacillus_G kefiri 0.00864699 0.000280612 1.173767593 0.006727598 2570721 10 3 0 19.31409016 0.0120508 0.007831655 0.645283636 0.005352071 0.992251527 0.995380718   FALSE 1002 65.8992016 0.448661201 0.961561942 156 93 82 276 11319 11437 0.990954152 28111 358 GUT_GENOME143493 d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus_G;s__Lactobacillus_G kefiri Lactobacillus_G
GUT_GENOME212929.fna.gz   0.004865114 0.000265953 1.504297544 0.006577654 1901086 52 12 0 31.30317094 0.022765581 0.004161306 0.735071349 0.003925862 0.982176716 0.986348123   FALSE 730 96.23287671 0.844539753 0.995313875 141 108 99 114 10256 10633 0.958751213 27570 213 GUT_GENOME212929 d__Bacteria;p__Firmicutes_C;c__Negativicutes;o__Veillonellales;f__Veillonellaceae;g__F0422;s__ F0422
GUT_GENOME141398.fna.gz Lactobacillus crispatus 0.001957446 0.000245098 1.440641732 0.016147943 1829425 63 3 0 38.97971904 0.028918934 0.001950886 0.719753764 0.01075873 0.985149902 0.995236761   FALSE 919 85.0968444 0.532244527 0.953133466 53 17 15 120 9893 10121 0.984396172 25843 135 GUT_GENOME141398 d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus crispatus Lactobacillus
GUT_GENOME229203.fna.gz   0.001076576 0.000237003 1.216154076 0.009783963 2095533 369 2 0 50.30343773 0.035378211 0.0010656 0.658314327 0.005923767 0.991939095 0.997313032   FALSE 176 7.352272727 0.255200979 0.773989489 18 6 4 42 9197 9229 0.996035052 24260 46 GUT_GENOME229203 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Prevotella;s__ Prevotella
GUT_GENOME140701.fna.gz Lactobacillus_H mucosae 0.006795042 0.00022553 1.023402847 0.003734144 2369669 12 9 0 18.31360326 0.011902826 0.005363618 0.594917575 0.001824337 0.959480724 0.960582219   FALSE 154 89.28571429 0.726658579 0.984646084 515 501 493 83 9190 10224 0.968193502 25547 576 GUT_GENOME140701 d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus_H;s__Lactobacillus_H mucosae Lactobacillus_H
GUT_GENOME001416.fna.gz Vagococcus teuberi 0.001203634 0.000195751 0.932135043 0.001248896 2258161 50 4 0 37.87233052 0.02525855 0.001052626 0.560920699 0.001109488 0.991165334 0.991586033   FALSE 4 6.25 1 1 21 20 19 5 7731 7736 0.976960693 21662 24 GUT_GENOME001416 d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Vagococcaceae;g__Vagococcus;s__Vagococcus teuberi Vagococcus