Benchmarks¶

This page contains data from tests performed to evaluate the accuracy inStrain. In most cases similar tests were performed to compare inStrain’s accuracy to other leading tools. Most of these benchmarks are adapted from the inStrain publication where you can find more details on how they were performed.

Strain-level comparisons¶

This section contains a series of benchmarks evaluating the ability of inStrain to perform detailed strain-level comparisons. In all cases inStrain is benchmarked against three leading tools:

MIDAS - an integrated pipeline to estimate bacterial species abundance and strain-level genomic variation. Strain-level comparisons are based on consensus alleles called on whole genomes. The script strain_tracking.py was used for benchmarks.

StrainPhlAn - a tool for strain-level resolution of species across large sample sets, based on consensus single nucleotide polymorphisms (SNPs) called on species marker genes. Based on the MetaPhlAn2 db_v20 database.

dRep - a genome alignment program. dRep does not purport to have accuracy over 99.9% ANI and was just used for comparison purposes.

Benchmark with synthetic data¶

A straightforward in silico test. A randomly selected E. coli genome was downloaded and mutated to various chosen ANI levels using SNP Mutator. The original genome was compared to the mutated genome, and we looked for the difference between the actual ANI and the calculated ANI (the ANI reported by each program).

All four methods performed well on this test, although dRep, inStrain, and MIDAS had lower errors in the ANI calculation than StrainPhlAn overall (0.00001%, 0.002%, 0.006% and 0.03%, respectively; average discrepancy between the true and calculated ANI). This is likely because dRep, inStrain, and MIDAS compare positions from across the entire genome (99.99998%, 99.7%, and 85.8% of the genome, respectively) and StrainPhlAn does not (0.3% of the genome).

Methods for synthetic data benchmark:

For dRep, mutated genomes were compared to the reference genome using dRep on default settings. For inStrain, MIDAS, andStrainPhlAn, Illumina reads were simulated for all genomes at 20x coverage using pIRS.

For inStrain, synthetic reads were mapped back to the reference genome using Bowtie 2, profiled using “inStrain profile” under default settings, and compared using “inStrain compare” under default settings.

For StrainPhlAn, synthetic reads profiled with Metaphlan2, resulting marker genes were aligned using StrainPhlan, and the ANI of resulting nucleotide alignments was calculated using the class “Bio.Phylo.TreeConstruction.DistanceCalculator(‘identity’)” from the BioPython python package.

For MIDAS, synthetic reads were provided to the program directly using the “run_midas.py species” command, and compared using the “run_midas.py snps” command. The ANI of the resulting comparisons was calculated as “[mean(sample1_bases, sample2_bases) - count_either] / mean(sample1_bases, sample2_bases)”.

Benchmark with defined microbial communities¶

This test (schematic above) involved comparing real metagenomes derived from defined bacterial communities. The ZymoBIOMICS Microbial Community Standard, which contains cells from eight bacterial species at defined abundances, was divided into three aliquots and subjected to DNA extraction, library preparation, and metagenomic sequencing. The same community of 8 bacterial species was sequenced 3 times, so each program should report 100% ANI for all species comparisons. Deviations from this ideal either represent errors in sequence alignment, the presence of microdiversity consistent with maintenance of cultures in the laboratory, or inability of programs to handle errors and biases introduced during routine DNA extraction, library preparation, and sequencing with Illumina).

MIDAS, dRep, StrainPhlAn, and inStrain reported average ANI values of 99.97%, 99.98%, 99.990% and 99.999998%, respectively, with inStrain reporting average popANI values of 100% for 23 of the 24 comparisons and 99.99996% for one comparison. The difference in performance likely arises because the Zymo cultures contain non-fixed nucleotide variants that inStrain uses to confirm population overlap but that confuse the consensus sequences reported by dRep, StrainPhlAn, and MIDAS (conANI). We also used this data to establish a threshold for the detection of “same” versus “different” strains. The thresholds for MIDAS, dRep, StrainPhlAn, and inStrain, calculated based on the comparison with the lowest average ANI across all 24 sequence comparisons, are shown in the table below.

Program	Minimum reported ANI	Years divergence
MIDAS	99.92%	3771
dRep	99.94%	2528
StrainPhlAn	99.97%	1307
InStrain	99.99996%	2.2

Years divergence was calculated from “minimum reported ANI” using the previously reported rate of 0.9 SNSs accumulated per genome per year in the gut microbiome of healthy human adults (Zhao 2019) . This benchmark demonstrates that inStrain can be used for detection of identical microbial strains with a stringency that is substantially higher than other tools. Stringent thresholds are useful for strain tracking, as strains that have diverged for hundreds to thousands of years are clearly not linked by a recent transmission event.

We also performed an additional benchmark with this data on inStrain only. InStrain relies on representative genomes to calculate popANI, so we wanted to know whether using non-ideal reference genomes would impact it’s accuracy. By mapping reads to all 4,644 representative genomes in the Unified Human Gastrointestinal Genome (UHGG) collection we identified the 8 relevant representative genomes. These genomes had between 93.9% - 99.6% ANI to the organisms present in the Zymo samples. InStrain comparisons based on these genomes were still highly accurate (average 99.9998% ANI, lowest 99.9995% ANI, limit of detection 32.2 years), highlighting that inStrain can be used with reference genomes from databases when sample-specific reference genomes cannot be assembled.

Methods for defined microbial community benchmark:

Reads from Zymo samples are available under BioProject PRJNA648136

For dRep, reads from each sample were assembled independently using IDBA_UD, binned into genomes based off of alignment to the provided reference genomes using nucmer, and compared using dRep on default settings.

For StrainPhlAn, reads from Zymo samples profiled with Metaphlan2, resulting marker genes were aligned using StrainPhlan, and the ANI of resulting nucleotide alignments was calculated as described in the synthetic benchmark above.

For MIDAS, reads from Zymo samples were provided to MIDAS directly and the ANI of sample comparisons was calculated as described in the synthetic benchmark above.

For inStrain, reads from Zymo samples were aligned to the provided reference genomes using Bowtie 2, profiled using “inStrain profile” under default settings, and compared using “inStrain compare” under default settings. popANI values were used for inStrain.

Eukaryotic genomes were excluded from this analysis, and raw values are available in Supplemental Table S1 of the inStrain manuscript. To evaluate inStrain when using genomes from public databases, all reference genomes from the UHGG collection were downloaded and concatenated into a single .fasta file. Reads from the Zymo sample were mapped against this database and processed with inStrain as described above. The ability of each method to detect genomes was performed using all Zymo reads concatenated together.

Benchmark with true microbial communities¶

This test evaluated the stringency with which each tool can detect shared strains in genuine microbial communities. Tests like this are hard to perform because it is difficult to know the ground truth. We can never really know whether two true genuine communities actually share strains. For this test we leveraged the fact that new-born siblings share far more strains than unrelated newborns.. In this test, we compared the ability of the programs to detect strains shared by twin premature infants (presumably True Positives) vs. their detection of strains shared by unrelated infants (presumably False Positives).

All methods identified significantly more strain sharing among twin pairs than pairs of unrelated infants, as expected, and inStrain remained sensitive at substantially higher ANI thresholds than the other tools. The reduced ability of StrainPhlAn and MIDAS to identify shared strains is likely based on their reliance on consensus-based ANI (conANI) measurements. We know that microbiomes can contain multiple coexisting strains, and when two or more strains of a species are in a sample at similar abundance levels it can lead to pileups of reads from multiple strains and chimeric sequences. The popANI metric is designed to account for this complexity.

It is also worth discussing Supplemental Figure S5 from the inStrain manuscript here.

This figure was generated from genomic comparisons between genomes present in the same infant over time (longitudinal data). In cases where the same genome was detected in multiple time-points over the time-series sampling of an infant, the percentage of comparisons between genomes that exceed various popANI (a) and conANI (b) thresholds is plotted. This figure shows that the use of popANI allows greater stringency than conANI.

Note

Based on the data presented in the above tests, a threshold of 99.999% popANI was chosen as the threshold to define bacterial, bacteriophage, and plasmid strains for the work presented in the inStrain manuscript. This is likely a good threshold for a variety of communities.

Methods for true microbial community benchmark:

Twin-based comparisons were performed on three randomly chosen sets of twins that were sequenced during a previous study (Olm 2019). Reads can be found under Bioproject PRJNA294605

For StrainPhlAn, all reads sequenced from each infant were concatenated and profiled using Metaphlan2, compared using StrainPhlAn, and the ANI of resulting nucleotide alignments was calculated as described for the synthetic benchmark.

For MIDAS, all reads sequenced from each infant were concatenated and profiled with MIDAS, and the ANI of species profiled in multiple infants was calculated as described for the synthetic benchmark.

For dRep, all de-replicated bacterial genomes assembled and binned from each infant (available from (Olm 2019)) were compared in a pairwise manner using dRep under default settings.

For inStain, strain-sharing from these six infants was determined using the methods described below.

ANI values from all compared genomes and the number of genomes shared at a number of ANI thresholds are available for all methods in Supplemental Table S1 of the inStrain publication.

Species-level community profiling¶

This section contains tests evaluating the ability of inStrain and other tools to accurately profile microbial communities. Here inStrain is benchmarked against two other tools:

MIDAS - an integrated pipeline to estimate bacterial species abundance and strain-level genomic variation.

MetaPhlAn 2 - a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn 2 uses unique clade-specific marker genes.

Benchmark with defined microbial communities¶

This test evaluated the ability of programs to identify the microbial species present in metagenomes of defined bacterial communities. For this test we purchased, extracted DNA from, and sequenced a ZymoBIOMICS Microbial Community Standard. The reads used for this test are available here. This community contains 8 defined bacterial species, and we simply evaluated the ability of each program to identify those and only those 8 bacterial species. Results in the table below.

Program	True species detected	False species detected	Accuracy
MIDAS	8	15	35%
MetaPhlAn 2	8	11	42%
InStrain	8	0	100%

All programs successfully identified the 8 bacteria present in the metagenome, but MIDAS and StrainPhlAn detected an additional 15 and 11 bacterial species as well. The raw tables produced by each tool are available at the bottom of this section. Looking at these tables, you’ll notice that many of these False positive species detected are related to species that are actually present in the community. For example, MetaPhlAn2 reported the detection of Bacillus cereus thuringiensis (False Positive) as well as the detection of Bacillus subtilis (True Positive). Similarly, MIDAS reported the detection of Escherichia fergusonii (related to True Positive Escherichia coli) and Bacillus anthracis (related to True Positive Bacillus subtilis).

Importantly inStrain detected many of these same False Positives as well. However inStrain also provides a set of other metrics that properly filter out erroneous detections. Taking a look at the information reported by inStrain (at the very bottom of this page) shows that many genomes besides the 8 True Positives were detected. When using the recommended genome breadth cutoff of 50%, only the 8 True Positive genomes remain (see section “Detecting organisms in metagenomic data” in Important concepts for more info). You’ll notice that no such info is reported with MIDAS or MetaPhlAn 2. While relative abundance could conceivably be used to filter out erroneous taxa with these tools, doing so would majorly limit their ability to detect genuine low-abundance taxa.

It’s also worth noting that if one is just interested in measuring community presence / absence, as in this test, any program that accurately reports breadth should give similar results to inStrain when mapped against the UHGG genome set. One such program is coverM, a fast program for calculating genome coverage and breadth that can be run on its own or through inStrain using the command inStrain quick_profile.

Methods for defined microbial community profiling experiment:

For inStrain, all reference genomes from the UHGG collection were downloaded and concatenated into a single .fasta file, reads from the Zymo sample were mapped against this database, and inStrain profile was run on default settings.

Note

The UHGG genome database used for this section is available for download in the Tutorial section.

For MIDAS, the command run_midas.py species was used along with the default database. In cases where the same species was detected multiple times as part of multiple genomes, the species was only counted once.

For MetaPhlAn 2, the command metaphlan2.py was used along with the MetaPhlAn2 db_v20 database.

Eukaryotic genomes were excluded from this analysis.

Raw data for defined microbial community profiling experiment:

MetaPhlAn 2:

metaphlan2/S3_CON_017Z2_profile.txt¶
species	abundance	Metaphlan2_species
Lactobacillus fermentum	23.1133	k__Bacteria\|p__Firmicutes\|c__Bacilli\|o__Lactobacillales\|f__Lactobacillaceae\|g__Lactobacillus\|s__Lactobacillus_fermentum
Escherichia coli	20.0587	k__Bacteria\|p__Proteobacteria\|c__Gammaproteobacteria\|o__Enterobacteriales\|f__Enterobacteriaceae\|g__Escherichia\|s__Escherichia_coli
Salmonella enterica	18.44954	k__Bacteria\|p__Proteobacteria\|c__Gammaproteobacteria\|o__Enterobacteriales\|f__Enterobacteriaceae\|g__Salmonella\|s__Salmonella_enterica
Pseudomonas aeruginosa	14.42109	k__Bacteria\|p__Proteobacteria\|c__Gammaproteobacteria\|o__Pseudomonadales\|f__Pseudomonadaceae\|g__Pseudomonas\|s__Pseudomonas_aeruginosa
Enterococcus faecalis	12.21137	k__Bacteria\|p__Firmicutes\|c__Bacilli\|o__Lactobacillales\|f__Enterococcaceae\|g__Enterococcus\|s__Enterococcus_faecalis
Staphylococcus aureus	6.36267	k__Bacteria\|p__Firmicutes\|c__Bacilli\|o__Bacillales\|f__Staphylococcaceae\|g__Staphylococcus\|s__Staphylococcus_aureus
Bacillus subtilis	2.44228	k__Bacteria\|p__Firmicutes\|c__Bacilli\|o__Bacillales\|f__Bacillaceae\|g__Bacillus\|s__Bacillus_subtilis
Listeria monocytogenes	1.8644	k__Bacteria\|p__Firmicutes\|c__Bacilli\|o__Bacillales\|f__Listeriaceae\|g__Listeria\|s__Listeria_monocytogenes
Salmonella unclassified	0.67363	k__Bacteria\|p__Proteobacteria\|c__Gammaproteobacteria\|o__Enterobacteriales\|f__Enterobacteriaceae\|g__Salmonella\|s__Salmonella_unclassified
Saccharomyces cerevisiae	0.20426	k__Eukaryota\|p__Ascomycota\|c__Saccharomycetes\|o__Saccharomycetales\|f__Saccharomycetaceae\|g__Saccharomyces\|s__Saccharomyces_cerevisiae
Cryptococcus neoformans	0.05417	k__Eukaryota\|p__Basidiomycota\|c__Tremellomycetes\|o__Tremellales\|f__Tremellaceae\|g__Filobasidiella\|s__Cryptococcus_neoformans
Listeria unclassified	0.02341	k__Bacteria\|p__Firmicutes\|c__Bacilli\|o__Bacillales\|f__Listeriaceae\|g__Listeria\|s__Listeria_unclassified
Klebsiella oxytoca	0.0165	k__Bacteria\|p__Proteobacteria\|c__Gammaproteobacteria\|o__Enterobacteriales\|f__Enterobacteriaceae\|g__Klebsiella\|s__Klebsiella_oxytoca
Naumovozyma unclassified	0.01337	k__Eukaryota\|p__Ascomycota\|c__Saccharomycetes\|o__Saccharomycetales\|f__Saccharomycetaceae\|g__Naumovozyma\|s__Naumovozyma_unclassified
Klebsiella unclassified	0.01307	k__Bacteria\|p__Proteobacteria\|c__Gammaproteobacteria\|o__Enterobacteriales\|f__Enterobacteriaceae\|g__Klebsiella\|s__Klebsiella_unclassified
Bacillus cereus thuringiensis	0.00809	k__Bacteria\|p__Firmicutes\|c__Bacilli\|o__Bacillales\|f__Bacillaceae\|g__Bacillus\|s__Bacillus_cereus_thuringiensis
Clostridium perfringens	0.00554	k__Bacteria\|p__Firmicutes\|c__Clostridia\|o__Clostridiales\|f__Clostridiaceae\|g__Clostridium\|s__Clostridium_perfringens
Eremothecium unclassified	0.00319	k__Eukaryota\|p__Ascomycota\|c__Saccharomycetes\|o__Saccharomycetales\|f__Saccharomycetaceae\|g__Eremothecium\|s__Eremothecium_unclassified
Veillonella parvula	0.0015	k__Bacteria\|p__Firmicutes\|c__Negativicutes\|o__Selenomonadales\|f__Veillonellaceae\|g__Veillonella\|s__Veillonella_parvula
Clostridium butyricum	0.00054	k__Bacteria\|p__Firmicutes\|c__Clostridia\|o__Clostridiales\|f__Clostridiaceae\|g__Clostridium\|s__Clostridium_butyricum
Enterobacter cloacae	0.00051	k__Bacteria\|p__Proteobacteria\|c__Gammaproteobacteria\|o__Enterobacteriales\|f__Enterobacteriaceae\|g__Enterobacter\|s__Enterobacter_cloacae

MIDAS:

S3_CON_017Z2_MIDAS/species/species_profile.txt¶
species_id	count_reads	coverage	relative_abundance	species
Lactobacillus_fermentum_54035	22305	322.661072	0.202032	Lactobacillus fermentum
Salmonella_enterica_58156	18045	296.117276	0.185412	Salmonella enterica
Escherichia_coli_58110	19262	286.702733	0.179517	Escherichia coli
Pseudomonas_aeruginosa_57148	14214	214.266462	0.134162	Pseudomonas aeruginosa
Enterococcus_faecalis_56297	12382	183.37939	0.114822	Enterococcus faecalis
Staphylococcus_aureus_56630	6146	89.116402	0.0558	Staphylococcus aureus
Bacillus_subtilis_57806	3029	44.275375	0.027723	Bacillus subtilis
Salmonella_enterica_58266	3027	41.774295	0.026157	Salmonella enterica
Listeria_monocytogenes_53478	2250	33.367947	0.020893	Listeria monocytogenes
Escherichia_fergusonii_56914	2361	33.034998	0.020685	Escherichia fergusonii
Pseudomonas_aeruginosa_55861	927	12.402473	0.007766	Pseudomonas aeruginosa
Salmonella_enterica_53987	791	10.982231	0.006876	Salmonella enterica
Escherichia_coli_57907	713	9.860496	0.006174	Escherichia coli
Escherichia_albertii_56276	457	6.543769	0.004097	Escherichia albertii
Citrobacter_youngae_61659	455	6.248948	0.003913	Citrobacter youngae
Salmonella_bongori_55351	314	4.187424	0.002622	Salmonella bongori
Staphylococcus_aureus_37016	62	0.907418	0.000568	Staphylococcus aureus
Klebsiella_oxytoca_54123	29	0.418764	0.000262	Klebsiella oxytoca
Bacillus_sp_58480	17	0.233451	0.000146	Bacillus sp
Clostridium_perfringens_56840	12	0.182686	0.000114	Clostridium perfringens
Listeria_monocytogenes_56337	11	0.162597	0.000102	Listeria monocytogenes
Bacillus_subtilis_55718	9	0.127828	0.00008	Bacillus subtilis
Bacillus_anthracis_57688	2	0.031576	0.00002	Bacillus anthracis
Bacillus_cereus_58113	1	0.014684	0.000009	Bacillus cereus
Enterococcus_faecium_56947	1	0.014791	0.000009	Enterococcus faecium
Klebsiella_pneumoniae_54788	1	0.014852	0.000009	Klebsiella pneumoniae
Veillonella_parvula_57794	1	0.014925	0.000009	Veillonella parvula
Haemophilus_haemolyticus_58350	1	0.01351	0.000008	Haemophilus haemolyticus
Veillonella_parvula_58184	1	0.012646	0.000008	Veillonella parvula
Enterobacter_sp_59441	1	0.003478	0.000002	Enterobacter sp
Pseudomonas_sp_59807	1	0.003203	0.000002	Pseudomonas sp

InStrain:

S3_CON_017Z2.genomeInfo.csv¶
genome	species	breadth	relative_abundance	coverage	nucl_diversity	length	true_scaffolds	detected_scaffolds	coverage_median	coverage_std	coverage_SEM	breadth_minCov	breadth_expected	nucl_diversity_rarefied	conANI_reference	popANI_reference	iRep	iRep_GC_corrected	linked_SNV_count	SNV_distance_mean	r2_mean	d_prime_mean	consensus_divergent_sites	population_divergent_sites	SNS_count	SNV_count	filtered_read_pair_count	reads_unfiltered_pairs	reads_mean_PID	reads_unfiltered_reads	divergent_site_count	Genome	lineage	genus
GUT_GENOME142031.fna.gz	Salmonella enterica	0.890900711	0.192920699	418.6273152	0.001575425	4955431	2	1	470	167.6327653	0.075307073	0.889570251	1	0.0012784	0.988936764	0.989338515		FALSE	29019	66.11144423	0.579492141	0.962855588	48769	46998	46575	5776	7448284	7508196	0.988009952	15332041	52351	GUT_GENOME142031	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella enterica	Salmonella
GUT_GENOME143383.fna.gz	Pseudomonas aeruginosa	0.894646033	0.176078828	280.4839772	0.001768407	6750396	80	74	312	110.1119447	0.042431183	0.892788808	1	0.001509706	0.99020306	0.990485637		FALSE	38085	95.96318761	0.622403987	0.959864075	59043	57340	56922	6330	6780948	6815597	0.985681579	13897895	63252	GUT_GENOME143383	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas;s__Pseudomonas aeruginosa	Pseudomonas
GUT_GENOME144544.fna.gz	Escherichia coli	0.777878058	0.138653747	279.2311026	0.001677588	5339468	2	2	358	191.117164	0.082711711	0.772478831	1	0.001249049	0.976197599	0.976755468		FALSE	38443	87.25424655	0.594190893	0.979230955	98176	95875	95304	7777	5341780	5396883	0.949021155	11455710	103081	GUT_GENOME144544	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli_D	Escherichia
GUT_GENOME000862.fna.gz	Lactobacillus fermentum	0.862275194	0.096747331	528.5687445	0.002324426	1968193	80	71	512	531.1341113	0.380139439	0.861009566	1	0.001853701	0.992704615	0.99333307		FALSE	24523	43.65012437	0.472475022	0.941868743	12363	11298	10943	4409	3897780	3915245	0.987724733	8141064	15352	GUT_GENOME000862	d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus_H;s__Lactobacillus_H fermentum	Lactobacillus_H
GUT_GENOME103721.fna.gz	Enterococcus faecalis	0.890365837	0.064796999	247.8988054	0.001359966	2810675	1	1	263	106.7589636	0.063681687	0.889176088	1	0.001009117	0.992572379	0.992928895		FALSE	16206	70.9866099	0.521939243	0.942993799	18563	17672	17443	3088	2521061	2542269	0.991283432	5173192	20531	GUT_GENOME103721	d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus;s__Enterococcus faecalis	Enterococcus
GUT_GENOME141183.fna.gz	Staphylococcus aureus	0.941567947	0.03353288	131.1378903	0.001375635	2749621	2	2	142	41.45182417	0.024999936	0.93883157	1	0.000914462	0.99279082	0.993222751		FALSE	22344	72.09188149	0.540395915	0.915284188	18610	17495	17253	4097	1305000	1316045	0.959624337	2691675	21350	GUT_GENOME141183	d__Bacteria;p__Firmicutes;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus;s__Staphylococcus aureus	Staphylococcus
GUT_GENOME145983.fna.gz	Escherichia fergusonii	0.336139906	0.013061031	30.24324694	0.001418115	4643861	2	1	0	73.62873041	0.034168543	0.286148746	1	0.00110964	0.96878243	0.969196326		FALSE	5682	86.23970433	0.70271658	0.99040633	41483	40933	40787	1849	507148	517498	0.971752174	1312986	42636	GUT_GENOME145983	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia fergusonii	Escherichia
GUT_GENOME141005.fna.gz	Listeria monocytogenes	0.924613578	0.012249816	43.6464116	0.000913049	3017944	1	1	47	15.93616519	0.009173661	0.920719868	1	0.000838075	0.995876101	0.995969311		FALSE	4390	69.92277904	0.693771377	0.988098539	11459	11200	11171	1341	475888	477867	0.994961092	980123	12512	GUT_GENOME141005	d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Listeriaceae;g__Listeria;s__Listeria monocytogenes_B	Listeria
GUT_GENOME000031.fna.gz	Bacillus subtilis	0.796819131	0.011738601	31.12210212	0.00098138	4055810	14	13	28	32.20366662	0.015996189	0.729697397	1	0.000859228	0.939115003	0.93935457		FALSE	1163	21.13241617	0.702766252	0.973300401	180190	179481	179281	2155	454156	593524	0.940236231	1292915	181436	GUT_GENOME000031	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus;s__Bacillus subtilis	Bacillus
GUT_GENOME140826.fna.gz	Escherichia sp000208585	0.278930014	0.007440286	17.72017163	0.001540747	4514939	29	17	0	56.92691685	0.0268084	0.206155831	0.99999984	0.001388545	0.967896852	0.968409325		FALSE	5947	105.5668404	0.766140483	0.983050761	29881	29404	29302	1691	290320	305163	0.960887088	820252	30993	GUT_GENOME140826	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia sp000208585	Escherichia
GUT_GENOME145378.fna.gz	Escherichia albertii	0.122179339	0.003167813	6.860465646	0.00235464	4965193	164	83	0	39.5967056	0.017829135	0.081533789	0.997660437	0.00218204	0.961660545	0.962826463		FALSE	12166	157.8004274	0.794119571	0.973439113	15521	15049	14939	1495	123735	140151	0.953424167	407112	16434	GUT_GENOME145378	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia albertii	Escherichia
GUT_GENOME143726.fna.gz	Salmonella bongori	0.070875018	0.003160648	7.43337583	0.001600002	4572147	84	48	0	96.31442795	0.045126398	0.038681827	0.998589302	0.001128128	0.972543099	0.973068942		FALSE	552	38.54710145	0.364521691	0.826322255	4856	4763	4731	305	122896	130428	0.968771686	341983	5036	GUT_GENOME143726	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella bongori	Salmonella
GUT_GENOME140808.fna.gz	Escherichia marmotae	0.120972135	0.002155978	5.167057447	0.001726413	4486744	47	38	0	33.4691403	0.015817374	0.074067966	0.989564186	0.001588357	0.965091296	0.965708164		FALSE	2054	74.47565725	0.741056343	0.989146608	11601	11396	11330	792	84663	96297	0.955863071	277452	12122	GUT_GENOME140808	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia marmotae	Escherichia
GUT_GENOME142492.fna.gz	Listeria monocytogenes	0.126213955	0.001176886	4.302069537	0.001059694	2941624	14	11	0	34.20030999	0.01995002	0.059155079	0.977600741	0.000372324	0.982937958	0.983231042		FALSE	206	9.058252427	0.844509747	1	2969	2918	2910	151	45956	47236	0.983439086	109110	3061	GUT_GENOME142492	d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Listeriaceae;g__Listeria;s__Listeria monocytogenes	Listeria
GUT_GENOME146010.fna.gz	Metakosakonia intermedia	0.011269041	0.0007257	1.265470484	0.001803324	6166452	5	4	0	20.17351043	0.008124545	0.0086103	0.672874189	0.001151362	0.987399943	0.988096808		FALSE	68	6.632352941	0.519849079	0.971330957	669	632	623	100	28027	28249	0.990376613	65337	723	GUT_GENOME146010	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Metakosakonia;s__Metakosakonia intermedia	Metakosakonia
GUT_GENOME143527.fna.gz	Cronobacter malonaticus	0.0056892	0.000591325	1.422193205	0.001895836	4470927	309	24	0	24.52026825	0.011677475	0.004481621	0.715151153	0.000863784	0.991316065	0.99186505		FALSE	43	3.348837209	0.323536849	0.895980567	174	163	157	52	22952	23584	0.95668831	51969	209	GUT_GENOME143527	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Cronobacter;s__Cronobacter malonaticus	Cronobacter
GUT_GENOME147796.fna.gz	Staphylococcus argenteus	0.072414548	0.000549275	2.122002981	0.002476746	2783391	89	66	0	13.35139068	0.008028467	0.046540353	0.846449939	0.001206762	0.96824147	0.969777675		FALSE	1626	68.76383764	0.880724477	0.994276953	4114	3915	3871	560	21703	24408	0.955482052	67154	4431	GUT_GENOME147796	d__Bacteria;p__Firmicutes;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus;s__Staphylococcus argenteus	Staphylococcus
GUT_GENOME095995.fna.gz	Citrobacter portucalensis_A	0.009471768	0.000474738	0.990436461	0.001709209	5154159	10	8	0	19.06540862	0.008399463	0.005825781	0.5829526	0.001430662	0.953575116	0.954008059		FALSE	36	67.25	0.565195465	1	1394	1381	1381	49	18414	20414	0.952624755	48144	1430	GUT_GENOME095995	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Citrobacter;s__Citrobacter portucalensis_A	Citrobacter
GUT_GENOME000024.fna.gz	Lactobacillus_B murinus	0.001870138	0.000441384	2.136753757	0.014148006	2221226	144	7	0	107.2864275	0.072457344	0.001074182	0.84843695	0.009892552	0.959346186	0.964375524		FALSE	542	135.7435424	0.169030486	0.84185153	97	85	84	72	17418	17595	0.941170306	40278	156	GUT_GENOME000024	d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus_B;s__Lactobacillus_B murinus	Lactobacillus_B
GUT_GENOME078306.fna.gz	Lactobacillus_H oris	0.001220271	0.000399444	2.141072453	0.012295828	2006111	94	4	0	77.23755615	0.054789295	0.000866353	0.849013821	0.010541041	0.995972382	0.998849252		FALSE	398	16.92713568	0.302731013	0.826865848	7	2	2	40	16099	16158	0.936115413	38348	42	GUT_GENOME078306	d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus_H;s__Lactobacillus_H oris	Lactobacillus_H
GUT_GENOME225144.fna.gz		0.009928145	0.000347535	1.549658971	0.002163833	2411528	1340	22	0	23.45474186	0.016020135	0.009063133	0.745473131	0.000727385	0.979730966	0.980646047		FALSE	66	3.333333333	0.700603722	0.963204482	443	423	414	70	13562	13757	0.97127631	33858	484	GUT_GENOME225144	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Faecalicatena;s__	Faecalicatena
GUT_GENOME038289.fna.gz		0.00099285	0.000296215	1.742384732	0.010683495	1828071	235	2	0	74.35939765	0.055717981	0.000973157	0.785302607	0.008367741	0.983136594	0.988757729		FALSE	240	68.54166667	0.330322272	0.915762753	30	20	20	36	11886	11920	0.982379242	28904	56	GUT_GENOME038289	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Pasteurellaceae;g__Haemophilus_D;s__	Haemophilus_D
GUT_GENOME143493.fna.gz	Lactobacillus_G kefiri	0.00864699	0.000280612	1.173767593	0.006727598	2570721	10	3	0	19.31409016	0.0120508	0.007831655	0.645283636	0.005352071	0.992251527	0.995380718		FALSE	1002	65.8992016	0.448661201	0.961561942	156	93	82	276	11319	11437	0.990954152	28111	358	GUT_GENOME143493	d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus_G;s__Lactobacillus_G kefiri	Lactobacillus_G
GUT_GENOME212929.fna.gz		0.004865114	0.000265953	1.504297544	0.006577654	1901086	52	12	0	31.30317094	0.022765581	0.004161306	0.735071349	0.003925862	0.982176716	0.986348123		FALSE	730	96.23287671	0.844539753	0.995313875	141	108	99	114	10256	10633	0.958751213	27570	213	GUT_GENOME212929	d__Bacteria;p__Firmicutes_C;c__Negativicutes;o__Veillonellales;f__Veillonellaceae;g__F0422;s__	F0422
GUT_GENOME141398.fna.gz	Lactobacillus crispatus	0.001957446	0.000245098	1.440641732	0.016147943	1829425	63	3	0	38.97971904	0.028918934	0.001950886	0.719753764	0.01075873	0.985149902	0.995236761		FALSE	919	85.0968444	0.532244527	0.953133466	53	17	15	120	9893	10121	0.984396172	25843	135	GUT_GENOME141398	d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus crispatus	Lactobacillus
GUT_GENOME229203.fna.gz		0.001076576	0.000237003	1.216154076	0.009783963	2095533	369	2	0	50.30343773	0.035378211	0.0010656	0.658314327	0.005923767	0.991939095	0.997313032		FALSE	176	7.352272727	0.255200979	0.773989489	18	6	4	42	9197	9229	0.996035052	24260	46	GUT_GENOME229203	d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Prevotella;s__	Prevotella
GUT_GENOME140701.fna.gz	Lactobacillus_H mucosae	0.006795042	0.00022553	1.023402847	0.003734144	2369669	12	9	0	18.31360326	0.011902826	0.005363618	0.594917575	0.001824337	0.959480724	0.960582219		FALSE	154	89.28571429	0.726658579	0.984646084	515	501	493	83	9190	10224	0.968193502	25547	576	GUT_GENOME140701	d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus_H;s__Lactobacillus_H mucosae	Lactobacillus_H
GUT_GENOME001416.fna.gz	Vagococcus teuberi	0.001203634	0.000195751	0.932135043	0.001248896	2258161	50	4	0	37.87233052	0.02525855	0.001052626	0.560920699	0.001109488	0.991165334	0.991586033		FALSE	4	6.25	1	1	21	20	19	5	7731	7736	0.976960693	21662	24	GUT_GENOME001416	d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Vagococcaceae;g__Vagococcus;s__Vagococcus teuberi	Vagococcus