In this section of the workflow we reconstruct metagenome-assembled genomes (MAGs). We first use CONCOCT to automatically bin the assembled contigs and then refine the bins manually.
All files generated in this workflow can be downloaded from figshare.
File names and descriptions:
Binning summary data for four collections: 1) CONCOCT_5, the automatic binning; 2) VIRAL_FINAL, the bins deemed viral after manual refinement; 3) MICROBIAL_FINAL, the bins deemed microbial after manual refinement; and 4) MAGS, the 5 MAG bins only. Each directory contains an index.html that can be opened in a browser for closer inspection. Self-contained profile and contigs databases for each MAG are also provided.
This workflow is largely based on the Recovering Microbial Genomes from TARA Oceans Metagenomes binning section provided by Delmont and Eren.
The latest iteration of anvi’o (v6 as of this writing) ports several popular automated binning algorithms into its ecosystem. We will bin the water metagenomic assembly using CONCOCT (Clustering cONtigs with COverage and ComposiTion) (Alneberg et al. 2014). CONCOCT is a program for unsupervised binning of metagenomic contigs that uses nucleotide composition, coverage data across multiple samples, and linkage data from paired-end reads. You can also use MetaBAT2 (Kang et al. 2019), MaxBin2 (Wu, Simmons, and Singer 2016), and/or DAS Tool (Sieber et al. 2018) if you wish.
When we run CONCOCT, we can specify the number of bins the program generates. To my knowledge, CONCOCT is the only binning tool that lets you specify the number of clusters. The command is pretty straightforward. We will set the number of automated clusters to 5. Why 5? Recall from the Assembly results section that anvi’o estimated the data set contains 15 MAGs in total, based on the presence of single-copy core genes. In my experience, choosing a value smaller than this gives you greater control over the manual refinement.
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C CONCOCT_5 --driver concoct --just-do-it --clusters 5
If you are interested in using MetaBAT2, MaxBin2, and/or DAS Tool, we include those commands in this Hydra script. Edit the script to suit your needs.
#!/bin/sh
# ----------------Parameters---------------------- #
#$ -S /bin/sh
#$ -pe mthread 5
#$ -q sThC.q
#$ -l mres=25G,h_data=5G,h_vmem=5G
#$ -cwd
#$ -j y
#$ -N water_job_11_cluster_contigs
#$ -o hydra_logs/job_11_cluster_contigs_water.log
#$ -M scottjj@si.edu
#
# ----------------Modules------------------------- #
module load gcc/4.9.2
#
# ----------------Your Commands------------------- #
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# ----------------CALLING ANVIO------------------- #
#
export PATH=/home/scottjj/miniconda3:$PATH
export PATH=/home/scottjj/miniconda3/bin:$PATH
export PATH=/home/scottjj/miniconda3/envs:$PATH
source activate anvio-master
#
# ----------------CHECKING EVERYTHING------------------- #
#
which python
python --version
source /home/scottjj/virtual-envs/anvio-master/bin/activate
which python
python --version
which anvi-interactive
diamond --version
anvi-self-test -v
#
# ----------------BINNING CONDA------------------- #
#
source activate binning
conda activate binning
which run_MaxBin.pl
which concoct
#
# ----------------SETUP TEMP DIRECTORIES------------------- #
#
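# remove any old temp dir (-f: no error if it does not exist), recreate it,
# and export TMPDIR so that child processes actually use it
rm -rf /pool/genomics/stri_istmobiome/dbs/tmp_data_WATER/
mkdir -p /pool/genomics/stri_istmobiome/dbs/tmp_data_WATER/
export TMPDIR="/pool/genomics/stri_istmobiome/dbs/tmp_data_WATER/"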
#
# ----------------CONCOCT------------------- #
#
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C CONCOCT --driver concoct --just-do-it --debug
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C CONCOCT_5 --driver concoct --just-do-it --clusters 5 --debug
#
# ----------------MetaBAT2------------------- #
#
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C MetaBAT2 --driver metabat2 --just-do-it --debug --minContig 1500
#
# ----------------MaxBIN2------------------- #
#
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C MaxBIN2 --driver maxbin2 --just-do-it --min-contig-length 1000 --debug
#
# ----------------DASTOOL------------------- #
#
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C DASTOOL --driver dastool --just-do-it --search-engine diamond -S CONCOCT,MaxBIN2,MetaBAT2 --debug
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C DASTOOL_5 --driver dastool --just-do-it --search-engine diamond -S CONCOCT_5,MaxBIN2,MetaBAT2 --debug
#
echo = `date` job $JOB_NAME done
Let’s take a look at the results of automated binning using the command anvi-export-collection with the --list-collections flag. With this flag, anvi’o does nothing except list the collections stored in the PROFILE.db.
anvi-export-collection -p 06_MERGED/WATER/PROFILE.db --list-collections
A collection is what anvi’o uses to organize contigs into bins. And here is the output of that command. The VIRSORTER collection was imported during the annotation phase and can be ignored for now.
COLLECTIONS FOUND
===============================================
* VIRSORTER (2840 bins, representing 2857 items).
* CONCOCT_5 (5 bins, representing 23716 items).
Here we can see the number of bins (which we forced to be 5) and the total number of contigs included in the collection. Remember that the original assembly had 23,758 contigs (at a minimum length of 1000 bp). We can use anvi-estimate-genome-completeness to take a quick look at the genome completion and redundancy estimates for each bin. These estimates are based on domain-specific single-copy core genes.
When run without --list-collections, the command anvi-export-collection produces two files. The first is an items file: a two-column, tab-delimited text file that lists each item (contig split) name and the bin it belongs to. The second is a three-column bins info file (bin name, source, and color). These are important files, and you will use them a lot to manipulate collections.
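To get a feel for the items file, here is a minimal sketch that counts how many items each bin holds. It assumes the items file is tab-delimited, with the item name in column 1 and the bin name in column 2. The function name count_items_per_bin and the file name collection-CONCOCT_5.txt in the comment are hypothetical.

```shell
#!/bin/sh
# count_items_per_bin: read a two-column, tab-delimited items file
# (item name <TAB> bin name) on stdin and print "bin<TAB>count",
# sorted by bin name. Hypothetical helper, not part of anvi'o.
count_items_per_bin() {
    awk -F '\t' 'NF >= 2 { n[$2]++ } END { for (b in n) print b "\t" n[b] }' | sort
}

# Demo on made-up items; with real data you might run
#   count_items_per_bin < collection-CONCOCT_5.txt
printf 'c_1_split_00001\tBin_1\nc_2_split_00001\tBin_1\nc_3_split_00001\tBin_2\n' |
    count_items_per_bin
```

The per-bin totals should add up to the item count that --list-collections reports for the collection.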
If you use the other tools, keep in mind how each algorithm names its bins, or things can get confusing. CONCOCT adds the prefix Bin_, MaxBin adds the prefix MAXBIN_, and MetaBAT adds the prefix METABAT__ (with two underscores). DAS Tool then adds its own Bin_ prefix to the parent name. For example, DAS Tool Bin_Bin_11 is CONCOCT bin Bin_11, while DAS Tool Bin_METABAT__2 is MetaBAT2 bin METABAT__2.
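The naming rules above can be decoded mechanically. The sketch below strips the single Bin_ prefix that DAS Tool adds, then matches on the parent prefix. The function name dastool_source is hypothetical, and the prefixes are exactly the ones listed above. Other binner versions may name bins differently.

```shell
# dastool_source NAME: strip the extra "Bin_" that DAS Tool prepends and
# report which binner produced the parent bin, using the prefixes
# CONCOCT "Bin_", MaxBin "MAXBIN_", MetaBAT "METABAT__".
# Hypothetical helper for illustration only.
dastool_source() {
    parent=${1#Bin_}            # remove DAS Tool's own prefix once
    case $parent in
        METABAT__*) echo "MetaBAT2 $parent" ;;
        MAXBIN_*)   echo "MaxBin2 $parent" ;;
        Bin_*)      echo "CONCOCT $parent" ;;
        *)          echo "unknown $parent" ;;
    esac
}

dastool_source Bin_Bin_11        # prints: CONCOCT Bin_11
dastool_source Bin_METABAT__2    # prints: MetaBAT2 METABAT__2
```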
Now we can look at the results of the initial binning with anvi-summarize. This command is very useful, especially during bin refinement. It produces many report files that will help you assess your bins. We need to give the command the contigs and profile databases, a collection name, and an output directory.
anvi-summarize -c 03_CONTIGS/WATER-contigs.db \
-p 06_MERGED/WATER/PROFILE.db \
-C CONCOCT_5 -o CONCOCT_5
The output of this command is a set of summary data tied together in an HTML document (index.html), which makes it very easy to explore. You are encouraged to use this tool. Here, let’s just look at the bins summary file.
Messy.
We can also estimate the genome completeness of each bin. The output of this command is a simple table that gives you quick access to some important details.
anvi-estimate-genome-completeness -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -C CONCOCT_5
Here are the results for the 5 CONCOCT bins with the --clusters flag set to 5.
In truth, none of this is particularly useful right now since we know our bins are messy. These commands tell us just how messy the bins are, but later on, commands like anvi-summarize and anvi-estimate-genome-completeness will be indispensable.
Once I have gone through all the bins and am happy with the manual refinement, I make a new collection to have a clean slate. This involves a little text file manipulation, but it’s pretty easy. First I run…
anvi-export-collection -p 06_MERGED/WATER/PROFILE.db -C CONCOCT_5
And I get the same two output files described above. Since I have a lot of viral bins, I decided to make two new collections, one microbial and the other viral. Because I named all viral bins with the prefix v_Bin, it was easy to parse them out. We will use anvi-import-collection to get the collections into the PROFILE database. Learn to love this command.
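The text manipulation itself can be sketched in a few lines of awk. This sketch makes three assumptions:

- The items file holds the bin name in column 2, and the bins info file holds it in column 1.
- Both files are tab-delimited.
- Every viral bin name starts with v_Bin.

The function name split_by_prefix is hypothetical. The input file names in the usage comment are placeholders for whatever anvi-export-collection wrote for you.

```shell
# split_by_prefix ITEMS INFO OUTPREFIX
# Writes OUTPREFIX-VIRAL.txt / OUTPREFIX-VIRAL-info.txt for bins whose name
# starts with "v_Bin", and OUTPREFIX-MICROBIAL.txt / OUTPREFIX-MICROBIAL-info.txt
# for all other bins. Hypothetical helper, not an anvi'o command.
split_by_prefix() {
    items=$1; info=$2; out=$3
    # items file: item <TAB> bin
    awk -F '\t' -v v="$out-VIRAL.txt" -v m="$out-MICROBIAL.txt" \
        '{ if ($2 ~ /^v_Bin/) print > v; else print > m }' "$items"
    # bins info file: bin <TAB> source <TAB> color
    awk -F '\t' -v v="$out-VIRAL-info.txt" -v m="$out-MICROBIAL-info.txt" \
        '{ if ($1 ~ /^v_Bin/) print > v; else print > m }' "$info"
}

# Usage (input file names assumed):
# split_by_prefix collection-CONCOCT_5.txt collection-CONCOCT_5-info.txt collection
```

With the output prefix collection, this produces collection-MICROBIAL.txt, collection-MICROBIAL-info.txt, collection-VIRAL.txt, and collection-VIRAL-info.txt.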
anvi-import-collection collection-MICROBIAL.txt \
-p 06_MERGED/WATER/PROFILE.db \
-c 03_CONTIGS/WATER-contigs.db \
-C MICROBIAL_REFINED \
--bins-info collection-MICROBIAL-info.txt
anvi-import-collection collection-VIRAL.txt \
-p 06_MERGED/WATER/PROFILE.db \
-c 03_CONTIGS/WATER-contigs.db \
-C VIRAL_REFINED \
--bins-info collection-VIRAL-info.txt
Running anvi-export-collection with --list-collections again
anvi-export-collection -p 06_MERGED/WATER/PROFILE.db --list-collections
shows that the PROFILE.db now contains the new collections.
COLLECTIONS FOUND
===============================================
* VIRSORTER (2840 bins, representing 2857 items).
* CONCOCT_5 (5 bins, representing 23716 items).
* CONCOCT_MANUAL (136 bins, representing 23716 items).
* VIRAL_REFINED (92 bins, representing 12144 items).
* MICROBIAL_REFINED (44 bins, representing 11572 items).
Now that we have refined the bins and modified the collections, it is time to: a) define as metagenome-assembled genomes (MAGs) all metagenomic bins with > 70% completion or > 2 Mbp in length, AND < 10% redundancy¹; and b) rename the MAGs and all the remaining bins.
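In other words, a bin is called a MAG when (completion > 70% OR length > 2 Mbp) AND redundancy < 10%. The sketch below applies this rule with awk to a hypothetical tab-delimited table with no header and four columns: bin, percent completion, percent redundancy, and total length in bp. The column layout and the function name call_mags are assumptions. anvi-rename-bins applies the real thresholds internally, and its exact boundary handling (for example > versus >=) may differ.

```shell
# call_mags: read "bin<TAB>completion<TAB>redundancy<TAB>length_bp" on stdin
# and print the bins that pass the MAG criteria used above:
#   (completion > 70 OR length > 2 Mbp) AND redundancy < 10
call_mags() {
    awk -F '\t' -v min_comp=70 -v min_bp=2000000 -v max_red=10 \
        '($2 > min_comp || $4 > min_bp) && $3 < max_red { print $1 }'
}

# Made-up bins: b1 passes on completion, b2 on size;
# b3 is too redundant and b4 is too incomplete and too small.
printf 'b1\t85.2\t3.1\t1800000\nb2\t45.0\t2.0\t2400000\nb3\t90.0\t15.5\t3100000\nb4\t30.0\t1.0\t900000\n' |
    call_mags
```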
anvi-rename-bins -c 03_CONTIGS/WATER-contigs.db \
-p 06_MERGED/WATER/PROFILE.db \
--collection-to-read MICROBIAL_REFINED \
--collection-to-write MICROBIAL_FINAL \
--call-MAGs --size-for-MAG 2 \
--min-completion-for-MAG 70 \
--max-redundancy-for-MAG 10 \
--prefix WATER \
--report-file MICROBIAL.renaming_bins.txt
anvi-rename-bins -c 03_CONTIGS/WATER-contigs.db \
-p 06_MERGED/WATER/PROFILE.db \
--collection-to-read VIRAL_REFINED \
--collection-to-write VIRAL_FINAL \
--call-MAGs --size-for-MAG 2 \
--min-completion-for-MAG 70 \
--max-redundancy-for-MAG 10 \
--prefix WATER_v \
--report-file VIRAL.renaming_bins.txt
As you can see, we use our new collections plus a few criteria to rename all the bins. Bins that meet those criteria get the prefix WATER_MAG_, and all others get WATER_Bin_. Of course, we do not expect any viral bins to meet these criteria. Calling VAGs (viral assembled genomes) can be done (I think), but it is beyond the scope of this study.
And now, generate summaries of the final microbial and viral collections.
anvi-summarize -c 03_CONTIGS/WATER-contigs.db \
-p 06_MERGED/WATER/PROFILE.db \
-C MICROBIAL_FINAL \
-o MICROBIAL_FINAL
anvi-summarize -c 03_CONTIGS/WATER-contigs.db \
-p 06_MERGED/WATER/PROFILE.db \
-C VIRAL_FINAL \
-o VIRAL_FINAL
Now, as a little sanity check, we can summarize the MAGs. We can use the same trick as above to make a collection of just the MAGs called, well, MAGS.
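A minimal sketch of that trick follows. It makes two assumptions: you have exported MICROBIAL_FINAL with anvi-export-collection, and the MAG bins contain _MAG_ in their names (for example WATER_MAG_00001), as produced by anvi-rename-bins. The function name keep_mag_bins and the file names in the usage comment are hypothetical. The two output files can then be loaded with anvi-import-collection using -C MAGS.

```shell
# keep_mag_bins ITEMS INFO OUTPREFIX
# Copy only the rows that belong to bins whose name contains "_MAG_".
keep_mag_bins() {
    awk -F '\t' '$2 ~ /_MAG_/' "$1" > "$3.txt"        # items: item<TAB>bin
    awk -F '\t' '$1 ~ /_MAG_/' "$2" > "$3-info.txt"   # info: bin<TAB>source<TAB>color
}

# Usage (file names assumed):
# keep_mag_bins collection-MICROBIAL_FINAL.txt collection-MICROBIAL_FINAL-info.txt collection-MAGS
```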
anvi-summarize -c 03_CONTIGS/WATER-contigs.db \
-p 06_MERGED/WATER/PROFILE.db \
-C MAGS -o MAGS-SUMMARY
anvi-estimate-genome-completeness \
-c 03_CONTIGS/WATER-contigs.db \
-p 06_MERGED/WATER/PROFILE.db \
-C MAGS -o MAGS.info
And combine the bins_summary.txt with the MAGS.info table to check out the MAGs.
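One way to combine the two tables is an awk join on the bin name. This sketch makes two assumptions about both files: they are tab-delimited with a one-line header, and the bin name is in the first column. Please check this against your own copies of bins_summary.txt (from anvi-summarize) and MAGS.info. The function name join_on_bin is hypothetical. Only bins present in both files are printed.

```shell
# join_on_bin A B: print the rows of A with B's remaining columns appended,
# matching on column 1. The first line of each file is treated as a header.
join_on_bin() {
    awk -F '\t' -v OFS='\t' '
        NR == FNR {                       # first pass: load file B
            k = $1; $1 = ""; rest = substr($0, 2)
            if (FNR == 1) hdr = rest; else extra[k] = rest
            next
        }
        FNR == 1 { print $0, hdr; next }  # header line of file A
        ($1 in extra) { print $0, extra[$1] }
    ' "$2" "$1"
}

# Usage (paths assumed):
# join_on_bin MAGS-SUMMARY/bins_summary.txt MAGS.info > MAGS-combined.txt
```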
As you can see, the genomes are pretty fragmented, but you get what you get so let’s move on.
Since we only have one metagenomic assembly, we shouldn’t need to worry about having redundant MAGs. But I like to check anyway.
We need fasta files with proper deflines for each MAG.
mkdir -p 11_REDUNDANT-MAGs
# get each MAG name in the set:
MAGs=$(grep MAG 10_SUMMARY_MAGS/MAGS.info | awk '{print $1}')
# go through each MAG, in each SUMMARY directory, and store a
# copy of the FASTA file with proper deflines in the REDUNDANT-MAGs
# directory:
for MAG in $MAGs
do
    anvi-script-reformat-fasta \
        10_SUMMARY_MAGS/MAGS-SUMMARY/bin_by_bin/$MAG/$MAG-contigs.fa \
        --simplify-names --prefix $MAG \
        -o 11_REDUNDANT-MAGs/$MAG.fa
done
This step provides a convenient naming scheme for all contigs. For example, scaffolds in MAG 5 will be named with the prefix WATER_MAG_00005_. This gives contigs from all MAGs a unique name.
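A quick way to confirm that the deflines really are unique across all MAG FASTA files is to look for repeated header lines. The sketch assumes the reformatted files sit in one directory with a .fa extension, as in the loop above. The function name check_unique_deflines is hypothetical.

```shell
# check_unique_deflines DIR: report any FASTA header that occurs more than
# once across all DIR/*.fa files; returns non-zero if duplicates exist.
check_unique_deflines() {
    dups=$(cat "$1"/*.fa | grep '^>' | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate deflines found:"
        echo "$dups"
        return 1
    fi
    echo "all deflines unique"
}

# Usage: check_unique_deflines 11_REDUNDANT-MAGs
```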
Then, we run anvi-compute-genome-similarity to calculate the similarity of the MAGs. If you are working with multiple assemblies, it is important to tweak the parameters. Since we have a single assembly (and likely no redundant MAGs), we just used the default settings. We do need an additional file that tells anvi’o the name of each MAG and the location of its FASTA file. We call it mag_fasta.txt, and you can find the file here. This file has two columns and is tab-delimited.
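If you would rather generate mag_fasta.txt than write it by hand, something like the following should work. It relies on two assumptions:

- anvi’o expects a two-column FASTA list with the header line name and path. Please double-check this against the anvi’o documentation for your version.
- The reformatted files follow the 11_REDUNDANT-MAGs/<MAG>.fa layout from the loop above.

The function name make_fasta_txt is hypothetical.

```shell
# make_fasta_txt DIR: print a tab-delimited "name<TAB>path" table with one
# row per DIR/*.fa file; the genome name is the file name minus ".fa".
make_fasta_txt() {
    printf 'name\tpath\n'
    for fa in "$1"/*.fa; do
        [ -e "$fa" ] || continue          # skip if the glob matched nothing
        name=$(basename "$fa" .fa)
        printf '%s\t%s\n' "$name" "$fa"
    done
}

# Usage: make_fasta_txt 11_REDUNDANT-MAGs > mag_fasta.txt
```

The paths are written relative to the directory you run the command from, so run anvi-compute-genome-similarity from that same directory.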
anvi-compute-genome-similarity -f mag_fasta.txt \
-o 12_GENOME-SIMILARITY \
--program pyANI
And then we run anvi-dereplicate-genomes to see if we have any redundant MAGs.
anvi-dereplicate-genomes --ani-dir 12_GENOME-SIMILARITY \
-o 13_DEREPLICATED-GENOMES \
--similarity-threshold 0.90 \
--program pyANI
Inspecting the CLUSTER_REPORT.txt file, we can see that each MAG is in its own cluster, meaning there is no redundancy.
cluster | size | representative | genomes |
---|---|---|---|
cluster_000001 | 1 | WATER_MAG_00002 | WATER_MAG_00002 |
cluster_000002 | 1 | WATER_MAG_00001 | WATER_MAG_00001 |
cluster_000003 | 1 | WATER_MAG_00004 | WATER_MAG_00004 |
cluster_000004 | 1 | WATER_MAG_00003 | WATER_MAG_00003 |
cluster_000005 | 1 | WATER_MAG_00005 | WATER_MAG_00005 |
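With many MAGs, checking the report by eye gets tedious. The sketch below prints only clusters that hold more than one genome. It assumes CLUSTER_REPORT.txt is tab-delimited with one header line and the columns shown in the table above (cluster, size, representative, genomes). The function name redundant_clusters is hypothetical. For the report above, it would print nothing.

```shell
# redundant_clusters FILE: print "cluster<TAB>genomes" for every cluster
# whose size (column 2) is greater than 1. Skips the header line.
redundant_clusters() {
    awk -F '\t' 'NR > 1 && $2 > 1 { print $1 "\t" $4 }' "$1"
}

# Usage: redundant_clusters path/to/CLUSTER_REPORT.txt
```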
After the binning, curation of the MAGs, and renaming of scaffolds, we can start focusing on the distribution and detection of each MAG across the metagenomes. For this we will need a contigs database of the 5 MAGs. We will add HMM profiles and predict taxonomy. This is very similar to what we did in the initial assembly part of the Snakemake Workflow.
mkdir 14_NR-MAGs-CONTIGS
cat 11_REDUNDANT-MAGs/*.fa > 14_NR-MAGs-CONTIGS/NR-MAGs.fa
anvi-gen-contigs-database -f 14_NR-MAGs-CONTIGS/NR-MAGs.fa \
-o 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db
anvi-run-hmms -c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db \
--num-threads $NSLOTS
anvi-run-scg-taxonomy -c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db
We then map short reads from all 4 metagenomes to the scaffolds with Bowtie2. Next, we profile the resulting BAM files with anvi-profile and generate a merged anvi’o profile database with anvi-merge:
mkdir 15_MAPPING_NR_MAGS
# building the Bowtie2 database
bowtie2-build 14_NR-MAGs-CONTIGS/NR-MAGs.fa 15_MAPPING_NR_MAGS/NR-MAGs
# going through each metagenomic sample, and mapping short reads
# against the 5 nonredundant MAGs
for sample in `cat samples_WATER.txt`
do
    bowtie2 --threads $NSLOTS -x 15_MAPPING_NR_MAGS/NR-MAGs \
        -1 01_QC/$sample-QUALITY_PASSED_R1.fastq.gz \
        -2 01_QC/$sample-QUALITY_PASSED_R2.fastq.gz \
        --no-unal -S 15_MAPPING_NR_MAGS/$sample-in-NRMAGs.sam
    # convert the resulting SAM file to a BAM file,
    # dropping unmapped reads (-F 4):
    samtools view -F 4 \
        -bS 15_MAPPING_NR_MAGS/$sample-in-NRMAGs.sam \
        > 15_MAPPING_NR_MAGS/$sample-in-NRMAGs-RAW.bam
    # sort and index the BAM file:
    samtools sort -o 15_MAPPING_NR_MAGS/$sample-in-NRMAGs.bam \
        15_MAPPING_NR_MAGS/$sample-in-NRMAGs-RAW.bam
    samtools index 15_MAPPING_NR_MAGS/$sample-in-NRMAGs.bam
    # remove temporary files:
    rm 15_MAPPING_NR_MAGS/$sample-in-NRMAGs.sam \
        15_MAPPING_NR_MAGS/$sample-in-NRMAGs-RAW.bam
done
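Before profiling, it can be worth confirming that every sample produced a sorted BAM file and its index. This sketch makes two assumptions: samples_WATER.txt lists one sample name per line, and samtools index wrote the default <file>.bam.bai index next to each BAM. The function name check_bams is hypothetical.

```shell
# check_bams SAMPLES_FILE BAM_DIR: for every sample listed (one per line),
# confirm that <sample>-in-NRMAGs.bam and its .bai index exist and are
# non-empty. Prints each missing file and returns 1 if any are missing.
check_bams() {
    status=0
    while IFS= read -r sample; do
        [ -n "$sample" ] || continue
        for f in "$2/$sample-in-NRMAGs.bam" "$2/$sample-in-NRMAGs.bam.bai"; do
            if [ ! -s "$f" ]; then
                echo "missing or empty: $f"
                status=1
            fi
        done
    done < "$1"
    return $status
}

# Usage: check_bams samples_WATER.txt 15_MAPPING_NR_MAGS
```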
Then we profile each BAM file and merge resulting profiles into a single anvi’o merged profile.
mkdir 16_NR_MAGS_PROFILES
for sample in `cat samples_WATER.txt`
do
anvi-profile -c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db \
-i 15_MAPPING_NR_MAGS/$sample-in-NRMAGs.bam \
--write-buffer-size 2000 --profile-SCVs \
--num-threads $NSLOTS \
-o 16_NR_MAGS_PROFILES/$sample-in-NRMAGs
done
anvi-merge 16_NR_MAGS_PROFILES/*-in-NRMAGs/PROFILE.db \
-c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db \
-o 17_NR-MAGs-MERGED/
The anvi’o profile database in 17_NR-MAGs-MERGED describes the distribution and detection statistics of all scaffolds in all MAGs. However, it does not contain a collection that describes the scaffold-bin affiliations. Thanks to our consistent naming, we can use a simple workaround to generate a text file that describes these connections:
for split_name in `sqlite3 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db 'select split from splits_basic_info'`
do
# in this loop $split_name goes through names like this:
# WATER_MAG_00001_000000000001_split_00001,
# WATER_MAG_00001_000000000001_split_00002,
# WATER_MAG_00001_000000000001_split_00003, ...; so we can extract
# the MAG name it belongs to:
# This command depends on the name of your MAGs. Adjust accordingly
MAG=`echo $split_name | awk 'BEGIN{FS="_"}{print $1"_"$2"_"$3}'`
# print it out with a TAB character
echo -e "$split_name\t$MAG"
done > 17_NR-MAGs-MERGED/NR-MAGs-COLLECTION.txt
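The awk call in the loop keeps the first three underscore-delimited fields of each split name. The same logic can be written with cut, which makes it easy to test on a single name. The function name mag_from_split is hypothetical. As noted in the loop, adjust the field range if your MAG names contain a different number of underscores.

```shell
# mag_from_split SPLIT_NAME: keep the first three underscore-delimited fields,
# e.g. WATER_MAG_00001_000000000001_split_00001 -> WATER_MAG_00001
mag_from_split() {
    echo "$1" | cut -d _ -f 1-3
}

mag_from_split WATER_MAG_00001_000000000001_split_00001
# Sanity check on the real file: count distinct MAGs (expect 5 here)
# cut -f 2 17_NR-MAGs-MERGED/NR-MAGs-COLLECTION.txt | sort -u | wc -l
```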
anvi-import-collection 17_NR-MAGs-MERGED/NR-MAGs-COLLECTION.txt \
-c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db \
-p 17_NR-MAGs-MERGED/PROFILE.db -C NR_MAGs
A quick check.
anvi-export-collection -p 17_NR-MAGs-MERGED/PROFILE.db --list-collections
COLLECTIONS FOUND
===============================================
* NON_REDUNDANT_MAGs (5 bins, representing 3404 items).
anvi-split -c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db \
-p 17_NR-MAGs-MERGED/PROFILE.db \
-C NR_MAGs -o 18_NR-MAGs-SPLIT
This writes a self-contained profile and contigs database for each MAG into 18_NR-MAGs-SPLIT.
One last thing to do is look at the predicted taxonomy for each MAG. To do this, we run the command anvi-estimate-scg-taxonomy, which uses 22 ribosomal genes (Parks et al. 2018) from the Genome Taxonomy Database (GTDB) to estimate taxonomy. We added these classifications earlier when we ran anvi-run-scg-taxonomy. We could run anvi-estimate-scg-taxonomy on the self-contained profile and contigs databases we just created, OR simply use the combined databases. We will do the latter, so we only need to run a single command. You can add the --debug flag to see more verbose output, but we will keep it simple for now.
anvi-estimate-scg-taxonomy -c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db \
-p 17_NR-MAGs-MERGED/PROFILE.db \
-C NR_MAGs
And here is the output.
Contigs DB ...................................: 14_NON-REDUNDANT-MAGs-CONTIGS/NON-REDUNDANT-MAGs-CONTIGS.db
Profile DB ...................................: 17_NON-REDUNDANT-MAGs-MERGED/PROFILE.db
Metagenome mode ..............................: False
* 3,404 split names associated with 5 bins of in collection 'NON_REDUNDANT_MAGs'
have been successfully recovered
Estimated taxonomy for collection "NON_REDUNDANT_MAGs"
===============================================
╒═════════════════╤══════════════╤═══════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│ │ total_scgs │ supporting_scgs │ taxonomy │
╞═════════════════╪══════════════╪═══════════════════╪═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ WATER_MAG_00001 │ 22 │ 22 │ Bacteria / Cyanobacteria / Cyanobacteriia / Synechococcales / Cyanobiaceae / Synechococcus / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00003 │ 22 │ 21 │ Bacteria / Cyanobacteria / Cyanobacteriia / Synechococcales / Cyanobiaceae / RCC307 / RCC307 sp000063525 │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00002 │ 20 │ 9 │ Bacteria / Campylobacterota / Campylobacteria / Campylobacterales / Arcobacteraceae / Poseidonibacter / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00004 │ 18 │ 10 │ Bacteria / Proteobacteria / Alphaproteobacteria / Rhodobacterales / Rhodobacteraceae / Aliiroseovarius / Aliiroseovarius pelagivivens │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00005 │ 16 │ 16 │ Bacteria / Proteobacteria / Alphaproteobacteria / Pelagibacterales / Pelagibacteraceae / Pelagibacter / │
╘═════════════════╧══════════════╧═══════════════════╧═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╛
Heck, since we’re here, we can also estimate the taxonomy of the remaining bins. The MICROBIAL_FINAL collection also contains the MAGs, so they appear again in the output below.
anvi-estimate-scg-taxonomy -p 06_MERGED/WATER/PROFILE.db \
-c 03_CONTIGS/WATER-contigs.db \
-C MICROBIAL_FINAL \
-o microbial-bins-gtdb.txt
Contigs DB ...................................: 03_CONTIGS/WATER-contigs.db
Profile DB ...................................: 06_MERGED/WATER/PROFILE.db
Metagenome mode ..............................: False
* 11,572 split names associated with 44 bins of in collection 'MICROBIAL_FINAL'
have been successfully recovered
Estimated taxonomy for collection "MICROBIAL_FINAL"
===============================================
╒═════════════════╤══════════════╤═══════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│ │ total_scgs │ supporting_scgs │ taxonomy │
╞═════════════════╪══════════════╪═══════════════════╪═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ WATER_MAG_00001 │ 22 │ 22 │ Bacteria / Cyanobacteria / Cyanobacteriia / Synechococcales / Cyanobiaceae / Synechococcus / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00003 │ 22 │ 21 │ Bacteria / Cyanobacteria / Cyanobacteriia / Synechococcales / Cyanobiaceae / RCC307 / RCC307 sp000063525 │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00002 │ 20 │ 9 │ Bacteria / Campylobacterota / Campylobacteria / Campylobacterales / Arcobacteraceae / Poseidonibacter / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00013 │ 20 │ 6 │ Bacteria / Proteobacteria / Gammaproteobacteria / Pseudomonadales / Nitrincolaceae / ASP10-02a / ASP10-02a sp002686055 │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00004 │ 18 │ 10 │ Bacteria / Proteobacteria / Alphaproteobacteria / Rhodobacterales / Rhodobacteraceae / Aliiroseovarius / Aliiroseovarius pelagivivens │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00005 │ 16 │ 16 │ Bacteria / Proteobacteria / Alphaproteobacteria / Pelagibacterales / Pelagibacteraceae / Pelagibacter / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00007 │ 12 │ 12 │ Bacteria / Proteobacteria / Alphaproteobacteria / Puniceispirillales / Puniceispirillaceae / UBA8309 / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00012 │ 10 │ 3 │ Bacteria / Proteobacteria / Gammaproteobacteria / Pseudomonadales / Nitrincolaceae / ASP10-02a / ASP10-02a sp002312935 │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00006 │ 9 │ 8 │ Bacteria / Actinobacteriota / Acidimicrobiia / TMED189 / TMED189 / TMED189 / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00015 │ 9 │ 7 │ Bacteria / SAR324 / SAR324 / SAR324 / NAC60-12 / UBA1014 / UBA1014 sp001469005 │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00010 │ 8 │ 8 │ Bacteria / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00023 │ 7 │ 4 │ Bacteria / Proteobacteria / Alphaproteobacteria / Rhodobacterales / Rhodobacteraceae / HIMB11 / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00008 │ 6 │ 6 │ Bacteria / Actinobacteriota / Acidimicrobiia / TMED189 / TMED189 / TMED189 / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00016 │ 6 │ 6 │ Bacteria / Bacteroidota / Bacteroidia / Flavobacteriales / Flavobacteriaceae / MED-G13 / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00022 │ 6 │ 6 │ Bacteria / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00021 │ 5 │ 5 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00029 │ 5 │ 3 │ Bacteria / Proteobacteria / Alphaproteobacteria / Rhodobacterales / Rhodobacteraceae / HIMB11 / HIMB11 sp000472185 │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00011 │ 4 │ 4 │ Bacteria / Proteobacteria / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00019 │ 4 │ 4 │ Bacteria / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00017 │ 4 │ 3 │ Bacteria / Bacteroidota / Bacteroidia / Flavobacteriales / Flavobacteriaceae / UBA724 / UBA724 sp002723075 │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00009 │ 2 │ 2 │ Bacteria / Proteobacteria / Gammaproteobacteria / Chromatiales / Sedimenticolaceae / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00027 │ 2 │ 2 │ Archaea / Thermoplasmatota / Poseidoniia / Poseidoniales / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00031 │ 2 │ 2 │ Bacteria / Bacteroidota / Bacteroidia / Flavobacteriales / Flavobacteriaceae / UBA3478 / UBA3478 sp003045935 │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00014 │ 1 │ 1 │ Archaea / Thermoplasmatota / Poseidoniia / Poseidoniales / Thalassoarchaeaceae / MGIIb-N1 / MGIIb-N1 sp002505695 │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00018 │ 1 │ 1 │ Archaea / Thermoplasmatota / Poseidoniia / Poseidoniales / Thalassoarchaeaceae / MGIIb-N1 / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00020 │ 1 │ 1 │ Bacteria / Proteobacteria / Alphaproteobacteria / Rhodobacterales / Rhodobacteraceae / GCA-002705045 / GCA-002705045 sp002703515 │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00025 │ 1 │ 1 │ Archaea / Thermoplasmatota / Poseidoniia / Poseidoniales / Poseidoniaceae / MGIIa-L2 / MGIIa-L2 sp002719815 │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00026 │ 1 │ 1 │ Bacteria / SAR324 / SAR324 / SAR324 / NAC60-12 / UBA1014 / UBA1014 sp001469005 │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00028 │ 1 │ 1 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00032 │ 1 │ 1 │ Bacteria / Cyanobacteria / Cyanobacteriia / Synechococcales / Cyanobiaceae / Synechococcus / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00033 │ 1 │ 1 │ Bacteria / Proteobacteria / Alphaproteobacteria / Rhodobacterales / Rhodobacteraceae / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00024 │ 0 │ 0 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00030 │ 0 │ 0 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00034 │ 0 │ 0 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00035 │ 0 │ 0 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00036 │ 0 │ 0 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00037 │ 0 │ 0 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00038 │ 0 │ 0 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00039 │ 0 │ 0 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00040 │ 0 │ 0 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00041 │ 0 │ 0 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00042 │ 0 │ 0 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00043 │ 0 │ 0 │ / / / / / / │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00044 │ 0 │ 0 │ / / / / / / │
╘═════════════════╧══════════════╧═══════════════════╧═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╛
Output file ..................................: microbial-bins-gtdb.txt
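Many of the bins above have no single-copy core genes (SCGs) hits at all. To list those bins from the output file rather than the console table, a header-driven awk sketch like the one below could be used. It assumes microbial-bins-gtdb.txt is tab-delimited, with the bin name in the first column and a header containing a total_scgs column (the name shown in the console table). Both assumptions should be checked against your own file. The function name bins_without_scgs is hypothetical.

```shell
# bins_without_scgs FILE: print the column-1 name of every row whose
# total_scgs value is 0. The total_scgs column is located by name in the
# header line; if it is absent, nothing is printed.
bins_without_scgs() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "total_scgs") c = i; next }
        c && $c == 0 { print $1 }
    ' "$1"
}

# Usage: bins_without_scgs microbial-bins-gtdb.txt
```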
In the next section we explore WATER_MAG_00002 in a phylogenomic context. That’s all for this page.
The source code for this page can be accessed on GitHub by clicking this link.
Binning summary data can be downloaded directly from figshare at doi:10.25573/data.12809069. Summary data is included for four collections: 1) CONCOCT_5, the automatic binning; 2) VIRAL_FINAL, the bins deemed viral after manual refinement; 3) MICROBIAL_FINAL, the bins deemed microbial after manual refinement; and 4) MAGS, the 5 MAG bins only. Each directory contains an index.html that can be opened in a browser for closer inspection. Individual summary files are also provided. Self-contained profile and contigs databases for each MAG are also available for download.
This redundancy cutoff only works for bacteria or archaea. If you have eukaryotic bins, the maximum redundancy for calling MAGs should be set to 100.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hypocolypse/web/, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".