No. 5. Binning MAGs

In this section of the workflow, we reconstruct metagenome-assembled genomes (MAGs), first using CONCOCT to automatically bin the assembled contigs and then refining the bins manually.


Data Availability

All files generated in this workflow can be downloaded from figshare.

File names and descriptions:

Binning summary data for four collections: 1) CONCOCT_5, the automated binning; 2) VIRAL_FINAL, the bins deemed viral after manual refinement; 3) MICROBIAL_FINAL, the bins deemed microbial after manual refinement; and 4) MAGS, the five MAG bins only. Each directory contains an index.html file that can be opened in a browser for closer inspection. Self-contained profile and contigs databases for each MAG are also provided.

This workflow is largely based on the Recovering Microbial Genomes from TARA Oceans Metagenomes binning section provided by Delmont and Eren.

Automated Binning

The latest iteration of anvi’o (v6 as of this writing) ports several popular automated binning algorithms into its ecosystem. We will bin the water metagenomic assembly using CONCOCT (Clustering cONtigs with COverage and ComposiTion) (Alneberg et al. 2014), a program for unsupervised binning of metagenomic contigs that uses nucleotide composition, coverage data across multiple samples, and linkage data from paired-end reads. You can also use MetaBAT2 (Kang et al. 2019), MaxBIN2 (Wu, Simmons, and Singer 2016), and/or DASTOOL (Sieber et al. 2018) if you wish.

When we run CONCOCT, we can specify the number of bins the program generates. To my knowledge, CONCOCT is the only binning tool that lets you set the number of clusters. The command is pretty straightforward. We will set the number of automated clusters to 5. Why 5? If you remember from the Assembly results section, anvi’o estimated that the data set contains about 15 MAGs (based on the presence of single-copy core genes). In my experience, choosing a value smaller than this gives you greater control during manual refinement.

anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C  CONCOCT_5 --driver concoct --just-do-it --clusters 5

If you are interested in using MetaBAT2, MaxBIN2, and/or DASTOOL, we include those commands in the Hydra job script below. Edit the script to suit your needs.

HYDRA CONTIG AUTO BINNING job script

#!/bin/sh
# ----------------Parameters---------------------- #
#$ -S /bin/sh
#$ -pe mthread 5
#$ -q sThC.q
#$ -l mres=25G,h_data=5G,h_vmem=5G
#$ -cwd
#$ -j y
#$ -N water_job_11_cluster_contigs
#$ -o hydra_logs/job_11_cluster_contigs_water.log
#$ -M scottjj@si.edu
#
# ----------------Modules------------------------- #
module load gcc/4.9.2
#
# ----------------Your Commands------------------- #
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# ----------------CALLING ANVIO------------------- #
#
export PATH=/home/scottjj/miniconda3:$PATH
export PATH=/home/scottjj/miniconda3/bin:$PATH
export PATH=/home/scottjj/miniconda3/envs:$PATH
source activate anvio-master
#
# ----------------CHECKING EVERYTHING------------------- #
#
which python
python --version
source /home/scottjj/virtual-envs/anvio-master/bin/activate
which python
python --version
which anvi-interactive
diamond --version
anvi-self-test -v
#
# ----------------BINNING CONDA------------------- #
#
source activate binning
conda activate binning
which run_MaxBin.pl
which concoct
#
# ----------------SETUP TEMP DIRECTORIES------------------- #
#
rm -rf /pool/genomics/stri_istmobiome/dbs/tmp_data_WATER/
mkdir -p /pool/genomics/stri_istmobiome/dbs/tmp_data_WATER/
export TMPDIR="/pool/genomics/stri_istmobiome/dbs/tmp_data_WATER/"
#
# ----------------CONCOCT------------------- #
#
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C  CONCOCT --driver concoct --just-do-it --debug
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C  CONCOCT_5 --driver concoct --just-do-it --clusters 5 --debug
#
# ----------------MetaBAT2------------------- #
#
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C  MetaBAT2 --driver metabat2 --just-do-it --debug --minContig 1500
#
# ----------------MaxBIN2------------------- #
#
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C  MaxBIN2 --driver maxbin2 --just-do-it --min-contig-length 1000 --debug
#
# ----------------DASTOOL------------------- #
#
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C  DASTOOL --driver dastool --just-do-it --search-engine diamond -S CONCOCT,MaxBIN2,MetaBAT2 --debug
anvi-cluster-contigs -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -T $NSLOTS -C  DASTOOL_5 --driver dastool --just-do-it --search-engine diamond -S CONCOCT_5,MaxBIN2,MetaBAT2 --debug
#
echo = `date` job $JOB_NAME done

Inspect Results

Let’s take a look at the results of automated binning using the command anvi-export-collection with the --list-collections flag. Anvi’o won’t do anything here except show us which collections are in the PROFILE.db.

anvi-export-collection -p 06_MERGED/WATER/PROFILE.db --list-collections

A collection is what anvi’o uses to organize contigs into bins. And here is the output of that command. The VIRSORTER collection was imported during the annotation phase and can be ignored for now.

COLLECTIONS FOUND
===============================================
* VIRSORTER (2840 bins, representing 2857 items).
* CONCOCT_5 (5 bins, representing 23716 items).

Here we can see the number of bins (which we forced to be 5) and the total number of contigs included in the collection. Remember that the original assembly had 23,758 contigs (at a minimum length of 1,000 bp). We can take a quick look at estimates of genome completion and redundancy, based on domain-specific single-copy core genes, for each bin in a collection using anvi-estimate-genome-completeness.

When given a collection name (-C) instead of --list-collections, anvi-export-collection writes two files. One is an items file, a two-column text file containing each contig name and the bin it belongs to. The other is a three-column bins info file (bin name, source, and color). These are important files, and you will use them a lot to manipulate collections.
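To make the items file layout concrete, here is a minimal shell sketch that counts how many items each bin holds. The file name collection-CONCOCT_5.txt and its rows are invented for illustration, and we assume the file has no header row; in practice you would point the function at the file anvi-export-collection writes.

```shell
# Hypothetical example data: a tiny items file (item name <TAB> bin name).
# Real files come from anvi-export-collection; these rows are invented.
workdir=$(mktemp -d)
printf 'c_000000000001_split_00001\tBin_1\nc_000000000002_split_00001\tBin_1\nc_000000000003_split_00001\tBin_2\nc_000000000004_split_00001\tBin_3\nc_000000000005_split_00001\tBin_2\n' \
    > "$workdir/collection-CONCOCT_5.txt"

# Count the items assigned to each bin: take column 2, tally, print "bin<TAB>count"
count_items_per_bin() {
    cut -f 2 "$1" | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

count_items_per_bin "$workdir/collection-CONCOCT_5.txt"
```

The same one-liner works on any items file, which is handy for checking that a collection has the number of bins you expect.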

If you use the other tools, keep track of how each algorithm names bins, or else this can get confusing. CONCOCT adds the prefix Bin_, MaxBin adds the prefix MAXBIN_, and MetaBAT adds the prefix METABAT__ (with two underscores). DAS Tool then adds its own Bin_ prefix to the parent name. For example, DAS Tool bin Bin_Bin_11 is CONCOCT bin Bin_11, while DAS Tool bin Bin_METABAT__2 is MetaBAT2 bin METABAT__2.
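If you do run DAS Tool, a small helper can translate its bin names back to the parent binner. This is a sketch that relies only on the prefixes listed above; the function name dastool_parent is our own, and any prefix not in that list is reported as unknown.

```shell
# Map a DAS Tool bin name to its parent binner and original bin name,
# based only on the prefixes listed above.
dastool_parent() {
    local parent=${1#Bin_}    # drop the Bin_ prefix that DAS Tool adds
    case $parent in
        METABAT__*) echo "MetaBAT2 $parent" ;;
        MAXBIN_*)   echo "MaxBIN2 $parent" ;;
        Bin_*)      echo "CONCOCT $parent" ;;
        *)          echo "unknown $parent" ;;
    esac
}

dastool_parent Bin_Bin_11       # prints: CONCOCT Bin_11
dastool_parent Bin_METABAT__2   # prints: MetaBAT2 METABAT__2
```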

Summarize Initial CONCOCT Clusters

Now we can look at the results of the initial binning, and for that we use anvi-summarize. This command is very useful, especially during bin refinement. It produces many report files that will help you assess your bins. We need to give the command the contigs and profile databases, a collection name, and an output directory.

anvi-summarize -c 03_CONTIGS/WATER-contigs.db \
               -p 06_MERGED/WATER/PROFILE.db \
               -C CONCOCT_5 -o CONCOCT_5

The output of this command is a bunch of summary data all tied together in an interactive HTML document making it very easy to explore. You are encouraged to use this tool. Let’s just look at the bins summary file.

Messy.

Estimate genome completeness

We can also estimate the genome completeness of each bin. The output of this command is a simple table that gives you quick access to some important details.

anvi-estimate-genome-completeness -c 03_CONTIGS/WATER-contigs.db -p 06_MERGED/WATER/PROFILE.db -C CONCOCT_5

Here are the results for the 5 CONCOCT bins with the --clusters flag equal to 5.

In truth, none of this is particularly useful right now since we know our bins are messy. These results tell us just how messy the bins are, but later on, commands like anvi-summarize and anvi-estimate-genome-completeness will be indispensable.

Manual Refinement

Once I have gone through all the bins and am happy with the manual refinement, I make a new collection to have a clean slate. This involves a little text file manipulation, but it’s pretty easy. First I run…

anvi-export-collection -p 06_MERGED/WATER/PROFILE.db -C CONCOCT_5

And I get the same two output files described above. Since I have a lot of viral bins, I decided to make two new collections, one microbial and the other viral. Because I named all viral bins with the prefix v_Bin, it was easy to parse them out. We will use anvi-import-collection to get the collections into the PROFILE database. Learn to love this command.

anvi-import-collection collection-MICROBIAL.txt \
                       -p 06_MERGED/WATER/PROFILE.db \
                       -c 03_CONTIGS/WATER-contigs.db \
                       -C MICROBIAL_REFINED \
                       --bins-info collection-MICROBIAL-info.txt

anvi-import-collection collection-VIRAL.txt \
                       -p 06_MERGED/WATER/PROFILE.db \
                       -c 03_CONTIGS/WATER-contigs.db \
                       -C VIRAL_REFINED \
                       --bins-info collection-VIRAL-info.txt
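The viral/microbial split itself can be scripted with awk. This is a minimal sketch assuming the exported items file is collection-CONCOCT_5.txt (bin name in column 2) and the bins info file is collection-CONCOCT_5-info.txt (bin name in column 1), neither with a header row; the rows below are invented demo data, so substitute your own exported files.

```shell
# Invented demo files; viral bins were renamed with the prefix v_Bin during refinement
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'split_1\tBin_1\nsplit_2\tv_Bin_1\nsplit_3\tBin_2\nsplit_4\tv_Bin_2\n' > collection-CONCOCT_5.txt
printf 'Bin_1\tCONCOCT\t#1f77b4\nv_Bin_1\tCONCOCT\t#ff7f0e\nBin_2\tCONCOCT\t#2ca02c\nv_Bin_2\tCONCOCT\t#d62728\n' > collection-CONCOCT_5-info.txt

# Items file: bin name in column 2; bins info file: bin name in column 1
awk -F '\t' '$2 ~ /^v_Bin/'  collection-CONCOCT_5.txt      > collection-VIRAL.txt
awk -F '\t' '$2 !~ /^v_Bin/' collection-CONCOCT_5.txt      > collection-MICROBIAL.txt
awk -F '\t' '$1 ~ /^v_Bin/'  collection-CONCOCT_5-info.txt > collection-VIRAL-info.txt
awk -F '\t' '$1 !~ /^v_Bin/' collection-CONCOCT_5-info.txt > collection-MICROBIAL-info.txt

wc -l collection-VIRAL.txt collection-MICROBIAL.txt
```

The four output files match the names passed to anvi-import-collection above.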

Running anvi-export-collection again

anvi-export-collection -p 06_MERGED/WATER/PROFILE.db --list-collections

shows that the PROFILE.db now contains the new collections.

COLLECTIONS FOUND
===============================================
* VIRSORTER (2840 bins, representing 2857 items).
* CONCOCT_5 (5 bins, representing 23716 items).
* CONCOCT_MANUAL (136 bins, representing 23716 items).
* VIRAL_REFINED (92 bins, representing 12144 items).
* MICROBIAL_REFINED (44 bins, representing 11572 items).

Identification & Curation of MAGs

Rename Bins

Now that we have gone through the process of refining the bins and modifying the collections, it is time to a) define metagenomic bins with > 70% completion or > 2 Mbp in length AND < 10% redundancy¹ as metagenome-assembled genomes (MAGs), and b) rename the MAGs and all the remaining bins.

anvi-rename-bins -c 03_CONTIGS/WATER-contigs.db \
                 -p 06_MERGED/WATER/PROFILE.db \
                 --collection-to-read MICROBIAL_REFINED \
                 --collection-to-write MICROBIAL_FINAL \
                 --call-MAGs --size-for-MAG 2 \
                 --min-completion-for-MAG 70 \
                 --max-redundancy-for-MAG 10 \
                 --prefix WATER \
                 --report-file MICROBIAL.renaming_bins.txt

anvi-rename-bins -c 03_CONTIGS/WATER-contigs.db \
                 -p 06_MERGED/WATER/PROFILE.db \
                 --collection-to-read VIRAL_REFINED \
                 --collection-to-write VIRAL_FINAL \
                 --call-MAGs --size-for-MAG 2 \
                 --min-completion-for-MAG 70 \
                 --max-redundancy-for-MAG 10 \
                 --prefix WATER_v \
                 --report-file VIRAL.renaming_bins.txt

As you can see, we use our new collections plus a few criteria to rename all the bins. Bins meeting those criteria get the prefix WATER_MAG_ and those that do not get WATER_Bin_ (viral bins use the prefix WATER_v instead of WATER). Of course, we do not expect any viral bins to meet these criteria, and while calling VAGs (viral assembled genomes) can be done (I think), it is beyond the scope of this study.
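The MAG criteria boil down to a simple boolean rule. The awk sketch below applies it to a hypothetical per-bin table (bin, % completion, % redundancy, total length in bp). It only illustrates the logic as stated above; it is not how anvi-rename-bins is implemented internally, which may treat boundary values differently.

```shell
# Hypothetical per-bin stats: bin <TAB> % completion <TAB> % redundancy <TAB> length (bp)
workdir=$(mktemp -d)
printf 'Bin_1\t92.9\t1.4\t1800000\nBin_2\t45.1\t2.8\t2300000\nBin_3\t85.0\t14.1\t2900000\nBin_4\t30.2\t0.0\t600000\n' \
    > "$workdir/bin_stats.txt"

# Label each bin following the rule stated above:
# (completion > 70 OR length > 2 Mbp) AND redundancy < 10  ->  MAG, otherwise Bin
call_mags() {
    awk -F '\t' -v min_comp=70 -v min_size=2000000 -v max_red=10 '{
        is_mag = ($2 > min_comp || $4 > min_size) && $3 < max_red
        print $1 "\t" (is_mag ? "MAG" : "Bin")
    }' "$1"
}

call_mags "$workdir/bin_stats.txt"
```

Note that Bin_2 qualifies on size alone, while Bin_3 is complete enough but fails on redundancy.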

And now, generate summaries of the final microbial and viral collections.

anvi-summarize -c 03_CONTIGS/WATER-contigs.db \
               -p 06_MERGED/WATER/PROFILE.db \
               -C MICROBIAL_FINAL \
               -o MICROBIAL_FINAL
anvi-summarize -c 03_CONTIGS/WATER-contigs.db \
               -p 06_MERGED/WATER/PROFILE.db \
               -C VIRAL_FINAL \
               -o VIRAL_FINAL

Summarize MAGs

Now, as a little sanity check, we can summarize the MAGs. We can employ the same trick as above to make a collection of just the MAGs called, well, MAGS.
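The "trick" is the same text filtering used to split viral from microbial bins. A sketch, assuming MICROBIAL_FINAL was exported with anvi-export-collection to collection-MICROBIAL_FINAL.txt and collection-MICROBIAL_FINAL-info.txt (the rows below are invented for the demo):

```shell
# Invented demo files standing in for the exported MICROBIAL_FINAL collection
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'split_1\tWATER_MAG_00001\nsplit_2\tWATER_Bin_00006\nsplit_3\tWATER_MAG_00002\n' > collection-MICROBIAL_FINAL.txt
printf 'WATER_MAG_00001\tUNKNOWN\t#1f77b4\nWATER_Bin_00006\tUNKNOWN\t#ff7f0e\nWATER_MAG_00002\tUNKNOWN\t#2ca02c\n' > collection-MICROBIAL_FINAL-info.txt

# Keep only rows whose bin name contains _MAG_
awk -F '\t' '$2 ~ /_MAG_/' collection-MICROBIAL_FINAL.txt      > collection-MAGS.txt
awk -F '\t' '$1 ~ /_MAG_/' collection-MICROBIAL_FINAL-info.txt > collection-MAGS-info.txt

# The files would then be imported as before, e.g.:
# anvi-import-collection collection-MAGS.txt -p 06_MERGED/WATER/PROFILE.db \
#     -c 03_CONTIGS/WATER-contigs.db -C MAGS --bins-info collection-MAGS-info.txt
```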

mkdir -p 10_SUMMARY_MAGS
anvi-summarize -c 03_CONTIGS/WATER-contigs.db \
               -p 06_MERGED/WATER/PROFILE.db \
               -C MAGS -o 10_SUMMARY_MAGS/MAGS-SUMMARY
anvi-estimate-genome-completeness -c 03_CONTIGS/WATER-contigs.db \
                                  -p 06_MERGED/WATER/PROFILE.db \
                                  -C MAGS -o 10_SUMMARY_MAGS/MAGS.info

And combine the bins_summary.txt with the MAGS.info table to check out the MAGs.

As you can see, the genomes are pretty fragmented, but you get what you get so let’s move on.
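If you would rather do the combining in the shell than in R or a spreadsheet, an awk join on the first column is enough. This is a sketch: the column names and values in the demo tables are placeholders, and we only assume that both files are tab-delimited with a header row and the bin name in the first column. join_on_bin is a hypothetical helper name.

```shell
# Placeholder versions of the two tables (TAB-delimited, header row, bin name first)
workdir=$(mktemp -d)
printf 'bins\ttotal_length\tnum_contigs\nWATER_MAG_00001\t2100000\t310\nWATER_MAG_00002\t1900000\t420\n' \
    > "$workdir/bins_summary.txt"
printf 'bin name\tdomain\tconfidence\nWATER_MAG_00002\tBACTERIA\t0.9\nWATER_MAG_00001\tBACTERIA\t1.0\n' \
    > "$workdir/MAGS.info"

# join_on_bin SUMMARY INFO: append INFO columns (minus its bin name) to each SUMMARY row.
# INFO is read into memory first, so neither file needs to be sorted.
join_on_bin() {
    awk -F '\t' -v OFS='\t' '
        NR == FNR {                                # first file read: INFO
            key = $1; $1 = ""; rest = substr($0, 2)
            if (FNR == 1) header = rest; else info[key] = rest
            next
        }
        FNR == 1     { print $0, header; next }    # SUMMARY header
        ($1 in info) { print $0, info[$1] }        # SUMMARY rows with a match
    ' "$2" "$1"
}

join_on_bin "$workdir/bins_summary.txt" "$workdir/MAGS.info"
```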

Check for Redundant MAGs

Since we only have one metagenomic assembly, we shouldn’t need to worry about having redundant MAGs. But I like to check anyway.

We need fasta files with proper deflines for each MAG.

mkdir -p 11_REDUNDANT-MAGs
# get each MAG name in the set:
MAGs=$(grep MAG 10_SUMMARY_MAGS/MAGS.info | awk '{print $1}')
# go through each MAG in the SUMMARY directory and store a
# copy of its FASTA file with proper deflines in the REDUNDANT-MAGs
# directory:
for MAG in $MAGs
do
    anvi-script-reformat-fasta \
        10_SUMMARY_MAGS/MAGS-SUMMARY/bin_by_bin/$MAG/$MAG-contigs.fa \
        --simplify-names --prefix $MAG \
        -o 11_REDUNDANT-MAGs/$MAG.fa
done

This step provides a convenient naming scheme for all contigs. For example, contigs in MAG 5 will be named with the prefix WATER_MAG_00005_. This gives contigs from all MAGs a unique name.

Then we run anvi-compute-genome-similarity to calculate the similarity of the MAGs. If you are working with multiple assemblies, it is important to tweak the parameters, but since we have a single assembly (and likely no redundant MAGs), we just used the default settings. We do need an additional file that lets anvi’o know the name of each MAG and the location of its FASTA file. We call it mag_fasta.txt, and you can find the file here. It is a two-column, tab-delimited file.
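mag_fasta.txt can also be generated instead of written by hand. The sketch below assumes a header row of name and path, which is how we understand anvi’o’s fasta-txt format (check the documentation for your anvi’o version). make_fasta_txt is a hypothetical helper, and the demo uses empty placeholder files.

```shell
# make_fasta_txt DIR: print a name<TAB>path table for every DIR/*.fa file.
# Assumption: a header row "name<TAB>path" as in the anvi'o fasta-txt format.
make_fasta_txt() {
    printf 'name\tpath\n'
    for fasta in "$1"/*.fa; do
        [ -e "$fasta" ] || continue              # glob matched nothing
        name=$(basename "$fasta" .fa)            # WATER_MAG_00001.fa -> WATER_MAG_00001
        abs_dir=$(cd "$(dirname "$fasta")" && pwd)
        printf '%s\t%s\n' "$name" "$abs_dir/$name.fa"
    done
}

# Demo with empty placeholder files standing in for 11_REDUNDANT-MAGs
workdir=$(mktemp -d)
mkdir -p "$workdir/11_REDUNDANT-MAGs"
touch "$workdir/11_REDUNDANT-MAGs/WATER_MAG_00001.fa" \
      "$workdir/11_REDUNDANT-MAGs/WATER_MAG_00002.fa"
make_fasta_txt "$workdir/11_REDUNDANT-MAGs" > "$workdir/mag_fasta.txt"
cat "$workdir/mag_fasta.txt"
```

Absolute paths keep the file valid no matter which directory you launch anvi-compute-genome-similarity from.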

anvi-compute-genome-similarity -f mag_fasta.txt \
                               -o 12_GENOME-SIMILARITY \
                               --program pyANI

And then we run anvi-dereplicate-genomes to see if we have any redundant MAGs.

anvi-dereplicate-genomes --ani-dir 12_GENOME-SIMILARITY \
                         -o 13_DEREPLICATED-GENOMES \
                         --similarity-threshold 0.90 \
                         --program pyANI

Inspecting the CLUSTER_REPORT.txt file, we can see that each MAG is in its own cluster, meaning there is no redundancy.

cluster          size   representative    genomes
cluster_000001   1      WATER_MAG_00002   WATER_MAG_00002
cluster_000002   1      WATER_MAG_00001   WATER_MAG_00001
cluster_000003   1      WATER_MAG_00004   WATER_MAG_00004
cluster_000004   1      WATER_MAG_00003   WATER_MAG_00003
cluster_000005   1      WATER_MAG_00005   WATER_MAG_00005
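With many MAGs, scanning the report by eye gets tedious. This awk sketch prints any cluster with more than one member. It assumes CLUSTER_REPORT.txt is tab-delimited with a header row and the column order shown above; the comma-separated genomes column and the names genome_A, genome_B, and genome_C in the demo are invented so that one redundant cluster shows up.

```shell
# Invented CLUSTER_REPORT.txt with one redundant cluster (genome_B and genome_C)
workdir=$(mktemp -d)
printf 'cluster\tsize\trepresentative\tgenomes\ncluster_000001\t1\tgenome_A\tgenome_A\ncluster_000002\t2\tgenome_B\tgenome_B,genome_C\n' \
    > "$workdir/CLUSTER_REPORT.txt"

# Print every cluster that holds more than one genome (skipping the header row)
redundant_clusters() {
    awk -F '\t' 'NR > 1 && $2 > 1 { print $1 "\t" $4 }' "$1"
}

redundant_clusters "$workdir/CLUSTER_REPORT.txt"
```

On a report like ours, where every cluster has size 1, the function should print nothing.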

Profile Nonredundant MAGs

After the binning, curation of the MAGs, and renaming of scaffolds, we can start focusing on the distribution and detection of each MAG across the metagenomes. For this we will need a contigs database of the 5 MAGs. We will add HMM profiles and predict taxonomy. This is very similar to what we did in the initial assembly part of the Snakemake Workflow.

mkdir -p 14_NR-MAGs-CONTIGS
cat 11_REDUNDANT-MAGs/*.fa > 14_NR-MAGs-CONTIGS/NR-MAGs.fa
anvi-gen-contigs-database -f 14_NR-MAGs-CONTIGS/NR-MAGs.fa \
                          -o 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db
anvi-run-hmms -c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db \
              --num-threads $NSLOTS
anvi-run-scg-taxonomy -c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db

We then use the scaffolds to recruit short reads from all four metagenomes, the program anvi-profile to profile the BAM files, and anvi-merge to generate a merged anvi’o profile database:

mkdir -p 15_MAPPING_NR_MAGS

# build the Bowtie2 database
bowtie2-build 14_NR-MAGs-CONTIGS/NR-MAGs.fa 15_MAPPING_NR_MAGS/NR-MAGs
# go through each metagenomic sample and map short reads
# against the 5 nonredundant MAGs
for sample in $(cat samples_WATER.txt)
do
    bowtie2 --threads $NSLOTS -x 15_MAPPING_NR_MAGS/NR-MAGs \
            -1 01_QC/$sample-QUALITY_PASSED_R1.fastq.gz \
            -2 01_QC/$sample-QUALITY_PASSED_R2.fastq.gz \
            --no-unal -S 15_MAPPING_NR_MAGS/$sample-in-NRMAGs.sam
    # convert the resulting SAM file to a BAM file, dropping unmapped reads:
    samtools view -F 4 \
                  -bS 15_MAPPING_NR_MAGS/$sample-in-NRMAGs.sam \
                  > 15_MAPPING_NR_MAGS/$sample-in-NRMAGs-RAW.bam

    # sort and index the BAM file:
    samtools sort -o 15_MAPPING_NR_MAGS/$sample-in-NRMAGs.bam \
                  15_MAPPING_NR_MAGS/$sample-in-NRMAGs-RAW.bam
    samtools index 15_MAPPING_NR_MAGS/$sample-in-NRMAGs.bam

    # remove temporary files:
    rm 15_MAPPING_NR_MAGS/$sample-in-NRMAGs.sam \
       15_MAPPING_NR_MAGS/$sample-in-NRMAGs-RAW.bam
done
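Before profiling, it can be worth confirming that every sample produced a sorted BAM and its index. The sketch below assumes samples_WATER.txt lists one sample name per line and that samtools index wrote a .bam.bai file next to each BAM; check_bams is a hypothetical helper, and WATER_1 and WATER_2 are invented sample names.

```shell
# check_bams SAMPLES_FILE DIR: report samples missing a sorted BAM or its .bai index.
# Assumes one sample name per line and the file names used in the loop above.
check_bams() {
    missing=0
    while IFS= read -r sample; do
        [ -n "$sample" ] || continue                        # skip blank lines
        for f in "$2/$sample-in-NRMAGs.bam" "$2/$sample-in-NRMAGs.bam.bai"; do
            if [ ! -e "$f" ]; then
                echo "missing: $f"
                missing=$((missing + 1))
            fi
        done
    done < "$1"
    [ "$missing" -eq 0 ]                                    # succeed only if nothing is missing
}

# Demo: WATER_1 is complete, WATER_2 lacks its index
workdir=$(mktemp -d)
mkdir -p "$workdir/15_MAPPING_NR_MAGS"
printf 'WATER_1\nWATER_2\n' > "$workdir/samples_WATER.txt"
touch "$workdir/15_MAPPING_NR_MAGS/WATER_1-in-NRMAGs.bam" \
      "$workdir/15_MAPPING_NR_MAGS/WATER_1-in-NRMAGs.bam.bai" \
      "$workdir/15_MAPPING_NR_MAGS/WATER_2-in-NRMAGs.bam"
check_bams "$workdir/samples_WATER.txt" "$workdir/15_MAPPING_NR_MAGS" || echo "some BAM files are missing"
```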

Then we profile each BAM file and merge resulting profiles into a single anvi’o merged profile.

mkdir -p 16_NR_MAGS_PROFILES

for sample in $(cat samples_WATER.txt)
do
    anvi-profile -c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db \
                 -i 15_MAPPING_NR_MAGS/$sample-in-NRMAGs.bam \
                 --write-buffer-size 2000 --profile-SCVs \
                 --num-threads $NSLOTS \
                 -o 16_NR_MAGS_PROFILES/$sample-in-NRMAGs
done
anvi-merge 16_NR_MAGS_PROFILES/*-in-NRMAGs/PROFILE.db \
           -c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db \
           -o 17_NR-MAGs-MERGED/

The anvi’o profile database in 17_NR-MAGs-MERGED describes the distribution and detection statistics of all scaffolds in all MAGs; however, it does not contain a collection that describes the scaffold-bin affiliations. Thanks to our previous naming consistency, here we can implement a simple workaround to generate a text file that describes these connections:

for split_name in `sqlite3 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db 'select split from splits_basic_info'`

do
    # in this loop $split_name goes through names like this:
    # WATER_MAG_00001_000000000001_split_00001,
    # WATER_MAG_00001_000000000001_split_00002,
    # WATER_MAG_00001_000000000001_split_00003, ...; so we can extract
    # the MAG name it belongs to:

    # This command depends on the name of your MAGs. Adjust accordingly
    MAG=`echo $split_name | awk 'BEGIN{FS="_"}{print $1"_"$2"_"$3}'`

    # print it out with a TAB character
    echo -e "$split_name\t$MAG"

done > 17_NR-MAGs-MERGED/NR-MAGs-COLLECTION.txt

anvi-import-collection 17_NR-MAGs-MERGED/NR-MAGs-COLLECTION.txt \
                       -c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db \
                       -p 17_NR-MAGs-MERGED/PROFILE.db \
                       -C NR_MAGs
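The awk field count in the loop above works for names like WATER_MAG_00001 but breaks if your prefix has a different number of underscores. One alternative, sketched below, deletes the trailing _&lt;contig number&gt;_split_&lt;split number&gt; instead. It assumes split names follow the pattern shown in the loop comments; split_to_mag is our own helper name.

```shell
# split_to_mag: read split names on stdin, print the MAG each belongs to by
# deleting the trailing "_<contig number>_split_<split number>".
split_to_mag() {
    sed -E 's/_[0-9]+_split_[0-9]+$//'
}

# Demo: build collection rows (split <TAB> MAG) from two example split names
for split_name in WATER_MAG_00001_000000000001_split_00001 \
                  WATER_MAG_00005_000000000012_split_00003
do
    MAG=$(printf '%s\n' "$split_name" | split_to_mag)
    printf '%s\t%s\n' "$split_name" "$MAG"
done
```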

A quick check.

anvi-export-collection -p 17_NR-MAGs-MERGED/PROFILE.db --list-collections
COLLECTIONS FOUND
===============================================
* NON_REDUNDANT_MAGs (5 bins, representing 3404 items).

Creating Self-Contained Profiles for MAGs

anvi-split -c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db \
           -p 17_NR-MAGs-MERGED/PROFILE.db \
           -C NR_MAGs -o 18_NR-MAGs-SPLIT

That’s it for this section. Next we will perform a phylogenomic analysis on a MAG or two.

Estimate MAG Taxonomy

One last thing to do is look at the predicted taxonomy for each MAG. To do this, we will run anvi-estimate-scg-taxonomy, which uses 22 ribosomal genes (Parks et al. 2018) from the Genome Taxonomy Database (GTDB) to estimate taxonomy. We added these classifications earlier when we ran anvi-run-scg-taxonomy. We can run anvi-estimate-scg-taxonomy either on the self-contained profile and contigs databases we just created OR on the combined databases. We will do the latter, so we only need to run a single command. You can add the --debug flag for more verbose output, but we will keep it simple for now.

anvi-estimate-scg-taxonomy -c 14_NR-MAGs-CONTIGS/NR-MAGs-CONTIGS.db \
                           -p 17_NR-MAGs-MERGED/PROFILE.db \
                           -C NR_MAGs

And here is the output.

Contigs DB ...................................: 14_NON-REDUNDANT-MAGs-CONTIGS/NON-REDUNDANT-MAGs-CONTIGS.db
Profile DB ...................................: 17_NON-REDUNDANT-MAGs-MERGED/PROFILE.db
Metagenome mode ..............................: False

* 3,404 split names associated with 5 bins of in collection 'NON_REDUNDANT_MAGs'
have been successfully recovered

Estimated taxonomy for collection "NON_REDUNDANT_MAGs"
===============================================
╒═════════════════╤══════════════╤═══════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│                 │   total_scgs │   supporting_scgs │ taxonomy                                                                                                                              │
╞═════════════════╪══════════════╪═══════════════════╪═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ WATER_MAG_00001 │           22 │                22 │ Bacteria / Cyanobacteria / Cyanobacteriia / Synechococcales / Cyanobiaceae / Synechococcus /                                          │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00003 │           22 │                21 │ Bacteria / Cyanobacteria / Cyanobacteriia / Synechococcales / Cyanobiaceae / RCC307 / RCC307 sp000063525                              │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00002 │           20 │                 9 │ Bacteria / Campylobacterota / Campylobacteria / Campylobacterales / Arcobacteraceae / Poseidonibacter /                               │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00004 │           18 │                10 │ Bacteria / Proteobacteria / Alphaproteobacteria / Rhodobacterales / Rhodobacteraceae / Aliiroseovarius / Aliiroseovarius pelagivivens │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00005 │           16 │                16 │ Bacteria / Proteobacteria / Alphaproteobacteria / Pelagibacterales / Pelagibacteraceae / Pelagibacter /                               │
╘═════════════════╧══════════════╧═══════════════════╧═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╛
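Several MAGs stop short of species. To pull out the deepest rank that was actually assigned, you can split the taxonomy string on the slashes. This is a sketch for strings formatted like the taxonomy column above; deepest_rank is a hypothetical helper, and it reports unclassified only when every rank is blank.

```shell
# deepest_rank TAXONOMY: print the deepest non-empty rank of a "/"-separated
# taxonomy string, or "unclassified" if every rank is blank.
deepest_rank() {
    printf '%s\n' "$1" | awk -F '/' '{
        for (i = NF; i >= 1; i--) {
            rank = $i
            gsub(/^ +| +$/, "", rank)    # trim the spaces around each rank
            if (rank != "") { print rank; exit }
        }
        print "unclassified"
    }'
}

deepest_rank 'Bacteria / Cyanobacteria / Cyanobacteriia / Synechococcales / Cyanobiaceae / Synechococcus /'                 # prints: Synechococcus
deepest_rank 'Bacteria / Cyanobacteria / Cyanobacteriia / Synechococcales / Cyanobiaceae / RCC307 / RCC307 sp000063525'     # prints: RCC307 sp000063525
```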

Estimate Bin Taxonomy

Heck, since we’re here, we can also estimate the taxonomy of the bins. The MICROBIAL_FINAL collection still contains the MAGs, so they show up again alongside the other bins.

anvi-estimate-scg-taxonomy -p 06_MERGED/WATER/PROFILE.db \
                           -c 03_CONTIGS/WATER-contigs.db \
                           -C MICROBIAL_FINAL \
                           -o microbial-bins-gtdb.txt

And here is the output.

Contigs DB ...................................: 03_CONTIGS/WATER-contigs.db
Profile DB ...................................: 06_MERGED/WATER/PROFILE.db
Metagenome mode ..............................: False

* 11,572 split names associated with 44 bins of in collection 'MICROBIAL_FINAL'
have been successfully recovered

Estimated taxonomy for collection "MICROBIAL_FINAL"
===============================================
╒═════════════════╤══════════════╤═══════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│                 │   total_scgs │   supporting_scgs │ taxonomy                                                                                                                              │
╞═════════════════╪══════════════╪═══════════════════╪═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ WATER_MAG_00001 │           22 │                22 │ Bacteria / Cyanobacteria / Cyanobacteriia / Synechococcales / Cyanobiaceae / Synechococcus /                                          │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00003 │           22 │                21 │ Bacteria / Cyanobacteria / Cyanobacteriia / Synechococcales / Cyanobiaceae / RCC307 / RCC307 sp000063525                              │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00002 │           20 │                 9 │ Bacteria / Campylobacterota / Campylobacteria / Campylobacterales / Arcobacteraceae / Poseidonibacter /                               │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00013 │           20 │                 6 │ Bacteria / Proteobacteria / Gammaproteobacteria / Pseudomonadales / Nitrincolaceae / ASP10-02a / ASP10-02a sp002686055                │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00004 │           18 │                10 │ Bacteria / Proteobacteria / Alphaproteobacteria / Rhodobacterales / Rhodobacteraceae / Aliiroseovarius / Aliiroseovarius pelagivivens │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_MAG_00005 │           16 │                16 │ Bacteria / Proteobacteria / Alphaproteobacteria / Pelagibacterales / Pelagibacteraceae / Pelagibacter /                               │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00007 │           12 │                12 │ Bacteria / Proteobacteria / Alphaproteobacteria / Puniceispirillales / Puniceispirillaceae / UBA8309 /                                │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00012 │           10 │                 3 │ Bacteria / Proteobacteria / Gammaproteobacteria / Pseudomonadales / Nitrincolaceae / ASP10-02a / ASP10-02a sp002312935                │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00006 │            9 │                 8 │ Bacteria / Actinobacteriota / Acidimicrobiia / TMED189 / TMED189 / TMED189 /                                                          │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00015 │            9 │                 7 │ Bacteria / SAR324 / SAR324 / SAR324 / NAC60-12 / UBA1014 / UBA1014 sp001469005                                                        │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00010 │            8 │                 8 │ Bacteria /  /  /  /  /  /                                                                                                             │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00023 │            7 │                 4 │ Bacteria / Proteobacteria / Alphaproteobacteria / Rhodobacterales / Rhodobacteraceae / HIMB11 /                                       │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00008 │            6 │                 6 │ Bacteria / Actinobacteriota / Acidimicrobiia / TMED189 / TMED189 / TMED189 /                                                          │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00016 │            6 │                 6 │ Bacteria / Bacteroidota / Bacteroidia / Flavobacteriales / Flavobacteriaceae / MED-G13 /                                              │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00022 │            6 │                 6 │ Bacteria /  /  /  /  /  /                                                                                                             │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00021 │            5 │                 5 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00029 │            5 │                 3 │ Bacteria / Proteobacteria / Alphaproteobacteria / Rhodobacterales / Rhodobacteraceae / HIMB11 / HIMB11 sp000472185                    │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00011 │            4 │                 4 │ Bacteria / Proteobacteria /  /  /  /  /                                                                                               │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00019 │            4 │                 4 │ Bacteria /  /  /  /  /  /                                                                                                             │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00017 │            4 │                 3 │ Bacteria / Bacteroidota / Bacteroidia / Flavobacteriales / Flavobacteriaceae / UBA724 / UBA724 sp002723075                            │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00009 │            2 │                 2 │ Bacteria / Proteobacteria / Gammaproteobacteria / Chromatiales / Sedimenticolaceae /  /                                               │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00027 │            2 │                 2 │ Archaea / Thermoplasmatota / Poseidoniia / Poseidoniales /  /  /                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00031 │            2 │                 2 │ Bacteria / Bacteroidota / Bacteroidia / Flavobacteriales / Flavobacteriaceae / UBA3478 / UBA3478 sp003045935                          │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00014 │            1 │                 1 │ Archaea / Thermoplasmatota / Poseidoniia / Poseidoniales / Thalassoarchaeaceae / MGIIb-N1 / MGIIb-N1 sp002505695                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00018 │            1 │                 1 │ Archaea / Thermoplasmatota / Poseidoniia / Poseidoniales / Thalassoarchaeaceae / MGIIb-N1 /                                           │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00020 │            1 │                 1 │ Bacteria / Proteobacteria / Alphaproteobacteria / Rhodobacterales / Rhodobacteraceae / GCA-002705045 / GCA-002705045 sp002703515      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00025 │            1 │                 1 │ Archaea / Thermoplasmatota / Poseidoniia / Poseidoniales / Poseidoniaceae / MGIIa-L2 / MGIIa-L2 sp002719815                           │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00026 │            1 │                 1 │ Bacteria / SAR324 / SAR324 / SAR324 / NAC60-12 / UBA1014 / UBA1014 sp001469005                                                        │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00028 │            1 │                 1 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00032 │            1 │                 1 │ Bacteria / Cyanobacteria / Cyanobacteriia / Synechococcales / Cyanobiaceae / Synechococcus /                                          │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00033 │            1 │                 1 │ Bacteria / Proteobacteria / Alphaproteobacteria / Rhodobacterales / Rhodobacteraceae /  /                                             │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00024 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00030 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00034 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00035 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00036 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00037 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00038 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00039 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00040 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00041 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00042 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00043 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
├─────────────────┼──────────────┼───────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ WATER_Bin_00044 │            0 │                 0 │ /  /  /  /  /  /                                                                                                                      │
╘═════════════════╧══════════════╧═══════════════════╧═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╛

Output file ..................................: microbial-bins-gtdb.txt

In the next section we explore WATER_MAG_00002 in a phylogenomic context. That’s all for this page.



Source Code

The source code for this page can be accessed on GitHub at https://github.com/hypocolypse/web/.

Data Availability

Binning summary data can be downloaded directly from figshare at doi:10.25573/data.12809069. Summary data are included for four collections: 1) CONCOCT_5, the automated binning; 2) VIRAL_FINAL, the bins deemed viral after manual refinement; 3) MICROBIAL_FINAL, the bins deemed microbial after manual refinement; and 4) MAGS, the 5 MAG bins only. Each directory contains an index.html file that can be opened in a browser for closer inspection. Individual summary files are also provided, as are self-contained profile and contigs databases for each MAG.

  1. This redundancy cutoff only works for bacteria and archaea. If you have eukaryotic bins, the redundancy should be set to 100.

References

Alneberg, Johannes, Brynjar Smári Bjarnason, Ino De Bruijn, Melanie Schirmer, Joshua Quick, Umer Z Ijaz, Leo Lahti, Nicholas J Loman, Anders F Andersson, and Christopher Quince. 2014. “Binning Metagenomic Contigs by Coverage and Composition.” Nature Methods 11 (11): 1144–46. https://doi.org/10.1038/nmeth.3103.
Kang, Dongwan D, Feng Li, Edward Kirton, Ashleigh Thomas, Rob Egan, Hong An, and Zhong Wang. 2019. “MetaBAT 2: An Adaptive Binning Algorithm for Robust and Efficient Genome Reconstruction from Metagenome Assemblies.” PeerJ 7: e7359.
Parks, Donovan H, Maria Chuvochina, David W Waite, Christian Rinke, Adam Skarshewski, Pierre-Alain Chaumeil, and Philip Hugenholtz. 2018. “A Standardized Bacterial Taxonomy Based on Genome Phylogeny Substantially Revises the Tree of Life.” Nature Biotechnology 36 (10): 996–1004. https://doi.org/10.1038/nbt.4229.
Sieber, Christian MK, Alexander J Probst, Allison Sharrar, Brian C Thomas, Matthias Hess, Susannah G Tringe, and Jillian F Banfield. 2018. “Recovery of Genomes from Metagenomes via a Dereplication, Aggregation and Scoring Strategy.” Nature Microbiology 3 (7): 836–43.
Wu, Yu-Wei, Blake A Simmons, and Steven W Singer. 2016. “MaxBin 2.0: An Automated Binning Algorithm to Recover Genomes from Multiple Metagenomic Datasets.” Bioinformatics 32 (4): 605–7.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hypocolypse/web/, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".