ONT Methylation Analysis — Comprehensive Summary

Scope: What methylation is (5‑mC, 6‑mA, 4‑mC), how Oxford Nanopore (ONT) detects it, how it differs from bisulfite sequencing, required coverage, file types (modBAM/CRAM with MM/ML tags), basecalling models (Dorado), practical workflows, pipelines (nf-core/methylong), deliverables to request from providers (e.g., Novogene), and specific advice for bacterial projects.


1) What are 5‑mC, 6‑mA, 4‑mC?

  • 5‑mC (5‑methylcytosine): methyl group on cytosine C5 carbon. In eukaryotes strongly linked to gene regulation (CpG), chromatin state, imprinting. Also present in some bacteria (e.g., Dcm at CCWGG).
  • 6‑mA (N6‑methyladenine): methyl on adenine N6. Very common in bacteria/archaea (e.g., Dam at GATC), functions in restriction–modification (R–M), mismatch repair, replication control, and gene regulation.
  • 4‑mC (N4‑methylcytosine): methyl on cytosine N4, mostly in bacteria/archaea (R–M and regulation).

Coverage guidance (ONT direct detection):

  • ≥ 10× for 5‑mC calling/quantification.
  • ≥ 50× for 6‑mA and 4‑mC (signals are weaker; models need depth).
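
As a rough planning aid, the thresholds above can be turned into a yield estimate. This is a minimal sketch: the genome size and read yield are made-up illustration values, and `MIN_DEPTH`, `mean_depth`, `required_bases`, and `callable_mods` are hypothetical helper names, not part of any ONT tool.

```python
# Depth thresholds copied from the coverage guidance above.
MIN_DEPTH = {"5mC": 10, "6mA": 50, "4mC": 50}

def mean_depth(total_bases: int, genome_size: int) -> float:
    """Approximate mean depth = sequenced bases / genome length."""
    return total_bases / genome_size

def required_bases(genome_size: int, mods: list[str]) -> int:
    """Bases needed to reach the highest threshold among the requested mods."""
    return genome_size * max(MIN_DEPTH[m] for m in mods)

def callable_mods(depth: float) -> list[str]:
    """Modifications whose depth threshold is met at the given mean depth."""
    return sorted(m for m, need in MIN_DEPTH.items() if depth >= need)

# Illustration only: a hypothetical 5 Mb genome with 150 Mb of reads.
depth = mean_depth(150_000_000, 5_000_000)
print(depth, callable_mods(depth))
print(required_bases(5_000_000, ["6mA", "4mC"]))
```

Mean depth ignores unmapped reads and uneven coverage, so the per-site depth reported by the pileup step remains the figure to trust.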

2) How ONT detects methylation (no chemical conversion)

  • ONT does not convert bases (unlike bisulfite sequencing which converts un‑methylated C → U → read as T). ONT reads remain A/C/G/T.
  • ONT measures ionic current while DNA k‑mers pass the pore. Modified bases (5‑mC/6‑mA/4‑mC) slightly shift current distributions.
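  • A modified‑base basecaller (now Dorado; historically Guppy+Remora) decodes those shifts and writes the calls into its BAM output as MM/ML tags. The tags are per‑read, so they are present in unaligned BAM and are carried through alignment into the aligned BAM/CRAM:
    • MM: which canonical base and modification type are called (e.g., C+m, A+a), plus where the calls sit along the read, stored as skip counts between candidate bases.
    • ML: the probability of each call listed in MM, scaled to an integer from 0 to 255.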
  • Downstream tools (e.g., modkit, methylartist, nf‑core/methylong) summarize per‑site/per‑region methylation and export BED/bedGraph/bigWig for visualization/statistics.
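
To make the tag layout concrete, here is a minimal decoder sketch. It assumes a forward-strand read, one modification code per MM entry, and a named canonical base (not `N`); reverse-strand reads need reverse-complement handling that is omitted here. `decode_mm_ml` is a hypothetical helper, and in practice a library such as pysam or a tool such as modkit would do this work.

```python
def decode_mm_ml(seq: str, mm: str, ml: list[int]) -> dict:
    """Map each MM entry to [(read_position, probability), ...].

    mm is the tag value without the 'MM:Z:' prefix, e.g. 'C+m?,0,1;'.
    ml is the list of 0-255 integers from the ML tag, in the same order.
    """
    calls = {}
    ml_values = iter(ml)
    for entry in filter(None, mm.split(";")):
        fields = entry.split(",")
        spec = fields[0].rstrip("?.")      # drop the optional '?' / '.' flag
        base = spec[0]                     # canonical base, e.g. 'C' or 'A'
        # Read positions of every occurrence of the canonical base.
        candidates = [i for i, b in enumerate(seq) if b == base]
        idx = -1
        sites = []
        for skip in (int(x) for x in fields[1:]):
            idx += skip + 1                # skip = unlisted candidates before this call
            prob = (next(ml_values) + 0.5) / 256   # midpoint of the 1/256-wide bin
            sites.append((candidates[idx], prob))
        calls[spec] = sites
    return calls

# Made-up read: 5-mC calls on the 1st and 3rd C, a 6-mA call on the 2nd A.
print(decode_mm_ml("ACGTCGACGC", "C+m?,0,1;A+a?,1;", [255, 10, 128]))
```

Each skip count is relative to the previous call, which is why the decoder walks an index through the list of candidate bases rather than through the read itself.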

Key contrast with bisulfite (BS‑seq):

  • BS‑seq chemically converts un‑methylated C to U (sequenced as T) → uses base changes to infer methylation.
  • ONT uses signal differences; no base letters change. Methylation is metadata in BAM tags, not edits in the sequence.
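
The contrast can be illustrated with a toy simulation. This is only a sketch with hypothetical function names and a made-up sequence; it models the read-out, not the chemistry or the signal processing.

```python
def bisulfite_read(seq: str, methylated: set[int]) -> str:
    """BS-seq style: unmethylated C is read as T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )

def ont_read(seq: str, methylated: set[int]) -> tuple[str, list[int]]:
    """ONT style: the sequence is untouched; modifications travel as metadata."""
    return seq, sorted(methylated)

seq = "ACGTCC"
methylated = {1}                          # only the first C is methylated
print(bisulfite_read(seq, methylated))    # the letters change
print(ont_read(seq, methylated))          # the letters stay, positions are listed
```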

3) Data types & what you need (modBAM vs “assembly” reads)

  • Previous ONT reads used for genome assembly are typically standard basecalls (A/C/G/T only) and lack MM/ML tags, so not suitable for methylation quantification.
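  • For methylation analysis you need one of the following:
    1. An aligned modified‑base BAM/CRAM (modBAM/CRAM) from the provider, with MM/ML tags and indices (.bai/.crai).
    2. The raw signal files (POD5/FAST5), which you re‑basecall with a modified‑base Dorado model and then align to your reference (producing modBAM). FASTQ alone is not enough: it holds only base calls and qualities, so it cannot be re‑basecalled.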

Reference genome requirement:

  • For aligned BAM, you (or the provider) must map to a reference FASTA. Keep the exact FASTA (and .fai) used for reproducibility and downstream summarization.

4) Practical workflow (bacteria)

A. Planning & sequencing

  • Decide targets: in bacteria prioritize 6‑mA/4‑mC; optionally 5‑mC (if Dcm‑like cytosine methyltransferases are present; Dam is a 6‑mA enzyme and is covered by the 6‑mA target).
  • Coverage targets: ≥50× (6‑mA/4‑mC), ≥10× (5‑mC).
  • Ask provider to run Dorado (modified‑base model) and deliver aligned modBAM/CRAM with MM/ML tags.

B. Inputs/outputs to request from provider (e.g., Novogene)

  1. Deliverables:
    • modBAM/CRAM (aligned to our provided reference), with MM/ML tags + .bai/.crai.
    • Optional per‑site tracks: BED/bedGraph/bigWig and a QC report.
  2. Reference:
    • Can we provide bacterial reference FASTA? Will they return the exact FASTA (.fai) used?
  3. Models & modifications:
    • Which Dorado model version and which mods (5‑mC, 6‑mA, 4‑mC) are called by default?
  4. Unaligned data:
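    • If the data are delivered unaligned, request uBAM with the MM/ML tags retained (FASTQ keeps them only if they were explicitly copied into the read header lines, e.g., with samtools fastq -T MM,ML), or obtain the raw signal (POD5/FAST5) if re‑calling in‑house.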

C. In‑house analysis (outline)

  • Align mod‑called reads to reference (if not already) → modBAM.
  • Run modkit to summarize per‑site methylation frequencies (bedMethyl/bedGraph output); convert to bigWig if your genome browser needs it.
  • Use methylartist for regional plots, motif‑centric views, metaplots over features (promoters, operons, RND genes, etc.).
  • Integrate with other omics (RNA‑seq) by averaging methylation in promoter/operon windows and correlating with expression changes.
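
The last step, averaging methylation over promoter windows and correlating with expression, might look like the sketch below. All names and numbers are hypothetical placeholders; real input would come from the per-site table and from your differential-expression results, and with only a handful of genes a correlation is descriptive at best.

```python
from math import sqrt

def window_mean(sites, start, end):
    """Mean fraction modified over sites with start <= position < end.

    sites: iterable of (position, fraction_modified); returns None if empty.
    """
    values = [frac for pos, frac in sites if start <= pos < end]
    return sum(values) / len(values) if values else None

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Made-up per-site fractions and promoter windows for three hypothetical genes.
sites = [(10, 0.2), (20, 0.4), (210, 0.9), (220, 0.7), (410, 0.5)]
promoters = {"geneA": (0, 100), "geneB": (200, 300), "geneC": (400, 500)}
log2_fold_change = {"geneA": 1.5, "geneB": -2.0, "geneC": 0.1}

genes = sorted(promoters)
methylation = [window_mean(sites, *promoters[g]) for g in genes]
expression = [log2_fold_change[g] for g in genes]
print(pearson(methylation, expression))
```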

5) nf‑core/methylong (pipeline overview)

  • Community Nextflow pipeline for ONT methylation. Typical features:
    • Supports Dorado modified‑base calling (or consumes modBAM/CRAM).
    • Performs alignment (e.g., minimap2) to your reference, keeps MM/ML tags.
    • Generates per‑site/per‑region summaries, tracks (bedGraph/bigWig), and QC.
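  • Inputs: modBAM, or raw signal (POD5/FAST5) when the run includes modified‑base calling, plus reference FASTA; sample sheet with metadata. Plain FASTQ is not sufficient because it normally carries no MM/ML information.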
  • Outputs: modBAM/CRAM + indices, per‑site methylation tables, genome tracks, MultiQC‑style reports.

(Exact CLI flags vary by version; coordinate with the provider or your compute environment.)


6) QC & caveats

  • Depth matters: 6‑mA/4‑mC need higher coverage than 5‑mC.
  • Model choice: Use the correct Dorado modified‑base model for your chemistry/flow cell and target modifications.
  • Reference fidelity: Use the same reference throughout (and document version).
  • BAM integrity: Verify MM/ML tags exist; confirm alignment header matches the provided FASTA.
  • Context effects: Methylation calling is k‑mer context‑dependent; some motifs are easier/harder.
  • Biological interpretation: In bacteria, methylation is often tied to R–M systems, replication, and gene regulation; interpret rates in motif/operon context, not only at single CpG‑style sites.
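
The "BAM integrity" check against the reference can be scripted. The sketch below compares the `@SQ` lines of a BAM header (the text printed by `samtools view -H`) with the first two columns of the `.fai` index; the function names and the example contig names are hypothetical.

```python
def parse_sq(header_text: str) -> dict:
    """Return {sequence_name: length} from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths

def parse_fai(fai_text: str) -> dict:
    """Return {sequence_name: length} from .fai text (columns 1 and 2)."""
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths

def reference_mismatches(header_text: str, fai_text: str) -> list:
    """Names that are missing on one side or differ in length."""
    sq, fai = parse_sq(header_text), parse_fai(fai_text)
    return sorted(n for n in set(sq) | set(fai) if sq.get(n) != fai.get(n))

# Made-up example: the plasmid length differs between BAM header and FASTA index.
header = "@HD\tVN:1.6\n@SQ\tSN:chr\tLN:100\n@SQ\tSN:plasmid1\tLN:50\n"
fai = "chr\t100\t5\t60\t61\nplasmid1\t48\t110\t60\t61\n"
print(reference_mismatches(header, fai))
```

Matching names and lengths do not prove the sequences are identical, so keeping the exact FASTA from the provider is still the safer route.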

7) What to ask a provider (email checklist)

  • Will you deliver aligned modBAM/CRAM with MM/ML tags (+ index)?
  • Which modified bases are called (5‑mC, 6‑mA, 4‑mC)? Which Dorado model/version?
  • Do you require us to provide a bacterial reference FASTA for alignment? Will you return the exact reference used?
  • Can you also provide per‑site methylation tracks (bedGraph/bigWig) and a QC report?
  • What coverage will be achieved per sample (target ≥10× for 5‑mC; ≥50× for 6‑mA/4‑mC)?

8) Suggested minimal deliverables

  • modBAM/CRAM aligned to our provided reference (+ .bai/.crai).
  • Reference FASTA and .fai used in alignment/calling.
  • Per‑site tables (tsv) and tracks (bedGraph/bigWig).
  • Brief QC (coverage, fraction modified by motif, per‑site confidence).
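
The "fraction modified by motif" figure in the QC item can be sanity-checked in-house. The sketch below scans the forward strand of a reference for a motif and counts how many covered sites exceed a per-site threshold. The helper names, the 0.5 threshold, and the toy sequence are assumptions for illustration; the motif is given as a plain regular expression (e.g., `CC[AT]GG` for CCWGG), and reverse-strand sites would need a second pass.

```python
import re

def motif_positions(ref_seq: str, motif: str, offset: int) -> list:
    """0-based positions of the modified base in each (overlapping) motif hit.

    offset is the index of the modified base within the motif,
    e.g. 1 for the A in GATC.
    """
    return [m.start() + offset for m in re.finditer(f"(?={motif})", ref_seq)]

def fraction_modified_at_motif(ref_seq, motif, offset, site_fraction, threshold=0.5):
    """Share of covered motif sites whose per-site fraction is >= threshold.

    site_fraction: {position: fraction_modified} for sites with enough depth.
    Returns None if no motif site is covered.
    """
    covered = [p for p in motif_positions(ref_seq, motif, offset) if p in site_fraction]
    if not covered:
        return None
    return sum(site_fraction[p] >= threshold for p in covered) / len(covered)

# Toy reference with two GATC sites; only the first is called as methylated.
ref = "TTGATCAAGATCGG"
per_site = {3: 0.95, 9: 0.10}
print(motif_positions(ref, "GATC", 1))
print(fraction_modified_at_motif(ref, "GATC", 1, per_site))
```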

9) Bacterial project recommendation (one‑liner)

For bacteria, profile 6‑mA (and 4‑mC) as primary targets (≥50×), optionally 5‑mC (≥10× if Dcm‑like activity expected), using Dorado modified‑base calling and aligned modBAM/CRAM with MM/ML tags; summarize with modkit/methylartist and integrate with RNA‑seq.


10) Handy pointers & checks (quick ref)

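  • Check BAM has mods: samtools view mod.bam | head -n 5 → look for MM:Z: and ML:B:C tags on the read lines (omit -h here; it prints the header first, which can fill the head output before any read appears).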
  • Confirm reference: samtools view -H mod.bam | grep '^@SQ' and keep the FASTA.
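  • Summarize (example modkit): modkit pileup mod.bam out.bed --ref ref.fa (arguments are the input BAM, then the output bedMethyl; with --bedgraph the output argument is a directory for per‑modification bedGraph files; to drop low‑MAPQ reads, pre‑filter with samtools view -b -q 20; confirm options with modkit pileup --help for your version).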
  • Visualize: Load bigWig/bedGraph in IGV/JBrowse; overlay RNA‑seq coverage/DE results.
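
If only a bedMethyl table is available, a depth-filtered bedGraph for the browser can be derived from it. This sketch assumes the modkit bedMethyl layout in which column 4 is the modification code, column 10 is the valid coverage, and column 11 is the percent modified; check the documentation for your modkit version before relying on it. `bedmethyl_to_bedgraph` and the example rows are hypothetical.

```python
def bedmethyl_to_bedgraph(lines, mod_code, min_cov=10):
    """Return bedGraph lines (chrom, start, end, fraction 0-1) for one mod code.

    Rows with fewer than min_cov valid reads are dropped. Fields are split on
    any whitespace so both tab-only and mixed tab/space files are handled.
    """
    out = []
    for line in lines:
        f = line.split()
        if line.startswith("#") or len(f) < 11:
            continue
        if f[3] != mod_code or int(f[9]) < min_cov:
            continue
        out.append(f"{f[0]}\t{f[1]}\t{f[2]}\t{float(f[10]) / 100:.4f}")
    return out

# Made-up rows: a well-covered 6-mA site, a shallow 6-mA site, and a 5-mC site.
rows = [
    "chr\t100\t101\ta\t60\t+\t100\t101\t255,0,0\t60\t95.00\t57\t3\t0\t0\t0\t0\t0",
    "chr\t200\t201\ta\t8\t+\t200\t201\t255,0,0\t8\t50.00\t4\t4\t0\t0\t0\t0\t0",
    "chr\t300\t301\tm\t70\t+\t300\t301\t255,0,0\t70\t10.00\t7\t63\t0\t0\t0\t0\t0",
]
for bg_line in bedmethyl_to_bedgraph(rows, "a", min_cov=50):
    print(bg_line)
```

Filtering on depth before visualization keeps low-coverage sites from showing up as spurious 0% or 100% peaks.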

Prepared from the morning discussion to serve as a self‑contained guide and hand‑off document.
