Scope: What methylation is (5‑mC, 6‑mA, 4‑mC), how Oxford Nanopore (ONT) detects it, how it differs from bisulfite sequencing, required coverage, file types (modBAM/CRAM with MM/ML tags), basecalling models (Dorado), practical workflows, pipelines (nf-core/methylong), deliverables to request from providers (e.g., Novogene), and specific advice for bacterial projects.
1) What are 5‑mC, 6‑mA, 4‑mC?
- 5‑mC (5‑methylcytosine): methyl group on cytosine C5 carbon. In eukaryotes strongly linked to gene regulation (CpG), chromatin state, imprinting. Also present in some bacteria (e.g., Dcm at CCWGG).
- 6‑mA (N6‑methyladenine): methyl on adenine N6. Very common in bacteria/archaea (e.g., Dam at GATC), functions in restriction–modification (R–M), mismatch repair, replication control, and gene regulation.
- 4‑mC (N4‑methylcytosine): methyl on cytosine N4, mostly in bacteria/archaea (R–M and regulation).
Coverage guidance (ONT direct detection):
- ≥ 10× for 5‑mC calling/quantification.
- ≥ 50× for 6‑mA and 4‑mC (signals are weaker; models need depth).
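The depth targets above translate into a sequencing yield per sample. A minimal sketch of that arithmetic follows. The 5 Mb genome size and the `usable_fraction` parameter (share of raw bases that survive filtering and map to the reference) are assumptions for illustration, not values from the discussion.

```python
def required_yield_bases(genome_size_bp, target_depth, usable_fraction=1.0):
    """Raw bases needed so that mapped depth reaches target_depth.

    usable_fraction is a hypothetical allowance for reads lost to
    filtering or failing to map; 1.0 means no loss.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return genome_size_bp * target_depth / usable_fraction


GENOME_SIZE_BP = 5_000_000            # assumed genome size, illustration only
TARGETS = {"5mC": 10, "6mA/4mC": 50}  # depth targets from the guidance above

for mod, depth in TARGETS.items():
    megabases = required_yield_bases(GENOME_SIZE_BP, depth, usable_fraction=0.8) / 1e6
    print(f"{mod}: about {megabases:.1f} Mb of raw sequence per sample")
```

Multiply by the number of samples to get a rough run-level target to quote to the provider.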
2) How ONT detects methylation (no chemical conversion)
- ONT does not convert bases (unlike bisulfite sequencing which converts un‑methylated C → U → read as T). ONT reads remain A/C/G/T.
- ONT measures ionic current while DNA k‑mers pass the pore. Modified bases (5‑mC/6‑mA/4‑mC) slightly shift current distributions.
- A modified‑base basecaller (now Dorado; historically Guppy+Remora) decodes those shifts and writes the calls into BAM/CRAM (aligned or unaligned) as MM/ML tags:
  - MM: which canonical base and modification type were called, plus the per‑read positions, encoded as skip counts along the read.
  - ML: per‑call modification probabilities, scaled to integers 0–255.
- Downstream tools (e.g., modkit, methylartist, nf‑core/methylong) summarize per‑site/per‑region methylation and export BED/bedGraph/bigWig for visualization/statistics.
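To make the MM/ML encoding concrete, here is a minimal sketch of decoding the two tags for one read. It assumes a forward-strand read, `+` strand entries, and one modification code per MM entry. Reverse-strand alignments and multi-code entries such as `C+mh` need extra handling that tools like modkit already provide. The function name and the example read are hypothetical.

```python
def parse_mm_ml(seq, mm, ml):
    """Decode MM/ML tags into (read_position, mod_code, probability) tuples.

    seq: read sequence as originally basecalled
    mm:  MM tag value, e.g. "C+m,1,1;A+a,0;"
    ml:  list of ML integers (0-255), in the same order as the MM calls
    """
    calls = []
    ml_index = 0
    for entry in mm.rstrip(";").split(";"):
        if not entry:
            continue
        fields = entry.split(",")
        header = fields[0]                # e.g. "C+m", "A+a?" or "C+21839"
        base, strand = header[0], header[1]
        code = header[2:].rstrip(".?")    # drop the optional '.' / '?' flag
        if strand != "+" or (len(code) > 1 and not code.isdigit()):
            raise NotImplementedError("sketch handles '+' strand, single code only")
        # Read positions of every occurrence of the canonical base.
        base_positions = [i for i, b in enumerate(seq) if b == base]
        cursor = 0
        for skip in (int(x) for x in fields[1:]):
            cursor += skip                # skip this many unmodified bases
            # ML value n covers the probability range [n/256, (n+1)/256).
            probability = (ml[ml_index] + 0.5) / 256
            calls.append((base_positions[cursor], code, probability))
            ml_index += 1
            cursor += 1                   # move past the base just reported
    return calls


# Hypothetical read: Cs at indices 1, 4, 7, 8 and As at indices 0, 6.
for pos, code, prob in parse_mm_ml("ACGTCGACCG", "C+m,1,1;A+a,0;", [255, 10, 200]):
    print(pos, code, round(prob, 3))
```

The read letters are never altered; the tags only point at positions in the unchanged sequence.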
Key contrast with bisulfite (BS‑seq):
- BS‑seq chemically converts un‑methylated C to U (sequenced as T) → uses base changes to infer methylation.
- ONT uses signal differences; no base letters change. Methylation is metadata in BAM tags, not edits in the sequence.
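The contrast can be illustrated with a toy example: bisulfite changes the letters of the read, whereas ONT leaves them intact and attaches the call as metadata. This is a sketch only. The sequence, the methylated position, and the probability are made up, and real BS‑seq analysis also has to deal with strand and incomplete conversion.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment of one strand: unmethylated C reads as T."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


original = "ACGTCGACCG"
methylated = {4}  # hypothetical: only the C at index 4 carries 5-mC

# BS-seq: the read itself changes, and methylation is inferred from
# which reference Cs still read as C.
bs_read = bisulfite_convert(original, methylated)
inferred = {
    i for i, (ref, obs) in enumerate(zip(original, bs_read))
    if ref == "C" and obs == "C"
}

# ONT: the read keeps its letters; the call travels alongside as metadata
# (the probability here is a made-up value).
ont_read = {"seq": original, "mods": {4: ("m", 0.97)}}

print(bs_read, inferred, ont_read["seq"])
```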
3) Data types & what you need (modBAM vs “assembly” reads)
- ONT reads generated earlier for genome assembly are typically standard basecalls (A/C/G/T only) and lack MM/ML tags, so they are not suitable for methylation quantification.
- For methylation analysis you need one of:
  - An aligned modified‑base BAM/CRAM (modBAM/CRAM) from the provider, with MM/ML tags and indices (.bai/.crai).
  - Raw signal files (POD5/FAST5) that you re‑basecall with a modified‑base Dorado model and then align to your reference (producing modBAM). FASTQ alone cannot be re‑basecalled because it no longer contains the signal.
Reference genome requirement:
- For aligned BAM, you (or the provider) must map to a reference FASTA. Keep the exact FASTA (and .fai) used for reproducibility and downstream summarization.
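One quick way to confirm that a delivered BAM was aligned to the FASTA you hold is to compare the `@SQ` lines of the BAM header with the `.fai` index. The sketch below works on text you have already captured (header text from `samtools view -H`, and the contents of the `.fai` file). The function names, contig names, and lengths are hypothetical.

```python
def parse_sq_lengths(header_text):
    """Map contig name -> length from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def parse_fai_lengths(fai_text):
    """Map contig name -> length from the first two columns of a .fai file."""
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def reference_mismatches(header_text, fai_text):
    """List contigs that are missing or differ in length between the two."""
    bam, fai = parse_sq_lengths(header_text), parse_fai_lengths(fai_text)
    problems = []
    for name in sorted(set(bam) | set(fai)):
        if name not in fai:
            problems.append(f"{name}: in BAM header but not in FASTA index")
        elif name not in bam:
            problems.append(f"{name}: in FASTA index but not in BAM header")
        elif bam[name] != fai[name]:
            problems.append(
                f"{name}: length {bam[name]} in BAM header vs {fai[name]} in FASTA index"
            )
    return problems


# Hypothetical inputs for illustration.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chromosome\tLN:5000000\n@SQ\tSN:plasmid1\tLN:60000\n"
fai = "chromosome\t5000000\t12\t60\t61\nplasmid1\t59000\t5083400\t60\t61\n"
print(reference_mismatches(header, fai))
```

Matching names and lengths are a necessary check, not proof of identical sequence; keeping the provider's exact FASTA remains the safer route.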
4) Practical workflow (bacteria)
A. Planning & sequencing
- Decide targets: in bacteria prioritize 6‑mA/4‑mC; optionally 5‑mC (if Dcm or another 5‑mC methyltransferase is present; Dam is a 6‑mA enzyme and falls under the primary targets).
- Coverage targets: ≥50× (6‑mA/4‑mC), ≥10× (5‑mC).
- Ask provider to run Dorado (modified‑base model) and deliver aligned modBAM/CRAM with MM/ML tags.
B. Inputs/outputs to request from provider (e.g., Novogene)
- Deliverables:
  - modBAM/CRAM (aligned to our provided reference), with MM/ML tags + .bai/.crai.
  - Optional per‑site tracks: BED/bedGraph/bigWig and a QC report.
- Reference:
  - Can we provide the bacterial reference FASTA? Will they return the exact FASTA (and .fai) used?
- Models & modifications:
  - Which Dorado model version and which mods (5‑mC, 6‑mA, 4‑mC) are called by default?
- Unaligned data:
  - If they deliver unaligned BAM (uBAM) or FASTQ, request that the modified‑base tags (MM/ML) are still included; plain FASTQ normally drops them. Otherwise obtain the raw signal (POD5/FAST5) for re‑calling in‑house.
C. In‑house analysis (outline)
- Align mod‑called reads to reference (if not already) → modBAM.
- Run modkit to summarize per‑site methylation frequencies (bedMethyl/bedGraph); convert to bigWig with a separate tool (e.g., bedGraphToBigWig) if needed.
- Use methylartist for regional plots, motif‑centric views, metaplots over features (promoters, operons, RND genes, etc.).
- Integrate with other omics (RNA‑seq) by averaging methylation in promoter/operon windows and correlating with expression changes.
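The integration step above reduces per‑site calls to one value per promoter or operon window. A minimal sketch of that reduction follows, assuming the per‑site table has already been loaded into tuples. The tuple layout, window names, coordinates, and the `min_coverage` default are all hypothetical; map them onto whatever columns your per‑site table actually has.

```python
def window_mean_methylation(sites, windows, min_coverage=10):
    """Average fraction modified per window.

    sites:   list of (contig, position, fraction_modified, coverage),
             positions 0-based
    windows: dict name -> (contig, start, end), 0-based half-open
    Returns dict name -> mean fraction, or None if no site passes the filter.
    """
    result = {}
    for name, (contig, start, end) in windows.items():
        values = [
            frac for c, pos, frac, cov in sites
            if c == contig and start <= pos < end and cov >= min_coverage
        ]
        result[name] = sum(values) / len(values) if values else None
    return result


# Hypothetical per-site calls and promoter windows.
sites = [
    ("chromosome", 105, 0.90, 60),
    ("chromosome", 150, 0.70, 55),
    ("chromosome", 180, 0.10, 4),   # dropped: coverage below min_coverage
    ("chromosome", 900, 0.20, 70),
]
windows = {
    "geneA_promoter": ("chromosome", 100, 200),
    "geneB_promoter": ("chromosome", 850, 950),
    "geneC_promoter": ("chromosome", 2000, 2100),
}
print(window_mean_methylation(sites, windows))
```

The resulting per‑window values can then be joined to RNA‑seq fold changes by gene name and correlated with the statistics package of your choice.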
5) nf‑core/methylong (pipeline overview)
- Community Nextflow pipeline for ONT methylation. Typical features:
  - Supports Dorado modified‑base calling (or consumes modBAM/CRAM).
  - Performs alignment (e.g., minimap2) to your reference, keeps MM/ML tags.
  - Generates per‑site/per‑region summaries, tracks (bedGraph/bigWig), and QC.
- Inputs: modBAM, or raw signal if the pipeline version in use performs basecalling (plain FASTQ carries no modification information), plus reference FASTA; sample sheet with metadata.
- Outputs: modBAM/CRAM + indices, per‑site methylation tables, genome tracks, MultiQC‑style reports.
(Exact CLI flags vary by version; coordinate with the provider or your compute environment.)
6) QC & caveats
- Depth matters: 6‑mA/4‑mC need higher coverage than 5‑mC.
- Model choice: Use the correct Dorado modified‑base model for your chemistry/flow cell and target modifications.
- Reference fidelity: Use the same reference throughout (and document version).
- BAM integrity: Verify MM/ML tags exist; confirm alignment header matches the provided FASTA.
- Context effects: Methylation calling is k‑mer context‑dependent; some motifs are easier/harder.
- Biological interpretation: In bacteria, methylation is often tied to R–M systems, replication, and gene regulation; interpret rates in motif/operon context, not only at single CpG‑style sites.
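To read rates in motif context, per‑site calls can be grouped by motif. This is a minimal sketch for Dam (GATC, methylated at the A) on the forward strand only. The function names, the toy reference, the per‑site fractions, and the 0.5 threshold are assumptions for illustration.

```python
def motif_target_positions(reference, motif="GATC", target_offset=1):
    """0-based positions of the target base in every motif occurrence.

    target_offset is the index of the methylated base within the motif
    (the A in GATC is at offset 1). Overlapping occurrences are included.
    """
    positions = []
    start = reference.find(motif)
    while start != -1:
        positions.append(start + target_offset)
        start = reference.find(motif, start + 1)
    return positions


def motif_methylated_fraction(reference, site_fraction, motif="GATC",
                              target_offset=1, threshold=0.5):
    """Share of covered motif sites whose fraction modified is >= threshold.

    site_fraction: dict position -> fraction of reads called modified.
    Returns None if no motif site has a call.
    """
    targets = motif_target_positions(reference, motif, target_offset)
    covered = [p for p in targets if p in site_fraction]
    if not covered:
        return None
    methylated = sum(1 for p in covered if site_fraction[p] >= threshold)
    return methylated / len(covered)


# Toy reference with GATC at indices 2 and 8 (target As at 3 and 9).
reference = "TTGATCAAGATCGG"
site_fraction = {3: 0.95, 9: 0.10, 6: 0.02}  # hypothetical per-site calls
print(motif_target_positions(reference))
print(motif_methylated_fraction(reference, site_fraction))
```

A motif that is near 100% modified genome‑wide with a few unmethylated sites is often more informative than the average rate; those exceptions are the sites to inspect against operon and promoter annotation.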
7) What to ask a provider (email checklist)
- Will you deliver aligned modBAM/CRAM with MM/ML tags (+ index)?
- Which modified bases are called (5‑mC, 6‑mA, 4‑mC)? Which Dorado model/version?
- Do you require us to provide a bacterial reference FASTA for alignment? Will you return the exact reference used?
- Can you also provide per‑site methylation tracks (bedGraph/bigWig) and a QC report?
- What coverage will be achieved per sample (target ≥10× for 5‑mC; ≥50× for 6‑mA/4‑mC)?
8) Suggested minimal deliverables
- modBAM/CRAM aligned to our provided reference (+ .bai/.crai).
- Reference FASTA and .fai used in alignment/calling.
- Per‑site tables (tsv) and tracks (bedGraph/bigWig).
- Brief QC (coverage, fraction modified by motif, per‑site confidence).
9) Bacterial project recommendation (one‑liner)
For bacteria, profile 6‑mA (and 4‑mC) as primary targets (≥50×), optionally 5‑mC (≥10× if Dcm‑like activity expected), using Dorado modified‑base calling and aligned modBAM/CRAM with MM/ML tags; summarize with modkit/methylartist and integrate with RNA‑seq.
10) Handy pointers & checks (quick ref)
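- Check BAM has mods:
  samtools view mod.bam | head -n 5
  → look for MM:Z: and ML:B:C tags on the read records (omit -h here, otherwise head may show only header lines).
- Confirm reference:
  samtools view -H mod.bam | grep '^@SQ'
  and keep the FASTA.
- Summarize (example modkit; confirm options with modkit pileup --help for your version):
  modkit pileup mod.bam out.bed --ref ref.fa
  Add --bedgraph to write bedGraph files instead (the output path is then treated as a directory). To restrict to confident alignments, filter beforehand, e.g. samtools view -b -q 20 mod.bam > mod.q20.bam, then index the filtered BAM.
- Visualize: Load bigWig/bedGraph in IGV/JBrowse; overlay RNA‑seq coverage/DE results.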
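The MM/ML presence check can also be scripted, for example over a sample of records piped from `samtools view`. A minimal sketch that works on SAM‑format text lines; the function names and example records are hypothetical.

```python
def has_mod_tags(sam_record):
    """True if a SAM-format read record carries both MM and ML tags."""
    tags = sam_record.rstrip("\n").split("\t")[11:]  # optional fields start at column 12
    # Lowercase Mm/Ml were written by some older basecaller versions.
    has_mm = any(t.startswith(("MM:Z:", "Mm:Z:")) for t in tags)
    has_ml = any(t.startswith(("ML:B:C", "Ml:B:C")) for t in tags)
    return has_mm and has_ml


def fraction_with_mod_tags(sam_records):
    """Share of non-header records that carry modification tags."""
    reads = [r for r in sam_records if r.strip() and not r.startswith("@")]
    if not reads:
        return 0.0
    return sum(has_mod_tags(r) for r in reads) / len(reads)


# Hypothetical records: the first has tags, the second does not.
core = ["0", "chromosome", "101", "60", "10M", "*", "0", "0", "ACGTCGACCG", "*"]
with_tags = "\t".join(["read1"] + core + ["MM:Z:C+m,1,1;", "ML:B:C,255,10"])
without_tags = "\t".join(["read2"] + core + ["NM:i:0"])
print(fraction_with_mod_tags([with_tags, without_tags]))
```

A fraction near zero on a delivered "modBAM" means the file came from a standard basecalling run and needs to be regenerated from raw signal.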
Prepared from the morning discussion to serve as a self‑contained guide and hand‑off document.