Annotation of Multi-Nucleotide Polymorphisms (MNPs): Dissecting the c.465_469delCATTGinsAATTT Complex Variant (Data_Holger_DNAseq_2026_Sepi_Pairs)

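The variant c.465_469delCATTGinsAATTT is written as a 5-base substitution, yet its protein consequence is a single amino acid change (p.Gly157Cys). Here is a detailed explanation of why it appears that “only one nucleotide changed” the protein, even though the DNA notation looks like a large 5-base substitution.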

The Short Answer

At the protein level, the change from Glycine to Cysteine (p.Gly157Cys) is indeed caused by a single nucleotide change within that specific codon.

However, at the DNA level, two nucleotides actually changed within the 5-base window (CATTG $\rightarrow$ AATTT): the first base (C $\rightarrow$ A) and the last base (G $\rightarrow$ T). The three bases in between are identical. The reason you only see one amino acid change is that the other DNA change is likely synonymous (silent), meaning it does not change the amino acid. Bioinformatics tools often group nearby DNA changes into a single “block” notation, which can make the variant look more complicated than it is.


Step-by-Step Breakdown of the Mutation

Let’s map the 5-base change (c.465_469delCATTGinsAATTT) directly to the codons (the three-base units that each encode one amino acid) to see exactly what happened.

Since each codon is 3 bases long:

  • Codon 155 consists of bases 463, 464, and 465.
  • Codon 156 consists of bases 466, 467, and 468.
  • Codon 157 consists of bases 469, 470, and 471.
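
The codon assignments listed above follow from simple integer arithmetic on the coding-sequence (c.) position. The following is a minimal sketch, assuming 1-based positions counted from the A of the start codon and no intervening non-coding offsets; the function name `codon_position` is made up for this illustration.

```python
from typing import Tuple


def codon_position(c_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding position to (codon number, base within codon).

    Both returned values are 1-based: base 1 is the first letter of the codon.
    """
    if c_pos < 1:
        raise ValueError("coding positions are 1-based")
    codon_number = (c_pos - 1) // 3 + 1
    base_in_codon = (c_pos - 1) % 3 + 1
    return codon_number, base_in_codon


# Walk across the 5-base window of c.465_469delCATTGinsAATTT.
for pos in range(465, 470):
    codon, base = codon_position(pos)
    print(f"c.{pos}: codon {codon}, base {base}")
```

Positions 465 and 469 land in codons 155 and 157 respectively, which is why the two substitutions affect different amino acid positions.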

Now, let’s align your reference (CATTG) and alternate (AATTT) sequences across these positions:

| DNA Position | Codon Location | Reference Base | Alternate Base | Effect |
|---|---|---|---|---|
| 465 | 3rd base of Codon 155 | C | A | Synonymous (silent): because this is the 3rd “wobble” position of the codon, this change likely does not alter the amino acid at position 155. |
| 466 | 1st base of Codon 156 | A | A | No change |
| 467 | 2nd base of Codon 156 | T | T | No change |
| 468 | 3rd base of Codon 156 | T | T | No change |
| 469 | 1st base of Codon 157 | G | T | Missense: this changes the first letter of Codon 157. Glycine codons start with GG... (e.g., GGT). Changing the G to a T makes it TG... (e.g., TGT), which codes for Cysteine. For the result to be Cysteine, the reference codon must be GGT or GGC. |
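
The two effects in the table above can be checked against the standard genetic code. The sketch below is illustrative only: the actual reference codons at positions 155 and 157 are not given here, so it enumerates every possibility rather than assuming one. The helper names (`substitute`, `glycine_first_base_g_to_t`, `third_base_c_to_a_is_silent`) are made up for this example.

```python
from itertools import product
from typing import Dict

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitute(codon: str, index: int, new_base: str) -> str:
    """Return the codon with the base at 0-based `index` replaced."""
    return codon[:index] + new_base + codon[index + 1:]


def glycine_first_base_g_to_t() -> Dict[str, str]:
    """For each glycine codon, the amino acid encoded after a G>T change at base 1."""
    return {
        codon: CODON_TABLE[substitute(codon, 0, "T")]
        for codon, aa in CODON_TABLE.items()
        if aa == "G"
    }


def third_base_c_to_a_is_silent() -> Dict[str, bool]:
    """For each codon ending in C, whether a C>A change at base 3 is synonymous."""
    return {
        codon: CODON_TABLE[codon] == CODON_TABLE[substitute(codon, 2, "A")]
        for codon in CODON_TABLE
        if codon.endswith("C")
    }


print(glycine_first_base_g_to_t())
silent = third_base_c_to_a_is_silent()
print(sum(silent.values()), "of", len(silent), "codons ending in C stay synonymous")
```

Under the standard code, only GGT and GGC become Cysteine codons after the G $\rightarrow$ T change, and a third-position C $\rightarrow$ A change is silent for some codons but not all. That is why the effect at position 465 is described as “likely” synonymous; confirming it requires the actual reference codon.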

Why is it written as a 5-base change?

You might wonder why the software wrote delCATTGinsAATTT (which looks like a massive 5-base deletion and insertion) instead of just listing the two single-nucleotide changes.

This is common behavior in variant calling and annotation pipelines (like GATK, SnpEff, or VEP). When multiple nucleotide changes occur very close to each other on the same sequencing reads, the software may group them together into a single “Complex Variant” or Multi-Nucleotide Polymorphism (MNP), which is written in HGVS notation as a deletion-insertion (delins).

Instead of outputting two separate lines:

  1. c.465C>A (Silent)
  2. c.469G>T (Missense)

the pipeline outputs them as one combined block event: c.465_469delCATTGinsAATTT.
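
Going the other way, a same-length delins block can be split back into its individual substitutions by comparing the reference and alternate strings base by base. This is a minimal sketch, not how any particular pipeline does it; `split_delins` is a hypothetical helper, and it assumes the deleted and inserted sequences have equal length.

```python
from typing import List


def split_delins(start: int, ref: str, alt: str) -> List[str]:
    """Split an equal-length delins into single-nucleotide substitutions.

    `start` is the 1-based coding position of the first deleted base.
    Positions where ref and alt agree are skipped.
    """
    if len(ref) != len(alt):
        raise ValueError("this sketch only handles equal-length ref/alt")
    return [
        f"c.{start + offset}{ref_base}>{alt_base}"
        for offset, (ref_base, alt_base) in enumerate(zip(ref, alt))
        if ref_base != alt_base
    ]


# c.465_469delCATTGinsAATTT
print(split_delins(465, "CATTG", "AATTT"))
```

For this variant the helper returns the two substitutions listed above, c.465C>A and c.469G>T.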

Summary

  • Did only one nucleotide change the protein? Yes. The G $\rightarrow$ T mutation at position 469 is the sole reason the amino acid changed from Glycine to Cysteine.
  • What about the other change? The C $\rightarrow$ A mutation at position 465 is most likely a “silent” mutation: it falls on the third base of Codon 155, and no amino acid change is reported at that position.
  • Why the confusing notation? The software simply bundled the two nearby DNA events into one line for simplicity, even though they affect different codons.
