Annotation of Multi-Nucleotide Polymorphisms (MNPs): Dissecting the c.465_469delCATTGinsAATTT Complex Variant (Data_Holger_DNAseq_2026_Sepi_Pairs)

Based on the genetic notation provided, this post explains why only one nucleotide change affects the protein, even though the DNA notation looks like a 5-base substitution.

The Short Answer

At the protein level, the change from Glycine to Cysteine (p.Gly157Cys) is indeed caused by a single nucleotide change within that specific codon.

However, at the DNA level, two nucleotides changed within the 5-base window (CATTG $\rightarrow$ AATTT): C $\rightarrow$ A at position 465 and G $\rightarrow$ T at position 469. Only one amino acid change is reported because the change at position 465 is most likely synonymous (silent), meaning it does not change the amino acid. Bioinformatics tools often group nearby DNA changes into a single “block” notation, which can make the variant look more complicated than it is.

Step-by-Step Breakdown of the Mutation

Let’s map the 5-base change (c.465_469delCATTGinsAATTT) directly to the codons (three-base units that each encode one amino acid) to see exactly what happened.

Since each codon is 3 bases long:

Codon 155 consists of bases 463, 464, and 465.
Codon 156 consists of bases 466, 467, and 468.
Codon 157 consists of bases 469, 470, and 471.
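
The codon assignment above follows from simple integer arithmetic on the 1-based coding position. A minimal sketch, assuming positions are counted from the first base of the start codon (the function name `codon_position` is made up for this example):

```python
def codon_position(cds_pos: int) -> tuple[int, int]:
    """Map a 1-based coding position to (codon number, base within codon)."""
    if cds_pos < 1:
        raise ValueError("coding positions are 1-based")
    codon = (cds_pos - 1) // 3 + 1          # codon 1 covers bases 1..3
    base_in_codon = (cds_pos - 1) % 3 + 1   # 1, 2, or 3
    return codon, base_in_codon


# Walk through the 5-base window of c.465_469
for pos in range(465, 470):
    codon, base = codon_position(pos)
    print(f"c.{pos}: codon {codon}, base {base}")
```

Position 465 lands on the 3rd base of codon 155 and position 469 on the 1st base of codon 157, matching the list above.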

Now, let’s align your reference (CATTG) and alternate (AATTT) sequences across these positions:

DNA Position	Codon Location	Reference Base	Alternate Base	Effect
465	3rd base of Codon 155	C	A	Synonymous (silent), most likely: a change at the 3rd (“wobble”) position often leaves the amino acid unchanged, and the reported protein change (p.Gly157Cys) involves only position 157.
466	1st base of Codon 156	A	A	No change
467	2nd base of Codon 156	T	T	No change
468	3rd base of Codon 156	T	T	No change
469	1st base of Codon 157	G	T	Missense: This changes the first base of Codon 157. Glycine codons have the form `GGN`. Changing the first `G` to a `T` gives `TGN`, and both `TGT` and `TGC` code for Cysteine, so the reference codon must be `GGT` or `GGC`.
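
The effects in the table can be checked against the genetic code. The sketch below assumes the standard codon table and, because the actual reference codons are not given in the notation, enumerates the possibilities instead of guessing them. All variable names are illustrative.

```python
from itertools import product

# Standard genetic code; codons ordered over T, C, A, G with the last base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# c.469G>T: the first base of a glycine codon (GGN) changes from G to T.
gly_first_base_g_to_t = {"GG" + n: CODON_TABLE["TG" + n] for n in BASES}

# c.465C>A: the third base changes from C to A; keep codons where this is silent.
silent_third_base_c_to_a = [
    xy + "C"
    for xy in ("".join(p) for p in product(BASES, repeat=2))
    if CODON_TABLE[xy + "C"] == CODON_TABLE[xy + "A"]
]

print(gly_first_base_g_to_t)
print(silent_third_base_c_to_a)
```

Only `GGT` and `GGC` become Cysteine codons, which is consistent with p.Gly157Cys. A third-base C-to-A change is silent for some codons but not all, so the synonymous call at codon 155 depends on the actual reference codon, which the c. notation alone does not show.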

Why is it written as a 5-base change?

You might wonder why the software wrote delCATTGinsAATTT (which looks like a massive 5-base deletion and insertion) instead of just listing the two single-nucleotide changes.

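This is common behavior in variant calling and annotation pipelines (such as GATK, SnpEff, or VEP, depending on the caller and its settings). When multiple nucleotide changes occur close together on the same sequencing reads, the software may merge them into a single “Complex Variant” or Multi-Nucleotide Polymorphism (MNP). The unchanged bases in between (here ATT at positions 466 to 468) are included in the block, which is why the notation spans 5 bases.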

Instead of outputting two separate lines:

c.465C>A (Silent)
c.469G>T (Missense)

The pipeline outputs them as one combined block event: c.465_469delCATTGinsAATTT.
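
Going the other way, an equal-length delins block can be split back into its single-nucleotide changes by comparing the reference and alternate strings base by base. A minimal sketch (the helper `split_delins` is hypothetical, not part of any of the tools named above, and it only handles blocks where both strings have the same length):

```python
def split_delins(start: int, ref: str, alt: str) -> list[str]:
    """Split an equal-length delins block into HGVS-style single-base substitutions."""
    if len(ref) != len(alt):
        raise ValueError("only equal-length blocks can be split base by base")
    return [
        f"c.{start + offset}{r}>{a}"
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a  # skip bases that are identical in ref and alt
    ]


print(split_delins(465, "CATTG", "AATTT"))
```

For this variant the result is the two substitutions listed above, c.465C>A and c.469G>T.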

Summary

Did only one nucleotide change the protein? Yes. The G $\rightarrow$ T mutation at position 469 is the sole reason the amino acid changed from Glycine to Cysteine.
What about the other change? The C $\rightarrow$ A mutation at position 465 is most likely a “silent” change that does not alter the protein sequence.
Why the confusing notation? The software simply bundled the two nearby DNA events into one line for simplicity, even though they are four bases apart and affect different codons.
