cfDNA Sequencing: Technological Approaches and Bioinformatic Issues


Tags: research

Cell-free circulating DNA (cfDNA) refers to DNA fragments present outside of cells in body fluids such as plasma, urine, and cerebrospinal fluid (CSF). cfDNA was first identified in 1948 in the plasma of healthy individuals [1]. Later studies showed that the quantity of cfDNA in the blood increases under pathological conditions such as autoimmune diseases [2] and cancers [3]. In 1989, Philippe Anker and Maurice Stroun, from the University of Geneva, demonstrated that cfDNA from cancer patients carries the characteristics of DNA from tumor cells [4]. Next, using the then recently developed PCR technique, David Sidransky and his team found the same TP53 mutations in bladder tumor samples and in urine pellets from patients [5]. The search for genomic anomalies specific to a cancer type in circulating DNA, such as NRAS and KRAS mutations or HER-2 amplifications [6,7,8], then began to expand, and the term circulating tumor DNA (ctDNA) appeared for the first time.

Since circulating DNA of tumoral origin was first described, technological developments in molecular biology, from quantitative and digital PCR to next-generation sequencing (NGS), have turned it into a powerful liquid biopsy tool. In the era of precision medicine, it is crucial to identify molecular alterations that can guide the therapeutic management of patients. Because tumors release DNA into the blood and other body fluids such as urine, this circulating tumor DNA, which carries the molecular characteristics of the tumor, can be collected with a simple body fluid sample. Since it is minimally invasive, a liquid biopsy is easily repeated during follow-up and in case of relapse. It is also of major interest in cancers where a tumor biopsy is difficult to obtain, such as primary central nervous system lymphoma [9], or in subtypes whose tissue biopsies contain very few tumor cells, such as Hodgkin lymphoma (HL), in which Reed–Sternberg cells represent only 0.1 to 2% of the tumor mass [10,11]. In these conditions and malignancies, sequencing ctDNA from body fluids could serve as a surrogate for a tumor biopsy. Body fluids other than blood are often used depending on the location of the tumor, such as urine for bladder cancers or cerebrospinal fluid for brain tumors [9,12], but blood remains the fluid most often used in studies.

In blood, the cfDNA concentration in healthy individuals ranges between 0 and 100 ng/mL of plasma, with an average of 30 ng/mL. It is significantly higher in cancer patients, varying between 0 and 1000 ng/mL with an average of 180 ng/mL [13]. This concentration correlates with tumor size and with cancer stage, increasing at higher stages. Circulating DNA of tumoral origin represents from 0.01% to more than 90% of the total cell-free DNA found in blood [14]. A large-scale ctDNA sequencing study across different cancer types has shown an association between ctDNA levels and mutational tumor burden [15]. Moreover, given the spatial heterogeneity observed in tumor tissue, ctDNA analysis can determine the complete molecular landscape of a patient's tumor and give supplementary information on drug-targetable alterations and resistance variants [16]. ctDNA kinetics during follow-up correlate with prognosis: a drastic reduction in ctDNA level after treatment is associated with a better prognosis, whereas an increase usually signals the emergence of drug-resistant clones and eventual therapeutic failure [17,18,19,20].
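To put these concentrations in terms of molecules, a rough back-of-the-envelope conversion can help. The sketch below assumes that one haploid human genome weighs about 3.3 pg, an approximation that is not stated in the text above. The function names `genome_equivalents` and `tumor_copies` are hypothetical helpers introduced only for illustration.

```shell
# Rough conversion of a plasma cfDNA concentration into haploid genome
# equivalents (GE) per mL of plasma.
# ASSUMPTION: one haploid human genome weighs about 3.3 pg.

genome_equivalents() {
    # $1 = cfDNA concentration in ng/mL (1 ng = 1000 pg)
    LC_ALL=C awk -v conc="$1" 'BEGIN { printf "%.0f\n", conc * 1000 / 3.3 }'
}

tumor_copies() {
    # $1 = cfDNA concentration in ng/mL
    # $2 = fraction of cfDNA that is tumor-derived, in percent
    LC_ALL=C awk -v conc="$1" -v pct="$2" \
        'BEGIN { printf "%.1f\n", conc * 1000 / 3.3 * pct / 100 }'
}

genome_equivalents 30     # average reported for healthy individuals
genome_equivalents 180    # average reported for cancer patients
tumor_copies 30 0.01      # tumor-derived copies per mL at a 0.01% fraction
```

Under this assumption, 30 ng/mL corresponds to roughly 9,000 haploid genome copies per mL, so at a 0.01% tumor fraction fewer than one tumor-derived copy of any given locus is expected per mL of plasma.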

Detecting ctDNA remains a challenge both during minimal residual disease (MRD) follow-up, to predict early relapse, and at diagnosis in early-stage cancer, because tumor DNA may make up <0.01% of total circulating DNA [21,22]. Increasingly sensitive sequencing technologies allow the detection of alterations present in cfDNA at very low variant allele frequencies (VAF), not only for mutational profiling at diagnosis but also for early detection of disease recurrence and monitoring of therapy response. However, several parameters can affect the sensitivity of ctDNA detection. First, adequate handling of the blood sample, from collection to quality control of the extracted cfDNA, is crucial. Next, an important step is the choice of the biomarker(s) and of the sequencing technology used to detect them. Finally, bioinformatic analysis using error suppression algorithms is the ultimate tool for discriminating true variants from false positives.
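Why a VAF near 0.01% is so hard to detect can be illustrated with a simple sampling argument. The sketch below assumes that mutant fragments are sampled according to a Poisson distribution, so that the probability of capturing at least one mutant copy is 1 - exp(-N × VAF), where N is the number of genome equivalents analysed. This model and the helper name `detection_probability` are illustrative assumptions, not a method taken from the text.

```shell
# Probability that at least one mutant fragment is present among the
# genome equivalents analysed.
# ASSUMPTION: Poisson sampling, P = 1 - exp(-N * VAF).

detection_probability() {
    # $1 = number of haploid genome equivalents analysed
    # $2 = variant allele frequency in percent
    LC_ALL=C awk -v n="$1" -v vaf="$2" 'BEGIN {
        lambda = n * vaf / 100          # expected number of mutant copies
        printf "%.3f\n", 1 - exp(-lambda)
    }'
}

detection_probability 10000 0.01   # about 1 expected mutant copy
detection_probability 30000 0.01   # about 3 expected mutant copies
```

With 10,000 genome equivalents the chance of sampling even one mutant molecule is only about 63%, and tripling the input raises it to about 95%. This is an upper bound on sensitivity: it ignores losses during extraction and library preparation, as well as the sequencing errors that error suppression algorithms are meant to filter out.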



  1. Using DAMIAN to analyse the cfDNA sequencing data

    cd ~/Tools/damian/databases/blast
    damian.rb --host human3 --type dna -1 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20428/neg_control_S2_R1_001.fastq.gz -2 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20428/neg_control_S2_R2_001.fastq.gz --sample neg_control_S2_megablast --blastn never --blastp never --min_contiglength 200 --threads 55 --force
    zip -r neg_control_S2_megablast/
    echo -e "Hi Nicole,\n\nPlease find attached the latest results from our DAMIAN analysis.\n\nBest,\nJiabin " | mutt -a "./" -s "New results from DAMIAN" -- ","
    damian.rb --host human3 --type dna -1 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20429/635724976_S_aureus_epidermidis_S3_R1_001.fastq.gz -2 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20429/635724976_S_aureus_epidermidis_S3_R2_001.fastq.gz --sample 635724976_S_aureus_epidermidis_S3_megablast --blastn never --blastp never --min_contiglength 200 --threads 55 --force
    zip -r 635724976_S_aureus_epidermidis_S3_megablast/
    echo -e "Hi Nicole,\n\nPlease find attached the latest results from our DAMIAN analysis.\n\nBest,\nJiabin " | mutt -a "./" -s "New results from DAMIAN" -- ","
    damian.rb --host human3 --type dna -1 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20430/635290002_CMV_S4_R1_001.fastq.gz -2 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20430/635290002_CMV_S4_R2_001.fastq.gz --sample 635290002_CMV_S4_megablast --blastn never --blastp never --min_contiglength 200 --threads 55 --force
    zip -r 635290002_CMV_S4_megablast/
    echo -e "Hi Nicole,\n\nPlease find attached the latest results from our DAMIAN analysis.\n\nBest,\nJiabin " | mutt -a "./" -s "New results from DAMIAN" -- ","
    damian.rb --host human3 --type dna -1 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20431/635850623_EBV_S5_R1_001.fastq.gz -2 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20431/635850623_EBV_S5_R2_001.fastq.gz --sample 635850623_EBV_S5_megablast --blastn never --blastp never --min_contiglength 200 --threads 55 --force
    zip -r 635850623_EBV_S5_megablast/
    echo -e "Hi Nicole,\n\nPlease find attached the latest results from our DAMIAN analysis.\n\nBest,\nJiabin " | mutt -a "./" -s "New results from DAMIAN" -- ","
    damian.rb --host human3 --type dna -1 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20427/635031018_E_faecium_S1_R1_001.fastq.gz -2 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20427/635031018_E_faecium_S1_R2_001.fastq.gz --sample 635031018_E_faecium_S1_megablast --blastn never --blastp never --min_contiglength 200 --threads 55 --force
    zip -r 635031018_E_faecium_S1_megablast/
    echo -e "Hi Nicole,\n\nPlease find attached the latest results from our DAMIAN analysis.\n\nBest,\nJiabin " | mutt -a "./" -s "New results from DAMIAN" -- ","
  2. Using vrap to analyse the cfDNA sequencing data

    conda activate vrap
    cd vrap_outputs
    ln -s ~/Tools/vrap .
    vrap/ -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20427/635031018_E_faecium_S1_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20427/635031018_E_faecium_S1_R2_001.fastq.gz -o E_faecium_S1_vrap_out --host /home/jhuang/REFs/genome.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200
    vrap/ -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20428/neg_control_S2_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20428/neg_control_S2_R2_001.fastq.gz -o neg_control_S2_vrap_out --host /home/jhuang/REFs/genome.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200
    vrap/ -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20429/635724976_S_aureus_epidermidis_S3_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20429/635724976_S_aureus_epidermidis_S3_R2_001.fastq.gz -o S_aureus_epidermidis_S3_vrap_out --host /home/jhuang/REFs/genome.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200
    # txid10358 = Cytomegalovirus; it replaces txid10239 (Viruses) in the database query
    sed -i -e 's/txid10239/txid10358/g' vrap/
    sed -i -e 's/retmax=100000/retmax=10000000/g' vrap/
    vrap/ -u
    vrap/ -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20430/635290002_CMV_S4_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20430/635290002_CMV_S4_R2_001.fastq.gz -o CMV_S4_vrap_out --host /home/jhuang/REFs/genome.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200 #[--virus=Cytomegalovirus.fasta]
    mv vrap/database/viral_db vrap/database/viral_db_CMV
    sed -i -e 's/txid10358/txid10376/g' vrap/
    vrap/ -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20431/635850623_EBV_S5_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20431/635850623_EBV_S5_R2_001.fastq.gz -o EBV_S5_vrap_out --host /home/jhuang/REFs/genome.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200 #[--virus=Epstein-Barr-Virus.fasta]
    mv vrap/database/viral_db vrap/database/viral_db_EBV
    mv vrap/database/viral_db_orig vrap/database/viral_db
    vrap/ -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20430/635290002_CMV_S4_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20430/635290002_CMV_S4_R2_001.fastq.gz -o CMV_S4_vrap_out_host_CMV --host vrap/database/viral_db_CMV/nucleotide.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200
    vrap/ -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20431/635850623_EBV_S5_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20431/635850623_EBV_S5_R2_001.fastq.gz -o EBV_S5_vrap_out_host_EBV --host vrap/database/viral_db_EBV/nucleotide.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200
    # TODO: send her the samtools flagstat mapping statistics, IGV screenshots of the mapped reads, and the BAM and FASTA files.
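The five DAMIAN runs in step 1 differ only in the sample prefix, so they could be generated from a list instead of being typed out. The following is a minimal dry-run sketch: it only prints the commands, reusing the flags and paths exactly as they appear above. The names `RUN_DIR`, `SAMPLES`, `damian_cmd`, and `print_all_damian_cmds` are hypothetical and introduced for illustration.

```shell
# Dry run: print one damian.rb command per sample instead of executing it.
# Paths and flags are copied from the commands in step 1; the variable and
# function names are illustrative assumptions.

RUN_DIR=./231114_VH00358_62_AACYCYWM5_cfDNA

# Sample prefixes in the form <project_dir>/<sample_name>
SAMPLES="
p20427/635031018_E_faecium_S1
p20428/neg_control_S2
p20429/635724976_S_aureus_epidermidis_S3
p20430/635290002_CMV_S4
p20431/635850623_EBV_S5
"

damian_cmd() {
    prefix=$1                 # e.g. p20428/neg_control_S2
    sample=${prefix##*/}      # strip the project directory -> neg_control_S2
    printf 'damian.rb --host human3 --type dna -1 %s -2 %s --sample %s --blastn never --blastp never --min_contiglength 200 --threads 55 --force\n' \
        "$RUN_DIR/${prefix}_R1_001.fastq.gz" \
        "$RUN_DIR/${prefix}_R2_001.fastq.gz" \
        "${sample}_megablast"
}

print_all_damian_cmds() {
    # $SAMPLES is left unquoted on purpose so that it splits on whitespace
    for p in $SAMPLES; do
        damian_cmd "$p"
    done
}

print_all_damian_cmds
```

After checking the printed commands, they could be executed by piping the output to a shell, for example `print_all_damian_cmds | sh`. Keeping the generation separate from the execution makes it easy to confirm every path before starting a long run.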
