-
preparing gene seqences
(Optional online search) https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&BLAST_SPEC=MicrobialGenomes #Title:Refseq prokaryote representative genomes (contains refseq assembly) #Molecule Type:mixed DNA #Update date:2024/10/16 #Number of sequences:1038672 #cat gyrB_revcomp.fasta fumC_revcomp.fasta gltA.fasta icd_revcomp.fasta apsS.fasta sigB_revcomp.fasta sarA_revcomp.fasta ... # SCCmec,agr.typing, # gyrB(House keeper), # fumC,gltA,icd(Metabolic genes), # apsS,sigB,sarA,agrC,yycG(Virulence regulators), # psm.β1,hlb(Toxins), # atlE,sdrG,sdrH,ebh,ebp,tagB(Biofilm formation), # capC,sepA,dltA,fmtC,lipA,sceD,SE0760(Immune evasion & colonization), # esp,ecpA(Serine protease) gene complement(1547773..1549233) /gene="ebp" /locus_tag="SAKG22_13940" CDS complement(1547773..1549233) /gene="ebp" samtools faidx AP019545.1.fasta samtools faidx AP019545.1.fasta "gi|1602080981|dbj|AP019545.1|":1547773-1549233 > ebp.fasta revcomp ebp.fasta > ebp_revcomp.fasta cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/gyrB_revcomp.fasta gyrB.fasta cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/fumC_revcomp.fasta fumC.fasta cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/gltA.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/icd_revcomp.fasta icd.fasta cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/apsS.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/sigB_revcomp.fasta sigB.fasta cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/sarA_revcomp.fasta sarA.fasta cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/agrC.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/yycG.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/psm-beta1.fasta psm.β1.fasta cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/hlb.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/atlE.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/sdrG.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/sdrH_revcomp.fasta sdrH.fasta cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/ebh_revcomp.fasta ebh.fasta cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/ebp_revcomp.fasta ebp.fasta cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/tagB.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/capC.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/sepA.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/dltA.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/fmtC.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/lipA.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/sceD.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/SE0760.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/esp.fasta . cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/ecpA.fasta . #cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/MT880870.fasta . #cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/MT880871.fasta . #cp ~/DATA/Data_Luise_Sepi_STKN/plotTreeHeatmap_mibi_isolates/MT880872.fasta . >gi|1602080981|dbj|AP019545.1|:1547773-1549233 >gi|34980227|gb|AY322150.1|:3091-8793 ./gyrB_revcomp.fasta ./agrC.fasta ./CP133692.fasta ./sceD_mibi1435_revcomp.fasta ./CP133682.fasta ./CP133697.fasta ./sdrG.fasta ./sarA_revcomp.fasta ./CP133679.fasta ./CP133686.fasta ./sepA.fasta ./icd.fasta ./CP133691.fasta ./CP133698.fasta ./yycG.fasta ./CP133701.fasta ./sdrH_revcomp.fasta ./SE0760.fasta ./CP133681.fasta ./CP133677.fasta ./yycG_revcomp.fasta ./CP000029.fasta ./atlE_old.fasta ./ebpS.fasta ./ebpS_revcomp.fasta ./atlE_revcomp.fasta ./sceDAE.fasta ./hlb.fasta >gi|441494790|gb|KC242859.1|:16-1008 ATGGTGAAAAAAACAAAATCCAATTCACTAAAAAAAGTTGCAACACTTGCATTAGCAAAT TTATTATTAGTTGGTGCACTTACTGACAATAGTGCCAAAGCCGAATCTAAGAAAGATGAT ACTGATTTGAAGTTAGTTAGTCATAACGTTTATATGTTATCGACCGTTTTGTATCCAAAC TGGGGGCAATATAAACGCGCTGATTTAATCGGGCAATCTTCTTATATTAAAAATAATGAT GTCGTAATATTCAATGAAGCATTTGATAATGGCGCATCAGATAAGCTATTAAGTAATGTA AAAAAAGAATATCCTTACCAAACACCTGTACTCGGTCGTTCTCAATCAGGGTGGGACAAA ACTGAAGGTAGCTACTCATCAACTGTTGCTGAAGATGGTGGCGTAGCGATTGTAAGTAAA TATCCTATTAAAGAAAAAATCCAGCATGTTTTCAAAAGCGGTTGTGGATTCGACAATGAT AGCAACAAAGGCTTTGTTTATACAAAAATAGAGAAAAATGGGAAGAACGTTCATGTTATC GGTACACATACACAATCTGAAGATTCACGTTGTGGTGCTGGACATGATCGAAAAATTAGA GCTGAACAAATGAAAGAAATCAGTGATTTTGTTAAAAAGAAAAATATCCCTAAAGATGAA ACGGTATATATAGGTGGCGACCTTAATGTAAATAAAGGTACTCCAGAGTTCAAAGATATG CTTAAAAACTTGAATGTAAATGATGTTCTATATGCAGGTCATAATAGCACATGGGACCCT CAATCAAATTCAATTGCGAAATATAATTATCCTAATGGTAAACCAGAACATTTAGACTAT ATATTTACAGATAAAGATCATAAACAACCAAAACAATTAGTCAATGAAGTTGTGACTGAA AAACCTAAGCCATGGGATGTATATGCGTTCCCATATTACTACGTTTACAATGATTTTTCA GATCATTACCCAATCAAAGCCTATAGTAAATAA >hlb_ >gi|441494790|gb|KC242859.1| Staphylococcus aureus strain 226 beta-hemolysin (hlb) gene, complete cds AAAGGAGTGATAATG ATGGTGAAAAAAACAAAATCCAATTCACTAAAAAAAGTTGCAACACTTGCATTAGCAAATTTATTATTAGTTGGTGCACTTACTGACAATAGTGCCAAAGCCGAATCTAAGAAAGATGATACTGA TTTGAAGTTAGTTAGTCATAACGTTTATATGTTATCGACCGTTTTGTATCCAAACTGGGGGCAATATAAA CGCGCTGATTTAATCGGGCAATCTTCTTATATTAAAAATAATGATGTCGTAATATTCAATGAAGCATTTG ATAATGGCGCATCAGATAAGCTATTAAGTAATGTAAAAAAAGAATATCCTTACCAAACACCTGTACTCGG TCGTTCTCAATCAGGGTGGGACAAAACTGAAGGTAGCTACTCATCAACTGTTGCTGAAGATGGTGGCGTA GCGATTGTAAGTAAATATCCTATTAAAGAAAAAATCCAGCATGTTTTCAAAAGCGGTTGTGGATTCGACA ATGATAGCAACAAAGGCTTTGTTTATACAAAAATAGAGAAAAATGGGAAGAACGTTCATGTTATCGGTAC ACATACACAATCTGAAGATTCACGTTGTGGTGCTGGACATGATCGAAAAATTAGAGCTGAACAAATGAAA GAAATCAGTGATTTTGTTAAAAAGAAAAATATCCCTAAAGATGAAACGGTATATATAGGTGGCGACCTTA ATGTAAATAAAGGTACTCCAGAGTTCAAAGATATGCTTAAAAACTTGAATGTAAATGATGTTCTATATGC AGGTCATAATAGCACATGGGACCCTCAATCAAATTCAATTGCGAAATATAATTATCCTAATGGTAAACCA GAACATTTAGACTATATATTTACAGATAAAGATCATAAACAACCAAAACAATTAGTCAATGAAGTTGTGA CTGAAAAACCTAAGCCATGGGATGTATATGCGTTCCCATATTACTACGTTTACAATGATTTTTCAGATCA TTACCCAATCAAAGCCTATAGTAAATAA ./dltA.fasta ./atlE.fasta ./gyrB.fasta ./ORF123_sepA_ORF5.fasta ./ebh.fasta ./CP133696.fasta ./yycG_old.fasta ./CP133700.fasta ./CP133695.fasta ./ecpA.fasta ./capBCA_ywtC.fasta ./CP133690.fasta ./gltA.fasta ./esp.fasta ./sigB.fasta ./CP133699.fasta ./CP133685.fasta ./CP133680.fasta ./MT880872.fasta ./icd_revcomp.fasta ./agrC_old.fasta ./ecpA_.fasta ./agrC_revcomp.fasta ./MT880870.fasta ./CP133687.fasta ./CP133693.fasta ./psm-beta1.fasta >gi|380448412|gb|JQ066320.1| Staphylococcus aureus strain JP1 Psm beta1 (psm beta1) and Psm beta2 (psm beta2) genes, complete cds ACACGTTTAACAACACAAGAATTATTATATCTAAATGAACTAAAATTAGCAATACCTTGTAAATAAAAAA TGTTTATATTTTTCACTATTATAGAGCTATTTATCTAAAAAGGTTCAATAAGACTTAAATACGAATTCAG GCAACTTAATTGTGTTAAATACAGTTTTGAATGCCTAACTGTATTTCTTTTCTCTTTAAAATACAGTTAA GTACATTATAAGATGTTGTGCGGATAAACAAACTAATTGCATCAAATTTATTTTAAAATAACAACAACAA AACGTTAAGAGAATAACATTTCGGTGATTTAAAAGCTACGCACGTTTTTGTTATCTTCAAATTTAAATTT TAAGGAGTGTTTTCA ATGGAAGGTTTATTTAACGCAATTAAAGATACCGTAACTGCAGCAATTAATAATGATGGC GCAAAATTAGGCACAAGCATTGTGAGCATCGTTGAAAATGGCGTAGGTTTATTAGGTAAA TTATTCGGATTCTAA TTTCAATATGTTATGTAAGTAATCAGTATTATTTCAAAGGTGAGGGAGAGATTTAAATGA CTGGACTAGCAGAAGCAATCGCAAATACTGTGCAAGCTGCACAACAACATGATAGTGTGAAATTAGGCAC AAGTATCGTAGACATCGTTGCTAACGGTGTGGGTTTACTAGGTAAATTATTTGGATTCTAATATAATAAC TAATATTCTTTAAAATAAACTGGGTGAGCATACTTTAATGTTATGCACTCAGTTTATTTTATTTGCAGAA ATTTGAGCCTCTGTTAAGATTTAGATACATAGACAATATAGGAGATGGGGAAATTGGGATATAAAAATAT TTTGATAGACTTTGATGATACAATTGTTGATTTTTATGATGCAGA ATGGAAGGTTTATTTAACGCAATTAAAGATACCGTAACTGCAGCAATTAATAATGATGGC GCAAAATTAGGCACAAGCATTGTGAGCATCGTTGAAAATGGCGTAGGTTTATTAGGTAAA TTATTCGGATTCTAA ./psm-beta.fasta ./lipA.fasta ./capC.fasta ./fumC_revcomp.fasta ./CP133689.fasta ./sepA_mibi1435.fasta ./sigB_revcomp.fasta ./sceD.fasta ./fumC.fasta ./Enterococcus_faecium_isolate_E300_pathogenicity_island.fasta ./agrABCD_hld.fasta ./CP133688.fasta ./ecpA_mibi1435_revcomp.fasta ./hlb_.fasta ./CP133684.fasta ./tagB_mibi1435_revcomp.fasta ./CP133678.fasta ./tagB.fasta ./MT880871.fasta ./CP133676.fasta ./sdrH.fasta ./Staphylococcus_epidermidis_RP62A.fasta ./ebh_revcomp.fasta ./CP133683.fasta ./apsS.fasta ./Staphylococcus_aureus_MRSA252.fasta ./fmtC.fasta ./CP133694.fasta ./sarA.fasta
-
prepare the long fasta if we use only the long sequencing
ln -s ../Data_Holger_S.epidermidis_long/finished_assemblies/HDRNA_01_K01_conservative.fasta HDRNA_01_K01.fasta ln -s ../Data_Holger_S.epidermidis_long/finished_assemblies/HDRNA_03_K01_bold_bandage.fasta HDRNA_03_K01.fasta ln -s ../Data_Holger_S.epidermidis_long/finished_assemblies/HDRNA_06_K01_conservative.fasta HDRNA_06_K01.fasta ln -s ../Data_Holger_S.epidermidis_long/finished_assemblies/HDRNA_07_K01_conservative.fasta HDRNA_07_K01.fasta ln -s ../Data_Holger_S.epidermidis_long/finished_assemblies/HDRNA_08_K01_conservative.fasta HDRNA_08_K01.fasta ln -s ../Data_Holger_S.epidermidis_long/finished_assemblies/HDRNA_12_K01_bold.fasta HDRNA_12_K01.fasta ln -s ../Data_Holger_S.epidermidis_long/finished_assemblies/HDRNA_16_K01_conservative.fasta HDRNA_16_K01.fasta ln -s ../Data_Holger_S.epidermidis_long/finished_assemblies/HDRNA_17_K01_conservative.fasta HDRNA_17_K01.fasta ln -s ../Data_Holger_S.epidermidis_long/finished_assemblies/HDRNA_19_K01_bold.fasta HDRNA_19_K01.fasta ln -s ../Data_Holger_S.epidermidis_long/finished_assemblies/HDRNA_20_K01_conservative.fasta HDRNA_20_K01.fasta
-
perform blastn searching for long-sequencing
# -- makeblastdb -- for sample in HDRNA_01_K01 HDRNA_03_K01 HDRNA_06_K01 HDRNA_07_K01 HDRNA_08_K01 HDRNA_12_K01 HDRNA_16_K01 HDRNA_17_K01 HDRNA_19_K01 HDRNA_20_K01; do makeblastdb -in ../assembly/${sample}.fasta -dbtype nucl done for id in gyrB fumC gltA icd apsS sigB sarA agrC yycG psm.β1 hlb atlE sdrG sdrH ebh ebp tagB capC sepA dltA fmtC lipA sceD SE0760 esp ecpA; do echo "mkdir ${id}" echo "for sample in in HDRNA_01_K01 HDRNA_03_K01 HDRNA_06_K01 HDRNA_07_K01 HDRNA_08_K01 HDRNA_12_K01 HDRNA_16_K01 HDRNA_17_K01 HDRNA_19_K01 HDRNA_20_K01; do" echo "blastn -db ../assembly/${sample}.fasta -query ${id}.fasta -evalue 1e-40 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1 > ./${id}/\${sample}.blastn" echo "done" done #gyrB fumC gltA icd apsS sigB sarA agrC yycG psm.β1 hlb atlE sdrG sdrH ebh ebp tagB capC sepA dltA fmtC lipA sceD SE0760 esp ecpA python ~/Scripts/process_directory.py gyrB typing.csv typing_until_gyrB.csv python ~/Scripts/process_directory.py fumC typing_until_gyrB.csv typing_until_fumC.csv python ~/Scripts/process_directory.py gltA typing_until_fumC.csv typing_until_gltA.csv python ~/Scripts/process_directory.py icd typing_until_gltA.csv typing_until_icd.csv python ~/Scripts/process_directory.py apsS typing_until_icd.csv typing_until_apsS.csv python ~/Scripts/process_directory.py sigB typing_until_apsS.csv typing_until_sigB.csv python ~/Scripts/process_directory.py sarA typing_until_sigB.csv typing_until_sarA.csv python ~/Scripts/process_directory.py agrC typing_until_sarA.csv typing_until_agrC.csv python ~/Scripts/process_directory.py yycG typing_until_agrC.csv typing_until_yycG.csv python ~/Scripts/process_directory.py psm.β1 typing_until_yycG.csv typing_until_psm.β1.csv python ~/Scripts/process_directory.py hlb typing_until_psm.β1.csv typing_until_hlb.csv python ~/Scripts/process_directory.py atlE typing_until_hlb.csv typing_until_atlE.csv python ~/Scripts/process_directory.py sdrG typing_until_atlE.csv typing_until_sdrG.csv python ~/Scripts/process_directory.py sdrH typing_until_sdrG.csv typing_until_sdrH.csv python ~/Scripts/process_directory.py ebh typing_until_sdrH.csv typing_until_ebh.csv python ~/Scripts/process_directory.py ebp typing_until_ebh.csv typing_until_ebp.csv python ~/Scripts/process_directory.py tagB typing_until_ebp.csv typing_until_tagB.csv python ~/Scripts/process_directory.py capC typing_until_tagB.csv typing_until_capC.csv python ~/Scripts/process_directory.py sepA typing_until_capC.csv typing_until_sepA.csv python ~/Scripts/process_directory.py dltA typing_until_sepA.csv typing_until_dltA.csv python ~/Scripts/process_directory.py fmtC typing_until_dltA.csv typing_until_fmtC.csv python ~/Scripts/process_directory.py lipA typing_until_fmtC.csv typing_until_lipA.csv python ~/Scripts/process_directory.py sceD typing_until_lipA.csv typing_until_sceD.csv python ~/Scripts/process_directory.py SE0760 typing_until_sceD.csv typing_until_SE0760.csv python ~/Scripts/process_directory.py esp typing_until_SE0760.csv typing_until_esp.csv python ~/Scripts/process_directory.py ecpA typing_until_esp.csv typing_until_ecpA.csv
-
perform blastn searching for short-sequencing
# -- makeblastdb -- for sample in HDRNA_01_K01 HDRNA_01_K02 HDRNA_01_K03 HDRNA_01_K04 HDRNA_01_K05 HDRNA_01_K06 HDRNA_01_K07 HDRNA_01_K08 HDRNA_01_K09 HDRNA_01_K10 HDRNA_02_K01 HDRNA_03_K01 HDRNA_03_K02 HDRNA_03_K03 HDRNA_03_K04 HDRNA_03_K05 HDRNA_03_K06 HDRNA_03_K07 HDRNA_03_K08 HDRNA_03_K09 HDRNA_03_K10 HDRNA_04_K01 HDRNA_05_K01 HDRNA_06_K01 HDRNA_06_K02 HDRNA_06_K03 HDRNA_06_K04 HDRNA_06_K05 HDRNA_06_K06 HDRNA_06_K07 HDRNA_06_K08 HDRNA_06_K09 HDRNA_06_K10 HDRNA_07_K01 HDRNA_07_K02 HDRNA_07_K03 HDRNA_07_K04 HDRNA_07_K05 HDRNA_07_K06 HDRNA_07_K07 HDRNA_07_K08 HDRNA_07_K09 HDRNA_07_K10 HDRNA_08_K01 HDRNA_08_K02 HDRNA_08_K03 HDRNA_08_K04 HDRNA_08_K05 HDRNA_08_K06 HDRNA_08_K07 HDRNA_08_K08 HDRNA_08_K09 HDRNA_08_K10 HDRNA_10_K01 HDRNA_11_K01 HDRNA_12_K01 HDRNA_12_K02 HDRNA_12_K03 HDRNA_12_K04 HDRNA_12_K05 HDRNA_12_K06 HDRNA_12_K07 HDRNA_12_K08 HDRNA_12_K09 HDRNA_12_K10 HDRNA_16_K01 HDRNA_16_K02 HDRNA_16_K03 HDRNA_16_K04 HDRNA_16_K05 HDRNA_16_K06 HDRNA_16_K07 HDRNA_16_K08 HDRNA_16_K09 HDRNA_16_K10 HDRNA_17_K01 HDRNA_17_K02 HDRNA_17_K03 HDRNA_17_K04 HDRNA_17_K05 HDRNA_17_K06 HDRNA_17_K07 HDRNA_17_K08 HDRNA_17_K09 HDRNA_17_K10 HDRNA_19_K01 HDRNA_19_K02 HDRNA_19_K03 HDRNA_19_K04 HDRNA_19_K05 HDRNA_19_K06 HDRNA_19_K07 HDRNA_19_K08 HDRNA_19_K09 HDRNA_19_K10 HDRNA_20_K01 HDRNA_20_K02 HDRNA_20_K03 HDRNA_20_K04 HDRNA_20_K05 HDRNA_20_K06 HDRNA_20_K07 HDRNA_20_K08 HDRNA_20_K09; do makeblastdb -in ../Data_Holger_S.epidermidis_short/shovill/${sample}/contigs.fa -dbtype nucl done #Note: The current -evalue is set to 1e-1; it might need to be made more stringent. for id in gyrB fumC gltA icd apsS sigB sarA agrC yycG psm.β1 hlb atlE sdrG sdrH ebh ebp tagB capC sepA dltA fmtC lipA sceD SE0760 esp ecpA; do echo "mkdir ${id}" echo "for sample in HDRNA_01_K01 HDRNA_01_K02 HDRNA_01_K03 HDRNA_01_K04 HDRNA_01_K05 HDRNA_01_K06 HDRNA_01_K07 HDRNA_01_K08 HDRNA_01_K09 HDRNA_01_K10 HDRNA_02_K01 HDRNA_03_K01 HDRNA_03_K02 HDRNA_03_K03 HDRNA_03_K04 HDRNA_03_K05 HDRNA_03_K06 HDRNA_03_K07 HDRNA_03_K08 HDRNA_03_K09 HDRNA_03_K10 HDRNA_04_K01 HDRNA_05_K01 HDRNA_06_K01 HDRNA_06_K02 HDRNA_06_K03 HDRNA_06_K04 HDRNA_06_K05 HDRNA_06_K06 HDRNA_06_K07 HDRNA_06_K08 HDRNA_06_K09 HDRNA_06_K10 HDRNA_07_K01 HDRNA_07_K02 HDRNA_07_K03 HDRNA_07_K04 HDRNA_07_K05 HDRNA_07_K06 HDRNA_07_K07 HDRNA_07_K08 HDRNA_07_K09 HDRNA_07_K10 HDRNA_08_K01 HDRNA_08_K02 HDRNA_08_K03 HDRNA_08_K04 HDRNA_08_K05 HDRNA_08_K06 HDRNA_08_K07 HDRNA_08_K08 HDRNA_08_K09 HDRNA_08_K10 HDRNA_10_K01 HDRNA_11_K01 HDRNA_12_K01 HDRNA_12_K02 HDRNA_12_K03 HDRNA_12_K04 HDRNA_12_K05 HDRNA_12_K06 HDRNA_12_K07 HDRNA_12_K08 HDRNA_12_K09 HDRNA_12_K10 HDRNA_16_K01 HDRNA_16_K02 HDRNA_16_K03 HDRNA_16_K04 HDRNA_16_K05 HDRNA_16_K06 HDRNA_16_K07 HDRNA_16_K08 HDRNA_16_K09 HDRNA_16_K10 HDRNA_17_K01 HDRNA_17_K02 HDRNA_17_K03 HDRNA_17_K04 HDRNA_17_K05 HDRNA_17_K06 HDRNA_17_K07 HDRNA_17_K08 HDRNA_17_K09 HDRNA_17_K10 HDRNA_19_K01 HDRNA_19_K02 HDRNA_19_K03 HDRNA_19_K04 HDRNA_19_K05 HDRNA_19_K06 HDRNA_19_K07 HDRNA_19_K08 HDRNA_19_K09 HDRNA_19_K10 HDRNA_20_K01 HDRNA_20_K02 HDRNA_20_K03 HDRNA_20_K04 HDRNA_20_K05 HDRNA_20_K06 HDRNA_20_K07 HDRNA_20_K08 HDRNA_20_K09; do" echo "blastn -db ../Data_Holger_S.epidermidis_short/shovill/\${sample}/contigs.fa -query ${id}.fasta -evalue 1e-1 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1 > ./${id}/\${sample}.blastn" echo "done" done #gyrB fumC gltA icd apsS sigB sarA agrC yycG psm.β1 hlb atlE sdrG sdrH ebh ebp tagB capC sepA dltA fmtC lipA sceD SE0760 esp ecpA python ~/Scripts/process_directory.py gyrB typing_104.csv typing_until_gyrB.csv python ~/Scripts/process_directory.py fumC typing_until_gyrB.csv typing_until_fumC.csv python ~/Scripts/process_directory.py gltA typing_until_fumC.csv typing_until_gltA.csv python ~/Scripts/process_directory.py icd typing_until_gltA.csv typing_until_icd.csv python ~/Scripts/process_directory.py apsS typing_until_icd.csv typing_until_apsS.csv python ~/Scripts/process_directory.py sigB typing_until_apsS.csv typing_until_sigB.csv python ~/Scripts/process_directory.py sarA typing_until_sigB.csv typing_until_sarA.csv python ~/Scripts/process_directory.py agrC typing_until_sarA.csv typing_until_agrC.csv python ~/Scripts/process_directory.py yycG typing_until_agrC.csv typing_until_yycG.csv python ~/Scripts/process_directory.py psm.β1 typing_until_yycG.csv typing_until_psm.β1.csv python ~/Scripts/process_directory.py hlb typing_until_psm.β1.csv typing_until_hlb.csv python ~/Scripts/process_directory.py atlE typing_until_hlb.csv typing_until_atlE.csv python ~/Scripts/process_directory.py sdrG typing_until_atlE.csv typing_until_sdrG.csv python ~/Scripts/process_directory.py sdrH typing_until_sdrG.csv typing_until_sdrH.csv python ~/Scripts/process_directory.py ebh typing_until_sdrH.csv typing_until_ebh.csv python ~/Scripts/process_directory.py ebp typing_until_ebh.csv typing_until_ebp.csv python ~/Scripts/process_directory.py tagB typing_until_ebp.csv typing_until_tagB.csv python ~/Scripts/process_directory.py capC typing_until_tagB.csv typing_until_capC.csv python ~/Scripts/process_directory.py sepA typing_until_capC.csv typing_until_sepA.csv python ~/Scripts/process_directory.py dltA typing_until_sepA.csv typing_until_dltA.csv python ~/Scripts/process_directory.py fmtC typing_until_dltA.csv typing_until_fmtC.csv python ~/Scripts/process_directory.py lipA typing_until_fmtC.csv typing_until_lipA.csv python ~/Scripts/process_directory.py sceD typing_until_lipA.csv typing_until_sceD.csv python ~/Scripts/process_directory.py SE0760 typing_until_sceD.csv typing_until_SE0760.csv python ~/Scripts/process_directory.py esp typing_until_SE0760.csv typing_until_esp.csv python ~/Scripts/process_directory.py ecpA typing_until_esp.csv typing_until_ecpA.csv
-
plotTreeHeatmap
~/DATA/Data_PaulBongarts_S.epidermidis_HDRNA/plotTreeHeatmap #FastTree -gtr -nt variants/snippy.core_without_reference.aln > plotTreeHeatmap/snippy.core.tree #cp ../Data_Holger_S.epidermidis_short/plotTreeHeatmap/typing_104.csv . #cp ../Data_Holger_S.epidermidis_short/results_HDRNA_01-20/variants/snippy.core.aln_104.tree . #Run step 4 library(ggtree) library(ggplot2) library(dplyr) setwd("/home/jhuang/DATA/Data_PaulBongarts_S.epidermidis_HDRNA/plotTreeHeatmap_short/") # -- edit tree -- info <- read.csv("typing_until_ecpA.csv", sep="\t") # Convert 'ST' column to factor with levels in the desired order info$ST <- factor(info$ST, levels = c("5", "23", "35", "69", "86", "87", "130", "224", "487", "640", "none")) info$name <- info$Isolate tree <- read.tree("snippy.core.aln_104.tree") cols <- c("5"="darkgreen", "23"="cornflowerblue", "35"="green","69"="orange","86"="magenta","87"="brown", "130"="purple","224"="darkkhaki", "487"="cyan", "640"="pink","none"="grey") #"2"="", #"7"="seagreen3", #"9"="tan","14"="red", "17"="navyblue", #"73"="pink","81"="purple", #"89"="darksalmon", #"454"="blue","487"="cyan", "558"="skyblue2", "766"="blueviolet" #"5", "23", "35", "69", "86", "87", "130", "224", "487", "640", "none" heatmapData2 <- info %>% select(Isolate, gyrB, fumC, gltA, icd, apsS, sigB, sarA, agrC, yycG, psm.β1, hlb, atlE, sdrG, sdrH, ebh, ebp, tagB, capC, sepA, dltA, fmtC, lipA, sceD, SE0760, esp, ecpA) #ST, rn <- heatmapData2$Isolate heatmapData2$Isolate <- NULL heatmapData2 <- as.data.frame(sapply(heatmapData2, as.character)) rownames(heatmapData2) <- rn #heatmap.colours <- c("darkred", "darkblue", "darkgreen", "grey") #names(heatmap.colours) <- c("MT880870","MT880871","MT880872","-") #heatmap.colours <- c("cornflowerblue","darkgreen","seagreen3","tan","red", "navyblue", "gold", "green","orange","pink","purple","magenta","brown", "darksalmon","chocolate4","darkkhaki", "azure3", "maroon","lightgreen", "blue","cyan", "skyblue2", "blueviolet", "darkred", "darkblue", "darkgreen", "grey") #names(heatmap.colours) <- c("2","5","7","9","14", "17","23", "35","59","73", "81","86","87","89","130","190","290", "297","325", "454","487","558","766", "MT880870","MT880871","MT880872","-") #TEMP_DEACTIVATED! heatmap.colours <- c("cornflowerblue","darkgreen","seagreen3","tan","red", "navyblue", "purple", "green","cyan", "darkred", "darkblue", "darkgreen", "grey", "darkgreen", "grey") #TEMP_DEACTIVATED! names(heatmap.colours) <- c("SCCmec_type_II(2A)", "SCCmec_type_III(3A)", "SCCmec_type_III(3A) and SCCmec_type_VIII(4A)", "SCCmec_type_IV(2B)", "SCCmec_type_IV(2B&5)", "SCCmec_type_IV(2B) and SCCmec_type_VI(4B)", "SCCmec_type_IVa(2B)", "SCCmec_type_IVb(2B)", "SCCmec_type_IVg(2B)", "I", "II", "III", "none", "+","-") ## The heatmap colours should correspond to the factor levels #heatmap.colours <- #setNames(c("cornflowerblue","darkgreen","seagreen3","tan","red","green","orange","pink","brown","magenta","cyan", #"magenta","navyblue","cornflowerblue","gold","orange","darkgreen", "green", "seagreen3", #"chocolate4","brown","purple","pink", "tan","cyan","red"), #c("5", "23", "35", "69", "86", "87", "130", "224", "487", "640", "none", "P01", "P02", "P03", "P04", "P05", "P06", "P07", #"P08", "P10", "P11", "P12", "P16", "P17", "P19", "P20")) #mydat$Regulation <- factor(mydat$Regulation, levels=c("up","down")) #circular p <- ggtree(tree, layout='circular', branch.length='none') %<+% info + geom_tippoint(aes(color=ST)) + scale_color_manual(values=cols) + geom_tiplab2(aes(label=name), offset=1) png("ggtree.png", width=1260, height=1260) #svg("ggtree.svg", width=1260, height=1260) p dev.off() #gheatmap(p, heatmapData2, width=0.1, colnames_position="top", colnames_angle=90, colnames_offset_y = 0.1, hjust=0.5, font.size=4, offset = 5) + scale_fill_manual(values=heatmap.colours) + theme(legend.text = element_text(size = 14)) + theme(legend.title = element_text(size = 14)) + guides(fill=guide_legend(title=""), color = guide_legend(override.aes = list(size = 5))) png("ggtree_and_gheatmap_selected_genes.png", width=1690, height=1400) #svg("ggtree_and_gheatmap_mibi_selected_genes.svg", width=17, height=15) gheatmap(p, heatmapData2, width=2, colnames_position="top", colnames_angle=90, colnames_offset_y = 2.0, hjust=0.7, font.size=4, offset = 17) + scale_fill_manual(values=heatmap.colours) + theme(legend.text = element_text(size = 16)) + theme(legend.title = element_text(size = 16)) + guides(fill=guide_legend(title=""), color = guide_legend(override.aes = list(size = 5))) dev.off()
-
Response
As mentioned earlier, the presence-absence matrix of genes was generated using a BLASTn comparison between the genome and the gene sequences. Ensuring the accuracy of these gene sequences is crucial for reliable results. I prepared the sequences by searching GeneBank using the gene names, but there may be some ambiguity where the same gene name corresponds to different sequences. Attached, you will find the sequences for the 11 genes that were not identified in the isolates: gltA, psm-beta1, hlb, ebp, tagB, capA, sepA, fmtC, sceD, esp, and ecpA. I think it would be helpful if you could verify whether these sequences correspond to the genes we are looking for. In addition, I have attached a figure showing all samples with short-read sequencing. As illustrated in the plot, SCCmec and agr typing are still pending. If necessary, I can perform the typing and include the information in the plot. Alternatively, we could focus exclusively on the 10 samples obtained through long-read sequencing. Please let me know how you would prefer to proceed.
Enhanced Visualization of Gene Presence for the Selected Genes in Bongarts_S.epidermidis_HDRNA
Leave a reply