Author Archives: gene_x

High-throughput Sequencing Technologies and Genomics Research Methods

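  1. RNA-seq: RNA sequencing, a high-throughput sequencing technique for studying the transcriptome, used to measure gene expression levels and characterize transcript structure.

  2. miRNA-seq: miRNA sequencing, a high-throughput sequencing technique targeting small microRNAs (miRNAs), used to study the role of miRNAs in regulating gene expression.

  3. ncRNA-seq: non-coding RNA sequencing, a high-throughput technique for studying non-coding RNAs (ncRNAs), which do not encode proteins but play important roles in gene regulation and cellular function.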

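  4. RNA-seq (CAGE): RNA sequencing combined with Cap Analysis of Gene Expression (CAGE), which captures the 5′ capped ends of transcripts to map transcription start sites and quantify expression from each of them.

  5. RNA-seq (RACE): RNA sequencing combined with Rapid Amplification of cDNA Ends (RACE), used to determine the 5′ and 3′ ends of transcripts.

  6. ssRNA-seq: strand-specific RNA sequencing, an RNA-seq variant that preserves the strand of origin of each transcript, so that overlapping sense and antisense transcripts can be distinguished.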

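  7. ChIP-seq: chromatin immunoprecipitation sequencing, which combines chromatin immunoprecipitation with high-throughput sequencing to map where proteins (e.g., transcription factors or modified histones) bind DNA.

  8. MNase-seq: micrococcal nuclease sequencing, which digests chromatin with micrococcal nuclease and sequences the protected fragments to study chromatin structure and nucleosome positioning.

  9. MBD-seq: methyl-CpG binding domain sequencing, which captures methylated DNA fragments with methyl-CpG binding domain proteins to profile DNA methylation patterns.

  10. MRE-seq: methylation-sensitive restriction enzyme sequencing, which uses restriction enzymes that cut only unmethylated recognition sites to assess DNA methylation levels.

  11. Bisulfite-seq: bisulfite sequencing, used to detect methylated cytosines in DNA. Bisulfite converts unmethylated cytosines to uracil (read as thymine after PCR), while methylated cytosines remain unchanged.

  12. Bisulfite-seq (reduced representation): reduced representation bisulfite sequencing (RRBS), a lower-cost, lower-complexity variant that digests DNA with MspI and sequences only the CpG-rich fragments after bisulfite treatment.

  13. MeDIP-seq: methylated DNA immunoprecipitation sequencing, which captures methylated DNA fragments by immunoprecipitation to study genome-wide methylation patterns.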

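  14. DNase-Hypersensitivity: DNase I hypersensitive site analysis, used to identify open chromatin regions, which are often associated with transcription factor binding.

  15. Tn-seq: transposon insertion sequencing, which maps transposon insertion sites across a mutant library to infer how important each gene is (e.g., for growth or fitness) and thereby study gene function.

  16. FAIRE-seq: Formaldehyde-Assisted Isolation of Regulatory Elements sequencing, used to identify open chromatin regions, which are often associated with gene regulatory elements.

  17. SELEX: Systematic Evolution of Ligands by EXponential enrichment, a technique for selecting nucleic acid sequences that bind a target with high affinity (e.g., aptamers). It is commonly used to study protein binding specificities and RNA structure and function.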

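  18. RIP-seq: RNA immunoprecipitation sequencing, which combines RNA immunoprecipitation with high-throughput sequencing to study RNA-protein interactions. The technique is important for uncovering post-transcriptional regulatory mechanisms and the roles of RNA-binding proteins in gene expression and function.

    The basic steps of a RIP-seq experiment are:

    • Immunoprecipitate the target RNA-binding protein with a specific antibody.
    • After precipitation, extract the RNA fragments bound to the protein.
    • Reverse-transcribe the precipitated RNA fragments to build a cDNA library.
    • Sequence the cDNA library.
    • Analyze the sequencing data to identify the RNA fragments bound by the target protein.

    With RIP-seq, researchers can determine which RNA sequences an RNA-binding protein interacts with. This reveals the protein's functions in RNA processing, transport, translation, and degradation.

    inteRNA is a research project funded by the European Union. It aims to study the functions of non-coding RNAs (ncRNAs) in organisms and their roles in disease. Professor Björn Voss is one of its participants. The project uses high-throughput sequencing and bioinformatics to study the biological functions of ncRNAs, in order to better understand their roles in cell development and disease. Its results are expected to provide new insights for future diagnostic and therapeutic approaches.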

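  19. ATAC-seq: Assay for Transposase-Accessible Chromatin using sequencing. A hyperactive Tn5 transposase inserts sequencing adapters into open chromatin, so the method maps accessible chromatin regions to study gene regulation and expression.

  20. ChIA-PET: Chromatin Interaction Analysis by Paired-End Tag sequencing, which combines chromatin immunoprecipitation with proximity ligation and paired-end tag sequencing. It detects long-range chromatin interactions mediated by a specific protein and is used to study gene regulation.

  21. Hi-C: a technique for studying the three-dimensional structure of chromatin and chromatin interactions. It combines proximity ligation, high-throughput sequencing, and computational analysis to reveal the spatial organization of chromatin in the nucleus.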
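The bisulfite principle behind items 11 and 12 can be illustrated with a small Python sketch. It assumes an idealized top-strand model in which every unmethylated C is read as T and every methylated C stays C. The function names are hypothetical. Real bisulfite aligners such as Bismark also handle the reverse strand, incomplete conversion, and sequencing errors.

```python
"""Idealized model of bisulfite sequencing (top strand only, full conversion)."""
from __future__ import annotations


def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Read expected after bisulfite treatment + PCR of the top strand."""
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # unmethylated C -> U -> read as T
        else:
            out.append(base)  # methylated C and all other bases are unchanged
    return "".join(out)


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """At each reference C of an aligned, same-length read:
    read C -> methylated (True), read T -> unmethylated (False)."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "C" and read_base in "CT":
            calls[i] = read_base == "C"
    return calls


def mspi_sites(seq: str) -> list[int]:
    """Start positions of CCGG, the MspI site used to enrich CpG-rich fragments in RRBS."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


if __name__ == "__main__":
    ref = "ACGTCCGGATCGA"
    read = bisulfite_convert(ref, methylated_positions={5})
    print(read)
    print(call_methylation(ref, read))
    print(mspi_sites(ref))
```

Take the reference ACGTCCGGATCGA with only position 5 methylated. The converted read is ATGTTCGGATTGA, and position 5 is the only C called as methylated. The single CCGG (MspI) site starts at position 4.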

A Timeline of the Development of Microarray and NGS Technologies

A timeline of the history of microarray and next-generation sequencing technologies:

  • Microarray Technology:

    • 1990s: The first microarrays were developed, which used small glass slides or nylon membranes to spot DNA or RNA probes.
    • 2000s: Microarray technology became widely used in genomics research for measuring gene expression levels, identifying single-nucleotide polymorphisms (SNPs), and detecting copy number variations (CNVs).
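    • 2000s: Whole-genome expression arrays covering essentially all known genes of an organism became widely available (genome-wide yeast arrays already existed in the late 1990s). They allowed researchers to measure the expression levels of all known genes in a single experiment.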
    • 2010s: With the emergence of next-generation sequencing technology, the use of microarrays declined somewhat, but they continue to be used for specific applications, such as validating gene expression levels or detecting chromosomal abnormalities.
  • Next-Generation Sequencing (NGS) Technology:

    • 2005: The first next-generation sequencing technology, 454 pyrosequencing, was introduced, allowing researchers to sequence DNA fragments up to several hundred base pairs long.
    • 2007: The Illumina/Solexa platform was introduced, which allowed for high-throughput sequencing of millions of short DNA fragments in parallel.
    • 2008: The SOLiD platform was introduced. It sequences by ligation rather than by synthesis and uses two-base encoding, which helps distinguish true single-nucleotide variants from sequencing errors.
    • 2010s: NGS technology continued to evolve, with improvements in read length, accuracy, and cost-effectiveness. Applications of NGS technology expanded to include whole-genome sequencing, transcriptome sequencing, epigenetic analysis, metagenomics, and more.
    • 2014: The Oxford Nanopore MinION device was introduced, which uses a novel nanopore sequencing technology and can sequence long DNA or RNA molecules in real-time.
    • 2020s: NGS technology remains a critical tool in genomics research and is being used to advance precision medicine, drug discovery, and other areas of biomedical research.

Overall, microarray and NGS technologies have transformed the field of genomics and have allowed researchers to answer questions about the molecular basis of disease and other biological processes. While each technology has its own strengths and limitations, they continue to be complementary tools for genomic analysis.

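MCPyV and TSPyV in the Family Polyomaviridae

MCPyV (Merkel cell polyomavirus) and TSPyV (Trichodysplasia spinulosa-associated polyomavirus) are both members of the family Polyomaviridae, a group of small double-stranded DNA viruses that can infect a wide range of vertebrates. Although they belong to the same family, they are associated with different diseases.

MCPyV is associated with Merkel cell carcinoma (MCC), a rare skin cancer. MCC is a fast-growing neuroendocrine tumor that arises in the skin. MCPyV is detected in roughly 80% of MCC cases. MCPyV infection is usually harmless. In some cases, however, the virus integrates into the host cell genome, which can lead to malignant transformation and tumor development.

By contrast, TSPyV is associated with trichodysplasia spinulosa (TS), a rare skin disease characterized by spiny follicular papules, hair loss, and abnormal hair follicle growth. It mainly affects immunocompromised individuals, such as organ transplant recipients or people with AIDS.

Although MCPyV and TSPyV belong to the same family, they differ markedly in their pathogenic mechanisms, associated diseases, and affected populations. Studying these viruses will help clarify how they infect cells and cause disease. It will also support the development of effective treatments for patients with the associated conditions.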

Guide to Submitting Data to GEO (Gene Expression Omnibus)

  1. Create an account: First, create a GEO account at https://www.ncbi.nlm.nih.gov/geo/submission/. If you already have an NCBI account, you can use the same credentials to log in.

  2. Upload data files via FTP: Upload your raw data and processed data files to the GEO server using an FTP client. Please refer to GEO’s FTP upload instructions: https://www.ncbi.nlm.nih.gov/geo/info/ftp.html.

  3. Download the appropriate template: Download the Excel metadata template for your data type from the GEO submission guidelines page: https://www.ncbi.nlm.nih.gov/geo/info/seq.html. This spreadsheet is not a SOFT file; SOFT is a separate plain-text submission format. The template collects the series, sample, and platform information described in the next step.

  4. Prepare metadata in the Excel template: Fill out the Excel template with the required information about your samples, platform, and series (experiment). Be sure to follow the GEO guidelines for formatting and required fields.

    • Platform: Describe the technology used for data generation (e.g., microarray or RNA-seq). Provide platform details like manufacturer, layout, probe sequences, etc.

    • Samples: Provide sample details such as source, treatment, extraction protocol, labeling, and hybridization methods. Also, include any relevant clinical or phenotypic data.

    • Series: Describe the overall experiment design and goals, as well as any related publications or supplementary files.

  5. Submit the Excel template: Log in to the GEO Submission Portal (https://www.ncbi.nlm.nih.gov/geo/submission/) using your NCBI account. Click “Submit” to start a new submission and upload the completed Excel template.

  6. Notify GEO about your FTP file transfer using the “Submit to GEO” web form. This is required for high-throughput sequencing submissions and for large microarray submissions or updates.

  7. Wait for the review: The GEO team will review your submission and may contact you for additional information or clarification. Once your submission is approved, you will receive a confirmation email containing your GEO accession number(s).

  8. Cite your data: Include the GEO accession number(s) in any related publications or presentations to ensure proper attribution and facilitate data discovery.

For more detailed instructions and guidelines, visit the GEO Submission Guidelines page: https://www.ncbi.nlm.nih.gov/geo/info/submission.html.

Quick Instructions

  1. Check that GEO accepts your data type.
  2. Gather raw data files.
  3. Gather processed data files.
  4. Fill in the Metadata Template (one sequencing type per template). Before you start, review the “Before completing your Metadata Template” notes on the GEO submission page.
  5. Fill in MD5 Checksums sheet for any raw data files and processed data files referenced in Metadata Template.
  6. Create a folder on your computer that contains all raw and processed files and your completed Metadata Template in Excel format.
  7. FTP the entire data folder to GEO.
  8. Notify GEO using the ‘Submit to GEO’ web form, after the FTP transfer is complete; unannounced files will not be processed.
  9. Your submission is placed into the processing queue and reviewed within 5 business days; expect to receive an email from GEO curators with questions about your submission or the GEO accession numbers.
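Step 5 asks for MD5 checksums of every raw and processed file. The following is a minimal Python sketch that uses only the standard library (hashlib). It assumes all files sit directly in one submission folder, and it prints tab-separated name/checksum pairs that can be pasted into the MD5 Checksums sheet. The function names are hypothetical.

```python
"""Print "file name<TAB>MD5" for every file in a GEO submission folder."""
from __future__ import annotations

import hashlib
import sys
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks so large FASTQ files fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_table(folder: Path) -> list[tuple[str, str]]:
    """(file name, MD5) pairs for all regular files directly in folder, sorted by name."""
    return [(p.name, md5sum(p)) for p in sorted(folder.iterdir()) if p.is_file()]


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, checksum in checksum_table(Path(sys.argv[1])):
        print(f"{name}\t{checksum}")
```

For example, run `python md5_table.py my_geo_submission/ > md5.tsv`; both names are placeholders. On Linux, `md5sum *` inside the folder gives the same checksums.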

* Updating GEO records that have already been processed and approved can be labor-intensive and time-consuming. Prepare your submission carefully before you transfer your files to the GEO FTP server.

* A complete GEO submission consists of the following 3 components. If your transfer does not include all 3, explain the reason in the comment box of the “Submit to GEO” form. An incomplete submission may delay processing.

  • Completed metadata worksheet
  • Raw data
  • Processed data

* Release date: when you submit an update, you can either keep the existing release date or specify a new future release date (up to 4 years from the current date). New release dates apply only to submissions that are still private.

https://submit.ncbi.nlm.nih.gov/geo/submission/

https://www.ncbi.nlm.nih.gov/geo/info/faq.html#holduntilpublished

https://www.ncbi.nlm.nih.gov/geo/submitter/

https://www.ncbi.nlm.nih.gov/geo/subs/

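Yersinia Type III Secretion System Effector Proteins

  1. type III secretion system translocon subunit YopB: YopB (Yersinia outer protein B) is a translocator rather than a classic effector. Together with YopD, it forms a pore in the host cell membrane through which the other Yop effectors are delivered into the host cell.

  2. type III secretion system translocon subunit YopD: YopD (Yersinia outer protein D) works with YopB to form the translocation pore that allows other Yop effectors to enter the host cell. YopD is also involved in regulating the secretion of Type III secretion system (T3SS) effectors.

  3. type III secretion system effector YopK: YopK is a regulatory protein. Its main role is to fine-tune how much T3SS effector protein is translocated into the host cell, balancing bacterial virulence against detection by the immune system.

  4. T3SS effector protein-tyrosine-phosphatase YopH: YopH is a tyrosine phosphatase that dephosphorylates host signaling proteins. This disrupts cell adhesion and immune cell function and helps the bacteria evade the host immune system.

  5. type III secretion system effector acetyltransferase YopJ: YopJ is an acetyltransferase that acetylates host kinases to block the NF-κB and MAPK signaling pathways. It thereby suppresses the inflammatory response and induces apoptosis in macrophages, which helps the bacteria evade the host immune system.

  6. SctW family type III secretion system gatekeeper subunit YopN: YopN acts mainly as a secretion regulator (gatekeeper) of the T3SS. It prevents premature secretion of Yop effectors so that they are released only at the appropriate time, i.e. after contact with a host cell.

  7. type III secretion system effector YopM: YopM is a leucine-rich repeat protein that dampens the host inflammatory response. For example, it inhibits inflammasome (caspase-1) activation and reduces the production of cytokines such as IL-1β.

  8. T3SS polymerization control protein YopR: YopR (also called YscH) is secreted by the T3SS but is not a classic effector. As its annotation indicates, it is thought to help control polymerization of the secretion needle.

  9. type III secretion system effector GTPase activator YopE: YopE is a GTPase-activating protein (GAP) that inactivates Rho family GTPases of the host cell. This disrupts the actin cytoskeleton and weakens host immune responses such as phagocytosis.

  10. type III secretion system effector protein kinase YopO/YpkA: YopO (also known as YpkA) is a serine/threonine protein kinase that disrupts the host actin cytoskeleton and inhibits cell migration and phagocytosis, affecting immune cell function. It also binds Rho family GTPases (e.g., RhoA, Rac1) and acts as a guanine nucleotide dissociation inhibitor (GDI) mimic, inhibiting rather than activating them and thereby interfering with host cell signaling.

  11. T3SS effector cysteine protease YopT: YopT is a cysteine protease that cleaves Rho family GTPases near their C-terminal lipid anchor and releases them from the membrane. This disrupts the host cytoskeleton and interferes with cell adhesion and migration.

These 11 T3SS proteins of Yersinia play important roles during infection. They include effectors as well as translocon and regulatory components. They disrupt host cell signaling and the cytoskeleton and modulate inflammation and cell death, acting together to help the pathogen survive and multiply in the host. Understanding their functions and mechanisms is important for studying Yersinia pathogenesis and for finding new therapies.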

How to run AlphaFold2?

AlphaFold2 is a protein structure prediction model developed by DeepMind. To run AlphaFold2, you’ll need to follow these steps:

  1. Clone the AlphaFold repository:

    git clone https://github.com/deepmind/alphafold.git
    cd alphafold
  2. Set up the environment:

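    You will need to install AlphaFold's dependencies. The AlphaFold README recommends the Docker-based setup (docker/run_docker.py). For the manual installation shown below, use a Python virtual environment or a Conda environment. The external tools HMMER, HH-suite, and Kalign must also be installed.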

    If you’re using a Python virtual environment, create and activate it:

    python3 -m venv alphafold_venv
    source alphafold_venv/bin/activate

    Then install the required packages:

    pip install -r requirements.txt

    If you prefer to use Conda, create a Conda environment and activate it:

    conda create -n alphafold python=3.8
    conda activate alphafold

    Then install the required packages:

    conda install -c conda-forge openmm
    conda install -c conda-forge pdbfixer
    pip install -r requirements.txt
  3. Download the necessary model data:

    You need to download the model parameters and databases. Create a directory to store the data:

    mkdir data

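    Download the model parameters, which are hosted on Google Cloud Storage (the URL is given in the AlphaFold repository). run_alphafold.py expects them in a params/ subdirectory of the data directory:

    mkdir -p data/params
    wget -P data/ https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar
    tar -xf data/alphafold_params_2021-07-14.tar -C data/params/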

    Download the required sequence databases (e.g., UniRef90, BFD, and MGnify) and the PDB template data. The repository's scripts/download_all_data.sh downloads all of them into a given directory; see README.md for details and disk-space requirements.

  4. Run AlphaFold2:

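    You can run AlphaFold2 with the provided run_alphafold.py script. For example, to predict the structure of the protein whose sequence is in input.fasta:

    python run_alphafold.py --fasta_paths=input.fasta --output_dir=output/ --preset=full_dbs --max_template_date=2099-12-31 --data_dir=data/

    This command runs the full AlphaFold2 pipeline against all databases (the full_dbs preset) and writes the resulting structures to the output/ directory.

    Replace input.fasta with the path to your input FASTA file, and adjust the other options as needed. --preset is the flag used by AlphaFold v2.0; later versions replace it with --model_preset (e.g., monomer) and --db_preset (e.g., full_dbs). When you call run_alphafold.py directly rather than through docker/run_docker.py, you must also pass the database paths (e.g., --uniref90_database_path). Run python run_alphafold.py --helpfull to list the flags your version requires.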

  5. Analyze the results:

    After the prediction finishes, the predicted structures are in a subdirectory of output/ named after the FASTA file (e.g., output/input/). ranked_0.pdb is the top-ranked model. The PDB files can be visualized with molecular visualization software such as PyMOL, Chimera, or VMD.
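AlphaFold stores the per-residue confidence score (pLDDT, 0 to 100) in the B-factor column of its PDB output, so models can be screened before visualization. The sketch below assumes standard fixed-column PDB ATOM records and reads one value per residue from the CA atoms. The cut-off of 70, the lower bound of the usual "confident" band, is an adjustable assumption, and the function names are hypothetical.

```python
"""Summarise per-residue pLDDT from an AlphaFold PDB file (B-factor column)."""
import sys
from statistics import mean


def read_plddt(pdb_lines):
    """Return [(chain, residue number, pLDDT)] for every CA atom.

    Fixed PDB columns: atom name 13-16, chain 22, residue number 23-26,
    B-factor 61-66 (1-based), i.e. the slices used below.
    """
    scores = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append((line[21], int(line[22:26]), float(line[60:66])))
    return scores


def low_confidence(scores, cutoff=70.0):
    """(chain, residue number) of residues whose pLDDT is below the cut-off."""
    return [(chain, resnum) for chain, resnum, plddt in scores if plddt < cutoff]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:  # e.g. output/input/ranked_0.pdb
        scores = read_plddt(handle)
    if scores:
        print(f"mean pLDDT: {mean(s[2] for s in scores):.1f}")
        print(f"residues with pLDDT < 70: {len(low_confidence(scores))}")
```

For example, run `python plddt_summary.py output/input/ranked_0.pdb`; the script name is a placeholder. Long stretches of low pLDDT often correspond to disordered regions and should not be over-interpreted.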

Plot phylogenetic tree_heatmap and MSA on yopBDJTEMKOH[NR]

Isolate yopJ yopB yopT yopE yopD yopM yopK yopO yopH
Yersinia_enterocolitica_1055Rr 1 1 1 1 1 1 1 1 1
Yersinia_enterocolitica_2516-87 0 1 1 1 1 1 1 1 1
Yersinia_enterocolitica_8081 1 1 1 1 1 1 1 1 1
Yersinia_enterocolitica_8081_bis 1 1 1 1 1 1 1 1 1
Yersinia_enterocolitica_KNG22703 1 1 1 1 1 1 1 1 1
Yersinia_enterocolitica_WA 1 1 1 1 1 0 1 1 1
Yersinia_enterocolitica_Y1 1 1 1 1 1 1 1 1 1
Yersinia_enterocolitica_Y11 1 1 1 1 1 1 1 1 1
Yersinia_enterocolitica_YE1 1 0 1 1 1 1 1 1 1
Yersinia_enterocolitica_YE165 1 1 1 1 0 1 1 0 1
Yersinia_enterocolitica_YE3 1 0 1 1 1 1 1 0 1
Yersinia_enterocolitica_YE5 1 0 1 1 1 1 1 1 1
Yersinia_enterocolitica_YE6 1 1 1 1 1 1 1 0 1
Yersinia_enterocolitica_YE7 1 1 1 1 1 1 1 1 1
Yersinia_pestis_1045 1 1 1 1 1 1 1 1 1
Yersinia_pestis_1412 1 1 0 1 1 1 1 1 1
Yersinia_pestis_1413 1 1 0 1 1 1 1 1 1
Yersinia_pestis_1522 1 1 0 0 1 1 1 1 1
Yersinia_pestis_20 1 1 1 1 1 1 1 1 1
Yersinia_pestis_2944 1 1 1 1 1 1 1 1 1
Yersinia_pestis_3067 1 1 0 1 1 1 1 1 1
Yersinia_pestis_3770 1 1 0 1 1 1 1 1 1
Yersinia_pestis_790 0 1 1 1 1 1 1 0 1
Yersinia_pestis_8787 1 1 0 1 1 1 1 1 1
Yersinia_pestis_91001 1 1 1 1 1 1 1 1 1
Yersinia_pestis_94 1 1 1 1 1 1 1 1 1
Yersinia_pestis_Angola 1 1 1 1 1 1 1 1 1
Yersinia_pestis_Angola_bis 1 1 1 1 1 1 1 1 1
Yersinia_pestis_Antiqua 1 1 1 1 1 1 1 1 1
Yersinia_pestis_Antiqua_bis 0 1 1 1 1 1 1 1 1
Yersinia_pestis_CO92 1 1 1 1 1 1 1 1 1
Yersinia_pestis_CO92_pgm-_pPCP1- 1 1 1 1 1 1 1 1 1
Yersinia_pestis_Cadman 1 1 1 1 1 1 1 1 1
Yersinia_pestis_D106004 1 1 1 1 1 1 1 1 1
Yersinia_pestis_D182038 1 1 1 1 1 1 1 1 1
Yersinia_pestis_Dodson 1 1 1 1 1 1 1 1 1
Yersinia_pestis_EV76-CN 1 1 1 1 1 1 1 1 1
Yersinia_pestis_El_Dorado 1 1 1 1 1 1 1 1 1
Yersinia_pestis_FDAARGOS_601 1 1 1 1 1 1 1 0 1
Yersinia_pestis_FDAARGOS_602 0 1 1 1 1 0 1 1 1
Yersinia_pestis_FDAARGOS_603 1 1 1 1 1 1 1 1 1
Yersinia_pestis_Harbin_35 1 1 1 1 1 1 1 0 1
Yersinia_pestis_Harbin_35_bis 1 0 1 1 1 1 1 0 1
Yersinia_pestis_Java9 1 1 1 1 1 1 1 0 1
Yersinia_pestis_KIM5 1 1 1 1 1 1 1 1 1
Yersinia_pestis_Nicholisk_41 1 1 1 0 1 1 1 0 1
Yersinia_pestis_PBM19 1 1 1 1 1 1 1 1 1
Yersinia_pestis_Pestoides_B 0 1 1 1 1 1 1 1 1
Yersinia_pestis_Pestoides_F 1 1 0 1 1 1 1 1 1
Yersinia_pestis_Pestoides_F_bis 1 1 0 1 1 1 1 1 1
Yersinia_pestis_Pestoides_G 1 1 0 1 1 1 1 1 1
Yersinia_pestis_R 1 1 1 1 1 1 1 1 1
Yersinia_pestis_S19960127 1 1 1 1 1 1 1 1 1
Yersinia_pestis_SCPM-O-B-5935_I-1996 1 1 1 1 1 1 1 1 1
Yersinia_pestis_SCPM-O-B-5942_I-2638 1 1 1 1 1 1 1 1 1
Yersinia_pestis_SCPM-O-B-6530 1 1 1 1 1 1 1 1 1
Yersinia_pestis_SCPM-O-B-6899_231 1 1 1 1 1 1 1 1 1
Yersinia_pestis_SCPM-O-DNA-18_I-3113 1 1 1 1 1 1 1 1 1
Yersinia_pestis_Shasta 1 1 1 1 1 1 1 1 1
Yersinia_pestis_Z176003 1 1 1 1 1 1 1 1 1
Yersinia_pseudotuberculosis_598 1 1 1 1 1 1 1 1 1
Yersinia_pseudotuberculosis_EP2+ 0 1 1 1 1 1 1 1 1
Yersinia_pseudotuberculosis_FDAARGOS_579 1 1 1 1 1 1 1 1 1
Yersinia_pseudotuberculosis_FDAARGOS_580 1 1 1 1 1 1 1 1 1
Yersinia_pseudotuberculosis_FDAARGOS_581 1 1 1 0 1 1 1 1 1
Yersinia_pseudotuberculosis_FDAARGOS_582 1 1 1 1 1 1 1 1 1
Yersinia_pseudotuberculosis_FDAARGOS_583 1 1 1 1 1 1 1 1 1
Yersinia_pseudotuberculosis_IP2666pIB1 1 1 1 1 1 1 1 1 1
Yersinia_pseudotuberculosis_IP32953 1 1 1 1 1 1 1 1 1
Yersinia_pseudotuberculosis_IP32953_bis 1 1 1 1 0 1 1 1 1
Yersinia_pseudotuberculosis_NZYP4713 1 1 1 1 1 1 1 1 1
Yersinia_pseudotuberculosis_PA3606 1 1 1 1 1 1 1 1 1
Yersinia_pseudotuberculosis_PB1+_bis 1 1 1 1 1 0 1 1 1
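The presence/absence matrix above can be turned into the per-gene typing tables that the R script reads with read.csv() (layout Isolate,Name,Genus,Species,<gene>). The sketch below makes several assumptions. The matrix is saved as whitespace-separated text with a header line starting with "Isolate". Isolate names follow Genus_species_strain. 1 and 0 map to "Yes" and "No". The file name yop_matrix.txt and the function names are hypothetical.

```python
"""Convert a 0/1 yop presence/absence matrix into a typing CSV for one gene."""
import csv
import sys


def parse_matrix(lines):
    """Return (gene names, {isolate name: {gene: present?}})."""
    rows = [line.split() for line in lines if line.strip()]
    genes = rows[0][1:]  # header: Isolate yopJ yopB ...
    table = {}
    for fields in rows[1:]:
        name, values = fields[0], fields[1:]
        table[name] = {gene: value == "1" for gene, value in zip(genes, values)}
    return genes, table


def typing_rows(table, gene):
    """Header plus one row per isolate: strain part, full name, genus, species, Yes/No."""
    rows = [["Isolate", "Name", "Genus", "Species", gene]]
    for name, calls in table.items():
        # maxsplit=2 keeps strain names such as "CO92_pgm-_pPCP1-" intact
        genus, species, strain = name.split("_", 2)
        rows.append([strain, name, genus, species, "Yes" if calls[gene] else "No"])
    return rows


if __name__ == "__main__" and len(sys.argv) > 2:
    matrix_file, gene = sys.argv[1], sys.argv[2]  # e.g. yop_matrix.txt yopE
    with open(matrix_file) as handle:
        _, table = parse_matrix(handle)
    with open(f"typing_{gene}_.csv", "w", newline="") as out:
        csv.writer(out).writerows(typing_rows(table, gene))
```

Taking the strain part as the Isolate column matches the shortened tip labels in the alignments and trees (e.g., Yersinia_pestis_R becomes R), so the heatmap rows can be joined to the tree tips.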

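Figure: circular phylogenetic tree with heatmap for yopK (ggtree_and_gheatmap_yopK)

Figure: multiple sequence alignment for yopK (alignment_yopK)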

  1. This step uses rsync to download a genome assembly directory from the NCBI FTP server to a local directory (shown here for Yersinia_pestis_1045). Save all GFF files in the directory prokka.

    rsync --copy-links --recursive --times --verbose rsync://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/696/305/GCF_001696305.1_UCN72.1 Yersinia_pestis_1045
  2. This step processes the GFF files (gene annotations) of a set of samples in the directory prokka. Its goal is to give every CDS a unique ID and to write the modified GFF files, each with its genome sequence appended, to the directory prokka_plus. The script processes each sample in turn and performs the following steps:

    • Replace all occurrences of \tCDS\t with _CDS_ in the original GFF file (sed -i edits this file in place).

    • Extract all lines containing _CDS_ and save them in a new file with the suffix _CDS.gff.

    • Replace all occurrences of ID= with ID_old= in the new _CDS.gff file.

    • Cut the second field (delimited by ;) from the _CDS.gff file and save it in a new file with the suffix _CDS_f2.

    • Replace all occurrences of Parent=gene- with ID= in the _CDS_f2 file.

    • Paste the contents of the _CDS.gff and _CDS_f2 files side by side, with a ; delimiter, and save the result in a new file with the suffix _CDS_.gff.

    • Run the enum.py script on the _CDS_.gff file to append a running line number to the end of each line (i.e., to the new ID), and save the result in a new file with the suffix _CDS__.gff.

        import sys

        if len(sys.argv) < 2:
            print("Usage: python enum.py <gff_file>", file=sys.stderr)
            sys.exit(1)

        filename = sys.argv[1]

        try:
            with open(filename) as f:
                # Append "_<line number>" so that each CDS ID becomes unique
                for i, line in enumerate(f, start=1):
                    print(f"{line.rstrip()}_{i}")
        except FileNotFoundError:
            print(f"File {filename} not found.", file=sys.stderr)
            sys.exit(1)

    • Extract all lines from the original GFF file that do not contain _CDS_ and save them in a new file with the suffix _nonCDS.gff.

    • Remove all lines containing ### from the _nonCDS.gff file and save the result in a new file with the suffix _nonCDS_.gff.

    • Concatenate the contents of the _nonCDS_.gff and _CDS__.gff files and save the result in a new file with the suffix _nonCDS_CDS.gff.

    • Replace all occurrences of _CDS_ with \tCDS\t in the _nonCDS_CDS.gff file.

    • Append the string ##FASTA to the end of the _nonCDS_CDS.gff file.

    • Trim the headers of the sample's assembly FASTA file (../assembly/<sample>.fna) to their first space-delimited field, the sequence ID, so that they match the sequence IDs used in the GFF file.

    • Concatenate the modified GFF file (_nonCDS_CDS.gff) and the trimmed FASTA file. Save the result in the ../prokka_plus/ directory, named after the strain part of the sample name (e.g., Yersinia_pestis_1045 becomes 1045.gff).

    • After processing all samples, the script removes intermediate files generated during the process.

      for sample in Yersinia_pestis_1045 Yersinia_pestis_SCPM-O-B-6291_C-25 Yersinia_pestis_2944 Yersinia_pestis_KIM10+ Yersinia_pestis_M-1482; do
        sed -i 's/\tCDS\t/_CDS_/g' ${sample}.gff
        grep "_CDS_" ${sample}.gff > ${sample}_CDS.gff
        sed -i 's/ID=/ID_old=/g' ${sample}_CDS.gff
        cut -d';' -f2 ${sample}_CDS.gff > ${sample}_CDS_f2
        sed -i 's/Parent=gene-/ID=/g' ${sample}_CDS_f2
        paste -d';' ${sample}_CDS.gff ${sample}_CDS_f2 > ${sample}_CDS_.gff
        python enum.py ${sample}_CDS_.gff > ${sample}_CDS__.gff   # append a line number so that no two CDS share the same ID
      
        grep -v "_CDS_" ${sample}.gff > ${sample}_nonCDS.gff
        grep -v "###" ${sample}_nonCDS.gff > ${sample}_nonCDS_.gff
      
        cat ${sample}_nonCDS_.gff ${sample}_CDS__.gff > ${sample}_nonCDS_CDS.gff
        sed -i 's/_CDS_/\tCDS\t/g' ${sample}_nonCDS_CDS.gff
        echo "##FASTA" >> ${sample}_nonCDS_CDS.gff
      
        cut -d' ' -f1 ../assembly/${sample}.fna > ../assembly/${sample}.fasta;
        cat ${sample}_nonCDS_CDS.gff ../assembly/${sample}.fasta > ../prokka_plus/$(echo $sample | cut -d'_' -f3- | tr " " "_").gff;
      done
      rm *_CDS.gff *_CDS_f2 *_CDS_.gff *_CDS__.gff *_nonCDS.gff *_nonCDS_.gff
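  3. This step runs Roary, a tool for pan-genome analysis. Roary takes annotated bacterial genomes in GFF3 format as input, with the genome sequence appended after a ##FASTA line as prepared in step 2, and clusters the genes by sequence similarity. The options used are:
    • -p 4: 4 threads
    • -f: output directory
    • -i 95: minimum BLASTP percentage identity
    • -cd 99: percentage of isolates a gene must be in to count as core
    • -s: do not split paralogs
    • -e -n: create a core-gene multi-FASTA alignment using MAFFT
    • -v: verbose output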

    roary -p 4 -f ./roary -i 95 -cd 99 -s -e -n -v  prokka_plus/1045.gff prokka_plus/SCPM-O-B-6291_C-25.gff prokka_plus/2944.gff prokka_plus/KIM10+.gff
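  4. This step extracts the coding sequences (CDS) of selected genes from several GenBank genome files and writes them to an output file. The example uses yopT, identified by its Roary gene IDs. Input files: roary/pan_genome_reference.fa and roary/gene_presence_absence.csv, which are used to look up the gene IDs.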

    #grep "yopT" roary/gene_presence_absence.csv
    > yopT_seq.txt
    for gene_id in M486_RS20945_3996 YE105_RS20560_4012; do
      for gbff in  Yersinia_massiliensis_2011N-4075/GCF_013282765.1_ASM1328276v1/GCF_013282765.1_ASM1328276v1_genomic.gbff.gz Yersinia_pestis_EV_NIIEG/GCF_000590535.2_ASM59053v2/GCF_000590535.2_ASM59053v2_genomic.gbff.gz Yersinia_pestis_Shasta/GCF_000834335.1_ASM83433v1/GCF_000834335.1_ASM83433v1_genomic.gbff.gz Yersinia_ruckeri_NVI-492/GCF_023212565.2_ASM2321256v2/GCF_023212565.2_ASM2321256v2_genomic.gbff.gz Yersinia_pestis_Pestoides_G/GCF_000834985.1_ASM83498v1/GCF_000834985.1_ASM83498v1_genomic.gbff.gz; do
        output=$(python3 extract_CDS_of_a_locus_tag.py ${gbff} $(echo "${gene_id}" | cut -d '_' -f 1-2))
        if [[ ! -z "${output}" ]]; then
            gbff_short=$(echo "${gbff}" | cut -d '/' -f 1)
            # write a FASTA record (header = genome directory name) so that dna_to_protein.py can parse it
            printf ">%s\n%s\n" "${gbff_short}" "${output}" >> yopT_seq.txt
        fi
      done
    done
    
    #-- code snippet of extract_CDS_of_a_locus_tag.py --
    # Usage: python3 extract_CDS_of_a_locus_tag.py GCF_001188735.1_ASM118873v1_genomic.gbff.gz M486_RS20950
    # Prints the nucleotide sequence of every CDS whose locus_tag matches.
    import gzip
    import sys

    from Bio import SeqIO

    if len(sys.argv) != 3:
        print("Usage: python3 extract_CDS_of_a_locus_tag.py <genomic.gbff.gz> <locus_tag>", file=sys.stderr)
        sys.exit(1)

    filename, locus_tag = sys.argv[1], sys.argv[2]

    # Open the gzip-compressed GenBank file in text mode and scan every record
    with gzip.open(filename, "rt") as f:
        for record in SeqIO.parse(f, "genbank"):
            for feature in record.features:
                if feature.type == "CDS" and feature.qualifiers.get("locus_tag", [""])[0] == locus_tag:
                    # extract() handles the strand and multi-part (join) locations
                    seq = feature.extract(record.seq)
                    print(seq)
                    # To print the protein instead: print(seq.translate(table=11, to_stop=True))
  5. This step translates each DNA sequence file into a protein sequence file and aligns the protein sequences with MAFFT (or MUSCLE). It then builds phylogenetic trees with FastTree and prepares the typing tables used for the heatmap.

    #mafft --clustalout --adjustdirection yopK_seqs.fasta > yopK_seqs.output
    #fasttree -gtr -gamma -nt  yopK_alignment.fasta > yopK_alignment.tree
    #raxml-ng --all --model GTR+G+ASC_LEWIS --prefix core_gene_raxml --threads 6 --msa yopK_alignment_.fasta --bs-trees 1000 -slow 
    #fasttree 
    for yop in yopJ yopB yopT yopE yopD yopM yopK yopO yopH; do
      python3 ../../../plotTreeHeatmap/dna_to_protein.py ${yop}_seq.txt ${yop}_protein.fasta
      python3 ../../../plotTreeHeatmap/protein_alignment.py ${yop}_protein.fasta ${yop}_aligned_protein.fasta mafft
      # shorten headers to the strain part, e.g. ">Yersinia_pestis_Shasta" -> ">Shasta"
      awk -F '_' '/^>/ { printf(">%s", $3); for (i = 4; i <= NF; ++i) printf("_%s", $i); printf("\n"); next } { print }' ${yop}_aligned_protein.fasta > ${yop}_aligned_protein_.fasta
    done

    for yop in yopE yopO yopT; do
      fasttree ${yop}_aligned_protein_.fasta > ${yop}_aligned_protein.tree

      #-- construct the typing table, e.g. typing_yopE_.csv --
      #Isolate,Name,Genus,Species,yopE
      #R,Yersinia_pestis_R,Yersinia,pestis,Yes
      #20,Yersinia_pestis_20,Yersinia,pestis,Yes
      #…
      # every isolate in this alignment carries the gene, hence "Yes"
      { echo "Isolate,Name,Genus,Species,${yop}"
        grep ">" ${yop}_aligned_protein.fasta | sed 's/^>//' |
          awk -F '_' -v OFS=',' '{ iso = $3; for (i = 4; i <= NF; ++i) iso = iso "_" $i; print iso, $0, $1, $2, "Yes" }'
      } > typing_${yop}_.csv
    done

    # code snippet of dna_to_protein.py
    import sys

    from Bio import SeqIO
    from Bio.SeqRecord import SeqRecord


    def translate_dna_to_protein(dna_fasta_file, protein_fasta_file):
        protein_records = []
        for record in SeqIO.parse(dna_fasta_file, "fasta"):
            # bacterial genetic code (table 11); stop at the first stop codon
            protein_seq = record.seq.translate(table=11, to_stop=True)
            protein_records.append(SeqRecord(protein_seq, id=record.id, description=record.description))
        SeqIO.write(protein_records, protein_fasta_file, "fasta")


    if __name__ == "__main__":
        input_file = sys.argv[1]
        output_file = sys.argv[2]
        translate_dna_to_protein(input_file, output_file)

    # code snippet of protein_alignment.py
    import subprocess
    import sys


    def run_alignment(input_file, output_file, aligner):
        """Align protein sequences with MAFFT or MUSCLE (MUSCLE v3 command-line syntax)."""
        if aligner.lower() == "mafft":
            # MAFFT writes the alignment to stdout
            result = subprocess.run(["mafft", input_file], capture_output=True, text=True, check=True)
            with open(output_file, "w") as aligned_fasta:
                aligned_fasta.write(result.stdout)
        elif aligner.lower() == "muscle":
            # MUSCLE writes the output file itself, so it must not be overwritten afterwards
            subprocess.run(["muscle", "-in", input_file, "-out", output_file], check=True)
        else:
            print("Invalid aligner. Please choose 'mafft' or 'muscle'.", file=sys.stderr)
            sys.exit(1)


    if __name__ == "__main__":
        if len(sys.argv) < 4:
            print("Usage: python protein_alignment.py input_fasta output_fasta aligner")
            print("Example: python protein_alignment.py input_protein.fasta aligned_protein.fasta mafft")
            sys.exit(1)
        run_alignment(sys.argv[1], sys.argv[2], sys.argv[3])
  6. This step draws a circular phylogenetic tree, a heatmap, and a multiple sequence alignment in a single figure. The tree is read from yopE_aligned_protein.tree, which was built from the aligned protein sequences in yopE_aligned_protein_.fasta. The heatmap is based on typing_yopE_.csv. The multiple sequence alignment is added to the figure by extracting selected sequences from the aligned protein sequence file.

    library(ggtree)
    library(ggplot2)
    library(Biostrings)
    library(stringr)
    library(dplyr)
    
    library(ape)
    library(ggmsa)
    
    #install.packages("cowplot")
    library(cowplot)
    
    setwd("/home/jhuang/DATA/Data_Gunnar_Yersiniomics/plotTreeHeatmap/")
    
    # -- edit tree --
    #https://icytree.org/
    #0.000780
    #info <- read.csv("typing_198.csv")
    info <- read.csv("typing_yopE_.csv")
    
    info$name <- info$Isolate
    #rownames(info) <- info$name
    info$yopE <- factor(info$yopE)
    
    #tree <- read.tree("core_gene_alignment_fasttree_directly_from_186isoaltes.tree")  --> NOT GOOD!
    #tree <- read.tree("raxml.tree")
    #tree <- read.tree("yopK_alignment_modified.tree") 
    tree <- read.tree("yopE_aligned_protein.tree") 
    #tree <- read.tree("core_gene_raxml.raxml.bestTree") 
    cols <- c(Yes='purple2', No='skyblue2')     
    
    # -- tree --
    tree_plot <- ggtree(tree, layout='circular', branch.length='none') %<+% info + scale_color_manual(values=cols) + geom_tiplab2(aes(label=name), offset=1) + geom_tippoint(aes(color=yopE)) 
    #+ scale_color_manual(values=cols) + geom_tiplab2(aes(label=name), offset=1)
    #, geom='text', align=TRUE,  linetype=NA, hjust=1.8,check.overlap=TRUE, size=3.3
    #difference between geom_tiplab and geom_tiplab2?
    #+ theme(axis.text.x = element_text(angle = 30, vjust = 0.5)) + theme(axis.text = element_text(size = 20))  + scale_size(range = c(1, 20))
    #font.size=10, 
    png("ggtree2.png", width=1260, height=1260)
    #png("ggtree.png", width=1000, height=1000)
    #svg("ggtree.svg", width=1260, height=1260)
    tree_plot
    dev.off()
    
    # -- heatmap --
    #heatmapData2 <- info %>% select(Isolate, cgMLST, Lineage)
    heatmapData2 <- info %>% select(Isolate, Species)
    rn <- heatmapData2$Isolate
    heatmapData2$Isolate <- NULL
    heatmapData2 <- as.data.frame(sapply(heatmapData2, as.character))
    rownames(heatmapData2) <- rn
      #"1","2","3","4","9","12","13","14","16","18",  "19","30","32","36","39","41","42","43","44","53",  "64","79","83","84","92","140","148","154","171","172", "173","194","215","217","252","277","279","282","290","312", "335","NA"
    #https://bookdown.org/hneth/ds4psy/D-3-apx-colors-basics.html
    #"blue","cyan", "skyblue2", "azure3","blueviolet","darkgoldenrod",  "tomato","mediumpurple4","indianred", 
    #"cornflowerblue","darkgreen","seagreen3","tan","red","green","orange","pink","brown","magenta",     "cornflowerblue","darkgreen","red","tan","brown",
    
    #heatmap.colours <- c("cornflowerblue","darkgreen","seagreen3","tan","red",  "navyblue", "gold",     "lightcyan3","green","orange","pink","purple","magenta","brown", "darksalmon","chocolate4","darkkhaki",  "maroon","lightgreen",      "darkgreen","seagreen3","tan","red",  "navyblue", "gold",     "green","orange","pink","purple","magenta","brown", "darksalmon","chocolate4","darkkhaki", "lightcyan3", "maroon","lightgreen", "darkgrey")
    #names(heatmap.colours) <- c("pestis","pseudotuberculosis","similis","enterocolitica","frederiksenii","kristensenii","occitanica","intermedia","hibernica","canariae","alsatica","rohdei","massiliensis","bercovieri","aleksiciae","mollaretii","aldovae","ruckeri","entomophaga",     "1","1Aa","1B","2","2/3-9a","2/3-9b","4","5","6","8","10","11","12","13","14","16","17","29","NA")
    
    heatmap.colours <- c("cornflowerblue","darkgreen","seagreen3","tan","red",  "navyblue", "gold",     "lightcyan3","green","orange","pink","purple","magenta","brown", "darksalmon","chocolate4","darkkhaki",  "maroon","lightgreen")
    names(heatmap.colours) <- c("pestis","pseudotuberculosis","similis","enterocolitica","frederiksenii","kristensenii","occitanica","intermedia","hibernica","canariae","alsatica","rohdei","massiliensis","bercovieri","aleksiciae","mollaretii","aldovae","ruckeri","entomophaga")
    
    #"cornflowerblue","darkgreen","seagreen3","tan","red","green","orange","pink","brown","magenta","cornflowerblue",
    #"2.MED","4.ANT","2.ANT","1.ORI","1.IN","1.ANT","0.ANT3","0.PE4","0.PE3","0.PE2","1a",
    #mydat$Regulation <- factor(mydat$Regulation, levels=c("up","down"))
    #rochesterensis-->occitanica
    
    png("ggtree_and_gheatmap_yopE.png", width=1000, height=800)
    #png("ggtree_and_gheatmap.png", width=1290, height=1000)
    #png("ggtree_and_gheatmap.png", width=1690, height=1400)
    #svg("ggtree_and_gheatmap.svg", width=1290, height=1000)
    #svg("ggtree_and_gheatmap.svg", width=17, height=15)
    # Attach the species heatmap to the right of the tree (offset shifts it past the tip labels)
    tree_heatmap_plot <- gheatmap(tree_plot, heatmapData2, width = 0.2, offset = 8,
                                  colnames_position = "top", colnames_angle = 90,
                                  colnames_offset_y = 0.1, hjust = 0.5, font.size = 4) +
      scale_fill_manual(values = heatmap.colours) +
      theme(legend.text = element_text(size = 14),
            legend.title = element_text(size = 14)) +
      guides(fill = guide_legend(title = ""),
             color = guide_legend(override.aes = list(size = 5)))
    # print() explicitly so the plot is drawn to the PNG device even when the script is source()d
    print(tree_heatmap_plot)
    dev.off()
    
    #samtools faidx yopE_aligned_protein_.fasta "1522" > yopE_aligned_protein_sorted.fasta
    #samtools faidx yopE_aligned_protein_.fasta "Pestoides_F_bis" >> yopE_aligned_protein_sorted.fasta
    #...
    #samtools faidx yopE_aligned_protein_.fasta "8081" >> yopE_aligned_protein_sorted.fasta
    #samtools faidx yopE_aligned_protein_.fasta "WA" >> yopE_aligned_protein_sorted.fasta
    
    #https://bioconductor.org/packages/devel/bioc/vignettes/ggtreeExtra/inst/doc/ggtreeExtra.html
    #https://github.com/YuLab-SMU/supplemental-ggmsa
    #https://github.com/YuLab-SMU/ggmsa
    
    library(ggmsa)
    library(ggplot2)
    library(ggtree)
    #library(gggenes)
    library(ape)
    library(Biostrings)
    library(ggnewscale)
    library(dplyr)
    library(ggtreeExtra)
    library(phangorn)
    library(RColorBrewer)
    library(patchwork)
    library(ggplotify)
    library(aplot)
    library(magick)
    library(treeio)
    
    #data <- "../supplemental-ggmsa-main/data/s_RBD.fasta"
    data <- "yopE_aligned_protein_.fasta"
    #data <- "yopE_aligned_protein_sorted.fasta"
    #dms <- read.csv("data/DMS.csv")
    #del <- c("expr_lib1", "expr_lib2","expr_avg","bind_lib1","bind_lib2")
    #dms <- dms[,!colnames(dms) %in% del]
    tidymsa <- tidy_msa(data)
    #tidymsa <- assign_dms(tidymsa, dms)
    #Mapping the position to the protein-protein interaction plot
    #tidymsa$position <- tidymsa$position + 330
    
    #dms = TRUE, 
    png("alignment_yopE.png", width=1100, height=5400)
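    msa_plot <- ggplot() +
      geom_msa(data = tidymsa, char_width = 0.5, seq_name = TRUE, show.legend = TRUE) +
      theme_msa() +
      facet_msa(50)  # wrap the alignment into rows of 50 positions
      #scale_fill_gradientn(name = "ACE2 binding", colors = c(colRD(75),colBU(25)))
    # print() explicitly so the plot is drawn to the PNG device even when the script is source()d
    print(msa_plot)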
    dev.off()
    
    # #Combine the ggtree and ggmsa plots using the cowplot package:
    # combined_plot <- cowplot::plot_grid(msa_plot, tree_heatmap_plot, ncol = 1, align = "v", axis = "l", rel_heights = c(3, 2))
    # ggsave("combined_plot.png", combined_plot, width = 10, height = 10, dpi = 300)
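
The commented-out `samtools faidx` calls above assemble `yopE_aligned_protein_sorted.fasta` one record at a time, in a hand-picked order. The same reordering could also be done in a single pass from R. The sketch below is only one possible approach: `reorder_fasta()` is a hypothetical helper written for illustration (it is not part of Biostrings, ggmsa or any other package), the toy IDs and sequences are made up, and unlike `samtools faidx` it copies each record's full header line rather than just the sequence name.

```r
# Hypothetical helper (not from any package): copy FASTA records into a new
# file in the order given by `ids`.
reorder_fasta <- function(in_fasta, out_fasta, ids) {
  lines <- readLines(in_fasta)
  header_idx <- grep("^>", lines)
  # Each record runs from its header line to the line before the next header
  ends <- c(header_idx[-1] - 1L, length(lines))
  # Key records on the first whitespace-delimited word of the header,
  # which is also how samtools faidx names its sequences
  keys <- sub("[[:space:]].*$", "", sub("^>", "", lines[header_idx]))
  missing <- setdiff(ids, keys)
  if (length(missing) > 0) {
    stop("IDs not found: ", paste(missing, collapse = ", "))
  }
  # Collect header + sequence lines of each requested record, in order
  out <- unlist(lapply(ids, function(id) {
    i <- match(id, keys)
    lines[header_idx[i]:ends[i]]
  }))
  writeLines(out, out_fasta)
  invisible(out)
}

# Usage with made-up toy records (not real YopE sequences)
toy_in <- tempfile(fileext = ".fasta")
writeLines(c(">seqB toy record", "MKT-AL", ">seqA", "MKS-AL", ">seqC", "MQT-AL"), toy_in)
toy_out <- tempfile(fileext = ".fasta")
reorder_fasta(toy_in, toy_out, c("seqA", "seqC", "seqB"))
readLines(toy_out)
```

To use it on the real alignment, `ids` would have to list every record in the desired order; that full list is elided (`...`) in the commands above and is not reproduced here.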

Episome

An episome is a genetic element that can either exist independently as a circular DNA molecule or integrate into the host’s chromosome. In its free form, an episome replicates independently of the host’s chromosomal DNA and is passed on to daughter cells during cell division. Episomes are found in both prokaryotic and eukaryotic organisms.

In bacteria, episomes are often plasmids, which are small, circular DNA molecules that carry genes beneficial to the host, such as antibiotic resistance or the ability to produce toxins. Bacterial plasmids can be transferred between bacterial cells through a process called conjugation, which promotes the spread of beneficial genes within a bacterial population.

In eukaryotic cells, such as human cells, episomes can be viral genomes, like the Epstein-Barr virus (EBV) genome. Upon infection, EBV circularizes its linear DNA and establishes latency as an episome in the host cell’s nucleus. During latency, the viral episome persists in the host cell without causing immediate damage, and the virus can reactivate later, potentially causing disease. Because only a few viral genes are expressed from the latent episome, the virus largely evades detection and clearance by the host’s immune system while retaining the ability to replicate and produce new virus particles upon reactivation.

Human gammaherpesvirus 4 (HHV-4) is another name for Epstein-Barr virus (EBV). EBV is a member of the Herpesviridae family and belongs to the Gammaherpesvirinae subfamily. It is a double-stranded DNA virus that infects humans and is associated with various diseases, including infectious mononucleosis, certain types of lymphomas, and nasopharyngeal carcinoma.

EBV establishes a latent infection in the host’s B cells, where it can persist as an episome. The virus can reactivate at a later time, potentially leading to disease.

EV (Extracellular Vesicles: Key Regulators of Intercellular Communication in Biology)

In biology, “EV” often stands for “extracellular vesicle.” Extracellular vesicles are membrane-bound particles released by cells into the extracellular environment. They play essential roles in cell-to-cell communication, both within the same organism and between different organisms.

Extracellular vesicles can range in size from approximately 30 nm to 1 µm or more, and their contents include proteins, lipids, and nucleic acids (such as DNA, mRNA, and microRNAs). These vesicles can be classified into various categories, such as exosomes, microvesicles, and apoptotic bodies, based on their biogenesis, size, and molecular composition.

EVs are involved in numerous physiological and pathological processes, including immune responses, angiogenesis, cell differentiation, and cancer progression. They can transfer their contents from one cell to another, influencing the recipient cell’s behavior and function. Given their importance in intercellular communication, extracellular vesicles have become a significant area of research in recent years, with potential applications in diagnostics, therapeutics, and drug delivery.

Interesting Gene Anecdotes

Genetics is a field full of surprises and curiosities. Here are some interesting gene anecdotes:

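  1. Avocado genes: A genome study suggests that the avocado retains traces of very ancient gene duplications, some thought to date back to the age of the dinosaurs. These extra gene copies may have helped avocados adapt to harsh environments and survive over their long history.
  2. CCR5-Δ32: CCR5-Δ32 is an uncommon 32-base-pair deletion in the CCR5 gene. People who inherit two copies are highly resistant to infection by HIV strains that use CCR5 as a co-receptor: the shortened receptor never reaches the surface of immune cells, so these viruses cannot use it to enter them.
  3. Differences in smell: How we perceive odors is partly determined by our genes. A variant near the olfactory receptor gene OR6A2 has been linked to perceiving cilantro (coriander) as tasting like soap, which is why some people find the herb repulsive.
  4. Oxalate tolerance: Some people are said to carry an "anti-oxalate" gene that lets them eat oxalate-rich foods such as spinach, Swiss chard and beets without discomfort. Oxalate binds calcium to form stones, and carriers of this gene are said to be protected against such stones forming in the body.
  5. Cat mimicry: A plant from Australia, nicknamed the "cat mimic", is said to protect itself by imitating the smell of cats. It produces a compound whose odor resembles cat pheromones, so that animals that feed on it, such as mice, assume a cat is nearby and stay away.
  6. Left-handedness gene: Only about 10% of the world's population is left-handed, but research has linked a gene called LRRTM1 to handedness: people who carry certain variants of it are more likely to be left-handed.
  7. Blue-eye gene: Blue eyes trace back to a single genetic mutation. Research suggests that all blue-eyed people share a common ancestor who lived about 6,000 to 10,000 years ago. The mutation lies in the neighboring HERC2 gene and reduces the expression of OCA2, lowering the amount of melanin in the iris and making the eyes look blue.
  8. Caffeine metabolism: People respond differently to caffeine, largely because of their genes. Carriers of certain variants of the CYP1A2 gene metabolize caffeine faster and are therefore less sensitive to it.
  9. Sweet-tooth gene: Some people are especially fond of sweets, which may be linked to the TAS1R2 gene. It encodes part of the sweet taste receptor, and people carrying certain variants tend to prefer sugary foods.
  10. Peanut allergy genes: Peanut allergy is a common food allergy, and people carrying certain variants of the HLA-DQ and HLA-DR genes are more likely to develop it.
  11. Musical aptitude gene: Variants of the AVPR1A gene have been associated with musical ability. Carriers of particular variants may perform better at musical creativity, improvisation and musical memory.
  12. Short-sleeper gene: Some people stay alert and energetic on much less sleep than most. This has been linked to rare mutations, such as one in the DEC2 (BHLHE41) gene, whose carriers need only about six hours of sleep per night.
  13. Super-memory gene: Some people have remarkable memories and can easily recall large amounts of detail. This may be related to variants of the BDNF gene, which encodes brain-derived neurotrophic factor, a protein that plays a key role in the brain. People carrying certain variants may have better memory.
  14. Horror-fan gene: Some people love horror films, which may be related to variants of the COMT gene. This gene affects the breakdown of dopamine, a neurotransmitter associated with pleasure, and carriers of certain variants may feel more excitement and enjoyment when watching horror films.
  15. Broccoli-aversion gene: Some people strongly dislike the taste of broccoli, which may be related to variants of the TAS2R38 gene. This gene encodes a bitter taste receptor, and people with certain variants are especially sensitive to the bitter compounds in broccoli.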

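These anecdotes show how genes influence many aspects of our lives, from our physical traits to our behavior. As research in genetics continues, scientists will keep uncovering more intriguing genetic phenomena.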