Gasthof Knappenwirt vs. Bauernhof Lehen

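Based on a web search, here is a summary of guest reviews for two Austrian accommodations: Gasthof Knappenwirt and the apartments at Bauernhof Lehen.

Gasthof Knappenwirt (Mariahof, Styria)

Overall rating: Booking.com 9.0/10

Highlights (pros):

  • Excellent food: guests rate the restaurant as "outstanding", and it has received an award from the Gault&Millau restaurant guide.
  • Friendly service: staff score 9.3/10; the owner family is described as very friendly and helpful.
  • Comfortable rooms: spacious and modern, with a cleanliness and comfort score of 9.4/10.
  • Good breakfast: described as "very tasty".
  • Good location: quiet countryside, well suited to hiking, and close to ski areas (such as Kreischberg) and the Red Bull Ring.
  • Good value: value-for-money score of 9.2/10; guests generally feel they got more than they paid for.
  • Family-friendly: mini golf and other facilities for children.

Drawbacks (cons):

  • Some rooms have a restricted view (they face neighbouring buildings).
  • Egg dishes at breakfast cost extra.
  • Rooms have no fridge.
  • Because farm animals are nearby, there are sometimes flies in the outdoor areas.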

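Schöne Wohnung am Bauernhof / Bauernhof Lehen (near Linz)

Overall rating: Booking.com 8.4/10 (based on 73 reviews)

Highlights (pros):

  • Spacious apartments: fully equipped, with kitchen and dining area, well suited to self-catering.
  • Friendly hosts: the hosts are very welcoming (score 9.3/10).
  • Quiet setting: rural, with no traffic noise.
  • Clean: cleanliness score of 9.0/10.
  • Good for families and pets: dogs are allowed, and the large apartment sleeps up to 8.

Drawbacks (cons):

  • Average value: the value-for-money score is lower (7.9/10); some guests find the price high for what is offered.
  • Limited bed types: no standard double bed, only bunk beds and single beds.
  • Limited entertainment: in some apartments the TV receives no cable or satellite signal.
  • Morning noise: there is a private kindergarten on the ground floor, so mornings can be noisy.
  • Poor mobile reception: some areas have no mobile signal.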

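Summary comparison

| Dimension | Gasthof Knappenwirt | Bauernhof Lehen (farm apartments) |
|---|---|---|
| Type of stay | Traditional inn/hotel (full service) | Self-catering farm apartments |
| Best for | Couples; families who want full service and good food | Families who want to cook, larger groups, longer stays |
| Food | Award-winning restaurant, breakfast offered | Fully equipped kitchen, guests cook for themselves |
| Location | Styrian mountains, close to ski areas | Flat countryside near Linz |
| Value for money | Very high (guests feel they got more than they paid for) | Average (some guests find it slightly expensive) |
| Atmosphere | Traditional Austrian inn, attentive service | Quiet farm stay, independent |

Recommendation:

  • If you want full service, very good food and a traditional Austrian inn, Gasthof Knappenwirt is the better choice, and it also scores higher on value for money.
  • If you prefer an independent, self-catering farm stay, especially for a larger group or a longer stay, Bauernhof Lehen fits better, but note that it has no standard double bed and only average value-for-money ratings.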


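The exact review count for Gasthof Knappenwirt depends on the booking platform, because each platform keeps its own count and the numbers keep growing.

According to the latest web search results, the counts on the main platforms are:

  • Booking.com: currently 246 reviews [[2]]. (Because reviews are added continuously, some page snapshots show 240 [[3]] or 228 [[10]].)
  • Austria Booking (an Austrian booking site): 354 reviews.
  • Hotels.com: 20 reviews [[9]].
  • Tripadvisor: 19 reviews [[5]].

Summary: on Booking.com the figure is 246 reviews; on Austria Booking, mentioned earlier, it is 354. Both will keep increasing over time.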

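Zhejiang vs. Austria

Comparing Zhejiang Province with Austria, on the premise that "one Chinese prefecture-level city ≈ one Austrian federal state (Bundesland)", works well for both land area and administrative hierarchy.

The analogy can be broken down along the following dimensions: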

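1. Area: strikingly similar ("city" ≈ "state")

Put Zhejiang's 11 prefecture-level cities next to Austria's 9 federal states and their areas fall in the same order of magnitude:

  • Overall

    • Zhejiang Province: about 105,000 km².
    • Austria: about 84,000 km².
    • Conclusion: Zhejiang is somewhat larger than Austria (by roughly 25%); the two are close in overall size.
  • "City" vs. "state" in detail

    • Hangzhou (16,800 km²) ≈ Steiermark (16,400 km²) or Oberösterreich (11,900 km²).
    • Lishui (Zhejiang's largest city by area, 17,300 km²) ≈ Niederösterreich (Austria's largest state by area, 19,100 km²).
    • Quzhou (8,800 km²) ≈ Salzburg (7,100 km²).
    • Conclusion: an ordinary Chinese prefecture-level city covers about as much ground as an Austrian federal state. Driving from one county to another across a Zhejiang city takes about as long as driving from one Bezirk to another across an Austrian state (roughly 1.5 to 2 hours).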
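The area comparison above can be checked with a few divisions. The following is a minimal sketch using only the rounded figures quoted in these notes (Austria's total is the 83,879 km² given further down); the names `PAIRS` and `ratio` are made up for illustration.

```python
# Areas in km^2, rounded figures as quoted in these notes.
ZHEJIANG_KM2 = 105_000
AUSTRIA_KM2 = 83_879

# (Zhejiang city, area, Austrian state, area)
PAIRS = [
    ("Hangzhou", 16_800, "Steiermark", 16_400),
    ("Lishui", 17_300, "Niederösterreich", 19_100),
    ("Quzhou", 8_800, "Salzburg", 7_100),
]


def ratio(a_km2, b_km2):
    """Return a / b rounded to two decimals."""
    return round(a_km2 / b_km2, 2)


if __name__ == "__main__":
    print("Zhejiang / Austria:", ratio(ZHEJIANG_KM2, AUSTRIA_KM2))
    for city, city_km2, state, state_km2 in PAIRS:
        print(f"{city} / {state}: {ratio(city_km2, state_km2)}")
```

A ratio close to 1 means the pair is a good match in area; all three pairs stay within about a quarter of each other.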

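2. Administrative levels: a one-to-one mapping

The analogy also holds for the administrative structure. The levels map onto each other as follows:

| Level | China (Zhejiang as example) | Austria | Notes |
|---|---|---|---|
| Top | Province (e.g. Zhejiang Province) | Country (e.g. Republic of Austria) | Similar area, but a Chinese province far exceeds a mid-sized European country in population and economic output. |
| Prefecture | Prefecture-level city (e.g. Hangzhou, Ningbo, Wenzhou) | Federal state, Bundesland (e.g. Steiermark, Salzburg) | The core of the analogy: area and jurisdiction match closely. |
| County | County / county-level city / urban district (e.g. Tonglu County, Lin'an District) | Political district (Bezirk) / statutory city | As noted earlier, a Bezirk corresponds to a Chinese county. |
| Township | Township / town / subdistrict (e.g. Fenshui Town, Fuchun Subdistrict) | Municipality (Gemeinde) | The lowest self-governing unit. |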

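3. Key difference: population density and the meaning of "city"

"City = state" holds for physical size, but population and economic structure differ greatly, which should be kept in mind when using the analogy:

  • Population

    • Austria has about 9.1 million inhabitants. Niederösterreich, the largest state by area, has about 1.7 million; Vienna is the most populous state with about 2 million; Burgenland, the least populous, has about 0.3 million.
    • Zhejiang Province has about 66 million inhabitants.
    • Comparison: a typical Zhejiang prefecture-level city (Taizhou, about 6.6 million; Jiaxing, about 5.5 million) has several times the population of any Austrian state. Even Zhoushan, Zhejiang's least populous city (about 1.1 million), has more residents than most Austrian states.
    • So although a Zhejiang "city" and an Austrian "state" cover similar areas, the Zhejiang city holds far more people and is far more densely populated.
  • What "city" means (city vs. municipality)

    • In China, a prefecture-level city is a wide-area administrative unit. "Hangzhou" includes not only the high-rise centre (Shangcheng and Gongshu districts) but also extensive suburbs, counties (such as Tonglu and Chun'an) and large mountain and rural areas.
    • In Europe and Austria, a "city" (Stadt) usually means only the urbanised core (for example the city of Vienna or Graz). The surrounding rural areas are administered by the districts (Bezirke) within the state (Bundesland).
    • A Chinese prefecture-level city therefore bundles what in Europe would be an urban core plus a rural state.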

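Summary

In terms of geography and administrative scope, a Zhejiang prefecture-level city (such as Hangzhou, Ningbo or Taizhou) corresponds closely to an Austrian federal state (Bundesland).

Both cover a similar area (around 10,000 km²), contain a similar number of county-level units (Bezirke), and take roughly 1.5 to 2 hours to cross by car. The main difference is population: a Zhejiang city has several million residents and is full of dense towns and factories, while an Austrian state typically has a few hundred thousand to about two million and consists largely of forest, farmland, Alpine terrain and quiet small towns.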



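Extending the geographic and administrative analogy to the economy, Zhejiang and Austria make a striking contrast.

In one sentence: Zhejiang is far ahead in total output and commercial dynamism, while Austria leads in wealth per person, industrial specialisation and social welfare.

The comparison covers five dimensions:

1. Macro economy: large in total vs. rich per person

  • Total GDP (Zhejiang ahead)
    • Zhejiang (2023/2024): GDP of about 8.2 to 8.9 trillion RMB (about 1.2 trillion USD).
    • Austria (2023/2024): GDP of about 470 to 500 billion EUR (about 0.5 trillion USD).
    • Comparison: Zhejiang's economy is about 2.4 times Austria's. With a population (66 million) more than 7 times Austria's (9.1 million), Zhejiang as a single province produces more in absolute terms than many mid-sized developed countries.
  • GDP per capita (Austria ahead)
    • Zhejiang: about 18,000 USD per person.
    • Austria: about 58,000 USD per person.
    • Comparison: Austria's output per person is more than 3 times Zhejiang's. This reflects the gap in labour productivity and high value-added industry between an established developed country and an emerging economy, even though Zhejiang is one of China's richest provinces.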
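The totals and the per-capita figures above can be cross-checked against each other. This is a rough sketch using only the rounded numbers quoted in these notes (1.2 and 0.5 trillion USD, 66 and 9.1 million people); the variable names are illustrative.

```python
def per_capita(gdp_usd, population):
    """GDP per person in USD."""
    return gdp_usd / population


# Rounded figures as quoted in the notes.
ZJ_GDP, ZJ_POP = 1.2e12, 66e6
AT_GDP, AT_POP = 0.5e12, 9.1e6

zj_pc = per_capita(ZJ_GDP, ZJ_POP)
at_pc = per_capita(AT_GDP, AT_POP)

if __name__ == "__main__":
    print(f"GDP ratio Zhejiang/Austria: {ZJ_GDP / AT_GDP:.1f}")
    print(f"Population ratio:           {ZJ_POP / AT_POP:.1f}")
    print(f"Per capita Zhejiang:        {zj_pc:,.0f} USD")
    print(f"Per capita Austria:         {at_pc:,.0f} USD")
    print(f"Per-capita ratio AT/ZJ:     {at_pc / zj_pc:.1f}")
```

With these rounded inputs the per-capita values come out near 18,000 and 55,000 USD, so the quoted "about 58,000" for Austria presumably rests on a slightly different GDP or exchange-rate basis.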

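2. Business DNA: "everyone an entrepreneur" vs. "hidden champions"

This is the core difference between the two economies:

  • Zhejiang: a fast-moving private and platform economy
    • Boss culture: the local saying is "better to sleep on the floor than not be your own boss". Zhejiang has China's most active private sector, which contributes more than 60% of provincial GDP and more than 80% of employment.
    • Business model: from the early "shop in front, workshop behind" (Yiwu small commodities, Wenzhou lighters) to today's digital economy (Alibaba, NetEase) and cross-border e-commerce. Zhejiang entrepreneurs are quick to spot demand and build a supply chain around it.
    • Cluster economy: "one town, one product" and "one county, one industry" (leather in Haining, socks in Zhuji, express delivery in Tonglu) have produced large and resilient industrial clusters.
  • Austria: steady "hidden champions" and a social market economy
    • Small and medium-sized firms (Mittelstand): the backbone is not internet giants but many family firms passed down over generations. They do not pursue unlimited growth, yet lead the world market in narrow niches (special steels, cable cars, water treatment, automotive components), which is why they are called "hidden champions".
    • Labour relations: Austria has a social market economy with strong unions. Firms emphasise cooperation between employers and employees, benefits and a long-term outlook rather than an overtime culture.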

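3. Industry mix: "digital + manufacturing + mega-port" vs. "high-end equipment + world-class tourism"

  • Zhejiang's strengths
    1. Digital economy: Hangzhou is China's e-commerce and digital-security capital, with very strong internet and software services.
    2. New energy and advanced manufacturing: in recent years heavy investment in electric vehicles (e.g. Geely), photovoltaics and lithium batteries.
    3. Logistics hubs: Ningbo-Zhoushan Port, the world's largest by cargo tonnage, and Yiwu, the world's largest small-commodities market, make Zhejiang a major export hub.
  • Austria's strengths
    1. High-end equipment: no mass-market car brand of its own, but top-tier engines, railway equipment (e.g. Siemens in Austria) and environmental technology.
    2. World-class tourism: as with the ski areas discussed earlier, tourism (Alpine winter sports, Vienna's classical music and history) is highly developed; it makes up a far larger share of GDP than in Zhejiang, with high spending per visitor.
    3. Green technology: a leading position in renewable energy, waste treatment and water treatment.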

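4. Distribution of wealth: "common prosperity demonstration zone" vs. "one of the world's most equal societies"

  • Zhejiang: a model with a small urban-rural gap
    • Zhejiang is China's only "common prosperity demonstration zone". Its main achievement is a very small urban-rural income gap (a ratio of about 1.86, the best in the country).
    • Rural Zhejiang is wealthy. A history of township enterprises in nearly every village left the countryside dotted with detached houses, and rural disposable income per person exceeds that of the urban districts of many central and western cities.
    • Weak point: regional differences remain (Hangzhou, Ningbo and the north-eastern coast are much richer than Lishui and Quzhou in the south-west).
  • Austria: high taxes, high welfare
    • Austria is among the countries with the lowest Gini coefficient (a measure of inequality). High income taxes and social security contributions fund strong redistribution.
    • Health care and education are close to free, and unemployment protection is generous. Skilled blue-collar workers (plumbers, craftspeople) earn and are regarded much like white-collar workers.
    • Weak point: the heavy tax burden may dampen the appetite of young people for fast wealth and start-ups, which is sometimes given as one reason Austria has no large internet companies.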

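5. International role: "super-node of global trade" vs. "precision workshop in the heart of Europe"

  • Zhejiang: one of the most active participants in globalisation. Its traders do business in African mines, Middle Eastern malls and Western supermarkets. The economy depends heavily on exports and is a key node in global supply chains.
  • Austria: the "precision workshop" of the European economy. Located in central Europe, it depends heavily on Germany and the EU market (notably as a top-tier supplier to the German car industry). It is a typical small, very open developed economy that holds an upstream position through technology barriers and high value-added products.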

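Summary

If the economy were a game:

  • Zhejiang is a max-level berserker: many people, very hard-working, quick to spot opportunities and willing to take risks. Starting from low-end manufacturing, it has pushed into the digital economy and new energy and built world-class total output through scale and speed.
  • Austria is a max-level precision assassin: few people, no flashy combos. It has perfected a handful of core skills (precision manufacturing, environmental technology, tourism) and, through high value added and a strong welfare system, gives its citizens a very high quality of life.

The analogy also holds economically: with about 7 times Austria's population, on 105,000 km², Zhejiang produces about 2.4 times Austria's output, while Austrians, thanks to deep technical expertise and institutional design, have about 3 times Zhejiang's wealth per person.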



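Yes, in administrative rank and function an Austrian Bezirk (political district) corresponds to a Chinese county. It is the administrative unit directly below the federal state (Bundesland), which in formal rank sits where a Chinese province does, even though in area it is closer to a prefecture-level city.

Below are details on the size (area and width) of a Bezirk and a comparison with Chinese counties:

1. Bezirk Murau as an example

Bezirk Murau lies in Styria (Steiermark) and is a typical rural mountain district:

  • Area: about 1,384 km² [[2]].
  • Population: about 26,000 to 27,000 [[1]].
  • Standing: it is the fourth-largest district in Styria by area [[9]], but because it is mountainous its population density is low.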

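2. Typical size of an Austrian Bezirk (area and dimensions)

Austria has a total area of about 83,879 km², divided into 80 political districts (Bezirke) and 15 statutory cities (Statutarstädte) [[27]].

  • Average area: dividing the national area evenly among these 95 county-level units gives about 880 km² each.
  • Actual sizes vary widely
    • Ordinary rural districts (such as Murau): usually between 600 and 2,000 km². In large states such as Tirol or Niederösterreich a Bezirk can easily exceed 1,500 km² (Liezen, Austria's largest Bezirk, is over 3,000 km²).
    • Statutory cities (such as Graz or Linz): much smaller, typically a few tens to a couple of hundred km².
  • Physical "width": a rural Bezirk of about 1,000 km², if roughly square or circular, has a side or diameter of about 30 to 35 km. On Austria's well-maintained country roads, crossing an ordinary Bezirk by car takes about 30 to 50 minutes.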
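The "width" estimate above follows from treating a district as a square or a circle of the same area. A minimal sketch of that arithmetic (function names are illustrative; the areas are the ones quoted in these notes):

```python
import math


def square_side_km(area_km2):
    """Side length of a square with the given area."""
    return math.sqrt(area_km2)


def circle_diameter_km(area_km2):
    """Diameter of a circle with the given area."""
    return 2 * math.sqrt(area_km2 / math.pi)


def mean_area_km2(total_km2, units):
    """Average area if the total is split evenly among the units."""
    return total_km2 / units


if __name__ == "__main__":
    print(f"Mean area: {mean_area_km2(83_879, 95):.0f} km^2")
    for name, area in [("generic Bezirk", 1_000), ("Bezirk Murau", 1_384)]:
        print(f"{name}: side {square_side_km(area):.1f} km, "
              f"diameter {circle_diameter_km(area):.1f} km")
```

Real districts are irregular and mountain roads wind, so these figures are only a lower bound on actual driving distance.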

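3. Comparison with a Chinese county

Although the administrative rank is the same, Austrian and Chinese "counties" feel very different on the ground:

  • Area (very close): an ordinary Austrian Bezirk (800 to 1,500 km²) is close in area to a county in China's central and eastern plains (many counties in Shandong, Henan and Jiangsu are around 1,000 km²). It is far smaller than counties in China's north-west or south-west (counties in Xinjiang or Tibet often cover tens of thousands of km²).
  • Population (very different): an Austrian Bezirk usually has only 20,000 to 100,000 inhabitants (Murau has about 26,000). In China that is roughly a mid-sized town or a very sparsely populated remote county; an ordinary Chinese county has 300,000 to 1,000,000.
  • Urban feel: the district seat (Bezirkshauptstadt) is usually a small town of a few thousand to twenty thousand people (the town of Murau itself has only a few thousand). It lacks the high-rises and dense commerce of a Chinese county seat and feels more like a scenic, well-equipped large town or central village.

Summary: an Austrian Bezirk corresponds to a Chinese county. Its area is similar to that of a county in central or eastern China (about 30 km across, 30 to 60 minutes to cross by car), but it has far fewer people and consists mostly of sparsely populated countryside and small towns.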

Gasthof Knappenwirt in Mariahof

TODOs:

  1. Ask whether the rooms have a fridge, and if so how large.
  2. … …

https://www.skiresort.de/skigebiete/murau/

bezirkskarte-winter

https://upload.wikimedia.org/wikipedia/commons/e/e2/Reliefkarte_Steiermark.png

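"Sau" is pronounced differently depending on the language. German Sau (sow, a female pig) is /zaʊ̯/: a voiced "z" followed by the vowel of English "ow", roughly "zow". The English cognate "sow" (female pig) is /saʊ/ in American English and rhymes with "cow".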

Bezirk Murau lies in the west of Styria, Austria. It borders six other districts in three federal states:

karte-oesterreich
Austria_Physical
Kärnten_Physical
Municipalities_Bezirk_Murau.svg

https://www.outdoorcenter-skischool.at/en/childrens-ski-course/

Our course times: Mon to Fri, 09:45 a.m. to 11:45 a.m. and 01:15 p.m. to 03:15 p.m. For those staying only a few days rather than the whole week, we offer a half-week course or attendance on single days.

Please arrive at our ski school with your children at least 10 minutes before the start of the course.

Prices for skiing courses, winter season 2025/2026:

  • Week-long course (6 days): 250€
  • Week-long course (5 days): 235€
  • Half-week course (3 days): 205€
  • Day course (4 hours in a group): 100€
  • Lunchtime supervision

Group lessons take place with a minimum of 5 participants; otherwise we offer workshop prices, or prices are set by agreement.

The prices above are in euros and per person. They do not include skiing or snowboard equipment or the ski pass.
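To compare the course options, it can help to look at the price per course day. A small sketch based on the four prices listed above (the dictionary and function names are made up for illustration; lunchtime supervision is left out because no price is given):

```python
# Course name -> (price in EUR, number of course days), as listed above.
COURSES = {
    "week (6 days)": (250, 6),
    "week (5 days)": (235, 5),
    "half week (3 days)": (205, 3),
    "single day": (100, 1),
}


def price_per_day(price_eur, days):
    """Price per course day, rounded to cents."""
    return round(price_eur / days, 2)


if __name__ == "__main__":
    for name, (price, days) in COURSES.items():
        print(f"{name}: {price_per_day(price, days):.2f} EUR/day")
```

The longer courses are clearly cheaper per day, which matters when deciding between three single days and a full week.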

https://www.leistbare-auszeit.at/winterurlaub-in-oesterreich/

The Kreischberg valley station in Murau offers around 1,000 free parking spaces. The paved areas are right next to the valley station of the 10-person gondola, so you can get from the road to the slope or the ticket office without a long walk.

Getting there and parking at Kreischberg

The Kreischberg ski area lies in the Mur valley (Murtal) in Styria. It is easy to reach from the west via the A10 Tauernautobahn and from the east via the S36 Murtal Schnellstraße. From there, well-built roads without notable gradients lead directly to the ski area. The 1,000 parking spaces next to the valley station of the 10-person gondola are paved and free of charge.

Hoferdorf 113, 8812 Mariahof Österreich

to

Kreischberg Talstation, Kreischbergstraße 1, 8861 St. Lorenzen am Kreischberg, Österreich

32 min (30.8 km) via B96 and Murauer Str./B97

Day ski passes (Tageskarten), winter season 2025/26. Main season: 25.12.2025 to 15.03.2026; all other dates are early and late season.

| Ticket | Main: Adults | Main: Children | Main: Adolescents | Early/Late: Adults | Early/Late: Children | Early/Late: Adolescents |
|---|---|---|---|---|---|---|
| Day ticket from 08:30 | 68,00 | 34,00 | 54,50 | 61,00 | 30,50 | 49,00 |
| Day ticket from 11:00 | 64,00 | 32,00 | 51,00 | 57,50 | 29,00 | 46,00 |
| Day ticket from 12:00 | 58,00 | 29,00 | 46,50 | 52,00 | 26,00 | 42,00 |
| Morning ticket till 13:00 | 60,00 | 30,00 | 48,00 | 54,00 | 27,00 | 43,00 |
| Hourly ticket, 2 hours | 47,50 | 24,00 | 38,00 | 43,00 | 21,50 | 34,00 |
| Hourly ticket, 3 hours | 53,00 | 26,50 | 42,50 | 47,50 | 24,00 | 38,50 |
| Hourly ticket, 4 hours | 60,00 | 30,00 | 48,00 | 54,00 | 27,00 | 43,00 |
| Single run Kreischberg 10er, 2 sections | 18,00 | 9,00 | 14,50 | 18,00 | 9,00 | 14,50 |

Dog: 5,00

Children up to 6 years (born in 2020 or after) ski for FREE.

*) Children: born 2010 – 2019

**) Young people: born 2000 – 2009

For details see Informations & Cut-off dates. All prices in euros and incl. VAT.
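To estimate what a ski day costs for a group, the day-ticket prices listed above can be summed per person. The following sketch covers only the "from 08:30" day tickets; the function name and the example group are assumptions for illustration, and children up to 6 are simply left out because they ski free.

```python
# Day-ticket prices in EUR (tickets valid from 08:30), copied from the list above.
DAY_TICKET = {
    "main": {"adult": 68.00, "child": 34.00, "adolescent": 54.50},
    "early_late": {"adult": 61.00, "child": 30.50, "adolescent": 49.00},
}


def day_ticket_total(season, adults=0, children=0, adolescents=0):
    """Total price in EUR for one ski day; free under-6 children are not counted."""
    prices = DAY_TICKET[season]
    return (adults * prices["adult"]
            + children * prices["child"]
            + adolescents * prices["adolescent"])


if __name__ == "__main__":
    # Hypothetical group: two adults and one paying child.
    for season in DAY_TICKET:
        print(season, day_ticket_total(season, adults=2, children=1))
```

Ski-school fees and equipment rental come on top of this, since the course prices explicitly exclude the ski pass.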


Urlaub muss grundsätzlich im laufenden Kalenderjahr genommen werden. Im Fall der Übertragung ist er bis 31. März anzutreten, ist dieses aus dienstlichen/betrieblichen Gründen oder durch Arbeitsunfähigkeit nicht möglich, kann er auf Antrag bis 31. Mai angetreten werden. Nach diesen Fristen verfällt grundsätzlich der tarifliche Urlaubsanspruch (siehe ergänzend Anlage 7 zur SOP 2.5.1.5).

Translation: Leave must in principle be taken in the current calendar year. If it is carried over, it must be started by 31 March; if that is not possible for service or operational reasons or because of incapacity for work, it can on request be started by 31 May. After these deadlines the collectively agreed leave entitlement in principle lapses (see also Annex 7 to SOP 2.5.1.5).

Sie möchten ein Datum nach dem 31.12.2026 bis einschließlich 31.05.2027 eintragen. Dies setzt die gleichzeitige Beantragung der Urlaubsübertragung voraus. Urlaub muss grundsätzlich im laufenden Kalenderjahr genommen werden. Sofern es aus dienstlichen/betrieblichen Gründen oder durch Arbeitsunfähigkeit nicht möglich ist/war den Urlaub bis 31. Dezember anzutreten, besteht die Möglichkeit den Resturlaub bis zum 31. Mai zu übertragen. Möchten Sie für diese Antragszeile die Urlaubsübertragung beantragen?

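Translation: You want to enter a date after 31.12.2026 and up to and including 31.05.2027. This requires applying for a leave carry-over at the same time. Leave must in principle be taken in the current calendar year. If it is or was not possible to start the leave by 31 December for service or operational reasons or because of incapacity for work, the remaining leave can be carried over until 31 May. Do you want to apply for the carry-over for this request line?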
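The deadlines in the two passages above can be written as a small date check. This is only a sketch of one simplified reading of the rule (regular until 31 December, carried over until 31 March, on request until 31 May, lapsed afterwards); the function name and the category labels are invented here and are not part of the SOP or the request form.

```python
from datetime import date


def classify_leave_start(leave_start: date, entitlement_year: int) -> str:
    """Classify the start date of leave that belongs to `entitlement_year`."""
    if leave_start <= date(entitlement_year, 12, 31):
        return "current_year"            # no carry-over needed
    if leave_start <= date(entitlement_year + 1, 3, 31):
        return "carry_over"              # carried-over leave, started by 31 March
    if leave_start <= date(entitlement_year + 1, 5, 31):
        return "carry_over_on_request"   # only on request, with a valid reason
    return "lapsed"                      # entitlement in principle forfeited


if __name__ == "__main__":
    for d in (date(2026, 12, 15), date(2027, 2, 1),
              date(2027, 4, 15), date(2027, 6, 1)):
        print(d.isoformat(), classify_leave_start(d, 2026))
```

Note that the form text asks for a carry-over request for any date after 31.12.2026, so in practice both carry-over categories trigger the same prompt.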

At the University of Europe for Applied Sciences (UE), the academic year has two main semesters: Winter (starts late September) and Summer (starts late March). Here are the exact dates for the upcoming terms:

  • Summer Semester 2026: lectures started on March 23, 2026, and end on July 11, 2026.
  • Winter Semester 2026/27: lectures start on September 28, 2026, and run until January 23, 2027.



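Key question: can I be covered if I buy today?

Yes.

According to the terms you provided, if you booked the trip today and buy this insurance today, you are within the rules.

The relevant clause is:

  • Abschlussfrist Last-Minute: Bis max. 3 Tage nach Buchung (last-minute purchase deadline: at most 3 days after booking)

As long as you buy the insurance within 3 days of booking the trip, you are covered. Booking today and buying today is well inside that window.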


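Other important terms explained

For reference, here are the other key terms of this policy (trip cancellation insurance, Reise-Rücktrittsversicherung):

1. Purchase and contract periods (Fristen)

  • Regular purchase deadline (Abschlussfrist): the policy must be bought at least 15 days before the trip starts. (If you booked the trip some time ago rather than today, this 15-day condition applies.)
  • Minimum contract term (Mindestvertragslaufzeit): 1 year. This is an annual policy; several trips within the year are covered, provided each meets the definition of a trip.
  • Notice period (Kündigungsfrist): 1 day. If you do not want to renew, giving notice 1 day in advance is enough to cancel the following year.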
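The two purchase deadlines (at least 15 days before the trip starts, or at most 3 days after booking) can be combined into one check. A minimal sketch under the assumption that both limits are inclusive and that meeting either one is sufficient; the function name is made up, and the insurer's full terms may contain further conditions not modelled here.

```python
from datetime import date

REGULAR_MIN_DAYS_BEFORE_TRIP = 15   # Abschlussfrist
LAST_MINUTE_MAX_DAYS_AFTER_BOOKING = 3  # Abschlussfrist Last-Minute


def purchase_in_time(booking: date, purchase: date, trip_start: date) -> bool:
    """True if the purchase date meets the regular or the last-minute deadline."""
    regular = (trip_start - purchase).days >= REGULAR_MIN_DAYS_BEFORE_TRIP
    days_after_booking = (purchase - booking).days
    last_minute = 0 <= days_after_booking <= LAST_MINUTE_MAX_DAYS_AFTER_BOOKING
    return regular or last_minute


if __name__ == "__main__":
    today = date(2026, 1, 10)  # hypothetical purchase date
    # Booked and bought on the same day, trip starts in 5 days: last-minute rule applies.
    print(purchase_in_time(today, today, date(2026, 1, 15)))
    # Booked 10 days earlier, trip starts in 5 days: neither rule applies.
    print(purchase_in_time(date(2025, 12, 31), today, date(2026, 1, 15)))
```

Keeping the two rules as separate booleans makes it easy to report which one a purchase relied on.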

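2. What counts as a trip? (Reisedefinition)

  • A "trip" must include at least one overnight stay or booked transport (such as a flight or train ticket).
  • Business trips included (Inkl. Geschäftsreise): yes. The policy also applies when you travel for work.
  • Maximum trip length (Maximale Reisedauer): unlimited.
  • Travel region (Reiseregion): worldwide (Weltweit).

3. How are family and children defined? (Familiendefinition)

  • The family definition is broad: 2 adults plus accompanying children, regardless of family relationship or shared household. Children of friends, nieces and nephews count as long as they travel with you.
  • Maximum number of insured children: 5.
  • Maximum age for children: 20 (older travellers usually need their own adult policy).

4. Customer service and confirmation (Kundenservice)

  • Immediate confirmation (Sofortige Bestätigung): yes. The policy is issued and takes effect as soon as you buy it.
  • Customer rating (Kundenbewertung): 4.3 out of 5 stars.
  • Phone service: callback service Monday to Friday, 9:00 to 18:00.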

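💡 Purchase tips and notes

  1. Keep the evidence: save today's booking confirmation (Buchungsbestätigung), which shows today's date. In a claim, the insurer will check that the booking date and the insurance purchase date are within 3 days of each other.
  2. Check that the policy is active: since the terms state "Sofortige Bestätigung" (immediate confirmation), check your inbox after purchase and make sure you received the policy document (Versicherungsschein).
  3. What cancellation insurance covers: if, before departure, you cannot travel for an unforeseeable reason such as sudden illness, a serious accident, job loss or a house fire, the insurer reimburses the cancellation fees (Stornokosten) you owe to the tour operator or airline.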


根据提取的信息,我为您整理了这两个住宿的详细比较:

两个住宿综合比较表

比较项目 Bauernhof Lehen (QV4YEH) Gasthof Knappenwirt (TNSRP9)
基本信息
位置 Großhöch, Österreich Mariahof, Steiermark, Österreich
评分 9.6 (Außergewöhnlich) – 36条评价 9.0 (Hervorragend) – 307条评价
住宿类型 传统农庄(Bauernhaus) 旅馆/民宿(Gasthof)
房东经验 自2019年开始接待 自2024年开始接待
容量与面积
最大人数 15人 18人
居住面积 180 m² 186 m²
卧室 4间 6间
浴室 3间 3间
价格(7晚)
总价 3.263,00 € 3.257,37 € ✓ (便宜5,63€)
人均/晚 约30,98 € 约25,77 € ✓ (更便宜)
押金 250 € (现金) 1 € (现金) ✓
额外费用 367,50 € (税费) 262,50 € (电费)
设施配置
桑拿 ✓ 包含 ✓ 包含
温泉/按摩浴缸 ✓ Whirlpool
花园 ✓ (共用)
露台
洗碗机
洗衣机
WLAN ✓ 免费 ✓ 免费
停车场 ✓ (可能收费) ✓ 免费私人停车
烧烤设施
壁炉 ✓ Kamin
山景
无烟房
餐饮设施
厨房 ✓ 完整厨房 △ 部分房间有厨房
餐厅 ✓ 有餐厅
早餐 ✓ 提供早餐
滑雪相关
距离滑雪场 St. Johann/Alpendorf 13km
Skizirkus Gastein 18km
Kreischberg 30分钟车程
滑雪巴士站 400m
滑雪储藏室
位置与交通
距中心 4km (St. Veit) 1.4km (Bäcker)
公交站 450m
火车站 5km (St. Veit)
购物 4km 3.3km (Bank/Apotheke)
餐厅 2.7km
特色服务
儿童设施 ✓ 高脚椅、婴儿床免费 ✓ 儿童游乐场
宠物 ✗ 不允许 ✓ 可带(需申请,可能收费)
其他 农场动物,新鲜牛奶 24小时前台,餐厅,酒吧
取消政策
免费取消 ✓ 至2026.10.20 (提前60天) △ 部分退款(14%)至2026.12.19
综合评价
清洁度 9.8 9.2
设施 8.8 8.8
位置 8.6
性价比 9.0

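Recommendation

Choose Bauernhof Lehen if you care more about:

  ✅ A higher rating (9.6 vs 9.0)
  ✅ A traditional farmhouse stay with fresh milk
  ✅ A more flexible cancellation policy (free cancellation up to 60 days before arrival)
  ✅ Proximity to a large ski area (200 km of slopes)
  ✅ Privacy (a whole detached farmhouse)

Choose Gasthof Knappenwirt if you care more about:

  ✅ Room for more people (18 vs 15)
  ✅ More bedrooms (6 vs 4)
  ✅ More facilities (whirlpool, washing machine, restaurant)
  ✅ A lower price per person
  ✅ Breakfast and restaurant service
  ✅ Bringing pets

Overall: for a ski holiday with friends or family, Bauernhof Lehen is the better fit because of its higher rating, its location closer to well-known ski areas and its more flexible cancellation policy. If the group is larger than 15 or needs more bedrooms, Gasthof Knappenwirt is the better choice.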
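The per-person price in the comparison can be recomputed from the totals, the maximum occupancy and the 7 nights. A minimal sketch, assuming the total is simply split across all guests and nights with the extra fees left out (the function name is illustrative):

```python
def per_person_per_night(total_eur, guests, nights):
    """Split a total price evenly across guests and nights."""
    return total_eur / (guests * nights)


# Totals, maximum occupancy and nights as given in the comparison table.
lehen = per_person_per_night(3263.00, guests=15, nights=7)
knappenwirt = per_person_per_night(3257.37, guests=18, nights=7)

if __name__ == "__main__":
    print(f"Bauernhof Lehen:     {lehen:.2f} EUR")
    print(f"Gasthof Knappenwirt: {knappenwirt:.2f} EUR")
    print(f"Difference in total: {3263.00 - 3257.37:.2f} EUR")
```

This gives about 31.08 and 25.85 EUR, slightly different from the 30.98 and 25.77 EUR in the table, which presumably uses a different basis that is not documented in these notes. The ranking is the same either way.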



Category English Chinese翻译
Location Near Mariahof in Styria (Austria) 位于奥地利施泰尔马克州的玛丽亚霍夫附近
Main Ski Resort Kreischberg Ski Resort (St. Georgen am Kreischberg) 克雷施贝格滑雪场(圣格奥尔根阿姆克雷施贝格)
Distance Approximately 24 km (20-25 minutes drive) from Mariahof 距离玛丽亚霍夫约24公里(驾车20-25分钟)
Ski Kilometers Around 42 km (17 km easy, 16 km intermediate, 9 km difficult) 约42公里雪道(17公里初级,16公里中级,9公里高级)
Special Features Modern 10-person gondola, large snow park for freestylers, special children’s areas 现代化10人缆车、大型自由式滑雪公园、专门的儿童区域
Second Resort Grebenzen Ski Resort (St. Lambrecht) 格雷本岑滑雪场(圣兰布雷希特)
Distance Only a few kilometers south of Mariahof 玛丽亚霍夫以南仅几公里
Ski Kilometers Around 12 km of slopes and 13 km of ski routes 约12公里雪道和13公里滑雪路线
Elevation Slopes range from 1,010 to 1,870 meters altitude 雪道海拔从1,010米到1,870米
Special Features Particularly popular with families and ski tourers; often features toboggan runs 特别受家庭和滑雪登山者欢迎;通常设有雪橇滑道
Third Resort Lachtal Ski Resort (Lachtal lies in the state of Styria, Austria. It belongs to the Styrian municipality of Oberwölz in Bezirk Murau. The area was previously an independent municipality named Schönberg-Lachtal, which was merged with Oberwölz in 2015.) 拉赫塔尔滑雪场
Distance Easily accessible by car (approximately 40-50 minutes drive) 驾车便利(约40-50分钟车程)
Ski Kilometers Approximately 40 km of slopes 约40公里雪道
Elevation Up to 2,222 meters altitude 最高海拔2,222米
Special Features Known for its wide, open slopes; snow-sure and family-friendly 以宽阔开阔的斜坡闻名;雪量充足且适合家庭
Snow Reliability Ski resorts in Styria offer reliable snow conditions. Slopes are artificially snowed and well-groomed until spring. 施泰尔马克州的滑雪场提供可靠的雪况。雪道通常会人工造雪并精心维护至春季。
Planning Tools Use J2Ski Resort Guide for detailed weather and slope reports for Mariahof 使用J2Ski度假村指南获取玛丽亚霍夫的详细天气和雪道报告
Accommodation Use Booking.com Ski Resort Guide for Mariahof to find suitable accommodations 使用Booking.com玛丽亚霍夫滑雪度假村指南寻找合适的住宿

TODO: register in early September; the deadline is 24 September 2026! PRIMA-Initiative der Universität Hamburg

PriMa-Elternbrief_2026 Talentsuche Mathematik

Uni-Zirkel_PriMa_und_PriSMa_BzMU22_1325

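Translation of the parents' letter:

Talent search and support are a cooperation project of Universität Hamburg, the MINT (mathematics, computer science, natural sciences, technology) unit of the Behörde für Schule, Familie und Berufsbildung, the Beratungsstelle besondere Begabungen of the Landesinstitut für Qualifizierung und Qualitätsentwicklung in Schulen, and the William-Stern-Gesellschaft.

Hamburg, June 2026

Dear parents,

As part of the PriMa programme, Universität Hamburg has for more than 25 years run a support and research project for third-grade children who are especially interested and gifted in mathematics: the so-called Uni-Zirkel. Together with the Mathe-Zirkel at schools, the Uni-Zirkel offer a combination of top-level and broad support that is unique in Germany. At the university we support about 60 children per year group in small groups until the end of grade 4. More information: www.prima-mathematik.uni-hamburg.de.

To select the 60 children for the Uni-Zirkel we run a talent search in which each child may take part only once. It is aimed at third-graders who are at least 8 years old in November, and at fourth-graders who started school early or skipped a grade. All children who do not get a place at the university receive a place in a regional Mathe-Zirkel.

Registration for the talent search is via an online form, found under "Aktuelles" on www.prima-mathematik.uni-hamburg.de. The form opens on 1 July 2026 and stays open until 24 September 2026.

After registration we send you a link to a preparation task. Your child should solve it independently at home. It does not matter if your child cannot solve everything; by trying, children should find out whether they enjoy this kind of task.

For the registration to be binding, you must send us your child's work on the preparation task. The submission deadline is 24 September 2026. Please send the solution by e-mail if possible (mathe-treff.ew@uni-hamburg.de), or by post: PriMa-Projekt z.Hd. von Frau Kraußer, Von-Melle-Park 8, 20146 Hamburg. If you have trouble registering, please e-mail us.

We then discuss the task with the children in November at a session at the university, the "Mathe-Treff für Mathe-Fans". It takes place on a Friday afternoon or a Saturday. You will receive details no later than the week after the autumn holidays.

A mathematics test follows in January 2027. Later, the BbB (Beratungsstelle besondere Begabungen, Landesinstitut Hamburg) carries out an intelligence test for about 250 children.

Introductory slides by Prof. Dr. Nolte with further information are on our homepage. On 16 September 2026 she offers a question session for parents about the support project and the talent search (18:30 to 19:30, https://bbb1.physnet.uni-hamburg.de/b/mar-6zx-9ek).

By 24 September 2026: register via the online form under "Aktuelles" on www.prima-mathematik.uni-hamburg.de and send the solution of the preparation task to mathe-treff.ew@uni-hamburg.de.

For questions, Dr. Kirsten Pamperien (teacher, project lead and project coordinator) is available on Wednesdays from 9:00 to 11:00 at +4940 239525524 or at kirsten.pamperien@uni-hamburg.de.

Kind regards,
Prof. Dr. Marianne Nolte (Universität Hamburg)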



Hamburg, Juni 2026 Liebe Eltern, im Rahmen der Maßnahme PriMa wird seit über 25 Jahren an der Universität Hamburg ein Förder- und Forschungsvorhaben durchgeführt, das sich an mathematisch besonders interessierte und begabte Kinder der dritten Klassen wendet, die sogenannten Uni-Zirkel. Die Uni-Zirkel in Verbindung mit Mathe-Zirkeln an den Schulen bieten eine bundesweit einmalige Verbindung von Spitzen- und Breitenförderung. An der Universität fördern wir pro Jahrgang ca. 60 Kinder in Kleingruppen bis zum Ende der 4. Klasse. Weitere Informationen finden Sie unter www.prima-mathematik.uni-hamburg.de . Um 60 Kinder für die Uni-Zirkel auszuwählen, führen wir eine Talentsuche durch, an der jedes Kind nur einmal teilnehmen darf. Diese richtet sich an Kinder der dritten Klasse, die im November mindestens 8 Jahre alt sind, sowie an Kinder der vierten Klasse, die frühzeitig eingeschult wurden oder eine Klasse übersprungen haben. Alle Kinder, die keinen Platz an der Universität finden, erhalten einen Platz in einem regionalen Mathe-Zirkel. Die Anmeldung zur Talentsuche erfolgt über ein Onlineformular, welches Sie unter Aktuelles auf www.prima-mathematik.uni-hamburg.de finden. Dieses ist ab dem 01.07.2026 freigeschaltet und bis zum 24.09.2026 zugänglich. Wir schicken Ihnen nach erfolgter Anmeldung einen Link für eine Vorbereitungsaufgabe zu. Diese soll Ihr Kind selbständig zuhause lösen. Es ist nicht schlimm, wenn Ihr Kind nicht alles lösen kann. Die Kinder sollen durch ihre Lösungsversuche merken, ob sie Spaß an dieser Art Aufgaben haben. Für die verbindliche Anmeldung ist es zwingend erforderlich, dass Sie uns die Bearbeitung der Vorbereitungsaufgabe Ihres Kindes zuschicken. Einsendeschluss für die Bearbeitung ist der 24.09.2026. Bitte schicken Sie die Lösung der Vorbereitungsaufgabe möglichst per E-Mail: mathe-treff.ew@uni-hamburg.de oder per Post: PriMa-Projekt z.Hd. von Frau Kraußer, Von-Melle-Park 8, 20146 Hamburg. 
Sollten Sie Schwierigkeiten bei der Anmeldung haben, so schreiben Sie uns bitte eine E-Mail. Wir besprechen die Aufgabe mit den Kindern dann im November in einer Sitzung an der Universität, dem Mathe-Treff für Mathe-Fans. Dieser wird an einem Freitag-Nachmittag oder an einem Samstag angeboten. Genauere Informationen hierzu erhalten Sie spätestens in der Woche nach den Herbstferien. Im Januar 2027 schließt sich ein Mathematiktest an. Später führt die BbB (Beratungsstelle besondere Begabungen; Landesinstitut Hamburg) für etwa 250 Kinder einen Intelligenztest durch. Einführende Folien mit weiteren Informationen von Frau Prof. Dr. Nolte finden Sie auf unserer Homepage. Am 16.09.2026 bietet sie eine Fragestunde für die Eltern zum Förderprojekt und zur Talentsuche an (18:30-19:30 Uhr, https://bbb1.physnet.uni-hamburg.de/b/mar-6zx-9ek). Bis zum 24.09.2026: Anmeldung über Online-Formular unter: Aktuelles auf www.prima-mathematik.uni-hamburg.de und Einsendung der Lösung der Vorbereitungsaufgabe: mathe-treff.ew@uni-hamburg.de Für Nachfragen steht Ihnen Frau Dr. Kirsten Pamperien (Lehrerin, Projektleiterin und Projektkoordinatorin) mittwochs von 9.00 Uhr bis 11.00 Uhr unter der Nummer +4940 239525524 oder unter kirsten.pamperien@uni-hamburg.de zur Verfügung Mit freundlichen Grüßen Prof. Dr. Marianne Nolte (Universität Hamburg) Talentsuche und Förderung sind ein Kooperationsprojekt zwischen der Universität Hamburg, dem MINT-Referat der Behörde für Schule, Familie und Berufsbildung, der Beratungsstelle besondere Begabungen des Landesinstituts für Qualifizierungen und Qualitätsentwicklung in Schulen und der William-Stern-Gesellschaft.



https://www.amazon.de/Mathematik-ist-PriMa-F%C3%B6rderung-mathematischen/dp/3959873395

Analyzing WaGa and MKL-1 Cell Line miRNA (Data_Ute_smallRNA_via_exceRpt_workspace)

manhattan_plot_Carmen_custom_labels_WaGa.R

manhattan_plot_Carmen_custom_labels_MKL-1.R

As an example, the result files of the MKL-1 cell line miRNA analysis are:

* Raw count data (d_raw_MKL-1.xlsx): Contains the raw, unnormalized read counts for all miRNAs.
* Mapping heatmap (mapping_heatmap3_MKL-1.pdf)
* Volcano plot (MKL.1_wt_EV_vs_MKL.1_wt_cells.png and .svg)
* PCA plot (pca_MKL-1.png)
* Manhattan plot and data (manhattan_plot_MKL1_vs_EV.png, .svg, and manhattan_plot_MKL1_data.xlsx)
  1. Input data

     WaGa wt cells (nf774* (a possible outlier and a candidate for removal, but still included in the current version of the analysis), nf961, nf962)
     WaGa wt_EV_RNA (nf657* (EXCLUDED: a clear outlier that does not cluster with the other 2 samples), nf930, nf935)
     WaGa_sT_DMSO_EV_RNA (nf931, nf936, nf971)
     WaGa_sT_Dox_EV_RNA (nf932, nf937, nf972)
     WaGa_scr_DMSO_EV_RNA (nf933, nf938, nf973)
     WaGa_scr_Dox_EV_RNA (nf934, nf939, nf974)
     # --> In total, 17 samples
    
     MKL-1 wt cells (nf780*, nf796*, nf797*)
     MKL-1 wt_EV_RNA (nf655* (The sample was EXCLUDED), 2404, 2608)
     MKL-1_sT_DMSO_EV_RNA (2608, 2701, 2802)
     MKL-1_sT_Dox_EV_RNA (2608, 2701, 2802)
     MKL-1_scr_DMSO_EV_RNA (2608, 2701, 2802)
     MKL-1_scr_Dox_EV_RNA (2608, 2701, 2802)
     # --> In total, 18 samples listed (17 if the excluded nf655 is not counted)
    
     #Note that the real paths are as follows:
     #./20260506_AV243904_0073_A/2404_MKL1_wt_EVs/2404_MKL1_wt_EVs_R1.fastq.gz, ./20260506_AV243904_0073_A/2608_MKL1_wt_EVs/2608_MKL1_wt_EVs_R1.fastq.gz
     #./20260506_AV243904_0073_A/2608_MKL1_sT_DMSO/2608_MKL1_sT_DMSO_R1.fastq.gz, ./20260506_AV243904_0073_A/2701_MKL1_sT_DMSO/2701_MKL1_sT_DMSO_R1.fastq.gz, ./20260506_AV243904_0073_A/2802_MKL1_sT_DMSO/2802_MKL1_sT_DMSO_R1.fastq.gz
     #./20260506_AV243904_0073_A/2608_MKL1_sT_Dox/2608_MKL1_sT_Dox_R1.fastq.gz, ./20260506_AV243904_0073_A/2701_MKL1_sT_Dox/2701_MKL1_sT_Dox_R1.fastq.gz, ./20260506_AV243904_0073_A/2802_MKL1_sT_Dox/2802_MKL1_sT_Dox_R1.fastq.gz
     #./20260506_AV243904_0073_A/2608_MKL1_scr_DMSO/2608_MKL1_scr_DMSO_R1.fastq.gz, ./20260506_AV243904_0073_A/2701_MKL1_scr_DMSO/2701_MKL1_scr_DMSO_R1.fastq.gz, ./20260506_AV243904_0073_A/2802_MKL1_scr_DMSO/2802_MKL1_scr_DMSO_R1.fastq.gz
     #./20260506_AV243904_0073_A/2608_MKL1_scr_Dox/2608_MKL1_scr_Dox_R1.fastq.gz, ./20260506_AV243904_0073_A/2701_MKL1_scr_Dox/2701_MKL1_scr_Dox_R1.fastq.gz, ./20260506_AV243904_0073_A/2802_MKL1_scr_Dox/2802_MKL1_scr_Dox_R1.fastq.gz
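The sample totals in the input list can be tallied programmatically, which also makes the effect of the exclusions explicit. This is an illustrative sketch only: the group names and sample IDs are copied from the list above, while the dictionary layout and the function name are assumptions and not part of the analysis scripts.

```python
# Group -> sample IDs, copied from the input list above.
WAGA = {
    "wt_cells": ["nf774", "nf961", "nf962"],
    "wt_EV_RNA": ["nf657", "nf930", "nf935"],
    "sT_DMSO_EV_RNA": ["nf931", "nf936", "nf971"],
    "sT_Dox_EV_RNA": ["nf932", "nf937", "nf972"],
    "scr_DMSO_EV_RNA": ["nf933", "nf938", "nf973"],
    "scr_Dox_EV_RNA": ["nf934", "nf939", "nf974"],
}
MKL1 = {
    "wt_cells": ["nf780", "nf796", "nf797"],
    "wt_EV_RNA": ["nf655", "2404", "2608"],
    "sT_DMSO_EV_RNA": ["2608", "2701", "2802"],
    "sT_Dox_EV_RNA": ["2608", "2701", "2802"],
    "scr_DMSO_EV_RNA": ["2608", "2701", "2802"],
    "scr_Dox_EV_RNA": ["2608", "2701", "2802"],
}
# Samples marked EXCLUDED in the list above.
EXCLUDED = {"nf657", "nf655"}


def count_samples(groups, excluded):
    """Return (listed, kept): all group/sample entries, and those not excluded."""
    listed = sum(len(ids) for ids in groups.values())
    kept = sum(1 for ids in groups.values() for s in ids if s not in excluded)
    return listed, kept


if __name__ == "__main__":
    print("WaGa  (listed, kept):", count_samples(WAGA, EXCLUDED))
    print("MKL-1 (listed, kept):", count_samples(MKL1, EXCLUDED))
```

Entries are counted per group because the MKL-1 batch IDs (2608, 2701, 2802) recur across conditions and denote different libraries in each group.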
  2. Adapter trimming

     #some common adapter sequences from different kits for reference:
     #    - TruSeq Small RNA (Illumina): TGGAATTCTCGGGTGCCAAGG
     #    - Small RNA Kits V1 (Illumina): TCGTATGCCGTCTTCTGCTTGT
     #    - Small RNA Kits V1.5 (Illumina): ATCTCGTATGCCGTCTTCTGCTTG
     #    - NEXTflex Small RNA Sequencing Kit v3 for Illumina Platforms (Bioo Scientific): TGGAATTCTCGGGTGCCAAGG
     #    - LEXOGEN Small RNA-Seq Library Prep Kit (Illumina): TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC *
     # create the output directory if needed, and only change into it if that succeeded
     mkdir -p Data_Ute_smallRNA_via_exceRpt_workspace/trimmed && cd Data_Ute_smallRNA_via_exceRpt_workspace/trimmed
    
     echo "------------------------------------ cutadapting nf774 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf774.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_4/230623_newDemulti_smallRNAs/220617_NB501882_0371_AH7572BGXM_smallRNA_Ute_newDemulti/2022_nf_ute_smallRNA/nf774/0403_WaGa_wt_S1_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf657 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf657.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_4/230623_newDemulti_smallRNAs/210817_NB501882_0294_AHW5Y2BGXJ_smallRNA_Ute_newDemulti/2021_nf_ute_smallRNA/nf657/WaGa_derived_EV_miRNA_S2_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf655 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf655.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_4/230623_newDemulti_smallRNAs/210817_NB501882_0294_AHW5Y2BGXJ_smallRNA_Ute_newDemulti/2021_nf_ute_smallRNA/nf655/MKL_1_derived_EV_miRNA_S1_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf780 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf780.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_4/230623_newDemulti_smallRNAs/220617_NB501882_0371_AH7572BGXM_smallRNA_Ute_newDemulti/2022_nf_ute_smallRNA/nf780/0505_MKL1_wt_S2_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf796 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf796.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_4/230623_newDemulti_smallRNAs/221216_NB501882_0404_AHLVNMBGXM_smallRNA_Ute_newDemulti/2022_nf_ute_smallRNA/nf796/MKL-1_wt_1_S1_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf797 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf797.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_4/230623_newDemulti_smallRNAs/221216_NB501882_0404_AHLVNMBGXM_smallRNA_Ute_newDemulti/2022_nf_ute_smallRNA/nf797/MKL-1_wt_2_S2_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf930 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf930.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf930/01_0505_WaGa_wt_EV_RNA_S1_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf931 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf931.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf931/02_0505_WaGa_sT_DMSO_EV_RNA_S2_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf932 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf932.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf932/03_0505_WaGa_sT_Dox_EV_RNA_S3_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf933 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf933.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf933/04_0505_WaGa_scr_DMSO_EV_RNA_S4_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf934 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf934.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf934/05_0505_WaGa_scr_Dox_EV_RNA_S5_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf935 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf935.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf935/06_1905_WaGa_wt_EV_RNA_S6_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf936 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf936.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf936/07_1905_WaGa_sT_DMSO_EV_RNA_S7_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf937 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf937.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf937/08_1905_WaGa_sT_Dox_EV_RNA_S8_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf938 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf938.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf938/09_1905_WaGa_scr_DMSO_EV_RNA_S9_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf939 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf939.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf939/10_1905_WaGa_scr_Dox_EV_RNA_S10_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf940 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf940.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf940/11_control_MKL1_S11_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf941 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf941.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf941/12_control_WaGa_S12_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf961 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf961.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf961/WaGaWTcells_1_S1_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf962 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf962.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf962/WaGaWTcells_2_S2_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf971 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf971.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf971/2001_WaGa_sT_DMSO_S3_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf972 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf972.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf972/2001_WaGa_sT_Dox_S4_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf973 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf973.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf973/2001_WaGa_scr_DMSO_S5_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting nf974 -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o nf974.fastq.gz ~/DATA/Data_Ute/Data_Ute_smallRNA_7/250411_VH00358_135_AAGKGLHM5/nf974/2001_WaGa_scr_Dox_S6_R1_001.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2404_MKL1_wt_EVs -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2404_MKL1_wt_EVs.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2404_MKL1_wt_EVs/2404_MKL1_wt_EVs_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2608_MKL1_wt_EVs -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2608_MKL1_wt_EVs.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2608_MKL1_wt_EVs/2608_MKL1_wt_EVs_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2608_MKL1_sT_DMSO -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2608_MKL1_sT_DMSO.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2608_MKL1_sT_DMSO/2608_MKL1_sT_DMSO_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2701_MKL1_sT_DMSO -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2701_MKL1_sT_DMSO.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2701_MKL1_sT_DMSO/2701_MKL1_sT_DMSO_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2802_MKL1_sT_DMSO -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2802_MKL1_sT_DMSO.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2802_MKL1_sT_DMSO/2802_MKL1_sT_DMSO_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2608_MKL1_sT_Dox -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2608_MKL1_sT_Dox.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2608_MKL1_sT_Dox/2608_MKL1_sT_Dox_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2701_MKL1_sT_Dox -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2701_MKL1_sT_Dox.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2701_MKL1_sT_Dox/2701_MKL1_sT_Dox_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2802_MKL1_sT_Dox -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2802_MKL1_sT_Dox.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2802_MKL1_sT_Dox/2802_MKL1_sT_Dox_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2608_MKL1_scr_DMSO -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2608_MKL1_scr_DMSO.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2608_MKL1_scr_DMSO/2608_MKL1_scr_DMSO_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2701_MKL1_scr_DMSO -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2701_MKL1_scr_DMSO.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2701_MKL1_scr_DMSO/2701_MKL1_scr_DMSO_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2802_MKL1_scr_DMSO -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2802_MKL1_scr_DMSO.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2802_MKL1_scr_DMSO/2802_MKL1_scr_DMSO_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2608_MKL1_scr_Dox -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2608_MKL1_scr_Dox.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2608_MKL1_scr_Dox/2608_MKL1_scr_Dox_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2701_MKL1_scr_Dox -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2701_MKL1_scr_Dox.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2701_MKL1_scr_Dox/2701_MKL1_scr_Dox_R1.fastq.gz >> LOG
    
     echo "------------------------------------ cutadapting 2802_MKL1_scr_Dox -----------------------------------" >> LOG
     cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 5 --trim-n -o 2802_MKL1_scr_Dox.fastq.gz ~/DATA/Data_Ute_smallRNA/20260506_AV243904_0073_A/2802_MKL1_scr_Dox/2802_MKL1_scr_Dox_R1.fastq.gz >> LOG
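
The cutadapt calls above differ only in the sample name and the input path, so they can be driven from a small table. The following is a minimal sketch under that assumption: the names `trim_cmd`, `trim_all`, `RUN_DIR` and the `DRY_RUN` switch are hypothetical helpers, not part of the original workflow, and only two rows of the table are shown. The cutadapt options are copied unchanged from the commands above.

```shell
# Sketch: table-driven version of the per-sample cutadapt calls.
ADAPTER="TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC"
RUN_DIR="$HOME/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV"

# trim_cmd SAMPLE INPUT: print the cutadapt command line for one sample
trim_cmd() {
    printf 'cutadapt -a %s -q 20 --minimum-length 5 --trim-n -o %s.fastq.gz %s\n' \
        "$ADAPTER" "$1" "$2"
}

# trim_all: read "SAMPLE INPUT" pairs from stdin.
# With DRY_RUN=1 (the default here) the commands are only printed;
# with DRY_RUN=0 cutadapt is run and its report is appended to LOG.
trim_all() {
    while read -r sample input; do
        [ -n "$sample" ] || continue
        echo "------ cutadapting $sample ------"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            trim_cmd "$sample" "$input"
        else
            cutadapt -a "$ADAPTER" -q 20 --minimum-length 5 --trim-n \
                -o "$sample.fastq.gz" "$input" >> LOG
        fi
    done
}

# Two rows as an example; add one row per remaining sample.
trim_all <<EOF
nf936 $RUN_DIR/nf936/07_1905_WaGa_sT_DMSO_EV_RNA_S7_R1_001.fastq.gz
nf937 $RUN_DIR/nf937/08_1905_WaGa_sT_Dox_EV_RNA_S8_R1_001.fastq.gz
EOF
```

Printing first and running only with `DRY_RUN=0` makes it easy to compare the generated commands against the hand-written ones before anything is trimmed.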
  3. Install exceRpt (https://github.gersteinlab.org/exceRpt/)

     docker pull rkitchen/excerpt
     mkdir -p /mnt/nvme0n1p1/MyexceRptDatabase
     cd /mnt/nvme0n1p1/MyexceRptDatabase
     wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg38_lowmem.tgz
     tar -xvf exceRptDB_v4_hg38_lowmem.tgz
     #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg19_lowmem.tgz
     #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg38_lowmem.tgz
     #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_mm10_lowmem.tgz
     wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_EXOmiRNArRNA.tgz
     tar -xvf exceRptDB_v4_EXOmiRNArRNA.tgz
     wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_EXOGenomes.tgz
     tar -xvf exceRptDB_v4_EXOGenomes.tgz
    
     # List extracted hg38 directory structure
     find hg38 -type f | sed 's|^hg38/||' | sort > extracted_hg38.txt
     comm -3 extracted_hg38.txt <(tar -tf exceRptDB_v4_hg38_lowmem.tgz | grep '^hg38/' | sed 's|^hg38/||' | sort)  # --> DIR hg38
     tar -tf exceRptDB_v4_EXOmiRNArRNA.tgz  # --> DIR ribosomeDatabase, NCBI_taxonomy_taxdump, miRBase
     tar -tf exceRptDB_v4_EXOGenomes.tgz  # --> Genomes_BacteriaFungiMammalPlantProtistVirus
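
The listing comparison above can be wrapped into a reusable check that reports archive members missing on disk. This is only a sketch: `list_missing` is a hypothetical helper name, and it assumes the archive was extracted into the directory passed as the second argument.

```shell
# list_missing ARCHIVE ROOT: print files listed in ARCHIVE that are absent under ROOT.
# Prints nothing when the extraction is complete.
list_missing() {
    archive="$1"; root="$2"
    tmp_a=$(mktemp); tmp_b=$(mktemp)
    # Archive members, without directory entries and without a leading "./"
    tar -tf "$archive" | grep -v '/$' | sed 's|^\./||' | LC_ALL=C sort > "$tmp_a"
    # Files actually present below ROOT, as paths relative to ROOT
    (cd "$root" && find . -type f | sed 's|^\./||') | LC_ALL=C sort > "$tmp_b"
    # Lines only in the archive listing
    LC_ALL=C comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example call (run inside the database directory):
# list_missing exceRptDB_v4_hg38_lowmem.tgz .
```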
  4. Run exceRpt

     #[---- REAL_RUNNING_COMPLETE_DB ---->]
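     #NOTE: if the input files are not renamed first, all names inside _CORE_RESULTS_v4.6.3.tgz have to be fixed afterwards by recursively removing "_cutadapted.fastq" (unzip, rename, zip again, then mv to ../results_g).
     cd trimmed
     # Dry run: this loop only prints the mv commands that strip ".fastq" from each name (e.g. nf936.fastq.gz -> nf936.gz); pipe the output to bash to apply them.
     for file in *.fastq.gz; do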
         echo "mv \"$file\" \"${file/.fastq/}\""
     done
    
     mkdir results
     for sample in nf780 nf796 nf797  nf655    nf774 nf961 nf962  nf657 nf930 nf935  nf931 nf936 nf971  nf932 nf937 nf972  nf933 nf938 nf973  nf934 nf939 nf974; do
         docker run -v ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/trimmed:/exceRptInput \
                    -v ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/results:/exceRptOutput \
                   -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB \
                   -t rkitchen/excerpt \
                   INPUT_FILE_PATH=/exceRptInput/${sample}.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on
     done
    
     for sample in 2404_MKL1_wt_EVs 2608_MKL1_wt_EVs    2608_MKL1_sT_DMSO 2701_MKL1_sT_DMSO 2802_MKL1_sT_DMSO    2608_MKL1_sT_Dox 2701_MKL1_sT_Dox 2802_MKL1_sT_Dox    2608_MKL1_scr_DMSO 2701_MKL1_scr_DMSO 2802_MKL1_scr_DMSO    2608_MKL1_scr_Dox 2701_MKL1_scr_Dox 2802_MKL1_scr_Dox; do
         docker run -v ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/trimmed:/exceRptInput \
                    -v ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/results:/exceRptOutput \
                   -v /mnt/nvme3n1p1/MyexceRptDatabase:/exceRpt_DB \
                   -t rkitchen/excerpt \
                   INPUT_FILE_PATH=/exceRptInput/${sample}.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on
     done
    
     #DEBUG the excerpt env
     docker inspect rkitchen/excerpt:latest
     # Without /bin/bash → May run and exit immediately
     #docker run -it rkitchen/excerpt
     # With /bin/bash → Stays open for interaction
     docker run -it --entrypoint /bin/bash rkitchen/excerpt
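
The exceRpt loops above expect every input as `trimmed/<sample>.gz`, which only exists once the rename has actually been applied. A small pre-flight check can catch missing inputs before a long run starts. This is a sketch; `missing_inputs` is a hypothetical helper, not part of exceRpt.

```shell
# missing_inputs DIR SAMPLE...: print every SAMPLE whose DIR/SAMPLE.gz does not exist
missing_inputs() {
    dir="$1"; shift
    for s in "$@"; do
        [ -f "$dir/$s.gz" ] || echo "$s"
    done
}

# Example call (paths as in the loops above):
# missing_inputs ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/trimmed nf780 nf796 nf797
```

An empty output means all listed samples are ready for the docker loop.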
  5. Processing exceRpt output from multiple samples

     cd ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/exceRpt-master
     mamba activate r_env
     mamba install -c conda-forge -c bioconda \
         bioconductor-marray \
         bioconductor-rgraphviz \
         r-plyr r-gplots r-reshape2 r-ggplot2 r-scales r-openxlsx r-rcurl r-xml \
         -y
     mamba install -c conda-forge -c bioconda \
         r-plyr r-gplots r-reshape2 r-ggplot2 r-scales r-openxlsx \
         bioconductor-marray bioconductor-rgraphviz \
         -y
    
     #mkdir summaries heatmap_all_WaGa+4_MKL-1
     mkdir results_WaGa_EXCLUDED results_MKL-1 results_MKL-1_EXCLUDED summaries_WaGa summaries_MKL-1 heatmap_WaGa heatmap_MKL-1
     #! EXCLUDE isolates that show a completely different pattern or are of bad quality (outliers). So far only one sample is affected, namely nf657 from WaGa wt EV:
     sudo mv results/nf657* results_WaGa_EXCLUDED/
     sudo mv results/nf780* results_MKL-1/
     sudo mv results/nf796* results_MKL-1/
     sudo mv results/nf797* results_MKL-1/
     sudo mv results/nf655* results_MKL-1/
     for sample in 2404_MKL1_wt_EVs 2608_MKL1_wt_EVs    2608_MKL1_sT_DMSO 2701_MKL1_sT_DMSO 2802_MKL1_sT_DMSO    2608_MKL1_sT_Dox 2701_MKL1_sT_Dox 2802_MKL1_sT_Dox    2608_MKL1_scr_DMSO 2701_MKL1_scr_DMSO 2802_MKL1_scr_DMSO    2608_MKL1_scr_Dox 2701_MKL1_scr_Dox 2802_MKL1_scr_Dox; do
         echo "sudo mv results/${sample}* results_MKL-1/"
     done
     #Following our initial QC, I noticed that one of the MKL-1 wt-EV samples (nf655) is a clear outlier, clustering far apart from the other two wt-EV replicates in the PCoA plots. I recommend removing nf655 from the downstream MKL-1 analysis, similar to our earlier WaGa analysis, in which we removed the outlier nf657. Please see the attached figures for reference.
     mv results_MKL-1/nf655* results_MKL-1_EXCLUDED/
    
     (r_env) jhuang@WS-2290C:~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/exceRpt-master$ R
     #WARNING: need to reload the R-script after each change of the script.
     source("mergePipelineRuns_functions.R")
     processSamplesInDir("../results_WaGa/", "../summaries_WaGa")
     processSamplesInDir("../results_MKL-1/", "../summaries_MKL-1")
    
     #mkdir heatmap_WaGa; cp summaries_WaGa/*.RData heatmap_WaGa; rm heatmap_WaGa/exceRpt_sampleGroupDefinitions.txt;
     source("mergePipelineRuns_functions_addSampleGroupInfo_WaGa.R")
     processSamplesInDir("../results_WaGa/", "../heatmap_WaGa")
    
     #mkdir heatmap_MKL-1; cp summaries_MKL-1/*.RData heatmap_MKL-1; rm heatmap_MKL-1/exceRpt_sampleGroupDefinitions.txt;
     source("mergePipelineRuns_functions_addSampleGroupInfo_MKL-1.R")
     processSamplesInDir("../results_MKL-1/", "../heatmap_MKL-1")
    
     #!!!!! IMPORTANT: REPORT heatmap_MKL-1/exceRpt_DiagnosticPlots.pdf and heatmap_MKL-1/mapping_heatmap3.pdf (They are almost the same, mapping_heatmap3.pdf is better due to bigger font size) !!!!
     #CONSIDERED deleting nf774 because it lies very far from the other two samples (maybe better not, since I still have to generate the PCA and Manhattan plots); for now nf774 is kept in the WaGa results.
    
     #~/Tools/csv2xls-0.4/csv_to_xls.py exceRpt_miRNA_ReadsPerMillion.txt exceRpt_tRNA_ReadsPerMillion.txt exceRpt_piRNA_ReadsPerMillion.txt -d$'\t' -o exceRpt_results_detailed.xls
    
     # Report summaries_WaGa/exceRpt_mapping_heatmaps_WaGa.xlsx or summaries_MKL-1/exceRpt_mapping_heatmaps_MKL-1.xlsx;
     #        summaries_WaGa/exceRpt_results_detailed_WaGa.xls or summaries_MKL-1/exceRpt_results_detailed_MKL-1.xls;
     #        heatmap_WaGa/mapping_heatmap3_WaGa.pdf or heatmap_MKL-1/mapping_heatmap3_MKL-1.pdf
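
The sample-sorting moves earlier in this step can be sketched as one helper that creates the destination before moving, so that a missing directory cannot break the move. `move_samples` is a hypothetical name; prefix the call with sudo, as in the commands above, if the result files are not owned by the current user.

```shell
# move_samples SRC DEST SAMPLE...: move every SRC/SAMPLE* into DEST, creating DEST first
move_samples() {
    src="$1"; dest="$2"; shift 2
    mkdir -p "$dest"
    for s in "$@"; do
        for f in "$src/$s"*; do
            [ -e "$f" ] || continue   # the glob matched nothing for this sample
            mv "$f" "$dest/"
        done
    done
}

# Example call:
# move_samples results results_MKL-1_EXCLUDED nf655
```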
  6. Downstream analysis using R for miRNAs (17 WaGa samples)

     #Input file
     #exceRpt_miRNA_ReadCounts.txt
     #exceRpt_piRNA_ReadCounts.txt
    
     ## WaGa experimental groups (scr = scramble control; sT = target knockdown)
     #WaGa_scr_DMSO_EV (nf933, nf938, nf973)
     #WaGa_scr_Dox_EV (nf934, nf939, nf974)
     #WaGa_sT_DMSO_EV (nf931, nf936, nf971)
     #WaGa_sT_Dox_EV (nf932, nf937, nf972)
     #
     ## WaGa wild-type controls
     #WaGa_wt_cells (nf774, nf961, nf962)
     #WaGa_wt_EV (nf930, nf935)
    
     cd ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/summaries_WaGa
     mamba activate r_env
     R
    
     #BiocManager::install("AnnotationDbi")
     #BiocManager::install("clusterProfiler")
     #BiocManager::install(c("ReactomePA","org.Hs.eg.db"))
     #BiocManager::install("limma")
     #BiocManager::install("sva")
     #install.packages("writexl")
     #install.packages("openxlsx")
     library("AnnotationDbi")
     library("clusterProfiler")
     library("ReactomePA")
     library("org.Hs.eg.db")
     library(DESeq2)
     library(gplots)
     library(limma)
     library(sva)
     #library(writexl)  #d.raw_with_rownames <- cbind(RowNames = rownames(d.raw), d.raw); write_xlsx(d.raw, path = "d_raw.xlsx");
     library(openxlsx)
    
     d.raw<- read.delim2("exceRpt_miRNA_ReadCounts.txt",sep="\t", header=TRUE, row.names=1)
    
     # Desired column order
     desired_order <- c(
         "nf933", "nf938", "nf973",
         "nf934", "nf939", "nf974",
         "nf931", "nf936", "nf971",
         "nf932", "nf937", "nf972",
         "nf774", "nf961", "nf962",
         "nf930", "nf935"
     )
    
     # Check for missing or misnamed columns first (should return character(0)), then reorder
     setdiff(desired_order, colnames(d.raw))
     d.raw <- d.raw[, desired_order]
     #sapply(d.raw, is.numeric)
     d.raw[] <- lapply(d.raw, as.numeric)
     #d.raw[] <- lapply(d.raw, function(x) as.numeric(as.character(x)))
     d.raw <- round(d.raw)
     write.csv(d.raw, file ="d_raw.csv")
     write.xlsx(d.raw, file = "d_raw.xlsx", rowNames = TRUE)
    
     # ------ Code sent to Ute ------
     #d.raw <- read.delim2("d_raw.csv",sep=",", header=TRUE, row.names=1)
     Cell_or_EV = as.factor(c("EV","EV","EV",  "EV","EV","EV",  "EV","EV","EV",  "EV","EV","EV",  "Cell","Cell","Cell",  "EV","EV"))
     replicates = as.factor(c("WaGa_scr_DMSO_EV","WaGa_scr_DMSO_EV","WaGa_scr_DMSO_EV",     "WaGa_scr_Dox_EV","WaGa_scr_Dox_EV","WaGa_scr_Dox_EV",  "WaGa_sT_DMSO_EV","WaGa_sT_DMSO_EV","WaGa_sT_DMSO_EV",  "WaGa_sT_Dox_EV","WaGa_sT_Dox_EV","WaGa_sT_Dox_EV",  "WaGa_wt_cells", "WaGa_wt_cells","WaGa_wt_cells",  "WaGa_wt_EV", "WaGa_wt_EV"))
     ids = as.factor(c(
         "nf933", "nf938", "nf973",
         "nf934", "nf939", "nf974",
         "nf931", "nf936", "nf971",
         "nf932", "nf937", "nf972",
         "nf774", "nf961", "nf962",
         "nf930", "nf935"))
     cData = data.frame(row.names=colnames(d.raw), replicates=replicates, ids=ids, Cell_or_EV=Cell_or_EV)
     dds<-DESeqDataSetFromMatrix(countData=d.raw, colData=cData, design=~replicates)
    
     # Filter low-count miRNAs
     dds <- dds[ rowSums(counts(dds)) > 10, ]
     rld <- rlogTransformation(dds)
    
     # -- before pca --
     png("pca.png", 1200, 800)
     plotPCA(rld, intgroup=c("replicates"))
     #plotPCA(rld, intgroup = c("replicates", "batch"))
     #plotPCA(rld, intgroup = c("replicates", "ids"))
     #plotPCA(rld, "batch")
     dev.off()
     png("pca2.png", 1200, 800)
     #plotPCA(rld, intgroup=c("replicates"))
     #plotPCA(rld, intgroup = c("replicates", "batch"))
     plotPCA(rld, intgroup = c("replicates", "ids"))
     #plotPCA(rld, "batch")
     dev.off()
    
     # Batch effect removal methods (no batch-effect removal was applied!)
    
     #### STEP2: DEGs ####
     #- Heatmap untreated/wt vs parental; 1x for WaGa cell line
     #- Volcano plot untreated/wt vs parental; 1x for WaGa cell line
     #- Manhattan plot miRNAs; 1x for WaGa cell line
     #- Distribution of different small RNA species untreated/wt and parental; 1x for WaGa cell line
     #- Motif analysis: identify RNA-binding proteins that may regulate small RNA loading; 1x for WaGa cell line
    
     #convert bam to bigwig using deepTools by feeding inverse of DESeq’s size Factor
     sizeFactors(dds)
     #NULL
     dds <- estimateSizeFactors(dds)
     sizeFactors(dds)
     normalized_counts <- counts(dds, normalized=TRUE)
     write.table(normalized_counts, file="normalized_counts.txt", sep="\t", quote=F, col.names=NA)
     write.xlsx(normalized_counts, file = "normalized_counts.xlsx", rowNames = TRUE)
    
     dds<-DESeqDataSetFromMatrix(countData=d.raw, colData=cData, design=~replicates)
    
     dds$replicates <- relevel(dds$replicates, "WaGa_wt_cells")
     dds = DESeq(dds, betaPrior=FALSE)  #default betaPrior is FALSE
     resultsNames(dds)
     clist <- c("WaGa_wt_EV_vs_WaGa_wt_cells")
    
     #NOTE: the results sent to Ute used the cutoff padj <= 0.1; the code below uses padj <= 0.05.
     for (i in clist) {
         contrast = paste("replicates", i, sep="_")
         res = results(dds, name=contrast)
         res <- res[!is.na(res$log2FoldChange),]
         #https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-are-some-p-values-set-to-na
         res$padj <- ifelse(is.na(res$padj), 1, res$padj)
         res_df <- as.data.frame(res)
         write.csv(as.data.frame(res_df[order(res_df$pvalue),]), file = paste(i, "all.txt", sep="-"))
         up <- subset(res_df, padj<=0.05 & log2FoldChange>=2)
         down <- subset(res_df, padj<=0.05 & log2FoldChange<=-2)
         write.csv(as.data.frame(up[order(up$log2FoldChange,decreasing=TRUE),]), file = paste(i, "up.txt", sep="-"))
         write.csv(as.data.frame(down[order(abs(down$log2FoldChange),decreasing=TRUE),]), file = paste(i, "down.txt", sep="-"))
     }
    
     # Shell command: run in a terminal, not in the R session
     ~/Tools/csv2xls-0.4/csv_to_xls.py \
     WaGa_wt_EV_vs_WaGa_wt_cells-all.txt \
     WaGa_wt_EV_vs_WaGa_wt_cells-up.txt \
     WaGa_wt_EV_vs_WaGa_wt_cells-down.txt \
     -d$',' -o WaGa_wt_EV_vs_WaGa_wt_cells.xls;
    
     # ------------------- volcano_plot -------------------
     library(ggplot2)
     library(ggrepel)
    
     geness_res <- read.csv(file = paste("WaGa_wt_EV_vs_WaGa_wt_cells", "all.txt", sep="-"), row.names=1)
    
     external_gene_name <- rownames(geness_res)
     geness_res <- cbind(geness_res, external_gene_name)
     #top_g are from ids
     top_g <- c("hsa-let-7b-5p","hsa-let-7g-5p","hsa-let-7i-5p","hsa-miR-103a-3p","hsa-miR-107","hsa-miR-1224-5p","hsa-miR-122-5p","hsa-miR-1226-5p","hsa-miR-1246","hsa-miR-127-3p","hsa-miR-1290","hsa-miR-130a-3p","hsa-miR-139-3p","hsa-miR-141-3p","hsa-miR-143-3p","hsa-miR-148b-3p","hsa-miR-155-5p","hsa-miR-15a-5p","hsa-miR-17-5p","hsa-miR-184","hsa-miR-18a-3p","hsa-miR-18a-5p","hsa-miR-190a-5p","hsa-miR-191-5p","hsa-miR-193b-5p","hsa-miR-197-5p","hsa-miR-200a-3p","hsa-miR-200b-5p","hsa-miR-206","hsa-miR-20a-5p","hsa-miR-210-3p","hsa-miR-2110","hsa-miR-21-5p","hsa-miR-218-5p","hsa-miR-219a-1-3p","hsa-miR-221-3p","hsa-miR-23b-3p","hsa-miR-27a-3p","hsa-miR-27b-3p","hsa-miR-27b-5p","hsa-miR-28-3p","hsa-miR-30a-5p","hsa-miR-30c-5p","hsa-miR-30e-5p","hsa-miR-3127-5p","hsa-miR-3131","hsa-miR-3180|hsa-miR-3180-3p","hsa-miR-320a","hsa-miR-320b","hsa-miR-320c","hsa-miR-320d","hsa-miR-330-3p","hsa-miR-335-3p","hsa-miR-33b-5p","hsa-miR-340-5p","hsa-miR-342-5p","hsa-miR-3605-5p","hsa-miR-361-3p","hsa-miR-365a-5p","hsa-miR-374b-5p","hsa-miR-378i","hsa-miR-379-5p","hsa-miR-3940-5p","hsa-miR-409-3p","hsa-miR-411-5p","hsa-miR-423-3p","hsa-miR-423-5p","hsa-miR-4286","hsa-miR-429","hsa-miR-432-5p","hsa-miR-4326","hsa-miR-451a","hsa-miR-4520-3p","hsa-miR-454-3p","hsa-miR-4646-5p","hsa-miR-4667-5p","hsa-miR-4748","hsa-miR-483-5p","hsa-miR-486-5p","hsa-miR-5010-5p","hsa-miR-504-3p","hsa-miR-5187-5p","hsa-miR-590-3p","hsa-miR-6128","hsa-miR-625-5p","hsa-miR-6726-5p","hsa-miR-6730-5p","hsa-miR-676-3p","hsa-miR-6767-5p","hsa-miR-6777-5p","hsa-miR-6780a-5p","hsa-miR-6794-5p","hsa-miR-6817-3p","hsa-miR-708-5p","hsa-miR-7-5p","hsa-miR-766-5p","hsa-miR-7854-3p","hsa-miR-873-3p","hsa-miR-885-3p","hsa-miR-92b-5p","hsa-miR-93-5p","hsa-miR-937-3p","hsa-miR-9-5p","hsa-miR-98-5p")
     subset(geness_res, external_gene_name %in% top_g & pvalue < 0.05 & (abs(geness_res$log2FoldChange) >= 2.0))
     geness_res$Color <- "NS or log2FC < 2.0"
     geness_res$Color[geness_res$pvalue < 0.05] <- "P < 0.05"
     geness_res$Color[geness_res$padj < 0.05] <- "P-adj < 0.05"
     geness_res$Color[abs(geness_res$log2FoldChange) < 2.0] <- "NS or log2FC < 2.0"
    
     write.csv(geness_res, "WaGa_wt_EV_vs_WaGa_wt_cells_with_Category.csv")
     geness_res$invert_P <- (-log10(geness_res$pvalue)) * sign(geness_res$log2FoldChange)
    
     geness_res <- geness_res[, -1*ncol(geness_res)]
     png("WaGa_wt_EV_vs_WaGa_wt_cells.png",width=1200, height=1400)
     #svg("WaGa_wt_EV_vs_WaGa_wt_cells.svg",width=12, height=14)
     ggplot(geness_res, aes(x = log2FoldChange, y = -log10(pvalue), color = Color, label = external_gene_name)) +
         geom_vline(xintercept = c(2.0, -2.0), lty = "dashed") +
         geom_hline(yintercept = -log10(0.05), lty = "dashed") +
         geom_point() +
         labs(x = "log2(FC)", y = "Significance, -log10(P)", color = "Significance") +
         scale_color_manual(values = c("P < 0.05"="orange","P-adj < 0.05"="red","NS or log2FC < 2.0"="darkgray"), guide = guide_legend(override.aes = list(size = 4))) +
         scale_y_continuous(expand = expansion(mult = c(0,0.05))) +
         geom_text_repel(data = subset(geness_res, external_gene_name %in% top_g & pvalue < 0.05 & (abs(geness_res$log2FoldChange) >= 2.0)), size = 4, point.padding = 0.15, color = "black", min.segment.length = .1, box.padding = .2, lwd = 2) +
         theme_bw(base_size = 16) +
         theme(legend.position = "bottom")
     dev.off()
    
     # ----------------------------------------
     # ----------- manhattan_plot -------------
    
     Rscript manhattan_plot_Carmen_custom_labels.R  #exceRpt_miRNA_ReadCounts.txt
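
As a quick cross-check of the up/down tables written in this step, the hits can be counted directly from the `-all.txt` file. This is a sketch with assumptions: `count_hits` is a hypothetical helper, and it assumes the CSV columns are in the order written by `write.csv` for a DESeq2 results data frame (row name, baseMean, log2FoldChange, lfcSE, stat, pvalue, padj). The thresholds mirror the R loop above (padj <= 0.05, |log2FC| >= 2).

```shell
# count_hits FILE: print "up=N down=M" for a DESeq2 results CSV.
# Assumed columns: $1 row name, $3 log2FoldChange, $7 padj.
count_hits() {
    awk -F',' 'NR > 1 && $7 != "NA" && $7 + 0 <= 0.05 {
                   if ($3 + 0 >= 2) up++
                   else if ($3 + 0 <= -2) down++
               }
               END { printf "up=%d down=%d\n", up, down }' "$1"
}

# Example call:
# count_hits WaGa_wt_EV_vs_WaGa_wt_cells-all.txt
```

The two counts should match the number of data rows in the corresponding `-up.txt` and `-down.txt` files.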
  7. Downstream analysis using R for miRNAs (17 MKL-1 samples)

     #Input file
     #exceRpt_miRNA_ReadCounts.txt
     #exceRpt_piRNA_ReadCounts.txt
    
     #MKL-1_sT_DMSO_EV ("X2608_MKL1_sT_DMSO","X2701_MKL1_sT_DMSO","X2802_MKL1_sT_DMSO")
     #MKL-1_sT_Dox_EV ("X2608_MKL1_sT_Dox","X2701_MKL1_sT_Dox","X2802_MKL1_sT_Dox")
     #MKL-1_scr_DMSO_EV ("X2608_MKL1_scr_DMSO","X2701_MKL1_scr_DMSO","X2802_MKL1_scr_DMSO")
     #MKL-1_scr_Dox_EV ("X2608_MKL1_scr_Dox","X2701_MKL1_scr_Dox","X2802_MKL1_scr_Dox")
     #MKL-1_wt_cells ("nf780","nf796","nf797")
     #MKL-1_wt_EV ("X2404_MKL1_wt_EVs","X2608_MKL1_wt_EVs")
    
     cd ~/DATA/Data_Ute_smallRNA_via_exceRpt_workspace/summaries_MKL-1
     mamba activate r_env
     R
    
     #BiocManager::install("AnnotationDbi")
     #BiocManager::install("clusterProfiler")
     #BiocManager::install(c("ReactomePA","org.Hs.eg.db"))
     #BiocManager::install("limma")
     #BiocManager::install("sva")
     #install.packages("writexl")
     #install.packages("openxlsx")
     library("AnnotationDbi")
     library("clusterProfiler")
     library("ReactomePA")
     library("org.Hs.eg.db")
     library(DESeq2)
     library(gplots)
     library(limma)
     library(sva)
     #library(writexl)  #d.raw_with_rownames <- cbind(RowNames = rownames(d.raw), d.raw); write_xlsx(d.raw, path = "d_raw.xlsx");
     library(openxlsx)
    
     d.raw<- read.delim2("exceRpt_miRNA_ReadCounts.txt",sep="\t", header=TRUE, row.names=1)
    
     # Desired column order
     desired_order <- c(
         "X2608_MKL1_sT_DMSO","X2701_MKL1_sT_DMSO","X2802_MKL1_sT_DMSO", "X2608_MKL1_sT_Dox","X2701_MKL1_sT_Dox","X2802_MKL1_sT_Dox", "X2608_MKL1_scr_DMSO","X2701_MKL1_scr_DMSO","X2802_MKL1_scr_DMSO", "X2608_MKL1_scr_Dox","X2701_MKL1_scr_Dox","X2802_MKL1_scr_Dox",
         "nf780","nf796","nf797", "X2404_MKL1_wt_EVs","X2608_MKL1_wt_EVs"
     )
    
     # Check for missing or misnamed columns first (should return character(0)), then reorder
     setdiff(desired_order, colnames(d.raw))
     d.raw <- d.raw[, desired_order]
     #sapply(d.raw, is.numeric)
     d.raw[] <- lapply(d.raw, as.numeric)
     #d.raw[] <- lapply(d.raw, function(x) as.numeric(as.character(x)))
     d.raw <- round(d.raw)
     write.csv(d.raw, file ="d_raw.csv")
     write.xlsx(d.raw, file = "d_raw.xlsx", rowNames = TRUE)
    
     #d.raw <- read.delim2("d_raw.csv",sep=",", header=TRUE, row.names=1)
     Cell_or_EV = as.factor(c("EV","EV","EV",  "EV","EV","EV",  "EV","EV","EV",  "EV","EV","EV",  "Cell","Cell","Cell",  "EV","EV"))
     replicates = as.factor(c("MKL-1_sT_DMSO_EV","MKL-1_sT_DMSO_EV","MKL-1_sT_DMSO_EV",     "MKL-1_sT_Dox_EV","MKL-1_sT_Dox_EV","MKL-1_sT_Dox_EV",  "MKL-1_scr_DMSO_EV","MKL-1_scr_DMSO_EV","MKL-1_scr_DMSO_EV",  "MKL-1_scr_Dox_EV","MKL-1_scr_Dox_EV","MKL-1_scr_Dox_EV",    "MKL-1_wt_cells", "MKL-1_wt_cells","MKL-1_wt_cells",  "MKL-1_wt_EV","MKL-1_wt_EV"))
     ids = as.factor(c("X2608_MKL1_sT_DMSO","X2701_MKL1_sT_DMSO","X2802_MKL1_sT_DMSO", "X2608_MKL1_sT_Dox","X2701_MKL1_sT_Dox","X2802_MKL1_sT_Dox", "X2608_MKL1_scr_DMSO","X2701_MKL1_scr_DMSO","X2802_MKL1_scr_DMSO", "X2608_MKL1_scr_Dox","X2701_MKL1_scr_Dox","X2802_MKL1_scr_Dox",
         "nf780","nf796","nf797", "X2404_MKL1_wt_EVs","X2608_MKL1_wt_EVs"))
     cData = data.frame(row.names=colnames(d.raw), replicates=replicates, ids=ids, Cell_or_EV=Cell_or_EV)
     dds<-DESeqDataSetFromMatrix(countData=d.raw, colData=cData, design=~replicates)
    
     # Filter low-count miRNAs
     dds <- dds[ rowSums(counts(dds)) > 10, ]
     rld <- rlogTransformation(dds)
    
     # -- before pca --
     png("pca.png", 1200, 800)
     plotPCA(rld, intgroup=c("replicates"))
     #plotPCA(rld, intgroup = c("replicates", "batch"))
     #plotPCA(rld, intgroup = c("replicates", "ids"))
     #plotPCA(rld, "batch")
     dev.off()
     png("pca2.png", 1200, 800)
     #plotPCA(rld, intgroup=c("replicates"))
     #plotPCA(rld, intgroup = c("replicates", "batch"))
     plotPCA(rld, intgroup = c("replicates", "ids"))
     #plotPCA(rld, "batch")
     dev.off()
    
     # Batch effect removal methods (no batch-effect removal was applied!)
    
     #### STEP2: DEGs ####
     #- Heatmap untreated/wt vs parental; 1x for MKL-1 cell line
     #- Volcano plot untreated/wt vs parental; 1x for MKL-1 cell line
     #- Manhattan plot miRNAs; 1x for MKL-1 cell line
     #- Distribution of different small RNA species untreated/wt and parental; 1x for MKL-1 cell line
     #- Motif analysis: identify RNA-binding proteins that may regulate small RNA loading; 1x for MKL-1 cell line
    
     #convert bam to bigwig using deepTools by feeding inverse of DESeq’s size Factor
     sizeFactors(dds)
     #NULL
     dds <- estimateSizeFactors(dds)
     sizeFactors(dds)
     normalized_counts <- counts(dds, normalized=TRUE)
     write.table(normalized_counts, file="normalized_counts.txt", sep="\t", quote=F, col.names=NA)
     write.xlsx(normalized_counts, file = "normalized_counts.xlsx", rowNames = TRUE)
    
     dds<-DESeqDataSetFromMatrix(countData=d.raw, colData=cData, design=~replicates)
    
     dds$replicates <- relevel(dds$replicates, "MKL-1_wt_cells")
     dds = DESeq(dds, betaPrior=FALSE)  #default betaPrior is FALSE
     resultsNames(dds)
     clist <- c("MKL.1_wt_EV_vs_MKL.1_wt_cells")
    
     #NOTE: the results sent to Ute used the cutoff padj <= 0.1; the code below uses padj <= 0.05.
     for (i in clist) {
         contrast = paste("replicates", i, sep="_")
         res = results(dds, name=contrast)
         res <- res[!is.na(res$log2FoldChange),]
         #https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-are-some-p-values-set-to-na
         res$padj <- ifelse(is.na(res$padj), 1, res$padj)
         res_df <- as.data.frame(res)
         write.csv(as.data.frame(res_df[order(res_df$pvalue),]), file = paste(i, "all.txt", sep="-"))
         up <- subset(res_df, padj<=0.05 & log2FoldChange>=2)
         down <- subset(res_df, padj<=0.05 & log2FoldChange<=-2)
         write.csv(as.data.frame(up[order(up$log2FoldChange,decreasing=TRUE),]), file = paste(i, "up.txt", sep="-"))
         write.csv(as.data.frame(down[order(abs(down$log2FoldChange),decreasing=TRUE),]), file = paste(i, "down.txt", sep="-"))
     }
    
     # Shell command: run in a terminal, not in the R session
     ~/Tools/csv2xls-0.4/csv_to_xls.py \
     MKL.1_wt_EV_vs_MKL.1_wt_cells-all.txt \
     MKL.1_wt_EV_vs_MKL.1_wt_cells-up.txt \
     MKL.1_wt_EV_vs_MKL.1_wt_cells-down.txt \
     -d$',' -o MKL.1_wt_EV_vs_MKL.1_wt_cells.xls;
    
     # ------------------- volcano_plot -------------------
     library(ggplot2)
     library(ggrepel)
    
     geness_res <- read.csv(file = paste("MKL.1_wt_EV_vs_MKL.1_wt_cells", "all.txt", sep="-"), row.names=1)
    
     external_gene_name <- rownames(geness_res)
     geness_res <- cbind(geness_res, external_gene_name)
     #top_g are from ids
    
     top_g <- c("hsa-miR-203a-3p","hsa-miR-6850-5p","hsa-miR-4511","hsa-miR-5187-5p","hsa-miR-133b","hsa-miR-1246","hsa-miR-625-3p","hsa-miR-6741-3p","hsa-miR-192-5p","hsa-miR-10b-5p","hsa-miR-885-5p","hsa-miR-30e-3p","hsa-miR-101-3p","hsa-miR-1307-5p","hsa-miR-95-3p","hsa-miR-889-3p","hsa-miR-206","hsa-miR-301a-3p","hsa-miR-1-3p","hsa-let-7c-5p","hsa-miR-196a-5p","hsa-let-7f-5p","hsa-let-7e-5p","hsa-miR-30c-5p","hsa-miR-30a-3p","hsa-miR-146b-5p","hsa-miR-25-3p","hsa-miR-182-5p","hsa-miR-98-5p","hsa-let-7a-5p","hsa-miR-149-5p","hsa-miR-148a-3p","hsa-miR-873-3p","hsa-miR-19b-3p","hsa-miR-320c","hsa-miR-375","hsa-miR-30a-5p","hsa-miR-877-5p","hsa-miR-34a-5p","hsa-miR-324-5p","hsa-miR-652-3p","hsa-miR-342-5p","hsa-miR-7706","hsa-miR-361-3p","hsa-miR-361-5p","hsa-miR-1180-3p","hsa-miR-217","hsa-miR-1307-3p","hsa-miR-1908-5p","hsa-miR-15b-5p","hsa-miR-92b-5p","hsa-miR-484","hsa-miR-197-3p","hsa-miR-200c-3p","hsa-miR-671-5p","hsa-miR-339-5p","hsa-miR-1301-3p","hsa-miR-769-5p","hsa-miR-328-3p","hsa-miR-93-5p","hsa-miR-103a-3p")
     subset(geness_res, external_gene_name %in% top_g & pvalue < 0.05 & (abs(geness_res$log2FoldChange) >= 2.0))
     geness_res$Color <- "NS or log2FC < 2.0"
     geness_res$Color[geness_res$pvalue < 0.05] <- "P < 0.05"
     geness_res$Color[geness_res$padj < 0.05] <- "P-adj < 0.05"
     geness_res$Color[abs(geness_res$log2FoldChange) < 2.0] <- "NS or log2FC < 2.0"
    
     write.csv(geness_res, "MKL.1_wt_EV_vs_MKL.1_wt_cells_with_Category.csv")
     geness_res$invert_P <- (-log10(geness_res$pvalue)) * sign(geness_res$log2FoldChange)
    
     geness_res <- geness_res[, -1*ncol(geness_res)]
     png("MKL.1_wt_EV_vs_MKL.1_wt_cells.png",width=1200, height=1400)
     #svg("MKL.1_wt_EV_vs_MKL.1_wt_cells.svg",width=12, height=14)
     ggplot(geness_res,
            aes(x = log2FoldChange, y = -log10(pvalue),
                color = Color, label = external_gene_name)) +
       geom_vline(xintercept = c(2.0, -2.0), lty = "dashed") +
       geom_hline(yintercept = -log10(0.05), lty = "dashed") +
       geom_point() +
       labs(x = "log2(FC)", y = "Significance, -log10(P)", color = "Significance") +
       scale_color_manual(values = c("P < 0.05" = "orange", "P-adj < 0.05" = "red", "NS or log2FC < 2.0" = "darkgray"),
                          guide = guide_legend(override.aes = list(size = 4))) +
       scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
       geom_text_repel(data = subset(geness_res, external_gene_name %in% top_g & pvalue < 0.05 & (abs(geness_res$log2FoldChange) >= 2.0)),
                       size = 4, point.padding = 0.15, color = "black",
                       min.segment.length = .1, box.padding = .2, lwd = 2) +
       theme_bw(base_size = 16) +
       theme(legend.position = "bottom")
     dev.off()
    
     # ----------------------------------------
     # ----------- manhattan_plot -------------
    
     Rscript manhattan_plot_Carmen_custom_labels.R  #exceRpt_miRNA_ReadCounts.txt

Comparing https://chat.z.ai/, https://chat.qwen.ai/, and https://chat.deepseek.com/ to explain Maaslin2

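Maaslin2 explained

What is Maaslin2?

Maaslin2 (Microbiome Multivariable Association with Linear Models) is an R package for multivariable association analysis of microbiome data.

Main uses

  • Finding associations between microbial features (such as bacterial abundance) and metadata (such as age, diet, or disease status)
  • Suitable for 16S rRNA sequencing, metagenomic, and other microbiome data

Core features

1. Statistical methods

  • Uses linear models and generalized linear models
  • Supports fixed and random effects
  • Can handle repeated-measures data

2. Data normalization

  • Addresses the compositional nature of microbiome data
  • Offers several normalization methods, including TSS, CSS, and CLR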
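
To make two of the normalization methods named above concrete, here is a minimal sketch of TSS (total-sum scaling) and CLR (centered log-ratio) on a single sample. It is not Maaslin2's own code: the function names, the example counts, and the pseudocount used to avoid log(0) are assumptions made for illustration.

```python
import math

def tss(counts):
    """Total-sum scaling: divide each count by the sample total."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the mean of the logs.

    The pseudocount is an assumption of this sketch, added so that
    zero counts do not produce log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Hypothetical counts for four taxa in one sample
sample = [120, 30, 0, 50]
rel = tss(sample)        # relative abundances, sum to 1
centered = clr(sample)   # log-ratios, sum to 0, some are negative

print(rel)
print(centered)
```

TSS values always sum to 1, while CLR values sum to 0 and include negative numbers, which is why a further log transform is not meaningful after CLR.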

3. Multivariable analysis

  • Adjusts for multiple covariates at once
  • Controls for confounders

Basic workflow

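# Load the package (install once with BiocManager::install("Maaslin2"))
library(Maaslin2)

# Run the analysis
results <- Maaslin2(
  input_data = microbiome_data,     # microbial abundance table
  input_metadata = metadata,        # metadata
  output = "results",               # output directory
  fixed_effects = c("age", "diet"), # fixed effects
  random_effects = c("subject_id"), # random effects
  normalization = "CLR",            # normalization method
  transform = "NONE"                # CLR values can be negative, so no log transform
)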

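Output

  • Significant associations table: p-values, q-values, and effect sizes
  • Visualizations: heatmaps, boxplots, and similar plots
  • Summary statistics

Advantages

  1. User-friendly: simple interface with sensible defaults
  2. Flexible: supports multiple normalization and transformation methods
  3. Statistically rigorous: multiple-testing correction (FDR)
  4. Reproducible: results are easy to reproduce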
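
The q-values mentioned above come from multiple-testing correction. As an illustration, the sketch below implements the Benjamini-Hochberg procedure, a common FDR method. That this is the correction in use is an assumption here, and the p-values are made up for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues

# Hypothetical p-values for four feature/metadata associations
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
print(qvals)
```

Each raw p-value is scaled by n/rank, and the running minimum ensures that a feature with a smaller p-value never receives a larger q-value than one with a larger p-value.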

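Use cases

  • Association studies of disease and the gut microbiota
  • Effects of dietary interventions on the microbiome
  • Changes in the microbiota before and after drug treatment
  • Relationships between environmental factors and the microbiome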


Here’s Python code to read a file and output to console:

# Method 1: Read entire file at once
with open('filename.txt', 'r', encoding='utf-8') as file:
    content = file.read()
    print(content)

# Method 2: Read line by line
with open('filename.txt', 'r', encoding='utf-8') as file:
    for line in file:
        print(line.strip())  # strip() removes newline characters

# Method 3: Read all lines into a list
with open('filename.txt', 'r', encoding='utf-8') as file:
    lines = file.readlines()
    for line in lines:
        print(line.strip())

# With error handling
try:
    with open('filename.txt', 'r', encoding='utf-8') as file:
        content = file.read()
        print(content)
except FileNotFoundError:
    print("File not found!")
except Exception as e:
    print(f"An error occurred: {e}")

Key points:

  • Use with statement to automatically close the file
  • Specify encoding='utf-8' to handle special characters
  • strip() removes trailing newlines when reading line by line
  • Always include error handling for robust code

Replace 'filename.txt' with your actual file path.

Processing DNAseq for Data_Tam_DNAseq_2026_2605_2617_2631_2914_Acinetobacter_sp

  1. Run nextflow bacass

     conda deactivate
    
     # Download k2_standard_08_GB_20251015.tar.gz from https://benlangmead.github.io/aws-indexes/k2#kraken2--bracken
     # Download 20190108_kmerfinder_stable_dirs.tar.gz from https://zenodo.org/records/13447056; 'tar xzf 20190108_kmerfinder_stable_dirs.tar.gz'  #The database does not work!
     # Download the kmerfinder database: https://www.genomicepidemiology.org/services/ --> https://cge.food.dtu.dk/services/KmerFinder/ --> https://cge.food.dtu.dk/services/KmerFinder/etc/kmerfinder_db.tar.gz  #The database works!
    
     # DEBUG: --kmerfinderdb /mnt/nvme1n1p1/REFs/kmerfinder/bacteria/ not working!
    
     nextflow run nf-core/bacass -r 2.6.0 -profile docker --help
    
     # -- Hybrid assembly --
     nextflow run nf-core/bacass -r 2.6.0 -profile docker \
       --input samplesheet_bacass.tsv \
       --outdir bacass_out \
       --assembly_type hybrid \
       --assembler unicycler,dragonflye \
       --kraken2db /mnt/nvme1n1p1/REFs/k2_standard_08_GB_20251015.tar.gz \
       --skip_kmerfinder \
       -resume \
       -work-dir bacass_out/work
    
     # -- Short assembly --
     # Possible bug: --kmerfinderdb fails with -r 2.6.0 (hence --skip_kmerfinder above), so the kmerfinder database is used with -r 2.5.0 here
     nextflow run nf-core/bacass -r 2.5.0 -profile docker \
       --input samplesheet.tsv \
       --outdir bacass_out \
       --assembly_type short \
       --kraken2db /mnt/nvme1n1p1/REFs/k2_standard_08_GB_20251015.tar.gz \
       --kmerfinderdb /mnt/nvme1n1p1/REFs/kmerfinder/bacteria/ \
       -resume \
       -work-dir bacass_out/work
  2. Verify if the genome is pure

     # 1. Go up one level to the main 'bacass_out' directory
     cd ..
    
     # 2. Create directories for CheckM inputs and outputs
     mkdir -p checkm_input checkm_output
    
     # 3. Copy all .fna files into the 'checkm_input' folder
     # (CheckM cannot search subdirectories, so they must be in one folder)
     find ./Prokka -name "*.fna" -exec cp {} checkm_input/ \;
    
     # 4. Run CheckM on all 4 assemblies
     checkm lineage_wf -x fna checkm_input checkm_output
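  3. Species Identification: quick screening with Mash → precise classification with GTDB-Tk → species-level verification with FastANI. Combining the three maximizes the accuracy and interpretability of species identification.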

     # 1. Create the environment (mamba recommended)
     mamba create -n gtdbtk -c conda-forge -c bioconda gtdbtk
     mamba activate gtdbtk
    
     # 2. Download the database (first run only, about 60 GB)
     gtdbtk download --data_dir ./gtdb_data --release 220
    
     wget https://data.gtdb.aau.ecogenomic.org/releases/release232/232.0/auxillary_files/gtdbtk_package/full_package/gtdbtk_r232_data.tar.g
     mamba env config vars set GTDBTK_DATA_PATH="/mnt/nvme4n1p1/gtdb_data/release232"
     # Exit the current environment first, then reactivate it
     mamba deactivate
     mamba activate gtdbtk
    
     # Check that the environment variable was loaded
     echo $GTDBTK_DATA_PATH
     # Expected output: /mnt/nvme4n1p1/gtdb_data/release232
    
     # 3. Run the classification (the command you provided plus practical options)
     gtdbtk classify_wf \
       --genome_dir ./checkm_input \
       --out_dir gtdb_out \
       --cpus 64 \
       --extension .fna \
       --prefix mygenome
    
     # 4. View the results
     cat gtdb_out/classify/mygenome.bac120.summary.tsv   # bacterial results
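
As a quick sanity check on the summary values, the sketch below applies an ANI/AF rule to the numbers reported for the four samples in this analysis. The 95% ANI species threshold is the one quoted in the report; the minimum alignment fraction of 0.5 and the function name are assumptions of this sketch, not GTDB-Tk output.

```python
def species_call(ani, af, ani_min=95.0, af_min=0.5):
    """True if ANI and alignment fraction both support a species-level match.

    ani_min follows the 95% species threshold quoted in the report;
    af_min is an assumed cutoff for this illustration.
    """
    return ani >= ani_min and af >= af_min

# (ANI %, AF) against GCF_009759685.1, taken from the report tables
results = {
    "2605_": (98.02, 0.874),
    "2617_": (98.00, 0.859),
    "2631_": (98.07, 0.860),
    "2914_": (98.11, 0.873),
}

calls = {sample: species_call(ani, af) for sample, (ani, af) in results.items()}
for sample, ok in calls.items():
    print(sample, "same species as reference" if ok else "not confirmed")
```

All four samples pass the rule, which matches the species assignments in the report. A passing ANI says nothing about purity, so the mixed assemblies pass as well.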
  4. Resistome and virulence profiling with ABRicate and RGI (Resistance Gene Identifier)

     conda activate /home/jhuang/miniconda3/envs/bengal3_ac3
     abricate --list
    
     conda deactivate
    
     ENV_NAME=/home/jhuang/miniconda3/envs/bengal3_ac3 \
     ASM=bacass_out/checkm_input/2914_.fna \
     SAMPLE=2914 \
     OUTDIR=resistome_virulence_2914 \
     MINID=80 MINCOV=60 \
     THREADS=32 \
     ~/Scripts/run_abricate_resistome_virulome_one_per_gene.sh
    
     #ABRicate thresholds: MINID=80 MINCOV=60
     Database        Hit_lines       File
     MEGARes 24      resistome_virulence_2605/raw/2605.megares.tab
     CARD    21      resistome_virulence_2605/raw/2605.card.tab
     ResFinder       4       resistome_virulence_2605/raw/2605.resfinder.tab
     VFDB    0       resistome_virulence_2605/raw/2605.vfdb.tab
    
     # Database        Hit_lines       File
     # MEGARes 42      resistome_virulence_2631/raw/2631.megares.tab
     # CARD    37      resistome_virulence_2631/raw/2631.card.tab
     # ResFinder       16      resistome_virulence_2631/raw/2631.resfinder.tab
     # VFDB    0       resistome_virulence_2631/raw/2631.vfdb.tab
    
     Database        Hit_lines       File
     MEGARes 35      resistome_virulence_2914/raw/2914.megares.tab
     CARD    31      resistome_virulence_2914/raw/2914.card.tab
     ResFinder       11      resistome_virulence_2914/raw/2914.resfinder.tab
     VFDB    0       resistome_virulence_2914/raw/2914.vfdb.tab
    
     # #ABRicate thresholds: MINID=70 MINCOV=50
     # Database        Hit_lines       File
     # MEGARes 24      resistome_virulence_2605/raw/2605.megares.tab
     # CARD    21      resistome_virulence_2605/raw/2605.card.tab
     # ResFinder       4       resistome_virulence_2605/raw/2605.resfinder.tab
     # VFDB    3       resistome_virulence_2605/raw/2605.vfdb.tab
    
     conda activate /home/jhuang/miniconda3/envs/bengal3_ac3
     #NEED_TO_ADAPT: OUTDIR = Path("resistome_virulence_An7")
     #NEED_TO_ADAPT: SAMPLE = "An7"
     #DEPRECATED (needs manual settings): python ~/Scripts/merge_amr_sources_by_gene.py
    
     python ~/Scripts/export_resistome_virulence_to_excel_py36.py \
       --workdir resistome_virulence_2914 \
       --sample 2914 \
       --out Resistome_Virulence_2914.xlsx
     # Delete the column 'COVERAGE_MAP' in all 'Raw_*' sheets
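
The Hit_lines counts in the tables above can be reproduced by counting the non-header lines of each ABRicate .tab file. The sketch below does this on an in-memory string. It assumes the header line starts with "#", as ABRicate's "#FILE" header does, and the example rows use placeholder gene names rather than real hits.

```python
def count_hit_lines(tab_text):
    """Count data rows in ABRicate tab output, skipping header and blank lines."""
    return sum(
        1
        for line in tab_text.splitlines()
        if line.strip() and not line.startswith("#")
    )

# Hypothetical, shortened ABRicate-style output (placeholder gene names)
example = (
    "#FILE\tSEQUENCE\tSTART\tEND\tGENE\n"
    "2914_.fna\tcontig_1\t100\t900\tgeneA\n"
    "2914_.fna\tcontig_2\t50\t400\tgeneB\n"
)

print(count_hit_lines(example))
```

For a real file, the text could be read with open(path).read() and passed to the same function.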
  5. Report

     Please find below a summary of genomic analyses for samples 2605, 2617, 2631 and 2914.
    
     ### 1. Assembly and CheckM
    
             -------------------------------------------------------------------
             Bin Id   Completeness   Contamination   Strain heterogeneity
             -------------------------------------------------------------------
             2631_       100.00         100.00             78.57
             2617_       100.00         100.00             78.57
             2605_       100.00           0.00              0.00
             2914_        99.98           0.63              0.00
             -------------------------------------------------------------------
    
             CheckM shows that the 2631_ and 2617_ assemblies are both 7.0-7.1 Mb with 100.00% contamination, i.e. every marker gene is present in two copies. This indicates that the DNA came from a non-clonal culture containing two closely related strains of the same species. With a true genome size of about 3.7 Mb, an assembler that cannot merge two highly similar strains builds both side by side, giving a ~7.0 Mb assembly in which every gene is duplicated.
             Samples 2605_ (3.7 Mb) and 2914_ (about 3.9 Mb) are pure isolates.
    
             ### 2. Species Identification
    
             **Sample 2605_:** *Acinetobacter baumannii* ✅ Confirmed
    
             | Parameter | Value | Interpretation |
             |---|---|---|
             | Closest Reference | GCF_009759685.1 | Reference genome of *A. baumannii* |
             | ANI | 98.02% | ✅ Well above 95% species threshold |
             | AF (Alignment Fraction) | 0.874 | ✅ 87.4% of genome aligns; ANI estimate is robust |
             | Final Taxonomy | `d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Moraxellaceae;g__Acinetobacter;s__Acinetobacter baumannii` | Consistent with genomic expectations |
    
             🟢 **Conclusion:** 2605_ is confidently assigned to *Acinetobacter baumannii*.
    
             ***
    
             **Sample 2617_:** *Acinetobacter baumannii* ✅ Confirmed
    
             | Parameter | Value | Interpretation |
             |---|---|---|
             | Closest Reference | GCF_009759685.1 | Reference genome of *A. baumannii* |
             | ANI | 98.00% | ✅ Well above 95% species threshold |
             | AF (Alignment Fraction) | 0.859 | ✅ 85.9% of genome aligns; ANI estimate is robust |
             | Final Taxonomy | `d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Moraxellaceae;g__Acinetobacter;s__Acinetobacter baumannii` | Consistent with genomic expectations |
    
             🟢 **Conclusion:** 2617_ is confidently assigned to *Acinetobacter baumannii*.
    
             ***
    
             **Sample 2631_:** *Acinetobacter baumannii* ✅ Confirmed
    
             | Parameter | Value | Interpretation |
             |---|---|---|
             | Closest Reference | GCF_009759685.1 | Reference genome of *A. baumannii* |
             | ANI | 98.07% | ✅ Well above 95% species threshold |
             | AF (Alignment Fraction) | 0.860 | ✅ 86.0% of genome aligns; ANI estimate is robust |
             | Final Taxonomy | `d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Moraxellaceae;g__Acinetobacter;s__Acinetobacter baumannii` | Consistent with genomic expectations |
    
             🟢 **Conclusion:** 2631_ is confidently assigned to *Acinetobacter baumannii*.
    
             ***
    
             **Sample 2914_:** *Acinetobacter baumannii* ✅ Confirmed
    
             | Parameter | Value | Interpretation |
             |---|---|---|
             | Closest Reference | GCF_009759685.1 | Reference genome of *A. baumannii* |
             | ANI | 98.11% | ✅ Well above 95% species threshold |
             | AF (Alignment Fraction) | 0.873 | ✅ 87.3% of genome aligns; ANI estimate is robust |
             | Final Taxonomy | `d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Pseudomonadales;f__Moraxellaceae;g__Acinetobacter;s__Acinetobacter baumannii` | Consistent with genomic expectations |
    
             🟢 **Conclusion:** 2914_ is confidently assigned to *Acinetobacter baumannii*.
    
     ### 3. AMR Genes and Virulence Factors (VFDB) Summary

     Because 2631_ and 2617_ are not pure isolates but mixtures of two strains, I excluded them from the AMR and VFDB analysis. For the results, see Resistome_Virulence_2605.xlsx and Resistome_Virulence_2914.xlsx.


Since CheckM requires all the genome files (bins) to be in the same directory, you first need to gather your .fna files into a single folder before running the command.

Here are the exact bash commands to prepare your files and run CheckM for all four samples (2605, 2617, 2631, and 2914).

Step 1: Prepare the Input and Run CheckM

Run the following commands in your terminal:

# 1. Go up one level to the main 'bacass_out' directory
cd ..

# 2. Create directories for CheckM inputs and outputs
mkdir -p checkm_input checkm_output

# 3. Copy all .fna files into the 'checkm_input' folder
# (CheckM cannot search subdirectories, so they must be in one folder)
find ./Prokka -name "*.fna" -exec cp {} checkm_input/ \;

# 4. Run CheckM on all 4 assemblies
checkm lineage_wf -x fna checkm_input checkm_output

(Note: If you get an error saying the CheckM database is not found, you need to set the path to your CheckM data folder first using: checkm data setRoot /path/to/your/checkm_data)

Step 2: How to Interpret the Results

CheckM will print a summary table directly to your terminal, and it will also save detailed statistics in checkm_output/bin_stats_ext.tsv.

Since you are working with Acinetobacter sp. (which typically has a genome size of ~3.5 to 4.5 Mb), a 7.0 Mb assembly is almost certainly a mix of two closely related strains that the assembler failed to collapse.

Look specifically at these three columns in the CheckM output for your 2631_ sample:

  1. Completeness: This should be high (ideally > 95%). Because the assembler built two genomes, CheckM will likely still find all the marker genes, so completeness might look deceptively high.
  2. Contamination: This is the most critical metric. For a pure isolate, this should be < 5%. If your 2631 assembly is a mixed strain, CheckM will detect the duplicated marker genes and flag this as high contamination (often > 20-50%).
  3. Strain heterogeneity: This column specifically calculates if there are multiple strains of the same species present. A high value here confirms that your assembly contains a mixture of closely related strains.
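
The three columns can be combined into a simple decision rule. The sketch below uses the thresholds from the list above (completeness > 95%, contamination < 5%). The strain-heterogeneity cutoff of 50 and the category labels are assumptions for illustration, not CheckM definitions.

```python
def assess_bin(completeness, contamination, strain_het,
               min_completeness=95.0, max_contamination=5.0,
               strain_het_cutoff=50.0):
    """Classify a CheckM result row with simple threshold rules.

    strain_het_cutoff is an assumed value used only to separate
    'mixed strains' from other kinds of contamination in this sketch.
    """
    if contamination > max_contamination:
        if strain_het >= strain_het_cutoff:
            return "likely mixed strains"
        return "contaminated"
    if completeness < min_completeness:
        return "incomplete"
    return "pure isolate"

# Values taken from the CheckM table for these samples
checkm_rows = {
    "2631_": (100.00, 100.00, 78.57),
    "2617_": (100.00, 100.00, 78.57),
    "2605_": (100.00, 0.00, 0.00),
    "2914_": (99.98, 0.63, 0.00),
}

verdicts = {name: assess_bin(*row) for name, row in checkm_rows.items()}
for name, verdict in verdicts.items():
    print(name, verdict)
```

With these rules, 2631_ and 2617_ are flagged as likely mixed strains, while 2605_ and 2914_ come out as pure isolates.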

What to do next?

  • If 2631 has high contamination/strain heterogeneity: You have confirmed it is a mixed assembly. You can either use a tool like purge_dups to separate the two strains, or map your raw reads back to the assembly and filter out the contigs with half the average coverage.
  • If 2631 has low contamination: The 7.0 Mb size might be real (some Acinetobacter strains have large plasmids or are naturally larger), but you should still verify this using a BlobPlot (GC vs. Coverage) to be absolutely sure there is no foreign DNA.
  • Check the other 3 samples: Compare the CheckM results of 2605, 2617, and 2914 to see if they are pure, high-quality genomes (high completeness, low contamination).

            ------------------------------------------------------------------------------------------------------------------------------------------------------------------
            Bin Id         Marker lineage         # genomes   # markers   # marker sets   0    1     2    3   4   5+   Completeness   Contamination   Strain heterogeneity
            ------------------------------------------------------------------------------------------------------------------------------------------------------------------
            2631_           root (UID1)              5656         56            24        0    0     56   0   0   0       100.00          100.00             78.57        
            2617_           root (UID1)              5656         56            24        0    0     56   0   0   0       100.00          100.00             78.57        
            2605_    f__Moraxellaceae (UID4680)       86         689           365        0   689    0    0   0   0       100.00           0.00               0.00        
            2914_    g__Acinetobacter (UID4685)       42         1193          254        1   1188   4    0   0   0       99.98            0.63               0.00        
            ----------------------------------------------------------------------------------------------------------------------------------------------------------------

Ah, I understand now! Thank you for clarifying. The CheckM results you showed earlier (with 100% contamination and 56 markers in the ‘2’ column) were indeed from this Unicycler assembly.

Looking at your Unicycler log, we can see exactly why it didn’t fix the problem:

  • Total Length: 7,199,947 bp (~7.2 Mb)
  • Main Chromosome: Component 1 consists of 416 fragmented contigs totaling 7,165,229 bp.

Why Unicycler Failed to Collapse the Genome

The log states: “Since you provided only short reads, Unicycler will essentially function as a SPAdes-optimiser.”

Because you only have short reads (Illumina), Unicycler builds a de Bruijn graph. When a bacterial sample contains a mixture of two closely related strains (a very common scenario in non-clonal isolates), the assembler sees the slight genetic differences between the two strains as “bubbles” in the graph.

Without long reads to span across these bubbles and prove which path belongs to which strain, the software conservatively outputs both paths. This results in an assembly that contains both strains side-by-side, doubling your genome size to ~7.2 Mb and causing the 100% duplication flagged by CheckM.
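
The "bubble" idea can be shown on a toy example. The sketch below builds a small de Bruijn graph from two made-up sequences that differ by a single SNP and lists the nodes where the graph branches. The sequences, k = 4, and the function names are illustrative assumptions; real assemblers use much larger k and handle sequencing errors, coverage, and reverse complements.

```python
from collections import defaultdict

def de_bruijn_edges(sequences, k):
    """Build a de Bruijn graph: each k-mer links its (k-1)-mer prefix to its suffix."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branch_nodes(graph):
    """Nodes with more than one outgoing edge, i.e. where a bubble opens."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)

# Two hypothetical strains differing by one SNP (index 6: A vs T)
strain_a = "ATGCGTACCTGA"
strain_b = "ATGCGTTCCTGA"

graph = de_bruijn_edges([strain_a, strain_b], k=4)
branches = branch_nodes(graph)
print(branches)                 # node where the two strain paths diverge
print(sorted(graph["CGT"]))     # the two alternative continuations
```

The single SNP produces one branching node with two alternative paths that rejoin a few k-mers later. Short reads that do not span the whole bubble cannot say which path belongs to which strain.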

How to Fix This (Since Re-assembling Won’t Work)

Running another short-read assembler will just result in the exact same 7.2 Mb duplicated assembly. You now need to computationally “purge” the duplicate contigs.

Option 1: The Quick Fix using BBMap (dedupe.sh)

If you have the BBMap suite installed, it has a fantastic tool specifically for removing duplicate sequences. It will compare all your contigs to each other and remove the redundant ones.

dedupe.sh in=assembly.fasta out=assembly_purged.fasta minidentity=95
  • minidentity=95 tells it to remove contigs that are 95% identical or more to another contig. You can adjust this (e.g., 90 or 98) depending on how closely related the two strains are.
  • After running this, run CheckM again on assembly_purged.fasta. You should see the genome size drop to ~3.5 Mb and the contamination drop to 0%.

Option 2: The Thorough Fix using purge_dups

purge_dups is the gold standard for separating primary contigs from duplicated “haplotigs” in an assembly. It works by aligning your assembly to itself and using read depth to separate the true genome from the duplicates.

  1. Map the reads and get coverage: bwa index assembly.fasta, then bwa mem assembly.fasta short_R1.fastq short_R2.fastq | samtools sort -o aln.bam -, then ngscstat aln.bam (writes TX.stat and TX.base.cov; with long reads, minimap2 plus pbcstat write PB.stat and PB.base.cov instead)
  2. Calculate coverage cutoffs: calcuts TX.stat > cutoffs 2> calcuts.log
  3. Split the assembly: split_fa assembly.fasta > assembly.split.fa
  4. Align to itself: minimap2 -xasm5 -DP assembly.split.fa assembly.split.fa > alignment.paf
  5. Purge: purge_dups -2 -T cutoffs -c TX.base.cov alignment.paf > dups.bed
  6. Extract pure genome: get_seqs -e dups.bed assembly.fasta (This writes purged.fa, the retained ~3.5 Mb genome, and hap.fa, the removed duplicates).

Option 3: Do you have Long Reads?

If you happen to have Oxford Nanopore (ONT) or PacBio reads for this sample, you can solve this biologically rather than computationally. Running Unicycler with long reads (unicycler -1 short_R1.fastq -2 short_R2.fastq -l long_reads.fastq) will allow the assembler to easily span across the “bubbles” and collapse the two strains into a single, perfect, circular ~3.5 Mb chromosome.


Bonus Observation: You likely have 3 plasmids!

Looking at the end of your log, Unicycler successfully closed 3 circular contigs:

  • 8,731 bp (7.89x coverage)
  • 5,837 bp (5.27x coverage)
  • 2,120 bp (7.74x coverage)

These are almost certainly plasmids. The massive ~7.2 Mb “Component 1” is your duplicated chromosome. When you use purge_dups or dedupe.sh to fix the chromosomal duplication, these plasmids should remain intact in your final assembly!



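Yes, this kind of forced deduplication (purging duplicates / haplotype purging) can indeed introduce assembly errors or bias.

In eukaryotic assemblies (plants and animals, for example), separating haplotypes is routine. For a bacterial (haploid) genome, however, forcing a deduplication tool to "fix" a 7.2 Mb assembly is a double-edged sword.

The specific errors and risks this can introduce, and the reasons behind them, are as follows: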

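1. Deleting genuine biological repeats (over-purging / false-negative loss)

Bacterial genomes naturally contain many genuine repeats, for example:

  • rRNA operons (typically 5-10 highly similar copies).
  • IS elements (insertion sequences) and transposons (very active in bacteria such as Acinetobacter, often present in multiple copies).
  • Gene families or paralogs.

Risk: If you run bbmap (dedupe.sh) with a high similarity threshold (for example minidentity=95), the tool cannot tell duplication caused by assembly error from repeats that are naturally present in the genome. It may treat genuine, functionally important IS elements (relevant to resistance or virulence, for instance) or rRNA copies as "redundant haplotypes" and delete them, leaving your final genome without key genes.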

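2. Producing a chimeric genome (chimeric assembly)

Your CheckM result shows a perfect 1:1 duplication (all marker genes fall in the ‘2’ column). This usually points to one of two possibilities:

  • Case A (assembler bug): The sample is pure, but SPAdes/Unicycler assembled the same sequence twice because of complex local repeats or sequencing bias.
  • Case B (impure sample / mixed strains): Your culture contains two very closely related strains. Short reads cannot span the SNP/indel differences between them (the bubbles in the de Bruijn graph), so the software kept both genomes.

Risk: In case B, when the deduplication tool decides which copy to keep and which to discard, it may switch back and forth between the sequences of the two strains. What you end up with is not a real single genome but a "Frankenstein" chimera that does not exist in nature.

  • Consequence: Such a chimera seriously affects downstream SNP calling (many false-positive variants), phylogeny construction, and the accurate localization of antimicrobial resistance (AMR) genes.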

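3. Breaking genome contiguity (structural breaks)

Risk: While stripping out the secondary copies, the deduplication algorithm may introduce artificial breaks in contigs that were originally continuous. The assembly becomes more fragmented (more contigs, lower N50), and a circular chromosome that could have been closed may break into several linear pieces.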


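How to handle 2631 and 2617 safely (best practice)

To avoid introducing the errors above, do not run a deduplication tool blindly. Work through the following steps instead:

Step 1: Map the raw reads back and inspect the coverage distribution

Align your raw short reads (clean reads) back to the 7.2 Mb assembly (using bwa or bowtie2), then compute the mean coverage of each contig.

  • If coverage is very even across all contigs (for example, all around 100x): this indicates case A (assembler bug). The genome is pure and was simply assembled twice. Deduplication is relatively safe here, but still calls for care.
  • If contig coverage shows two clear peaks (for example, half at 100x and half at 50x): this confirms case B (mixed strains).
  • If coverage is highly erratic: more complex contamination may be present.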
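
The three coverage patterns above can be told apart with a simple rule once per-contig mean depths are available. The sketch below is a rough illustration only: the depth values are invented, the tolerance is an assumed parameter, and plasmid contigs should be left out of the input because their copy number differs from the chromosome.

```python
def coverage_pattern(depths, tolerance=0.15):
    """Classify per-contig mean depths as 'uniform', 'two peaks', or 'irregular'.

    depths maps contig name -> mean read depth. Contigs are compared to the
    deepest contig: close to 1x of it, close to 0.5x of it, or neither.
    """
    top = max(depths.values())
    full = [name for name, d in depths.items() if d / top >= 1 - tolerance]
    half = [name for name, d in depths.items() if abs(d / top - 0.5) <= tolerance]
    if len(full) == len(depths):
        return "uniform"
    if len(full) + len(half) == len(depths):
        return "two peaks"
    return "irregular"

# Hypothetical per-contig mean depths
uniform_example = {"c1": 102.0, "c2": 98.0, "c3": 100.0}
mixed_example = {"c1": 100.0, "c2": 97.0, "c3": 52.0, "c4": 49.0}
messy_example = {"c1": 100.0, "c2": 75.0, "c3": 10.0}

print(coverage_pattern(uniform_example))
print(coverage_pattern(mixed_example))
print(coverage_pattern(messy_example))
```

In practice the depths would come from the read mapping, for example from samtools coverage output, and a length-weighted histogram would be more robust than this per-contig rule.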

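Step 2: Choose a strategy based on the coverage result

  • If it is a mixed-strain sample (case B):
    • Best biological solution: Re-streak to isolate a single colony, then re-extract and re-sequence the DNA. This is the only way to obtain a 100% pure genome.
    • Computational solution: Do not use dedupe.sh. Use a coverage-based separation approach instead (a metagenomic binner, or BlobTools) that exploits the 100x versus 50x coverage difference to split the contigs of the two strains into two independent genomes, then run CheckM on each.
  • If it is an assembler bug (case A):
    • You can try purge_dups, which is smarter than dedupe.sh: it combines self-alignment and coverage to distinguish primary sequences from secondary ones (haplotigs).
    • Key step: After running purge_dups, you must rerun CheckM and BUSCO on the extracted primary sequences.
    • Acceptance criteria: If CheckM reports completeness > 95% and contamination < 5%, and the genome size has dropped to ~3.5 Mb, the deduplication succeeded without deleting core genes. If completeness drops sharply, genuine genes were removed, and you need to rerun purge_dups with more conservative parameters.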

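Step 3: Try reassembling (alternative)

Sometimes Unicycler's default parameters are not a good fit for particular Acinetobacter strains. You can try running SPAdes directly in careful mode (--careful cannot be combined with --isolate in SPAdes):

spades.py --careful -1 2631__1.fastp.fastq.gz -2 2631__2.fastp.fastq.gz -o spades_2631_careful

With --careful, SPAdes tries to correct mismatches and indels after assembly. Its internal repeat resolution can sometimes collapse duplicated regions that Unicycler wrongly retained.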

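Summary

Forced deduplication does carry the risk of introducing chimeras or deleting genuine repeats. For bacterial genomes, computational patching is always the last resort. The most rigorous approach is to confirm coverage by read mapping first. For mixed strains, separate them by binning or repeat the experiment. For assembly redundancy, use purge_dups and rely strictly on a second CheckM run to verify genome integrity.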