gene_x
Tags: plot, python, RNA-seq
HSV (herpes simplex virus) contains two types of repeated sequences, each present as a pair of inverted copies:
IRS (Internal Repeat, Short) and TRS (Terminal Repeat, Short): These are the short repeated sequences that flank the unique short (US) region. IRS lies internally, at the junction between the long and short components of the genome, while TRS lies at the right-hand end of the linear genome.
TRL (Terminal Repeat, Long) and IRL (Internal Repeat, Long): These are the long repeated sequences that flank the unique long (UL) region. TRL lies at the left-hand end of the linear genome, while IRL lies internally, adjacent to IRS.
The organization of the HSV genome can be summarized as: TRL - UL - IRL - IRS - US - TRS
Here's a brief breakdown:
UL (Unique Long): This is a unique sequence region found once in the genome.
US (Unique Short): This is another unique sequence region, shorter than UL, flanked by IRS and TRS.
IRS (Internal Repeat Short) and TRS (Terminal Repeat Short): Inverted copies of the same short repeated sequence, flanking the US region.
TRL (Terminal Repeat Long) and IRL (Internal Repeat Long): Inverted copies of the same long repeated sequence, flanking the UL region; TRL sits at the genome terminus and IRL internally.
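To make the "inverted repeat" relationship concrete, here is a minimal sketch that builds a toy genome in the TRL - UL - IRL - IRS - US - TRS order. It assumes IRL is exactly the reverse complement of TRL and IRS exactly the reverse complement of TRS, ignores finer structure such as the a sequence shared by the termini and the L/S junction, and uses made-up placeholder sequences. `build_toy_genome` and `revcomp` are hypothetical helpers, not part of the plotting script further down.

```python
# Toy illustration of the HSV "TRL - UL - IRL - IRS - US - TRS" layout.
# All sequences are short, made-up placeholders, not real HSV sequence.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_toy_genome(trl: str, ul: str, trs: str, us: str) -> list[tuple[str, str]]:
    """Return the ordered (region, sequence) list of a toy HSV-like genome.

    IRL is built as the inverted copy (reverse complement) of TRL, and IRS as
    the inverted copy of TRS; that pairing is what makes them inverted repeats.
    """
    return [
        ("TRL", trl),
        ("UL", ul),
        ("IRL", revcomp(trl)),
        ("IRS", revcomp(trs)),
        ("US", us),
        ("TRS", trs),
    ]


if __name__ == "__main__":
    genome = build_toy_genome(trl="AAGGC", ul="TTTTCCCCGG", trs="GCGTA", us="CATCAT")
    for name, seq in genome:
        print(f"{name:>3}  {seq}")
    print("Full toy genome:", "".join(seq for _, seq in genome))
```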
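These repeated sequences play crucial roles in the HSV life cycle. Recombination between the inverted copies inverts the L and S components relative to each other, so virus populations contain four genome isomers in roughly equal amounts. The repeats are also involved in genome replication and packaging, and the long repeats encode genes implicated in the switch between latency and active replication, such as ICP0 and the latency-associated transcripts (LATs).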
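The L/S inversion that the repeats make possible can be illustrated in a few lines. This is a simplified sketch, not a recombination model: it assumes the genome is just an L component (TRL + UL + IRL) followed by an S component (IRS + US + TRS), uses made-up toy sequences, and treats "inversion" as taking the reverse complement of a whole component. `genome_isomers` and `LABELS` are hypothetical names; P, IL, IS and ILS are the commonly used labels for the prototype, inverted-L, inverted-S and doubly inverted isomers.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Conventional isomer names: prototype, inverted L, inverted S, both inverted
LABELS = {
    (False, False): "P",
    (True, False): "IL",
    (False, True): "IS",
    (True, True): "ILS",
}


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def genome_isomers(l_component: str, s_component: str) -> dict[str, str]:
    """Return the four L/S orientation combinations of a two-component genome.

    l_component is TRL + UL + IRL joined; s_component is IRS + US + TRS joined.
    """
    isomers = {}
    for invert_l, invert_s in product((False, True), repeat=2):
        l_seq = revcomp(l_component) if invert_l else l_component
        s_seq = revcomp(s_component) if invert_s else s_component
        isomers[LABELS[(invert_l, invert_s)]] = l_seq + s_seq
    return isomers


if __name__ == "__main__":
    # Toy placeholders built so that IRL = revcomp(TRL) and IRS = revcomp(TRS)
    L = "AAGGC" + "TTTTCCCCGG" + "GCCTT"
    S = "TACGC" + "CATCAT" + "GCGTA"
    for label, seq in genome_isomers(L, S).items():
        print(f"{label:>3}  {seq}")
```

Because each component is flanked by inverted repeats, inverting it leaves a copy of the same repeat at each end and flips only the unique region: in the toy L above, `revcomp(L)` still starts with AAGGC and ends with GCCTT, while the UL segment becomes CCGGGGAAAA.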
```python
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches


def read_gtf(filename):
    """Parse a GTF file into (feature_type, start, end, gene_name, strand) tuples."""
    features = []
    with open(filename, "r") as file:
        for line in file:
            # Skip comment/header lines and blank lines
            if line.startswith("#") or not line.strip():
                continue
            split_line = line.rstrip("\n").split("\t")
            if len(split_line) < 9:
                continue  # not a valid 9-column GTF record
            feature_type = split_line[2]
            start = int(split_line[3])
            end = int(split_line[4])
            strand = split_line[6]
            # Extract the value of gene_id "..." from the attribute column
            try:
                gene_name = [x for x in split_line[8].split(";") if "gene_id" in x][0].split('"')[1]
            except IndexError:
                gene_name = "unknown"
            features.append((feature_type, start, end, gene_name, strand))
    return features


def plot_features(features, genome_id):
    fig, ax = plt.subplots(figsize=(12, 16))
    y_offset = 0
    y_increment = 1
    y_positions = {}  # one row per gene name
    for feature_type, start, end, gene_name, strand in features:
        if feature_type != "gene":
            continue
        if gene_name not in y_positions:
            y_positions[gene_name] = y_offset
            y_offset += y_increment
        y_pos = y_positions[gene_name]
        # Minus-strand genes in light blue, plus-strand genes in light red (RGB tuple)
        color = "lightblue" if strand == "-" else (1, 0.6, 0.6)
        rect = mpatches.Rectangle((start, y_pos), end - start, 0.6, ec="none", fc=color)
        ax.add_patch(rect)
        # Centre the label vertically on the box (box height is 0.6)
        ax.text((start + end) / 2, y_pos + 0.3, gene_name, ha="center", va="center", fontsize=9)
    ax.set_xlim(0, max(f[2] for f in features))
    ax.set_ylim(0, y_offset)
    ax.set_yticks([])
    ax.set_xlabel("Position (bp)")
    ax.set_title("")  # optional: f"Genomic Organization of {genome_id}"
    plt.tight_layout()
    plt.savefig(f"{genome_id}_genomic_organization.png")
    plt.show()


if __name__ == "__main__":
    genome_id = "chrHsv1_s17"
    features = read_gtf("chrHsv1_s17.gtf")
    plot_features(features, genome_id)
```
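The attribute parsing in `read_gtf` keeps only `gene_id` and falls back to "unknown" whenever its pattern does not match. A slightly more robust approach, sketched here under the assumption of standard GTF attribute syntax (`key "value";` pairs), parses column 9 into a dictionary so that a plot label can prefer `gene_name` and fall back to `gene_id`. `parse_gtf_attributes` and `pick_label` are hypothetical helper names, and the simple split on ";" would mis-handle a quoted value that itself contains a semicolon.

```python
import re

# One GTF attribute: key "value" (quoted) or key value (unquoted)
ATTR_RE = re.compile(r'\s*([^\s";]+)\s+(?:"([^"]*)"|([^;]+))\s*')


def parse_gtf_attributes(attr_field: str) -> dict[str, str]:
    """Parse column 9 of a GTF line into {key: value}; the first occurrence of a key wins."""
    attrs = {}
    for chunk in attr_field.split(";"):
        match = ATTR_RE.fullmatch(chunk)
        if match:  # empty chunks (e.g. after a trailing ';') are skipped
            key = match.group(1)
            value = match.group(2) if match.group(2) is not None else match.group(3).strip()
            attrs.setdefault(key, value)
    return attrs


def pick_label(attr_field: str) -> str:
    """Prefer gene_name, fall back to gene_id, then 'unknown'."""
    attrs = parse_gtf_attributes(attr_field)
    return attrs.get("gene_name") or attrs.get("gene_id") or "unknown"
```

To use it inside `read_gtf`, the `gene_name = ...` list comprehension and its try/except could be replaced with `gene_name = pick_label(split_line[8])`.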
© 2023 XGenes.com Impressum