Scaffolding and finishing an assembly with a reference genome


Tags: processing, mutation, bacterium, variation, pipeline

Below are example commands for each step of scaffolding and finishing an assembly with a reference genome. The parameters will likely need adjusting to your data and computational resources. Replace reference.fasta, contigs.fasta, reads_1.fq, and reads_2.fq with the paths to your reference genome, your contig sequences, and your paired-end sequencing reads, respectively. Some tools, such as RagTag and GapFiller, need additional setup (for example read mapping beforehand or a configuration file), so check each tool's documentation for the exact usage.

  1. Alignment of Contigs to Reference:

    nucmer --maxgap=500 --mincluster=100 --prefix=out reference.fasta contigs.fasta
    
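  2. Visualization: To visualize the alignment, create a dot plot with MUMmer's mummerplot, or inspect the alignment in a genome browser like IGV. The --layout option needs the reference and query FASTA files, passed with -R and -Q.

    mummerplot --png --layout --filter --prefix=plot_out -R reference.fasta -Q contigs.fasta out.delta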
    
  3. Scaffolding: Using RagTag to scaffold contigs against a reference.

    ragtag.py scaffold reference.fasta contigs.fasta
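RagTag joins the ordered contigs with runs of N, so before gap filling it can help to know how many gaps each scaffold contains. The following is a minimal sketch, assuming a POSIX shell and awk; count_gaps is a hypothetical helper name and not part of RagTag.

```shell
# count_gaps FASTA
# Print "<sequence name> <number of N runs> <total N bases>" for each record.
count_gaps() {
    awk '
        function report() {
            if (name != "") {
                gaps = 0; nbases = 0
                s = seq
                # Walk through the sequence one run of N at a time.
                while (match(s, /[Nn]+/)) {
                    gaps++
                    nbases += RLENGTH
                    s = substr(s, RSTART + RLENGTH)
                }
                print name, gaps, nbases
            }
        }
        /^>/ { report(); name = substr($1, 2); seq = ""; next }  # new record
        { seq = seq $0 }                                         # join wrapped lines
        END { report() }
    ' "$1"
}
```

Call it with the scaffold FASTA that RagTag wrote; a scaffold with zero gaps needs no gap filling.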
    
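  4. Gap Filling: Using GapFiller to fill gaps within the scaffolds. Here libraries.txt describes your read libraries (paths to the paired read files, insert size, and orientation) and is passed with -l; the scaffolds are passed separately with -s, and -b sets the base name of the output. The -g value is kept low because it is handed to Bowtie as its mismatch limit. Check the option letters against the usage message of your GapFiller version.

    GapFiller.pl -l libraries.txt -s scaffolds.fasta -m 50 -o 2 -r 0.7 -n 10 -d 50 -i 3 -g 1 -t 30 -T 10 -b gapfilled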
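GapFiller reads its read libraries from a small text file with one library per line. The sketch below writes such a line under the assumption that the columns are library name, aligner, first read file, second read file, insert size, allowed insert-size error, and orientation. The helper name write_gapfiller_lib, the library name lib1, and the choice of bwa are illustrative assumptions, so compare the result with the example file shipped with GapFiller.

```shell
# write_gapfiller_lib READS_1 READS_2 INSERT_SIZE ERROR
# Print one library line (assumed column order, see the text above).
# "lib1", "bwa" and the FR orientation are illustrative defaults.
write_gapfiller_lib() {
    printf '%s %s %s %s %s %s %s\n' "lib1" "bwa" "$1" "$2" "$3" "$4" "FR"
}
```

For example, `write_gapfiller_lib reads_1.fq reads_2.fq 400 0.25` prints a line for a library with a 400 bp insert size and 25% tolerance; redirect the output to the file name you pass to GapFiller.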
    
  5. Validation: Re-mapping reads to your assembly to check for consistency, using Bowtie2 for example:

    bowtie2-build assembly.fasta assembly_index
    bowtie2 -x assembly_index -1 reads_1.fq -2 reads_2.fq -S aligned.sam
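Bowtie2 reports an overall alignment rate on standard error, but the same number can be recomputed from the SAM file, which is handy when the log is gone. This is a minimal sketch using only awk (samtools flagstat gives a fuller report); mapped_fraction is a hypothetical helper name.

```shell
# mapped_fraction SAM
# Print "<mapped reads> <total reads> <fraction>" over primary records.
mapped_fraction() {
    awk -F '\t' '
        /^@/ { next }                              # skip header lines
        {
            flag = $2
            if (int(flag / 256) % 2) next          # skip secondary alignments (0x100)
            if (int(flag / 2048) % 2) next         # skip supplementary alignments (0x800)
            total++
            if (int(flag / 4) % 2 == 0) mapped++   # 0x4 unset: read is mapped
        }
        END {
            if (total) printf "%d %d %.4f\n", mapped, total, mapped / total
            else print "0 0 0.0000"
        }
    ' "$1"
}
```

Running `mapped_fraction aligned.sam` on the file from the step above gives a quick consistency check: a low fraction suggests that parts of the read set are not represented in the assembly.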
    
  6. Manual Curation: There is no single command for this step, as it involves manually inspecting and editing the assembly. Tools like Consed or Gap5 (part of the Staden package) can be used for manual editing.

  7. Annotation: Using Prokka for bacterial genome annotation:

    prokka --outdir my_annotation --prefix my_bacteria assembly.fasta
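A quick sanity check on the annotation is to count the features per type in the GFF file that Prokka writes into the output directory. The sketch below assumes a GFF3 file in which the sequence follows a ##FASTA line, as in Prokka's output; count_features is a hypothetical helper name.

```shell
# count_features GFF
# Print "<feature type> <count>" for each type in column 3, sorted by type.
count_features() {
    awk -F '\t' '
        /^##FASTA/ { exit }        # the sequence section starts here; stop reading
        /^#/ { next }              # skip comment and directive lines
        NF >= 3 { n[$3]++ }
        END { for (t in n) print t, n[t] }
    ' "$1" | sort
}
```

With the command above, the file to check would be the .gff file inside my_annotation. A CDS count far from that of the reference annotation is a hint to revisit the assembly.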
    
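  8. Examples: Commands from an actual run. ragtag.py correct uses the reference to find and break likely misassemblies in the contigs; the spades.py calls reassemble the reads with the reference supplied as trusted contigs.

    ragtag.py correct ATCC17978.fasta shovill/A19_17978_HQ/contigs.fa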
    
    spades.py -1 results_ATCC19606/raw_data/A6_WT_HQ_R1.fastq.gz -2 results_ATCC19606/raw_data/A6_WT_HQ_R2.fastq.gz --careful --trusted-contigs ATCC19606.fasta -o spades_A6_ATCC19606
    spades.py -1 results_ATCC19606/raw_data/A10_CraA_HQ_R1.fastq.gz -2 results_ATCC19606/raw_data/A10_CraA_HQ_R2.fastq.gz --careful --trusted-contigs ATCC19606.fasta -o spades_A10_ATCC19606
    spades.py -1 results_AYE/raw_data/A12_AYE_HQ_R1.fastq.gz -2 results_AYE/raw_data/A12_AYE_HQ_R2.fastq.gz --careful --trusted-contigs AYE.fasta -o spades_A12_AYE
    spades.py -1 raw_data/A19_17978_HQ_R1.fastq.gz -2 raw_data/A19_17978_HQ_R2.fastq.gz --careful --trusted-contigs ATCC17978.fasta -o spades_A19_ATCC17978
    
    In SPAdes, the --careful and --trusted-contigs flags serve different purposes:
    
    --careful:
    
        This flag tries to reduce the number of mismatches and short indels. With it, SPAdes also runs MismatchCorrector, a post-processing step that maps the reads back to the assembled contigs with BWA and corrects mismatches and short indels. This makes the run slower but can improve the base-level accuracy of the final assembly.
        The SPAdes documentation recommends careful mode only for small genomes, such as bacterial ones, and advises against it for large genomes.
    
    --trusted-contigs:
    
        The --trusted-contigs option provides SPAdes with contig sequences of the same genome that you trust to be free of misassemblies and to have a low error rate. This can be useful when you have a reference genome or partial assembly of the same strain that is known to be of high quality.
        SPAdes uses these trusted contigs for graph construction, gap closure, and repeat resolution. This can help in resolving repetitive regions or complex areas of the genome that are difficult to assemble using reads alone. The option is not intended for sequences from a related species.
    
    Using --trusted-contigs does not make SPAdes any more conservative in terms of error correction; it only provides additional information to guide the assembly. The --careful mode, on the other hand, adds a correction step aimed at mismatches and short indels.
    
    Combined, --careful and --trusted-contigs can produce an assembly that benefits from both error correction and the use of trusted reference sequences.
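The four spades.py calls above differ only in the read directory, sample name, reference, and output directory, so they can be generated from a small table. This is a sketch of that idea as a dry run that only prints the commands; spades_cmd and spades_all are hypothetical helper names, and the table rows are copied from the examples above.

```shell
# spades_cmd READ_DIR SAMPLE REFERENCE OUT_DIR
# Print one spades.py command line for a paired-end sample.
spades_cmd() {
    printf 'spades.py -1 %s/%s_R1.fastq.gz -2 %s/%s_R2.fastq.gz --careful --trusted-contigs %s -o %s\n' \
        "$1" "$2" "$1" "$2" "$3" "$4"
}

# Print the command for every row of the sample table (nothing is executed).
spades_all() {
    while read -r dir sample ref out; do
        spades_cmd "$dir" "$sample" "$ref" "$out"
    done <<'EOF'
results_ATCC19606/raw_data A6_WT_HQ ATCC19606.fasta spades_A6_ATCC19606
results_ATCC19606/raw_data A10_CraA_HQ ATCC19606.fasta spades_A10_ATCC19606
results_AYE/raw_data A12_AYE_HQ AYE.fasta spades_A12_AYE
raw_data A19_17978_HQ ATCC17978.fasta spades_A19_ATCC17978
EOF
}
```

Running `spades_all` prints the four commands for review; piping the output to `sh` would execute them one after another.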
    
    #debug the conda environment bengal3_ac3 on hamm
    mamba install -c anaconda openjdk=11
    

© 2023 XGenes.com Impressum