Scaffolding and finishing an assembly with a reference genome


Tags: processing, mutation, bacterium, variation, pipeline

Below are example commands for each step of scaffolding and finishing an assembly with a reference genome. The specific parameters may need to be adjusted to your data and computational resources. Replace reference.fasta, contigs.fasta, reads_1.fq, and reads_2.fq with the paths to your reference genome, your contig sequences, and your paired-end sequencing reads, respectively. Tools such as RagTag and GapFiller may also require additional setup steps or files, such as prior read mapping or specific configuration files. Always refer to each tool's documentation for the most accurate and effective usage.
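Before running any step, it can help to confirm that the placeholder paths point at real, non-empty files. The following is a minimal sketch; the helper name check_inputs is hypothetical and not part of any of the tools mentioned here.

```shell
# Hypothetical helper: succeed only if every given file exists and is non-empty.
check_inputs() {
    ci_missing=0
    for ci_file in "$@"; do
        if [ ! -s "$ci_file" ]; then
            echo "missing or empty: $ci_file" >&2
            ci_missing=1
        fi
    done
    return "$ci_missing"
}

# Example call with the placeholder names used in this post:
#   check_inputs reference.fasta contigs.fasta reads_1.fq reads_2.fq || exit 1
```

The function only tests for presence and size; it does not validate FASTA or FASTQ formatting.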

  1. Alignment of Contigs to Reference:

    nucmer --maxgap=500 --mincluster=100 --prefix=out reference.fasta contigs.fasta
  2. Visualization: For visualizing the alignment, you can create a plot using MUMmer's mummerplot or view it using a genome browser like IGV.

    mummerplot --png --layout --filter --prefix=plot_out -R reference.fasta -Q contigs.fasta out.delta
  3. Scaffolding: Using RagTag to scaffold contigs against a reference.

    ragtag.py scaffold reference.fasta contigs.fasta
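  4. Gap Filling: Using GapFiller to fill gaps within the scaffolds. Here config.txt will include paths to your reads and the initial assembly. Check each flag against the usage message of your GapFiller version before running (for example, whether the library file is passed with -l rather than -b).

    GapFiller.pl -b config.txt -m 50 -o 2 -r 0.7 -n 10 -d 50 -i 3 -g 10 -t 30 -T 10 -B 200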
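One simple way to see whether gap filling had any effect is to count the N bases in the scaffolds before and after. The sketch below uses only standard Unix tools; the function name count_gap_bases and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: print the number of N/n bases in a FASTA file.
count_gap_bases() {
    # Drop header lines, keep only N/n characters, then count them.
    grep -v '^>' "$1" | tr -cd 'Nn' | wc -c | tr -d ' '
}

# Example comparison (file names are placeholders):
#   echo "before: $(count_gap_bases scaffolds.fasta)"
#   echo "after:  $(count_gap_bases gapfilled.fasta)"
```

A lower count after gap filling means some gaps were closed or shortened. It says nothing about whether the inserted sequence is correct, which is what the validation step is for.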
  5. Validation: Re-mapping reads to your assembly to check for consistency, using Bowtie2 for example:

    bowtie2-build assembly.fasta assembly_index
    bowtie2 -x assembly_index -1 reads_1.fq -2 reads_2.fq -S aligned.sam
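Bowtie2 prints an overall alignment rate to standard error, but the rate can also be recomputed from the SAM file. The following sketch does this with awk by reading the FLAG column; the function name mapping_rate is hypothetical, and it assumes an uncompressed SAM file such as aligned.sam.

```shell
# Hypothetical helper: print "<mapped> <total> <percent>" for primary records in a SAM file.
mapping_rate() {
    awk -F '\t' '
        /^@/ { next }                             # skip header lines
        {
            flag = $2
            if (int(flag / 256) % 2 == 1) next    # skip secondary alignments
            if (int(flag / 2048) % 2 == 1) next   # skip supplementary alignments
            total++
            if (int(flag / 4) % 2 == 0) mapped++  # bit 0x4 unset means the read is mapped
        }
        END {
            if (total == 0) { print "0 0 0.00"; exit }
            printf "%d %d %.2f\n", mapped, total, 100 * mapped / total
        }
    ' "$1"
}

# Example call: mapping_rate aligned.sam
```

A low mapping rate, or a rate that differs strongly between samples, is a hint that parts of the assembly do not agree with the reads and deserve a closer look during manual curation.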
  6. Manual Curation: There is no direct command for this step as it involves manually inspecting and editing the assembly, but tools like Consed or Gap5 from the Staden package can be used for manual editing.

  7. Annotation: Using Prokka for bacterial genome annotation:

    prokka --outdir my_annotation --prefix my_bacteria assembly.fasta
  8. Examples

    ragtag.py correct ATCC17978.fasta shovill/A19_17978_HQ/contigs.fa
    spades.py -1 results_ATCC19606/raw_data/A6_WT_HQ_R1.fastq.gz -2 results_ATCC19606/raw_data/A6_WT_HQ_R2.fastq.gz --careful --trusted-contigs ATCC19606.fasta -o spades_A6_ATCC19606
    spades.py -1 results_ATCC19606/raw_data/A10_CraA_HQ_R1.fastq.gz -2 results_ATCC19606/raw_data/A10_CraA_HQ_R2.fastq.gz --careful --trusted-contigs ATCC19606.fasta -o spades_A10_ATCC19606
    spades.py -1 results_AYE/raw_data/A12_AYE_HQ_R1.fastq.gz -2 results_AYE/raw_data/A12_AYE_HQ_R2.fastq.gz --careful --trusted-contigs AYE.fasta -o spades_A12_AYE
    spades.py -1 raw_data/A19_17978_HQ_R1.fastq.gz -2 raw_data/A19_17978_HQ_R2.fastq.gz --careful --trusted-contigs ATCC17978.fasta -o spades_A19_ATCC17978

    In SPAdes, the --careful and --trusted-contigs flags serve different purposes:
        --careful: This flag is used to reduce the number of mismatches and short indels. With it, SPAdes also runs MismatchCorrector, a post-processing step that maps the reads back to the assembled contigs and corrects remaining mismatches and short indels. It makes the assembly slower but can improve the quality of the final assembly.
        The careful mode is particularly useful when you have high-quality sequencing data and want to minimize the number of errors in the final assembly. The SPAdes documentation recommends it only for small genomes, such as bacterial ones.
        --trusted-contigs: This option is used to provide SPAdes with high-quality contig sequences that you trust to be correct. This can be useful when you have a reference genome or partial assembly that is known to be of high quality, and you want to use this information to guide the assembly process.
        SPAdes will use these trusted contigs to assist in the assembly of the reads, for graph construction, gap closure, and repeat resolution. This can help in resolving repetitive regions or complex areas of the genome that are difficult to assemble using reads alone.
    Using --trusted-contigs does not make SPAdes any more conservative in terms of error correction; it simply provides additional information to help guide the assembly. The --careful mode, on the other hand, actively seeks to minimize errors in the assembled contigs.
    When combined, --careful and --trusted-contigs can be used to create a high-quality assembly that benefits from both error minimization and the use of trusted reference sequences.
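Because the SPAdes calls above differ only in read prefix, reference, and output directory, they can be generated from a small sample table. The sketch below is a dry run that only prints the commands; the function name spades_cmd is hypothetical, and it assumes the reads follow the <prefix>_R1.fastq.gz / <prefix>_R2.fastq.gz naming seen in the examples.

```shell
# Hypothetical helper: print (not run) a SPAdes command for one sample.
# $1 = read prefix, $2 = trusted reference FASTA, $3 = output directory
spades_cmd() {
    echo "spades.py -1 ${1}_R1.fastq.gz -2 ${1}_R2.fastq.gz --careful --trusted-contigs $2 -o $3"
}

# Sample table: read prefix, reference, output directory (one sample per line).
while read -r prefix ref out; do
    spades_cmd "$prefix" "$ref" "$out"
done <<'EOF'
results_ATCC19606/raw_data/A6_WT_HQ ATCC19606.fasta spades_A6_ATCC19606
results_ATCC19606/raw_data/A10_CraA_HQ ATCC19606.fasta spades_A10_ATCC19606
EOF
```

Printing the commands first makes it easy to review them before execution; once they look right, the output can be piped to sh or saved as a script.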
    #debug the conda environment bengal3_ac3 on hamm
    mamba install -c anaconda openjdk=11
