Constructing a Tree

We provide make_tree.py for one-step tree generation. We also provide get_cds_alignments.py, which generates the CDS alignment files needed for tree construction; you can use these alignments as input to other tree-building methods. To obtain input genomes, use download_asfv_genome.py to download the latest ASFV genomes from NCBI, or use the "single_fasta" directory provided on GitHub, which contains 312 genomes.

make_tree.py

Description

The 188 CDS from NC_044959.2 (ASFV Georgia 2007/1) were used as a reference, and Exonerate was employed for pairwise sequence comparison to obtain the corresponding CDS for each ASFV isolate. These CDS were then used as input for tree construction with uDance in "denovo" mode. The resulting tree then served as the backbone for a second iteration of tree construction, with the uDance backbone option set to 'tree', which allows the re-placement of unplaceable taxa (taxa for which no optimal placement in the tree could be found).
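
The CDS extraction step described above can be pictured with a minimal sketch. It only builds an Exonerate command line for one reference CDS file and one genome; it is not taken from make_tree.py. The function name, the file names, the choice of the est2genome model, and the output format string are all assumptions made for illustration, and the script itself may call Exonerate differently.

```python
from typing import List


def exonerate_command(reference_cds: str, genome_fasta: str,
                      model: str = "est2genome") -> List[str]:
    """Build an Exonerate command that aligns reference CDS to one genome.

    Hypothetical helper: the model and the --ryo format are assumptions,
    not the settings used by make_tree.py.
    """
    return [
        "exonerate",
        "--model", model,
        "--query", reference_cds,     # e.g. the reference CDS in FASTA format
        "--target", genome_fasta,     # one ASFV isolate genome
        "--bestn", "1",               # keep only the best hit per query
        "--showalignment", "no",
        "--showvulgar", "no",
        # Raw string: Exonerate itself expands the \n escapes.
        # %qi = query id, %tas = aligned target sequence.
        "--ryo", r">%qi\n%tas\n",
    ]


cmd = exonerate_command("reference_cds.fasta", "single_fasta/isolate.fasta")
print(" ".join(cmd))
```

Running the command (for example with subprocess.run) requires Exonerate to be installed and on the PATH.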

Arguments

| Argument name | Required | Description |
|---|---|---|
| -p, --processes | No | Number of processes (default = 4) |
| -f, --file | Yes | Directory of ASFV genome FASTA files used as input |
| -o, --output | Yes | Name of the output directory |
| --udance | Yes | Path to the uDance directory |

Example

make_tree.py -p 4 -f single_fasta -o tree --udance ./uDance --iteration
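
If you prefer to launch the script from Python, for instance inside a larger workflow, the following sketch assembles the same command from the arguments documented above. The function name and its parameters are hypothetical, and it assumes make_tree.py is in the current working directory. The --iteration flag is appended only on request, mirroring the example above.

```python
import sys
from typing import List


def make_tree_command(fasta_dir: str, output_dir: str, udance_dir: str,
                      processes: int = 4, iteration: bool = False) -> List[str]:
    """Assemble a make_tree.py command line (hypothetical helper)."""
    cmd = [
        sys.executable, "make_tree.py",
        "-p", str(processes),   # number of processes
        "-f", fasta_dir,        # directory of genome FASTA files
        "-o", output_dir,       # output directory
        "--udance", udance_dir,  # path to the uDance directory
    ]
    if iteration:
        cmd.append("--iteration")  # flag as used in the example above
    return cmd


cmd = make_tree_command("single_fasta", "tree", "./uDance", iteration=True)
print(" ".join(cmd[1:]))
```

Passing the resulting list to subprocess.run(cmd, check=True) would start the run and raise an error if the script exits with a non-zero status.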

Output

The final tree is a Newick file named 'tree.nwk', located in the output directory specified by the -o parameter.
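
As a quick sanity check on the result, you may want to list the taxa that ended up in the tree. The sketch below is a minimal, dependency-free approach that assumes unquoted leaf labels; it is not a full Newick parser, and the function names are illustrative. A library such as Biopython is a more robust choice for anything beyond this.

```python
import re
from pathlib import Path
from typing import List


def newick_leaf_names(newick: str) -> List[str]:
    """Return leaf labels from a Newick string with unquoted labels.

    A leaf label directly follows "(" or ","; internal node labels and
    support values follow ")" and are therefore skipped.
    """
    return re.findall(r"(?<=[(,])\s*([^\s(),:;]+)", newick)


def read_leaves(tree_path: str) -> List[str]:
    """Read a Newick file and return its leaf labels."""
    return newick_leaf_names(Path(tree_path).read_text())


example = "((A:0.1,B:0.2)0.95:0.3,C:0.4);"
print(newick_leaf_names(example))  # ['A', 'B', 'C']
```

For the example run above, read_leaves("tree/tree.nwk") would return the isolates present in the final tree.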


get_cds_alignments.py

Description

Extracts the CDS for each ASFV isolate and performs a multiple sequence alignment for each CDS. The output can be used directly as input for tree building (make_tree.py already includes this step).

Arguments

| Argument name | Required | Description |
|---|---|---|
| -f, --file | Yes | Directory of ASFV genome FASTA files used as input |
| -c, --core | No | Number of processes (default = 32) |

Example

get_cds_alignments.py -f single_fasta

Output

A directory named "alignments" containing the multiple sequence alignment for each CDS.
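
Before passing the alignments to another tree-building method, it can help to check how many isolates each CDS alignment contains. The sketch below counts sequences per file, assuming the alignments are written in FASTA format (an assumption; the file format and extension are not stated here). The function name is illustrative.

```python
from pathlib import Path
from typing import Dict


def count_sequences(alignment_dir: str) -> Dict[str, int]:
    """Map each file in alignment_dir to its number of FASTA records."""
    counts: Dict[str, int] = {}
    for path in sorted(Path(alignment_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open() as handle:
            # Each FASTA record starts with a ">" header line.
            counts[path.name] = sum(1 for line in handle if line.startswith(">"))
    return counts


if Path("alignments").is_dir():
    for name, n in count_sequences("alignments").items():
        print(f"{name}\t{n}")
```

Alignments with far fewer sequences than input genomes point to CDS that were not recovered in many isolates.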