TreeExtender.pl reads phylogenetic trees in Newick format and converts them to the phyloXML format. It can also read ancestral state reconstructions produced by external programs (parsers for Mesquite and BayesTraits are currently implemented; see the -p option) and annotate the phyloXML tree with them. Because the program can also import phyloXML files, several traits can be added to the same phyloXML tree in successive runs. The program's main purpose is to prepare trees for the TreeGradients.pl script.
TreeExtender.pl is a Perl script, so it requires a recent version of Perl. It also uses code from several external modules. Make sure the following modules are installed before running the script; all of them can be obtained from CPAN:
- BioPerl bundle
- IO::File
- XML::DOM
- XML::Writer
TreeExtender.pl is run from the command line, and its behavior is controlled entirely with command-line parameters.
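The documentation does not say how TreeExtender.pl parses its arguments. The following is a minimal sketch, assuming the core Getopt::Long module, of how the switches documented in the tables below could be declared and checked. It also applies the -o naming rule. Only .nex and .nwk are treated as "common" extensions, because those are the two examples given under -o; the real program may recognize others. The helper names `parse_args` and `derive_output_name` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Values accepted by the global options, as listed in the option tables.
my %ALLOWED = (
    tf => [qw(newick phyloxml)],
    id => [qw(id bootstrap posterior)],
);

# Assumption: only the two extensions named in the -o description are "known".
my @KNOWN_EXTENSIONS = qw(nex nwk);

# Hypothetical helper implementing the -o rule described below.
sub derive_output_name {
    my ($input) = @_;
    my $alternation = join '|', map { quotemeta($_) } @KNOWN_EXTENSIONS;
    my $output = $input;
    # Replace a recognized extension with .xml ...
    return $output if $output =~ s/\.(?:$alternation)\z/.xml/i;
    # ... otherwise append .xml to the full input filename.
    return "$input.xml";
}

# Hypothetical argument parser; not TreeExtender.pl's own code.
sub parse_args {
    my @argv = @_;
    my %opt;
    GetOptionsFromArray(
        \@argv, \%opt,
        'i=s', 'o=s', 'tf=s', 'id=s',                   # global parameters
        'p=s', 'vt=s', 'f1=s', 'f2=s', 'vn=s', 'jp=i',  # ancestral state parsers
    ) or die "Invalid command-line options\n";

    die "The -i parameter is required\n" unless defined $opt{i};
    for my $key (sort keys %ALLOWED) {
        next unless defined $opt{$key};
        die "Unsupported value for -$key: $opt{$key}\n"
            unless grep { $_ eq $opt{$key} } @{ $ALLOWED{$key} };
    }
    $opt{o} //= derive_output_name($opt{i});
    return \%opt;
}

my $opt = parse_args(qw(-i primates.nwk -tf newick -id bootstrap));
print "$opt->{o}\n";    # primates.xml
```

This sketch corresponds to a call such as `perl TreeExtender.pl -i primates.nwk -tf newick -id bootstrap`, where primates.nwk is a placeholder filename. With the extension list assumed above, `primates.txt` would become `primates.txt.xml`.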
global parameters
| -i | Specifies the file containing the input tree(s). This parameter is required. |
| -o | Specifies the output filename. Output is always written in phyloXML format. If no output filename is given, the program derives one from the input filename specified with -i. If the input filename has a common extension (e.g., .nex, .nwk), that extension is replaced with .xml. If the input filename has no extension, or an unrecognized one, .xml is appended to the full input filename. |
| -tf | Specifies the format of the input file. The following formats are supported: |
| newick | Newick format, also known as New Hampshire format (parsed with BioPerl). |
| phyloxml | The phyloXML format. Note that the parser in TreeExtender.pl does not support every feature of phyloXML. It can parse the files that TreeExtender.pl writes itself, which allows features to be added to a tree in successive runs. |
| -id | Specifies what the identifiers of internal nodes in the Newick or Nexus input tree represent. These identifiers immediately follow a closing parenthesis in the tree file; in the following example, they are 59, 100, and 83: |
((A:0.12,B:0.15)59:0.11,((C:0.21,D:0.24)100:0.03,E:0.12)83:0.19);
| id | Node identifiers are names of internal nodes (e.g., higher taxa, protein families). |
| bootstrap | Node identifiers are bootstrap proportions. |
| posterior | Node identifiers are posterior probabilities. |
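TreeExtender.pl reads Newick trees through BioPerl, so the following is not its parser. It is a regex-only illustration of which strings the -id option applies to: each identifier is the label that directly follows a closing parenthesis. The sketch assumes unquoted labels without whitespace, and the function name `internal_node_ids` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustration only: collect the labels that directly follow ')' in a Newick
# string. Quoted labels and labels containing whitespace are not handled.
sub internal_node_ids {
    my ($newick) = @_;
    return $newick =~ /\)([^:,;()\s]+)/g;
}

my $tree = '((A:0.12,B:0.15)59:0.11,((C:0.21,D:0.24)100:0.03,E:0.12)83:0.19);';
my @ids = internal_node_ids($tree);
print join(', ', @ids), "\n";    # 59, 100, 83
```

The final `)` before `;` has no label, so it contributes nothing. Whether 59, 100, and 83 are then read as names, bootstrap proportions, or posterior probabilities is what -id tells the program.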
activating the ancestral character state parser
| -p | Activates the parser for ancestral character states and specifies which parser to use. Each parser interprets the output of a different ancestral state reconstruction program: |
| mesq | Mesquite |
| bt | BayesTraits |
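Each parser accepts only the -vt values listed in the Mesquite and BayesTraits sections below, and -jp only applies to double_discrete output. The following is a hypothetical consistency check for those combinations. It is a sketch, not documented TreeExtender.pl behavior. The helper name `check_parser_options` is invented, and the check assumes that -vt must be given whenever a parser is activated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# -vt values accepted by each -p parser, taken from the option tables.
my %VT_FOR_PARSER = (
    mesq => [qw(cont disc)],
    bt   => [qw(double_discrete multistate)],
);

# Hypothetical check of parser-related options (passed as key/value pairs).
sub check_parser_options {
    my (%opt) = @_;
    for my $key (qw(p vt)) {    # assumption: both are needed once -p is used
        die "-$key is required when a parser is activated\n"
            unless defined $opt{$key};
    }
    my $allowed = $VT_FOR_PARSER{ $opt{p} }
        or die "Unknown parser for -p: $opt{p}\n";
    die "-vt $opt{vt} is not supported by the $opt{p} parser\n"
        unless grep { $_ eq $opt{vt} } @$allowed;
    if (defined $opt{jp}) {
        # -jp converts double_discrete probabilities, so it only applies there.
        die "-jp requires -vt double_discrete\n"
            unless $opt{vt} eq 'double_discrete';
        die "-jp must be 1 or 2\n" unless $opt{jp} =~ /\A[12]\z/;
    }
    return 1;
}

print check_parser_options(p => 'bt', vt => 'double_discrete', jp => 1), "\n";  # 1
```

A table keyed by parser name keeps the allowed combinations in one place, so supporting a new file type only means adding a value to the relevant list.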
options for parsing Mesquite files
| -vt | Specifies the type of file being parsed. Mesquite can generate several types of output files; TreeExtender.pl currently supports two of them: |
| cont | Ancestral states for continuous characters inferred with the least-squares parsimony (MP) option |
| disc | Ancestral states for discrete (binary) characters inferred with maximum likelihood (ML) |
| -f1 | The file containing the Mesquite ancestral state output. |
| -vn | Specifies the name of the character (without spaces or special characters). The information from the Mesquite output file is stored under this variable name in the phyloXML file. If this option is omitted, TreeExtender.pl uses the variable name given in the Mesquite output. |
options for parsing BayesTraits files
| -vt | Specifies the type of file being parsed. BayesTraits can generate several types of output files; TreeExtender.pl currently supports two of them: |
| double_discrete | Ancestral states for two discrete (binary) variables inferred with the independent or dependent models |
| multistate | Ancestral states for a three-state variable inferred with the multistate model |
| -f1 | The file containing the BayesTraits output. |
| -f2 | The file containing the data for the terminal taxa (i.e., the input data file used for BayesTraits). |
| -vn | Specifies the name of the character (without spaces or special characters). The information from the BayesTraits output file is stored under this variable name in the phyloXML file. |
| -jp | Converts double_discrete probabilities into probabilities for a single discrete character. The probability of each state of the selected character is obtained by summing over both states of the other character, for example: |
Pr(a = 1) = Pr(a = 1, b = 0) + Pr(a = 1, b = 1)
| 1 | Extracts the probabilities for the first character given as input to BayesTraits. |
| 2 | Extracts the probabilities for the second character given as input to BayesTraits. |
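The -jp conversion is a marginalization over the other character. The following minimal sketch applies it to the joint probabilities used in the double_discrete tag examples further down. It assumes that in a state label such as '10', the first digit is the state of the first character and the second digit the state of the second. This matches the formula above, but the document does not state it explicitly. The function name `marginalize` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the -jp conversion. Assumption: digit 1 of a state label is the
# first character, digit 2 the second character.
sub marginalize {
    my ($states, $which) = @_;    # $which is 1 or 2, as with -jp
    my %prob;
    for my $s (@$states) {
        my $digit = substr($s->{state}, $which - 1, 1);
        $prob{$digit} += $s->{probability};    # sum over the other character
    }
    return [ map { +{ state => $_, probability => $prob{$_} } } sort keys %prob ];
}

# Joint probabilities from the double_discrete examples in this document.
my @joint = (
    { state => '00', probability => 0.65 },
    { state => '01', probability => 0.05 },
    { state => '10', probability => 0.20 },
    { state => '11', probability => 0.10 },
);

for my $which (1, 2) {
    my $marginal = marginalize(\@joint, $which);
    printf "character %d: %s\n", $which,
        join ', ', map { sprintf 'Pr(%s) = %.2f', $_->{state}, $_->{probability} } @$marginal;
}
# character 1: Pr(0) = 0.70, Pr(1) = 0.30
# character 2: Pr(0) = 0.85, Pr(1) = 0.15
```

The state labels must be strings here. If they were stored as unquoted numbers, '00' and '01' would collapse to 0 and 1, and `substr` would no longer find the intended digit.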
When TreeExtender.pl parses output from ancestral state reconstruction programs, it writes phyloXML files with custom tags at the nodes. The tags are largely self-explanatory, but some examples are shown below.
- example of tag for discrete character
<custom>
  <name>name_of_character</name>
  <type>discrete</type>
  <state>
    <state>0</state>
    <probability>0.85</probability>
  </state>
  <state>
    <state>1</state>
    <probability>0.15</probability>
  </state>
</custom>
- example of tag for double_discrete character
<custom>
  <name>name_of_character</name>
  <type>discrete</type>
  <state>
    <state>00</state>
    <probability>0.65</probability>
  </state>
  <state>
    <state>01</state>
    <probability>0.05</probability>
  </state>
  <state>
    <state>10</state>
    <probability>0.20</probability>
  </state>
  <state>
    <state>11</state>
    <probability>0.10</probability>
  </state>
</custom>
- example of tag for multistate character
<custom>
  <name>name_of_character</name>
  <type>multistate</type>
  <state>
    <state>0</state>
    <probability>0.519707</probability>
  </state>
  <state>
    <state>1</state>
    <probability>0.116460</probability>
  </state>
  <state>
    <state>2</state>
    <probability>0.363833</probability>
  </state>
</custom>
- example of tag for continuous character
<custom>
  <name>name_of_character</name>
  <type>continuous</type>
  <value>2.659123</value>
</custom>
This program uses Bio::Tree::Tree objects to store trees in memory and to attach data to the tree nodes. Character data are added to the nodes as tags, with the tag name set to the name of the character. Tags have the following structure:
- example of tag for discrete character
my $tag = {
    'type'  => 'discrete',
    'name'  => 'name_of_character',
    'state' => [
        {'state' => 0, 'probability' => 0.85},
        {'state' => 1, 'probability' => 0.15}
    ]
};
- example of tag for double_discrete character
my $tag = {
    'type'  => 'double_discrete',
    'name'  => 'name_of_character',
    'state' => [
        # states are quoted strings: unquoted 00 and 01 would be read as the numbers 0 and 1
        {'state' => '00', 'probability' => 0.65},
        {'state' => '01', 'probability' => 0.05},
        {'state' => '10', 'probability' => 0.20},
        {'state' => '11', 'probability' => 0.10}
    ]
};
- example of tag for multistate character
my $tag = {
    'type'  => 'multistate',
    'name'  => 'name_of_character',
    'state' => [
        {'state' => 0, 'probability' => 0.519707},
        {'state' => 1, 'probability' => 0.116460},
        {'state' => 2, 'probability' => 0.363833}
    ]
};
- example of tag for continuous character
my $tag = {
    'type'  => 'continuous',
    'name'  => 'name_of_character',
    'value' => 2.659123
};
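The following sketch links the two layouts above by rendering one in-memory tag hash as the corresponding phyloXML <custom> element. TreeExtender.pl lists XML::Writer as a dependency, but this version builds the string by hand so that it runs with core Perl only. It is an illustration, not the program's own writer, and the function names `tag_to_custom_xml` and `xml_escape` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal escaping for element content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Render one tag hash (structure as shown above) as a <custom> element.
sub tag_to_custom_xml {
    my ($tag) = @_;
    my @lines = (
        '<custom>',
        '  <name>' . xml_escape($tag->{name}) . '</name>',
        '  <type>' . xml_escape($tag->{type}) . '</type>',
    );
    if ($tag->{type} eq 'continuous') {
        push @lines, '  <value>' . xml_escape($tag->{value}) . '</value>';
    }
    else {
        # discrete, double_discrete and multistate tags carry a list of states
        for my $s (@{ $tag->{state} }) {
            push @lines,
                '  <state>',
                '    <state>' . xml_escape($s->{state}) . '</state>',
                '    <probability>' . xml_escape($s->{probability}) . '</probability>',
                '  </state>';
        }
    }
    push @lines, '</custom>';
    return join("\n", @lines) . "\n";
}

my $tag = { type => 'continuous', name => 'name_of_character', value => 2.659123 };
print tag_to_custom_xml($tag);
# <custom>
#   <name>name_of_character</name>
#   <type>continuous</type>
#   <value>2.659123</value>
# </custom>
```

The sketch copies the type field verbatim. The double_discrete XML example earlier shows <type>discrete</type>, so the actual writer may translate that value when it serializes the tag.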