TreeExtender.pl reads phylogenetic trees in Newick format and converts
them to the phyloXML format. It can also read ancestral state
probabilities or ancestral state values produced by external programs
and annotate the phyloXML tree with them. At present, ancestral state
values for continuous variables can be taken from the ace function of
the ape package, and ancestral state probabilities of discrete
characters can be taken from BayesTraits output. TreeExtender.pl can
also import phyloXML files, so various traits can be added sequentially
to the same phyloXML tree. The program's main function is to prepare
trees for the TreeGradients.pl script.
TreeExtender.pl is a Perl script. It requires a recent version of Perl.
In addition, TreeExtender.pl uses code from a number of external
libraries, so make sure the following modules are installed before
running the script. They can be obtained from CPAN.
- BioPerl bundle
- IO::File
- XML::DOM
- XML::Writer
The program is a Perl script that must be run from the command line.
Its functionality is controlled with command-line parameters.
global parameters
| -i | This option specifies the file containing the input tree(s). It is a required parameter. |
| -o | This option specifies an output filename. Output is always written in phyloXML format. If no output filename is specified, the program derives one from the input filename specified with -i. When the input filename has a common extension (e.g., .nex, .nwk), the extension is replaced with .xml. If the input filename has no extension, or the extension is not recognized, the output filename is the input filename with .xml appended. |
| -tf | This option specifies the format of the input file. These formats are supported: |
| newick | Newick format, also known as New Hampshire format (parsed with BioPerl) |
| phyloxml | The phyloXML format. Note that the parser in TreeExtender.pl doesn't support all the features that phyloXML offers. It can parse the files that TreeExtender.pl writes itself, allowing sequential additions of features to a tree. |
| -id | This option specifies what the identifiers of internal nodes in the Newick or Nexus input tree represent. The identifiers directly follow a closing parenthesis in the tree file (59, 100 and 83 in the following example): ((A:0.12,B:0.15)59:0.11,((C:0.21,D:0.24)100:0.03,E:0.12)83:0.19); |
| id | Node identifiers represent names of internal nodes (e.g., higher taxa, protein families) |
| bootstrap | Node identifiers represent bootstrap proportions |
| posterior | Node identifiers represent posterior probabilities |
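Two of the rules in the table above can be sketched in a few lines of Perl. This is a minimal illustration, not code taken from TreeExtender.pl: the sub names `derive_output_name` and `internal_node_labels` are hypothetical, and the list of recognized extensions is an assumption, since the text names only .nex and .nwk as examples.

```perl
use strict;
use warnings;

# Assumption: only the two extensions named in the text count as "common".
my @KNOWN_EXTENSIONS = qw(nex nwk);

# Derive an output filename from the input filename, following the -o rule:
# replace a recognized extension with .xml, otherwise append .xml.
sub derive_output_name {
    my ($input) = @_;
    for my $ext (@KNOWN_EXTENSIONS) {
        return "$1.xml" if $input =~ /^(.+)\.\Q$ext\E$/i;
    }
    return "$input.xml";
}

# Collect the labels that directly follow a closing parenthesis in a
# Newick string; these are the identifiers that -id tells the program
# how to interpret.
sub internal_node_labels {
    my ($newick) = @_;
    my @labels = $newick =~ /\)([^():,;\s]+)/g;
    return @labels;
}

my $tree = '((A:0.12,B:0.15)59:0.11,((C:0.21,D:0.24)100:0.03,E:0.12)83:0.19);';
print join(' ', internal_node_labels($tree)), "\n";   # prints: 59 100 83
print derive_output_name('primates.nwk'), "\n";       # prints: primates.xml
print derive_output_name('primates.tree'), "\n";      # prints: primates.tree.xml
```

Whether 59, 100 and 83 are names, bootstrap proportions or posterior probabilities cannot be told from the tree file itself, which is why -id has to be given explicitly.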
activating the ancestral character state parser
| -p | This option activates the file parser for ancestral character states and specifies which parser to use. The different parsers interpret output from different ancestral state reconstruction programs. |
| mesq | Mesquite |
| bt | BayesTraits |
options for parsing Mesquite files
| -vt | Specify the type of file you're parsing. Mesquite can generate several types of files; currently, TreeExtender.pl can parse only two of them. |
| cont | Ancestor states for continuous characters inferred with the least squares MP option |
| disc | Ancestor states for discrete (binary) characters inferred with ML |
| -f1 | The file containing the Mesquite ancestral states output |
| -vn | Specify the name of the character you're using (without spaces or special characters). The program will store the information from the Mesquite output file under this variable name in the phyloXML file. When this option is not used, TreeExtender.pl extracts the variable name specified in the Mesquite output. |
options for parsing BayesTraits files
| -vt | Specify the type of file you're parsing. BayesTraits can generate several types of files, of which the following two can currently be parsed by TreeExtender.pl. |
| double_discrete | Ancestor states for two discrete variables inferred with the independent or interdependent models |
| multistate | Ancestor states for a three-state variable inferred using the multistate model |
| -f1 | The file containing the BayesTraits output |
| -f2 | The file containing the data for terminal taxa (i.e., the input file used for BayesTraits) |
| -vn | Specify the name of the character you're using (without spaces or special characters). The program will store the information from the BayesTraits output file under this variable name in the phyloXML file. |
| -jp | This option converts probabilities of the double_discrete type to probabilities for a single discrete character. It calculates the probabilities of state 0 and state 1 for the desired character by summing across the probabilities of both states of the other character, e.g., Pr(a = 1) = Pr(a = 1, b = 0) + Pr(a = 1, b = 1) |
| 1 | Extracts the probabilities for the first character given as input to BayesTraits. |
| 2 | Extracts the probabilities for the second character given as input to BayesTraits. |
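The summation performed by -jp can be sketched as follows. This is an illustration under stated assumptions, not the script's own code: the sub name `marginalize` is hypothetical, the joint probabilities are made-up values, and joint states are assumed to be two-character strings in which the first character belongs to the first BayesTraits character.

```perl
use strict;
use warnings;

# Collapse joint probabilities for two binary characters into marginal
# probabilities for one of them. $which is 1 or 2, mirroring the -jp values.
sub marginalize {
    my ($joint, $which) = @_;
    my %marginal = (0 => 0, 1 => 0);
    for my $entry (@$joint) {
        # Pick the state of the requested character out of e.g. '10'
        my $state = substr($entry->{state}, $which - 1, 1);
        $marginal{$state} += $entry->{probability};
    }
    return \%marginal;
}

# Illustrative joint probabilities (hypothetical values).
my @joint = (
    { state => '00', probability => 0.65 },
    { state => '01', probability => 0.05 },
    { state => '10', probability => 0.20 },
    { state => '11', probability => 0.10 },
);

my $first = marginalize(\@joint, 1);
printf "Pr(a = 0) = %.2f, Pr(a = 1) = %.2f\n", $first->{0}, $first->{1};
# prints: Pr(a = 0) = 0.70, Pr(a = 1) = 0.30
```

Calling `marginalize(\@joint, 2)` sums over the first character instead and yields the probabilities for the second one.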
options for parsing list files
| -vt | Specify the type of variable you are parsing. |
| continuous | Ancestor states for a continuous variable |
| -f1 | The file containing the list of character states for each node |
| -vn | Specify the name of the character you're using (without spaces or special characters). The program will store the information from the list file under this variable name in the phyloXML file. |
The list file contains ancestral character state values in a very
straightforward format. It is a plain text file in which each line has
a character value, followed by a tab character and a list of the
terminal taxa subtended by the node to which the character value
applies. The taxa in the list must be separated by single space
characters, so taxon names cannot contain spaces. This simple list
format allows users to come up with hypothetical ancestral values and
plot them with TreeGradients. For example, states for all nodes in a
rooted four-taxon tree can be listed as follows:
- hypothetical list file for this tree
4.13 orangutan
3.75 gorilla
2.15 chimpanzee
2.23 human
2.35 chimpanzee human
4.02 chimpanzee human gorilla
4.88 orangutan chimpanzee human gorilla
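The list format described above is simple enough to read with a few lines of Perl. The sketch below is not the parser used by TreeExtender.pl; the sub name `parse_list` and the returned data layout are assumptions made for illustration. It splits each line at the first tab and then splits the taxon list on whitespace.

```perl
use strict;
use warnings;

# Parse list-file text: each line holds a value, a tab, and then the
# space-separated terminal taxa subtended by the node.
sub parse_list {
    my ($text) = @_;
    my @nodes;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;    # skip blank lines
        my ($value, $taxa) = split /\t/, $line, 2;
        die "Malformed line: $line\n" unless defined $taxa && length $taxa;
        push @nodes, { value => $value, taxa => [ split ' ', $taxa ] };
    }
    return \@nodes;
}

# The example from the text, with explicit tab characters.
my $text = join "\n",
    "4.13\torangutan",
    "3.75\tgorilla",
    "2.15\tchimpanzee",
    "2.23\thuman",
    "2.35\tchimpanzee human",
    "4.02\tchimpanzee human gorilla",
    "4.88\torangutan chimpanzee human gorilla";

my $nodes = parse_list($text);
for my $node (@$nodes) {
    printf "%s applies to %d taxa\n", $node->{value}, scalar @{ $node->{taxa} };
}
```

Because a node is identified only by the set of terminal taxa it subtends, a taxon name containing a space would be read as two taxa.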
When TreeExtender.pl is asked
to parse output from ancestral trait reconstruction programs, it
creates phyloXML files with custom tags at nodes. The tags are
self-explanatory but I'm pasting some examples below to show what they
look like.
- example of tag for discrete character
<custom>
<name>name_of_character</name>
<type>discrete</type>
<state>
<state>0</state>
<probability>0.85</probability>
</state>
<state>
<state>1</state>
<probability>0.15</probability>
</state>
</custom>
- example of tag for double_discrete character
<custom>
<name>name_of_character</name>
<type>discrete</type>
<state>
<state>00</state>
<probability>0.65</probability>
</state>
<state>
<state>01</state>
<probability>0.05</probability>
</state>
<state>
<state>10</state>
<probability>0.20</probability>
</state>
<state>
<state>11</state>
<probability>0.10</probability>
</state>
</custom>
- example of tag for multistate character
<custom>
<name>name_of_character</name>
<type>multistate</type>
<state>
<state>0</state>
<probability>0.519707</probability>
</state>
<state>
<state>1</state>
<probability>0.116460</probability>
</state>
<state>
<state>2</state>
<probability>0.363833</probability>
</state>
</custom>
- example of tag for continuous character
<custom>
<name>name_of_character</name>
<type>continuous</type>
<value>2.659123</value>
</custom>
This program uses Bio::Tree::Tree objects to store trees in memory and
attach data to the tree nodes. Character data are added as tags; the
tag name corresponds to the name of the character. Tags have the
following structure:
- example of tag for discrete character
$tag = {
'type' => 'discrete',
'name' => 'name_of_character',
'state' => [
{'state' => 0, 'probability' => 0.85},
{'state' => 1, 'probability' => 0.15}
]
};
- example of tag for double_discrete character
$tag = {
'type' => 'double_discrete',
'name' => 'name_of_character',
'state' => [
{'state' => '00', 'probability' => 0.65},
{'state' => '01', 'probability' => 0.05},
{'state' => '10', 'probability' => 0.20},
{'state' => '11', 'probability' => 0.10}
]
};
- example of tag for multistate character
$tag = {
'type' => 'multistate',
'name' => 'name_of_character',
'state' => [
{'state' => 0, 'probability' => 0.519707},
{'state' => 1, 'probability' => 0.116460},
{'state' => 2, 'probability' => 0.363833}
]
};
- example of tag for continuous character
$tag = {
'type' => 'continuous',
'name' => 'name_of_character',
'value' => 2.659123
};
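To show how the in-memory tags relate to the custom phyloXML elements shown earlier, here is a rough sketch that renders a tag hash as a `<custom>` element. It is not the serializer used by TreeExtender.pl: the sub name `tag_to_xml` is hypothetical, the output is built with plain string concatenation, and values are assumed not to need XML escaping.

```perl
use strict;
use warnings;

# Render a tag hash as a <custom> element, one element per line.
sub tag_to_xml {
    my ($tag) = @_;
    my @lines = (
        '<custom>',
        "<name>$tag->{name}</name>",
        "<type>$tag->{type}</type>",
    );
    # Continuous characters carry a single value
    if (exists $tag->{value}) {
        push @lines, "<value>$tag->{value}</value>";
    }
    # Discrete characters carry a list of states with probabilities
    for my $s (@{ $tag->{state} || [] }) {
        push @lines,
            '<state>',
            "<state>$s->{state}</state>",
            "<probability>$s->{probability}</probability>",
            '</state>';
    }
    push @lines, '</custom>';
    return join("\n", @lines) . "\n";
}

my $continuous = {
    'type'  => 'continuous',
    'name'  => 'name_of_character',
    'value' => 2.659123,
};
print tag_to_xml($continuous);

my $discrete = {
    'type'  => 'discrete',
    'name'  => 'name_of_character',
    'state' => [
        { 'state' => 0, 'probability' => 0.85 },
        { 'state' => 1, 'probability' => 0.15 },
    ],
};
print tag_to_xml($discrete);
```

Probabilities stored as numbers lose trailing zeros when interpolated (0.20 becomes 0.2), so a real writer would likely format them explicitly, for example with sprintf.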