TreeExtender.pl user guide

TreeExtender.pl reads phylogenetic trees in Newick format and converts them to phyloXML. It can also read ancestral state values or probabilities produced by external programs and annotate the phyloXML tree with them. At present, ancestral state values for continuous variables can be taken from the ace function of the ape package, and ancestral state probabilities for discrete characters can be taken from BayesTraits output. TreeExtender.pl can also import phyloXML files, so several traits can be added to the same tree in successive runs. The program's main purpose is to prepare trees for the TreeGradients.pl script.

Dependencies

TreeExtender.pl is a Perl script and requires a recent version of Perl. It also uses code from several external modules, which must be installed before running the script. They can be obtained from CPAN.

  • BioPerl bundle
  • IO::File
  • XML::DOM
  • XML::Writer

Using the program

The program is a Perl script that is run from the command line. All of its functionality is controlled through command-line parameters.


global parameters

-i This option allows specifying the file containing the input tree(s). It is a required parameter.
-o This option allows specifying an output filename. Output is always written in phyloXML format. If no output filename is specified, the program derives one from the input filename given with -i. When the input filename has a common extension (e.g., .nex, .nwk), the extension is replaced with .xml. If the input filename has no extension, or the extension is not recognized, .xml is appended to the input filename.
-tf This option allows specifying the format of the input file. These formats are supported:
newick Newick format, also known as New Hampshire format (parsed with BioPerl)
phyloxml The phyloXML format. Note that the parser in TreeExtender.pl doesn't support all the features that phyloXML offers. It can parse the files that TreeExtender.pl writes itself, allowing sequential additions of features to a tree.
-id This option allows specifying what the identifiers of internal nodes in the Newick or Nexus input tree represent. The identifiers appear directly after each ) in the tree file (59, 100 and 83 in the following example).
((A:0.12,B:0.15)59:0.11,((C:0.21,D:0.24)100:0.03,E:0.12)83:0.19);
id Node identifiers represent names of internal nodes (e.g., higher taxa, protein families)
bootstrap Node identifiers represent bootstrap proportions
posterior Node identifiers represent posterior probabilities
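
To make the position of these identifiers concrete, here is a minimal Perl sketch that pulls the labels following each ) out of a Newick string. It is only an illustration: TreeExtender.pl parses trees with BioPerl, and the function name internal_node_labels is hypothetical. The regular expression does not handle quoted labels or comments.

```perl
use strict;
use warnings;

# Return the labels that directly follow a closing parenthesis in a Newick
# string. Illustrative only; the function name is an assumption.
sub internal_node_labels {
    my ($newick) = @_;
    # A label is any run of characters other than Newick punctuation.
    my @labels = $newick =~ /\)([^():,;\s]+)/g;
    return @labels;
}

my $tree = '((A:0.12,B:0.15)59:0.11,((C:0.21,D:0.24)100:0.03,E:0.12)83:0.19);';
print join(', ', internal_node_labels($tree)), "\n";    # prints "59, 100, 83"
```

Whether these values are names, bootstrap proportions or posterior probabilities cannot be told from the tree file itself, which is why -id has to be given.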

activating the ancestral character state parser

-p This option activates the file parser for ancestral character states and is used to specify which parser to use. The different parsers interpret output from different ancestral state reconstruction programs.
mesq Mesquite
bt BayesTraits

options for parsing Mesquite files

-vt Specify the type of file you're parsing. Mesquite can generate several types of files that can be parsed by TreeExtender.pl. Currently, only two types are supported.
cont Ancestor states for continuous characters inferred with the least squares MP option
disc Ancestor states for discrete (binary) characters inferred with ML
-f1 The file containing the Mesquite ancestral states output
-vn Specify the name of the character you're using (without spaces or special characters). The program will store the information from the Mesquite output file under this variable name in the phyloXML file. When this option is not used, TreeExtender.pl uses the variable name given in the Mesquite output.

options for parsing BayesTraits files

-vt Specify the type of file you're parsing. BayesTraits can generate several types of files, of which the following two can currently be parsed by TreeExtender.pl.
double_discrete Ancestor states for two discrete variables inferred with the independent or interdependent models
multistate Ancestor states for a three-state variable inferred using the multistate model
-f1 The file containing the BayesTraits output
-f2 The file containing the data for terminal taxa (i.e., the input file used for BayesTraits)
-vn Specify the name of the character you're using (without spaces or special characters). The program will store the information from the BayesTraits output file under this variable name in the phyloXML file.
-jp This option converts probabilities of the double_discrete type to probabilities for a single discrete character. The probabilities of state 0 and state 1 for the chosen character are obtained by summing over both states of the other character.
e.g., Pr (a = 1) = Pr (a = 1, b = 0) + Pr (a = 1, b = 1)
1 Extracts the probabilities for the first character given as input to BayesTraits.
2 Extracts the probabilities for the second character given as input to BayesTraits.
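
The summation behind -jp can be sketched in a few lines of Perl. This is not the code used by TreeExtender.pl. The function name marginal_probabilities is hypothetical, and the sketch assumes that joint states are keyed as two-digit strings ('00', '01', '10', '11') in which the first digit is the state of the first character.

```perl
use strict;
use warnings;

# Collapse joint probabilities for two binary characters into marginal
# probabilities for one of them. $which is 1 or 2, mirroring the -jp values.
sub marginal_probabilities {
    my ($joint, $which) = @_;    # $joint: { '00' => p, '01' => p, ... }
    my %marginal = (0 => 0, 1 => 0);
    for my $pair (keys %$joint) {
        # Assumption: first digit = character 1, second digit = character 2.
        my $state = substr($pair, $which - 1, 1);
        $marginal{$state} += $joint->{$pair};
    }
    return \%marginal;
}

my %joint = ('00' => 0.65, '01' => 0.05, '10' => 0.20, '11' => 0.10);
my $first = marginal_probabilities(\%joint, 1);
printf "Pr(a = 0) = %.2f, Pr(a = 1) = %.2f\n", $first->{0}, $first->{1};
```

With these example values, Pr(a = 1) = 0.20 + 0.10 = 0.30 and Pr(a = 0) = 0.65 + 0.05 = 0.70.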

options for parsing list files

-vt Specify the type of variable you are parsing.
continuous Ancestor states for a continuous variable
-f1 The file containing the list of character states for each node
-vn Specify the name of the character you're using (without spaces or special characters). The program will store the information from the list file under this variable name in the phyloXML file.

Format of list file

The list file contains ancestral character state values in a simple format. It is a plain text file in which each line has a character value, followed by a tab character and a list of the terminal taxa subtended by the node to which the value applies. The taxa in the list are separated by single spaces, so taxon names must not contain spaces. This format makes it easy to come up with hypothetical ancestral values and plot them with TreeGradients. For example, states for all nodes in a rooted four-taxon tree can be listed as follows:

  • example tree
[figure: hominoid tree; the topology implied by the list file below is (orangutan,(gorilla,(chimpanzee,human)))]
  • hypothetical list file for this tree
4.13 orangutan
3.75 gorilla
2.15 chimpanzee
2.23 human
2.35 chimpanzee human
4.02 chimpanzee human gorilla
4.88 orangutan chimpanzee human gorilla
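
As an illustration of the format, the following Perl sketch reads list-file lines into a simple data structure. It follows the description above (value, tab, space-separated taxa), but it is not the parser used by TreeExtender.pl. The function name parse_list_lines and the record layout are hypothetical.

```perl
use strict;
use warnings;

# Parse list-file lines into records of { value => number, taxa => [names] }.
sub parse_list_lines {
    my (@lines) = @_;
    my @nodes;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;                     # skip blank lines
        my ($value, $taxa) = split /\t/, $line, 2;    # value, tab, taxon list
        die "Malformed line: '$line'\n" unless defined $taxa;
        push @nodes, { value => $value + 0, taxa => [ split ' ', $taxa ] };
    }
    return @nodes;
}

my @nodes = parse_list_lines(
    "2.23\thuman\n",
    "2.35\tchimpanzee human\n",
    "4.02\tchimpanzee human gorilla\n",
);
printf "%s -> %d taxa\n", $_->{value}, scalar @{ $_->{taxa} } for @nodes;
```

A line with a single taxon describes a terminal node. A line with several taxa describes the internal node that subtends exactly those taxa.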

Custom tags in phyloXML

When TreeExtender.pl parses output from ancestral trait reconstruction programs, it creates phyloXML files with custom tags at nodes. The tags are largely self-explanatory; the examples below show what they look like.

  • example of tag for discrete character
<custom>
<name>name_of_character</name>
<type>discrete</type>
<state>
<state>0</state>
<probability>0.85</probability>
</state>
<state>
<state>1</state>
<probability>0.15</probability>
</state>
</custom>
  • example of tag for double_discrete character
<custom>
<name>name_of_character</name>
<type>discrete</type>
<state>
<state>00</state>
<probability>0.65</probability>
</state>
<state>
<state>01</state>
<probability>0.05</probability>
</state>
<state>
<state>10</state>
<probability>0.20</probability>
</state>
<state>
<state>11</state>
<probability>0.10</probability>
</state>
</custom>
  • example of tag for multistate character
<custom>
<name>name_of_character</name>
<type>multistate</type>
<state>
<state>0</state>
<probability>0.519707</probability>
</state>
<state>
<state>1</state>
<probability>0.116460</probability>
</state>
<state>
<state>2</state>
<probability>0.363833</probability>
</state>
</custom>
  • example of tag for continuous character
<custom>
<name>name_of_character</name>
<type>continuous</type>
<value>2.659123</value>
</custom>

Custom tags in BioPerl trees

This program uses Bio::Tree::Tree objects to store trees in memory and attach data to the tree nodes. Character data are added as tags, and the tag name corresponds to the name of the character. Tags have the following structure:

  • example of tag for discrete character
$tag = {
'type'  =>  'discrete',
'name'  =>  'name_of_character',
'state' =>  [
{'state' => 0, 'probability' => 0.85},
{'state' => 1, 'probability' => 0.15}
]
};
  • example of tag for double_discrete character
$tag = {
'type'  =>  'double_discrete',
'name'  =>  'name_of_character',
'state' =>  [
# state codes are quoted: a bare 00 or 01 would be read as an octal number
{'state' => '00', 'probability' => 0.65},
{'state' => '01', 'probability' => 0.05},
{'state' => '10', 'probability' => 0.20},
{'state' => '11', 'probability' => 0.10}
]
};
  • example of tag for multistate character
$tag = {
'type'  =>  'multistate',
'name'  =>  'name_of_character',
'state' =>  [
{'state' => 0, 'probability' => 0.519707},
{'state' => 1, 'probability' => 0.116460},
{'state' => 2, 'probability' => 0.363833}
]
};
  • example of tag for continuous character
$tag = {
'type'  =>  'continuous',
'name'  =>  'name_of_character',
'value' =>  2.659123
};
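
To show how the two representations above relate, here is a small Perl sketch that renders a tag hash in the <custom> layout used in this guide. It is an assumption-laden illustration rather than the program's own writer: the function name tag_to_custom_xml is hypothetical, plain string building is used to avoid dependencies, and values are assumed to need no XML escaping.

```perl
use strict;
use warnings;

# Render a tag hash as a <custom> element, one XML element per line.
sub tag_to_custom_xml {
    my ($tag) = @_;
    my @out = ('<custom>', "<name>$tag->{name}</name>", "<type>$tag->{type}</type>");
    if (exists $tag->{value}) {                   # continuous character
        push @out, "<value>$tag->{value}</value>";
    }
    for my $s (@{ $tag->{state} || [] }) {        # characters with state lists
        push @out, '<state>',
            "<state>$s->{state}</state>",
            "<probability>$s->{probability}</probability>",
            '</state>';
    }
    push @out, '</custom>';
    return join("\n", @out) . "\n";
}

my $tag = {
    'type'  => 'discrete',
    'name'  => 'name_of_character',
    'state' => [
        { 'state' => 0, 'probability' => 0.85 },
        { 'state' => 1, 'probability' => 0.15 },
    ],
};
print tag_to_custom_xml($tag);
```

The same function handles the continuous case, where the hash carries a single 'value' entry instead of a 'state' list.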