
The goal of phyf is to implement a tibble subclass useful for statistical modelling on phylogenetic trees. It mainly implements a phylogenetic flow (pf) object, which is essentially a tibble with one or more phylogenetic flow collection (pfc) columns. Phylogenetic flow collection columns are collections of phylogenetic flow paths (pfp) and phylogenetic flow features (pff), which store data on how information flows through a phylogeny from its root node (phylogenies must be rooted to work with phyf) to its tips (and its internal nodes). This allows for easy manipulation of the phylogeny and associated data. The objects are used in the package fibre for phylogenetic branch regression models, a highly flexible framework for comparative analysis and for modelling trait evolution across a phylogeny.

Installation

You can install the development version of phyf from GitHub with:

# install.packages("devtools")
devtools::install_github("rdinnager/phyf")

Example

library(phyf)
#> 
#> Attaching package: 'phyf'
#> The following object is masked from 'package:stats':
#> 
#>     pf
library(ape)

Create a random phylogeny for testing:

test_tree <- rtree(100) 
plot(test_tree)

Turn it into a tibble-based phylogenetic flow format (pf) using pf_as_pf():

tree_pf <- pf_as_pf(test_tree)
tree_pf
#> # A tibble: 198 × 3
#>    label is_tip phlo                                                            
#>    <chr> <lgl>  <pfc>                                                           
#>  1 t62   TRUE   ◎── 0.51──→ Node2 ── 0.93──→ No…── 0.23──→ Node4 ── 0.14──→ t62 
#>  2 t55   TRUE   ◎── 0.51──→ Node2 ── 0.93──→ No…── 0.61──→ Node6 ── 0.66──→ t55 
#>  3 t74   TRUE   ◎── 0.51──→ Node2 ── 0.93──→ No…── 0.61──→ Node6 ── 0.28──→ t74 
#>  4 t1    TRUE   ◎── 0.51──→ Node2 ── 0.93──→ No… ── 0.99──→ Node7 ── 0.30──→ t1 
#>  5 t30   TRUE   ◎── 0.51──→ Node2 ── 0.93──→ No…── 0.99──→ Node7 ── 0.91──→ t30 
#>  6 t19   TRUE   ◎── 0.51──→ Node2 ── 0.93──→ No…─ 0.13──→ Node10 ── 0.52──→ t19 
#>  7 t54   TRUE   ◎── 0.51──→ Node2 ── 0.93──→ No…─ 0.17──→ Node11 ── 0.98──→ t54 
#>  8 t10   TRUE   ◎── 0.51──→ Node2 ── 0.93──→ No…─ 0.36──→ Node12 ── 0.13──→ t10 
#>  9 t34   TRUE   ◎── 0.51──→ Node2 ── 0.93──→ No…─ 0.36──→ Node12 ── 0.02──→ t34 
#> 10 t14   TRUE   ◎──0.508──→ Node2 ──0.931──→ No…─0.097──→ Node14 ──0.162──→ t14 
#> # … with 188 more rows

It is easy to join data to this tibble using the label column, which includes the tip labels from the phylogeny as well as ‘NodeXX’ labels for internal nodes, where XX starts at 1 and goes up to the total number of internal nodes (the names are generated using ape::makeNodeLabel()). You can also use the node_num column (not yet implemented) to join by node number, where the node number refers to the node at the end of each phylogeny edge and follows the traditional node ordering used by the ape package in phylo objects. Usually you will only have data on the tips, so when joining to the pf object (using e.g. dplyr::left_join()) the internal node rows will receive NA values. This is the desired behaviour. The missing values are easy to drop when fitting a model, but are useful later when making predictions (that is, ancestral state estimates).
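The join described above can be sketched as follows. This is a minimal illustration rather than an official workflow: the trait_dat table and its trait column are made up for the example, and it assumes that dplyr verbs work on a pf object in the same way as on an ordinary tibble.

```r
library(phyf)
library(ape)
library(dplyr)

set.seed(1)
test_tree <- rtree(100)
tree_pf <- pf_as_pf(test_tree)

# Hypothetical trait data, available for the tips only
trait_dat <- tibble::tibble(
  label = test_tree$tip.label,
  trait = rnorm(length(test_tree$tip.label))
)

# Join by label; rows for internal nodes have no match and get NA for trait
tree_dat <- left_join(tree_pf, trait_dat, by = "label")

# Rows with observed data, e.g. for fitting a model
model_dat <- filter(tree_dat, !is.na(trait))

# Rows without data (the internal nodes), e.g. as targets for prediction
pred_dat <- filter(tree_dat, is.na(trait))
```

Keeping the internal node rows in tree_dat, rather than dropping them at the join, means the same table can later be used to request ancestral state estimates.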

Built-in Datasets

phyf includes a number of built-in datasets to demonstrate the format. For example, the AVONET dataset can be accessed as avonet:

avonet
#> # A tibble: 13,338 × 39
#>    label      is_tip phlo            Speci…¹ Family3 Order3 Total…² Female  Male
#>    <chr>      <lgl>  <pfc>           <chr>   <chr>   <chr>    <dbl>  <dbl> <dbl>
#>  1 Struthio_… TRUE   ◎──   2…amelus  Struth… Struth… Strut…       8      1     4
#>  2 Rhea_amer… TRUE   ◎── 26.…ricana  Rhea a… Rheidae Rheif…       5      2     2
#>  3 Rhea_penn… TRUE   ◎── 26.…ennata  Rhea p… Rheidae Rheif…       6      3     2
#>  4 Apteryx_a… TRUE   ◎──26.8…tralis  Aptery… Aptery… Apter…       6      2     2
#>  5 Apteryx_m… TRUE   ◎──26.8…ntelli  Aptery… Aptery… Apter…       4      2     2
#>  6 Apteryx_o… TRUE   ◎──26.8…owenii  Aptery… Aptery… Apter…       5      2     3
#>  7 Apteryx_h… TRUE   ◎──26.8…aastii  Aptery… Aptery… Apter…       9      6     3
#>  8 Dromaius_… TRUE   ◎──26.8…andiae  Dromai… Dromai… Casua…       5      2     3
#>  9 Casuarius… TRUE   ◎──26.8…uarius  Casuar… Casuar… Casua…       7      2     3
#> 10 Casuarius… TRUE   ◎──26.8…nnetti  Casuar… Casuar… Casua…       4      1     0
#> # … with 13,328 more rows, 30 more variables: Unknown <dbl>,
#> #   Complete.measures <dbl>, Beak.Length_Culmen <dbl>, Beak.Length_Nares <dbl>,
#> #   Beak.Width <dbl>, Beak.Depth <dbl>, Tarsus.Length <dbl>, Wing.Length <dbl>,
#> #   Kipps.Distance <dbl>, Secondary1 <dbl>, `Hand-Wing.Index` <dbl>,
#> #   Tail.Length <dbl>, Mass <dbl>, Mass.Source <chr>, Mass.Refs.Other <chr>,
#> #   Inference <chr>, Traits.inferred <chr>, Reference.species <chr>,
#> #   Habitat <chr>, Habitat.Density <dbl>, Migration <dbl>, …