22 July 2014

Morning

  • Spent some time this morning reading on machine learning. Discovered Metacademy as a potentially useful resource. Also found this old Coursera course run by Geoff Hinton from University of Toronto: Neural Networks for Machine Learning. I am still kicking myself for not realizing that Univerisity of Toronto has a world-class machine learning department, while I was there, so I could have taken advantage of it. How did I not know that? To be fair, though, I had not really gotten into machine learning yet when I finished my Ph.D. there.

  • Worked on a script to run Artificial Life Network (ALF) from R. Script uses whisker to plug R values into a mustache template of a basic ALF configuration file, then runs ALF on it. This just does what I need it to for my Alignment-free Diversity (#AFD) project for now. But I hope I can expand this into a full framework for running all of ALF functionality. The obvious name for this package should be Ralf (I’ve already started a skeleton package with this name on github :)

Afternoon

  • Split RTDs from #WBHC project into training and test set. WHy did this take so long to figure out? Oh well, did it with dplyr, so was pretty fast even with huge data once I got the code figured out.

New Project: Phylogenetic modelling of co-occuring clades

Did some thinking about a new model to analyze data where you have occurence data for two different taxonomic groups, where you think one of the taxonomic group’s occurence is related to the occurence of the other taxonomic group. e.g. Herbivores and plants, parasites and hosts, etc.

The idea is to treat the occurence of the plants or hosts as fixed, so think of them as a property of a site, which is valid either when one of the clades is experimentally manipulated, or one of the clades is long-lived and sessile compared to the other clade. Then you can model the herbivore’s and parasite’s occurence using logistic regression, with the presence or absence (or abundance) of the hosts as predictors. But since more than one host can occur at a site, this produces a type of mixture model. A good example of this is a paper from MEE: How many hosts?. This is a really great paper by Peter Vesk et. al. I want to take a simplified version of the model in this paper and mash it with the phylogenetic models of network interactions in Jarrod Hadfield et al.’s A tale of two phylogenies.

The idea would be to model an association matrix between host and parasite incorporating both host and parasite phylogeny and then use the association matrix to model the probability of occurence of the parasite when the host is observed, all in one model. There are a number of potential challenges in this, not the least of which will be finding the right software to implement. I think it can be done in Stan. I will try and work out some basic code to do it this week. I have already talked with Eric Lind about applying this model to Cedar Creek Arthropod data. I can test it using Kimberley rainforest patch bird and plant data, for which I have phylogenies.

I think the hashtag acronym for this project will be #PACC, which stands for Phylogenetic Analysis of Co-occuring Clades.



blog comments powered by Disqus