While waiting in the pouring rain for the bus from campus in La Jolla to my sopping wet bike in Hillcrest (smart move of the day #1 was leaving my bike helmet attached to my bike), I read a recent article by Aaron Ellison and Brian Dennis (http://www.esajournals.org/doi/abs/10.1890/080209) suggesting that all ecologists should have two terms of college-level calculus, a touch of linear algebra and several probability courses. That’s a daunting list for a good number of ecologists (and I don’t exactly measure up, though I did sit in on linear algebra in my final year of grad school, clearly the perfect time to absorb new info): some far exceed it, and many avoid differential equations with dread.
That, and a recent project re-synthesizing a >30-year dataset in plant phenology, have got me wondering what the requirements should be for the more technical aspects of analyzing data. Lots of people say the future of ecology is synthesizing big datasets and asking questions on global scales, but that actually requires some key skills. I came out of grad school with a basic knowledge of R and thought I was ahead of the curve. Now that I am processing a number of long-term datasets, I’m impressed by all the basics needed just to keep a decent pace, and also by how much you can do with them.
For the 30-year dataset we needed to read in and manipulate hundreds of xls files into one nice, usable csv. Despite learning a couple of new coding languages in the last year I can’t do that, but luckily Jim Regetz, a rare mix of ecologist and computer programmer, can (in Perl mixed with R). I play backup singer and help with post-processing in R. To do just that somewhat efficiently I have: a nice monitor (because monitors >24” but <30” increase productivity: http://www.kentshaffer.com/increase-your-productivity-with-a-24-computer-monitor/); a version control system to keep track of the code Jim and I share; project management software that tallies all the project issues and their someday resolutions, with key notes along the way; an editor I love (emacs) that I use for all my code, including R, where I do 95% of my data work; and an IRC chat so Jim and I can discuss what the word 'satlks' means (since he’s in sunny Santa Barbara and I apparently live in a puddle). And this is for just one of 29 datasets (though I admit the gnarliest).
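For a rough sense of what that many-files-into-one-csv step involves, here is a minimal sketch in Python. It is not Jim’s Perl-and-R code, and it leans on some loud assumptions: the xls files have already been exported to csv (reading xls directly needs extra libraries), each file has a header row, and the function name `combine_csv_files` and the `source_file` column are hypothetical names made up for illustration.

```python
import csv
from pathlib import Path


def combine_csv_files(input_dir, output_path):
    """Stack every .csv file in input_dir into one CSV at output_path.

    Assumes each file starts with a header row. Columns in the output are
    the union of all headers, in first-seen order. A hypothetical
    'source_file' column records which file each row came from.
    Returns the number of data rows written.
    """
    # List the inputs up front; keep output_path outside input_dir so a
    # later run does not pick up its own output.
    paths = sorted(Path(input_dir).glob("*.csv"))

    # First pass: collect the union of column names across all files.
    columns = ["source_file"]
    for path in paths:
        with path.open(newline="") as handle:
            header = next(csv.reader(handle), [])
        for name in header:
            if name not in columns:
                columns.append(name)

    # Second pass: write every row, leaving columns a file lacks blank.
    # A row with more values than its header raises ValueError instead of
    # being silently truncated.
    rows_written = 0
    with open(output_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=columns, restval="")
        writer.writeheader()
        for path in paths:
            with path.open(newline="") as handle:
                for row in csv.DictReader(handle):
                    row["source_file"] = path.name
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Keeping the file name on every row is a design choice rather than a requirement: when one of the hundreds of inputs turns out to be odd, it is much easier to trace a strange value back to where it came from.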
My monitor today while working on one dataset (I highly recommend my new Dell Ultrasharp 27-inch by the way, and I am not being paid to say that).
I am lucky to have an NCEAS working group that got the version control and project management stuff up and running, but I am daunted (well, first I am daunted that someday I need to set all these things up for myself, by myself) by how much more we could get done in ecology if there were more people like Jim, slowly filtering out the best resources for being productive and making them useful to labs and students. For most of grad school I used JMP and avoided scripting, but I am no longer convinced it’s any harder to program my brain to remember to type a single-word command than to remember which menu, submenu, right-click sequence to use to find the same command already pre-programmed for me. (And it’s of course immensely more useful for me to have all my code, crufty comments and all, to go back to, than to open JMP and a bunch of different, befuddled, saved file steps.)