Automated Ontology Building in Ecology

One of the more difficult aspects of applying “big data” thinking in ecology is the massive heterogeneity of terms. I stumble over this issue every time I work on a data set for the Encyclopedia of Life. The many different ways to describe the same habitat (among other things), and the varying granularity with which people describe habitats, make it very difficult for data consumers to find, for example, all the beetles that live in the desert. It is even more difficult to go a step further and ask for traits of beetles that live in deserts, such as color.

As a side note, that example closely resembles some of the use cases that several colleagues and I published on ways to combine phenotype and environment data.
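To make the granularity problem concrete, here is a minimal Python sketch, assuming a tiny, made-up habitat hierarchy and synonym table. The terms, the `IS_A` and `SYNONYMS` tables, and the `lives_in` helper are all hypothetical; they are not taken from EOL or from any published ontology. Each free-text habitat label is first mapped to a canonical term, and a query for “desert” then also matches anything classified beneath desert in the hierarchy.

```python
# Hypothetical toy habitat hierarchy: child term -> parent term (is_a).
IS_A = {
    "hot desert": "desert",
    "cold desert": "desert",
    "sand dune": "hot desert",
    "desert": "terrestrial biome",
    "tropical forest": "forest",
    "forest": "terrestrial biome",
}

# Hypothetical synonym table mapping free-text habitat labels to canonical terms.
SYNONYMS = {
    "sandy desert": "hot desert",
    "dunes": "sand dune",
    "arid desert": "desert",
    "polar desert": "cold desert",
    "rainforest": "tropical forest",
}


def normalize(label: str) -> str:
    """Map a free-text label to a canonical term (falls back to the cleaned label)."""
    cleaned = label.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)


def ancestors(term: str) -> set[str]:
    """Return the term itself plus every term above it in the is_a hierarchy."""
    result = {term}
    while term in IS_A:
        term = IS_A[term]
        result.add(term)
    return result


def lives_in(records, target: str) -> list[str]:
    """Return taxa whose recorded habitat is the target term or any subtype of it."""
    return sorted(
        taxon for taxon, habitat in records
        if target in ancestors(normalize(habitat))
    )


# Hypothetical records: (taxon, habitat exactly as written by the data provider).
observations = [
    ("Beetle A", "Sandy desert"),
    ("Beetle B", "dunes"),
    ("Beetle C", "polar desert"),
    ("Beetle D", "Rainforest"),
    ("Beetle E", "desert"),
]

print(lives_in(observations, "desert"))
```

Here the desert query returns Beetles A, B, C, and E. A plain text search for the word “desert” would miss Beetle B, whose habitat was recorded only as “dunes”; the hierarchy is what connects it.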

Right now, we can ask Google “How much does a narwhal weigh?” (go ahead, try it) and get the answer, thanks to the fine work my EOL colleagues and I have been doing on TraitBank, but we’ve still got a way to go before we can ask “What color are beetles that live in the desert?” We have a plan, though, and it involves semantic technology, i.e., ontologies.

Biology already has many ontologies of varying quality available for use, most of which can be found at the OBO Foundry. Not all domains of biology are well covered, however; ecology, for example, has been left out. That means there is no standard, machine-readable way of expressing which organisms are autotrophs, or nocturnal, or use camouflage, etc. Including terms such as these in an ontology is one of the many steps needed before we can ask “Which organisms are nocturnal in an alpine forest habitat?” or, to get more complicated, “Is there a relationship between the phylogeny of terrestrial, nocturnal organisms and latitude or elevation?”
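As a rough sketch of what such machine-readable statements might look like, the Python below stores trait and habitat assertions as (subject, predicate, object) triples, the same basic shape used by RDF. The taxa, predicate names such as `has_activity_pattern`, and object labels are hypothetical placeholders; a real system would use identifiers from published ontologies and a query language such as SPARQL rather than this hand-rolled matcher.

```python
# Hypothetical trait and habitat assertions as (subject, predicate, object) triples.
# All labels are made up for illustration; a real system would use ontology IDs.
TRIPLES = {
    ("Taxon 1", "has_activity_pattern", "nocturnal"),
    ("Taxon 1", "lives_in", "alpine forest"),
    ("Taxon 2", "has_activity_pattern", "diurnal"),
    ("Taxon 2", "lives_in", "alpine forest"),
    ("Taxon 3", "has_activity_pattern", "nocturnal"),
    ("Taxon 3", "lives_in", "grassland"),
    ("Taxon 4", "has_trophic_mode", "autotroph"),
}


def subjects_with(predicate: str, obj: str) -> set[str]:
    """Return every subject asserted to have the given predicate/object pair."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


# "Which organisms are nocturnal in an alpine forest habitat?" becomes the
# intersection of two simple pattern matches over the triples.
answer = (subjects_with("has_activity_pattern", "nocturnal")
          & subjects_with("lives_in", "alpine forest"))
print(sorted(answer))
```

Only Taxon 1 satisfies both patterns. The point of agreeing on shared terms is that the same query keeps working as triples from many independent data providers are added.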

Building an ontology is a large, never-ending, hugely complicated task. One of my clients at the University of Colorado Boulder is the ClearEarth project, whose goal is to repurpose natural language processing (NLP) and machine learning (ML) algorithms developed for biomedicine for use in geology and biology. These algorithms can read text and automatically generate ontologies. We’ve made a lot of progress annotating domain-specific text and will have some “auto-ontologies” by this summer. Very exciting! To support this effort, and to make sure the resulting ontologies mesh with existing bio-ontologies, we are hosting an “ontology-a-thon” in Boulder this summer. We don’t have a detailed agenda yet, but the idea is to get ontology and ecology experts in one room to curate the auto-ontology. All expenses are paid, but space is limited, so please take a look and apply if you are interested in participating.

Semantic Linking of Phenotypes and Environments

[Figure: PeerJ graphical abstract]

One of the fundamental goals of biology is understanding the interactions between environment and phenotype, but this is a surprisingly difficult topic to study, not because of the concepts but because of the data. Observations about environment and phenotype occur in separate data sets, and the terms used are far too idiosyncratic for automated integration. Several biological domains, including conservation and phylogenetics, could be advanced if these two data types could be easily merged on a large scale.
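A minimal sketch of the integration problem, assuming two made-up data sets and hand-written annotation tables: each data set's idiosyncratic wording is mapped to shared term IDs, after which the two can be joined on taxon. The `PHENO:` and `ENV:` identifiers are placeholders, not terms from any real ontology, and the `link` function is hypothetical.

```python
# Two hypothetical data sets, recorded independently, each with its own wording.
phenotype_obs = [
    {"taxon": "Species X", "trait": "dark colouration"},
    {"taxon": "Species Y", "trait": "pale body color"},
    {"taxon": "Species Z", "trait": "melanic"},
]
environment_obs = [
    {"taxon": "Species X", "site": "sandy desert"},
    {"taxon": "Species Y", "site": "xeric scrub, desert"},
    {"taxon": "Species Z", "site": "montane forest"},
]

# Hypothetical annotation tables from free text to shared term IDs.
# The IDs are placeholders, not identifiers from any published ontology.
PHENOTYPE_TERMS = {
    "dark colouration": "PHENO:dark",
    "melanic": "PHENO:dark",
    "pale body color": "PHENO:pale",
}
ENVIRONMENT_TERMS = {
    "sandy desert": "ENV:desert",
    "xeric scrub, desert": "ENV:desert",
    "montane forest": "ENV:forest",
}


def link(phenotypes, environments):
    """Annotate both data sets with shared term IDs, then join them on taxon.

    An unmapped label raises KeyError, exposing gaps in the annotation tables.
    """
    env_by_taxon = {}
    for row in environments:
        env_by_taxon.setdefault(row["taxon"], set()).add(ENVIRONMENT_TERMS[row["site"]])
    linked = []
    for row in phenotypes:
        pheno = PHENOTYPE_TERMS[row["trait"]]
        for env in sorted(env_by_taxon.get(row["taxon"], ())):
            linked.append((row["taxon"], pheno, env))
    return linked


linked = link(phenotype_obs, environment_obs)
# Which color phenotypes occur in deserts, regardless of the original wording?
desert_phenotypes = sorted({pheno for _, pheno, env in linked if env == "ENV:desert"})
print(linked)
print(desert_phenotypes)
```

Once annotated, “dark colouration” and “melanic” both resolve to `PHENO:dark`, so they can be counted together even though no text match would connect them. Raising on unmapped labels is a deliberate choice for the sketch: it surfaces annotation gaps instead of silently dropping records.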

I led a recent paper, published in PeerJ, which suggests that using ontologies to standardize and link data about phenotypes and environments can enable scientific breakthroughs by increasing the scale and flexibility of research. The paper was a product of a workshop facilitated by the Phenotype RCN and supported by the National Science Foundation. My co-authors and I give several domain-specific use cases describing how an ontology can help advance science in four areas of biology. We then discuss the challenges to be addressed, present some proof-of-concept analyses, and review existing ontologies. The summary contains three suggestions for increasing interoperability between phenotype and environment data.

We hope this paper gives you an overview of the landscape of ontologies available for integrating environmental data and inspires you to use them with your own data. For more information about ontologies and semantics, a good first read is Semantic Web for the Working Ontologist by Dean Allemang and Jim Hendler.