[Image: sea otter]

Keystone Predators and Centrality: Ecosystem as Social Network Part 2

My last post looked at a very small but well-studied rocky intertidal ecosystem, where centrality measures identified a keystone predator (Pisaster) in the network. I was worried, though, that this method would not work on a larger, more complicated system. Let’s try the same calculations on a slightly larger kelp forest ecosystem. These systems are commonly found on the west coasts of continents and are characterized by the presence of large kelps. The sea otter, Enhydra lutris, is an important predator of herbivores (e.g., sea urchins) that eat macroalgae. In the absence of sea otters, sea urchin populations explode and overgraze the kelp. Will centrality measures be able to identify the sea otter as a keystone predator?

In this network, I had trophic interactions, competition interactions, and new “habitatFor” interactions, each describing a relationship in which one taxon provides habitat for the other. My initial list contained 69 interactions, and the centrality measures all pointed to the kelp as the keystone species. This is likely because the kelp provided habitat for nearly every species in the kelp forest.

This raises an interesting question regarding our definition of keystone species. Without the kelp there is no kelp forest, and that’s why the centrality measures pointed to the kelp, but the sea otter is thought of as the keystone species in this system. An important part of the definition of a keystone species is its relative abundance. A keystone species is supposed to have a disproportionate effect on the ecosystem relative to its abundance. The kelp have a very large effect, but are also very abundant. The sea otters have a large effect and are nowhere near as abundant as the kelp. That is what makes the sea otter the keystone species and not the kelp. I can’t help but think that otters being cute and cuddly while kelp are cold and slimy has something to do with it.

An algorithm that identifies kelp as a keystone species of a kelp forest is not very helpful. The kelp are more of a foundation species. How can we identify the sea otter as a keystone species even though the kelp are far more influential? The most direct strategy is to include the relative biomass of each taxon, but this is often not known and not included in databases of networks and interactions. I would like to find a way to make the network calculations work without biomass data. So far, most of the network measures are not very helpful (they point to the kelp as the most important); the exception is closeness vitality, which is highest for the sea otter. When I do the calculations on a network made up of only the trophic interactions, as I did with the rocky intertidal system, the sea otter comes out on top in all the centrality measures. This supports the importance of dividing the network by interaction type before analysis.
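Here is a minimal sketch of what dividing the network by interaction type might look like, using plain Python and a hand-rolled degree centrality. Only the three Enhydra lutris "eats" rows come from my interaction list; every other row, the single "kelp" node, and the "resident taxon" names are illustrative placeholders, and the function names are my own. The toy data is far too small to reproduce the rankings reported above; it only shows the mechanics.

```python
from collections import defaultdict

# (subject, interaction, object) triples. Only the three Enhydra lutris rows
# are from the post; the remaining rows are illustrative placeholders.
interactions = [
    ("Enhydra lutris", "eats", "Strongylocentrotus franciscanus"),
    ("Enhydra lutris", "eats", "Strongylocentrotus purpuratus"),
    ("Enhydra lutris", "eats", "Strongylocentrotus droebachiensis"),
    ("Strongylocentrotus franciscanus", "eats", "kelp"),
    ("Strongylocentrotus purpuratus", "eats", "kelp"),
    ("Strongylocentrotus droebachiensis", "eats", "kelp"),
    ("kelp", "habitatFor", "Enhydra lutris"),
    ("kelp", "habitatFor", "Strongylocentrotus franciscanus"),
    ("kelp", "habitatFor", "Strongylocentrotus purpuratus"),
    ("kelp", "habitatFor", "Strongylocentrotus droebachiensis"),
    ("kelp", "habitatFor", "resident taxon A"),  # hypothetical
    ("kelp", "habitatFor", "resident taxon B"),  # hypothetical
]


def split_by_type(triples):
    """Group (subject, object) edges under their interaction type."""
    by_type = defaultdict(list)
    for subject, interaction, obj in triples:
        by_type[interaction].append((subject, obj))
    return dict(by_type)


def degree_centrality(edges):
    """Undirected degree centrality: distinct neighbours / (nodes - 1)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


# Whole network: every interaction type lumped together.
all_edges = [(s, o) for s, _, o in interactions]
whole = degree_centrality(all_edges)

# Trophic-only network: just the "eats" edges.
trophic = degree_centrality(split_by_type(interactions)["eats"])

for name, scores in (("all interactions", whole), ("trophic only", trophic)):
    print(name)
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"  {node}: {score:.2f}")
```

Degree centrality is only a stand-in here; the same split could feed any other measure, including closeness vitality.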

Two additional issues come to mind:

  • How can I compare centrality measures across networks with different numbers of nodes, edges, and different degrees of connectivity?
  • How does the size and granularity of a network affect the results of the connectivity calculations?

The first issue is relatively straightforward. The calculation results can be normalized against the highest value in each network, so the top-scoring node in every network always scores 1. When I do this normalization, the values for Pisaster and the sea otter are both 1 and thus comparable.
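A rough sketch of that normalization, assuming the centrality results are held in a dictionary keyed by node name. The function name and the raw scores below are made up for illustration; they are not values from either network.

```python
def normalize_to_max(scores):
    """Divide every score by the largest one, so the top node scores 1.0."""
    top = max(scores.values(), default=0)
    if top == 0:
        # Empty network or all-zero scores: nothing to scale by.
        return {node: 0.0 for node in scores}
    return {node: value / top for node, value in scores.items()}


# Hypothetical raw scores from two networks of different sizes.
intertidal = {"Pisaster": 0.8, "other taxon": 0.4}
kelp_forest = {"Enhydra lutris": 0.3, "other taxon": 0.15}

print(normalize_to_max(intertidal))
print(normalize_to_max(kelp_forest))
```

After scaling, the top node in each network sits at 1.0 and every other node is expressed as a fraction of it, which is what makes the two networks comparable.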

To explore the second issue, I played a few games with the interactions in the kelp forest ecosystem. In the original list of 69 interactions, I have some that are a bit repetitive:

  • Enhydra lutris, eats, Strongylocentrotus franciscanus
  • Enhydra lutris, eats, Strongylocentrotus purpuratus
  • Enhydra lutris, eats, Strongylocentrotus droebachiensis

Strongylocentrotus is a genus of sea urchin. Each species of sea urchin is listed as eating the same five species of macroalgae. So, the network has three nodes (the three Strongylocentrotus nodes) with identical edges. What happens to the results if I collapse these three identical species nodes into one genus node? Not much. The kelp still has the highest connectivity in the network containing all the interactions, and the sea otter still has the highest connectivity in the network with only trophic interactions. In the end, I collapsed the urchins into one genus, but the macroalgae were grouped into annual kelp and perennial kelp. Clearly, I need to develop some guidelines for lumping nodes consistently. Considering the high degree of taxonomic change in some groups, having genus- or family-specific nodes may be more desirable than species-specific nodes. In some cases a node defined by function instead of taxonomy may be better.
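Collapsing nodes can be sketched as a relabel-and-deduplicate pass over the interaction list. The three rows below are the ones listed above; the function name and the mapping dictionary are my own assumptions about how the lumping could be expressed, not a record of how I actually did it.

```python
def collapse_nodes(triples, node_map):
    """Relabel nodes via node_map and drop the duplicate edges that result."""
    seen = set()
    collapsed = []
    for subject, interaction, obj in triples:
        edge = (node_map.get(subject, subject), interaction, node_map.get(obj, obj))
        if edge not in seen:
            seen.add(edge)
            collapsed.append(edge)
    return collapsed


urchin_rows = [
    ("Enhydra lutris", "eats", "Strongylocentrotus franciscanus"),
    ("Enhydra lutris", "eats", "Strongylocentrotus purpuratus"),
    ("Enhydra lutris", "eats", "Strongylocentrotus droebachiensis"),
]

# Species-to-genus lumping; nodes missing from the map keep their own name.
to_genus = {
    "Strongylocentrotus franciscanus": "Strongylocentrotus",
    "Strongylocentrotus purpuratus": "Strongylocentrotus",
    "Strongylocentrotus droebachiensis": "Strongylocentrotus",
}

print(collapse_nodes(urchin_rows, to_genus))
```

Because the mapping is just a dictionary, the same function would handle functional groups (for example, mapping several macroalgae to "perennial kelp") as easily as taxonomic ones, which keeps the lumping rules in one explicit place.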

The data files for this work can be found in the GitHub repo.

The sea otter image is CC-BY-NC from Biopix.

Trickle Down Attribution

Last week I was in Portland, Oregon attending the annual meeting of Force11, a community interested in the future of research communications. There were many great speakers and panel discussions, but what interested me the most was the unveiling of OpenVIVO. Anyone with an ORCiD can “claim” their OpenVIVO profile. I logged in using my ORCiD and my research output was instantly imported into my OpenVIVO profile. As new works were added, I was asked to claim my role in creating them. These roles went far beyond traditional authorship. I could get specific credit for data curation, graphic design, being the equipment technician, and many other roles by clicking on check boxes. All of these roles were part of the VIVO-ISF ontology that helps standardize contribution types across institutions and disciplines.

I have an OpenVIVO profile that lists publications and data sets, but my profile information doesn’t stop there. Each publication and data set has an Altmetric badge. Here is an example from one of my more widely tweeted works. The badge is the “rainbow donut” in the upper right. Clicking on the donut will take you to a summary page at Altmetric that gives more information about how people have been interacting with my publication. Altmetric creates these colored donuts using data from 15 different “sources of attention”. The number in the middle of the donut is automatically calculated as a weighted count of all the attention the research product has received. It is hard to know the true meaning of these metrics, but I’m interpreting them as a measure of immediate interest. Time may prove otherwise, but I consider research products that received more attention to be more interesting to the community. I can get this information for publications and data sets, but what about individual data points?

Part of the work that I do for the Encyclopedia of Life involves EOL TraitBank, a semantic database for species traits. The TraitBank data model separates individual data points so that data sets can be pulled apart and reassembled to respond to user queries. The data in TraitBank come from many different providers. Every datum is labeled with attribution that can include a Creator, Publisher, Contributor, and a bibliographic reference. TraitBank users are asked to cite the original data provider, so that credit can be assigned to the data source. When I saw the Altmetric badge in OpenVIVO, my first thought was to apply these metrics to TraitBank data sets and add EOL as a source of attention. The data providers would have additional information about how their data are being used, and EOL would have a better measure of how much value (in the form of increased attention) it was providing.

As it turns out, citation and attribution get tricky when parts of thousands of data sets are recombined and analyzed to create a new data product. Most authors would rather cite one TraitBank download as the source of all their data instead of citing the hundreds or thousands of smaller data sets that make up the new data set. This is understandable in a printed manuscript, but it should be less of an issue in the digital age. In theory, an Altmetric donut can be applied to individual data points, data sets, and combinations of data sets. The citation of a published meta-analysis that uses millions of data points from thousands of data sets should trickle all the way down to attribute the studies that produced the original data. Important data sets (or even data points) could be identified and the provenance of meta-analyses could be improved.
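To make the idea concrete, here is a toy sketch of a citation trickling down a provenance chain. It assumes every product has an identifier and a record of what it was derived from, which is exactly the infrastructure that does not exist yet. All identifiers and the function name are hypothetical.

```python
def trickle_down(cited, derived_from):
    """Return every upstream source that should share credit for a citation."""
    credited = set()
    stack = [cited]
    while stack:
        item = stack.pop()
        for source in derived_from.get(item, ()):
            if source not in credited:  # credit each source once, even if
                credited.add(source)    # it is reachable by several paths
                stack.append(source)
    return credited


# Hypothetical provenance records: product -> the things it was built from.
derived_from = {
    "meta-analysis": ["traitbank-download"],
    "traitbank-download": ["dataset-A", "dataset-B"],
    "dataset-A": ["original-study-1"],
    "dataset-B": ["original-study-1", "original-study-2"],
}

print(sorted(trickle_down("meta-analysis", derived_from)))
```

Citing the meta-analysis here credits the download, both data sets, and both original studies. The same walk would work at the level of individual data points if each one carried its own identifier.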

The “trickle down attribution” problem has existed almost as long as there have been scientific publications. Chains of citation are too easily broken or lost for articles and are much harder to track for individual data points. Recreating the chains using References Cited sections of published articles would likely result in misapplied attribution. Going into the future we can keep better track of use, but a major impediment is the lack of unique and persistent identifiers for data points and data sets. Assuming we had these identifiers in place, a standard for describing a data set and its constituent parts could provide the infrastructure needed to make “trickle down attribution” a reality.