EOL Meeting and Report

In mid-April I attended the first review of the Encyclopedia of Life Biodiversity Informatics Group (BIG). This is the team based at the Marine Biological Laboratory, Woods Hole, that is charged with delivering the web component of the Encyclopedia of Life project. I was there as a member of EOL’s Informatics Advisory Group (IAG), which was present to take stock of the bioinformatics component one year into this ten-year project. The review culminated in a report that highlights priorities for BIG in the coming months. Rod Page sets out the tone of this review in EOL’s official blog, and I hope that the full document will be publicly available soon. In the meantime, here are a few personal impressions based on the major themes from the meeting:

  • The meeting was larger and a little more managed than I expected. In addition to the advisory group (all 8 of us), those attending included the BIG team (about 15 people), representatives from EOL’s steering committee, the Biodiversity Heritage Library, BioSynC (Field Museum), EOL’s outreach group, and a few other observers (notably Google and ITIS). My initial concern that people might feel inhibited about speaking out in such a large group proved groundless; the discussion was more spirited than I thought it might be.
  • The BIG consists of a team of really talented people who are breathing new life into a field that desperately needs their help. This gives me confidence that EOL can deliver on its ambitious goal of a web site for all 1.8 million species. However, they are working within the confines of a remit set by a steering committee that meets infrequently and does not necessarily grok the potential of the web as a mechanism for generating knowledge, or the way scholarly communication is changing.
  • The traditional approach that EOL has taken to writing this encyclopedia does not scale. The initial 24 exemplar pages contain an average of 50 data elements, each of which required the permission of the rights holder before use. The labor-intensive process of getting this permission means that at the current rate EOL would write about 100 pages per year, and not the 1,000 pages per working day that it needs to deliver all 1.8 million species pages in the remaining 9 years. Notwithstanding other issues (like who will write these pages, and why), EOL needs to take a different approach to generating this encyclopedia if it is to achieve its goal, let alone realize its true potential.
  • Quality and vetting. EOL will always display some information that is incorrect. It already does (many GBIF maps contain errors) and no project of this scale could expect anything else. The trick is to find automated ways of prefiltering erroneous data before it appears on the site, and innovative ways that allow users to flag (and perhaps correct) spurious data after publication. BIG clearly understands this, but I’m not sure it is understood by all those involved with the project. EOL will always be in a state of permanent beta, but this notion runs counter to the expectations of those traditionally involved in the process of generating and publishing science.
  • Attribution. The people contributing to EOL will need to be credited for their efforts, but the traditional ways of doing this do not scale for EOL. Pages will always be multi-authored, and in many cases the sheer number of contributors would obscure the content if their names and logos were to appear on every page that they have contributed to. While there is some consensus on how to tackle this problem, I think there is a long way to go before it is resolved. Without an innovative (some might say courageous) approach, this problem could cripple the development of EOL.
  • Globally Unique Identifiers (GUIDs). Unique identifiers are a really boring subject, but in principle they are the solution to many of EOL’s problems, especially issues surrounding credit, citation and integration. The issue of identifier persistence is essentially a social problem. Identifiers can be persistent if EOL agrees to keep them so, and has the financial resources to maintain them. EOL can outsource the problem (for example, it might use DOIs) and this has advantages for some EOL objects, but ultimately the cost would be too high for this approach to work for every object within EOL.
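
To put rough numbers on the scaling point above, here is a back-of-the-envelope sketch in Python. The species total, the remaining years, and the current rate are the figures quoted above; the 250 working days per year is my own assumption, and the function names are purely illustrative.

```python
# Back-of-the-envelope check of the page-writing rates discussed above.
TOTAL_SPECIES = 1_800_000       # target number of species pages (from the post)
YEARS_REMAINING = 9             # years left in the project (from the post)
CURRENT_PAGES_PER_YEAR = 100    # estimated current rate (from the post)
WORKING_DAYS_PER_YEAR = 250     # ASSUMPTION: not stated in the post


def required_pages_per_working_day(total, years, working_days_per_year):
    """Pages that must be completed each working day to finish on time."""
    return total / (years * working_days_per_year)


def years_at_current_rate(total, pages_per_year):
    """Years needed to finish if the current rate never changes."""
    return total / pages_per_year


required = required_pages_per_working_day(
    TOTAL_SPECIES, YEARS_REMAINING, WORKING_DAYS_PER_YEAR
)
years_needed = years_at_current_rate(TOTAL_SPECIES, CURRENT_PAGES_PER_YEAR)

print(f"Required: {required:.0f} pages per working day")
print(f"At the current rate: {years_needed:.0f} years")
```

Under that assumption the required rate comes out at several hundred pages per working day, the same order of magnitude as the 1,000 quoted above, while the current rate would take thousands of years. The exact daily figure depends on how many working days you count, but the conclusion does not.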

After the meeting I spent a productive couple of days working with David Shorthouse. David is tackling some of the problems associated with assigning GUIDs to EOL content, and is also leading the development of EOL’s participatory component. For this, David is taking a similar approach to the one I have taken with the Scratchpad project hosted at the Natural History Museum, and while I was there we talked about possible synergies between EOL and the Scratchpads. As with my last trip to Woods Hole back in August 07, I left with mixed feelings. EOL has extraordinary potential, but it is going to take time to realize it.