Parts of the taxonomic community just don't get sustainability. I have always known this was a problem, but two events this week demonstrate just how much work there is to do in explaining why sustainability matters. Earlier this week I received a series of e-mails on the TDWG mailing list saying that the websites for the two LSID projects on SourceForge are broken (see here and here). For the uninitiated, LSID stands for 'Life Science Identifier'. LSIDs are supposed to be the Globally Unique IDentifier (GUID) of choice for the taxonomic community. In essence, this is the system of numbering (a barcode, if you like) that we give biodiversity data so that we can find it again electronically. In theory, LSIDs were our community’s way of guaranteeing the sustainability (i.e. citability) of biodiversity data, and it is thus deeply ironic that the LSID project has itself proven unsustainable.
I have never been a big fan of LSIDs, as few within the community seem to be able to technically implement them, and even fewer people understand the social conventions (i.e. persistence in perpetuity) we must adopt to make them work. I have always understood this social challenge to be far greater than the technical challenge. However, my identifier of choice (URIs, specifically URLs) came in for a bit of a bashing this week, when a technical administrator from the American Museum of Natural History (AMNH) wrote to me asking to change a link on my website because the ENTIRE DOMAIN of the American Museum was about to change!!! Everything at "http://amnh.org" is going to be moved to a new host at "http://www.american-mnh.org" and the old domain is going to be "released for charitable purposes". In other words, as of June 1st 2009, all links to anything (data, papers, webpages etc.) that point to amnh.org will break! [INSERTED 31 March, 09: AMNH have confirmed this was a hoax. See my comment below]
To be fair, I have not taken the trouble to check this out. Indeed, when I mentioned this to a colleague, they thought it must be a joke. Unfortunately I don’t think it is. Copied below is the original message I received. If someone can confirm the veracity of this message (and what "released for charitable purposes" actually means), do let me know:
MESSAGE SENT ON MARCH 19, 2009
Dear Mr. Smith,
my name is Andy Braxton, I am Technical Director at the "American Museum of Natural History" in NY.
I am writing to you today because as of June 1st, 2009 our website at http://www.amnh.org will no longer be accessible at this domain-name. Instead, we have moved our website to a new host, namely: http://www.american-mnh.org.
I therefore ask you to change your links to our website on the following subpage of yours:
Your Subpage: http://www.vsmith.info
Currently links to: http://www.amnh.org
Please change to: http://www.american-mnh.org
Please note that as of June 1st, 2009 all links pointing to amnh.org will be invalid. Our old domain-name has been released for charitable purposes and it will thus cease to show our content.
Thank you for your cooperation!
Sincerely Yours,
Andy Braxton (Technical Director)
American Museum of Natural History
Central Park West at 79th Street
New York, NY 10024-5192
USA
Mail: abraxton@american-mnh.org
Web: http://www.american-mnh.org
Comments
AMNH Domain Address NOT changing
Re: Sustainability matters in informatics
Also, on your statement regarding LSIDs "few within the community seem to be able to technically implement them": if a given project/community wants the various features offered by LSIDs, then it is always going to be a non-trivial thing to implement + maintain such a system, LSID-based or not. If you use URLs (as the W3C wants you to) and some hodge-podge mix of approaches to handle metadata etc., I predict that would be no easier to do than implementing an LSID-based system, except for the fact that the LSID-protocol specifies the various desired features explicitly (it's what it was designed for!).
On the other hand, if one *just* wants a permanent link to one's stuff and is not interested in the rest of the feature set, then LSIDs are probably overkill and using PURLs would be a simpler solution, sure. Use the right tool for the job, as the saying goes :)
Lastly, one main reason for LSIDs not being endorsed by the W3C is that they don't 'do something' when you put them in a browser URL-box, and (more importantly) can't be resolved by e.g. Semantic Web reasoners, which can only follow http-links. But this shortcoming could be largely worked around by setting up one or more resolver services for LSIDs, using HTTP URLs (http://lsid.resolver-thingy.org/urn:lsid:...), and then embedding these URLs in, say, RDF-XML data.
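To make the workaround concrete, here is a minimal Python sketch of wrapping an LSID in an HTTP URL. It assumes the usual five- or six-part LSID layout (urn:lsid:authority:namespace:object, with an optional revision) and, purely as an illustration, uses the http://lsid.tdwg.org/summary/ prefix that appears in the resolver links quoted elsewhere in this thread. The function names parse_lsid and lsid_to_http_url are hypothetical and not part of any LSID toolkit.

```python
def parse_lsid(lsid: str) -> dict:
    """Split an LSID into its named parts, raising ValueError if malformed."""
    parts = lsid.split(":")
    # Expected layout: urn:lsid:authority:namespace:object[:revision]
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID (wrong number of parts): {lsid!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID (missing urn:lsid prefix): {lsid!r}")
    if any(p == "" for p in parts[2:5]):
        raise ValueError(f"not an LSID (empty component): {lsid!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


def lsid_to_http_url(lsid: str,
                     resolver: str = "http://lsid.tdwg.org/summary/") -> str:
    """Return an HTTP URL that hands the LSID to a resolver service.

    The default resolver prefix is an assumption for illustration only;
    any proxy that accepts the LSID as the trailing path would do.
    """
    parse_lsid(lsid)  # validate before building the link
    return resolver.rstrip("/") + "/" + lsid


if __name__ == "__main__":
    example = "urn:lsid:ubio.org:namebank:11815"
    print(parse_lsid(example)["authority"])
    print(lsid_to_http_url(example))
```

A link built this way can be followed by an ordinary browser or an HTTP-only reasoner, while the LSID itself stays intact inside the URL and can be recovered by stripping the resolver prefix.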
Simon: whatever the reason for it, the fact remains that the Sourceforge site is down and has been for quite some time, and it's just not encouraging when one is promoting the use of LSIDs to e.g. PIs and the main site for the project (top on the list of Google results!) is dead.
Re: Re: Sustainability matters in informatics
LSID infrastructure was never down
- http://lsid.tdwg.org/
- http://lsid.tdwg.org/summary/urn:lsid:ubio.org:namebank:11815
- http://lsid.tdwg.org/summary/urn:lsid:ubio.org:classificationbank:1164063
- http://lsid.tdwg.org/summary/urn:lsid:indexfungorum.org:names:213649
- http://lsid.tdwg.org/summary/urn:lsid:gdb.org:GenomicSegment:GDB132938
- http://lsid.tdwg.org/summary/urn:lsid:ipni.org:names:30000959-2
The actual LSID infrastructure, including the TDWG LSID resolver and the LSID authorities listed there, has always been operational, as you can see from the links above. The open source community website that supports software developers involved in implementing LSID clients and resolvers was never off-line either. See the following links:
- http://sourceforge.net/projects/lsids (open source collaboration site)
- http://sourceforge.net/project/showfiles.php?group_id=198923 (downloads)
- http://lsids.svn.sourceforge.net/viewvc/lsids/ (version control - subversion)
- http://sourceforge.net/mailarchive/forum.php?forum_name=lsid-developer (LSID developer forum)
What has been down is the LSID project website that presents the technology, with a few pages of information and links to various resources. That site has been down because SourceForge changed their website provider configuration (with previous notice), which broke our setup, and we were not able to restore it. But we are working on it. So, apart from the loss of a bit of the documentation and information about LSIDs, the underlying glue that keeps these links up and running kept working without much human intervention. It is indeed unfortunate that we were not able to restore the LSID website, but never for a moment did the LSID infrastructure stop working. So I suppose that your argument is moot. I hope this clarifies the matter a bit.
Regards,
Ricardo Pereira
Volunteer Systems Administrator
Biodiversity Information Standards - TDWG
Re LSID infrastructure was never down
Re Re LSID infrastructure was never down
What worries me about NOT adopting a new identifier system as we move into the Semantic Web is that we start to hack and kludge our way to full functionality by adding novel behaviours on top of URLs, or start putting the "intelligence" of where to find data/metadata into redirects, purl URLs, or other nasty, centralized, and IMO unsustainable architectures.
See also more on this debate (aka 'URLs vs LSIDs wars') going back 2-3 years on the W3C SemWeb mailing list:
http://lists.w3.org/Archives/Public/public-semweb-lifesci/
Re Re LSID infrastructure was never down
Ultimately this comes down to a cost vs. benefits discussion, and [in my opinion] the bottom line is that the benefits are not enough to justify the costs when it comes to LSIDs. In part this is because our solution has to transcend the life sciences, and there is little sign of any buy-in from other communities.
Thanks for the link to Mark Wilkinson's blog post on this subject. Mark's statement "The Browser is going the way of the Dodo!" is emblematic of the gulf between developers and users. For developers, browsers just get in the way of machine-to-machine interactions. In contrast, users are now more reliant on web browsers than ever before. The fact that LSIDs cannot be resolved in browsers without additional software IS a major problem for most users, whose knowledge of IT does not transcend a Web browser. It’s much less of an issue for a developer.
Vince. I would contend the
However, I should say that I am myself slowly changing opinion on the GUID front away from favoring LSIDs. Lately I'm thinking that DOIs are perhaps the GUID technology/infrastructure to turn to, in particular if certain changes are made to the DOI-registration pricing scheme (far cheaper bulk price per DOI, e.g. for mass-tagging say 0.5M elements). For certain things in my domain of interest (cataloging results from genome-wide scans for disease-associated variants), assigning DOIs to whole datasets is almost a no-brainer, and possibly relatively easy to implement by extending initiatives such as this one:
"Publication and Citation of Scientific Primary Data" (STD-DOI) is a project funded by the German Science Foundation. Its aim is to make primary scientific data citeable as publications...
You're joking...
A project's website going down for a period of time doesn't mean that the project itself has died http://lists.tdwg.org/pipermail/tdwg-tag/2009-March/000386.html.
"Persistence in perpetuity" is a problem for LSIDs, and ALL other GUIDs.
As for saying you prefer URIs over LSIDs, that's just silly, LSIDs are URIs.
Re. You're joking...
Oh, you must be kidding