iSpecies


Offline

Tue, 2008-06-10 22:32
iSpecies was offline for a few hours today. I moved it from a folder in my user directory to the /Library/WebServer folder on the web server, and associated ispecies.org with its own IP address (although it is still served from the same machine). Glasgow University's DNS seems to take a while to update, so the site appeared to be broken. A quick external check using Network-Tools.com confirmed that ispecies.org had the new IP address, but locally it was still resolving to the holding page of 123-reg, with whom I registered the domain. By fussing with the VirtualHost directive in the Apache httpd.conf file, I managed to get it working again:
NameVirtualHost 130.209.46.63
<VirtualHost 130.209.46.63>
    DocumentRoot "/Library/WebServer/ispecies"
    ServerName ispecies.org
    ServerSignature email
    DirectoryIndex index.php index.html index.htm index.shtml
    LogLevel warn
    HostNameLookups off
    <Directory "/Library/WebServer/ispecies">
        allow from all
        Options +Indexes
    </Directory>
</VirtualHost>
The only difference users may notice is that the URLs will now always start with http://ispecies.org.
Categories: Cybertaxonomy

Building the encyclopedia of life

Thu, 2008-05-01 06:22

iSpecies is very limited in the sources it uses, and also in what it extracts from its sources. The sources it does query contain a wealth of information. As an example, GenBank sequence AF131710 from Ligophorus mugilinus has the following information about this animal:


FEATURES             Location/Qualifiers
     source          1..374
                     /organism="Ligophorus mugilinus"
                     /mol_type="genomic DNA"
                     /specific_host="Mugil cephalus"
                     /db_xref="taxon:92200"
                     /country="France"


Note the tags "/specific_host" and "/country". By parsing this record we learn that this organism is found in France, and is hosted by Mugil cephalus.
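
As a rough illustration of that parsing step, here is a minimal Python sketch that pulls the /key="value" qualifiers out of a feature block with a regular expression. The function name parse_source_qualifiers is made up for this example, and a real GenBank parser would need to handle more cases (unquoted values, multiple features, and so on), so treat it as a sketch rather than what iSpecies actually does.

```python
import re

def parse_source_qualifiers(feature_text):
    """Collect /key="value" qualifiers from a GenBank feature block.

    Hypothetical helper for illustration only. Values are returned as
    lists because a qualifier such as /db_xref can occur more than once.
    """
    qualifiers = {}
    for key, value in re.findall(r'/(\w+)="([^"]*)"', feature_text):
        # Long values may wrap across lines, so collapse the whitespace.
        cleaned = " ".join(value.split())
        qualifiers.setdefault(key, []).append(cleaned)
    return qualifiers

# The feature block quoted above, for sequence AF131710.
record = '''
FEATURES             Location/Qualifiers
     source          1..374
                     /organism="Ligophorus mugilinus"
                     /mol_type="genomic DNA"
                     /specific_host="Mugil cephalus"
                     /db_xref="taxon:92200"
                     /country="France"
'''

qualifiers = parse_source_qualifiers(record)
print(qualifiers["organism"], qualifiers["specific_host"], qualifiers["country"])
```

From the resulting dictionary, the host and country can be attached directly to the species page.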

In the same way, the Google Scholar results could be more effectively used. In many cases we could follow the links to get abstracts of articles, then use literature data mining techniques (e.g., Hirschman et al.) to extract information on the organism's ecology, etc.

Extracting this sort of information would be one way to automate the construction of an encyclopedia of life.
Categories: Cybertaxonomy

Wikipedia on iSpecies

Tue, 2008-03-25 16:23

I've added snippets from Wikipedia to iSpecies results, in part inspired by FreeBase. This makes use of the XML export format. For example, the URL http://en.wikipedia.org/wiki/Special:Export/Luzon_Montane_Forest_Mouse returns XML, with the wiki markup enclosed in the tags <text xml:space="preserve"></text>. I use some simple regular expressions to strip some of the markup out, including the taxobox, then I grab the first 100 words of the article to display on the iSpecies page (together with a link to the original article).
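
The general idea can be sketched in Python along these lines. This is not the code iSpecies runs: the function names, the particular regular expressions, and the sample XML (a cut-down stand-in for a real export, with illustrative article text) are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

def wikitext_from_export(xml_string):
    """Return the contents of the first <text> element, ignoring namespaces."""
    root = ET.fromstring(xml_string)
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] == "text":
            return element.text or ""
    return ""

def strip_markup(wikitext):
    """Remove templates (such as the taxobox), refs, links and quote markup."""
    text = wikitext
    previous = None
    while previous != text:
        # Delete innermost {{...}} templates until none are left.
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    text = re.sub(r"<ref[^>]*/>", "", text)  # self-closing refs
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)  # keep link label
    text = re.sub(r"'{2,}", "", text)  # bold and italic markers
    return text

def snippet(wikitext, max_words=100):
    """Return the first max_words words of the cleaned article text."""
    words = strip_markup(wikitext).split()
    return " ".join(words[:max_words])

# Illustrative stand-in for the XML returned by Special:Export.
sample = """<mediawiki><page><text xml:space="preserve">{{Taxobox
| name = Luzon Montane Forest Mouse
| binomial = ''Apomys datae''
}}
The '''Luzon Montane Forest Mouse''' (''Apomys datae'') is a species of [[rodent]] in the family [[Muridae]].</text></page></mediawiki>"""

print(snippet(wikitext_from_export(sample)))
```

Removing templates innermost first means a taxobox that itself contains templates is still stripped, which a single non-nested pattern would miss.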

Because a species may have multiple names, we need to handle redirection. For example, the URL http://en.wikipedia.org/wiki/Special:Export/Apomys_datae returns
<text xml:space="preserve">#Redirect [[Luzon Montane Forest Mouse]]</text>
which tells us that the content is to be found at http://en.wikipedia.org/wiki/Special:Export/Luzon_Montane_Forest_Mouse.
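
One way to follow such redirects is sketched below in Python. The helper names (redirect_target, export_url, resolve) and the max_hops limit are assumptions for illustration, and the fetch function is passed in as a parameter so the example runs against a small in-memory table instead of the live site.

```python
import re

# Matches "#Redirect [[Target]]" at the start of the wiki text, case-insensitively.
REDIRECT_PATTERN = re.compile(r"^\s*#redirect\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)

def redirect_target(wikitext):
    """Return the redirect target title, or None if the page is not a redirect."""
    match = REDIRECT_PATTERN.match(wikitext)
    if match is None:
        return None
    return match.group(1).strip()

def export_url(title):
    """Build the Special:Export URL for a page title (spaces become underscores)."""
    return "http://en.wikipedia.org/wiki/Special:Export/" + title.replace(" ", "_")

def resolve(title, fetch_wikitext, max_hops=5):
    """Follow redirects until a real article is reached.

    fetch_wikitext is any callable mapping a title to its wiki markup;
    max_hops guards against redirect loops.
    """
    for _ in range(max_hops):
        wikitext = fetch_wikitext(title)
        target = redirect_target(wikitext)
        if target is None:
            return title, wikitext
        title = target
    raise RuntimeError("too many redirects for " + title)

# In-memory stand-in for fetching pages over HTTP.
pages = {
    "Apomys_datae": "#Redirect [[Luzon Montane Forest Mouse]]",
    "Luzon Montane Forest Mouse": "The '''Luzon Montane Forest Mouse''' ...",
}

final_title, _ = resolve("Apomys_datae", pages.__getitem__)
print(export_url(final_title))
```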

There's still some polishing to do, but the Wikipedia snippets add something to the iSpecies results.
Categories: Cybertaxonomy