(Click image to enlarge.)
Following up on previous posts on OTMI (the proposal from NPG for scholarly publishers to syndicate their full text to drive text-mining applications), Fabien Campagne from Cornell, a long-time OTMI supporter, has created an OTMI-driven search engine (based on his Twease work). This may be the first publicly accessible OTMI-based service. It currently only contains NPG content from the OTMI archive online - some 2 years worth of Nature and four other titles. (When will we begin to see other publishers on board?)
What’s happening here? Well, Twease is a web-based front-end to searching Medline abstracts. As such, a search will retrieve a set of results labeled by PMID and list all lines in the abstract where a match occurs. By contrast, with Twease-OTMI a search is run over the article full text and a will retrieve all text “snippets” (for Nature we use sentences, although other units of text are possible) which match. See the figure above where the top three results are all labeled by the same DOI and show text matches from various points within the document.
This shows that a far superior search match rate is possible using the article full text (as distributed in OTMI format) where text integrity as publishable asset is not compromised.