Microsoft Releases Details of its “Live Search Books” Project to Reuters

One area that has opened up a diverse range of opportunities for new applications of Semantic Web technologies is book search. Last Monday, Microsoft gave Reuters details of its effort to add 100,000 books from the British Library’s 19th-century collection, approximately 25 million scanned pages, to its “Live Search Books” engine. However, the initiative is far behind Google’s “Google Book Search”.

Here, I will give an overview of the two services and discuss how they differ, which leads to a more general issue that is crucial for both companies: indexing and effectively retrieving the maximum amount of relevant information for a user’s Web search. What is at stake for Microsoft, Google, and other global players is ownership of the leading technology for searching the world’s information, and book search is one sub-area of that contest. Google is currently the market leader in both web search and book search. The topic here is how other technology giants are trying to increase their market share in both fields, and how Semantic Web technology can contribute to that effort.

“Google Book Search” indexes the books of 30 major world libraries; its library partners include the Bavarian State Library, Columbia University, the Committee on Institutional Cooperation (CIC), Harvard University, Ghent University Library, Keio University Library, and Stanford University. It lets users search the full text of books, browse books online if they have fallen out of copyright, buy or borrow books, and consult additional references about the books that interest them. Google’s book search is therefore clearly aimed at a wide-ranging audience.

In comparison, Microsoft is one year into a three-year project to index 100,000 books from the British Library dated from 1800 to 1900. It also intends to add collections from Yale and Cornell University to “Live Search Books”. A repository this limited can only serve a narrow market segment, which makes Microsoft’s book search facility small next to Google’s. Building a book search application equivalent to Google’s would be very difficult for another company, as Google has already signed up many of the major world libraries as its “Library Partners”. Interestingly, though, Microsoft does allow users to upload to “Live Search Books” the same book list that they have uploaded to “Google Book Search”. The picture below gives an inside look at Microsoft’s project inside the British Library.

[Image: Microsoft’s book digitization project at the British Library]

The competition to index the world’s libraries is accompanied by an even more prominent battle: increasing query share of Web searches. Query share is the percentage of users who consult a particular search engine; it determines how much advertising those users see, and that advertising generates the profits for the company, whether Google, Microsoft, or Yahoo!

According to Reuters, comScore, which measures internet audiences, estimated that Microsoft has only a 4% share of Internet searches, compared with Yahoo’s 16% and Google’s 77%. Microsoft’s intention to buy Yahoo may be part of its strategy to increase its share in this domain; on these figures, the two combined would hold about 20%. News released by the TechCrunch blog one hour ago announced that Yahoo’s board of directors was to meet with Microsoft today to discuss a USD 44.6 billion buyout offer, according to anonymous sources.

The steady increase in the volume of online information described with RDF may soon have an impact on the balance of power in search technologies. Applying Semantic Web technologies to the mass digitization of books opens up a considerable number of opportunities for research projects on metadata for online libraries. This is particularly relevant in light of the EU’s FP7 calls for research proposals in the ICT domain, where digital libraries are one of the research areas.

http://www.sciam.com/article.cfm?id=in-microsoft-vs-google-se
