Increasing Relevancy and Trust on the Web: Powder Adds Steps in the Right Direction

Steady progress is being made in disambiguating the mine of information on the Web by marking up web content with Semantic Web metadata. POWDER builds on RDF, OWL, and HTTP, but adds features that attach more descriptive metadata to resources. The W3C POWDER (Protocol for Web Description Resources) Working Group is behind the initiative, and major Web players are starting to notice. A number of documents were published on 15 August 2008, including Formal Semantics, Grouping of Resources, Description Resources, Primer, and Test Suite. The work of POWDER contributes to one overarching objective: improving both the developer’s and the user’s experience of the Web. This can be stated in terms of three related concepts: making the Web more personal, more relevant, and more trustworthy. Given the fundamental role of these characteristics in search, this is presumably why global search companies are showing interest. For example, POWDER’s most recent outreach day, on 22 September, was sponsored by Yahoo!

POWDER increases the relevance of Web content because each piece of content is marked up with RDF that specifies what it is. This much is already familiar to most Semantic Web developers. What sets POWDER apart is a number of other features: classifying single items within content groups, data retrieval efficiency, profile matching, trustmarks, and semantic annotation. Data retrieval efficiency allows users to decide, based on the description, whether a resource (such as a PDF) is relevant to their requirements before they download it. Profile matching enables users to judge whether the resource to be retrieved is actually what they are looking for in terms of their preferences: for example, is it mobile compatible? Does it have child protection? Trustmarks allow partners on the Web to act as certification authorities and verify the claims made by the descriptions. The final feature is easy semantic annotation, which effectively puts the pow(d)er in the user’s hands.
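To make profile matching and trustmarks more concrete, here is a minimal Python sketch. It is not the POWDER XML format or any official API: the DescriptionResource class, its field names, the descriptor keys, and the example URLs are all hypothetical, chosen only to illustrate the idea of checking a description against a user's preferences before downloading anything.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class DescriptionResource:
    """Hypothetical, simplified stand-in for a POWDER-style description."""
    issued_by: str                                   # who makes the claims
    include_hosts: set                               # hosts the description covers
    descriptors: dict = field(default_factory=dict)  # the claims themselves

    def applies_to(self, url: str) -> bool:
        # A resource is in the group if its host is a listed host or a subdomain of one.
        host = urlparse(url).hostname or ""
        return any(host == h or host.endswith("." + h) for h in self.include_hosts)


def matches_profile(dr, url, profile, trusted_issuers):
    """Return True if a trusted description says the resource fits the profile."""
    if dr.issued_by not in trusted_issuers:   # trustmark check (assumed policy)
        return False
    if not dr.applies_to(url):                # grouping check
        return False
    # Profile matching: every user preference must be claimed by the description.
    return all(dr.descriptors.get(key) == value for key, value in profile.items())


dr = DescriptionResource(
    issued_by="https://authority.example/",
    include_hosts={"example.org"},
    descriptors={"mobile_friendly": True, "child_safe": True},
)
trusted = {"https://authority.example/"}
print(matches_profile(dr, "http://www.example.org/report.pdf",
                      {"mobile_friendly": True}, trusted))
```

The decision is made from the description alone, so the PDF itself never has to be fetched. That is the data retrieval efficiency the post describes.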

 http://www.w3.org/2007/powder/

The Semantic Web: Its Status in the IT Industry

David Provost, a business development professional and strategist, published a report on 30 September which reviews the current global position of the Semantic Web in the Information Technology industry, with a commercial slant on the advancement of the Semantic Web in IT companies. The report first summarizes some key concepts of the Semantic Web, giving both less technical entrepreneurs and those on the development side a general overview. The executive summary mentions some key emerging trends: linked data; Social Network Analysis and the Semantic Web’s role in it; and the role played by Natural Language Processing (NLP), semantic search, and the traditional publishing industry. Most significantly, however, the summary captures a trend which has recently become more and more notable in the industry: the transition from technical descriptions of technology to an emphasis on problems, solutions, and applications.

Given that new companies which rely on Semantic Web technology are continuously emerging, it appears that the moment has arrived for these IT businesses to differentiate themselves in the market. The report takes a step in this direction by summarizing what these leading companies, or vendors, consist of and what they offer. The information is based on phone or face-to-face interviews with representatives of 17 companies, each assessed for credibility as a “going concern”, in other words, an active company with substantial cash flow. (The author’s indicator for this is whether a company has used spare cash to sponsor a conference, although this is not a definitive indicator of an organisation’s financial status.) Ten companies were based in North America, six in Europe, and one in South Korea.

These organizations address diverse topics within the Semantic Web, such as databases for storing semantic data, ontology engineering and management tools, application development platforms, and NLP. One important transformation is that companies previously just sold software. Now a transition from “make” solutions to “do” solutions is evident, with vendors selling ready-made solutions in areas such as knowledge management, risk management, and content management, and placing less emphasis on the development side. The author describes this trend as a change in market strategy: vendors have widened their market by targeting companies which previously could not afford the complex development skills needed to adapt a solution to their needs. These companies are now being offered solutions which already suit their internal requirements, without the need for substantial modifications.

A diagram lists the following 17 companies as having taken part in the review: Aduna, The Calais Initiative, Cambridge Semantics, Dow Jones Client Solutions, Expert System, Franz, Mondeca, Ontoprise, Ontos, OpenLink Software, Primal Fusion, Saltlux, Sindice, Thetus, TopQuadrant, Twine/Radar Networks, and Yahoo!/Searchmonkey. The diagram marks each company according to the functions it offers, defined as follows: Solution, Middleware, NLP, Database, Platform, Ontology, Search, Consumer Web Service, and Developer Web Service. Solution and Platform occur most frequently; Developer Web Service and Consumer Web Service least. The findings of the report are then presented in the form of four key trends: Semantic Web technology has become fierce competition for companies providing solutions based on traditional technology; NLP has turned out to be a fundamental tool for mining content for the Web; the value and use of Linked Data is slowly gaining recognition; and the role of marketing, technical, and solution partners is becoming essential in selling Semantic Web based solutions.

The principal findings are followed by a profile of each of the organizations interviewed, described in terms of its products, employees, revenue, installed base, primary offering, key differentiators, six/twelve month plan, and a final analysis. The report is written from a strategist’s perspective and should be read as such: it focuses on the marketing strengths of the companies rather than on any shortcomings. It is not an entirely objective review, and some companies, such as Talis, are missing, but it is nevertheless an interesting read and a good overview.

http://pdfmenot.com/view/http://www.davidprovost.com/Resources/Semantic%20Web%20Industry%20Revie.pdf

Band Metrics: Semantic Web Technology for Gathering and Analysing Opinion Trends of Musicians and Bands

Yesterday a new take on the use of Semantic Web technology in the music industry emerged: Band Metrics announced on their blog the launch of the private beta of their service, which helps musicians and bands manage their digital identity, monitor their popularity, and analyse trends. Further details of the technology are currently not available, as it is patent pending. Band Metrics is a TechCrunch50 semi-finalist and will take part in the DemoPit, where it will present its demo tomorrow.

 http://www.techcrunch50.com/2008/conference/demopit.php

http://www.bandmetrics.com/beta/#

Details of a New Semantic Search Engine and Health Knowledge Base Released by WebLib


This morning WebLib, a company specializing in search solutions based on Natural Language Processing and Semantic Web techniques, announced the development of a new semantic search engine and health knowledge base. The solution claims to provide qualities currently lacking in engines which retrieve health-related information: results that are relevant, current, and actionable. The search will be powered by the Healthmash Health Knowledge Base, and will combine Web 2.0 technologies with Semantic Web methods to retrieve user-relevant information. A beta version is not yet available, however; it is due at the end of 2008.

http://www.prweb.com/releases/health_knowledge_base/200808/prweb1226424.htm
http://www.weblib.com/

 

Sir Berners-Lee’s Insights on the Future of the Semantic Web

An important public symposium in the Semantic Web field was held last week at Rensselaer Polytechnic Institute (NY, US), as part of the launch of a new research institute for Web Science, the Tetherless World Research Constellation. A number of leading figures were present, including Nova Spivack, Nigel Shadbolt, and Tim Berners-Lee. The principal focus of Berners-Lee’s keynote speech was his vision of the future Web, of which he discussed a number of distinct aspects. Berners-Lee has never hesitated to advocate a future Web that is both semantic and social. The formal scientific analysis of such outlooks has emerged as the field of Web Science, to which the new research centre is dedicated. The characteristics of the evolving Web which Berners-Lee outlined can be divided into several areas. One of the topics, which was also a general theme of the day, was the construction of more intelligent data, as opposed to smarter software. Berners-Lee introduced this as one of the objectives of the Semantic Web, adding, however, that the conceptual links between data have the power to be used in unpredictable and novel ways. It is not yet clear what the outcome of such opportunities will be, but the evolution of the Web is likely to have an impact on the traditional methods of constructing social systems. This was another of the topics elaborated on by Berners-Lee and other attendees.

The Web has spawned new social systems which have opened up new ways of viewing science and political systems such as democracy. With the advent of the Social Web, it is evident that networks can influence collective thinking, ideas, or movements, in ways which may be constructive but also destructive. The effects of such systems will only become visible in the future.

A further topic discussed was the need for existing technology to cope with and complement the future form of the Web. If data and the concepts contained in data are interlinked, technologies will have to adapt to this data through more pages, higher bandwidth, and mobile devices. The filtration of the Web into daily life is an ongoing theme, previously discussed by many of the leading researchers in the Semantic Web domain, who advocate and emphasize the role of the mobile Web, and by others such as the chairman of the Mozilla Foundation, Mitchell Baker.

 http://www.forbes.com/feeds/ap/2008/06/11/ap5106902.html

http://www.pcmag.com/article2/0,2817,2319807,00.asp

A Medley of Semantically Related News Snippets


Today saw a number of thought-provoking news items from various sources, from a newspaper in Madrid to ReadWriteWeb. My first observation this morning, in a newspaper circulated in Madrid’s metro, is worth mentioning, as it demonstrates the triumphant arrival of Web 2.0 in the general public’s eye, and recognises the need for technologies which help users with activities such as managing their social and blog profiles. The daily cartoon shows two female students seated in a university computer lab. One asks the other: “How’s it going?” Her friend answers: “Ugh... I’m updating my social profile on 18 social networks, changing my website’s photos, and renewing the contents of my blog.” The response is: “And on top of all that, they expect us to study.” Although the issues raised in this cartoon have been partly addressed by initiatives such as Google’s OpenSocial, there is still significant room for improvement.

Besides the spine-chilling news for Google that Powerset released a test version of its semantic search engine for Wikipedia for public use on Monday (which I will not go into here, as countless articles are currently floating around the Web discussing what some have dubbed the “Google Killer”), another search engine, Uptake, was launched today. Uptake, formerly known as Kango, is a travel search engine which extracts information from more than 1000 travel sites in order to construct a database of over 400,000 US hotels and activities. Uptake has built its database from consumer reviews, opinions, and descriptions on these sites, and has constructed an ontology from metadata applied to the content of these sources. One of the more recent Natural Language Processing techniques Uptake applies is Sentiment Analysis, also referred to as Opinion Mining. It uses syntactic parsing to extract phrases which indicate, for example, favourable sentiment towards a hotel, such as “good time”, “fantastic view”, or “relaxed atmosphere”, and distinguishes positive sentiment from negative sentiment.
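As a rough illustration of opinion mining, the Python sketch below pulls out two-word phrases that start with a sentiment-bearing word and labels them positive or negative. This is not Uptake's method: it swaps syntactic parsing for a simple word-list lookup, and the word lists, function names, and sample review are assumptions made purely for the example.

```python
import re

# Tiny hand-made lexicons (assumed for illustration only).
POSITIVE = {"good", "fantastic", "relaxed"}
NEGATIVE = {"bad", "noisy", "dirty"}


def opinion_phrases(review):
    """Return (phrase, label) pairs for sentiment words followed by another word."""
    tokens = re.findall(r"[a-z']+", review.lower())
    phrases = []
    for word, following in zip(tokens, tokens[1:]):
        if word in POSITIVE:
            phrases.append((f"{word} {following}", "positive"))
        elif word in NEGATIVE:
            phrases.append((f"{word} {following}", "negative"))
    return phrases


def polarity(review):
    """Classify a review by counting positive against negative phrases."""
    labels = [label for _, label in opinion_phrases(review)]
    score = labels.count("positive") - labels.count("negative")
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = "We had a good time and a fantastic view, but a noisy room."
print(opinion_phrases(review))
print(polarity(review))
```

A real system would need a parser to handle negation ("not a good time") and to attach each opinion to the right hotel or activity. The lookup approach only shows the basic step of turning review text into labelled opinion phrases.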

Today also saw the move of Jeremy Carroll, lead architect on the Open Source Jena Toolkit at HP, to TopQuadrant, a leading Semantic Web company, as Chief Product Architect.

Optimistic Opinions about the Future of the World Wide Web

The BBC recently interviewed ten of the leading figures of the WWW about the future of the Web, to mark the Web’s 15th anniversary. This post discusses the parts of the interviews which referred to the Semantic Web and which, taken together, give an interesting overview of its future.

Sir Tim Berners-Lee first pointed out a crucial detail concerning the premise of the BBC’s interviews: that it is a mistake to look back on fifteen years of the Web, and that the last 15 years should instead be viewed as a starting point representing the infancy of the Web. His viewpoint implies high ambitions for the Web and the Semantic Web: he states that in 100 years the current phase in the growth of the Web will be remembered as a time when not all of the world’s data was instantly available to a user, and the Semantic Web was not yet functional. This implies that in the next 100 years these two visions could, in theory, be completely fulfilled. He also makes an indirect reference to Web 2.0, describing it as the conception of new systems of social behavior, peer review, and regimes.

Nigel Shadbolt makes a similar observation about Web 2.0, pointing out that users have conquered the Web. He concludes that the future lies in the Semantic Web acting as an information broker for the user, taking over the task of document filtering which was previously carried out by the user.

Professor Wendy Hall, Nigel Shadbolt’s colleague at the University of Southampton, does not make any reference to the Semantic Web. Instead, she stresses one essential characteristic: the Web will no longer be confined to traditional desktop computers, but will be accessible to all through mobile devices such as mobile phones.

Kai-Fu Lee from Google China similarly does not mention the Semantic Web, but refers to “Cloud Computing” as a challenge. He also alludes to a factor which was raised in the previous blog post: the importance of user confidence in how their private data is used online, and the fostering of trust between users and online companies.

Dr. David Belanger, AT&T Labs’ Chief Scientist and Vice President in Information and Software Systems Research, does not make a direct reference to the Semantic Web. However, it is evident from his comments that the Semantic Web will play a central role in the future vision he describes. He states that the greatest challenge for the future Web is managing all of the new and different applications which are emerging on it, such as image browsers which include video and other interactive media. This scenario demands one characteristic which the Semantic Web can offer: integration of heterogeneous data.

The chairman of the Mozilla Foundation, Mitchell Baker (she of the famous “Firefox” hair), put forward a viewpoint similar to that of the other interviewees on the growth of communities on the Web, and on the Web’s mobility and its consequent filtration into every aspect of life.

The president of the Palo Alto Research Center, Mark Bernstein, does not make any reference to the Semantic Web, either directly or indirectly, mainly mentioning that the Web represents communities.

Robert Cailliau was one of the people responsible for the creation of the Web, together with Tim Berners-Lee at CERN. He raises some important aspects of the future of the Web which have not so far been given adequate attention, such as the fact that all of the communities emerging on the Web (the main focus of many of the other interviewees) will fundamentally need the laws, economics, and social norms required by any community. He makes the important point that the Web is controlled by ontologies, which raises the question: who controls the construction of the ontologies and, thus, the data the user sees? His statement “Because it (the Web) works by ontologies…” gives implicit recognition to the idea that the Semantic Web already controls a significant volume of the data on the Web.

Robert Scoble, well-known blogger and head of Fast Company TV, concludes that ultimately the Web is about communication, corroborating the viewpoint of many.

Tim O’Reilly, one of the leading figures of the Web, focused on the sensor aspect of the Web. He described the Web as a concept rather than a concrete technology: in the future it will not just be the Web as it is today, but part of an interconnected network of independent devices, including mobile phones, sensor networks, and even power networks. He refers to this scenario as a “global brand”, which raises interesting questions, one being: who will own this global brand?

http://news.bbc.co.uk/1/hi/technology/7373717.stm

      

The Dynamics Between Web 2.0, Semantic Web, and Financial Markets

A theme which has recently been gaining importance for Web 2.0 companies and financial investors alike is the interaction between Web 2.0, Social Networks, and Wall Street. The Semantic Web also has a role to play in this scenario. This post gives some insights into the topic, based on ideas presented at the O’Reilly Money:Tech conference, which was held on 6-7 February in New York. One of the most relevant events for those working in the Semantic Web was Tim O’Reilly’s interview with the incoming CEO of Reuters, Devin Wenig, who emphasized that the Semantic Web will play a pivotal role in Reuters’ future.

The Money:Tech conference was aimed at a diverse audience: on the financial side, hedge fund managers, equity, financial, and investment analysts, investors, managers, and entrepreneurs; on the technological side, technologists and academics. The main focus of the conference was to highlight that Web 2.0 is a useful source of data for investors and, more specifically, to present the tools which extract meaning from this data for money management. This focus shows the usefulness of Web 2.0 and Social Networks in exposing trends in financial data. Bringing together financial experts and Web 2.0 innovators, and fusing their ideas, provides insight and creates value for technological advances in the investment industry. One presentation which discussed such issues was “Main Street Research Meets Wall Street: How Social Networking is Transforming Online Investing”.

Moving from Web 2.0 and Social Networks to the Semantic Web: in his blog, Tim O’Reilly discusses some developments in how information will be delivered to the consumer, which Devin Wenig presented in his interview at Money:Tech. Changes in the methods of data delivery are relevant for all types of news, but particularly for financial news, which is volatile. Wenig’s first point was that textual information has been replaced by the data formats which Web 2.0 consumers want and control, such as video and interactive applications. Secondly, Wenig claimed that we are coming to the end of an era in which the company with the shortest delay in delivering news held a competitive advantage. This second point exposes a very important trend for the future of news data: the crucial factor is no longer the timing of news, but rather the sources of the news and the information which can be derived from the connections between them; in other words, the processing of the data. This is where the Semantic Web steps in. The aim is not just to mark up data with semantic metadata, but to use that metadata to derive additional, value-adding information from the original data for the consumer, who may be another news company or the end consumer. Thus, the focus is on deriving insights from the data through semantic technology.

O’Reilly’s discussion of Wenig’s point is of essential importance, because it emphasizes the purpose of adding metadata to news items: further processing. Semantic annotation in itself is useless if it cannot be reused by Reuters, other news companies, or financial investors. For example, it could be used to determine the connection between a news item about a fall in the share price of a particular company and a fall in oil prices.
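The reuse of annotations described above can be sketched in a few lines of Python. The news items, the company name, and the concept labels are invented for illustration, and plain sets of labels stand in for real semantic metadata such as RDF. The point is only that items annotated with shared concepts can be linked automatically.

```python
from itertools import combinations

# Hypothetical news items, each annotated with a set of concepts.
news_items = {
    "item-1": {"headline": "ExampleOil Corp shares fall",
               "concepts": {"ExampleOil Corp", "share price", "oil"}},
    "item-2": {"headline": "Oil prices fall",
               "concepts": {"oil", "commodity price"}},
    "item-3": {"headline": "New smartphone launched",
               "concepts": {"mobile phone", "consumer electronics"}},
}


def shared_concepts(items):
    """Map each pair of item ids to the concepts their annotations share."""
    links = {}
    for (id_a, a), (id_b, b) in combinations(items.items(), 2):
        common = a["concepts"] & b["concepts"]
        if common:
            links[(id_a, id_b)] = common
    return links


for pair, concepts in shared_concepts(news_items).items():
    print(pair, "are connected through", concepts)
```

Without the annotations, the two headlines about the company and about oil prices share no obvious key. With them, the connection falls out of a simple set intersection.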

O’Reilly agrees that changes in consumer media are having an impact on the structure of professional media. However, he does not foresee an end to the opportunities for exploiting zero-delay information delivery to the consumer and, even more importantly, for getting access to the right sources of data in order to extract overlooked information. This provides the consumer with more relevant information. The process is still contingent, however, upon a degree of human interaction with the data (the curation process). Subsequent semantic annotation of this data may then unlock previously concealed connections.

http://radar.oreilly.com/archives/2008/02/reuters_semantic_web_moneytech.html

http://en.oreilly.com/money2008/public/content/home

Microsoft Releases Details of its “Live Search Books” Project to Reuters

One area which has opened up a diverse range of opportunities for new applications of Semantic Web technologies is book search. Last Monday, Microsoft provided Reuters with information on its effort to add 100,000 books from the British Library’s 19th century collection, approximately 25 million scanned pages, to its “Live Search Books” engine. However, the initiative is far behind Google’s “Google Book Search”. Here, I give an overview of the two services and discuss the aspects in which they differ. This leads to an examination of a more general issue which is crucial for both companies: indexing and effectively retrieving the maximum amount of relevant information for a user’s Web search. What is at stake for Microsoft and Google, as well as for other global players, is ownership of the leading technology for searching the world’s information. Book search is one sub-area which requires this technology. It is clear that Google is currently the market leader in both fields. The topic discussed here, however, is the efforts of other technology giants to increase their market share in web search and book search, and how Semantic Web technology can contribute to this process.

“Google Book Search” indexes the books of 30 major world libraries; the list of its library partners includes Bavarian State Library, Columbia University, Committee on Institutional Cooperation (CIC), Harvard University, Ghent University Library, Keio University Library, Stanford University, among others. It enables the user to search the full text of books, browse books online if they have fallen out of copyright, buy books, borrow books, and consult many additional references about the books of interest to them. Therefore, it is evident that Google’s book search is aimed towards a wide-ranging audience.

In comparison, Microsoft is one year into a three-year project to index 100,000 books from the British Library dated from 1800 to 1900. It also intends to add collections from Yale and Cornell University to “Live Search Books”. However, a limited repository such as this can only be aimed at a narrow market segment, making Microsoft’s book search facility micro in comparison with Google’s. Evidently, building a book search application equivalent to Google’s would prove impossible for another company, as Google has already monopolized many of the major world libraries as its “Library Partners”. Interestingly, though, Microsoft does allow users to upload to “Live Search Books” the same book list which they have uploaded to “Google Book Search”. The picture below gives an inside look at Microsoft’s project at the British Library.

[Image: 2008-02-04t164555z_01_nootr_rtridsp_2_tech-microsoft-google-search-dc.jpg]

The competition to index the world’s libraries is accompanied by an even more prominent battle: increasing the query share of Web searches. That is, the percentage of users who consult a particular search engine and, consequently, see the advertising which generates the profits for the company, whether Google, Microsoft, or Yahoo!

According to Reuters, comScore, which measures Internet audiences, estimated that Microsoft has only a 4% share of Internet searches, compared with Yahoo’s 16% and Google’s 77%. (Microsoft’s intention to buy Yahoo may be part of its strategic plan to increase its share in this domain: news released by the TechCrunch blog one hour ago announced that, according to anonymous sources, Yahoo’s board of directors was to meet with Microsoft today to discuss a USD 44.6 billion buyout offer.) The steady increase in the volume of online information described with RDF may soon have an impact on the balance of power in search technologies. Applying Semantic Web technologies to the mass digitization of books presents a considerable number of opportunities for research projects on metadata for online libraries. This is particularly relevant in light of the EU’s FP7 calls for research proposals in the ICT domain, in which digital libraries are one research area.

http://www.sciam.com/article.cfm?id=in-microsoft-vs-google-se

Opinion

It seems to me that what was once referred to as the Internet is becoming an interlinked structure connected by means of concepts.

I have been thinking about this since Tim Berners-Lee published his blog post on 21 November, coining a new phrase for the Web: “Giant Global Graph”. (Given that he is the inventor of the World Wide Web, posting this phrase on his blog will undoubtedly lead to him being credited as the creator of the “GGG” phrase as well.) I cannot say that I disagree with Berners-Lee’s conceptualisation of the Internet as a GGG. In his blog post, he states that the Internet connected computers, the Web connected documents, and that the Graph connects content.

At its most basic, the Graph is constructed of connected concepts which can be viewed as possibly infinite. File formats, structures, and representation languages used no longer place boundaries on the transfer of information, as they once did before the advent of the Semantic Web. Concepts transcend the restrictions of formats and represent relationships.

However, one question comes to my mind. What is the meaning of the word “Graph”? The Concise Oxford Dictionary of English Etymology (1996) defines it as:

graph XIX. orig. (chem.) short for GRAPHIC formula, in which lines are used to indicate the connections of elements; hence in math.

To me, this is a relatively acceptable definition of the English word “graph”. But the word graph is just a term for a connected structure. What about the millions of people who do not speak English? Will they also adopt the English term “Giant Global Graph”? Or, as I consider more likely, will they simply form a conceptual image of what is termed the “Graph” in English as a connected structure of concepts, and label this structure with the most appropriate (or an already existing) term from their own language? Here I am referring to those people in the Semantic Web community and other related communities who are aware of the emergence of the “Giant Global Graph”. The point I am making is that the “Graph” is not so much a specific term as a conceptual structure which has been created (and is being improved) by Semantic Web technologies. It represents an almost infinite number of concepts from nature and humankind, and the relationships between those concepts. This is also what an ontology intends to capture, which brings the description of an ontology closer to the original philosophical definition of ontology: the science of being.
