By Dr. Kathleen Dahlgren, CTO
Over the past several years, there has been significant discussion of how to move the Web toward the vision of Web 3.0 (the Semantic Web). At the time, it appeared that the only course of action was to develop a common tagging language (RDF and OWL) and then attempt to impose these standards on content creators across the Web. Granted, there weren’t many alternatives to this approach at the time, and a “brute force” approach to the problem seemed the only way to get the process moving.
That was then; this is now. RDF and OWL are insufficient to meet the needs of the Semantic Web. Much better alternatives exist today, many of which have only recently become known (e.g., Powerset, Hakia, Cognition Technologies), and they employ new semantic technologies that render tagging obsolete. Tagging is unnecessary for Natural Language Processing (NLP) systems, is extremely labor intensive, requires a broad consensus among users and content creators, and is unenforceable. The way for the Web to become semantic is to employ the only foundational standard necessary: the English language.
First, let’s define the terms. The commonly known Semantic Web creates hierarchical relationships and descriptions of Web content using a special markup language in XML. The initial language used for this was RDF (Resource Description Framework), which was later augmented with a higher-level language, OWL (Web Ontology Language). Here is an example of RDF markup applied to the description of two music CDs:
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description
    rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:country>USA</cd:country>
    <cd:company>Columbia</cd:company>
    <cd:price>10.90</cd:price>
    <cd:year>1985</cd:year>
  </rdf:Description>
  <rdf:Description
    rdf:about="http://www.recshop.fake/cd/Hide your heart">
    <cd:artist>Bonnie Tyler</cd:artist>
    <cd:country>UK</cd:country>
    <cd:company>CBS Records</cd:company>
    <cd:price>9.90</cd:price>
    <cd:year>1988</cd:year>
  </rdf:Description>
</rdf:RDF>
As you can see, information about each CD is marked up in a formal language, indicating the artist, country, company, price, etc. Such mark-up is a form of document tagging.
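To make the tagging idea concrete, here is a minimal sketch of how a program might read the RDF example above. It uses Python's standard xml.etree.ElementTree module; the function name parse_cds and the dictionary layout are assumptions made for illustration, not part of RDF or of any particular Semantic Web toolkit.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the RDF example in this article.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description
    rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:country>USA</cd:country>
    <cd:company>Columbia</cd:company>
    <cd:price>10.90</cd:price>
    <cd:year>1985</cd:year>
  </rdf:Description>
  <rdf:Description
    rdf:about="http://www.recshop.fake/cd/Hide your heart">
    <cd:artist>Bonnie Tyler</cd:artist>
    <cd:country>UK</cd:country>
    <cd:company>CBS Records</cd:company>
    <cd:price>9.90</cd:price>
    <cd:year>1988</cd:year>
  </rdf:Description>
</rdf:RDF>"""


def parse_cds(xml_text):
    """Map each rdf:about URI to a dict of its tag names and values.

    parse_cds is a hypothetical helper written for this illustration.
    """
    root = ET.fromstring(xml_text)
    records = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        props = {}
        for child in desc:
            # ElementTree reports tags as "{namespace}name"; keep only the name.
            name = child.tag.split("}", 1)[1]
            props[name] = child.text
        records[about] = props
    return records


records = parse_cds(RDF_XML)
for uri, props in records.items():
    print(uri, props)
```

Everything the program can learn about a CD is limited to the five tags the author of the markup chose to supply.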
The complication isn’t just that someone has to create these tags for all Web content. A more significant issue is that, to find documents by these tags, users have to know what the system of tags is. Only to the extent that all coders of RDF or OWL coordinate to use the same system of tags can the tagged pages become accessible to a wide spectrum of Internet users. A typical way of accessing tags is to offer users a menu of choices for values of the tags. Unfortunately, this creates more work and inefficiency. It also injects a certain amount of subjectivity into the process, because the content creator or manager is the arbiter of which tags will or will not be used. As a result, if the Search process is left simply to using pre-defined and subjective RDF or OWL tags, then the searcher may miss what he or she is searching for. Being able to Search, understand and utilize all of the text on a Web page naturally results in higher relevancy and a more satisfying Search experience.
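The vocabulary problem described above can be sketched in a few lines. The catalog below mirrors the tag values from the RDF example; the "title" field, the CATALOG name and the find_by_tag function are hypothetical, introduced only to show what happens when a searcher guesses a tag name the content creator did not use.

```python
# Tag values copied from the RDF example; "title" is an assumed extra field
# derived from the rdf:about URI so results are easy to read.
CATALOG = [
    {"title": "Empire Burlesque", "artist": "Bob Dylan", "country": "USA",
     "company": "Columbia", "price": "10.90", "year": "1985"},
    {"title": "Hide your heart", "artist": "Bonnie Tyler", "country": "UK",
     "company": "CBS Records", "price": "9.90", "year": "1988"},
]


def find_by_tag(catalog, tag, value):
    """Return titles of records whose tag exactly matches the given value."""
    return [record["title"] for record in catalog if record.get(tag) == value]


# The searcher happens to know the tag name the content creator chose.
print(find_by_tag(CATALOG, "price", "10.90"))    # ['Empire Burlesque']

# The searcher uses equally reasonable words that are not in the tag system.
print(find_by_tag(CATALOG, "cost", "10.90"))     # []
print(find_by_tag(CATALOG, "label", "Columbia")) # []
```

The data is present in both failing cases; the lookup fails only because the searcher and the tagger did not agree on a name in advance.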
So, rather than use a structured, subjective and incomplete dictionary of tags, how about we use an already established, agreed-upon and universal “tagging system” – the English language itself? Semantic NLP uses English as the means of communication – a system of symbols, if you will, that has already been agreed upon by English speakers. No new system of symbols needs to be created, calibrated, propagated and used.
Cognition’s Semantic NLP™ offers free-text, Semantic search in the body of documents and Web pages. The advantages of searching semantically in free-text are:
1) No labor to create tags as all of the content is indexed and used;
2) Any information in a document can be searched for;
3) Users don’t need to know the exact way a concept was expressed (Alternative ways of expressing concepts are recognized as meaning the same thing and relevant to the query. For example, the pages in the RDF example above could be found with “cost” as well as “price”.);
4) Words are disambiguated within the context of how they are used, so “price” meaning “consequences of an action” would not be retrieved in response to a query about “price” meaning “monetary cost”.
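Points 3 and 4 above can be illustrated with a toy sketch. This is not Cognition's technology, which is not described in this article; it is a deliberately tiny stand-in in which a hand-written table (CONCEPTS, an assumption) lists the words that can express each concept and a few context cue words used to choose between senses. A real system would rely on a far larger lexicon and on linguistic analysis rather than cue counting.

```python
import re

# Hypothetical mini-lexicon: both "price" and "cost" can express either
# concept; the cue words suggest which sense is meant in context.
CONCEPTS = {
    "monetary_cost": {
        "words": {"price", "cost"},
        "cues": {"cd", "album", "dollars", "buy", "sale"},
    },
    "consequence": {
        "words": {"price", "cost"},
        "cues": {"mistake", "fame", "war", "action", "consequences"},
    },
}


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(text):
    """Return the concept whose cue words overlap most with the text, or None."""
    tokens = set(tokenize(text))
    best, best_score = None, 0
    for name, entry in CONCEPTS.items():
        if not tokens & entry["words"]:
            continue  # the text does not mention this concept at all
        score = len(tokens & entry["cues"])
        if score > best_score:
            best, best_score = name, score
    return best


def search(query, documents):
    """Return documents that express the same concept as the query."""
    target = disambiguate(query)
    if target is None:
        return []
    return [doc for doc in documents if disambiguate(doc) == target]


DOCUMENTS = [
    "The album price is 10.90 dollars.",
    "He paid the price for his mistake.",
]

# "cost" finds the page that says "price", in the monetary sense only.
print(search("cost of the cd", DOCUMENTS))
# "price" in the sense of consequences finds only the second document.
print(search("the price of fame", DOCUMENTS))
```

The query never needs to use the same word as the document, and the document about a mistake is not returned for a query about what a CD costs.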
For categorized (tagged) content already in place, Cognition’s Semantic NLP can be used in conjunction with tags to create a versatile free-text Search and a structured data Search. This can be seen using the Advanced Search on Cognition’s SemanticMEDLINE™ Website (www.SemanticMEDLINE.com), where users can search by author, date and journal of abstracts along with free concept Search.
With Microsoft’s acquisition of Powerset, Yahoo!’s decision to open its platform to RDF (tagging) and micro-formats, and Hakia’s ontological (categorization) vision, the Semantic Web is becoming more of a reality. Cognition’s Semantic NLP, which is built on a vast and complete Semantic Map of the English language, has done the majority of the work that RDF and OWL are intended to do: render the content on a Web page semantically retrievable.
(NOTE: This article also appears at AltSearchEngines.com.)