Cognition’s August Newsletter… “Inside the Mind”
Wednesday, August 6th, 2008
It’s here! It’s hot off the press. Check out the latest (and first) issue of “Inside the Mind”.
By Dr. Kathleen Dahlgren, CTO
Over the past several years, there has been significant discussion of how to move the Web towards the vision of Web 3.0 (the Semantic Web). At the time, it appeared that the only course of action was to develop a common tagging language (RDF and OWL) and then attempt to impose these standards on content creators across the Web. Granted, there weren’t many alternatives to this approach at the time, and it appeared that a “brute force” approach to the problem was the only way to get the process moving. That was then – this is now: RDF and OWL are insufficient to meet the needs of the Semantic Web (Web 3.0). There are much better alternatives today, many of which have only recently made themselves known (e.g. Powerset, Hakia, Cognition Technologies), which employ new semantic technologies and capabilities that render tagging obsolete. Tagging is unnecessary for Natural Language Processing (NLP) systems, is extremely labor intensive, requires a broad consensus amongst users and content creators, and is unenforceable. The way the Web becomes semantic is to employ the only foundational standard necessary: the English language.
First, let’s define the terms. The commonly known Semantic Web creates hierarchical relationships and descriptions of Web content using a special markup language in XML. The initial language used for this was RDF (Resource Description Framework), which was later augmented with a higher-level language, OWL (Web Ontology Language). Here is an example of RDF markup as applied to the description of music CDs:
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http://www.recshop.fake/cd#">
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
<cd:artist>Bob Dylan</cd:artist>
<cd:country>USA</cd:country>
<cd:company>Columbia</cd:company>
<cd:price>10.90</cd:price>
<cd:year>1985</cd:year>
</rdf:Description>
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Hide your heart">
<cd:artist>Bonnie Tyler</cd:artist>
<cd:country>UK</cd:country>
<cd:company>CBS Records</cd:company>
<cd:price>9.90</cd:price>
<cd:year>1988</cd:year>
</rdf:Description>
</rdf:RDF>
As you can see, information about the page is marked up in a formal language, indicating the artist, country, company, price etc. of the CD. Such mark-up is a form of document tagging.
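To make the idea of tags concrete, here is a minimal sketch of how a program might read such markup. It uses only Python’s standard library and a shortened copy of the example above; the function name `read_descriptions` is an illustrative assumption, not part of any RDF toolkit, and a real system would more likely use a dedicated RDF parser.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the RDF example above.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CD_NS = "http://www.recshop.fake/cd#"

# Shortened copy of the example document (one CD, two properties).
RDF_DOC = """<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http://www.recshop.fake/cd#">
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
<cd:artist>Bob Dylan</cd:artist>
<cd:price>10.90</cd:price>
</rdf:Description>
</rdf:RDF>"""


def read_descriptions(rdf_text):
    """Return {resource URI: {tag name: value}} for each rdf:Description.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(rdf_text)
    prefix = f"{{{CD_NS}}}"
    records = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        # Keep only properties in the cd: vocabulary, stripping the namespace.
        records[about] = {
            child.tag[len(prefix):]: child.text
            for child in desc
            if child.tag.startswith(prefix)
        }
    return records


if __name__ == "__main__":
    for uri, tags in read_descriptions(RDF_DOC).items():
        print(uri, tags)
```

Note that the program only learns what the tags explicitly state: anything on the page that was not marked up as `cd:artist`, `cd:price`, and so on is invisible to it.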
The complication isn’t just in the fact that someone has to create these tags for all Web content. A more significant issue is that to find documents by these tags, users have to know what the system of tags is. Only to the extent that all coders of RDF or OWL can coordinate to use the same system of tags can the tagged pages become accessible to a wide spectrum of Internet users. A typical way of accessing tags is to offer users a menu of choices for values of the tags. Unfortunately, this creates more work and inefficiency. It also injects a certain amount of subjectivity into the process, because the content creator or manager is the arbiter of which tags will or will not be used. As a result, if the Search process is left simply to using pre-defined and subjective RDF or OWL tags, then the searcher may miss what he or she is searching for. Being able to Search, understand and utilize all of the text on a Web page naturally results in higher relevancy and a more satisfying Search experience.
So, rather than use a structured, subjective and incomplete dictionary of tags, how about we use an already established, agreed-upon and universal “tagging system” – the English language itself. Semantic NLP uses English as the means of communication – a system of symbols, if you will, that has already been agreed upon by English speakers. No new system of symbols needs to be created, calibrated, propagated and used.
Cognition’s Semantic NLP™ offers free-text, Semantic search in the body of documents and Web pages. The advantages of searching semantically in free-text are:
1) No labor to create tags as all of the content is indexed and used;
2) Any information in a document can be searched for;
3) Users don’t need to know the exact way a concept was expressed: alternative ways of expressing a concept are recognized as meaning the same thing and as relevant to the query (for example, the pages in the RDF example above could be found with “cost” as well as “price”);
4) Words are disambiguated within the context of how they are used, so “price” meaning “consequences of an action” would not be retrieved in response to a query about “price” meaning “monetary cost”.
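The contrast in point 3 can be illustrated with a toy sketch. This is not Cognition’s technology: the `SYNONYMS` table below is a tiny hand-written stand-in for a real semantic map, the record data is abridged from the RDF example, and both function names are hypothetical. It also ignores the word-sense disambiguation described in point 4.

```python
# Toy data: a hand-written synonym table standing in for a real semantic map.
SYNONYMS = {
    "price": {"price", "cost"},
    "artist": {"artist", "performer", "musician"},
}

# Records abridged from the RDF example (tag name -> value).
RECORDS = [
    {"artist": "Bob Dylan", "price": "10.90"},
    {"artist": "Bonnie Tyler", "price": "9.90"},
]


def exact_tag_lookup(records, tag):
    """Tag-based access: succeeds only if the user knows the exact tag name."""
    return [r[tag] for r in records if tag in r]


def concept_lookup(records, word):
    """Map the user's word to any tag whose synonym set contains it."""
    matching_tags = sorted(
        tag for tag, words in SYNONYMS.items() if word in words
    )
    return [r[tag] for r in records for tag in matching_tags if tag in r]


if __name__ == "__main__":
    print(exact_tag_lookup(RECORDS, "cost"))   # the tag "cost" does not exist
    print(concept_lookup(RECORDS, "cost"))     # "cost" maps to the "price" tag
```

A user who asks for “cost” gets nothing from the exact tag lookup, because the markup author chose the tag name “price”; the concept lookup finds both prices because it treats the two words as expressing the same concept.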
For categorized (tagged) content already in place, Cognition’s Semantic NLP can be used in conjunction with tags to create a versatile free-text Search and a structured data Search. This can be seen using the Advanced Search on Cognition’s SemanticMEDLINE™ Website (www.SemanticMEDLINE.com), where users can search by author, date and journal of abstracts along with free concept Search.
With Microsoft’s acquisition of Powerset, Yahoo!’s decision to open its platform to RDF (tagging) and micro-formats, and Hakia’s ontological (categorization) vision, the Semantic Web is becoming more of a reality. Cognition’s Semantic NLP, which is built on a vast and complete Semantic Map of the English language, has done the majority of the work that RDF and OWL are intended to do: render the content on a Web page semantically retrievable.
(NOTE: You can also find this article at AltSearchEngines.com here.)
With traditional keyword search engines, such as those used by Google, Yahoo!, and others, finding the best medical research document within complex data sets, such as MEDLINE, is very difficult without the use of complex Boolean equations and a deep understanding of the many permutations of technical synonymy. Cognition’s SemanticMEDLINE has the ability to target and locate these types of data that are otherwise hidden in masses of information because of its comprehensive Semantic Map (particularly deep within the health sciences discipline) and its unique ability to “understand” the meaning behind words, phrases, and idioms.
Read the full article here.
Merrill’s latest press release about its product mentions Cognition Technologies:
“To improve early identification and review of key evidence, version 5.6 also features conceptual search tools, offered via a licensing agreement with Cognition Technologies, Inc. Unlike keyword searches, conceptual searching interprets the meaning of words within the context they are used to help computers ‘understand’ concepts, significantly increasing the precision of the discovery results. ‘Cognition’s Semantic NLP program is revolutionary in the methodology used to conduct complex searching through algorithms. The semantic natural language search component allows our clients to conduct more meaningful content searches and uncover relevant evidence more quickly,’ said Harold Leach, senior vice president of IT for Merrill’s Electronic Discovery Services.”
Read full release here.
LOS ANGELES – July 23, 2008 —Cognition Technologies, a next-generation Semantic Natural Language Processing (NLP) company, announces a quantum improvement in the application of NLP technology with the introduction of Semantic MEDLINE™ – the 18 million article abstract database of complex health information published by the National Library of Medicine. This new free service at www.SemanticMEDLINE.com enables complex health and life science material to be rapidly and efficiently discovered with greater precision and completeness. This marks the first time that users can employ a natural, conversational sentence structure to find the most complex studies within the MEDLINE dataset.
Charles Knight has the story:
Cognition has completed an institutional and individual financing round totaling $2.7 million. Investors include Draper Associates (Tim Draper), Fingerhut Ventures, and a personal investment by the company’s CEO, Scott Jarus.
Nitin Karandikar of the Software Abstractions Blog had this to say about Cognition:
May 7, 2008
Cognition Technologies, which focuses on Semantic natural language processing technology, was named by KMWorld as one of the top 100 Companies That Matter in Knowledge Management for 2008.
Says Cognition CEO Scott Jarus:
One of the biggest barriers to building a natural language understanding system is to build the semantic map and the dictionary with details of the syntactic behavior of words (i.e. how words behave within context). Cognition’s team has spent more than 20 years building this capability into Cognition’s Semantic NLP for the English language … and our technology is commercially available today!
Semantic search and NLP technologies seem to have arrived – they are generating a lot of buzz lately. In addition to mainstays Hakia and Powerset, there is a spate of new entries, including Cognition, BooRah and eeggi. We will be reviewing some of these new alternate search engines on this blog in the near future.
Congratulations, Scott and the Cognition team!
Hi! Welcome to the CogBlog! This is the home of Cognition Technologies’ employees and their musings. Our founder and CTO, Dr. Kathy Dahlgren, has been building out Cognition’s Semantic NLP™ for the past 23 years, so she will be chiming in regularly. Plus, we have some other amazing linguists and computer scientists here too. Nearly 25% of our current team has a PhD. Okay, it’s a small team, but still!
Since there are so many linguists here, we thought we’d go straight to the definition of what we mean by Cognition’s Semantic NLP.