The Cognitive Computing Era: Commonsense Knowledge
- Published: September 1, 2015
- Written by Peter Fingar
Based on excerpts from the new book Cognitive Computing: A Brief Guide for Game Changers
In artificial intelligence research, commonsense knowledge is the collection of facts and information that an ordinary person is expected to know. The commonsense knowledge problem is the ongoing project in the field of knowledge representation (a sub-field of artificial intelligence) to create a commonsense knowledge base: a database containing all the general knowledge that most people possess, represented in a way that makes it available to artificial intelligence programs that use natural language or make inferences about the ordinary world. Such a database is a type of ontology, of which the most general are called upper ontologies.
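To make the idea of a commonsense knowledge base concrete, here is a minimal sketch in Python. The facts are stored as subject-predicate-object triples, with a generic "Thing" playing the role of an upper-ontology root; the entity and relation names are invented for illustration and are not drawn from any real ontology.

```python
# A toy commonsense knowledge base as subject-predicate-object triples.
# All entity and relation names are illustrative assumptions.
facts = {
    ("Dog", "is_a", "Mammal"),
    ("Mammal", "is_a", "Animal"),
    ("Animal", "is_a", "Thing"),  # "Thing" stands in for an upper-ontology root
    ("Dog", "has_part", "Tail"),
    ("Mammal", "has_property", "Warm-blooded"),
}

def ancestors(kb, entity):
    """All categories an entity belongs to, following is_a transitively."""
    found = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in kb:
            if subj == current and pred == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found

def properties(kb, entity):
    """Properties inherited from every category the entity falls under."""
    categories = {entity} | ancestors(kb, entity)
    return {obj for subj, pred, obj in kb
            if pred == "has_property" and subj in categories}

print(ancestors(facts, "Dog"))   # {'Mammal', 'Animal', 'Thing'}
print(properties(facts, "Dog"))  # {'Warm-blooded'}
```

Even this tiny example shows why the problem is hard: a program only "knows" that a dog is warm-blooded because someone encoded both the category hierarchy and the inherited property; scaling that to everything an ordinary person knows is the heart of the challenge.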
The problem is considered to be among the hardest in all of AI research because the breadth and detail of commonsense knowledge is enormous. Any task that requires commonsense knowledge is considered AI-complete: to perform it as well as a human being does, a machine would have to appear as intelligent as a human being. These tasks include machine translation, object recognition, text mining and many others. To do these tasks perfectly, the machine must know what the text is talking about or what objects it may be looking at, and this is impossible in general unless the machine is familiar with all the same concepts that an ordinary person is familiar with.
The goal of the semantic technology company, Cycorp, with its roots in the Microelectronics and Computer Technology Corporation (MCC), a research and development consortium, is to codify general human knowledge and common sense so that computers might make use of it. Cycorp charged itself with figuring out the tens of millions of pieces of data we rely on as humans — the knowledge that helps us understand the world — and representing them in a formal way that machines can use to reason. The company’s been working continuously since 1984. Cycorp’s product, Cyc, isn’t “programmed” in the conventional sense. It’s much more accurate to say it’s being “taught.” In an interview with Business Insider, Doug Lenat, President and CEO, said that “most people think of computer programs as ‘procedural, a flowchart,’ but building Cyc is much more like educating a child. We’re using a consistent language to build a model of the world.”
This means Cyc can see “the white space rather than the black space in what everyone reads and writes to each other.” An author might explicitly choose certain words and sentences as he’s writing, but in between the sentences are all sorts of things you expect the reader to infer; Cyc aims to make these inferences.
Consider the sentence, “John Smith robbed First National Bank and was sentenced to 30 years in prison.” It leaves out the details surrounding his being caught, arrested, put on trial, and found guilty. A human would never actually spell out all that detail, because doing so would be variously boring, confusing, or insulting; you can safely assume other people know what you’re talking about. It’s like pronoun use (he, she, it): one assumes people can figure out the referent. This stuff is very hard for computers to understand and get right, but Cyc does both.
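The kind of "white space" inference described above can be sketched with a toy forward-chaining rule engine: from the events a sentence states, simple if-then rules supply the unstated events a human reader would take for granted. The rules and event names here are invented for illustration; Cyc's actual representation and inference machinery are far richer.

```python
# Toy forward chaining: each rule says "if this event is known,
# then this earlier event is implied." All rules are illustrative assumptions.
RULES = [
    ("sentenced", "convicted"),
    ("convicted", "tried"),
    ("tried", "arrested"),
    ("arrested", "caught"),
]

def infer(stated):
    """Apply RULES repeatedly until no new implied events appear."""
    known = set(stated)
    changed = True
    while changed:
        changed = False
        for premise, implied in RULES:
            if premise in known and implied not in known:
                known.add(implied)
                changed = True
    return known

# "John Smith robbed First National Bank and was sentenced to 30 years."
events = infer({"robbed", "sentenced"})
print(sorted(events))
# ['arrested', 'caught', 'convicted', 'robbed', 'sentenced', 'tried']
```

Given only "robbed" and "sentenced," the sketch recovers the caught-arrested-tried-convicted chain the author never wrote down, which is exactly the gap-filling the article attributes to Cyc, though at a vastly smaller scale.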
Natural-language understanding will also require computers to grasp what we humans think of as common-sense meaning. For that, Ray Kurzweil’s AI team at Google taps into the Knowledge Graph, Google’s catalogue of some 700 million topics, locations, people, and more, plus billions of relationships among them. It was introduced as a way to provide searchers with answers to their queries, not just links.