Programming, life, and everything in between
IBM Watson and the Turing Test
Recently, IBM Watson defeated two Jeopardy! champions in a match aired over three episodes on television from February 14 to 16, 2011. Since Watson beat two humans at a trivia game, Watson must be more intelligent than those two humans, right? Wrong. There are a few problems with that statement. First, we haven’t established whether Watson should be considered intelligent at all. The current standard litmus test for machine intelligence (that is, for checking whether a computer is intelligent or not) is the Turing Test.
Published in 1950, Alan Turing’s “Computing Machinery and Intelligence” describes this test as the Imitation Game. The original version of the game has three participants, referred to as A, B, and C. A is a man and B is a woman. C is an interviewer who questions A and B through text chat or through an intermediary, so that C cannot judge either of them by voice. C must determine which of A and B is the man and which is the woman, while A and B each try to convince the interviewer that they are the woman. In other words, C must ask questions of A and B and, based on their answers alone, determine the gender of each. This version of the Imitation Game isn’t the same as what is now known as the Turing Test. The revised version replaces A with a computer, while B remains a woman. Again, A and B must answer the interviewer’s questions as if they were both women. However, the interviewer’s job has changed. Although Turing does not state it explicitly, the interviewer normally knows that one of the two respondents is a computer, and his or her job is to figure out which one is human and which is the machine.
With the definition of the Turing Test in place, let’s take a look at what IBM Watson actually did. The basic format of the Jeopardy! game show is that a lengthy clue, phrased as a statement, is revealed to all of the contestants at the same time. The contestants must “answer” the clue in the form of a short question of two or three words. Speaking from my own experience as a computer programmer, this kind of problem was challenging for IBM because it requires reverse mapping of information, i.e. retrieving a very specific piece of information from a large set of somewhat ambiguous information. To clarify, Jeopardy! would be much easier for current computers (and for humans, for that matter) if the game were inverted, giving the contestants a two- or three-word question and having them spit back a couple of sentences about it. For a computer, solving such problems would be as simple as a Google search.
To get an idea of how Watson did so well on Jeopardy!, I checked the official IBM website on the technology behind Watson. Watson is a codename for the underlying technology called DeepQA, which is described as “an application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, and Machine Learning technologies to the field of open-domain question answering.” In other words, the developers of Watson built in a large set of grammar rules to analyze sentences and link each piece of information to the other pieces it is associated with. Taking an extremely simple example, if Watson read the phrase “apples are red,” he would know that there is a link between the word “apples” and the word “red.” After Watson was fed thousands of pages of text on varying topics, a massive database of information (arguably knowledge) was compiled. This database was queried for every Jeopardy! clue Watson faced during the televised match.
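To make the “apples are red” idea concrete, here is a toy Python sketch of a keyword association index. It is my own simplification, not how DeepQA actually works: the stopword list and the helpers `tokenize`, `build_index`, and `answer` are hypothetical. Each sentence links every keyword in it to every other keyword, and a clue is answered by picking the candidate that the clue’s keywords point to most strongly.

```python
import re

# Hypothetical stopword list: common words that carry no useful association.
STOPWORDS = {"a", "an", "the", "are", "is", "in", "of", "and", "this"}


def tokenize(text):
    """Lowercase the text and keep only alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_index(sentences):
    """Link every keyword to every other keyword in the same sentence, counting repeats."""
    index = {}
    for sentence in sentences:
        words = set(tokenize(sentence))
        for word in words:
            links = index.setdefault(word, {})
            for other in words - {word}:
                links[other] = links.get(other, 0) + 1
    return index


def answer(index, clue, candidates):
    """Return the candidate most strongly linked to the clue's keywords."""
    keywords = tokenize(clue)

    def score(candidate):
        # Sum the link counts from each clue keyword to this candidate.
        return sum(index.get(k, {}).get(candidate, 0) for k in keywords)

    return max(candidates, key=score)


corpus = [
    "Apples are red.",
    "Bananas are yellow.",
    "Apples grow on trees in orchards.",
    "Bananas grow in tropical climates.",
]
index = build_index(corpus)
print(answer(index, "This red fruit grows in orchards", ["apples", "bananas"]))  # apples
```

The reverse mapping described earlier shows up here: the clue is a bundle of loosely related keywords, and the answer is whichever single word those keywords point to most strongly. DeepQA combines far more techniques than this (the IBM description above lists several); the sketch only captures the basic idea of linked words.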
So can we consider Watson intelligent? No. As mentioned before, the accepted litmus test for intelligence is the Turing Test, and there is no way that Watson would pass it because of a couple of problems intrinsic to the DeepQA system. Possibly the most obvious obstacle is that Jeopardy! is exactly what Watson is optimized for. His input must be in the form of a statement, and he is taught to answer only in the form “Who is…” or “What is…”. If I were the interviewer in a Turing Test between Watson and a human and I asked a question like “What is your favorite Christian Bale movie?”, I would get two answers. One answer would be a statement: “I don’t like Christian Bale movies. He yells far too much.” The other answer would be in the form of a question: “What is The Machinist?” Simple mistakes like this would quickly demonstrate to C, the interviewer, a lack of “understanding” in Watson’s question-and-answer system.
One big objection to this argument is that Watson’s short question format is only superficial. In other words, the way in which Watson responds to his input is only a facet of the Jeopardy! game. The objector would continue and say that Watson could easily be taught to answer in statements, as the Turing Test requires. I’ll play along and set aside my previous objection to Watson’s so-called intelligence. However, there are other problems with Watson’s ability to pass the Turing Test. The first one I can see is that the DeepQA engine is not meant for conversation. Granted, the Turing Test is not exactly a conversation, but extremely informative questions could depend on the state of the questioning, e.g. “Which of the previous questions has been the most difficult for you?” This question would be very easy for any human taking the Turing Test, as all it requires is remembering a previous question and repeating it (very little deception would be necessary even if A were still the man). If Watson were asked this question, he would have a hard time pulling out any useful keywords to begin formulating an effective response. Watson’s answer would most likely be irrelevant, because he looks only at the keywords within the question. To answer this question properly, Watson would need a notion of where the conversation began and would have to remember the questions asked earlier in that same session.
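The difference between answering each question in isolation and keeping track of the interrogation can be sketched in Python. Both classes below are hypothetical illustrations, not Watson’s design: `KeywordAnswerer` sees only the current question, while `SessionAnswerer` also records every question asked so far, which is what “Which of the previous questions has been the most difficult for you?” requires. Purely as an assumption for the sketch, “most difficult” is approximated as the longest earlier question.

```python
class KeywordAnswerer:
    """Answers each question in isolation by matching keywords to canned facts."""

    def __init__(self, facts):
        # facts maps a lowercase keyword to an answer, e.g. {"capital": "Paris"}
        self.facts = facts

    def ask(self, question):
        for word in question.lower().replace("?", "").split():
            if word in self.facts:
                return self.facts[word]
        return "I don't know."


class SessionAnswerer(KeywordAnswerer):
    """Same keyword lookup, but remembers every question asked in the session."""

    def __init__(self, facts):
        super().__init__(facts)
        self.history = []

    def ask(self, question):
        if "previous questions" in question.lower():
            if self.history:
                # Assumption: treat the longest earlier question as the hardest one.
                reply = "Probably this one: " + max(self.history, key=len)
            else:
                reply = "You haven't asked me anything else yet."
        else:
            reply = super().ask(question)
        self.history.append(question)  # remember this question for later turns
        return reply


facts = {"capital": "Paris", "machinist": "Christian Bale"}
questions = [
    "What is the capital of France?",
    "Who starred in The Machinist?",
    "Which of the previous questions has been the most difficult for you?",
]

stateless, stateful = KeywordAnswerer(facts), SessionAnswerer(facts)
for q in questions:
    print(stateless.ask(q), "|", stateful.ask(q))
```

Both answerers handle the first two questions identically. On the third, the keyword-only version finds nothing it recognizes and gives up, while the session version can at least point back at something it was actually asked.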
Another objection I have is that Watson has absolutely no notion of the requirements of the Turing Test, especially when questions concern matters of opinion. For the sake of argument, let’s try to remedy Watson’s lack of Turing Test aptitude by prefixing every question with a statement linking the phrase to all information indexed about women. So my first example question would become “What is a girl’s favorite Christian Bale movie?” Still, there is absolutely no guarantee that the answer given will be accurate, much less convincing. More fundamental than the requirements of the Turing Test is that the DeepQA engine is not meant for understanding. Watson is simply meant for single queries to a highly structured keyword database. IBM doesn’t even claim that there is any level of understanding, comprehension, or innate intelligence running behind the scenes in Watson. Although Watson is a huge step in modern computing, in no way is “he” any form of artificial intelligence.