July 3, 2011

Can Tweety fly?

Tags: Memories, University

The worst question you can ask a PhD student is “tell me about your thesis”. Or at least, that was the joke among the postgrad students at the UWA CS department in the mid-1990s. I took this seriously and prepared an explanation of my thesis that I thought most people could understand (assuming some small knowledge of logic). I only ever used this explanation once, many years ago, at a job interview for Rio Tinto R&TD. So it is about time I explained it again before I forget it all. For reference, I did not complete my PhD, dropping out after 18 months.

The provisional title of my thesis was “Using Nonmonotonic Logic to extend Inductive Logic Programming”, and the research proposal is here. The idea was to have computers learn logical rules in a more natural manner than the existing systems did. First, some background on each of the two terms in the title.

First-order Logic is what most people mean when they talk about logic. It involves formulas like flies(X) <- bird(X) (all birds fly), which, if you also have bird(Tweety) (Tweety is a bird), implies flies(Tweety) (Tweety can fly). One issue with First-order Logic is that it assumes perfect knowledge - everything that is known is correct and nothing relevant is missing. Thus it is monotonic: no future information can invalidate a previous implication. It can’t handle the situation where, in addition to the knowledge above, we later learn that Tweety is a penguin and that penguins don’t fly. This new information would invalidate our old conclusion that Tweety flies, which a monotonic logic does not allow. Instead you have to know from the start that birds fly unless they are penguins. However, ostriches don’t fly either, nor do dead birds or birds with broken wings. With a little bit of thought many more exceptions to the rule “birds fly” can be imagined, and the rule becomes unwieldy.
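To make the monotonicity problem concrete, here is a minimal sketch in Prolog-style notation, using the same predicate names as above (the queries in the comments are just illustrations):

    % The naive rule and fact from the example above.
    flies(X) :- bird(X).
    bird(tweety).

    % ?- flies(tweety).    % succeeds: Tweety flies.
    %
    % Monotonicity means that adding penguin(tweety) to the program cannot
    % retract this conclusion; flies(tweety) still succeeds. To handle the
    % exception you would have to rewrite the rule itself, e.g.
    %   flies(X) :- bird(X), not_a_penguin(X), not_an_ostrich(X), not_dead(X), ...
    % listing every exception up front.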

In real-world problems there can be many exceptions or inconsistent data (noise), so First-order Logic can be problematic for machine learning. There are ways around this: often it is applied only in a self-contained, well-understood domain with few exceptions. However, many systems instead use Non-monotonic Logic, which allows new knowledge to contradict old beliefs. Many Non-monotonic Logic constructs seem so intuitive that it is hard to imagine thinking without them. The Closed-world Assumption (CWA) says that if something is not known to be true, it is assumed false. When reading a train timetable with a train at 8am and the next listed train at 9am, you are using the CWA when you assume there is no train at 8:20am. Closely related is Negation As Failure (NAF) - the two have been shown to coincide for a large class of programs - where if something can’t be proven true then it is taken to be false. Sounds like a few university researchers I knew. If you have used Prolog, then you should be familiar with NAF. The logic I was mainly working on was Default Logic, which allows a rule to be applied by default as long as its conclusion is consistent with what is already known. This allows exceptions to be expressed in a compact and natural rule system.
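In Prolog, for instance, the same knowledge can be written using negation as failure (the \+ operator), which handles the penguin exception without it having to be enumerated in the original rule; the extra facts below are just for illustration:

    bird(tweety).
    bird(polly).
    penguin(tweety).

    % Birds fly unless they can be shown to be penguins (negation as failure).
    flies(X) :- bird(X), \+ penguin(X).

    % ?- flies(tweety).   % fails: penguin(tweety) is provable.
    % ?- flies(polly).    % succeeds: penguin(polly) cannot be proven, so it is assumed false.

The corresponding Default Logic rule is usually written bird(X) : flies(X) / flies(X), read as “if X is a bird and it is consistent to assume that X flies, then conclude that X flies”.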

Given that the rules produced by the learning system would be in a Non-monotonic Logic, the next step is to describe how those rules are learnt. The example above, where we inferred that Tweety could fly, is an example of Deductive Reasoning. Deduction is the process of using some rules and some background facts to derive new facts. Inductive Reasoning is the reverse: deriving rules from facts. So if you know Tweety is a bird and that Tweety flies, then you might derive the rule that all birds fly. Inductive Logic Programming (ILP) is just the term for computer programs that perform this process with the resulting rules expressed as a logic program. They take a set of background knowledge and sets of positive and negative examples, and try to create rules that cover as many of the positive examples as possible while covering as few of the negative examples as possible. This is a computationally expensive process - computers only became powerful enough to do it for non-trivial datasets in the late 1980s - so when I studied in the mid-1990s the field was just getting started and becoming known.
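As a rough illustration of the input and output of such a system (the predicates and examples here are made up, not taken from any particular ILP system):

    % Background knowledge given to the learner.
    bird(tweety).    bird(polly).    bat(bruce).    stone(rocky).

    % Positive examples: flies(tweety), flies(polly), flies(bruce).
    % Negative example:  flies(rocky).

    % A hypothesis an ILP system might induce - it covers all of the
    % positive examples and none of the negative ones:
    flies(X) :- bird(X).
    flies(X) :- bat(X).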

My thesis attempted to marry the two concepts. There had been a few attempts at this before, but the best-known ILP systems at the time didn’t handle exceptions or noise very well. Thus part 1 of my work was to construct an ILP learning system whose output was a Non-monotonic logic. This was about as far as I got, implementing some theoretical results from another researcher (with minor improvements). Part 2 was handling exceptions and noise better by flipping rules: if the evidence against a rule became too strong, the system should flip it to the reverse rule. I also looked at weakening the “strength” of a rule over time. In a non-probabilistic logic all rules have the same applicability, so by “strength” I mean how long a rule stays in the output theory before being modified or flipped in the presence of negative examples, or even how readily exceptions/noise are ignored if the rule is particularly “strong”. I didn’t come up with anything meaningful here; I just thought it might be useful. Although I have been told this became a hot research topic in the years after I quit (under the term “forgetting”, I think).
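To give a flavour of what flipping might look like in code, here is a hypothetical sketch - the predicate name, the coverage counts, and the 2x threshold are all inventions for illustration, not something my system actually did:

    % revise_rule(+Rule, +PosCovered, +NegCovered, -Revised)
    % Decide what to do with a rule based on how many positive and
    % negative examples it covers.
    revise_rule(Rule, Pos, Neg, flip(Rule))   :- Neg > 2 * Pos, !.   % overwhelming counter-evidence: flip the rule
    revise_rule(Rule, Pos, Neg, weaken(Rule)) :- Neg > Pos, !.       % some counter-evidence: weaken its "strength"
    revise_rule(Rule, _Pos, _Neg, Rule).                             % otherwise keep the rule unchanged

Something like this could run after each batch of examples, so a rule’s “strength” is simply how much counter-evidence it can absorb before being revised.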

This may sound like it was mainly a coding project, maybe because that was where my interests lay. Most of my time should have been spent doing maths to prove my work correct, as it was a very theoretical field.