O’Donnell, Cohen and the perfect anti-virus

Recently I stumbled upon this post where Adam O’Donnell discusses the impossibility of building a perfect anti-virus. He then pointed me to a 1984 paper by Cohen, which is apparently considered to be the first demonstration that no algorithm can correctly detect all possible computer viruses. Cohen’s proof goes like this :

  1. Let’s define a computer virus as a program that can infect other programs by modifying them to include a possibly evolved copy of itself.
  2. Let AV be a candidate virus detection algorithm. AV takes a program as input and returns true if it detected a virus, false otherwise.
  3. Let P be a program which contains the single following pseudocode statement:

    if AV(P) exit(); else infect_others();

    In other words, P calls the detection algorithm AV on itself. Provided that AV returns, two possibilities exist :

  4. If AV(P) returns true, P exits without infecting other programs. Thus by identifying P as a virus, AV gives a false positive, since P will not act as a virus.
  5. If AV(P) returns false, P will infect other programs. Thus AV produced a false negative, as it did not identify P as a virus, although P is one.
  6. From 4 and 5 : in both cases AV does not return the correct result when called on P, thus AV is not a perfect virus detection algorithm.

This proof is pretty weird, as it mixes up being a virus and behaving like a virus. P is a virus, both according to common sense and Cohen’s own definition 1 — in fact, P cannot do anything but infect other programs. Since P is obviously a virus, its detection at step 4 is not a false positive, even if no infection occurs during a particular run of P. So the contradiction at step 6 disappears and thus the conclusion.

What I find really puzzling in Cohen’s reasoning is that he considers a program P which he completely defines by its source code. But is P a virus ? Funnily enough, even with the source code Cohen cannot answer until P has been executed, as the answer depends on the output of AV(P). Contrary to his own definition, step 4 is actually based on behaviour, which is clearly not sufficient to decide if a program is a virus (slow/sparse infectors, logic bombs, etc).

Last words
Not only I find Cohen’s proof very unconvincing, but I also find it quite amusing from Mr O’Donnell to use a theoretical result to argue that making good anti-viruses is difficult. Imagine a football coach asking a player to run faster, and the guy objecting : “sorry coach, but running fast is hard because there’s a theoretical limit to the speed I can reach, called C …”. In both cases I think their actual performances are much too far from the theoretical limit to use it as an excuse.