## Sunday, January 08, 2006

### Limits of copyrightability, part 2

In a previous post I concluded that works shorter than 110 bits of information do not deserve copyright, because the risk of two people coincidentally producing an identical work, and thus of a false conviction, would be unacceptably high. We saw that the risk becomes unacceptable when (1-2^-n)^(p*k*k), the probability that no such coincidence occurs, is at most one minus the acceptable risk of 10^-6. Here n is the length of the work in bits, p the number of other people producing copyrightable works, and k the number of bits of information each person produces on average in a lifetime.
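
The step from this condition to a minimum bit count can be sketched with the standard approximation ln(1 - x) ≈ -x for small x. Writing ε for the acceptable risk (10^-6 here) and assuming 2^-n is tiny, a back-of-the-envelope derivation (not necessarily how the previous post computed it) goes:

$$
\begin{aligned}
(1-2^{-n})^{pk^2} \ge 1-\varepsilon
&\iff pk^2 \ln\left(1-2^{-n}\right) \ge \ln(1-\varepsilon) \\
&\iff -pk^2\, 2^{-n} \gtrsim -\varepsilon \qquad \text{(using } \ln(1-x) \approx -x\text{)} \\
&\iff n \gtrsim \log_2 \frac{pk^2}{\varepsilon} = \log_2\left(pk^2\right) + \log_2 \frac{1}{\varepsilon}.
\end{aligned}
$$

So the threshold is roughly the base-2 logarithm of the number of comparisons plus log2(10^6) ≈ 19.9 bits, and any extra factor in the exponent adds its own base-2 logarithm to the threshold.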

We can extend this approach to cover similarity simply by regarding each new work as representing s similar new works. In other words, a work of n bits is now too short for copyright when (1-2^-n)^(s*p*k*k) is at most one minus the acceptable risk of 10^-6. The difficulty, however, lies in defining s. In text, s might cover, for example, changes in tense, mood, person, sentence structure, word order, dialect or language, or systematic replacement of words with synonyms, of names with new ones, and so on. Based purely on intuition, I would guess s must be at least in the millions before the resulting sentences in a natural language appear significantly different. If we choose s = 10^6, the minimum information content of a copyrightable work grows from 110 bits to 130 bits (since log2(10^6) ≈ 20), which corresponds to between 100 and 216 letters of English at Shannon's estimate of 1.3 to 0.6 bits per letter.
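
The effect of s can be checked numerically. Below is a minimal Python sketch, assuming the no-coincidence condition above and solving it exactly for n rather than with the approximation. The helper names `min_bits` and `letters` are hypothetical, and because this section does not restate the previous post's values of p and k, the product `pkk` is a made-up placeholder chosen only so that s = 1 reproduces the 110-bit limit.

```python
import math


def min_bits(comparisons: float, risk: float = 1e-6) -> float:
    """Smallest n (in bits) with (1 - 2**-n) ** comparisons >= 1 - risk.

    Solves the condition exactly: 2**-n = 1 - (1 - risk) ** (1 / comparisons).
    log1p/expm1 keep that tiny difference from rounding to 0.0 in floating point.
    """
    per_comparison = -math.expm1(math.log1p(-risk) / comparisons)
    return -math.log2(per_comparison)


def letters(bits: float, low: float = 0.6, high: float = 1.3) -> tuple[float, float]:
    """Range of English letters carrying `bits`, at `high` and `low` bits per letter."""
    return bits / high, bits / low


# Hypothetical p*k*k, chosen only so that s = 1 reproduces the earlier 110-bit limit.
pkk = 2.0 ** 110 * 1e-6
s = 1e6  # similar variants per work, as guessed in the text

base = min_bits(pkk)
with_similarity = min_bits(s * pkk)
print(f"s = 1:   {base:.1f} bits")  # s = 1:   110.0 bits
print(f"s = 1e6: {with_similarity:.1f} bits (+{with_similarity - base:.2f})")  # s = 1e6: 129.9 bits (+19.93)
print("130 bits = %.1f to %.1f letters" % letters(130))  # 130 bits = 100.0 to 216.7 letters
```

Whatever the actual p and k were, multiplying the number of comparisons by s shifts the threshold by log2(s), which is why the jump from 110 to 130 bits does not depend on those values.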

While this rise may not appear significant, it already seems to imply that no individual Haiku verse should be copyrightable. Recall that typical Western-style Haiku are 5+7+5 syllables long, such as the 82-letter Haiku with which David Dixon won Salon magazine's Haiku contest:
Three things are certain:
Death, taxes, and lost data.
Guess which has occurred.

However, in its defense we might argue that not all p = 10^9 people write Haiku verses all their lives. If we instead take p = 10^6 and k = 100, that is, one million people each writing one hundred verses, and keep s = 10^6, we get n = 73 bits, or between 56 and 121 letters. Considering that the first 45 letters of Dixon's Haiku occur in several other phrases (try searching for them with Google, for example), I'd still claim that this Haiku probably doesn't deserve copyright according to the arguments I've laid out.
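
The Haiku figures can be reproduced the same way. This is a small Python sketch, assuming the same condition solved exactly for n, with s = 10^6 from the similarity estimate and with k counting whole verses rather than bits; `min_bits` is a hypothetical helper name.

```python
import math


def min_bits(comparisons: float, risk: float = 1e-6) -> float:
    """Smallest n (in bits) with (1 - 2**-n) ** comparisons >= 1 - risk."""
    # 2**-n = 1 - (1 - risk) ** (1 / comparisons), computed without cancellation.
    return -math.log2(-math.expm1(math.log1p(-risk) / comparisons))


s = 1e6  # similar variants per work, the guess from the post
p = 1e6  # people writing Haiku
k = 100  # verses per person; here each verse counts as one work
n = min_bits(s * p * k * k)
print(f"n = {n:.2f} bits")  # n = 73.08 bits
# Shannon's 1.3 and 0.6 bits per letter bound the length in letters.
print(f"{n / 1.3:.1f} to {n / 0.6:.1f} letters")  # 56.2 to 121.8 letters
```

The post rounds these results to 73 bits and 56 to 121 letters.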

Nevertheless, I think Dixon's Haiku is great, and as someone who has had all but the first of those three things happen to him personally and repeatedly, I think Dixon deserved to win.

Next week I hope to write a little about how all this reasoning applies to programming and the copyrightability of source code.

olli said...

Your approach is interesting, but it also shows how difficult it is to apply purely probabilistic methods to human beings. Creativity (protected by copyright) is characteristic of humans. To decide whether something is creative, one should also consider the context. A very small amount of information can be creative in one context although it is not in another.

9:51 AM
cessu said...

Olli wrote: "Creativity (protected by copyright) is characteristic of humans."

In my rather reductionist world view, creativity is merely a word we attribute to a pseudorandom process whose internal state and inner workings we don't (yet?) know. Excellent chess players attribute creativity to other players and occasionally also to computers, and most people would find some actions of animals (dogs, chimps, dolphins, ...) somewhat creative. It is only an elective choice, not an inherent property of creativity, that copyright is granted to humans rather than, for example, to dolphins.

Olli wrote: "To decide whether something is creative, one should also consider the context. A very small amount of information can be creative in one context although it is not in another."

Correct. This contextual bias is what Claude Shannon's estimates of the entropy per letter of English text also try to capture. That's why the range (0.6 to 1.3 bits per letter) is rather wide, much wider than we would see if we applied mechanical compression software to the same texts, for example.

2:48 PM
olli said...

I largely agree. Creativity refers to our ability to find a good solution among a large number of choices when there is no single correct answer. Probably that only reflects our inability to understand our own decision-making process, so we just call it creative. Most likely, copyright should be granted to dolphins too (and thank you for the fish), but the current small-minded copyright law holds that legally only humans can be creative.

Note that words too often have different meanings in everyday language and in legal language. Creativity is a good example. I agree that decision-making in chess is creative (everyday language), although the output is hardly copyrightable (legal).

11:55 AM