Limits of copyrightability, part 2
We can extend this approach to cover similarity simply by treating each new work as representing s similar new works. In other words, we require (1-2^-n)^(s*p*k*k) to be at least one minus the acceptable risk of 10^-6. The difficulty, however, lies in defining s. For text we might assume s covers, for example, changes in tense, mode, person, sentence structure, word order, dialect or language, or systematic replacement of words with their synonyms, names with new ones, etc. Based purely on intuition, I would guess s must be at least in the millions before the resulting sentences in a natural language appear significantly different. If we choose s = 10^6, then the minimum information content of a copyrightable work grows from 110 bits to 130 bits, which corresponds to somewhere between 100 and 216 letters.
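The effect of s on the bound can be checked numerically. The sketch below assumes the usual small-probability approximation (1-2^-n)^N ≈ 1 - N*2^-n, under which the requirement becomes n >= log2(s*p*k*k / risk). Because s enters as a plain factor, it adds log2(s) bits whatever p and k are. The function name is only illustrative.

```python
import math

def extra_bits_for_similarity(s):
    """Bits added to the minimum n when each work stands for s similar works.

    Assumes (1 - 2**-n)**N ~= 1 - N * 2**-n for tiny 2**-n, so that
    n >= log2(N / risk); multiplying N by s then adds log2(s) bits.
    """
    return math.log2(s)

extra = extra_bits_for_similarity(10**6)
print(f"s = 10^6 adds about {extra:.2f} bits")
print(f"110 bits becomes roughly {110 + round(extra)} bits")
```

So a similarity factor of one million costs about 20 extra bits, which matches the step from 110 to 130 bits.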
While this increase may not appear significant, it already seems to imply that no individual Haiku verse should be copyrightable. Recall that typical western-style Haiku are 5+7+5 syllables long, such as the 82-letter Haiku with which David Dixon won Salon magazine's Haiku contest:
Three things are certain:
Death, taxes, and lost data.
Guess which has occurred.
In defense, however, we might argue that it is not the case that p = 10^9 people write Haiku verses all their lives. If we instead take p = 10^6 and k = 100, for one million people writing one hundred verses each, and keep s = 10^6, it would imply n = 73, or somewhere between 56 and 121 letters. Considering that the first 45 letters of Dixon's Haiku occur in several other phrases (try it with Google, for example), I'd still claim that this Haiku probably doesn't deserve copyright according to the arguments I've laid out.
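For anyone who wants to redo the Haiku arithmetic, here is a minimal sketch. It again assumes the approximation n >= log2(s*p*k*k / risk), with s = 10^6 and a risk of 10^-6. The 0.6 and 1.3 bits-per-letter rates are an assumption, chosen because they reproduce the letter counts quoted above; the function names are illustrative.

```python
import math

def min_bits(s, p, k, risk):
    """Smallest n (as a real number) with s*p*k*k * 2**-n <= risk."""
    return math.log2(s * p * k * k / risk)

def letters_range(bits, low_rate=0.6, high_rate=1.3):
    """Letter counts carrying `bits` of information.

    low_rate and high_rate are assumed entropy rates in bits per letter;
    a higher rate means fewer letters are needed.
    """
    return math.floor(bits / high_rate), math.floor(bits / low_rate)

n = min_bits(s=10**6, p=10**6, k=100, risk=1e-6)
print(f"n = {n:.2f} bits")
print("letters:", letters_range(round(n)))
```

Changing p, k or s in the call shows how sensitive the threshold is to each guess: every factor of ten adds a little over three bits.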
Nevertheless, I think Dixon's Haiku is great, and as someone who has had all but the first occur personally and repeatedly, I think Dixon deserved to win.
Next week I hope to write a little about how all my reasoning applies to programming and copyrightability of source code.