<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-20305076</id><updated>2011-12-30T01:59:47.799+02:00</updated><category term='skepticism'/><title type='text'>Cessu's blog</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>49</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-20305076.post-85892519056713562</id><published>2010-04-22T23:42:00.002+03:00</published><updated>2010-04-22T23:46:08.179+03:00</updated><title type='text'>Self-referential Sentencing</title><content type='html'>I explained a student the concept of powerlooping by constructing a program that generates self-referential sentences like&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;This sentence contains seventeen times the letter i, fifteen times the letter r, fourteen times the letter h, thirteen times the letter l, twelve times the letter n, eleven times the letter m, six times the letter f, five times the letter v, four times the letter o, and three times the letter u.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;The student gawked for fifteen seconds while I was expecting an exclamation like "awesome" or "cool", but instead he balked with "and &lt;span style="font-style:italic;"&gt;that&lt;/span&gt; is supposed to help me find a job?"&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-85892519056713562?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/85892519056713562/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=85892519056713562' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/85892519056713562'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/85892519056713562'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2010/04/self-referential-sentencing.html' title='Self-referential Sentencing'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-3388235818141033250</id><published>2009-10-15T01:22:00.002+03:00</published><updated>2009-10-15T01:34:20.053+03:00</updated><title type='text'>MegaHurtz for Your Ears</title><content type='html'>Last week I experienced the awe of social media: a year ago I posted an article of how &lt;a href="http://cessu.blogspot.com/2008/09/have-you-listened-to-your-program-today.html"&gt;rendering a program's execution as sound&lt;/a&gt; could incidentally ne rather pleasing to the human ear, but not until last week did I receive an avalanche of comments and requests for more.  It took a while to dust off all relevant programs I had used, but here you are: &lt;a href="http://innoguard.cs.hut.fi/~cessu/megahurtz/"&gt;190 hours of sound of programs&lt;/a&gt;, corresponding to approximately one second of CPU-time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-3388235818141033250?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/3388235818141033250/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=3388235818141033250' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/3388235818141033250'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/3388235818141033250'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2009/10/megahurtz-for-your-ears.html' title='MegaHurtz for Your Ears'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-7664436392410849217</id><published>2008-11-28T14:37:00.002+02:00</published><updated>2008-11-28T15:10:10.311+02:00</updated><title type='text'>SR3C</title><content type='html'>Symbol Ranking is a method for data compression where one maintains a LRU list of symbols recently seen in the given context.  Those symbols are expected to be more probable the next time and therefore encoded with fewer bits than other symbols.  The LRU list is usually only up to three symbols long, thus making it possible to store a context in just one 32-bit word.&lt;br /&gt;&lt;br /&gt;Symbol Ranking has been common computing folklore for ages, &lt;a href="http://www.cs.auckland.ac.nz/~peter-f/"&gt;Peter Fenwick&lt;/a&gt; gave the first modern treatment and implementation subsequently improved by &lt;a href="http://www.mattmahoney.net/"&gt;Matt Mahoney&lt;/a&gt; for example by adding a secondary arithmetic compression stage.  I added a few ideas of my own, rewrote it in C in a way easily embeddable to other programs to compress e.g. network connections and database logging.&lt;br /&gt;  &lt;br /&gt;This compression library, &lt;a href="http://hibase.cs.hut.fi/~cessu/sr3c-1.0.tar.gz"&gt;SR3C&lt;/a&gt;, hits the front where one either has to consume more memory or cycles to compress better.  The compression is slightly faster to &lt;TT&gt;gzip -7&lt;/TT&gt;, but results in ~11% smaller data.  &lt;TT&gt;bzip2 -2&lt;/TT&gt; compresses slightly better than SR3C, but takes almost three times as long and is not on-line.&lt;br /&gt;&lt;br /&gt;Matt Mahoney made a &lt;A HREF="http://cs.fit.edu/~mmahoney/compression/text.html#2660"&gt;version&lt;/A&gt; whose command line interface compiles on Windows and &lt;A HREF="http://cs.fit.edu/~mmahoney/compression/text.html"&gt;benchmarked&lt;/A&gt; it as well.  Both versions of SR3C are published under the MIT license.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-7664436392410849217?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/7664436392410849217/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=7664436392410849217' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/7664436392410849217'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/7664436392410849217'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2008/11/sr3c.html' title='SR3C'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-1036673034616106344</id><published>2008-11-28T14:22:00.002+02:00</published><updated>2008-11-28T14:33:59.285+02:00</updated><title type='text'>Random Access Pseudo-Random Numbers</title><content type='html'>Since my previous post I was approached by a fellow-in-keyboards in need of a "random-access" pseudo-random number generator which could generate the i'th generated random r number directly, without computing the intermediate i-1 random numbers since seeding.  Would a hash function for integers be a good RAPRNG?&lt;br /&gt;&lt;br /&gt;No!  Good hash functions avoid funnels and are thus bijective, which for 32-bit hash functions would imply only a 2^32 period. Consequently, if we draw N &lt; 2^32 consequtive random numbers, then hashing would make r could occur once or never, but not twice or more frequently.  A correct random number generation would result in an exponential distribution of the frequency of r.&lt;br /&gt;&lt;br /&gt;We can fix this by increasing the period, which is equivalent to increasing the number of output bits of the hash, but use only 32 bits of the output.  Alternatively we could hash i with two independent integer hash functions and xor the result.  Either way, we deliberately create just the right amount of final &lt;A HREF="http://burtleburtle.net/bob/hash/evahash.html#funneling"&gt;funnelling&lt;/A&gt;.&lt;br /&gt;&lt;br /&gt;Practical tests, such as &lt;A HREF="http://www.phy.duke.edu/~rgb/General/rand_rate.php"&gt;Dieharder&lt;/A&gt;, are surprisingly difficult.  I tried xoring results of several pairs of known integer hash functions, and eventually found a reasonable solution based my own &lt;TT&gt;hash_32_to_64&lt;/TT&gt; mentioned in my previous post.&lt;br /&gt;&lt;PRE&gt;uint32_t raprng(uint64_t i, uint64_t seed)&lt;br /&gt;{&lt;br /&gt;  uint64_t r = (2857720171ULL * ((uint32_t) i)) ^ 0x1EF57D8A7B344E7BULL;&lt;br /&gt;  r ^= r &gt;&gt; 29;&lt;br /&gt;  r += r &lt;&lt; 16;&lt;br /&gt;  r ^= r &gt;&gt; 21;&lt;br /&gt;  r += r &gt;&gt; 32;&lt;br /&gt;  r = (2857720171ULL * ((uint32_t) (i ^ r))) ^ (0xD9EA571C8AF880B6ULL + seed);&lt;br /&gt;  r ^= r &gt;&gt; 29;&lt;br /&gt;  r += r &lt;&lt; 16;&lt;br /&gt;  r ^= r &gt;&gt; 21;&lt;br /&gt;  return r + (r &gt;&gt; 32);&lt;br /&gt;}&lt;/PRE&gt;&lt;br /&gt;&lt;br /&gt;The hypothetical period of &lt;TT&gt;raprng&lt;/TT&gt; is 2^96 if &lt;TT&gt;i&lt;/TT&gt; were of sufficient width.  This period is barely enough to be respectable, at least compared to some serious PRNG's like the &lt;A HREF="http://en.wikipedia.org/wiki/Mersenne_twister"&gt;Mersenne Twister&lt;/A&gt;.  &lt;TT&gt;raprng&lt;/TT&gt; is also slower.  But it is random-accessible.&lt;br /&gt;&lt;br /&gt;Having gone through all this I need to point out that there is a less-known PRNG called the explicit inversive congruential generator (EICG), where the i'th output is defined as modular inverse of a*i+b for some a and b, where a should obviously be non-zero.  The problem herein is that modular inversion is by no means cheap, and it isn't clear to me that some other very non-linear bijection - such as a very good hash function - wouldn't do.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-1036673034616106344?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/1036673034616106344/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=1036673034616106344' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/1036673034616106344'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/1036673034616106344'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2008/11/random-access-pseudo-random-numbers.html' title='Random Access Pseudo-Random Numbers'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-6681025074591997298</id><published>2008-11-13T10:45:00.002+02:00</published><updated>2008-11-13T11:04:32.981+02:00</updated><title type='text'>Hashing with SSE2 Revisited, or My Hash Toolkit</title><content type='html'>Now that Bob "The Hashman" Jenkins &lt;a href="http://burtleburtle.net/bob/hash/doobs.html"&gt;refers&lt;/a&gt; to my &lt;a href="http://cessu.blogspot.com/2007/09/hashing-with-sse2.html"&gt;earlier post&lt;/a&gt; on a hash function implemented with the help of SSE2 instructions, I feel compelled to post an update on it.&lt;br /&gt;&lt;br /&gt;After my original post Bob and I concluded the mix step was not reversible, so I went back to the drawing board and came up with the function below.  As earlier, the task of padding the data to even 16 bytes and other trivialities are left to the reader.&lt;br /&gt;&lt;br /&gt;&lt;PRE&gt;/* Compile with gcc -msse2 ... */&lt;br /&gt;&lt;br /&gt;#include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;#include &amp;lt;assert.h&amp;gt;&lt;br /&gt;#include &amp;lt;xmmintrin.h&amp;gt;&lt;br /&gt;&lt;br /&gt;static uint32_t coeffs[12] __attribute__((aligned(16))) = {&lt;br /&gt;  /* Four carefully selected coefficients and interleaving zeros. */&lt;br /&gt;  2561893793UL, 0, 1388747947UL, 0,&lt;br /&gt;  3077216833UL, 0, 3427609723UL, 0,&lt;br /&gt;  /* 128 bits of random data. */&lt;br /&gt;  0x564A4447, 0xC7265595, 0xE20C241D, 0x128FA608,&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;#define COMBINE_AND_MIX(c_1, c_2, s_1, s_2, in)                              \&lt;br /&gt;  /* Phase 1: Perform four 32x32-&gt;64 bit multiplication with the             \&lt;br /&gt;     input block and words 1 and 3 coeffs, respectively.  This               \&lt;br /&gt;     effectively propagates a bit change in input to 32 more                 \&lt;br /&gt;     significant bit positions.  Combine into internal state by              \&lt;br /&gt;     subtracting the result of multiplications from the internal             \&lt;br /&gt;     state. */                                                               \&lt;br /&gt;  s_1 = _mm_sub_epi64(s_1, _mm_mul_epu32(c_1, _mm_unpackhi_epi32(in, in)));  \&lt;br /&gt;  s_2 = _mm_sub_epi64(s_2, _mm_mul_epu32(c_2, _mm_unpacklo_epi32(in, in)));  \&lt;br /&gt;                                                                             \&lt;br /&gt;  /* Phase 2: Perform shifts and xors to propagate the 32-bit                \&lt;br /&gt;     changes produced above into 64-bit (and even a little larger)           \&lt;br /&gt;     changes in the internal state. */                                       \&lt;br /&gt;  /* state ^= state &gt;64&gt; 29; */                                              \&lt;br /&gt;  s_1 = _mm_xor_si128(s_1, _mm_srli_epi64(s_1, 29));                         \&lt;br /&gt;  s_2 = _mm_xor_si128(s_2, _mm_srli_epi64(s_2, 29));                         \&lt;br /&gt;  /* state +64= state &lt;64&lt; 16; */                                            \&lt;br /&gt;  s_1 = _mm_add_epi64(s_1, _mm_slli_epi64(s_1, 16));                         \&lt;br /&gt;  s_2 = _mm_add_epi64(s_2, _mm_slli_epi64(s_2, 16));                         \&lt;br /&gt;  /* state ^= state &gt;64&gt; 21; */                                              \&lt;br /&gt;  s_1 = _mm_xor_si128(s_1, _mm_srli_epi64(s_1, 21));                         \&lt;br /&gt;  s_2 = _mm_xor_si128(s_2, _mm_srli_epi64(s_2, 21));                         \&lt;br /&gt;  /* state +64= state &lt;128&lt; 32; */                                           \&lt;br /&gt;  s_1 = _mm_add_epi64(s_1, _mm_slli_si128(s_1, 4));                          \&lt;br /&gt;  s_2 = _mm_add_epi64(s_2, _mm_slli_si128(s_2, 4));                          \&lt;br /&gt;                                                                             \&lt;br /&gt;  /* Phase 3: Propagate the changes among the four 64-bit words by           \&lt;br /&gt;     performing 64-bit subtractions and 32-bit word shuffling. */            \&lt;br /&gt;  s_1 = _mm_sub_epi64(s_1, s_2);                                             \&lt;br /&gt;  s_2 = _mm_sub_epi64(_mm_shuffle_epi32(s_2, _MM_SHUFFLE(0, 3, 2, 1)), s_1); \&lt;br /&gt;  s_1 = _mm_sub_epi64(_mm_shuffle_epi32(s_1, _MM_SHUFFLE(0, 1, 3, 2)), s_2); \&lt;br /&gt;  s_2 = _mm_sub_epi64(_mm_shuffle_epi32(s_2, _MM_SHUFFLE(2, 1, 0, 3)), s_1); \&lt;br /&gt;  s_1 = _mm_sub_epi64(_mm_shuffle_epi32(s_1, _MM_SHUFFLE(2, 1, 0, 3)), s_2); \&lt;br /&gt;                                                                             \&lt;br /&gt;  /* With good coefficients any one-bit flip in the input has now            \&lt;br /&gt;     changed all bits in the internal state with a probability               \&lt;br /&gt;     between 45% to 55%. */&lt;br /&gt;&lt;br /&gt;void hasshe2(const unsigned char *input_buf, size_t n_bytes, &lt;br /&gt;             unsigned char *output_state)&lt;br /&gt;{&lt;br /&gt;  __m128i coeffs_1, coeffs_2, rnd_data, input, state_1, state_2;&lt;br /&gt;&lt;br /&gt;  assert(n_bytes % 16 == 0);&lt;br /&gt;&lt;br /&gt;  coeffs_1 = _mm_load_si128((void *) coeffs);&lt;br /&gt;  coeffs_2 = _mm_load_si128((void *) (coeffs + 4));&lt;br /&gt;  rnd_data = _mm_load_si128((void *) (coeffs + 8));&lt;br /&gt;&lt;br /&gt;  /* Initialize internal state to something random.  (Alternatively,&lt;br /&gt;     if hashing a chain of data, read in the previous hash result from&lt;br /&gt;     somewhere.) */&lt;br /&gt;  state_1 = state_2 = rnd_data;&lt;br /&gt;&lt;br /&gt;  while (n_bytes &amp;gt;= 16) {&lt;br /&gt;    /* Read in 16 bytes, or 128 bits, from buf.  Advance buf and&lt;br /&gt;       decrement n_bytes accordingly. */&lt;br /&gt;    input = _mm_loadu_si128((void *) input_buf);&lt;br /&gt;    input_buf += 16;&lt;br /&gt;    n_bytes -= 16;&lt;br /&gt;&lt;br /&gt;    COMBINE_AND_MIX(coeffs_1, coeffs_2, state_1, state_2, input);&lt;br /&gt;  }&lt;br /&gt;&lt;br /&gt;  /* Postprocessing.  Copy half of the internal state into fake input,&lt;br /&gt;     replace it with the constant rnd_data, and do one combine and mix&lt;br /&gt;     phase more. */ &lt;br /&gt;  input = state_1;&lt;br /&gt;  state_1 = rnd_data;&lt;br /&gt;  COMBINE_AND_MIX(coeffs_1, coeffs_2, state_1, state_2, input);&lt;br /&gt;&lt;br /&gt;  _mm_storeu_si128((void *) output_state, state_1);&lt;br /&gt;  _mm_storeu_si128((void *) (output_state + 16), state_2);&lt;br /&gt;}&lt;/PRE&gt;&lt;br /&gt;&lt;br /&gt;Some people have asked me what my set preferred set of hash functions are.  As mentioned earlier, the one-at-a-time hash fills most of my daily needs.  But when I need more speed or more bits to address a larger hash table, I use either Jenkins hash functions for string-like data, or when I have fast multiplication and the hashed material is more reminiscient of picking fields in a record, I use&lt;br /&gt;&lt;br /&gt;&lt;PRE&gt;uint64_t hash_32_to_64(uint32_t key, uint64_t seed)&lt;br /&gt;{&lt;br /&gt;  seed ^= 2857720171ULL * key;&lt;br /&gt;  seed ^= seed &amp;gt;&amp;gt; 29;&lt;br /&gt;  seed += seed &amp;lt;&amp;lt; 16;&lt;br /&gt;  seed ^= seed &amp;gt;&amp;gt; 21;&lt;br /&gt;  seed += seed &amp;lt;&amp;lt; 32;&lt;br /&gt;  return seed;&lt;br /&gt;}&lt;/PRE&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://burtleburtle.net/bob/hash/integer.html"&gt;Bob Jenkins&lt;/a&gt; and &lt;a href="http://www.cris.com/%7ETtwang/tech/inthash.htm"&gt;Thomas Wang&lt;/a&gt; have pages for similar integer to integer hash functions. And when I need high speed and cryptographical width but not cryptographic strengh, I use the &lt;tt&gt;hasshe2&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;In summary, my kit of non-cryptographic hash functions contains roughly half a dozen snippets of code I cut, paste and modify as needed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-6681025074591997298?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/6681025074591997298/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=6681025074591997298' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/6681025074591997298'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/6681025074591997298'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2008/11/hashing-with-sse2-revisited-or-my-hash.html' title='Hashing with SSE2 Revisited, or My Hash Toolkit'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-2719512821370606668</id><published>2008-09-23T14:34:00.003+03:00</published><updated>2008-09-23T15:32:38.342+03:00</updated><title type='text'>Have You Listened to Your Program Today?</title><content type='html'>Assume you could plug a synthesizer into the internals of your computer's CPU, what would you hear?  Noise and cacophony would be an intuitive first guess, and probably correct if you would hear the sound in real time.  But if we slow down the CPU to approximately 3500 operations per second, we just about might assume the CPU's computation could contain all essential ingredients of music: rhytm and repetition caused by iteration, melodies caused by various mechanisms that change a value incrementally, and harmonies caused by some mechanism of excluding values, such as memory alignment.&lt;br /&gt;&lt;br /&gt;Since I wasn't able to open a live CPU without making it slightly disfunctional in the process, I took &lt;a href="http://www.valgrind.org/"&gt;Valgrind&lt;/a&gt; to the rescue: I wrote a little object code instrumentation tool, "skin" as Valgrind authors call them, which logs into a file all values I wish to follow when executing the software with which I wish to have a &lt;a href="http://imdb.com/title/tt0075860/"&gt;Close Encounter of the Third Kind&lt;/a&gt;.  A little more code to generate sound and movies of the values.&lt;br /&gt;&lt;br /&gt;Here's what one would here if the bits one to seven in the result of each addition and subtraction would indicate a very short ping.  All those bits being zero corresponds to the barely audible 24.5 Hz, or the sub-contra G.  Roughly ten highest notes are unfortunately ultrasound, but this was nevertheless the best representation I could come up with.&lt;br /&gt;&lt;br /&gt;&lt;object width="320" height="150" class="BLOG_video_class" id="BLOG_video-5ba7dee528420e50" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"&gt;&lt;param name="movie" value="http://www.youtube.com/get_player"&gt;&lt;param name="bgcolor" value="#FFFFFF"&gt;&lt;param name="allowfullscreen" value="true"&gt;&lt;param name="flashvars" value="flvurl=http://v17.nonxt2.googlevideo.com/videoplayback?id%3D5ba7dee528420e50%26itag%3D5%26app%3Dblogger%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1330010990%26sparams%3Did,itag,ip,ipbits,expire%26signature%3D7704F800F97F16C9E85C86DED30C301BE3FDB03A.E5799C51E1FA127843C7A6EF1B9BA2C66608178%26key%3Dck1&amp;amp;iurl=http://video.google.com/ThumbnailServer2?app%3Dblogger%26contentid%3D5ba7dee528420e50%26offsetms%3D5000%26itag%3Dw160%26sigh%3DrcH9asghiLHsVEnIfgeks782F_g&amp;amp;autoplay=0&amp;amp;ps=blogger"&gt;&lt;embed src="http://www.youtube.com/get_player" type="application/x-shockwave-flash"width="320" height="150" bgcolor="#FFFFFF"flashvars="flvurl=http://v17.nonxt2.googlevideo.com/videoplayback?id%3D5ba7dee528420e50%26itag%3D5%26app%3Dblogger%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1330010990%26sparams%3Did,itag,ip,ipbits,expire%26signature%3D7704F800F97F16C9E85C86DED30C301BE3FDB03A.E5799C51E1FA127843C7A6EF1B9BA2C66608178%26key%3Dck1&amp;iurl=http://video.google.com/ThumbnailServer2?app%3Dblogger%26contentid%3D5ba7dee528420e50%26offsetms%3D5000%26itag%3Dw160%26sigh%3DrcH9asghiLHsVEnIfgeks782F_g&amp;autoplay=0&amp;ps=blogger"allowFullScreen="true" /&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;They say listening daily to your spouse improves your relationship.  I wonder whether the same applies to programs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-2719512821370606668?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='enclosure' type='video/mp4' href='http://www.blogger.com/video-play.mp4?contentId=5ba7dee528420e50&amp;type=video%2Fmp4' length='0'/><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/2719512821370606668/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=2719512821370606668' title='56 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/2719512821370606668'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/2719512821370606668'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2008/09/have-you-listened-to-your-program-today.html' title='Have You Listened to Your Program Today?'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>56</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-6135041804897725183</id><published>2007-12-04T00:00:00.000+02:00</published><updated>2007-12-04T00:06:37.620+02:00</updated><title type='text'>At 8am Helsinki is a 1.77-Dimensional</title><content type='html'>And worse, at Sunday noon its dimensions decrease to 1.45.  No wonder I occasionally feel a little confused about our geography...&lt;br /&gt;&lt;br /&gt;Ok, maybe I need to explain a little.  Assume we are given distances between random pairs of points in a hypercube of unknown number of dimensions and extent.  A little calculus shows that if the coordinates of the points are uniformly distributed along each axis, the square of their distance has an expectation of a&lt;SUP&gt;2&lt;/SUP&gt; N / 6 and variance of 7 a&lt;SUP&gt;4&lt;/SUP&gt; N / 180 where N is the number of dimensions and a the edge length of the hypercube.  I hereafter make a lofty assumption that these formulas apply also to fractional dimensions, at least for some such definition.&lt;br /&gt;&lt;br /&gt;According to &lt;A HREF="http://www.ytv.fi/"&gt;YTV&lt;/A&gt; there are 6913 bus stops or other kind of access points for regional public transit in Helsinki and its satellite municipalities Espoo, Vantaa, Kerava, Kauniainen and Kirkkonummi combined.  I wrote a script which picks two points at random and queries the optimum public transport solution between them from a web service generously provided by YTV.  Due to the nature of the data provided by the web service, I define distance not by meters but by duration.  More specifically, how many minutes before 8am must one enter the starting point in order to reach the end point by 8am on a working day's morning.&lt;br /&gt;&lt;br /&gt;Based on some 20000 distance samples, the mean distance is 54.4 minutes with a whopping maximum of &lt;A HREF="http://aikataulut.ytv.fi/reittiopas/fi/?test=1&amp;a=4188&amp;b=5761&amp;keya=FOO&amp;keyb=FOO&amp;hour=08&amp;min=00&amp;vm=2&amp;day=03&amp;month=12&amp;year=2007&amp;va=2&amp;adv="&gt;275 minutes&lt;/A&gt;.  No wonder people prefer cars in those more rural areas of the region.  But from the statistics of the squares of the distances and solving the equations for a and N shows that the regional public transport system behaves like a 1.77-dimensional hypercube with edge length of 109 minutes of travelling.&lt;br /&gt;&lt;br /&gt;On weekends public transport becomes less frequent and a larger portion of traffic goes to and from various local centers instead of offering short-cuts around them.  This increases the mean travelling time by almost 10 minutes to 64 minutes when aiming to arrive on Sunday noon.  In our hypercubistic terms, this decreases dimensionality to 1.45 and increases edge length to 142 minutes.&lt;br /&gt;&lt;br /&gt;The really curious observation is the significance of high-speed rail traffic: refraining from subways and trains and using only busses, trams, ferries and feet doesn't increase the mean traveling time more than six minutes, but more importantly, it breaks off clusters reachable from others only through very long walks or infrequently running night-time busses.  Consequently the variance of traveling time is larger than a homogenous cube-like space would allow.  Mechanical use of the equations would in fact describe Helsinki without fast rails as a hypercube with 0.39 dimensions, i.e. &lt;I&gt;fewer&lt;/I&gt; dimensions as a straight line!&lt;br /&gt;&lt;br /&gt;Of course, the true reason for rails is not in fact so much in their speed (compared to an express bus) than in their capacity.  And they indeed do add another dimension to our lives, or 1.38 to be more precise.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-6135041804897725183?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/6135041804897725183/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=6135041804897725183' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/6135041804897725183'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/6135041804897725183'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2007/12/at-8am-helsinki-is-177-dimensional.html' title='At 8am Helsinki is a 1.77-Dimensional'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-4091061074798840942</id><published>2007-09-10T15:17:00.001+03:00</published><updated>2008-11-13T11:07:35.128+02:00</updated><title type='text'>Hashing with SSE2</title><content type='html'>NOTE: This post has been &lt;A HREF="http://cessu.blogspot.com/2008/11/hashing-with-sse2-revisited-or-my-hash.html"&gt;revised&lt;/A&gt;.&lt;br /&gt;&lt;br /&gt;Almost all software engineers will occasionally need a hash function, either for hash table lookups, a source of data-dependent pseudo-randomness, or for a checksum of some sort.  The classic one I and many others use is the "one-at-a-time" -hash:&lt;br /&gt;&lt;PRE&gt;&lt;br /&gt;u_int32_t one_at_a_time(unsigned char *key, size_t len)&lt;br /&gt;{&lt;br /&gt;  u_int32_t hash, i; &lt;br /&gt;  for (hash = i = 0; i &lt; len; ++i) {&lt;br /&gt;    hash += key[i];&lt;br /&gt;    hash += hash &lt;&lt; 10;&lt;br /&gt;    hash ^= hash &gt;&gt; 6;&lt;br /&gt;  }&lt;br /&gt;  hash += hash &lt;&lt; 3;&lt;br /&gt;  hash ^= hash &gt;&gt; 11;&lt;br /&gt;  hash += hash &lt;&lt; 15;&lt;br /&gt;  return hash;&lt;br /&gt;}&lt;br /&gt;&lt;/PRE&gt;&lt;br /&gt;It's reasonably fast and pseudo-random for most everyday use.  And because it's so simple, it is trivially modified for special purposes, such as hashing nul-terminated strings in C.&lt;br /&gt;&lt;br /&gt;The one-at-a-time hash could nevertheless be faster, and several better hash functions have been suggested by &lt;a href="http://burtleburtle.net/bob/hash/doobs.html"&gt;Bob Jenkins&lt;/a&gt;, &lt;a href="http://www.isthe.com/chongo/tech/comp/fnv/"&gt;Fowler, Noll and Vo&lt;/a&gt;, and &lt;a href="http://www.azillionmonkeys.com/qed/hash.html"&gt;Paul Hsieh&lt;/a&gt;.  The one-at-a-time hash uses a 32-bit internal state stored in the variable &lt;TT&gt;hash&lt;/TT&gt; and consumes eight bits of input per iteration.  Hsieh's hash speeds this up consuming 32-bits per iteration while still significantly improving the hash function's statistical properties.  Jenkins' hash increases both the internal state and consumption to 96 bits.&lt;br /&gt;&lt;br /&gt;But I occasionally want to use hash values as checksums.  Thirty-two bits is insufficient even if I don't require cryptographic resilience against malicious attacks - use &lt;a href="http://en.wikipedia.org/wiki/SHA-2"&gt;SHA-2&lt;/a&gt; for those purposes.  Thirty-two bits is also becoming insufficient when indexing a hash table on a large server.  Jenkins does offer a 64-bit variant of his hash, but mixing a large internal state becomes slow with 32 or even 64-bit wide registers.&lt;br /&gt;&lt;br /&gt;So I decided to sacrifice portability and use the Intel's multimedia extension &lt;a href="http://en.wikipedia.org/wiki/SSE2"&gt;SSE2&lt;/a&gt; with a good supply of big fat 128 bits wide registers and funny instructions for SIMD operations.  The multiply-add instruction on eight 16-bit integer pairs to four 32-bit integer results is particularly handy for performing Fowler-Noll-Vo -style multiplication to propagate one-bit changes to at least 16 more significant bits.  A little more work is needed to propagate those changes evenly to the other fifteen 16-bit&lt;br /&gt;entities, but SSE does have 128-bit bitwise operations, 64-bit arithmetic, 32-bit word shuffling and funny byte interleaving primitives (&lt;a href="http://www.amd.com/us-en/assets/content_type/DownloadableAssets/dwamd_26568.pdf"&gt;exact&lt;/a&gt; and &lt;a href="http://www.tommesani.com/Docs.html"&gt;more approachable&lt;/a&gt; documentation).  Some research, a test driver to evaluate the statistical properties of the hash, a genetic algorithm to optimize the hypermystical constants in the code, and a weekend of CPU-time went into getting the details right - but hey, any hacker's goal is to save a few CPU-cycles regardless of how much computing it takes!&lt;br /&gt;&lt;br /&gt;In order to save you from semi-trivial details I present only with a simplified version of the hash which assumes the number of bytes to hash is divisible by 32.  I clocked my hash as 5-10% slower than Hsieh's, but I'm quite happy with its statistical properties: a one-bit flip in the input flips any bit in the 256-bit internal state by the end of each iteration with a probability within 0.03 of 50% in a test of 10^8 one-bit flips.  The corresponding probability range for Hsieh's hash is from 3.2% to 25%, or if the final avalanching included within 0.31 of 50%.  By sacrificing some of the quality of the hash the function below could be modified to be even faster than Hsieh's hash.&lt;br /&gt;&lt;PRE&gt;&lt;br /&gt;/* Compile with gcc -msse2 ... */&lt;br /&gt;&lt;br /&gt;#include &amp;lt;xmmintrin.h&amp;gt;&lt;br /&gt;&lt;br /&gt;typedef signed short s_int16_t; /* A 16-bit signed type. */&lt;br /&gt;&lt;br /&gt;static s_int16_t mul_add_coeffs[16] __attribute__((aligned(16))) = {&lt;br /&gt;   18743,  28829, -22921, -14039, -15247, -11117, -14293,  18911,&lt;br /&gt;  -23407,  17987, -11321, -25943, -32287,  10001,  30773, -12541,&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;/* Compute a 256-bit hash into res[0..31] using SSE2.  buf and res&lt;br /&gt;   need not be aligned, but len must be divisible by 32. */&lt;br /&gt;static void hasshe2(const unsigned char *buf, int len, unsigned char *res)&lt;br /&gt;{&lt;br /&gt;  __m128i madd_1, madd_2, in_1, in_2, hash_1, hash_2, tmp_1, tmp_2;&lt;br /&gt;&lt;br /&gt;  madd_1 = _mm_load_si128((void *) mul_add_coeffs);&lt;br /&gt;  madd_2 = _mm_load_si128((void *) (mul_add_coeffs + 8));&lt;br /&gt;&lt;br /&gt;  in_1 = _mm_loadu_si128((void *) buf);&lt;br /&gt;  in_2 = _mm_loadu_si128((void *) (buf + 16));&lt;br /&gt;&lt;br /&gt;  hash_1 = _mm_madd_epi16(madd_1, in_1);&lt;br /&gt;  hash_2 = _mm_madd_epi16(madd_2, in_2);&lt;br /&gt;  goto rest_of_mix;&lt;br /&gt;&lt;br /&gt;  do {&lt;br /&gt;    in_1 = _mm_loadu_si128((void *) buf);&lt;br /&gt;    in_2 = _mm_loadu_si128((void *) (buf + 16));&lt;br /&gt;&lt;br /&gt;    hash_1 = _mm_xor_si128(hash_1, _mm_madd_epi16(madd_1, in_1));&lt;br /&gt;    hash_2 = _mm_xor_si128(hash_2, _mm_madd_epi16(madd_2, in_2));&lt;br /&gt;&lt;br /&gt;  rest_of_mix:&lt;br /&gt;    hash_1 = _mm_add_epi64(hash_1, _mm_srli_si128(hash_1, 3));&lt;br /&gt;    hash_1 = _mm_sub_epi64(hash_1, &lt;br /&gt;      _mm_shuffle_epi32(hash_1, _MM_SHUFFLE(0, 1, 2, 3)));&lt;br /&gt;    hash_2 = _mm_add_epi64(hash_2, _mm_srli_si128(hash_2, 3));&lt;br /&gt;    hash_2 = _mm_sub_epi64(hash_2,&lt;br /&gt;      _mm_shuffle_epi32(hash_2, _MM_SHUFFLE(0, 1, 2, 3)));&lt;br /&gt;    &lt;br /&gt;    tmp_1 = _mm_xor_si128(in_1, _mm_unpackhi_epi8(hash_2, hash_1));&lt;br /&gt;    tmp_2 = _mm_xor_si128(in_2, _mm_unpacklo_epi8(hash_1, hash_2));&lt;br /&gt;    hash_1 = _mm_xor_si128(tmp_2, _mm_madd_epi16(madd_1, tmp_1));&lt;br /&gt;    hash_1 = _mm_sub_epi64(hash_1, &lt;br /&gt;      _mm_shuffle_epi32(hash_1, _MM_SHUFFLE(0, 1, 2, 3)));&lt;br /&gt;    hash_2 = _mm_xor_si128(tmp_1, _mm_madd_epi16(madd_2, tmp_2));&lt;br /&gt;    hash_2 = _mm_sub_epi64(hash_2,&lt;br /&gt;      _mm_shuffle_epi32(hash_2, _MM_SHUFFLE(0, 1, 2, 3)));&lt;br /&gt;    &lt;br /&gt;    /* Move these after the loop if good statistical properties not&lt;br /&gt;       required. */&lt;br /&gt;    tmp_1 = _mm_xor_si128(in_1, _mm_unpackhi_epi8(hash_1, hash_2));&lt;br /&gt;    tmp_2 = _mm_xor_si128(in_2, _mm_unpacklo_epi8(hash_2, hash_1));&lt;br /&gt;    hash_1 = _mm_xor_si128(tmp_2, _mm_madd_epi16(madd_1, tmp_1));&lt;br /&gt;    hash_1 = _mm_sub_epi64(hash_1, &lt;br /&gt;      _mm_shuffle_epi32(hash_1, _MM_SHUFFLE(2, 1, 0, 3)));&lt;br /&gt;    hash_2 = _mm_xor_si128(tmp_1, _mm_madd_epi16(madd_2, tmp_2));&lt;br /&gt;    hash_2 = _mm_sub_epi64(hash_2,&lt;br /&gt;      _mm_shuffle_epi32(hash_2, _MM_SHUFFLE(2, 1, 0, 3)));&lt;br /&gt;    &lt;br /&gt;    len -= 32;&lt;br /&gt;    buf += 32;&lt;br /&gt;  } while (len &amp;gt; 0);&lt;br /&gt;&lt;br /&gt;  _mm_storeu_si128((void *) res, hash_1);&lt;br /&gt;  _mm_storeu_si128((void *) (res + 16), hash_2);&lt;br /&gt;}&lt;/PRE&gt;&lt;br /&gt;&lt;br /&gt;The code should work on all Pentia 4 and Athlon 64's and newer, but my hash will nevertheless become gradually obsolete (or at least subject to further improvement) because &lt;a href="http://en.wikipedia.org/wiki/SSE4"&gt;SSE4&lt;/a&gt; will provide an instruction for computing &lt;a href="http://en.wikipedia.org/wiki/CRC32"&gt;CRC32&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-4091061074798840942?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/4091061074798840942/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=4091061074798840942' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/4091061074798840942'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/4091061074798840942'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2007/09/hashing-with-sse2.html' title='Hashing with SSE2'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-621322102427117157</id><published>2007-06-20T13:43:00.000+03:00</published><updated>2007-06-20T13:50:14.206+03:00</updated><title type='text'>Was Noah's Ark a Sperm Bank?</title><content type='html'>"And God said unto Noah, [..] make thee an ark [..] the length of the ark shall be three hundred cubits, the breadth of it fifty cubits, and the height of it thirty cubits." (Gen. 6:13-15) &lt;br /&gt;&lt;br /&gt;A cubit is the distance between an adult's elbow and tip of the finger, approximately 45 cm.  The ark was thus 135 meters long, 22.5 meters wide, and 13.5 meters high.  These are presumably outer measures, and hence the habitable volume of the ark could not have been much more than 35000 cubic meters.&lt;br /&gt;&lt;br /&gt;Even the most conservative estimates state that there are at least &lt;A HREF="http://hypertextbook.com/facts/2003/FelixNisimov.shtml"&gt;two million species of animals&lt;/A&gt; alive today, and surely a comparable number of plants, fungi, etc.  A large fraction of species live in water, but could not survive if their ecosystem would suddenly submerge a mile deep - according to the bible even high mountains were submerged.  There are &lt;A HREF="http://faculty.plattsburgh.edu/thomas.wolosz/howmanysp.htm"&gt;approximations&lt;/A&gt; that the number of species of a given body length L is proportional to L raised to some power between -1.5 and -3.  For example, if we use the frequent choice of -2 and we knew there is one 10-meter long species, then there are 100 one-meter species, ten thousand 10-centimeter species, etc.  If we limit the species size to a tenth of a millimeter (for example, assuming evolution (!) will sufficiently rapidly reproduce the smaller species), approximate that species are half as wide and high as they are long, and integrate the product of volume and number of species over the species length, we obtain a total volume of approximately 125000 cubic meters.&lt;br /&gt;&lt;br /&gt;We have already exceeded the available volume in the ark by a factor of 3.5, and we are yet to cram in the other sex and enough food for animals and soil for plants for the 150 days the bible says Noah sailed.  Remember also that God required Noah to provide accomodation for his extended family, and in order to avoid extinction by inbreeding, he would have had to take tens if not closer to a hundred individuals of all other species as well.  All this is a little too impressive for a six hundred year old man.&lt;br /&gt;&lt;br /&gt;Unless Noah was in fact history's first documented case of sperm banking.  If a small forest of &lt;A HREF="http://en.wikipedia.org/wiki/Sequoiadendron"&gt;Giant Sequoias&lt;/A&gt; were a challenge to fit in the ark, a sachel of fist-sized cones or a teaspoon of seeds would not.  A rough calculation shows that all necessary genetic material could easily be fitted in a negligible part of the ark leaving the rest free for all the technology needed for recolonization, deep-freezing sperm, artificial wombs and all the other technology necessary to recreate mammals without living parents.  That's truly impressive technology, but far more plausible than the alternative.  And hey, the guy was six hundred years old - that's plenty of time to write any number of dissertations on all required fields, and a good reason to become a drunkard afterwards, like Noah did.&lt;br /&gt;&lt;br /&gt;In fact I predict that this will be the method that humans will use should we some day colonize other stars systems.  We will send unmanned ships ahead, and once they reach their destination we submit by radio (or laser) the genetic information, culture, and without a doubt a large number of software updates to the distant ships.&lt;br /&gt;&lt;br /&gt;But there's still something fishy about Noah and the ark.  Noah had sent out pigeons to search for land.  When the waters abated and the ark hit mount Ararat one pigeon came back with an olive leaf.  However, here lies a contradiction: olive doesn't grow high on Ararat and all ground-level olives must have rotten by then.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-621322102427117157?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/621322102427117157/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=621322102427117157' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/621322102427117157'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/621322102427117157'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2007/06/was-noahs-ark-sperm-bank.html' title='Was Noah&apos;s Ark a Sperm Bank?'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-5958232098416223030</id><published>2007-06-20T09:18:00.000+03:00</published><updated>2007-06-20T09:23:03.238+03:00</updated><title type='text'>My Alma Mater</title><content type='html'>I graduated and now work in the &lt;A HREF="http://www.hut.fi"&gt;Helsinki University of Technology&lt;/A&gt; (HUT).  Since 2003 &lt;A HREF="http://ed.sjtu.edu.cn/en/index.htm"&gt;Institute of Higher Education&lt;/A&gt; at &lt;A HREF="http://www.sjtu.edu.cn/english/"&gt;Shanghai Jiao Tong University&lt;/A&gt; has maintaned a yearly &lt;A HREF="http://ed.sjtu.edu.cn/ranking.htm"&gt;ranking&lt;/A&gt; of the 500 best universities in the world.  We were ranked as 371st, 349th, 446th, and last year 447th in an unfortunate declining trend.&lt;br /&gt;&lt;br /&gt;Surely this must only be a coincidence with the rampant &lt;A HREF="http://en.wikipedia.org/wiki/Managerialism"&gt;managerialism&lt;/A&gt; we have experienced at my Alma Mater.  In a few years we've been blessed with several administrative procedures for determining salaries, counting working hours, managing travels, and accepting bills, just to name a few.  The least common denominator of all these systems is that they are rigid, vaguely documented, self-contradictory, their IT implementations suck big time, and they cause a huge net increase in management work of &lt;A HREF="http://www.ct.tkk.fi/hallinto/ehdotus.html"&gt;really annoyed professors&lt;/A&gt; and other staff who really should concentrate on teaching and research instead.&lt;br /&gt;&lt;br /&gt;No wonder the Shanghai list ranks us low because our own salary policies rank our degrees even lower: our M.Sc.'s teaching courses, working in or leading a research project can be paid from 50-75% of the salary paid to a person in a non-teaching position with the lowest job grade requiring a corresponding degree, typically a non-science degree from another university.  For example, at default performance Linus Torvald's salary could be at most 2178 euros per month, or roughly $40000 a year, whereas if he (or some bureaucrat) neither taught nor researched he would be paid at least 2931 euros per month.  No surprize that the name of this new salary system, UPJ, is widely used as a profanity among current teachers and anybody trying to recruit competent personnel to a (practically oriented) research project.&lt;br /&gt;&lt;br /&gt;And in accordance to the &lt;A HREF="http://en.wikipedia.org/wiki/Parkinson%27s_law"&gt;Parkinson's Law&lt;/A&gt; the administration is growing rapidly.  In addition to increasingly filling the days of teachers and researchers, administration is a major recruiter of new people.  I listed every open position at HUT closing on the second half of 2006 or the first half of 2007 as they appeared on &lt;A HREF="http://www.tkk.fi/Yksikot/Hallitus/Virat/index.html"&gt;the official recruitment page&lt;/A&gt;.  I ignored part-time positions, graduate schools and stipendiates and counted only "serious" jobs.  Considering the main function of our university it was surprizing to see that barely half, 83, of the 160 positions were primarily for teaching and research.  Furthermore, 62% of the 77 non-teaching positions were tenure positions, which is significantly higher than the 25% of HUT overall.&lt;br /&gt;&lt;br /&gt;I also wish to bring into our memories the &lt;A HREF="http://en.wikipedia.org/wiki/Intelligent_design"&gt;intelligent design&lt;/A&gt; conference that was held in the Helsinki University of Technology on the 22nd of October 2004.  Our ranking on the Shanghai list dropped by nearly 100 positions in the next update.&lt;br /&gt;&lt;br /&gt;But maybe there's hope.  There are still able and hard-working people here, and in a few years HUT will be merged with two other universities and funded by a foundation with a significant industrial contribution.  Some see that as a threat to academic freedom, but I'm hopeful that the board of the foundation may have the courage to tackle the managerialism and let the able and hard-working people focus on the real work: teaching and research.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-5958232098416223030?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/5958232098416223030/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=5958232098416223030' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/5958232098416223030'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/5958232098416223030'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2007/06/my-alma-mater.html' title='My Alma Mater'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-4904368409192762112</id><published>2007-03-18T03:04:00.000+02:00</published><updated>2007-03-18T03:07:00.700+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='skepticism'/><title type='text'>Skepticism, and Bullshit in Bullshit by Penn and Teller</title><content type='html'>I'm a member of the &lt;A HREF="http://www.skepsis.fi"&gt;Skepsis&lt;/A&gt;, the Finnish equivalent of &lt;A HREF="www.csicop.org"&gt;CSICOP&lt;/A&gt;.  For the last few weeks a new TV channel has aired (or should I call "cabled", or "coppered") the show Bullshit! by Penn and Teller.  Penn's harsh language and Teller's eloquent zero-liners didn't set me back, but two other issues did.  The first is the abundance of ad hominem attacks which in my ears turn against the attacker even if the victim would indeed be a swindling crackpot.  It won't take much to make a spokesperson for an organization (let alone a teen-aged protester) look ignorant, but the same would apply for any organization.&lt;br /&gt;&lt;br /&gt;The second, and far worse problem with the show is that some of Penn's bullshitting is in fact bullshit itself.  So far I've only seen roughly ten episodes, but the episode where they claim there was no scientific evidence that showed second hand smoking was unhealty was simply bullshit.  According to the Robert Todd Carroll, the author of the Skeptic's Dictionary, he as well as Penn and Teller were lead to &lt;A HREF="http://skepdic.com/news/newsletter61.html#4"&gt;trust the standards of risk assessment as promoted by the tobacco industry (led by Philip Morris) and their Republican generals like&lt;/A&gt; &lt;A HREF="http://www.washingtonmonthly.com/features/2004/0405.mooney.html"&gt;Jim Tozzi&lt;/A&gt;.&lt;br /&gt;&lt;br /&gt;Also the episode on the environmental movement had some strong elements of bullshit.  Surely the movement attracts alarmists and many of the movement's claims are bullshit, but the general concern of global warming and its consequences among academics is genuine rather than a tool for another round of research funding.  Carroll states that this episode "is based on these same questionable standards pushed by Republican leaders for their corporate donors whose main interest is the deregulation of industries and products rather than public safety or health. This approach fits well with P&amp;T's libertarian philosophy but it is essentially dishonest and does nothing to promote the view of skepticism as healthy critical thinking."&lt;br /&gt;&lt;br /&gt;In fact this abuse of pseudo-skepticism as an opinion altering tool for industrial deregulation is what I consider the greatest threat to Skepsis and CSICOP.  The crackpots with their occasional legal threats are just bullshit.&lt;br /&gt;&lt;br /&gt;Also the episode on diets threw more bullshit than necessary.  Of course there's an abundance of bullshit in the diet industry, but it seems the late Dr. Atkins did have a positive impact on the science of obesity research after all.  Despite being used to treat some forms of epilepsy since 1920's has no-one ever shown his diet to be unhealty.  Furthermore, numerous recent studies have shown the effect on both weight as well as blood lipids are generally better to the "prudent" generally recommended diets.&lt;br /&gt;&lt;br /&gt;But I can't deny the entertainment value of the show even if the background research seems scarser than Teller's lines.  Maybe some day they'll air a self-critiquing episode - that would likely be the most interesting episode of them all.  Until then, my dear readers, you should regard Penn and Teller's Bullshit! as manure or politics rather than true skepticism or good journalism.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-4904368409192762112?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/4904368409192762112/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=4904368409192762112' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/4904368409192762112'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/4904368409192762112'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2007/03/skepticism-and-bullshit-in-bullshit-by.html' title='Skepticism, and Bullshit in Bullshit by Penn and Teller'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-117019656129438713</id><published>2007-01-31T00:32:00.000+02:00</published><updated>2007-03-17T21:59:10.476+02:00</updated><title type='text'>Estimated and Actual Duration of Projects</title><content type='html'>Software engineers are often accused of missing deadlines and grossly under-estimating the duration of projects.  But how about warfare? Let's take some examples:&lt;br /&gt;&lt;br /&gt;When the first world war broke, German military leaders estimated it would last approximately six months.  But over three years and three months later Kaiser Wilhelm fled the country and Germany capitulated.&lt;br /&gt;&lt;br /&gt;When the largest land war operation in history, operation Barbarossa, started on June 22, 1941, German leaders promised the troops would be home by christmas.  Three years and ten months later Hitler committed suicide in his underground bunker.&lt;br /&gt;&lt;br /&gt;(Actually, some historians claim that the first world war wasn't concluded until the second world war ended - there was merely a 21-year armistice.)&lt;br /&gt;&lt;br /&gt;The Vietnam war ended exactly three decades after Hitler's suicide.  But six years and three months earlier Nixon had promised congressman Donald Riegle (R-MI) that the Vietnam war would end six months after Nixon assumed office.  After all, Nixon had a "secret plan", just like George W. Bush has today.&lt;br /&gt;&lt;br /&gt;And speaking of Bush Junior, it's now three years and eight months since he declared the "mission was accomplished" and "major combat ended".  Since then, the guerillas have fulfilled Bush's request to "bring'em on", resulting in over twenty-folding the U.S. body count.&lt;br /&gt;&lt;br /&gt;So, over-optimistic programmers are just like politicians and military leaders: given their estimate of how long the project will take, if they haven't succeeded in twice that time they will fail and at least six longer until they realize it themselves.&lt;br /&gt;&lt;br /&gt;(To be fair, there have been also realistic estimates in both venues.  Churchill, for example, never promised a quick and pleasant end to the second world war.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-117019656129438713?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/117019656129438713/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=117019656129438713' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/117019656129438713'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/117019656129438713'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2007/01/estimated-and-actual-duration-of.html' title='Estimated and Actual Duration of Projects'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-116734192385201858</id><published>2006-12-28T23:25:00.000+02:00</published><updated>2007-03-12T03:37:01.620+02:00</updated><title type='text'>Better RAIDs</title><content type='html'>Regardless of what the hard disk vendors claim to be the mean time before failure, based on roughly twenty of my most recent disks from several different vendors I'ld claim a more realistic half-life of a normal off-the-shelf PC hard disks is at most four years, or approximately 35000 hours.  Assuming there's one million hard disks in Finland, on average one hard disk dies while I engulf a fluffy Big Mac.  Sorry, I'll try to cut down on junk food.&lt;br /&gt;&lt;br /&gt;I personally replicate my essential data with &lt;A HREF="http://www.cis.upenn.edu/~bcpierce/unison/"&gt;unison&lt;/A&gt; on three hard disks on separate locations, but this occasionally leads to annoying conflicts and even data losses when I forget to promptly synchronize the replacas.  And for bulk data, such as scanned images, digital camera pictures and videos, triplicating data becomes unnecessarily expensive.&lt;br /&gt;&lt;br /&gt;Engineers have found smarter ways to improve the reliability of disk systems by organizing them into redundant arrays of independent disks, &lt;A HREF="http://en.wikipedia.org/wiki/RAID"&gt;RAID&lt;/A&gt;s.  A typical RAID-4 might consist of four disks which contain the data and a fifth disk containing the parity (bitwise exclusive-or) of the other disks.  If any of the disks crash, it can be replaced and its data can be reconstructed from the other disks.  In RAID-5 the role of the parity disk is distributed among all the disks in order to avoid performance bottlenecks in updating the parity information.&lt;br /&gt;&lt;br /&gt;According to theory, when one of the disks die, you hot replace it from a reserve of pristine disks, and maybe in an hour you've reached the original reliability.  So, if individual disks would be replaced by RAIDs of four data and one parity disk, the MTBF would be increased from 35 thousand to over 60 million hours.  Or in other words, the number of permanent data losses in Finland would be reduced to below one every two days.  Sound comforting, but in practice any less than a professional sysadmin might need days or until the next paycheck to buy a new disk, and the MTBF reduces in proportion to how much longer than one hour the replacement drags.&lt;br /&gt;&lt;br /&gt;Suppose we had more than five disks, could we do better?  That's the question I felt compelled to study (motivated also by a possible future hobby project).  A toy example demonstrates a positive answer: ten disks set up as two independent RAID-5s can recover only from those two-disk crashes where the two crashed disks reside in different sub-RAIDs.  There's only a 50% chance of that.  But if we use one of the data disks in &lt;EM&gt;both&lt;/EM&gt; parity disks, then we reach a 73.3% chance of recovering from a two-disk crash.&lt;br /&gt;&lt;br /&gt;Guaranteeing recovery from two simultaneous disk crashes, in other words reaching RAID level 6, is actually harder than one might think. Given one data disk, simple mirroring on two additional disks would do.  Up to four data disks, three parity disks is sufficient.  The parities could, for example, be computed like this:&lt;PRE&gt;  x0 = d0 ^ d1 ^ d3&lt;br /&gt;  x1 = d0 ^ d2 ^ d3 = d1 ^ d2 ^ x0&lt;br /&gt;  x2 = d0 ^ d1 ^ d2 = d1 ^ d3 ^ x1&lt;/PRE&gt;The equations are laid out so that no two disks occur in edgemost expressions in all equations.  I prefer this formulation because it also states the recovery algorithm by repeated solving for only one unknown and substitution to other equations.&lt;br /&gt;&lt;br /&gt;Four parity disks is sufficient to recover any two-disk crash in a system of eleven data disks.&lt;PRE&gt;  x0 = d0 ^ d1 ^ d2 ^ d4 ^ d6 ^ d7 ^ d8&lt;br /&gt;  x1 = d0 ^ d4 ^ d5 ^ d8 ^ d9 ^ d10 ^ x0&lt;br /&gt;  x2 = d0 ^ d2 ^ d3 ^ d6 ^ d9 ^ x0 ^ x1&lt;br /&gt;  x3 = d0 ^ d1 ^ d3 ^ d5 ^ d6 ^ d8 ^ d9&lt;/PRE&gt;This scheme can also recover a three-disk crash with a 84.84% probability and a four-disk crash with a 39.78% probability.&lt;br /&gt;&lt;br /&gt;Five parity disks is sufficient to reover any two-disk crash in a system of twenty disks.&lt;PRE&gt;  x0 = d0 ^ d1 ^ d5 ^ d6 ^ d8 ^ d10 ^ d12 ^ d13 ^ d18 ^ d19&lt;br /&gt;  x1 = d1 ^ d2 ^ d3 ^ d6 ^ d7 ^ d11 ^ d12 ^ d13 ^ d14 ^ d16&lt;br /&gt;  x2 = d0 ^ d2 ^ d4 ^ d6 ^ d8 ^ d9 ^ d13 ^ d14 ^ d15 ^ d19 ^ x1&lt;br /&gt;  x3 = d5 ^ d7 ^ d9 ^ d12 ^ d14 ^ d16 ^ d17 ^ d18 ^ d19 ^ x2&lt;br /&gt;  x4 = d0 ^ d1 ^ d2 ^ d3 ^ d4 ^ d5 ^ d6 ^ d7 ^ d10 ^ d12 ^ d19 ^ x2 ^ x3&lt;br /&gt;&lt;/PRE&gt; A three-disk crash is recovered with a 93.22% and a four-disk crash with a 66.57% probability.  Notice that this scheme reaches the same storage overhead as the typical four data and one parity disk RAID-5, but with a hugely improved reliability - less than one irrecoverable crash every five years in one million such installments - of course making the lofty assumptions the disk losses are independent and not caused by an &lt;A HREF="http://en.wikipedia.org/wiki/Electromagnetic_pulse"&gt;EMP&lt;/A&gt; or a virus.&lt;br /&gt;&lt;br /&gt;For those who need truly high reliability, nested RAIDs are the way to go.  Here one builds a RAID-5 of other RAID-5 elements instead of individual disks.  For example, the twenty data disks are placed in four RAID-5s, and a fifth RAID-5 (of five data and one parity disk) is allocated to serve as a parity for those sub-RAIDs.  Yes, it is reliable: it can handle all three-disk crashes and 99.45% of all four disk crashes.  This happens at the expense of disk space: for twenty disks containing user data such a nested RAID consumes ten disks for redundancy.&lt;br /&gt;&lt;br /&gt;But also this nested RAID can be improved.  The set of parity equations below almost doubles the reliability, in particular now 99.72% of four-disk crashes can be recovered.&lt;PRE&gt;  x0 = d0 ^ d1 ^ d3 ^ d6 ^ d11 ^ d15 ^ d18 ^ d19&lt;br /&gt;  x1 = d7 ^ d8 ^ d9 ^ d13 ^ d18 ^ x0&lt;br /&gt;  x2 = d2 ^ d6 ^ d9 ^ d11 ^ d14 ^ d15 ^ d16 ^ d17&lt;br /&gt;  x3 = d0 ^ d3 ^ d4 ^ d7 ^ d12 ^ d15 ^ d17 ^ d18&lt;br /&gt;  x4 = d1 ^ d3 ^ d8 ^ d10 ^ d15 ^ d16 ^ d17 ^ d18 ^ x1&lt;br /&gt;  x5 = d0 ^ d5 ^ d6 ^ d12 ^ d13 ^ d14 ^ d15 ^ d16 ^ d19 ^ x0&lt;br /&gt;  x6 = d6 ^ d7 ^ d9 ^ d10 ^ d15 ^ x2&lt;br /&gt;  x7 = d3 ^ d4 ^ d5 ^ d8 ^ d9 ^ d18 ^ x6&lt;br /&gt;  x8 = d0 ^ d3 ^ d4 ^ d7 ^ d13 ^ d16 ^ d17 ^ x2 ^ x5&lt;br /&gt;  x9 = d4 ^ d6 ^ d7 ^ d12 ^ d18 ^ d19 ^ x5 ^ x7&lt;/PRE&gt;Alternatively, one could add three data disks and still have a higher reliability than with the smaller nested RAID scheme:&lt;PRE&gt;  x0 = d2 ^ d4 ^ d5 ^ d9 ^ d10 ^ d16 ^ d18 ^ d20 ^ d21 ^ d22&lt;br /&gt;  x1 = d0 ^ d1 ^ d2 ^ d3 ^ d5 ^ d6 ^ d9 ^ d10 ^ d13 ^ d17 ^ d20&lt;br /&gt;  x2 = d1 ^ d3 ^ d5 ^ d7 ^ d8 ^ d15 ^ d17 ^ x0&lt;br /&gt;  x3 = d0 ^ d10 ^ d11 ^ d13 ^ d15 ^ d16 ^ d17 ^ d19 ^ x0&lt;br /&gt;  x4 = d0 ^ d2 ^ d5 ^ d8 ^ d11 ^ d12 ^ d16 ^ d21 ^ d22&lt;br /&gt;  x5 = d0 ^ d1 ^ d2 ^ d6 ^ d7 ^ d8 ^ d12 ^ d14 ^ d18 ^ d19 ^ d21 ^ x0&lt;br /&gt;  x6 = d4 ^ d5 ^ d6 ^ d7 ^ d12 ^ d17 ^ d20 ^ x2 ^ x5&lt;br /&gt;  x7 = d0 ^ d3 ^ d4 ^ d5 ^ d6 ^ d8 ^ d9 ^ d11 ^ d13 ^ d18 ^ d20 ^ d22 ^ x6&lt;br /&gt;  x8 = d3 ^ d8 ^ d9 ^ d14 ^ d16 ^ d18 ^ d20 ^ d22 ^ x0 ^ x4 ^ x6&lt;br /&gt;  x9 = d0 ^ d1 ^ d16 ^ d17 ^ d20 ^ d21 ^ x5&lt;/PRE&gt;&lt;br /&gt;&lt;br /&gt;To generalize, scientists have developed numerous other kinds of Forward Error Correcting codes.  The &lt;A HREF="http://en.wikipedia.org/wiki/Reed-Solomon"&gt;Reed-Solomon&lt;/A&gt; is probably the most widely known with uses ranging from CDs and xDSL-lines to communicating with space probes.  Reed-Solomon becomes computationally expensive for a single large disk block, but we could employ it "horizontally" so that the i'th s-bits of each disk block form a message which is then appended with Reed-Solomon parity information.  Because all errors will be erasures in our case, we can expect to recover from as many disk crashes as we have redundant disks.  Despite this advantage, recomputation costs of an incremental update, overhead caused by shortening or rounding s upwards, and perhaps also complexity has kept Reed-Solomon codes out of RAIDs.  The only work I know of is &lt;A HREF="http://www.cs.utk.edu/%7Eplank/plank/papers/SPE-9-97.html"&gt;this&lt;/A&gt;.&lt;br /&gt;&lt;br /&gt;What I've done in this blog entry is essentially equivalent to the basic ideas of &lt;A HREF="http://en.wikipedia.org/wiki/Low-density_parity-check_code"&gt;LDPC-codes&lt;/A&gt; invented by &lt;A HREF="http://en.wikipedia.org/wiki/Robert_G._Gallager"&gt;Gallager&lt;/A&gt; in the early sixties, but with the exception that I employ a genetic algorithm to find the excellent encodings.  Current research focus seems to be in developing codes for media delivery where the number of blocks can be tens or hundreds of thousands, and consequently the choice of the equations can be random.  Unfortunately the domain is patent-ridden, but the &lt;A HREF="http://planete-bcast.inrialpes.fr/rubrique.php3?id_rubrique=5"&gt;LDGM&lt;/A&gt; approach seems to be free.&lt;br /&gt;&lt;br /&gt;Oh, my hobby project?  I won't reveal that yet as I haven't really decided whether I'll pursue it.  But suffice it to say, that if all Finns would share 1% of their disk space for parity information with each other and recovering a disk would take one hour, we would have to wait over 10^750000 years for the next loss of data.&lt;br /&gt;&lt;br /&gt;(Unfortunately all the assumptions in that calculation can't be taken into practice without significantly reducing the glamour of the result.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-116734192385201858?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/116734192385201858/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=116734192385201858' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/116734192385201858'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/116734192385201858'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/12/better-raids.html' title='Better RAIDs'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-116424049334521285</id><published>2006-11-23T02:04:00.000+02:00</published><updated>2007-03-08T11:46:23.673+02:00</updated><title type='text'>Speed Limits and Economy</title><content type='html'>I recently participated in a debate about speed limits.  Even if none of the participants were really heavy-duty-speeders, most of them nevertheless strongly opposed to current rather low speed limits (30 or 40 km/h on small streets, 50 on main streets in Helsinki), arguing, for example, it would hamper the national economy.  Some debaters even argued that people should be allowed to drive their cars at any speed &lt;I&gt;the driver&lt;/I&gt; consider safe.  To me, a father of a child with whom I walk between home and day care over seven zebra crossings twice every day, this simply disgusted me.&lt;br /&gt;&lt;br /&gt;So, let's make some hyper-optimistic assumptions: a driver drives, say, three kilometers every day in an area where speed limits have been reduced from 40 to 30 km/h, how much will it cost the state?  Nothing because most likely this wouldn't increase salaries and consequently tax revenues, but theoretically it could cost the driver up to 1.5 minutes every day, or assuming we count only working days, some 330 minutes during the year.  That's five and a half hours of taxable work time.  Assuming a relatively well-paid (4000 e/month) driver that would be below 150 euros per year, or below 15 million euros for the approximately 100 000 cars suffering from the reduced speed limit.&lt;br /&gt;&lt;br /&gt;Of course that's only theory.  In practice driving speed is usually limited by other cars, traffic lights, and taking corners.  In fact statistics show that average driving speed dropped only by 1.5 km/h, cutting the cost to yearly drivers to only two million euros.&lt;br /&gt;&lt;br /&gt;Ok, even two million euros is money, and spending it has to be justified.  So let's look at the issue from the victim's point.&lt;br /&gt;&lt;br /&gt;If a car hits a person, the probability of death is 5%, 15% or 40% for speeds 30, 40, and 50 km/h, respectively.  The severity of non-fatal injuries and likelihood of a hit is also reduced.&lt;br /&gt;&lt;br /&gt;What then is the cost of a death to the society?  If the victim is a 30-year old adult, the state loses all its investments in his education and future tax revenues.  A typical university degree costs the state approximately a quarter of a million euros, the basic education roughly equally much, and missed taxes 20000 euros per year until the victims retirement.  So far the loss is one million euros, and we haven't yet accounted for the economic support for the grieving family.  Even if statistics are still scarce, the reduction in driving speeds did have the desired effect: in 1999 to 2003 on average seven pedestrians were killed each year, but in 2005 only two.  &lt;br /&gt;&lt;br /&gt;The five million euro saving in the reduced deaths already counters for the drivers' lost leasure time, but we haven't yet accounted for non-fatal injuries.  Unfortunately I don't have good data on them, but let it be noted that during 1999-2003 approximately 150 non-fatal injuries from car-pedestrian accidents were reported each year.  A large fraction of them were probably tissue damages and bone fractures that maybe required only a few surgeries and some months' time to recover, but some of them were also tetraplegics whose life-time medical bills to the society easily reach a million - in addition to the one million in their abrupted careers.&lt;br /&gt;&lt;br /&gt;In this treatment I've deliberately concentrated on the economic justification of current reduced speed limits, and that is quite strong.  Injuries will cause physical pain, death and even moreso paralysis will cause emotional losses I can't formulate in numbers.  I'll only say that it definitely is &lt;I&gt;not&lt;/I&gt; the drivers'  prerogative to define the level of risk his potential victims should to accept.&lt;br /&gt;&lt;br /&gt;We, and the society we live in, pay heavily for speed.  And I haven't yet treated all aspects of speed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-116424049334521285?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/116424049334521285/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=116424049334521285' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/116424049334521285'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/116424049334521285'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/11/speed-limits-and-economy.html' title='Speed Limits and Economy'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-116367317996501201</id><published>2006-11-16T12:26:00.000+02:00</published><updated>2007-03-17T23:12:26.593+02:00</updated><title type='text'>Anti-Fibonacci Sequences and Rings of Saturn</title><content type='html'>Assume a process which repeatedly marks all of positive integers which belong to a &lt;a href="http://en.wikipedia.org/wiki/Fibonacci_sequence"&gt;Fibonacci sequence&lt;/a&gt; which start with the lowest unmarked x and x+c.  For example, if c is one, the first Fibonacci sequence marks 1, 2, 3, 5, 8, 13, 21, etc.  The process would start the second Fibonacci sequence at 6 and 7, marking thereafter 13, 20, 33, ..., the third one would mark 9, 10, 19, 29, etc.  The unmarked numbers 4, 18, 22, 28, 32, 47, 54, 72, etc. are then what I call the anti-Fibonacci numbers for the given c = 1.  For c = 2 the anti-Fibonacci numbers would be 5, 8, 9, 24, 34, 38, 45, 50, etc.&lt;br /&gt;&lt;br /&gt;One might assume that anti-Fibonacci number generation is a sufficiently complicated process to be essentially pseudo-random.  My back-of-the-envelope calculation went as follows: Let p(i) be the probability that i is anti-Fibonacci.  A single existing Fibonacci sequence grows sparser with a rate of phi/i where phi is the base of the golden ratio (sqrt(5) + 1) / 2 with which the Fibonacci values grow, but with a probability of p(i)^2 the i and i+c starts a new Fibonacci sequence and p(i) gets a nudge of 2/i upwards.  Putting these in equilibrium and solving for p gives sqrt(sqrt(5) + 1) / 2, or roughly 0.899.  I knew I had made some lofty approximations in the derivation above, especially in assuming that the density of multiple Fibonacci sequences would decrease as rapidly as that of one sequence, so I wrote a little program that computes some 100 million first anti-Fibonacci numbers.  I was somewhat surprized to see that p is only approximately 0.856 for c = 1.  For c = 2 the program gave 0.872, which as again surprizing because I didn't expect c to affect p.  But when my computer spat out what looks like interference patterns my jaw really dropped.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://hibase.cs.hut.fi/~cessu/fibinterference.png" /&gt;&lt;br /&gt;&lt;br /&gt;Having seen the non-randomness of the average of p over all i and observed how it depends on c, a change in coordinates lead me to wonder if conversely the average of p over c's depends on i.  The answer is yes.  Below I have a sample of p(i) estimated by c up to ten thousand.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://hibase.cs.hut.fi/~cessu/fibdensity.png" /&gt;&lt;br /&gt;&lt;br /&gt;The moral of the story?  Whenever playing with number theory, staple your jaw so it won't keep falling off constantly.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-116367317996501201?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/116367317996501201/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=116367317996501201' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/116367317996501201'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/116367317996501201'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/11/anti-fibonacci-sequences-and-rings-of.html' title='Anti-Fibonacci Sequences and Rings of Saturn'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-116248112299843674</id><published>2006-11-02T17:24:00.000+02:00</published><updated>2007-02-27T19:42:54.040+02:00</updated><title type='text'>Price of a Parking Place</title><content type='html'>I recently overheard a powerful curse of how costly parking is in the center of Helsinki.  At its highest normal street parking costs three euros per hour, but evenings, nights and weekends are free.  So, should a parking place be in 100% utilization, and should everybody really pay their parking, it would give a revenue of a little above 700 euros per month, or 50 to 60 euros per square meter per month.  This sounds high in comparison to the average appartment rents, typically well over 15 euros per square meter per month, in the same area.  But appartments are typically stacked six to nine floors on top of each other, so consequently one can gather up to twice the amount in rent than in street parking for each unit of ground area.  The cost of heating, maintenance and repairs is perhaps a fifth of the rent.  The cost of construction is harder to estimate, but in the long run - we live in an eighty year old building - it too becomes relatively small.  So, in summary, renting a parking place doesn't really seem costlier than renting an appartment.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-116248112299843674?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/116248112299843674/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=116248112299843674' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/116248112299843674'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/116248112299843674'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/11/price-of-parking-place.html' title='Price of a Parking Place'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-116248106842360083</id><published>2006-11-02T17:22:00.000+02:00</published><updated>2006-11-02T17:24:28.443+02:00</updated><title type='text'>Bad Hearing</title><content type='html'>A student with a heavy flu asked what a &lt;a href="http://en.wikipedia.org/wiki/Monads_in_functional_programming"&gt;monad&lt;/a&gt; is.  Hampered by my hearing disability and absentmindedness of the context of the discussion, I politely answered "the &lt;a href="http://en.wikipedia.org/wiki/Gonad"&gt;glands&lt;/a&gt; that produce reproductive cells".  Yes, humiliation is good for the soul, especially if it is followed by a mutual vigorous &lt;a href="http://en.wikipedia.org/wiki/Laugh"&gt;respiratory exercise&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-116248106842360083?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/116248106842360083/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=116248106842360083' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/116248106842360083'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/116248106842360083'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/11/bad-hearing.html' title='Bad Hearing'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-116007300430292535</id><published>2006-10-05T21:28:00.000+03:00</published><updated>2006-10-05T21:30:04.326+03:00</updated><title type='text'>Similarity of Texts</title><content type='html'>I know of some computational text comparison attempts, such as using word length histograms to reveal the real author of a pseudonym, and some &lt;A HREF="http://www.cis.hut.fi/research/som-research/"&gt;Self-Organizing Map&lt;/A&gt; attempts to cluster for example word histograms of patent applications.  &lt;br /&gt;&lt;br /&gt;Could these attempts be improved upon?  &lt;A HREF="http://www.cs.fit.edu/~mmahoney/poster.ps.Z"&gt;Arguably&lt;/A&gt; text compression poses and compressors have to solve a very strongly text-characterizing problem: predicting the next alphabet.  So, I decided to try compression in quantifying the similarity of two natural languages.  The working hypothesis is that mixed texts written in two similar languages should compress better than texts mixed of two very different languages.  I gathered a number of bible versions and translations, mixed them pairwise verse by verse, compressed the results with &lt;A HREF="http://www.cs.fit.edu/~mmahoney/compression/#paq8"&gt;one of the best dictionary-free compression programs&lt;/A&gt; I could find, compared the compression ratios to unmixed bible versions, and visualized the result on a plane.  I've had to adjust some points ever so slightly in order to avoid the explanatory texts being placed on top of each other, but otherwise this is the verbatim output of the program.  Clusters of similar languages or different versions of the same language are quite evident.&lt;br /&gt;&lt;br /&gt;&lt;IMG SRC="http://hibase.cs.hut.fi/~cessu/bibles.gif"&gt;&lt;br /&gt;&lt;br /&gt;The limitations of only two dimensions is quite evident - some languages such as latin have influenced and would compress well when mixed with a number of other later languages even if those languages are very different.  Perhaps adding a third dimension, or some kind of wormholes or teleports would help.  SOM representation avoid this problem by having multiple nodes (or disconnected areas) represent the same item.  I leave this as a later research topic, though.&lt;br /&gt;&lt;br /&gt;Another limitation not evident in the picture above is that mixing and compressing can only be applied to languages that use similar alphabets.  Otherwise typical context modelling techniques in compressors will separate the two languages essentially into two separate input streams.&lt;br /&gt;&lt;br /&gt;In this example I mixed and compressed two versions or translations of the same text, but what I would really like to do is to mix and compress texts written in the same language but by different authors. Suggestions for good corpora are welcome.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-116007300430292535?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/116007300430292535/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=116007300430292535' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/116007300430292535'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/116007300430292535'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/10/similarity-of-texts.html' title='Similarity of Texts'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-115671095819209134</id><published>2006-08-27T23:32:00.000+03:00</published><updated>2007-02-23T23:26:06.496+02:00</updated><title type='text'>Oatmeal vs Eggs, or the Volumetric Myth</title><content type='html'>Eager proponents of oatmeal argue that eating it for breakfast keeps the hunger away well into afternoon.  I have always been skeptical about this, because the same effect could also be caused by one's habitual daily eating rhythm - if you choose to eat a late lunch for a few days, then the body (brains, that is) adapts to this and you won't get hungry until afternoon regardless of what you ate in the morning.  Typical food satiation tests will not discern this, as they mostly follow how much a person eats before feeling satiated, not so much how long he remains satiated or what their eating rhythm is.&lt;br /&gt;&lt;br /&gt;So I decided to do what &lt;A HREF="http://en.wikipedia.org/wiki/Morgan_Spurlock"&gt;Morgan Spurlock&lt;/A&gt; tought us: eat nothing but oatmeal for thirty days.  Well, in my case I settled for three days.  Except for allowing my caffeine-addicted metabolism two hefty cups of coffee with a little cream each day, my diet will consist of water and a portion of oatmeal made in water whenever I feel hungry.  Not even butter, sugar, berries, jam, or anything of caloric value in the oatmeal.  I allow only sugar-free spices, chewing gum, and vitamin pills to compensate for the one-sided diet.&lt;br /&gt;&lt;br /&gt;Day 1: I'm not accustomed to having anything but coffee for breakfast, so I don't get hungry until 9 am.  Betaglucan, here I come!  After a plateful of microware-prepared porridge I feel satiated and pieceful, but suddenly at 10 am I'm hungry again.  A lingering hunger disturbs my work and makes me take subsequent portions at 11 am, 1 pm, 2:30 pm, 5 pm, 8 pm, 8:30 pm, 10:30 pm, and 12 pm.  It is really annoying to merely burp after dining and feel ready to eat again.  Fortunately I have learned to spice up the oatmeal with for example sugarfree juice concentrates.&lt;br /&gt;&lt;br /&gt;Day 2: Portions at 9:30 am, 10:30 am, 1 pm, 3 pm, 5:30 pm, 8 pm, 9:30 pm, 12 pm.  I have tried various ground spices, such as peppers, cinnamon, and bitter orange.  I also feel a little constipated.  This is a little surprizing considering the amount of oh-so-healthy fibers I'm eating.  Perhaps constipation is more a function of a change in diet instead of the diet itself.&lt;br /&gt;&lt;br /&gt;Day 3: The constipation eases.  I shall save you descriptions of my stool, let it only be noted that I felt very bureaucratic as the paperwork lasted much longer than production.  Portions at 9 am, 11 am, 12:30 pm, 3:30 pm, 5 pm, 6:30 pm, 8 pm, 11 pm, and after a late night hack, at 2 am.  I fall asleep counting mutton steaks.&lt;br /&gt;&lt;br /&gt;Instead of eating properly and become satisfied for a good while, eating has turned into constant slow nibbling, just like one would eat chips or pop-corn in a movie.  Even if I had set as a standard that I would never feel hungry, I did so on several occasions during the experiment.  Of course I could have willed the hunger away, but that would have defeated the purpose of the experiment.  After an average of nine portions of oatmeal per day, I can safely conclude that the myth that oatmeal keeps hunger away is definitely busted.&lt;br /&gt;&lt;br /&gt;But for comparison I felt compelled to repeat the experiment with eggs.  Two eggs contains almost the same amount of energy as a portion of oatmeal, but in a rich combination of proteins, fats, vitamins, and other micronutrients.  The high cholesterol content of the egg yolk does not worry me - contrary to some nutritional urban myths, science has long known that dietary intake of cholesterol has little if any effect on the total blood cholesterol level.  After a purge period of four days the second part of my experiment begins.&lt;br /&gt;&lt;br /&gt;Day 1: One cooked egg at 9:30 am, two at 11 am, one at 1 pm, two at 3 pm, three at 5:30 pm, four at 8 pm.  Felt very satiated.  At 11:30 pm I poached two eggs with vinegar, spices, and a little artificial sweetener to give a slight sweet-and-sour taste - quite delicious!&lt;br /&gt;&lt;br /&gt;Day 2: Three cooked eggs at 11 am, two at 3 pm, two at 5:30 pm, two at 8 pm, and two at 11:30 pm.  I notice my susceptibility of hunger is declining, probably because my body is moving to &lt;A HREF="http://en.wikipedia.org/wiki/Ketosis"&gt;ketosis&lt;/A&gt; where at least I begin to lose appetite.&lt;br /&gt;&lt;br /&gt;Day 3: Three cooked eggs at 10:30 am and two at 2 pm.  A sequence of meetings prevented to prompty quench my hunger until devouring four eggs at 6:30 pm.  At midnight I concluded the day with two eggs deviled with hot spices and a teaspoon of cream fraiche.  Ok, the cream fraiche was cheating, but insignificant in the sense of total energy value.&lt;br /&gt;&lt;br /&gt;All in all I ate 12 and a third egg per day, and a nod of cream fraiche.  Compared to the nine portions of oatmeal per day, I ate only 70% of the calories on the egg diet.&lt;br /&gt;&lt;br /&gt;Conclusion: Volumetrics, the assumption that a person consumes less calories if he eats foods with low enery content per weight or volume, doesn't work, at least not for me on oatmeal versus eggs.  Cement might work, but that I'm not going to try without the helping hand of a bunch of anarchovegans.&lt;br /&gt;&lt;br /&gt;Note that I'm not saying oatmeal wouldn't be a healthy way to start a day - surely healthier than a slice of white toast and marmalade - nor am I suggesting anyone should eat eggs like Cool Hand Luke.  This inhumane experiment was conducted in such extreme only to test the naive idea of volumetrics.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-115671095819209134?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/115671095819209134/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=115671095819209134' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/115671095819209134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/115671095819209134'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/08/oatmeal-vs-eggs-or-volumetric-myth.html' title='Oatmeal vs Eggs, or the Volumetric Myth'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-115593293129021417</id><published>2006-08-18T23:24:00.000+03:00</published><updated>2006-12-07T20:11:49.836+02:00</updated><title type='text'>Buddhabrot Aerobics</title><content type='html'>I'm suffering from an incurable illness called nomipsnofunia, a serious disability to have fun without involving a lot of heavy computations.  For example, when I read about &lt;A HREF="http://en.wikipedia.org/wiki/Buddhabrot"&gt;Buddhabrot&lt;/A&gt;s I compulsively needed to compute ones myself too.  But I also decided to study what happens when we generalize the magic constant "2" in the Mandelbrot iteration formula z=z^2+c.  In the animated gif below, which I baptized to Aerobics Buddhabrot, the exponent rotates around a circle with center at 2 and radius of 0.2 on the complex plane.&lt;br /&gt;&lt;br /&gt;&lt;IMG SRC="http://hibase.cs.hut.fi/~cessu/buddhadance.gif"&gt;&lt;br /&gt;&lt;br /&gt;I also computed a three-minute PAL-resolution mpg &lt;A HREF="http://hibase.cs.hut.fi/~cessu/buddhavomit.avi"&gt;movie&lt;/A&gt; (17 MB of MPEG-4 encoded video in AVI) where the exponent rotates around a cirle of radius 2 centered at the origo.  A colleague described the visual effect as "the Buddha throwing his guts out, rotating them around him through infinity, and assembling himself back again on the other side."  I employed some thirty workstations to compute the frames during the last few weeks.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-115593293129021417?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/115593293129021417/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=115593293129021417' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/115593293129021417'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/115593293129021417'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/08/buddhabrot-aerobics.html' title='Buddhabrot Aerobics'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-115434511331025764</id><published>2006-07-31T14:19:00.000+03:00</published><updated>2007-03-05T00:23:39.386+02:00</updated><title type='text'>Polyfilla and Old Painters</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://hibase.cs.hut.fi/~cessu/polyfilla.gif"&gt;&lt;img style="float:right; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 127px;" src="http://hibase.cs.hut.fi/~cessu/polyfilla.gif" border="0" alt="" /&gt;&lt;/a&gt;If I had to name a single favourite construction material, it would probably be Polyfilla, a plaster for filling up holes and cracks in walls and ceilings.  With regular gypsum powder it would be difficult to add exactly the amount of water which would make the paste stiff enough to stay in the crack, and after a few minutes all the paste you mixed would harden nearly instantaneously leaving you very little working time.  Furthermore, gypsum shrinks when drying and thereby forms new smaller cracks which you will have to plaster up a second or  third time.  But then someone, apparently at &lt;A HREF="http://www.polycell.co.uk/"&gt;Polycell UK&lt;/A&gt;, thought of adding cellulose to gypsum powder.  Mixed with water the resulting paste is very pleasurable to handle, it hardens less dramatically and shrinks less.  Where I have needed a little harder finish, for example in an outward corner, I've successfully added small amounts of cement.&lt;br /&gt;&lt;br /&gt;Polyfilla is nowadays a worldwide brand, but recently I learned that old painters have used the basic idea for centuries: they cooked oat meal in water and mixed the resulting thin porridge instead of clear water to the gypsum powder the next day.  Instead of oat they might also use other sources of polymeric carbohydrates, such as mashed potatoes, mouldy bread, old newspapers, etc.  Supposedly a pair of old underwear has been spread in the walls of an architecturally renowned house in central Helsinki.&lt;br /&gt;&lt;br /&gt;If slightly cheaper and tougher finish was required, the gypsum powder would be mixed with coke slag.  This material, colloquially called "kananpaska" (chicken shit) or "karhunpaska" (bear shit) was the predominant material for building light walls in Helsinki: a mold would be erected on one side of the future wall, the wall would be constructed layer-by-layer by throwing the gypsum-ash paste on the mold or previous layer, the mold would be removed, and a finishing ashless layer would be plastered on both sides of the wall.  The demeaning name probably follows from the messiness of the substance when demolishing the wall: the coke slag produces extreme amounts of slowly setting dust.  When local coal heaters was replaced by district heating during the fifties and sixties this construction material has fortunately lost popularity.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-115434511331025764?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/115434511331025764/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=115434511331025764' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/115434511331025764'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/115434511331025764'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/07/polyfilla-and-old-painters.html' title='Polyfilla and Old Painters'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-115266294694071770</id><published>2006-07-12T03:07:00.000+03:00</published><updated>2007-03-14T08:49:21.953+02:00</updated><title type='text'>Football World Cup Finals</title><content type='html'>I've been told the football world cup finals (FWCF) are in progression in Germany at the moment.  Haven't followed them as I'm less interested in watching football played by others than the one I play myself with Konami's Pro Evolution Soccer 4 on my PS2.&lt;br /&gt;&lt;br /&gt;The FWCF to some degree relates to an interesting algorithmic problem: the selection problem in which an algorithm must select the i'th largest element of n values with as few pairwise comparisons as possible.  Author, mathematician, and reverend &lt;A HREF="http://en.wikipedia.org/wiki/Lewis_Carroll"&gt;Lewis Carroll&lt;/A&gt;, officially Charles Dodgson, became puzzled over the same problem while following tennis tournaments.  Barring luck (randomness in individual game outcomes), the common knockout tournament is obviously the best tournament system for selecting the winner.  I've previously studied &lt;A HREF="http://www.cs.hut.fi/~cessu/selection/"&gt;some specific selection problems&lt;/A&gt;, but never ones which involve unreliable comparisons.&lt;br /&gt;&lt;br /&gt;The FWCF employs first a stage of eight parallel &lt;A HREF="http://en.wikipedia.org/wiki/Round-robin_tournament"&gt;round-robin tournaments&lt;/A&gt;, each containing four teams.  The two best teams of each group proceed to a knock-out tournament.  The winning team has played seven games, and it is possible that two teams play against each other twice.&lt;br /&gt;&lt;br /&gt;The &lt;A HREF="http://en.wikipedia.org/wiki/Swiss_system_tournament"&gt;Swiss tournament system&lt;/A&gt; was introduced in chess tournaments in the late 18'th century.  As in round-robin tournaments, each player/team accumulates a score based on its wins, draws and losses.  On each round players with equal scores are pitted against each other, but with additional rules that prevent same players meeting again.  The Swiss tournament has nice advantages, such as rapid convergence and high entertainment value to players, but your average football fan might find it perplexing and, more importantly, the tournament doesn't necessarily end in a final climax.&lt;br /&gt;&lt;br /&gt;Naturally I wouldn't be writing this blog entry unless I would be suggesting some wise-ass tournament schemes of my own.  But first, as a basis for estimating the tournament systems I choose the probability of a win for player A against player B to follow the &lt;A HREF="http://en.wikipedia.org/wiki/Logistic_curve"&gt;logistic curve&lt;/A&gt;, 1/(1+c^(a-b)), where c I choose to be 1.4, and a and b are the positions of players A and B if all players were ordered "correctly". Hence, the best player wins the second best player with probability 1/(1+1.4^(1-2)) = 0.5833... and the tenth best player with probability 0.9538...&lt;br /&gt;&lt;br /&gt;A full round-robin tournament among all thirty-two FWCF finalists sounds fair albeit somewhat lengthy, lasting at least three months. Simulating such a round-robin tournament one million times on a computer with the above game outcome probabilities shows that the best player would be awarded the gold medal only with approximately a 47.5% probability, silver with 26.7% probability, and bronze with 14.5% probability.  Adding a second round increases the likelihood of correct winner to approximately 58%.&lt;br /&gt;&lt;br /&gt;The standard knockout tournament fares even worse: the correct winner is chosen only with approximately 38.5% probability.  But there is a simple remedy to improve it: playing the final best-of-three improves the best player's chances to 42.3%, playing also the semifinals best-of-three yields 45.2%.  The next improvement would be to play the final best-of-five, resulting in almost 47.6% chances of finding the best player, but such a lengthy final may already lead to loss of concentration in the audience and tolerance in the native inhabitants of the hosting cities.  Furthermore, any best-of-N system implies uncertainty of when the final decisive game will be played, which in turn may be troublesome to broadcasting companies and organizers who would like to schedule everything well in advance.&lt;br /&gt;&lt;br /&gt;The FWCF system is not particularly good, only slightly more reliable than the basic knockout tournament and it requires seven rounds of games - two more than the straight-forward knockout tournament. Exchanging the order of group and knockout stage would make the tournament more reliable, but that wouldn't ensure all teams have at least three occasions to play and that there's a final climax at a predetermined date.&lt;br /&gt;&lt;br /&gt;For my tournament system proposal I chose the simple &lt;A HREF="http://en.wikipedia.org/wiki/Sorting_network"&gt;sorting network&lt;/A&gt; as the basic computational device.  I decided to add an eighth round of games, and as the comparator function I use not only the result of the played game but the number of wins accumulated in all played games so far, with emphasis on later wins.  Only the final round is played as earlier - winners of those games are honored with gold and bronze regardless of their game history - thus ensuring the final climax.  I wrote an elementary genetic algorithm and a fitness function which tries to maximize the likelihood of also the dimmer medals for the best player and avoid same players meeting twice.  A nightful of CPU cycles later I obtained the following network which I have post-processed and visualized:&lt;br /&gt;&lt;br /&gt;&lt;IMG SRC="http://hibase.cs.hut.fi/~cessu/tour.jpg"&gt;&lt;br /&gt;&lt;br /&gt;The above tournament system would award gold to the best player with a 44.6% probability.  There are duplicate games, such as players 1 and 5 meeting on the semifinal and quarter-final round, but this was an elective trade-off.  On average slightly over five games are duplicates.  The role of maintaining a score sheet of wins is interesting, because if the comparator function replaced with the traditional - winner moves to the upper line - the tournament still produces the best winner with 42.5% probability.  The most significant drawback of the above tournament is that the chance of winning is not entirely independent of which line the player is initially placed on.  On the other hand, the best player is better off in my tournament than in the current FWCF regardless of which line he is placed on.&lt;br /&gt;&lt;br /&gt;There's one remaining question: how to deal with draws.  FWCF employs extra time and penalty shootouts, the latter of which undoubtably brings an additional element of randomness to the results.  The tournament system I proposed above might allow draws in some places, but if draws are easy to achieve, the player which considers a draw more beneficial may adjust his playing style accordingly.  As this is generally considered dull for spectators I propose a few solutions amenable for football: instead of the penalty shootout, the winning team is the one granted more corner kicks when the ball was passed outside by the defending goal keeper, but not field player.  Should also that statistic be drawn, during the first three and the last rounds ties are broken by penalty kicks, during the latter half a variant of the &lt;A HREF="http://en.wikipedia.org/wiki/Away_goals_rule"&gt;away goals rule&lt;/A&gt; is used: a zero-zero in goals is considered a win for the team on the lower line, otherwise the team on the upper line wins.&lt;br /&gt;&lt;br /&gt;Should a FWCF follow my suggestions some day (yeah, right, chances are even worse than Finland winning the Eurovision Song Contest), I promise I'll buy a ticket and eat it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-115266294694071770?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/115266294694071770/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=115266294694071770' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/115266294694071770'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/115266294694071770'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/07/football-world-cup-finals.html' title='Football World Cup Finals'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-114850916448233256</id><published>2006-05-25T01:17:00.000+03:00</published><updated>2006-05-25T01:22:18.303+03:00</updated><title type='text'>The Eurovision Song Contest and Geographics</title><content type='html'>In general I don't plan to comment on daily topics and with the exception of &lt;A HREF="http://www.apocalyptica.com/home/"&gt;Apocalyptica&lt;/A&gt; I don't particularly like modern heavy music.  But absent-mindedly following the &lt;A HREF="http://www.eurovision.tv/english/index.htm"&gt;Eurovision Song Context&lt;/A&gt; last weekend, I couldn't but help noticing that many neighboring countries did vote each other.  This has been noticeable in past years, and this year, when citizens actually voted, I explained this to myself that immigrants and minorities vote to support their ethnic background.&lt;br /&gt;&lt;br /&gt;So, up with the calculator, or rather the computer.  I wrote a program which tries to place the countries on a 2D plane so that countries that voted heavily on each other were placed more adjacently than other countries, and vice versa.  The layout tries to minimize the sum of squares of differences between the countries on the layout and the number of mutually awarded points.  Final points weigh twice the semifinal points.&lt;br /&gt;&lt;br /&gt;&lt;IMG SRC="http://hibase.cs.hut.fi/~cessu/esc.jpg"&gt;&lt;br /&gt;&lt;br /&gt;Yes, clearly one can observe several clusters of neighbor countries, but the planarity constraint makes it impossible to bring all voters of successful countries sufficiently close.&lt;br /&gt;&lt;br /&gt;Oh, did I have an opinion of the &lt;A HREF="http://www.lordi.org"&gt;winner&lt;/A&gt;?  Not seriously, except that IMHO it was rather nice to see latex on the outside for once win silicone on the inside.  And in general, diversity was really needed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-114850916448233256?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/114850916448233256/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=114850916448233256' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114850916448233256'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114850916448233256'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/05/eurovision-song-contest-and.html' title='The Eurovision Song Contest and Geographics'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-114824450515460481</id><published>2006-05-21T23:45:00.000+03:00</published><updated>2006-05-21T23:48:25.173+03:00</updated><title type='text'>Different Kind of Hacking</title><content type='html'>I hack quite a bit of code, both for fun and funding, but this weekend I hacked up a zero-budget &lt;A HREF="http://hibase.cs.hut.fi/~cessu/fisheye-video-1.avi"&gt;video&lt;/A&gt; demonstrating the use of orthogonal &lt;A HREF="http://www.cs.hut.fi/~cessu/fisheyex11/"&gt;fisheye transformation&lt;/A&gt; as a surrogate for a big screen on handhelds. Fortunately there was some programming involved, but mostly my time went to battling against the heat strokes of my computer and ADSL modem, a malicious sound card, various video formats, and the lack of satisfactory FOSS video editing software.  Oh, and since didn't have a microphone I had to use my digital camera's dictaphone feature.  But that caused a lot noise and clicks I have to edit out some day.&lt;br /&gt;&lt;br /&gt;In away all this was fun, and should I have a lot of extra time it would be fun to really delve into this subject, study video formats, more than basic computer graphics, solid modelling, &lt;A HREF="http://www.blender.org"&gt;blender&lt;/A&gt;, and make these tools as easily approachable to layman as a pen for writing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-114824450515460481?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/114824450515460481/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=114824450515460481' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114824450515460481'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114824450515460481'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/05/different-kind-of-hacking.html' title='Different Kind of Hacking'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-114658156026086942</id><published>2006-05-02T17:51:00.000+03:00</published><updated>2006-05-02T17:52:40.276+03:00</updated><title type='text'>Public Healtcare in Finland: Scrap, or How to Fix It!</title><content type='html'>There has been quite a bit of debate recently on the state of the public healthcare in Finland.  According to laws passed in early 70's the state (and/or municipalities) must provide an adequate level of basic healthcare for all citizens.  For two decades it worked marvellously, but since the recession in early 90's the state funding has gradually decreased despite the subsequent accession.  For quite some time that was compensated by municipalities and "increased employee efficiency" (i.e., in many cases unpaid, underpaid, or incompetent labor), but recently their back seems to have been broken: nurses are rallying for better salaries and more workforce, and nasty rumors are beginning to circulate about neglected elderly patients left to suffer alone in overcrowded institutions.&lt;br /&gt;&lt;br /&gt;So far the supply of doctors has been somewhat sufficient, but as people become older the ratio of nurses to doctors should increase as treatments such as washing, feeding and moving begin to dominate.  Nurses are not accustomed to defend their rights by striking, but the young have long ago voted with their feet: nursing has become an unpopular career and in fact there are roughly twice as many nurses in the age group of 40-49 than 30-39.  So, as the older age groups retire, the situation will only become worse.&lt;br /&gt;&lt;br /&gt;Simultaneously &lt;A HREF="http://www.osku.net/"&gt;worries&lt;/A&gt; are beginning to rise about the state's (i.e., younger generations') ability to pay the pensions the older generations have been promised (by themselves, essentially).  If all goes well, then the pensions can indeed be paid, but the economic uncertainties are considerable.  Consequently, one way to make it easier for the younger generations is to lift the state's responsibility of public healthcare, at least for the older age groups.  This would reduce government spending but the elderly would have to pay their own health care by selling their properties as very few have taken healthcare insurances.&lt;br /&gt;&lt;br /&gt;If, on the other hand, we the Finns decide to keep our public basic healthcare system alive, I propose a very simple remedy that will make sure it remains in working condition: enact a law that forbids all private healthcare or any special treatment forever for anyone who has worked sufficiently high in the state or municipalities.  This law should at least cover all members of the parliament and anyone else who has made a similar or salary (say, 6000 euros/month or more) while working anywhere in the public sector.  Once the politics and higher public servants are made to eat their own dog food the rest will fix itself, trust me on that one!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-114658156026086942?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/114658156026086942/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=114658156026086942' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114658156026086942'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114658156026086942'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/05/public-healtcare-in-finland-scrap-or.html' title='Public Healtcare in Finland: Scrap, or How to Fix It!'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-114514186571021539</id><published>2006-04-16T01:55:00.001+03:00</published><updated>2006-04-16T01:57:45.946+03:00</updated><title type='text'>How to Design a Successful Religion?</title><content type='html'>Assume you wanted to establish a new religion, and make it as successful as possible.  Here are some suggestions on what you should consider when planning it.&lt;br /&gt;&lt;br /&gt;Firstly, your religion should apply to some very basic instincts of humans.  Good candidates are love, hate, fear, sociality, and greed.  Generally fear, most prominently the fear of death, is the best candidate - hence remember to promise a paradise of some sort for those who worship your religion most deeply and hell or Guantanamo Bay for those who don't.&lt;br /&gt;&lt;br /&gt;For centuries religions and entertainment were deeply intertwined, but this coupling is decreasing.  I suggest you do concoct holy scriptures with many charismatic figures, heroic tales, vague metaphors, mysticism, and descriptions of wrath of your deities, but otherwise allow your worshipers watch television, use mobile phones and surf the net as long as you get a few hours a week and twenty minutes every day.  Otherwise your religion won't be compatible with contemporary culture, which may mean that following your religion will be hard for some age groups.&lt;br /&gt;&lt;br /&gt;Secondly, just like drugs, make your religion easy to become acquainted to but require at least some devotion to fully comply.  The best compromise means that it is possibly to learn and try your religion without taking the possible social pressure of giving up one's current religion, but once devoted, gives as little chance as possible to pick up yet another religion.  Note, however, that redemption for the fallen should be possible (even if perhaps uncertain), otherwise people won't return to your faith once they've lost contact with it.&lt;br /&gt;&lt;br /&gt;Thirdly, remember that religion and evolution have a deeper connection than some religions trying to deny evolution.  Specifically, unless your religion supports its practitioner's survival and procreation, your religion will simply die out - this has happened numerous times for many strange sects in history and most evidently in various &lt;A HREF="http://en.wikipedia.org/wiki/Cult_suicide"&gt;cult suicides&lt;/A&gt;.  Features such as religious upbringing, conversion, and even religious wars increase your religion's chances of being the fittest.  &lt;br /&gt;&lt;br /&gt;But even in peaceful times memes, including religion, experience evolution and people pick memes more frequently from those who have succeeded well in the society.  Hence, I suggest you add an element of pre-death revenue: one's success in society is at least partly god's grant for one's devotion to your religion (while someone faithful but unsuccessful in society might still only going through the equivalents of god's tests to Job).&lt;br /&gt;&lt;br /&gt;If I had a lot of extra time, I'ld probably make this easy for you, oh future prophet: I've considered writing a web page which asks a number of questions (for example the religion's position on various issues such as emphasis on love versus hate), and automatically create a holy scripture of desired length using a &lt;A HREF="http://pdos.csail.mit.edu/scigen/"&gt;context-free grammar&lt;/A&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-114514186571021539?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/114514186571021539/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=114514186571021539' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114514186571021539'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114514186571021539'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/04/how-to-design-successful-religion_16.html' title='How to Design a Successful Religion?'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-114272357250730299</id><published>2006-03-19T01:11:00.000+02:00</published><updated>2007-03-09T00:39:23.603+02:00</updated><title type='text'>Optimal Copyrightability</title><content type='html'>One can regard copyright as an agreement between the state and producers where the state grants the producer a time-limited monopoly of distribution, enforces that monopoly with a police force and judicial system, and thereby provides a monetary incentive to produce arts and enrich our culture.  This justification may be a little iffy, but the U.S. Constitution and many others, including me, use this line of reasoning.  But there are three issues which makes the agreement rather suboptimal for the state.&lt;br /&gt;&lt;br /&gt;Firstly, when applied to typical shrink-wrapped off-the-shelf computer programs, only the executable binary, not the more significant source code, will eventually fall into public domain.  So perhaps in 2056 one would be free to distribute binary copies of the first MS Windows, but people still couldn't see how they were implemented.  The source code will remain a trade secret forever, and hence enrich our culture much less than the legislators initially intended.  This deficiency could be solved by requiring the source code of programs to be escrowed before the state granted it copyright, but I guess nobody believes the legislators would have the balls to require that.&lt;br /&gt;&lt;br /&gt;Secondly, if the producer defaults in distribution, then it should become legal to copy and distribute the copyrighted work freely.  For example, according to Lawrence Lessig's database of 18 million books, almost 90% of copyrighted books are out of print.&lt;br /&gt;&lt;br /&gt;Thirdly, and most evidently, it seems copyright times are much longer than is really needed to guarantee an economical incentive for producers.  To study this a little closer, I fetched the list of all 15980 accreditations granted by &lt;A HREF="http://www.riaa.org"&gt;RIAA&lt;/A&gt;, deduced from them as well as I could the sales figures and release dates of all music albums, and computed some statistics.  I know there are many deficiencies with this approach - for example not all artists bother to apply for accreditations or apply decades later - but it was still the best long-term statistics I could get my hands on.&lt;br /&gt;&lt;br /&gt;Based on these statistics it seems that on average albums reach 90% of their final cumulative market share in 30 months, and 95% in ten years.  I would, however, tend to claim that revenues from bargain sales of more than a few year old records are far less profitable, and hence almost all profit will be made within the decade.  The only notable exception is occasional revivals of old very popular hits - for example the musical "Mamma Mia!" clearly boosted sales of two decades old ABBA hits.&lt;br /&gt;&lt;br /&gt;Profit figures for record labels were harder to come by, but for example according to the Canadian record industry association, record label profits are around 7% of the VAT-free street price of a CD - profit ratio of the company would be much higher than that.  So even given these probably very conservative profit estimates, these companies would still make profits even if the copyright period would be dropped dramatically.  Absolutely no data justifies the current 3-4 generation long copyright periods.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-114272357250730299?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/114272357250730299/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=114272357250730299' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114272357250730299'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114272357250730299'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/03/optimal-copyrightability.html' title='Optimal Copyrightability'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-114272342202648750</id><published>2006-03-19T01:09:00.000+02:00</published><updated>2006-03-19T01:11:09.186+02:00</updated><title type='text'>Shaky-Waky</title><content type='html'>I had a little fever and was feeling awkward, so I took a nap on my couch.  Some moments later I woke up feeling a strong vibration in my chest.  For a few twilight seconds I thought I was suffering a heart attack, until I realized I had forgotten my new small mobile phone with an exceptionally strong vibration alarm in my chest pocket.  Found this so hilarious my laughter nearly suffocated me.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-114272342202648750?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/114272342202648750/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=114272342202648750' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114272342202648750'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114272342202648750'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/03/shaky-waky.html' title='Shaky-Waky'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-114165241093150056</id><published>2006-03-06T15:38:00.000+02:00</published><updated>2006-03-06T21:07:35.523+02:00</updated><title type='text'>Allez, Allez!</title><content type='html'>Suppose &lt;A HREF="http://en.wikipedia.org/wiki/Lance_Armstrong"&gt;Lance Armstrong&lt;/A&gt; would sit on his bike and start accelerating at a constant acceleration of 1 m/s^2 (roughly 10% of earth's gravity).  In ten seconds he would reach a good cycling speed of 36 km/h, or some 20 mph, but suppose he would continue to press on.  What would happen?  Here's my guesstimate.&lt;br /&gt;&lt;br /&gt;From over one to two minutes later, around a speed of 300-400 km/h, typical cycling clothes would probably be torn apart by the air resistance.  But assume Lance doesn't care, or happens to wear a super-adhesive fig leaf.&lt;br /&gt;&lt;br /&gt;Roughly four minutes after start his bike wheels would break under the centrifugal forces.  This may have happened sooner or later depending on the surface, the structure of the wheels, and  whether they start wobbling.  Lance has lost his tires far earlier.&lt;br /&gt;&lt;br /&gt;Less than six minutes after starting Lance reaches Mach one.  Gradually air resistance would begin to heat him up.  Burn marks would begin to appear starting on his forehead, hands and knees, but for philanthropic reasons we assume that Lance is indestructable.  Fourty minutes later his kinetic energy exceeds that of his weight of TNT.  It will become increasingly difficult to see Lance's face as he begins to be surrounded by glowing plasma.&lt;br /&gt;&lt;br /&gt;In two hours and twelve minutes after start, having traveled 75% of earth's circumference, Lance will have trouble to stay on ground as earth's gravitational pull no longer compensates the speed at which earth curves downwards under him.  Poor Lance begins to spiral into space.&lt;br /&gt;&lt;br /&gt;Eight hours later Lance would reach the moon, in less than a week the sun, and five weeks later he would reach Pluto.  Gradually relativistic effects will creep in, and Lance will see the stars in front of him change their colors towards blue, and behind him towards red.  After nine years of pedaling (?), or ten years and three months on earth, Lance reaches the nearest stars.  His speed has exceeded 70% of light speed, and needless to say, should he pedal through a planet, he would cause a mass extinction of whatever life forms happen to live on that planet.  Even more spectacular effects - clearly observable also back on earth - would occur should Lance pedal through a neutron star.&lt;br /&gt;&lt;br /&gt;Duh, a side note to myself: I need to stop playing with my pocket calculator.&lt;br /&gt;&lt;br /&gt;(Thanks to an alert reader who wishes to stay anonymous for pointing out a error regarding equivalence of kinetic energy and TNT in the original post.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-114165241093150056?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/114165241093150056/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=114165241093150056' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114165241093150056'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114165241093150056'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/03/allez-allez.html' title='Allez, Allez!'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-114142597591641680</id><published>2006-03-04T00:40:00.000+02:00</published><updated>2006-03-04T00:47:34.050+02:00</updated><title type='text'>Hash the DNA</title><content type='html'>Last year seven EU member countries, including Finland, &lt;A HREF="http://europa.eu.int/idabc/en/document/4332/330"&gt;agreed to share access to DNA, vehicle, and fingerprint databases&lt;/A&gt;, and I've heard several suggestions to construct a DNA sample database of all Finns.  This worries me.&lt;br /&gt;&lt;br /&gt;Gathering DNA is typically justified by referring to 9/11 or the relatively small threat of global terrorism, but the logic and evidence behind this justification is as solid as the plans of evidence ... ooops!, I mean evidence of plans of WMD in Iraq.  But it is definitiely true that DNA, just like fingerprints, is a valuable forensic tool in identifying the victim, the suspect and connecting him physically to a crime scene.  Not only is DNA a tool for positive evidence, more importantly it has been used in several cases to redeem a falsely convicted person.&lt;br /&gt;&lt;br /&gt;But there is a significant distinction: DNA carries much more information about the person than fingerprints, surveillance camera footage, etc.  Based on DNA one can, or will in future be able to, determine much of one's pedigree, race, health, appearance, personality, physical and mental capabilities, among others.  Most of this has very limited forensic value, but is more interesting to for example companies hiring employees, selling life insurance, or marketing departments trying to find customers most easily addicted to a given product.  This is a much bigger threat to Joe Average's everyday life than occasional acts of terrorism.  And there's little doubt security agencies share their information with private companies - in fact economic espionage in its various forms was the primary task of many security agencies from the end of the cold war at least up to 9/11, now it's probably the second most important task.&lt;br /&gt;&lt;br /&gt;I suggest we could reconcile the desire of privacy and the need of evidence by sharing cryptographic hashes of DNA instead of the actual genome data.  The hash would still serve for identification, but would be of no value to anything else.  In fact, storing identified DNA data of any other human except yourself (and possibly your immediate family) should be outlawed as comparable to having, say, child pornography or downloaded music on your hard disk.&lt;br /&gt;&lt;br /&gt;I know there's a technical challenge: DNA data is rarely perfectly correct and complete, and hence traditional cryptographic hashing is not possible.  If that turns out to be impossible, the second best alternative would be to never ship identified DNA data - instead evidence would be shipped to few trusted repositories for identification, and destroyed immediately after a positive match has been found.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-114142597591641680?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/114142597591641680/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=114142597591641680' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114142597591641680'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114142597591641680'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/03/hash-dna.html' title='Hash the DNA'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-114104733758020636</id><published>2006-02-27T15:34:00.000+02:00</published><updated>2006-02-27T15:35:37.616+02:00</updated><title type='text'>Media Tax</title><content type='html'>In Finland the state carries a tax on blank media, a fraction of which supposedly goes to the artists.  For a write-once DVD this is roughly 60-70% of the disk's street price, or some 21 sn per gigabyte.  You pay this tax regardless of whether you use the disk to share art with your friends or whether you back up your own works on it.&lt;br /&gt;&lt;br /&gt;But let's take the media tax to the logical extreme, and then a little beyond.  First, think about paper.  High-quality laser printers claim a 600 dpi resolution, but let's assume only 150 dpi would be recoverable and hence usable to store data reliably.  With margins a normal A4 sheet of paper could then hold roughly 250 kilobytes of data.  A box of 2000 sheets of copier paper could with double-sided printing hold one gigabyte.  That's only 1% of the street price of the paper, but think about the volumes: roughly 300 000 metric tons is produced yearly, or approximately 6.5 billion euros worth of taxes.  Worth lobbying, don't you think?&lt;br /&gt;&lt;br /&gt;Ok, what about empty space.  Waves propagating through space is also a storage medium.  Suppose I send a laser beam one km away, where it is reflected back to a point one millimeter right of the origin, where it is reflected back one millimeter left of the first reflection point, etc. turning the beam gradually around.  Approximately five seconds and 1.5 million reflections later the laser beam reaches its origin.  With 10 ns pulses this device stores 62 megabytes of data.  But the fun has just begun, because we can use many laser colors and stack such laser disks (SIC!) on top of each other.  Even more efficient would be to construct a sphere instead of a ring of mirrors.  The sphere would contain some 3*10^12 mirrors and the laser would reflect through all of them in four months.  With, say, fourty different laser colors we could store over 5000 terabytes of data in the sphere.  That's worth a million in media taxes!&lt;br /&gt;&lt;br /&gt;The German 3G licenses for 20 years cost over 3 billion (10^9) euros each.  But the German airspace's volume could easily hold millions of laser spheres, each worth one million in media tax.  This sounds the expensive 3G licenses of empty space were actually a bargain.  Oh, the lasers can be a little bright, so remember to wear a tin foil hat should you enter the German airspace.&lt;br /&gt;&lt;br /&gt;Now, let's resort to my pet peeve, homeopathy.  Homeopaths claim that even if you dilute water so far that it no longer contains a single atom of the original remedy, the "memory of the water" will remember what it is supposed to cure.  Logically we can extend this not only to vials of water, but also single water atoms.  Hahnemann listed 217 remedies, but I don't know whether a water molecule can "remember" only one of them or any combination of these remedies.  If the former, a water molecule can store 7.76 bits instead of 217 bits of information.  If we assume the former, a litre of water can hold so much information that its media tax is worth 1200 times the whole world's GDP.&lt;br /&gt;&lt;br /&gt;Taking off the tin foil hat, empty space is not economic and homeopathy is not possible as a storage medium.  But media tax will nevertheless have to be moderated at some point.  Otherwise future storage media, such as the &lt;A HREF="http://en.wikipedia.org/wiki/Holographic_Versatile_Disc"&gt;HVD&lt;/A&gt; will cost nearly 1000 euros per disk.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-114104733758020636?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/114104733758020636/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=114104733758020636' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114104733758020636'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114104733758020636'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/02/media-tax.html' title='Media Tax'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-114077987082706134</id><published>2006-02-24T13:17:00.000+02:00</published><updated>2006-02-24T13:22:21.350+02:00</updated><title type='text'>Five-in-a-Row on a Torus</title><content type='html'>Work has the bad habit of to disturb blogging, but recently I had a nice chance to combine them.&lt;br /&gt;&lt;br /&gt;Some time ago I got a sudden inexplicable urge to implement &lt;A HREF="http://home.tiscali.nl/askeplaat/mtdf.html"&gt;MTD(f)&lt;/A&gt;, the reportedly fastest two-player game tree search algorithm.  Having some background in AI and games it was enjoyable to implement and tweak. The only issue the description doesn't quite mention is how to catch the best move in the root, let alone the principal variation, but after a few trials I got that working as well.&lt;br /&gt;&lt;br /&gt;To get something concrete, I decided to apply MTD(f) to the game mentioned in the subject: five-in-a-row, but on the surface of a torus instead of a flat sheet of paper.  Classic five-in-a-row favors the first player, and in fact a &lt;A HREF="http://en.wikipedia.org/wiki/Strategy-stealing_argument"&gt;strategy stealing argument&lt;/A&gt; shows that with correct play, the first player will not lose.  More advanced variations of five-in-a-row restrict the first player's moves in various ways, but many of them, including &lt;A HREF="http://en.wikipedia.org/wiki/Gomoku"&gt;Go-moku&lt;/A&gt; and &lt;A HREF="http://users.erols.com/msmammel/Renju4.doc"&gt;free Renju&lt;/A&gt;, have still been proven with deep computer analyses to be wins for the first player.&lt;br /&gt;&lt;br /&gt;On a torus, it is easy to check manually that if a 2-by-2 grid or 3-by-3 grid is drawn on the torus, the first player wins in two and four moves, respectively.  With this presumption I tested my program on 4-by-4 and 5-by-5 grids, and to my dismay the program reported them as draws.  I fell into a bogus bughunt, until I humiliatingly realized that those variations are indeed draws!  Because the surface is a torus, it is possible to block a three-in-a-row or four-in-a-row with a single stone from both sides.&lt;br /&gt;&lt;br /&gt;Solving the 6-by-6 game wasn't going to be easy, so I was forced against my sincere will (yeah, right!) to have some serious fun: threat-space search, good move ordering heuristics, a reasonable evaluation function, and &lt;A HREF="http://www.cs.cf.ac.uk/Dave/AI2/node133.html"&gt;rote learning as in Samuel's Checkers program&lt;/A&gt;.  The positions were stored in the transposition and rote learning tables in a format that canonified equivalent positions obtained by reflection, translation and transposing into identical ones.&lt;br /&gt;&lt;br /&gt;Playing against this beast was fun athough it demonstrated human fallibility in very convincing ways.  But if I wanted the game solved in a time shorter than the average interval between power outages (roughly 6 months), I had to add a few extra tricks.  Firstly, I changed the move ordering for the second player so that it actively seeks draws by preferring preventively blocking moves to attacking moves.  Secondly, I added an algorithm which checks, with a number of pessimistic assumptions, if the position will unavoidably end in a draw even if there are empty places left.  Thirdly, I split the transposition table in two: one which replaces based on recency of access, one which replaces based on effort needed to determine the position's value.  Without the latter hash table, the long-running search algorithm would replace the valuable upper level nodes before returning from a deep part of the search.&lt;br /&gt;&lt;br /&gt;Several days worth of clock cycles later the program reported that the first player can always win, but only barely because the board (the torus, I mean) is almost full when the fifth-in-a-row is played.  In a few days I should be able to show the optimal play.&lt;br /&gt;&lt;br /&gt;Oh, and the way this fun relates to work is through an  &lt;A HREF="http://www.cs.hut.fi/Opinnot/T-106.3100/K2006/index.shtml"&gt;course&lt;/A&gt; exercise: a simplified version of this code along with a little bit of network code, HTTP parsing and HTML generation serves as the basis for that course's programming exercise.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-114077987082706134?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/114077987082706134/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=114077987082706134' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114077987082706134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114077987082706134'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/02/five-in-row-on-torus.html' title='Five-in-a-Row on a Torus'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-114030212549710733</id><published>2006-02-19T00:32:00.000+02:00</published><updated>2006-02-19T00:35:51.653+02:00</updated><title type='text'>Hedgehog-2.1 and thoughts about programming language ecosystems</title><content type='html'>I finally made version 2.1 of &lt;A HREF="http://hedgehog.oliotalo.fi/"&gt;Hedgehog&lt;/A&gt;, a small tactical Lisp system for low-end embedded systems.  The most significant additions in this version are the &lt;A HREF="http://en.wikipedia.org/wiki/SHA"&gt;SHA-256&lt;/A&gt; cryptographic hash function, a beefed up version of &lt;A HREF="http://www.cl.cam.ac.uk/ftp/users/djw3/xxtea.ps"&gt;XXTEA&lt;/A&gt; cipher, and a 256-bit shared secret between the interpreter and compiler (or rest of the host system).&lt;br /&gt;&lt;br /&gt;Hedgehog is a programming environment deliberately and rather single-mindedly targeted for a small niche market: very small, remotely controlled and updated M2M-boxes and telematics applications.  In general purpose programming languages C is the assembler, Java the COBOL, and Python the Basic of this era.  They and their equivalents in other systems all have a strong position and it seems it will take some time, perhaps a decade, before significant new revolutions can happen.  Therefore I predict that in the near future various niches is where we will see most of new practical programming language development.  Mainstream general-purpose programming systems, such as Java, take a few hundred person-years to develop when you include not only the core language but also libraries, tools and documentation, and ten times more if you consider marketing, community building and piloting.  Yet they don't fit every machine and every programming task.  There one often needs to design something very uncompromising and occasionally the outcome is something elegant and interesting.&lt;br /&gt;&lt;br /&gt;The Hedgehog Lisp dialect was designed by &lt;A HREF="http://liw.iki.fi/"&gt;Lars Wirzenius&lt;/A&gt;, and my role has merely been to (re)implement his design.  Ten years ago the position was reversed.  I had come across the idea that if one removed in-place mutation entirely from the programming language, one can find significantly improved solutions for generational garbage collection, persistence (backing up to computation and heap disk), and replication (backing up and cross-checking the computation among several machines).  On top of that we (a group of researchers including Lars) implemented a number of index structures and a more modern functional programming language.  Technically I think we succeeded in most of criteria, specifically in reaching good performance, but eventually IPR problems killed the possibility to continue with the work.  But persistent programming languages and fail-safe computing is an interesting niche I someday hope to return to.&lt;br /&gt;&lt;br /&gt;Future computational devices, such as mixtures of FPGA and normal CPUs, &lt;A HREF="http://www.gpgpu.org/"&gt;GPU computing&lt;/A&gt;, and of course massively multicore chips will also offer potential for new programming language designs.  XML &lt;A HREF="http://www.w3.org/TR/xquery/"&gt;processing&lt;/A&gt; &lt;A HREF="http://research.microsoft.com/Comega/"&gt;languages&lt;/A&gt;, &lt;A HREF="http://homepages.inf.ed.ac.uk/wadler/links/"&gt;Links&lt;/A&gt;, Javascript and other examples of languages that have a niche.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-114030212549710733?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/114030212549710733/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=114030212549710733' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114030212549710733'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/114030212549710733'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/02/hedgehog-21-and-thoughts-about.html' title='Hedgehog-2.1 and thoughts about programming language ecosystems'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113898562280890708</id><published>2006-02-03T18:42:00.000+02:00</published><updated>2006-02-04T00:55:29.633+02:00</updated><title type='text'>YASSSP = Yet Another Seven Secret of Successful Programmers</title><content type='html'>&lt;A HREF="http://iki.fi/liw/"&gt;Lars Wirzenius&lt;/A&gt; wrote in his &lt;A HREF="http://liw.iki.fi/liw/log/2006-02.html#20060203b"&gt;loki&lt;/A&gt; a better set of seven secrets of successful programmers than &lt;A HREF="http://www.irishdev.com/NewsArticle.aspx?id=1857"&gt;the original&lt;/A&gt;.  But feeling slightly cynical today and having served as a mental shoulder for many now former programmers over the past years, here's my list.  It's written more from the point of view of a programmer's profession rather than programming itself.  Not that I've fully followed all of these rules, but nevertheless:&lt;br /&gt;&lt;OL CLASS="sparse"&gt;&lt;br /&gt;&lt;LI&gt;&lt;STRONG&gt;Love programming&lt;/STRONG&gt;, problem solving, good "hacks", ideas that seemingly effortlessly get something done well, and read a lot of other people's code.  Find the principles of aesthetics in all of this.&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;&lt;STRONG&gt;Don't fixate to any one programming environment&lt;/STRONG&gt; if you want to make a long-term living.  You will need to learn many new programming languages and tools, some far worse than what you were used to.  Ideally you pick you tools according to the problem at hand, but sometimes they are mandated by management.&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;&lt;STRONG&gt;Don't be disproportionately good in any one thing&lt;/STRONG&gt;, otherwise you will be kept doing it until the thing is obsolete and you're fired.  Becoming a specialist tends to be a career-freezer.  If you are the best a company has for something, learn also other skills.&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;&lt;STRONG&gt;Have a healthy amount of cynicism.&lt;/STRONG&gt; You should be loyal to your employer, but only to a reasonable limit, otherwise you will be exploited.  For example, if the the company can pay dividends, it can also pay overtime compensation.  Or, if the employer requests for your services after you have been fired, don't work for your prior hourly salary.  Instead, require typical consulting fees, for example 100 e/h.&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;But &lt;STRONG&gt;don't be malicious or extort your employer&lt;/STRONG&gt; either.  For example, don't leave deliberate bugs, trapdoors, or purposefully write unmaintainable code.  Otherwise you risk not getting another project from them.&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;&lt;STRONG&gt;Have a life outside of programming.&lt;/STRONG&gt; Don't use your friends, spouse, or children as lightning rods, but let them take your mind off troubled times at work and have a chance to relax.  Hobbies and physical exercise are good for this as well.&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;&lt;STRONG&gt;Don't fixate on programming&lt;/STRONG&gt;, since times are tough for programmers and will become tougher for some time.  Very few programmers make a living comparable to upper management or similarly deeply skilled individuals in many other professions.  Those programmers who have made money, have many other talents and insights in technology and business as well.  The only chance to make a fortune by being predominantly a programmer, it seems, is to become an entrepeneur some years before a bubble, and take all the risks and other less nice things that come with it.  So unless you're willing to go up the management ladder, at least consider having wealthy spouse or parents, some savings, or a second profession, such as plumbing.&lt;br /&gt;&lt;/OL&gt;&lt;br /&gt;The last rule may appear controversial, but its justified by noting that economic success often follows from taking some risks, and if you have a Plan B other than living on the street, you are better able to take risks.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113898562280890708?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113898562280890708/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113898562280890708' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113898562280890708'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113898562280890708'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/02/yasssp-yet-another-seven-secret-of.html' title='YASSSP = Yet Another Seven Secret of Successful Programmers'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113882965175520248</id><published>2006-02-01T23:14:00.000+02:00</published><updated>2006-02-01T23:34:11.813+02:00</updated><title type='text'>Less than half of people are stupider than the average</title><content type='html'>Ok, I'm just nitpicking on an overused joke, but there is a clear distinction between average and median.  Half of people are indeed stupider than the median, but the average IQ is probably a little less than the median IQ because there are more very stupid people than very smart people.  Depending on the chosen measure, roughly one in 200 people have an IQ at least 75 points below the average and are in practice institutionalized, while only one million have an IQ of 75 or more above the average - ironically, many of them spend much of their time in various institions as well.&lt;br /&gt;&lt;br /&gt;Why this asymmetry?  High intelligence is reached when many factors coexist, whereas mental retardedness may be caused by only a single factor, such as a genetic error.  So in the extremes of the distribution, additional intelligence and stupidity are caused by different random processes.&lt;br /&gt;&lt;br /&gt;Back to the main point: we can analoguously clearly see that with the possible exception of Vatican, most of you earn or own far less than the average simply because the income and wealth statistics are very skewed in all countries except the Vatican.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113882965175520248?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113882965175520248/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113882965175520248' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113882965175520248'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113882965175520248'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/02/less-than-half-of-people-are-stupider.html' title='Less than half of people are stupider than the average'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113820377226971633</id><published>2006-01-25T17:40:00.000+02:00</published><updated>2006-01-25T17:42:52.283+02:00</updated><title type='text'>Styles of Reading and Writing</title><content type='html'>Children read the pictures and comics in the newspapers, but once they go to school they read (also) the text.  Later the reverse happens: students write their theses mostly in text, but after graduation they become professional designers of Powerpoint slides.  No wonder Nokia main building is often called "PowerPoint Palace".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113820377226971633?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113820377226971633/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113820377226971633' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113820377226971633'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113820377226971633'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/01/styles-of-reading-and-writing.html' title='Styles of Reading and Writing'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113814261484829341</id><published>2006-01-25T00:36:00.000+02:00</published><updated>2007-02-25T18:40:43.916+02:00</updated><title type='text'>Is video compression worthwhile?</title><content type='html'>Whenever our digibox's hard disk gets full, I have the habit to download its contents to my desktop, compress from MPEG2 to something more concise, say DivX, and either store on a USB/FW disk or DVD.  My typical compression settings reduce the data size by over 70% from approximately 1.5 GB/hour to approximately 400 MB/hour.  Here I'm not concerned with quality - in fact the compression seems to improve the quality by smoothing out some digi-TV artifacts.  But a friend of mine claimed this was hardly economically justified.  So we calculated it out, and below are the results.&lt;br /&gt;&lt;br /&gt;A 4.7 GB (or actually 4.4 GB) DVD-R disk costs roughly 50 cents (without media tax) in large spindle boxes, or approximately 11 cents per GB, and therefore by compressing I save some 12 cents per hour of movie.  Large USB or Firewire disks cost roughly 80 cents per GB, so when I store to them, I save some 88 cents per hour of movie.  These hard disks suffer another drawback: I can't insert them into the (DivX-capable) family DVD player and watch the movies on the big screen.&lt;br /&gt;&lt;br /&gt;Compression takes quite a bit of CPU time, roughly four hours per one hour of movie on my machine.  I estimate my machine consumes roughly 60 watts more electricity when the CPU is working.  Electricity costs roughly 8 cents per hour of kilowatt (including taxes and transfer but no fixed costs), or roughly two cents per one our of movie.&lt;br /&gt;&lt;br /&gt;So yes, for all current media, my video compression is indeed economically justified, albeit not necessarily indefinitely: in future media will be cheaper and power more expensive.&lt;br /&gt;&lt;br /&gt;There's another argument: space.  With individual cases I can store roughly 200 DVD disks per a meter of shelf space, up to 15 shelves high.  That's 3000 disks.  They cost 1500 euros, a shelf might cost some 300 euros, but the floor space in the center of Helsinki costs currently a whopping 4000 euros per square meter.  This quadruples the benefit of video compression!&lt;br /&gt;&lt;br /&gt;Now there's just one aspect left to ponder: The full shelf can store 33000 hours of compressed movies.  Assuming I watched four hours of movies every day - which I definitely don't have time to do - it will take over 22 years to watch all the disks.  Furthermore, since my current recording speed seems to be well below one disk per week, filling the shelf would be concluded by my potential grand children.  But by then the copyright to the movies has extinguished and my grandchildren could legally share the movies (or actually largely also documentaries) I have recorded.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113814261484829341?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113814261484829341/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113814261484829341' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113814261484829341'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113814261484829341'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/01/is-video-compression-worthwhile.html' title='Is video compression worthwhile?'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113796688440520701</id><published>2006-01-22T23:48:00.000+02:00</published><updated>2006-01-22T23:54:44.426+02:00</updated><title type='text'>Limits of copyrightability, part 3</title><content type='html'>In previous posts (&lt;A HREF="http://cessu.blogspot.com/2005/12/limit-of-copyrightability.html"&gt;1&lt;/A&gt;, &lt;A HREF="http://cessu.blogspot.com/2006/01/limits-of-copyrightability-part-2.html"&gt;2&lt;/A&gt;) I stated that the risk of two persons incidentally producing an equivalent work and thus risking a false conviction of copyright violation would be unacceptably high if (1-2^-n)^(s*p*k*k) is below one minus the acceptable risk 10^-6.  Here n is the information content of the work in bits, p the number of other people producing copyrightable works, k the number of bits of information everyone on average produces in his lifetime, and s is the number of works considered equivalent to an n bits long work.  I have, for example, argued that in English text n would be from 100 to 216 letters, and consequently for example individual verses of Haiku should not enjoy copyright protection.  In this rather lengthy post I shall first apply this principle, then another approach, to squeeze out some lower limit on the size of a copyrightable piece of program source code.&lt;br /&gt;&lt;br /&gt;In the context of programming, the role of s becomes particularly interesting depending on whether we consider equivalence to cover superficial variations such as comments, indenting, naming conventions, and choice of equivalent control structures, or deeper issues such as the partition of functionality, representation of data, and choice of algorithms, protocols and pre-existing software components.  In the latter extreme, all functionally equivalent programs, or significant parts of programs, would be considered equivalent and consequently the copyright of a specification would also cover all implementations of the specification.  But that, of course, is not the purpose of copyright, but patents.&lt;br /&gt;&lt;br /&gt;In this text I regard two (parts of) programs equivalent in the copyright sense if the two programs are compiled to the identical stripped object code.  This is a whopping generalization, but somewhat justified.  Firstly, compilers ignore comments and indentation and optimize the programs to the most efficient and consequently more canonical form.  Secondly, stripping removes the symbol information and thereby compensate for equivalent variations in variable and function names.  Additionally roughly half a kilobyte in each object file is reserved for various automatically generated metadata, which is ignored.  After &lt;A HREF="http://www.compression-links.info/Link/2019_PAQ3N.htm"&gt;compressing&lt;/A&gt; the rest of the object files we obtain a rough measure of entropy for the original source code.  &lt;br /&gt;&lt;br /&gt;So, I took all newest software packets in &lt;A HREF="ftp://ftp.funet.fi/pub/gnu/prep"&gt;ftp://ftp.funet.fi/pub/gnu/prep&lt;/A&gt; which compiled by simply issuing &lt;TT&gt;configure CFLAGS=-Os &amp;&amp; make&lt;/TT&gt;, ignored source code not written in C, and object files which couldn't easily be associated with their source file.  The results from the remaining over 5000 files are shown in the figure below.  They include many outliers, such as source code files including other source code files and, in the other extreme, files which include nothing to execute for this particular platform.  But a rough estimate is three bytes per every non-empty non-comment line of source code as reported by &lt;A HREF="http://www.dwheeler.com/sloc/"&gt;David Wheeler's sloccount&lt;/A&gt;, or 0.8 bits per each non-whitespace non-comment source code character.  This measure is a lower, and hence more accurate, upper bound on the entropy than compressing the source code without comments directly.&lt;br /&gt;&lt;br /&gt;&lt;IMG SRC="http://hibase.cs.hut.fi/~cessu/entropy.jpg"&gt;&lt;br /&gt;&lt;br /&gt;If we pick the total SLOC produced in the world to be, say, 200*10^9, and the number of SLOC that a productive person writes during his lifetime to be one million, multiply these by the entropy per SLOC, i.e. 24, substitute into the formula introduced &lt;A HREF="http://cessu.blogspot.com/2005/12/limit-of-copyrightability.html"&gt;in the first part of this series&lt;/A&gt;, then we have n &gt;= 87.  This shows that at least 3.6 average size source code lines, or 108 non-comment non-whitespace characters.&lt;br /&gt;&lt;br /&gt;This may seem a little disappointing, but note that even after the transformations we have done, the entropy of the source code is over-estimated by for example random variations such as register allocation and stack layout in the compiler and suboptimality of compression program.  In other words, the minimum copyrightable work is actually larger than the 3.6 lines, because it includes information about the context in which the work resides.  Consequently, we could argue this lower limit for patches, but not arbitrary code shown out of context.  We could push the lower limit higher by finding compiler mechanisms to get rid of such random variations, but perhaps writing a new C compiler goes beyond the scope of this blog entry...&lt;br /&gt;&lt;br /&gt;But there's another argument we could use: observe how frequently programmers incidentally produce new code equivalent to some existing code.  I chose gcc-core-4.0.2, a large open source project where there's no legal or technical reason not to reuse code written by someone else.  I configured and compiled with -Os, and looked for the number of identical parts of the text segments in the 417 distinct object files containing almost 3 megabytes of executable code from over 300 kSLOC, some more headers, and 180 kSLOC of machine-produced source code.  I roughly estimate each source code line results in some 6 executable bytes.  Note that constant data (such as string contants) are excluded in this treatment even if they may contain copyrightable material.&lt;br /&gt;&lt;br /&gt;&lt;IMG SRC="http://hibase.cs.hut.fi/~cessu/unreuse.jpg"&gt;&lt;br /&gt;&lt;br /&gt;As you can see from the graph above, equivalent code is produced far more frequently than one would assume after the above probabilistic treatment.  For example, the likelihood to write an equivalent 128 bytes (21 SLOC) is over three times the acceptable risk.  I went through the trouble to see where some of the identical parts had come from, and instead of macro expansion (which can produce duplicate machine code), most of them were indeed caused by typical C-like liturgy, such as code for looking up or even inserting with possible rehash in a &lt;A HREF="http://en.wikipedia.org/wiki/Hash_table#Chaining"&gt;chaining hash table&lt;/A&gt; with string keys.  Some of this is caused by rather primitive reuse mechanisms in the programming language, some by ignorance of the existing code.&lt;br /&gt;&lt;br /&gt;Note that even if I have argued that an average four line patch or twenty lines of new code shouldn't deserve copyright protection, they may nevertheless be protected by patents.  Also, if it can be shown that your four-line patch or twenty-line function was a derived or copied work - for example because you might not be sufficiently skilled in programming - then none of the above arguments will help you.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113796688440520701?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113796688440520701/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113796688440520701' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113796688440520701'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113796688440520701'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/01/limits-of-copyrightability-part-3.html' title='Limits of copyrightability, part 3'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113688410442377552</id><published>2006-01-10T11:00:00.000+02:00</published><updated>2006-01-11T01:27:06.726+02:00</updated><title type='text'>Crossword Puzzles</title><content type='html'>A trivial backtracking search to generate an N*N crossword puzzle consumes a depressingly rapidly growing running time O(c^(2*N)), where c is the number of N-letter words. On my computer all 3*3-crosswords are found instantly, 4*4-crosswords in a day, and 5*5-crosswords would take longer than sun shines.&lt;br /&gt;&lt;br /&gt;The key to make crossword puzzle generation efficient is in to check immediately when laying a new letter of a word in a puzzle that the crossing direction is not impossible to satisfy. For example, if I place the word "take" immediately above "time" at the top of the puzzle, it will be impossible to find crossing words which begin with letters "tt" or "km" later in the search. This prunes the search space increasingly efficiently as N grows, because while some 31% of all digraphs occur in some word, only 8% of all trigraps and less than 1% of all 4-graphs occur in some word. Furthermore not all combinations occur in any place, for example both digraphs "tt" and "km" do occur in some words, but neither can begin a valid word.&lt;br /&gt;&lt;br /&gt;I also wrote hints of varying difficulty to a number of the words.  To tease you, here's an easy 3*3-puzzle:&lt;br /&gt;&lt;pre style="font-family: courier new;"&gt;  impregnate color -----+&lt;br /&gt;  celestial rotation -+ !&lt;br /&gt;  beer -------------+ ! !&lt;br /&gt;                    ! ! !&lt;br /&gt;                    V V V&lt;br /&gt;  increase      -&gt;&lt;br /&gt;  put down      -&gt;&lt;br /&gt;  visual organ  -&gt;&lt;/pre&gt;&lt;br /&gt;And a very hard one:&lt;br /&gt;&lt;pre style="font-family: courier new;"&gt;  protective clothing -+&lt;br /&gt;  anecdotes ---------+ !&lt;br /&gt;  metal bar -------+ ! !&lt;br /&gt;                   ! ! !&lt;br /&gt;                   V V V&lt;br /&gt;  chatter      -&gt;&lt;br /&gt;  cuckoo       -&gt;&lt;br /&gt;  small amount -&gt;&lt;/pre&gt;&lt;br /&gt;There are numerous puzzles when N is 3, 4 and even 5, but my program found only two 6*6 crossword puzzles when using an old /usr/dict/words without names of places and people:&lt;br /&gt;&lt;blockquote style="font-family: courier new;"&gt;     a c c u s e&lt;br /&gt;     p r o p e l&lt;br /&gt;     h a n d e d&lt;br /&gt;     i n v a d e&lt;br /&gt;     d i e t e r&lt;br /&gt;    s a y e r s&lt;/blockquote&gt;&lt;br /&gt;The second 6*6-crossword is of course the transpose of the above one.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113688410442377552?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113688410442377552/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113688410442377552' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113688410442377552'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113688410442377552'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/01/crossword-puzzles.html' title='Crossword Puzzles'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113684748865898000</id><published>2006-01-10T00:52:00.000+02:00</published><updated>2006-01-10T01:42:19.036+02:00</updated><title type='text'>Limits of copyrightability, side note</title><content type='html'>Apple has sued and apparently won a preliminary injunction against a German company Liquid Air Lab for the name "spod" for its podcasting service.  If this decision holds, apparently every product or service name one latter and one capitalization change away from "iPod" could be successfully threatened by Apple, especially if they are somehow related to mobile listening of music.  That's over 400 names for the price of one, including the word "apod" for "footless", &lt;a href="http://epod.usra.edu/"&gt;Earth Picture of the Day&lt;/a&gt;, &lt;a href="http://www.ipad.com.br/"&gt;APOD&lt;/a&gt;, etc.  In particular, it is a painfully large share of all pronounciable four letter names.&lt;br /&gt;&lt;br /&gt;This relates to my previous posts on copyrights in the sense that overly wide interpretations of trademarked names pollute the namespace, and soon it may be impossible to name a product without fear of having to rename it.  Of course, for big products such as new car models a proper proximity search in trademark databases is not a big overhead, but for small companies and open source projects it is.&lt;br /&gt;&lt;br /&gt;Also, the state earns quite a bit from trademark fees.  One would assume that some of that money would be funneled into setting up a free web site that searches similar trademarks and gives a legally binding decision if it based on predefined rules can deduce that a proposed name violates no trademark.  Although nowadays one can trademark colors (such as Fazer's blue in the context of sweets), sounds and music (such as the Nokia tune in the context of mobile phones), it would be sufficient to limit to textual trademarks.  No trivial thing, but a reasonable research project proposal, methinks.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113684748865898000?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113684748865898000/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113684748865898000' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113684748865898000'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113684748865898000'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/01/limits-of-copyrightability-side-note.html' title='Limits of copyrightability, side note'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113684630699686222</id><published>2006-01-10T00:38:00.000+02:00</published><updated>2007-02-06T08:01:49.373+02:00</updated><title type='text'>On Taxation</title><content type='html'>I participated recently in an interesting discussion about taxation and its fairness.  Some of the participants were quite well off, and as winners in that game they called taxes on properties to "envy tax" and estate tax to "death tax", and generally dismissed all contrary opinions as leftist propaganda.  While I definitely don't regard myself as leftist, intellectual honesty demanded me to oppose some of their views on taxation.  Here's why.&lt;br /&gt;&lt;br /&gt;Outside of social aspects such as evening out the income gap, taxes serve a simple purpose: share the cost of common purchases and thereby obtain savings in scale.  For example, if we wish to have a state where people needn't hire private armies to defend our borders and guards to keep our property safe, we gather a sum of money through taxes and hire a common army, police force, legislature process, courts of law, and prison system which exercise a monopoly of violence within our borders.  (Incidentally, that is what defines a state.)  Currently almost all of that system is paid by income and value added tax as they form an overwhelming majority of our budget's income.&lt;br /&gt;&lt;br /&gt;But now consider who has most to lose if the army, police force, law, and criminal system would cease to exist.  Rampant theft and vigilantism would follow.  Except for our lives, what do we have to lose?  Yes, our property!  Eventually our society would degrade to stone age where no-one could safely own more than he could carry with him, guard with sight and hearing and run to protect before the thief is gone.  So the ones that benefit most of our costly system for maintaining safety and order are the ones who own a lot of properties (land, buildings, valuables, money, shares, promissory notes, whatever).  However, property tax pays only a negligible fraction of the costs associated with protecting the properties of people, and it is now being lifted altogether.  I don't see the fairness in that.&lt;br /&gt;&lt;br /&gt;Estate (inheritance) tax is argued againts by calling it multiple taxation, as the same property is taxed first when the deceased earned it and a second time when it is passed on to the children.  However, this argument doesn't hold for two reasons: Firstly, why should property in estates not be taxed when they change hands while essentially all other exchanges of ownership is taxed?  Secondly, the one who pays the estate tax is the recipient of the estate, not the deceased, so there's really no double taxation occurring at all!&lt;br /&gt;&lt;br /&gt;In my view estate tax can be used to replace property tax simply because everybody dies some day, and whether the taxation is done after death or while living is irrelevant.  Therefore now that property tax is removed, estate tax should be sufficiently large to cover a fair share of our common infrastructure that protects property.  Nevertheless, in my mind property and estate tax carried by the Finnish state should roughly equal the sum consumed by the interior, defence and justice ministeries, or roughly 5 billion euros.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113684630699686222?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113684630699686222/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113684630699686222' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113684630699686222'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113684630699686222'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/01/on-taxation.html' title='On Taxation'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113673952687749520</id><published>2006-01-08T18:57:00.001+02:00</published><updated>2007-03-02T16:11:57.500+02:00</updated><title type='text'>Styles of Speaking</title><content type='html'>A group of Finns were having a conversation with each other in a crowded restaurant in Rome.  Suddenly the lights went off because of a power outage.  The Finns continued their conversation unabated, but all Italians went silent.  Why?  Because Italians gesture intensively with their hands, but that couldn't be seen in the darkness.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113673952687749520?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113673952687749520/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113673952687749520' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113673952687749520'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113673952687749520'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/01/styles-of-speaking_08.html' title='Styles of Speaking'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113667400461601385</id><published>2006-01-08T00:40:00.000+02:00</published><updated>2006-01-08T00:46:44.626+02:00</updated><title type='text'>Limits of copyrightability, part 2</title><content type='html'>In a &lt;a href="http://cessu.blogspot.com/2005/12/limit-of-copyrightability.html"&gt;previus post&lt;/a&gt; I concluded that works shorter than 110 bits of information do not deserve copyright because the risk of two persons incidentally producing an identical work and thus risking a false conviction would be unacceptably high.  We saw that this was achieved when (1-2^-n)^(p*k*k) was at most one minus the acceptable risk 10^-6.  Here n is the length of the work, p the number of other people producing copyrightable works, k the number of bits of information everyone on average produces in his lifetime.&lt;br /&gt;&lt;br /&gt;We can expand this approach to cover similarity simply by regarding each new work as representing s similar new works.  In other words we require (1-2^-n)^(s*p*k*k) to be at most one minus the acceptable risk 10^-6.  The difficulty, however, lies in defining s.  In text we might assume s covers, for example, changes in tense, mode, person, sentence structure, word order, dialect or language, or systematic changes of words with their synonyms, names with new ones, etc.  Based purely on intuition I would guess s must be at least millions, before the resulting sentences in a natural language appear significantly different.  If we choose s = 10^6, then the minimum information content of a copyrightable work grows from 110 bits to 130 bits, or from 100 to 216 letters. &lt;br /&gt;&lt;br /&gt;While the above rise may not appear significant, it already seems to imply that no individual verse of Haiku should be copyrightable.  Recall that typical western-style Haiku are 5+7+5 syllables long, such as the 82-letter Haiku with which David Dixon won the Salon magazine's Haiku contest:&lt;br /&gt;&lt;blockquote&gt;    Three things are certain:&lt;br /&gt;    Death, taxes, and lost data.&lt;br /&gt;    Guess which has occurred.&lt;/blockquote&gt;&lt;br /&gt;However, in defense we might argue that not p = 10^9 people write Haiku verses all of their life.  If we instead take p = 10^6 and k = 100 for one million people writing one hundred verses, it would imply n = 73, or from 56 to 121 letters.  Considering the the first 45 letters from Dixon's Haiku occur in several other phrases (try it with Google, for example), I'ld still claim that this Haiku probably shouldn't deserve copyright according to the arguments I've laid out.&lt;br /&gt;&lt;br /&gt;Nevertheless, I think Dixon's Haiku is great, and as someone who has had all but the first occur personally and repeatedly, I think Dixon deserved to win.&lt;br /&gt;&lt;br /&gt;Next week I hope to write a little about how all my reasoning applies to programming and copyrightability of source code.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113667400461601385?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113667400461601385/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113667400461601385' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113667400461601385'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113667400461601385'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/01/limits-of-copyrightability-part-2.html' title='Limits of copyrightability, part 2'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113641944717135596</id><published>2006-01-05T02:01:00.000+02:00</published><updated>2007-03-15T03:46:23.130+02:00</updated><title type='text'>More on Caesar</title><content type='html'>My previous entry left me thinking about the likelihood of being a descendant of Julius Caesar.  For example, my boss, of noble birth and a long recorded genealogy, is a descendant of Charlemagne at least in two different lines, but because it is impossible to follow the blood line during the first turbulent millennium, I have to resort to probability estimates.&lt;br /&gt;&lt;br /&gt;Caesar had two children, Julia and Caesarion, who later became Ptolemaios 15, king of Egypt, and a stepson Gaius Octavius, who later became Augustus Caesar.  Undoubtably Caesar had many secret relationships, also heterosexual ones, and we can hereby presume that his genes have had better than average chances of replicating.  If at 50 B.C. the world population was roughly 200 million, we can assume that at least two of them were descended of Julius Caesar.&lt;br /&gt;&lt;br /&gt;We assume descedants of Caesar are equally productive to non-descendants.  The only way to produce a non-descendant is that both parents are non-descendants.  If we denote with P(i) the probability that a random person i generations after Caesar is not his descendant, then P(i+1) = P(i)^2, which simplifies into P(i) = P(0)^(2^i).  After some numerics, we see that in 23 generations, or around 500 A.D. nearly half of the world population would be descended from Caesar, and by today practically everybody.&lt;br /&gt;&lt;br /&gt;Of course this is hardly quite correct, because some populations live in such physical and cultural separation that descendency may not have reached them before the 19'th century.  Even we Finns are considered somewhat isolated, but by no means entirely and in any case two millenia is ample of time for some traveling warrior, trader, or priest to seed some of my ancestors.&lt;br /&gt;&lt;br /&gt;Ok, so do I now feel any Caesareity?  Nope, not even the slightest foretelling sting in my torso.  And by the way, it was Shakespeare who put the words "Et tu, Brute" in Caesar's mouth.  Suetonius claims Caesar's last words to have been "You too, my son" in Greek.  Some sources also claim that he cried "This is violence!" (in Latin) when stabbed for the first time, but perhaps during over 20 stabs you have time to think of a more eloquent last words.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113641944717135596?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113641944717135596/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113641944717135596' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113641944717135596'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113641944717135596'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/01/more-on-caesar.html' title='More on Caesar'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113641874150864535</id><published>2006-01-05T01:49:00.000+02:00</published><updated>2007-02-27T13:07:44.336+02:00</updated><title type='text'>Homeopathic Caesar Urine</title><content type='html'>Julius Caesar (100-44 B.C) lived 55 years, and assuming he urinated two liters every day, there should be approximately 40 cubic meters of water from his urine.  Calculations I omit from here suggest that photosynthesis, the most likely method of breaking water molecules down in nature, has had a modest affect on this amount. Therefore we can approximate all that water has survived to this day and dispersed evenly on our planet during the past two millenia.&lt;br /&gt;&lt;br /&gt;Dividing the 40 cubic meters with the total volume of all oceans we obtain that one molecule in every 3.6 * 10^15 today is genuine "chez Caesar".  The strongest homeopathic medicines have been diluted up to 10^30, but 10^15 should already have a clear effect.  Unfortunately I can't quite figure out what it should cure - megalomania, or perhaps laurel rash?&lt;br /&gt;&lt;br /&gt;Another way of looking at this is to say that the Coca-Cola company has sold half a millilitre of Caesar's urine world-wide during its entire operation.  I guess that's the "secret ingredient".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113641874150864535?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113641874150864535/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113641874150864535' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113641874150864535'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113641874150864535'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/01/homeopathic-caesar-urine.html' title='Homeopathic Caesar Urine'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113627182922321011</id><published>2006-01-03T09:01:00.000+02:00</published><updated>2006-01-03T09:03:49.230+02:00</updated><title type='text'>Outlaw Radiators!</title><content type='html'>Production of electric radiators and other low-temperature electric heating elements should be outlawed.  They should be replaced by cheap computing elements which produce heat as a by-product.  Mass production of such devices would further for example SETI@home and other easily parallellisable tasks.  The increased production cost would be covered by selling CPU time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113627182922321011?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113627182922321011/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113627182922321011' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113627182922321011'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113627182922321011'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/01/outlaw-radiators.html' title='Outlaw Radiators!'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113624110955320943</id><published>2006-01-03T00:27:00.000+02:00</published><updated>2006-01-03T10:38:07.343+02:00</updated><title type='text'>Longest Shortest Path</title><content type='html'>A shortest path connects two words, for example "dos" and "lnx", with the shortest possible list of words which differ only in one letter, for example "dog"-"log"-"lug"-"lux". Surely it must be only coincidental that dos is close to dog and lnx close to lux, which among other meanings is Latin for light.&lt;br /&gt;&lt;br /&gt;Given a vocabulary of N words it relatively easy to construct an efficient program which computes the shortest path between any two words. But what is the best algorithm for finding the longest shortest path? One could use the Floyd-Warshall algorithm which takes O(N^3) time, or Johnson's algorithm which takes O(N^2 log N) time. Given N is perhaps a few thousand, these algorithms are already palatable in modern computers.&lt;br /&gt;&lt;br /&gt;The actual findings depend of course heavily on the given vocabularies, but below are some results using chosen-length subsets of an old /usr/dict/words for English words: Perhaps it is only because of my skeptical tendencies, but for some reason "chi", the circulating life energy in Chinese philosophy, and "ass" come up when searching the longest shortest path in words of three letters. The intervening words are "phi"-"poi"-"pow"-"row"-"roe"-"rye"-"aye"-"are"-"ark"-"ask". Similarly suspiciously the search in five letter long words produced "toxin" and "quack" with a whopping 44 words in between them. Only the four letter words seemed to refer to something less quacky: "zeta"-"unit" or "lava"-"chum" with 15 intervening words.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113624110955320943?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113624110955320943/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113624110955320943' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113624110955320943'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113624110955320943'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2006/01/longest-shortest-path.html' title='Longest Shortest Path'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113604296427961919</id><published>2005-12-31T17:26:00.000+02:00</published><updated>2007-02-20T06:11:09.876+02:00</updated><title type='text'>Limit of copyrightability</title><content type='html'>Perhaps it is possible to derive mathematically a limit, n, on the shortest copyrightable text.  My starting point is that my risk of incidentally producing the same copyrightable text as someone else, and should all text be perfectly searchable thus also risk false conviction, should not exceed the "acceptable risk" of 1 in 1000000.&lt;br /&gt;&lt;br /&gt;Suppose there are p other persons producing k bits of information in each ones lifetime.  We can safely assume n is much larger than k and since copyrightable works may begin anywhere, we can approximate that there are p*k copyrightable works in the world written by others.  The probability that two works whose information is n bits are identical is 2^-n.  The probability that my new work is truly unique in the world is thus (1-2^-n)^(p*k).  The probability that none of the k works I produce during my life are unique is (1-2^-n)^(p*k*k).  If I've indeed written all those works myself, that probability should be at least certainty minus the acceptable risk of 10^-6.&lt;br /&gt;&lt;br /&gt;For a numerical value we might estimate p to be one billion (10^9) literarily productive persons now and cumulatively in the past, and k being 50 years * 365 days/year * 10 kilobytes/day = 1420 million bits.  With these values n &gt; 110 bits would suffice.&lt;br /&gt;&lt;br /&gt;But note that the information content of natural text is not the same as the number of bits it contains.  Shannon estimated based on his experiments that the entropy of English text is between 0.6 and 1.3 bits per letter.  This would imply that the minimum length of a copyrightable text lies between 84 and 183 letters.  Hence the previous sentence might just barely be copyrightable.  The previous two sentences together should probably be copyrightable.&lt;br /&gt;&lt;br /&gt;Of course, in highly repetitive and phrase-heavy text (legalese, lyrics and chitchat, for example) the entropy would be even lower than 0.6 bits per letter, and shortest copyrightable texts even longer.&lt;br /&gt;&lt;br /&gt;Note that all this reasoning applies to two identical copies.  If we wish to expand this reasoning from identical copies to equivalent or similar versions of text, the minimum copyrightable text length will necessarily increase.  But more on this some other day.  Probably I'll also expand this line of thought to music and software.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113604296427961919?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113604296427961919/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113604296427961919' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113604296427961919'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113604296427961919'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2005/12/limit-of-copyrightability.html' title='Limit of copyrightability'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-20305076.post-113589176716634623</id><published>2005-12-29T23:25:00.000+02:00</published><updated>2005-12-30T00:43:47.546+02:00</updated><title type='text'>Introduction</title><content type='html'>It has taken me some five years of hesitation before I started my own blog, or "web log", as they were called then. Blogs are a form of narcissism, I've often thought. So, therefore this blog won't be a teen diary of turbulent hormonally induced feelings, nor will it be a captains's log promptly recording every irrelevant action I take in my life, nor will I re-blog all even remotely interesting links I find when surfing the net or in other blogs. In fact very little of my private or professional life will be recorded here.&lt;br /&gt;&lt;br /&gt;So, what do I plan to blog? Primarily reviews of books and software, and essays, or even shorter essaylettes, of paradoxes, my opinions, observations or ideas I don't have time to pursue further myself. And probably I will blog relatively infrequently and in bursts.&lt;br /&gt;&lt;br /&gt;And, who am I? A gradually middle-aging hacker, a software engineer currently working on my PhD in computer science at the Helsinki University of Technology, married, a happy father of a two year old daughter who's probably bound to outsmart me in everything except some arcane programming languages, a modest/low-carber, member of &lt;a href="http://iki.fi/"&gt;IKI&lt;/a&gt; and &lt;a href="http://www.skepsis.fi/"&gt;Skepsis&lt;/a&gt;, and someone who likes good discussions, elegant algorithms, terse software designs, puzzles with an element of surprize, cycling, cooking, and - given a relaxed schedule - building or renovating woodwork.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20305076-113589176716634623?l=cessu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cessu.blogspot.com/feeds/113589176716634623/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=20305076&amp;postID=113589176716634623' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113589176716634623'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/20305076/posts/default/113589176716634623'/><link rel='alternate' type='text/html' href='http://cessu.blogspot.com/2005/12/introduction_29.html' title='Introduction'/><author><name>cessu</name><uri>http://www.blogger.com/profile/17273230380381269494</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='22' height='32' src='http://hibase.cs.hut.fi/~cessu/pix/cessu.gif'/></author><thr:total>1</thr:total></entry></feed>
