Monday, April 12, 2010

Hash Based Compression

Over the weekend, I was trying to remember the name of someone I haven't seen and barely thought about for over 15 years. After much effort I finally managed to remember the family's last name, and then after coming up with a few first names that didn't seem right, I iterated through every letter of the alphabet until I finally remembered the first name.

This immediately made me wonder exactly how human memory works, and if there is a potential for something like "hash based compression." Let me explain:

A hash function is simply a way of creating a small identifier from a large data set. This sort of thing is frequently used to verify that data has been transmitted correctly (for example, in the TCP/IP protocol) - a hash of the data is sent along with it, and if the hash you received and the hash of the data that you compute yourself match, you can be fairly certain that the data transmitted correctly. Since the hash is smaller in size than the data, many data sets will generate the same hash, so there is a chance of collision, but a well designed hash function can give a high level of confidence that the data is correct.

The idea is this: Is it possible to use this sort of technique to reproduce something similar to human memory? Human memory is lossy (i.e. imperfect) and we almost certainly don't remember every detail of everything that happens to us.

My thought was, I wonder if we store something like a hash code for memories, and then try every reasonable combination of options until we find a match with the stored hash code. If the search space is relatively small - say, all known English names - then comparing the hash of every name with the name we're trying to remember will eventually allow us to "remember" the name we're looking for, without ever having stored the full name. Since our memory is a big web of connections, we'll probably have a few other ways of confirming we have the right match.

A search on Google for "hash based compression" turns up a stunning 26 entries, so there doesn't seem to be a lot of work on this technique, at least under that name. Most compression techniques seem to be based on removing redundant information rather than attempting to match in this way.

No comments: