
Jon Udell needs a place to put stuff

2004-04-28

Over the last year or so Jon Udell has published code fragments to do a number of useful search-related things. In May last year there was an email search system that grabbed stuff from Outlook and full-text indexed it with Lucene. This year there have been two generations (one, two) of an RSS feed storage system that normalises all feeds into XHTML, then keeps them in Berkeley DB XML so you can query them using XPath.

Now he's talking about a proxy on your computer to save all the web pages you look at, which would let you search them with XPath. (And "Local proxies are geeky curiosities today, but someday we'll wonder how we lived without them.")

Imagine if we could combine all of these bits of sample code into a single program, ready to act as the cache for your web proxy and provide an indexed view of your own stuff. Whether you are dealing with an RSS feed, an email, a web page you have visited, a word processing document that's in development, a playlist, or whatever, there is always a normalised XHTML rendition available, and you can do XPath and full text queries across all of it.

More importantly though, your computer could use XPath to find relationships between things that didn't know about each other until they got to your computer.

Normalisation into XHTML would mean that anything that has a 'To' or a 'From' or a 'Date' field can be searched in a simple way. This will find all the HTML for all the items you got from my work address:

/html[head/meta[@name='From' and @content='peter.sefton@nexted.com']]

Or maybe you would just like to collect the head elements and make a summary:

/html/head[meta[@name='From' and @content='peter.sefton@nexted.com']]
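To make the idea a bit more concrete, here is a minimal Python sketch of the two queries above, using only the standard library. Everything about the layout is an assumption for illustration: the function names (email_to_xhtml, items_from, heads_from), the sample messages, and the choice to leave out the XHTML namespace are all hypothetical, and this is not Jon Udell's code. Note that xml.etree.ElementTree only understands a subset of XPath and has no 'and' operator, so the sketch chains two predicates instead, which selects the same meta elements.

```python
import xml.etree.ElementTree as ET
from email import message_from_string

# Two made-up emails standing in for items that have landed on your computer.
RAW_FROM_PETER = (
    "From: peter.sefton@nexted.com\n"
    "To: me@example.org\n"
    "Date: Wed, 28 Apr 2004 10:00:00 +1000\n"
    "Subject: Fragletorium\n"
    "\n"
    "A place to keep your fraglets.\n"
)
RAW_FROM_OTHER = (
    "From: someone.else@example.org\n"
    "To: me@example.org\n"
    "Date: Wed, 28 Apr 2004 11:00:00 +1000\n"
    "Subject: Lunch\n"
    "\n"
    "Noon?\n"
)


def email_to_xhtml(raw):
    """Normalise one raw email into a small XHTML-like tree.

    Assumed layout: each of From/To/Date becomes <meta name=... content=...>
    in the head, and the message text becomes a single paragraph in the body.
    """
    msg = message_from_string(raw)
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = msg.get("Subject", "")
    for field in ("From", "To", "Date"):
        if msg[field] is not None:
            ET.SubElement(head, "meta", {"name": field, "content": msg[field]})
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "p").text = msg.get_payload()
    return html


def items_from(docs, sender):
    """Return whole documents whose head has a matching From meta element.

    Stands in for: /html[head/meta[@name='From' and @content=sender]]
    The sender must not contain quote characters, since it is pasted
    straight into the path expression.
    """
    path = "head/meta[@name='From'][@content='%s']" % sender
    return [doc for doc in docs if doc.find(path) is not None]


def heads_from(docs, sender):
    """Return just the head elements of the matching documents.

    Stands in for: /html/head[meta[@name='From' and @content=sender]]
    """
    return [doc.find("head") for doc in items_from(docs, sender)]


if __name__ == "__main__":
    docs = [email_to_xhtml(RAW_FROM_PETER), email_to_xhtml(RAW_FROM_OTHER)]
    for head in heads_from(docs, "peter.sefton@nexted.com"):
        # Print a one-line summary built from the collected head element.
        print(head.findtext("title"), "|", ET.tostring(head, encoding="unicode"))
```

A real repository would presumably hand the full XPath expressions to an XML database or a complete XPath engine rather than looping over documents in memory; the loop is only there to keep the sketch self-contained.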

I call my version of this lightweight repository, which Jon Udell has been hinting at for ages, the 'Fragletorium', which I assert is Latin for a place to keep your fraglets. I did study Latin for one term at school in 1977, and my classmates used to assert that it would be of no use. How wrong they were.

At this stage the Fragletorium is something of an imaginary friend. Do you believe in it?