knowbot - an independent, self-acting computer program that seeks for information on behalf of a user, possibly replicating itself on other hosts on the network. As the knowbot performs its task, it sends reports back to the user, and self-destructs when it completes its task.
-Internet in Plain English by Bryan Pfaffenberger
Although it sounds like a character from a William Gibson novel, the concept of the "knowbot", an automated program which searches for information for you, is slowly becoming a reality on the Internet. Knowbots stand to change the face of research and information retrieval: rather than spending hours searching for an electronic needle in a virtual haystack, why not let the information come to you?
I first wrote about a kind of Knowbot in an article about the Stanford Netnews Filtering Service. It's a program that periodically searches the entire Usenet for the information you request and e-mails any "hits" back to you. (For information about it, send mail to firstname.lastname@example.org with the word "help" in the message body.)
I'm happy to report that the concept and power of knowbots are expanding on the 'net. The actual implementations of today's knowbots are decidedly less sexy than the futuristic-sounding, self-replicating, auto-destructing knowbot defined above, and they still don't resemble the fancy animated "agents" that Apple Computer predicted in a futuristic promotional video a few years ago. Still, knowbots are very handy as automated news clipping services. Here we'll examine what you can do with two more knowbot utilities - another one from Stanford and one by the folks at the San Jose Mercury News.
Here's how the Mercury News service, called NewsHound, works: once you sign up, you send a search request via e-mail to the NewsHound program. Every hour or so, NewsHound checks to see if any of its recent news and ads match your request, and it automatically e-mails you the ones that do. I use the service to send me articles about online services, my favorite musicians and ads for used computers. You might also use it to scope out editorials about Newt Gingrich, news about your home town or ads for used Pintos.
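As a rough illustration of what such a request looks like, here is a minimal Python sketch that builds (but does not send) a profile request. The field names "create", "search", "possible" and "selectivity" and the subject line are copied from the sample request reproduced at the end of this article. The function name build_request, the lowercase "required" and "excluded" field names and the placeholder sender address are assumptions made for the example, not documented NewsHound syntax.

```python
from email.message import EmailMessage

NEWSHOUND_ADDRESS = "NewsHound@sjmercury.com"  # address shown in the sample request


def build_request(title, search, possible, selectivity,
                  required=(), excluded=(), sender="you@example.com"):
    """Build a NewsHound-style profile request as an e-mail message.

    Hypothetical helper: the body layout mirrors the sample request in this
    article; the 'required' and 'excluded' lines are assumed to follow it.
    """
    lines = [f"create: {title}", f"search: {search}"]
    if possible:
        lines.append("possible: " + ",".join(possible))
    if required:
        lines.append("required: " + ",".join(required))
    if excluded:
        lines.append("excluded: " + ",".join(excluded))
    lines.append(f"selectivity: {selectivity}")

    msg = EmailMessage()
    msg["From"] = sender  # placeholder address, not a real account
    msg["To"] = NEWSHOUND_ADDRESS
    msg["Subject"] = "request"
    msg.set_content("\n".join(lines))
    return msg


# Rebuild the "music" profile from the sample request.
request = build_request(
    title="music",
    search="articles",
    possible=["laurie anderson", "negativland", "they might be giants"],
    selectivity=20,
)
print(request)
```

The message is only constructed here. Handing it to a mail program is left out on purpose, since the exact request syntax the service accepts is not spelled out in this article.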
Each request that you submit includes lists of "required" terms, "possible" ones and "excluded" terms - phrases that you just don't want to hear about. The NewsHound uses a type of fuzzy logic to find the most relevant articles and advertisements. Essentially this involves counting the number of "possible" and "required" terms found in an article and then assigning the article a "selectivity" score. The score of an article ranges from 1 to 100. The higher the number, the more relevant the article.
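The exact formula NewsHound uses is not published here, so the following Python sketch is only one plausible reading of the description above. It assumes that an article containing any "excluded" term, or missing any "required" term, is rejected outright (scored 0), and that the score is otherwise the percentage of "possible" and "required" terms found. The names Profile, score and matches are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    possible: list = field(default_factory=list)
    required: list = field(default_factory=list)
    excluded: list = field(default_factory=list)
    selectivity: int = 50  # minimum score an article needs to be delivered


def score(profile, text):
    """Return a 1-100 relevance score, or 0 if the article is rejected.

    Assumed rule, not NewsHound's real one: percentage of the profile's
    'possible' and 'required' terms that occur in the text.
    """
    body = text.lower()
    if any(term.lower() in body for term in profile.excluded):
        return 0
    if not all(term.lower() in body for term in profile.required):
        return 0
    terms = profile.possible + profile.required
    hits = sum(1 for term in terms if term.lower() in body)
    if not terms or hits == 0:
        return 0
    return max(1, round(100 * hits / len(terms)))


def matches(profile, text):
    """True if the article scores at or above the profile's selectivity."""
    return score(profile, text) >= profile.selectivity


internet = Profile(
    possible=["internet", "mosaic", "compuserve", "prodigy"],
    excluded=["information superhighway"],
    selectivity=30,
)
story = "A student-run radio station is now on the Internet via Mosaic."
print(score(internet, story), matches(internet, story))
```

Plain substring matching is used to keep the sketch short, which means a term like "sun" would also match "Sunday". A real service would presumably match on whole words or phrases.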
NewsHound searches a variety of newsy databases in pursuit of your information needs, including the San Jose Mercury News, the Chicago Tribune, Detroit Free Press, Miami Herald, Philadelphia Inquirer, The New York Times News Service, The Associated Press, PR Newswire as well as published and unpublished articles from 60 other sources, for a total of about 2,000 articles each day.
NewsHound isn't free, but it's inexpensive enough to be affordable for just about everyone. The special rate for charter subscribers is $4.95 per month for up to five search profiles. That is a flat rate that applies regardless of the number of stories or ads delivered to your e-mailbox during the month. (The rate for non-charter subscribers will be $9.95 each month.)
To register to use NewsHound, call 1-800-818-NEWS or 408-297-8495. For more information, send electronic mail to email@example.com.
The second Stanford service delivers bibliographic records of technical reports available from the Math/Computer Science Library at Stanford. You submit an "interest profile" to the service, and periodically you receive new information relevant to your interests via electronic mail.
Queries are written as plain English text (for example, "high speed fiber optics communication") and submitted via e-mail. Abstracts of technical reports are returned based on their relevance to your query. As new and relevant reports are added to the database, the Knowbot will inform you. Some of the reports are available digitally - you can receive the full text with a simple e-mailed request. For others, you'll have to trek to your local library (or Stanford's) for the actual report. For more information about using the service, send e-mail with the word "help" in the message body to firstname.lastname@example.org.
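The "inform you as new reports arrive" behavior can be pictured as a standing query that remembers what it has already sent. The Python sketch below assumes each report already carries a relevance score, that reports scoring at or above a threshold are delivered, and that results are listed best first. The function name new_hits and the two placeholder records ("000000" and "999999") are hypothetical; the other IDs and scores come from the sample response at the end of this article.

```python
def new_hits(scored_reports, threshold, already_sent):
    """Return unseen (report_id, score) pairs at or above threshold, best first.

    already_sent is a set of report IDs; it is updated in place so that a
    later call does not deliver the same report twice.
    """
    fresh = [(rid, score) for rid, score in scored_reports
             if score >= threshold and rid not in already_sent]
    fresh.sort(key=lambda pair: pair[1], reverse=True)
    already_sent.update(rid for rid, _ in fresh)
    return fresh


sent = set()
database = [
    ("124542", 55),
    ("124547", 66),
    ("025494", 59),
    ("000000", 40),  # hypothetical report that falls below the threshold
]

first = new_hits(database, threshold=50, already_sent=sent)
print(first)

# Later, a new report is added to the database (hypothetical record).
database.append(("999999", 72))
second = new_hits(database, threshold=50, already_sent=sent)
print(second)
```

Only the newly added report comes back on the second call, which is the point of a standing query: you hear about each relevant report once.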
[Example of NewsHound service]

To: NewsHound@sjmercury.com
Subject: request

create: music
search: articles
possible: laurie anderson,negativland,they might be giants
selectivity: 20

--

From: NewsHound@sjmercury.com (NewsHound)
To: email@example.com
Subject: Your NewsHound request

You now have 3 NewsHound profiles. Their attributes are:

Title: INTERNET
Search: articles
Possible: internet,on-line service,online service,worldwideweb,mosaic,compuserve,america online,eworld,prodigy
Excluded: information superhighway
Selectivity: 30

Title: MUSIC
Search: articles
Possible: laurie anderson,negativland,they might be giants
Selectivity: 20

Title: SUN
Search: ads
Possible: sun,sparc,sparcstation,unix,macintosh,mac
Required: for sale
Selectivity: 50

--

From: NewsHound@sjmercury.com (NewsHound)
To: firstname.lastname@example.org
Subject: STUDENT-RUN RADIO STATION A FIRST ON THE INTERNET

Selected by your NewsHound profile entitled "INTERNET".
The selectivity score was 30 out of 100.

Student-Run Radio Station a First on The Internet
By JULIANNE BASINGER, Associated Press Writer

[Example of an Electronic Library Project search]

From: Electronic Library
To: email@example.com
Subject: Elib Response

Search knowbots information filtering
Threshold 50

Score : 66
ID : 124547
Location : Math/CS Library
Author : Goldberg, D.
Author : Nichols, D.
Author : Oki, B.
Author : Terry, D.
Title : Using collaborative filtering to weave an information tapestry.
Institute : Xerox Corporation. Palo Alto Research Center.
Report : CSL-92-10.
Date : 1992.
Keyword : Continuous query.
Keyword : Information filtering.
Keyword : Active database.
Keyword : Query rewrite.
Keyword : Bounding monotonic query.
Keyword : Incremental query.
Keyword : Query language.

Score : 59
ID : 025494
Location : Math/CS Library
Author : Yan, Tak W.
Author : Garcia-Molina, Hector.
Title : Index structures for information filtering under the vector space model.
Institute : Stanford University. Department of Computer Science.
Report : STAN-CS-TR-93-1494.
Date : 1993.

Score : 55
ID : 124542
Location : Math/CS Library
Author : Terry, D.
Author : Goldberg, D.
Author : Nichols, D.
Author : Oki, B.
Title : Continuous queries over append-only databases.
Institute : Xerox Corporation. Palo Alto Research Center.
Report : CSL-92-5.
Date : 1992.
Keyword : Information filtering.
Keyword : Active database.
Keyword : Query rewrite.
Keyword : Bounding monotonic query.
Keyword : Incremental query.