New Search Engine

Date: Thu, 1 Jun 2000 10:38:06 -0300 (ADT)
From: "David L. Potter" <potter@chebucto.ns.ca>
To: csuite-dev@chebucto.ns.ca, ccn-tech@chebucto.ns.ca
Precedence: bulk
Return-Path: <csuite-dev-mml-owner@chebucto.ns.ca>


Hi Everyone....

I'll offer a wish list (as I see it) of functionality with some 
explanation and we can see what others think....

The objective is to make our site(s) the primary access to (and 
source of) information about our community... that would include 
primarily:

	-	our information providers
	-	information hosted on close partners (here in Halifax the regional 
		municipality is a potential partner...)

It should also include sites hosted on other ISPs that focus on local 
content... the local 'Country Music Association', the resource pages 
developed by a local kayaking enthusiast... 

I see two (or three) distinct functions/software entities... 

1) the locating and (1a) classification of documents, and
2) the retrieval of information from a database, based on queries 

As a webcrawler (function 1), it should be able to search out the local 
'country music association' but stop before it gallops off and indexes 
all of Nashville....
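
One way to picture the "stop before Nashville" rule is a scope check the crawler runs before fetching each link. This is only a rough sketch: the host list, the hop limit and the name should_crawl are all assumptions made up for illustration, not anything that exists in CSuite.

```python
from urllib.parse import urlparse

# Assumed for illustration: hosts we always index in full.
HOME_HOSTS = {"chebucto.ns.ca"}

# Assumed limit: how many links away from a home host we are willing to follow.
MAX_HOPS_OFFSITE = 1


def should_crawl(url, hops_offsite):
    """Return True if the crawler should fetch this URL.

    hops_offsite counts the links followed since we last stood on a
    home host (0 means the page is on a home host itself).
    """
    host = urlparse(url).hostname or ""
    if host in HOME_HOSTS:
        return True
    # Off-site pages are allowed only while we are still close to home.
    return hops_offsite <= MAX_HOPS_OFFSITE
```

With a limit of 1, a local association's page linked directly from one of our pages gets indexed, but the pages it links to in turn do not.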

With respect to CSuite it should honour the convention of indexing 
UpperCase Directories and ignoring lowercase directories...
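
The directory convention could be tested with a few lines like the following. It is a sketch under two assumptions: that an "UpperCase" directory means one whose name starts with a capital letter, and that a single lowercase directory anywhere in the path hides everything beneath it. The function name is hypothetical.

```python
from urllib.parse import urlparse


def is_indexable_path(url):
    """Return True if every directory in the URL path starts with a capital."""
    path = urlparse(url).path
    # Drop the empty piece before the first "/" and the final piece,
    # which is the file name (or empty when the URL ends in "/").
    directories = path.split("/")[1:-1]
    return all(name[:1].isupper() for name in directories if name)
```

Under these assumptions /Culture/Music/index.html would be indexed and /Culture/private/notes.html would be skipped.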

---

It should be biased towards META TAGS but be capable of indexing the body 
of documents if the META TAG information is absent or not useful...
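
The "META TAGS first, body as a fallback" idea might look roughly like this, using Python's standard html.parser module. The class and function names are made up for the example, and the test for "not useful" (fewer than min_words words in the keywords and description tags) is just an assumed placeholder for whatever rule we settle on.

```python
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collect META name/content pairs and the visible body text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.body_parts = []
        self._in_body = False
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"].strip()
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_body and not self._skip and data.strip():
            self.body_parts.append(data.strip())


def index_text(html, min_words=3):
    """Return (source, text): META text if it looks useful, else body text."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    meta_text = " ".join(
        parser.meta.get(key, "") for key in ("keywords", "description")
    ).strip()
    if len(meta_text.split()) >= min_words:
        return "meta", meta_text
    return "body", " ".join(parser.body_parts)
```

Returning the source alongside the text lets the database record which pages were indexed from their META TAGS and which from the body.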

It should be capable of accepting a 'dictionary' of local placenames 
and/or other identifiers that would assist in identifying the resource as 
relating to our communities...
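
As a sketch of how such a dictionary might be used: the crawler looks for whole-word matches of each entry in the page text and reports which ones it found. The entries below (other than Halifax) and the function name are only illustrative assumptions; the real list would be supplied by each community.

```python
import re

# Illustrative entries only; each site would load its own dictionary.
LOCAL_TERMS = {"halifax", "dartmouth", "nova scotia", "chebucto"}


def local_terms_found(text, terms=LOCAL_TERMS):
    """Return the set of dictionary entries that appear in the text."""
    lowered = text.lower()
    found = set()
    for term in terms:
        # \b keeps "halifax" from matching inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(term)
    return found
```

A page with no matches at all is a good candidate for the crawler to skip; how many matches should count as "local" is left open here.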

-----

It should store the information in a form that can accommodate: 

+ location codes (our IP, partner site, other...),
+ 'reviews' of the site by users, (This is the best Donny Osmond site 
   ever!)
+ 'last updated' info so we can present most current resources first (and 
   send the webcrawler off occasionally to see if the page is still 
   there...) 
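
To make the list above concrete, here is one possible shape for a stored record, with helpers for "most current first" and for deciding when to send the crawler back. All field names, the location code values and the 30-day recheck interval are assumptions for the sake of the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Resource:
    url: str
    title: str
    location_code: str          # assumed values: "IP", "PARTNER", "OTHER"
    last_updated: date          # as reported by the page or server
    reviews: List[str] = field(default_factory=list)
    last_checked: Optional[date] = None  # when the crawler last saw the page


def newest_first(resources):
    """Order resources so the most recently updated come first."""
    return sorted(resources, key=lambda r: r.last_updated, reverse=True)


def needs_recheck(resource, today, max_age_days=30):
    """True if the crawler has never checked the page or did so too long ago."""
    if resource.last_checked is None:
        return True
    return (today - resource.last_checked) > timedelta(days=max_age_days)
```

Reviews are kept as plain strings here; a real store would probably also want to know who wrote each one and when.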

-----

The retrieval tool needs to be able to read the database, rank the 
matches, and format the output according to the evolving 
(template) whims of site managers/individual users (whoa!)... "hide the link 
to that organization which ran off with another ISP at the very bottom of a 
page twenty layers down... and make it SMELL BAD, and I mean REAL BAD!"
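
A bare-bones sketch of the retrieval side: score each record by how often the query words occur, sort by score, and push each hit through a template that a site manager can change. The record fields, the scoring rule and the function names are assumptions; the templating uses Python's standard string.Template.

```python
from string import Template


def rank(records, query):
    """Return the records that match the query, best match first."""
    terms = query.lower().split()
    scored = []
    for rec in records:
        text = (rec["title"] + " " + rec["keywords"]).lower()
        # Crude score: total occurrences of the query words.
        score = sum(text.count(term) for term in terms)
        if score:
            scored.append((score, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored]


def render(results, template):
    """Format each result with a site-supplied template, one per line."""
    return "\n".join(Template(template).substitute(rec) for rec in results)
```

For example, a site manager might supply the template "$title <$url>" and a user another one; demoting a particular organization would then be a matter of adjusting its score before the sort.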

cheers... ;-)

david potter


PS: A few years ago, IC funded work on document classification at National 
Research (I think). It was hoped to have commercial application but at the 
time there also seemed some possibility that CSuite might get a 
non-commercial arrangement...

dlp


