Monday, March 19, 2012

Compare strings in text field

I am trying to build a simple search engine using Sql Server 2000 to scan information about approximatelly 20.000 products.

Heres what I am doing:

I created a table called keywords that contains a reference for each product.

keyword -> varchar(100)
items -> Text

keyword data example:

[keyword] [items]
car 1, 3, 5, 7
blue 3, 5
compact 1,7

I am not using clustered index.

To search basically I run the "AND" or "OR" to select the keywords I want to target.
I need to run another select that would compare the data in the items field depending of the condition selected. If "AND" clause is used I would need to compare all the items that contains the same reference, for example:

looking for car compact using "AND" clause
result = 1

looking for car compact using "OR" clause
result = 1,3,5,7

There is no table that holds references. The items are stored in a text field in the keyword table. I can compare data using script like AsP spliting the items by comma or space, but that can be too slow and use up a lot of RAM. Another solution would be to use a table to hold the references but that would affect performance dramatically because of the large number of records created and storage space used. One example, if I have 60.000 keywords and each keyword has an avereage of 200 references, I would have to generate 12 milion records.

I want to know if there is a function or routine in SQL server to compare matched references on the fly in the server between two or more fields and how should I do it.

In addition, in this scenario, how should a clustered index help?

Thanks

RodrigoNot sure i get it but from what i understand why do you have a , delimited string of the items

why not a seperate row for each item so that, you will get all the items in a query that you can then use easier...i take it the item is the foreign key for some record in another table|||If I am right, that physically generating the rows would fall in the scenario in the 12 milion scenario leading to unecessary records and lots of sql file pages to hold the data references. I 'd ratter have a little IO because I could cache the results. Do you have any suggestion?

Thanks
Rod

No comments:

Post a Comment