Hello,
For security reasons within our organization, I need to compare our customer data with a "black list" that we are getting from an outside source. I need to identify potential matches and get this information to the appropriate parties.
The problem is that the information comes in a variety of formats, so the name/address information doesn't always match up. I'm sure I can use full-text search in this situation somehow, but I'm not sure how to proceed. Can anyone offer ideas to get me started?
Thanks!
--Michael
Raterus,
You can certainly try Full-Text Search (FTS) for this application and get some level of functionality, especially with FORMSOF(INFLECTIONAL, ...), which finds the inflectional forms of a word. For example, this query finds all products whose name contains a form of "dry" (dried, drying, and so on):
USE Northwind
GO
SELECT ProductName
FROM Products
WHERE CONTAINS(ProductName, ' FORMSOF (INFLECTIONAL, dry) ')
GO
However, I suspect that you will be unsatisfied with this solution; I've researched this extensively for a book project. What you should research instead is string similarity functions, such as Levenshtein edit distance, and approximate string matching functions.
Regards,
John
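To make the edit-distance suggestion above concrete, here is a minimal sketch in Python. It is not taken from the thread: the helper names (`levenshtein`, `normalize`, `potential_matches`) and the `max_distance=2` threshold are assumptions chosen for illustration, and a real black-list comparison would need a threshold tuned against actual data.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so formats compare fairly."""
    kept = [ch if ch.isalnum() else " " for ch in text.lower()]
    return " ".join("".join(kept).split())


def potential_matches(customers, black_list, max_distance=2):
    """Yield (customer, listed_name, distance) for pairs within max_distance.

    max_distance is an assumed threshold for illustration only.
    """
    for customer in customers:
        for listed in black_list:
            d = levenshtein(normalize(customer), normalize(listed))
            if d <= max_distance:
                yield customer, listed, d
```

For example, `list(potential_matches(["Jon Smith"], ["John  SMITH", "Jane Doe"]))` pairs "Jon Smith" with "John  SMITH" at distance 1 and skips "Jane Doe". Comparing every customer against every listed name is quadratic, so for large tables it is probably worth narrowing the candidates first (for instance with a full-text or prefix query) and scoring only those.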
|||Thank you very much!
|||I'm trying to devise a method of doing about the same thing: matching inbound records with the most likely candidate(s) among pre-existing records. Things like name, license number, title, etc., if they exist in either the existing or inbound records, would provide the criteria for matching.
I'm scanning the net this morning as a result of testing that I did yesterday. I devised my own algorithm, but I was interested in looking at the character-distance alternatives. My algorithm lowercases, eliminates vowels, uses a couple of numbers similar to ss and 2 dates, and then concatenates everything into one index. The order of the index is such that it narrows from most restrictive to least restrictive. For example: explorer:ford:suv : 4 wheels
Then I query starting at the far end and working my way back, mostly to see how it behaves. That's where I left it yesterday.
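One possible reading of the key-building scheme described above, sketched in Python. The function names (`squeeze`, `build_key`, `candidates`), the `:` separator handling, and the interpretation of "working my way back" as dropping trailing fields one at a time are all assumptions made for illustration; the numeric and date components mentioned in the post are left out.

```python
VOWELS = set("aeiou")


def squeeze(value: str) -> str:
    """Lowercase a field and drop vowels and non-alphanumeric characters."""
    return "".join(
        ch for ch in value.lower() if ch.isalnum() and ch not in VOWELS
    )


def build_key(fields, sep=":"):
    """Join squeezed fields, ordered from most to least restrictive."""
    return sep.join(squeeze(f) for f in fields)


def candidates(inbound_fields, existing_keys, sep=":"):
    """Relax the inbound key one trailing field at a time until a key matches.

    Returns (number_of_fields_matched, matching_keys), or (0, []) if
    even the first field matches nothing.
    """
    parts = [squeeze(f) for f in inbound_fields]
    for n in range(len(parts), 0, -1):
        prefix = sep.join(parts[:n])
        hits = [
            k for k in existing_keys
            if k == prefix or k.startswith(prefix + sep)
        ]
        if hits:
            return n, hits
    return 0, []
```

With this sketch, `build_key(["Explorer", "Ford", "SUV", "4 wheels"])` gives `"xplrr:frd:sv:4whls"`, and an inbound record that differs only in its last field still finds that key after one relaxation step. Because vowels are removed before comparison, the approach tolerates vowel typos but not consonant typos, which is where a character-distance measure could complement it.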
|||John,
While this should also be possible in SQL Server 2000, you might want to check out the following new functionality in SQL Server 2005:
Fuzzy Lookup and Fuzzy Grouping in Data Transformation Services for SQL Server 2005
http://msdn.microsoft.com/library/de...FzDTSSQL05.asp
Regards,
John