Thursday, March 29, 2012

comparing text fields - Second try

Hi all,
What is the fastest and best way to compare two text fields(data type
TEXT). I just need
to know when the values are different. Are there any functions available to
do this? Please provide any code if you have it or any links regarding this
problem.
Thanks in advance...
anyone?
"helpful sql" <nospam@.stopspam.com> wrote in message
news:eQ2M2AlVFHA.132@.TK2MSFTNGP14.phx.gbl...
> Hi all,
> What is the fastest and best way to compare two text fields(data type
> TEXT). I just need
> to know when the values are different. Are there any functions available
> to
> do this? Please provide any code if you have it or any links regarding
> this
> problem.
> Thanks in advance...
>
|||Yeah, just a moment, coding will take some time, even I am right at home now
;-)
I coded a function to give you the "kind-of" checksum for the two columns.
Due to the fatc that the substring only brings back a varchar from the
function you can normally compare only the first 8000 bytes of a text
string, but... I coded a function which will chop the data into smaller
chunks , produces a checksum of every part and adds up the checksum. I now
that there could be a case that eventually two text columns wil produce the
same checksum, but i think that could be solution you can life with:
CREATE Function CompareText
(
@.EmployeeId INT
)
RETURNS INT
AS
BEGIN
DECLARE @.Datalength INT
DECLARE @.Restlength INT
DECLARE @.Checksum INT
DECLARE @.StartChunk INT
DECLARE @.EndChunk INT
SET @.StartChunk = 0
SET @.Datalength = (Select Datalength(Photo) From Employees Where EmployeeID
= @.EmployeeId)
SET @.Restlength = @.Datalength
SET @.Checksum = 0
While @.Restlength > 0
BEGIN
IF @.Restlength > 8000
BEGIN
SET @.EndChunk = 8000
END
ELSE
BEGIN
SET @.EndChunk = @.Restlength
END
SET @.Checksum = @.Checksum + (Select
CHECKSUM(SUBSTRING(Photo,@.StartChunk,@.Restlength)) From Employees Where
EmployeeID = @.EmployeeId)
SET @.StartChunk = @.StartChunk + @.Restlength
SET @.Restlength = @.Restlength - @.EndChunk
END
RETURN @.Checksum
END
This function has to be coded in your way, to not use the Employee Table of
the northwind database.
In this Example you can use the code as following (due to the case there is
only one text/image column in the northwin database:
Select * from Employees where dbo.Comparetext(EmployeeID) =
dbo.Comparetext(EmployeeID)
HTH, Jens Suessmeyer.
http://www.sqlserver2005.de
"helpful sql" <nospam@.stopspam.com> schrieb im Newsbeitrag
news:uuX8kilVFHA.628@.tk2msftngp13.phx.gbl...
> anyone?
> "helpful sql" <nospam@.stopspam.com> wrote in message
> news:eQ2M2AlVFHA.132@.TK2MSFTNGP14.phx.gbl...
>
|||Two text fields in the same record or in related tables? If you are going to
be comparing a large number of records, then you may want to speed things up
by first determing those records where the length of the text values are not
the same. Those are obviosly different. Perhaps store their primary key IDs
in a temporary table. Once done, you can then perform the full test
comparison against the remaining few that are the same size.
"helpful sql" <nospam@.stopspam.com> wrote in message
news:eQ2M2AlVFHA.132@.TK2MSFTNGP14.phx.gbl...
> Hi all,
> What is the fastest and best way to compare two text fields(data type
> TEXT). I just need
> to know when the values are different. Are there any functions available
to
> do this? Please provide any code if you have it or any links regarding
this
> problem.
> Thanks in advance...
>
|||Maybe you can use a combination of the answers in your other post and
DATALENGTH ...although this will not catch everything.
If this isn't enough for you, then I don't think you have much choice then
to do the comparison in slices of 8000 characters.
I've never had to do this so I can't help you out much. The SUBSTRING
function can return any slice you want.
Ex: select substring(columnName, 8000, 8000)
But check Datalength first, if that's doesn't match, then you don't have to
go any further.
"helpful sql" <nospam@.stopspam.com> wrote in message
news:uuX8kilVFHA.628@.tk2msftngp13.phx.gbl...
> anyone?
> "helpful sql" <nospam@.stopspam.com> wrote in message
> news:eQ2M2AlVFHA.132@.TK2MSFTNGP14.phx.gbl...
>
|||Some (bad coded) applications write chunks of data in the database in steps
of 50, 100, 200 ... steps, so comparing only the length via Datalength()
could be a problem because many columns would "seem" to be the same but they
arent.
Just a experience and my two cents.
HTH, Jens Suessmeyer.
http://www.sqlserver2005.de
"JT" <someone@.microsoft.com> schrieb im Newsbeitrag
news:%23MeyfzlVFHA.228@.TK2MSFTNGP12.phx.gbl...
> Two text fields in the same record or in related tables? If you are going
> to
> be comparing a large number of records, then you may want to speed things
> up
> by first determing those records where the length of the text values are
> not
> the same. Those are obviosly different. Perhaps store their primary key
> IDs
> in a temporary table. Once done, you can then perform the full test
> comparison against the remaining few that are the same size.
> "helpful sql" <nospam@.stopspam.com> wrote in message
> news:eQ2M2AlVFHA.132@.TK2MSFTNGP14.phx.gbl...
> to
> this
>
|||Am I correct in assuming that if the text values have different lengths, for
example 2000 vs. 2100, then they are different without performing a text
comparison? Once we have that list of these candidates in a temporary table,
we can exclude them from the query which performs the text compare.
"Jens Smeyer" <Jens@.Remove_this_For_Contacting.sqlserver2005.de> wrote in
message news:OHuoR3lVFHA.3044@.TK2MSFTNGP10.phx.gbl...
> Some (bad coded) applications write chunks of data in the database in
steps
> of 50, 100, 200 ... steps, so comparing only the length via Datalength()
> could be a problem because many columns would "seem" to be the same but
they[vbcol=seagreen]
> arent.
> Just a experience and my two cents.
> HTH, Jens Suessmeyer.
> --
> http://www.sqlserver2005.de
> --
> "JT" <someone@.microsoft.com> schrieb im Newsbeitrag
> news:%23MeyfzlVFHA.228@.TK2MSFTNGP12.phx.gbl...
going[vbcol=seagreen]
things[vbcol=seagreen]
type[vbcol=seagreen]
available
>
|||Why couldn't you do the following:
Create Table Foo
(
Id Int Primary Key
, TextData1 Text
, TextData2 Text
)
Insert Foo(Id, TextData1, TextData2)...
Select F.*
From Foo As F
Where Substring(F.TextData1,1,DataLength(F.TextData1))
= Substring(F.TextData2,1,DataLength(F.TextData2))
Thomas
"helpful sql" <nospam@.stopspam.com> wrote in message
news:eQ2M2AlVFHA.132@.TK2MSFTNGP14.phx.gbl...
> Hi all,
> What is the fastest and best way to compare two text fields(data type TEXT).
> I just need
> to know when the values are different. Are there any functions available to
> do this? Please provide any code if you have it or any links regarding this
> problem.
> Thanks in advance...
>
|||NM..Substring returns a max of 8K
Thomas
"Thomas Coleman" <replyingroup@.anywhere.com> wrote in message
news:OYAMCQmVFHA.132@.TK2MSFTNGP14.phx.gbl...
> Why couldn't you do the following:
> Create Table Foo
> (
> Id Int Primary Key
> , TextData1 Text
> , TextData2 Text
> )
> Insert Foo(Id, TextData1, TextData2)...
>
> Select F.*
> From Foo As F
> Where Substring(F.TextData1,1,DataLength(F.TextData1))
> = Substring(F.TextData2,1,DataLength(F.TextData2))
>
> Thomas
>
> "helpful sql" <nospam@.stopspam.com> wrote in message
> news:eQ2M2AlVFHA.132@.TK2MSFTNGP14.phx.gbl...
>
|||> Where Substring(F.TextData1,1,DataLength(F.TextData1))
> = Substring(F.TextData2,1,DataLength(F.TextData2))
substring() only returns 8000 chars at a time...
create table FooText(textdata1 text, textdata2 text)
...use the following to build strings > 8000 chars...
--SELECT REPLICATE('a', 8000)
--SELECT REPLICATE('a', 500)
--SELECT REPLICATE('b', 20)
...paste those into an insert statement, then massage the insert statement
so that the last character is different...
INSERT FooText SELECT
'aaa...bbb',
'aaa...bba'
...now watch the result...
select count(*) from FooText
where Substring(textdata1,1, Datalength(textdata1))=Substring(textdata2,1,
Datalength(textdata1))
1

No comments:

Post a Comment