Monday, March 19, 2012

compare two different date format column?

Hi guys,

I want to compare two date fields, which are located on different tables. One field contains a date value in this format: ###/###-#### and the other field on the other table contains (###-###-#### and some ##########)

So I want to compare this two different format. May be changing both the formats to a common format and make the comparison, something like that.

Any idea is appreciate.

Hi Amde

Just for clarity, would you be able to provide real examples (from your tables) of a date in each format?

Thanks
Chris

|||

oh yeah,

Table 1.

Phone no.

847/678-2828

Table 2.

Phone no.

333-777-5555

252 321 3333

2223334444

(123)456-789

...

|||

For phone numbers, I would consider creating a Function that strips out all but the numbers -and then saving and comparing only numbers. (Leave the formating for the client application.)

Comparing DATE fields will be somewhat different -however formating is not an issue if the datatype is (small)datetime. If Date data is stored as (var)char -that is a whole different can of worms.

|||

Well that changes things slightly - you originally said that these were date fields...

Another couple of questions - how many rows of data are you looking to compare? Are there any other formats that we should know about?

Thanks
Chris

|||

I've just put together the function below that will strip any non-numerical characters from an input string.

Don't expect fantastic performance from the function when comparing thousands of rows. It would be far better, as Arnie highlighted, to store an unformatted string and leave the formatting to the client.

Chris

CREATE FUNCTION dbo.CleanString(@.InputString VARCHAR(8000))
RETURNS VARCHAR(8000)
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
DECLARE @.NewString VARCHAR(8000)
SET @.NewString = ''
DECLARE @.ValidChars CHAR(10)
SET @.ValidChars = '0123456789'
DECLARE @.Loop INT
SET @.Loop = 1
WHILE @.Loop <= LEN(@.InputString)
BEGIN
IF CHARINDEX(SUBSTRING(@.InputString, @.Loop, 1), @.ValidChars) > 0
SET @.NewString = @.NewString + SUBSTRING(@.InputString, @.Loop, 1)

SET @.Loop = @.Loop + 1
END

RETURN @.NewString
END
GO

SELECT dbo.CleanString('1329842DCXLXMC2QPWXMQWX09SWDM09324')
--Returns: 132984220909324

SELECT CASE WHEN dbo.CleanString('1M&&2-FL3FCM4+=5M#6khn7wmx8KE9AAS0')
= dbo.CleanString('cxmn1dUI2QWWJHB3NCU4PWCK5IVN67DBW8XW9K 0')
THEN 'These strings are the same when cleansed.'
END,
dbo.CleanString('1M&&2-FL3FCM4+=5M#6khn7wmx8KE9AAS0'),
dbo.CleanString('cxmn1dUI2QWWJHB3NCU4PWCK5IVN67DBW8XW9K 0')
--Returns: 'These strings are the same when cleansed.', 1234567890, 1234567890

|||

Oh!my bad, I am really sorry.

- I want to say a phone number (which is varchar data type)

- The first table contain about 40,000 records and the second table contains about 60, 000 records. So I want to get the difference.

- Those are the common formats. I hope if I find a solution for one of the format, I can apply similar logic.

Sorry for the confusion.

|||

For a large amount of data, avoiding the loop may increase performance.

By using a 'Numbers' table, this method avoids looping.

-- Prepare Numbers Table

SET NOCOUNT ON
CREATE TABLE Numbers ( n int )
GO

DECLARE @.n int
SET @.n = 1
WHILE @.n < 100
BEGIN
INSERT INTO Numbers VALUES ( @.n )
SET @.n = ( @.n + 1 )
END
-- End Numbers Table

CREATE FUNCTION dbo.NumbersOnly
( @.Text varchar(100) )
RETURNS varchar(100)
AS
BEGIN
DECLARE @.Output varchar(100)
SELECT @.Output = ''
SELECT @.Output = @.Output + CASE
WHEN RealNum LIKE '[0-9]'
THEN RealNum
ELSE ''
END
FROM (SELECT substring( @.Text, n, 1 ) RealNum
FROM Numbers
) d
RETURN @.Output
END
GO

SELECT dbo.NumbersOnly( '(555) 987-4321' )

--
5559874321

|||

You should clean the data and store it in a normalized manner. This is the best way to handle. Solutions using UDFs are fine but they will be slow and you will have to keep changing them if you find newer formats or rules. For example, how do you handle phone numbers like 1-800-96MSFAX? What about phone numbers from other countries etc? There are so many cases where hard-coded solutions break and you will end up with more problems than necessary. I have built data warehouses in the past (in my previous company for data mining/analytics purpose) and we used public geography data, address book scrubbing software etc. So it is better to invest in a solution or package that is specialized for the type of data you are dealing with and focus on your business problem. Once you normalize the phone numbers then it is very easy to compare them - you can easily restrict phone numbers based on country code for example or compare different formats easily.

|||

I agree with you. Having a normalized database is the best thing. However, I want to do this task only for the time being. Here is the thing, I received an excel sheet which contains thousands of records. On the other hand we have a table in our database. So my goal is to get the difference and insert the new data to my table. So what I did is I created an SSIS package to migrate the excel sheet data into SQL Server table, so that it will be easy to write a query to get the differential data. Inserting the data is not an issue. All I need is to compare these two tables and get the different. The only unique field I have to do the comparison is the PHONE column. And the PHONE column has a different format, as I tried to explain before.

Thx.

|||

Thanks guys, I really appreciate your help. I'm good now.

No comments:

Post a Comment