Thursday, March 29, 2012

Comparing two data sets betweeen two tables

I am using Sql Server 7 and here is what I am wanting to do. I want to
be able to compare a data set from table A to see a exact match of data
set exist in table B, if it does not exist then I want to go ahead and
add the data set to tableB.
Here is an example I want check and see if sets of rows from table A = sets of rows from TableB. If they do not match I would then go ahead
and add that set from tableA to tableB.
In my example the set in tableA is grouped by COL1 and COL2 and the set
in tableB is grouped by COLB and COLC.
So in this example 10-A,10-B do not exist in TableB. So I should be
able to add it to Table B.
TableA
COL1 COL2
10 A
10 B
30 X
30 Y
TableB
COLA COLB COLC
1 10 A
1 10 B
1 10 C
2 30 X
2 30 Y
Any help in this regard will be greatly appreciated. I am trying to
avoid cursors to achieve this.
Thanks
ShubINSERT INTO TableB (colb, colc)
SELECT col1, col2
FROM TableA
WHERE NOT EXISTS
(SELECT *
FROM TableB
WHERE colb = TableA.col1
AND colc = TableA.col2) ;
--
David Portas
SQL Server MVP
--|||Hi David,
I am just curious to ask that why SQL Server do not have set
operater MINUS , INTERSECT implemented, which can be implemented by
using query as you did for MINUS operator
If MINUS were implemented the query would be lot simpler
Insert into tableA select * from TableA Minus Select * From TableB
With Warm regards
Jatinder Singh|||EXISTS = INTERSECT
NOT EXISTS = MINUS
Or is it the spelling your concerned about?
--
Nik Marshall-Blank MCSD/MCDBA
<jatinder.singh@.clovertechnologies.com> wrote in message
news:1126248735.883313.245060@.f14g2000cwb.googlegroups.com...
> Hi David,
> I am just curious to ask that why SQL Server do not have set
> operater MINUS , INTERSECT implemented, which can be implemented by
> using query as you did for MINUS operator
> If MINUS were implemented the query would be lot simpler
> Insert into tableA select * from TableA Minus Select * From TableB
>
> With Warm regards
> Jatinder Singh
>|||I do appreciate your quick response and apologise for not explaining my
situation clearly in my first post. I have added few lines of code that
will create and insert the rows for my examples.
After you run the script I provided to create and insert the rows in
TableA and TableB, if I run your query it does not insert rows 10-A and
10-B (First two rows) into TableB. But it really should because
although 10-A,10-B,10-C exist in table B, no set of just 10-A and 10-B
exist in tableB. I am referring to sets in tableB by uniqe value in
COLA in table B. So in my example there are really are two different
sets in table.
--Creates TableA and inserts the rows for my example
select 10 as 'col1','A' as 'col2' into tableA
insert tableA
select 10,'B'
insert tableA
select 30,'X'
insert tableA
select 30,'Y'
--Creates TableB and inserts the rows for my example
select 1 as 'cola',10 as 'colB','A' as colC into tableB
insert tableB
Select 1, 10, 'B'
insert tableB
Select 1, 10, 'C'
insert tableB
Select 2, 30, 'X'
insert tableB
Select 2, 30, 'Y'
--Displays the content of both the tables
select * from tableA order by col1,col2
Select * from tableB order by cola,colb,colc
--This is not acomplishing what I am wanting to achieve
INSERT INTO TableB (colb, colc)
SELECT col1, col2
FROM TableA
WHERE NOT EXISTS
(SELECT *
FROM TableB
WHERE colb = TableA.col1
AND colc = TableA.col2)
Thanks again for your time.
Shub|||You can also do a
select checksum_agg(binary_checksum(*)) from [Table_Name]
on both tables. You should get the same value if data is the same.
Microsoft claims that this is not absolutely one-hundred-percent perfect as
it is possible for two different tables to return the same value based on
the ascii character values; however, this is unlikely and the method has
always worked for me.
walt
<shubtech@.gmail.com> wrote in message
news:1126209114.244231.60810@.g49g2000cwa.googlegroups.com...
>I am using Sql Server 7 and here is what I am wanting to do. I want to
> be able to compare a data set from table A to see a exact match of data
> set exist in table B, if it does not exist then I want to go ahead and
> add the data set to tableB.
> Here is an example I want check and see if sets of rows from table A => sets of rows from TableB. If they do not match I would then go ahead
> and add that set from tableA to tableB.
> In my example the set in tableA is grouped by COL1 and COL2 and the set
> in tableB is grouped by COLB and COLC.
> So in this example 10-A,10-B do not exist in TableB. So I should be
> able to add it to Table B.
> TableA
> COL1 COL2
> 10 A
> 10 B
> 30 X
> 30 Y
> TableB
> COLA COLB COLC
> 1 10 A
> 1 10 B
> 1 10 C
> 2 30 X
> 2 30 Y
>
> Any help in this regard will be greatly appreciated. I am trying to
> avoid cursors to achieve this.
> Thanks
> Shub
>|||not sure if that is what you wanted:
create table t1(i int identity, j int)
insert into t1(j) values(1)
insert into t1(j) values(2)
insert into t1(j) values(3)
create table t2(i int identity, j int)
insert into t2(j) values(1)
insert into t2(j) values(2)
insert into t2(j) values(4)
-- rows in t1 that do not have exact match in t2
select * from t1 t
-- there is a row in t2 with the same PK
where exists(select * from t2 where t2.i = t.i)
-- but some other columns are different
and
(select count(*) from
(
select * from t1
union
select * from t2
)t_both where t_both.i = t.i
) = 2
i j
-- --
3 3
(1 row(s) affected)
-- rows in t2 that do not have exact match in t1
select * from t2 t
-- there is a row in t1 with the same PK
where exists(select * from t1 where t1.i = t.i)
-- but some other columns are different
and
(select count(*) from
(
select * from t1
union
select * from t2
)t_both where t_both.i = t.i
) = 2
i j
-- --
3 4
(1 row(s) affected)
drop table t1
drop table t2
and yes, it's easier to accomplish using MINUS|||SQL Server 2005 has the ANSI operators INTERSECT and EXCEPT. You can
also do a "minus join" in SQL2000:
INSERT INTO TableB (colb, colc)
SELECT col1, col2
FROM TableA
LEFT JOIN TableB
ON TableB.colb = TableA.col1
AND TableB.colc = TableA.col2
WHERE TableB.colb IS NULL ;
--
David Portas
SQL Server MVP
--|||> ON TableB.colb = TableA.col1
> AND TableB.colc = TableA.col2
well If I have more than a hundred columns I need to generate some
really long SQL like this
AND TableB.col121 = TableA.col121
AND TableB.col122 = TableA.col122
kinda boring, is it not? I'd rather have SQL Server do the job for me,
using UNION...|||This is not exactly what I am looking. In my example I should be able
to insert rows
10 A
10 B
because 10 A and 10 B do not exist as a group defined by COlA in table
B
So I should still be able to insert 10 A and 10 B rows with a different
ID for COLA in table B.
After that is inserted into the table I should no longer be able to
insert this into tableB the only other possible rows that I could
insert into table B for my given example would be
10 B
10 C
Hope this make sense. I know I am not explaining it so well.
I do appreciate your time.
Alexander Kuznetsov wrote:
> not sure if that is what you wanted:
> create table t1(i int identity, j int)
> insert into t1(j) values(1)
> insert into t1(j) values(2)
> insert into t1(j) values(3)
> create table t2(i int identity, j int)
> insert into t2(j) values(1)
> insert into t2(j) values(2)
> insert into t2(j) values(4)
> -- rows in t1 that do not have exact match in t2
> select * from t1 t
> -- there is a row in t2 with the same PK
> where exists(select * from t2 where t2.i = t.i)
> -- but some other columns are different
> and
> (select count(*) from
> (
> select * from t1
> union
> select * from t2
> )t_both where t_both.i = t.i
> ) = 2
> i j
> -- --
> 3 3
> (1 row(s) affected)
>
> -- rows in t2 that do not have exact match in t1
> select * from t2 t
> -- there is a row in t1 with the same PK
> where exists(select * from t1 where t1.i = t.i)
> -- but some other columns are different
> and
> (select count(*) from
> (
> select * from t1
> union
> select * from t2
> )t_both where t_both.i = t.i
> ) = 2
>
> i j
> -- --
> 3 4
> (1 row(s) affected)
>
> drop table t1
> drop table t2
> and yes, it's easier to accomplish using MINUS|||I would try something like this:
-- groups that do not have exact matchselect
i.col1
from
(select col1, count(*) col1_cnt from tableA group by col1) i
where not
i.col1_cnt = (select count(*) from tableA
join tableB on tableA.col1 = tableB.colB and tableA.col2 = tableB.colC
where tableA.col1 = i.col1)
or not
i.col1_cnt = (select count(*) from tableB
where tableB.colB = i.col1)|||I think this is going to work. I have tried some example and its
looking like its doing what I want.
Thank you very much for your input and time Alexander.
Alexander Kuznetsov wrote:
> I would try something like this:
> -- groups that do not have exact matchselect
> i.col1
> from
> (select col1, count(*) col1_cnt from tableA group by col1) i
> where not
> i.col1_cnt = (select count(*) from tableA
> join tableB on tableA.col1 = tableB.colB and tableA.col2 = tableB.colC
> where tableA.col1 = i.col1)
> or not
> i.col1_cnt = (select count(*) from tableB
> where tableB.colB = i.col1)|||Ok I have a similar problem that I am unable to figure it out.
Please run the following script that will create table tabA and tabA
with few rows that will help me illustate my issue.
Basically I am trying to update col3 in tabB with the value of colc in
tabA. In tabA there are two sets of rows that are identified by unique
value in colc. So the fisrt two rows are one set and the last three are
another set. The rows in tabB should match the first two rows in tabA.
Can anyone please help me with this update statement. I havce tried few
different things but I am unable to come with a solution
Thanks in advance
Shub
--*******************************************
select 100 as cola, 1 as colb, 1 as colc
into tabA
insert tabA
Select 100,2,1
insert tabA
Select 100,1,2
insert tabA
Select 100,2,2
insert tabA
Select 100,3,2
select 100 as col1, 1 as col2, 0 as col3
into tabb
insert tabb
select 100,2,0
select * from taba order by cola,colc,colb
select * from tabb
--***********************************************|||how should tabb look like after the update?|||After the update tabB should have values of 1 for col3. Thanks for
looking into it Alexander.|||you welcome. Try this:
update tabb set col3 = matches.colc
from tabb,
(select colc, count(*) c from tabA where cola=100 group by colc)
all_taba_groups,
(select colc, count(*) c from tabA, tabB
where cola=100 and col1=100 and colb=col2
group by colc) matches
where col1=100 and all_taba_groups.colc = matches.colc and
all_taba_groups.c = matches.c
with all the usual warnings than update ... from will not tell you if
there is an ambiguity (i.e. more than one matching colc)|||Thanks and that works great, however I do not think I could hardcode
100 because there could be other sets present in both tables. Sorry my
example did not illustrate that. Here please run this script.
After the update I want values of 1 for col3 in tabb where col1= 100
and
values of 4 in col3 in tabb where col1 = 200
--******************************************
select 100 as cola, 1 as colb, 1 as colc
into tabA
insert tabA
Select 100,2,1
insert tabA
Select 100,1,2
insert tabA
Select 100,2,2
insert tabA
Select 100,3,2
insert taba
select 200, 1,3
insert taba
select 200, 2,3
insert taba
select 200, 1,4
insert taba
select 200, 2,4
insert taba
select 200, 3,4
select 100 as col1, 1 as col2, 0 as col3
into tabb
insert tabb
select 100,2,0
Insert tabb
select 200,1,0
Insert tabb
select 200,2,0
Insert tabb
select 200,3,0
select * from taba order by cola,colc,colb
select * from tabb order by col1,col2
--***************************************************|||try this one:
update tabb set col3 = matches.colc
from tabb,
(select colc, cola, count(*) c from tabA group by colc, cola)
all_taba_groups,
(select colc, cola, count(*) c from tabA, tabB
where cola=col1 and colb=col2
group by colc, cola) matches
where tabb.col1=matches.cola
and all_taba_groups.cola = matches.cola
and all_taba_groups.colc = matches.colc
and all_taba_groups.c = matches.c|||Actually I want col3 to be updated with 4 for all rows in tabb where
col1= 200.|||select * from tabb
update tabb set col3 = matches.colc
from tabb,
(select colc, cola, count(*) c from tabA group by colc, cola)
all_taba_groups,
(select col1, count(*) c from tabB group by col1) all_tabB_groups,
(select colc, cola, count(*) c from tabA, tabB
where cola=col1 and colb=col2
group by colc, cola) matches
where tabb.col1=matches.cola
and all_taba_groups.cola = matches.cola
and all_taba_groups.colc = matches.colc
and all_taba_groups.c = matches.c
and all_tabB_groups.col1 = matches.cola
and all_tabB_groups.c = matches.c
select * from tabb order by col1
--drop table tabA
--drop table tabB|||You daaa man. That is exactly what I wanted. Thanks for all your help,
it is greatly appreciated.|||Alexander,
I have run into a situation with this first query you helped me with.
Basically I want a query to report if there is any mismatch of set of
groups between tabaleA and tableB. When I run the query you helped me
with in this example it returns the value 10 but it really should not
return any rows in this instance because there is no mismatch of groups
between these two tables. This query worked in everyother case except
this. Here is the script to create tableA and TableB with data in it.
You will notice that there is no mismatch but the query is returning
10.
--**************************************
--Creates tablea
select 10 as col1,'A' as col2
into tablea
insert tablea
select 10,'B'
insert tablea
select 10,'C'
insert tablea
select 30,'X'
insert tablea
select 30,'Y'
insert tablea
select 40,'A'
insert tablea
select 40,'B'
insert tablea
select 40,'C'
--tableB
Select 1 as cola,10 as colb,'A' as colc
into tableb
insert tableb
select 1,10,'B'
insert tableb
select 2,30,'X'
insert tableb
select 2,30,'Y'
insert tableb
select 3,40,'A'
insert tableb
select 3,40,'B'
insert tableb
select 3,40,'C'
insert tableb
select 4,10,'A'
insert tableb
select 4,10,'B'
insert tableb
select 4,10,'C'
select * from tablea order by col1,col2
select * from tableB order by cola,colb,colc
-- groups that do not have exact matchselect
select
i.col1
from
(select col1, count(*) col1_cnt from tableA group by col1) i
where not
i.col1_cnt = (select count(*) from tableA
join tableB on tableA.col1 = tableB.colB and tableA.col2 = tableB.colC
where tableA.col1 = i.col1)
or not
i.col1_cnt = (select count(*) from tableB
where tableB.colB = i.col1)|||in fact the original query displays groups that have mismatches:
select
i.col1
from
(select col1, count(*) col1_cnt from tableA group by col1) i
where not
i.col1_cnt = (select count(*) from tableA
join tableB on tableA.col1 = tableB.colB and tableA.col2 = tableB.colC
where tableA.col1 = i.col1)
or not
i.col1_cnt = (select count(*) from tableB
where tableB.colB = i.col1)
10 has both a match (4) and a mismatch(1)
the query that shows groups that don't have matches is slightly
different:
select * from tableA
where col1 not in(
-- groups that have exact matches
select
i.col1
from
(select col1, count(*) col1_cnt from tableA group by col1) i,
(select colA, colB, count(*) colB_cnt from tableB group by colA, colB)
j
where i.col1 = j.colB
and i.col1_cnt = j.colB_cnt
and i.col1_cnt = (select count(*) from tableA
join tableB on tableA.col1 = tableB.colB and tableA.col2 = tableB.colC
where tableA.col1 = i.col1 and tableB.cola = j.colA)
)
BTW what do you need all this stuff for.
Is it online dating service looking for a perfect match?|||You are right about the first query and I see how you are looking at
it. However let me expalin what I am trying to do. By the way I want to
thank you for all your help on this it is greatly appreciated.
Basically my project is to allow telephone companies to set up a group
of telephone services and call it a Super Duper Bundle. Customers
could then select a group of customised services from that super duper
Bundle and still call it super duper Bundle. So if the Super duper
Bundle consisted of three different services Voice Mail, CallerID,
Three way calling the possible difeerent sets of Super Duper Bundle
could be consisting of
UniqueID 1 Super Duper Bundle- Voice Mail, CallerID
unique ID 2 Super Duper Bundle- Voice Mail, Three way Calling
unique ID 3 Super Duper Bundle- Caller Id, Three way calling
unique ID 4 Super Duper Bundle -CallerId, Three way calling, Voice
Mail.
So before I create a new bundle set (one of the above 4 combinations)
in this case, I need a query to see if it already exist in the
database. So in my example
TableA consists of all the diferent combination that already exist in
the database.
IDs refer to colA, Superduper Bundle refers to colb and different
services are represented in Colc with individual rows. So the rows in
tableA are grouped together by cola to define a combination.
TableB consists of another set of combination of this Bundle that I
need to check to see if it already exist in the database.
So if you run
select
i.col1
from
(select col1, count(*) col1_cnt from tableA group by col1) i
where not
i.col1_cnt = (select count(*) from tableA
join tableB on tableA.col1 = tableB.colB and tableA.col2 = tableB.colC
where tableA.col1 = i.col1)
or not
i.col1_cnt = (select count(*) from tableB
where tableB.colB = i.col1)
I should not get anything back because the combinations as a group
already exist in tableA, there is no mismatch.
Let me know if you can come up with a solution....
Thanks Again
Shub

No comments:

Post a Comment