Tuesday, March 27, 2012

Comparing Rows in a table

Hi guys,

I just wanted to know if it is possible to compare each row in the table with the previous one.

Im looking to compare each row the time column with the previous to see if there is a gap of more then 15 minutes.

I was thinking of using the datediff function to compare but not sure how i go about accessing row by row and compare with the previous one.

Any help much appreciated.

Thanks.

I would suggest using a self-join based on your datetime field. For this kind of join it is critical that you have an index built on the target datetime field. I will put together a mock-up of this.|||

I dont understand target datetime field?

Do you mean the table that ill be out putting the results to?

Thanks for your quick response and putting together the mock-up of this.

|||

Here is a mockup. I sprayed 32767 records out over 43 days going backwards from 3/20/7 using an iterator table and a scalar RAND udf. These two items can be found on this page:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1330536&SiteID=1

Here is the mockup. I have not yet evaluated whether or not this is efficient. Stand by; another post will follow..

declare @.mockup table
( rowid integer,
aDateTime datetime,
unique (aDateTime, rowId)
)

insert into @.mockup
select iter,
cast ('3/20/7' as datetime) - 43 * dbo.rand() as aDateTime
from small_iterator
--select * from @.mockup

select rowId,
aDateTime,
nextDateTime,
convert (varchar(12), nextDateTime - aDateTime, 108)
as [Time Difference]
from ( select rowId,
aDatetime,
( select top 1 b.aDateTime
from @.mockup b
where b.aDateTime > a.aDateTime
order by aDateTime, rowId
) as nextDatetime
from @.mockup a
) x
where nextDateTime - aDateTime >= cast(cast ('0:15:00' as datetime) as float)

-- rowId aDateTime nextDateTime
-- - -- --
-- 23623 2007-02-11 09:51:57.530 2007-02-11 10:07:25.077 00:15:27
-- 12068 2007-02-12 11:46:03.190 2007-02-12 12:01:25.987 00:15:22
-- 14377 2007-02-14 03:17:02.890 2007-02-14 03:32:39.907 00:15:37
-- ...

-- 30564 2007-03-17 04:32:18.287 2007-03-17 04:48:51.520 00:16:33
-- 9053 2007-03-18 21:07:15.663 2007-03-18 21:22:21.220 00:15:05

Oh yeah: I didn't perform a self join; this looked better. Sorry, I am in the middle of a restore.

OK! This puppy doesn't look too bad: With this index I am getting a plan of index scan / index seek. To process the entire 32767 records, the IO is:

-- Table '#401BB779'. Scan count 32768, logical reads 65746, physical reads 0, read-ahead reads 0.

That's pretty good. You might be able to find a better way to do this, but this is at least pretty good.

REMEMBER!

It is CRITICAL that the date/time column has an index! Otherwise you are NOT going to be able to search by index seek.

|||

Just to clarify - how do you define your previous row? Is your definition based on the primary key, or the datetime column?

i.e.

ID Date

1 05-Jun-2007

2 06-Mar-2007

3 02-Dec-2006

In the above example - if the current row is row 2, are you looking to compare row 2's date with that of row 1 or row 3?

Chris

|||

In our GPS Table we have an ID column for each entry.

What we need to do is compare current row time by next row time.

For example

in you example above we would compare Row 1 with Row 2 if there is a gap of 15 mins in time we record that.

Else we move to row 2 and compare it with row 3 and see if there is a gap of 15 minutes.

We need to do this for our entire database table and record each time there is gap of 15 mins or greater between 2 entries in the table.

Our goal is to split the table into a new trips table.

|||

Does your table have an IDENTITY column? If so, are there any gaps between the IDENTITY values?

Thanks
Chris

|||

Thanks for your reply.

The date time column is only referenced by the Auto ID that is created when the data is inputted into our GPS Table.

We need to compare row 1 and with row 2 and check if there is a 15 minute gap or greater in time.

If there is we want to record this in our Trip table.

We would like to do this for the entire table and record each time there is a gap of 15 minutes between the current row and the next row.

|||

Yes our table has an IDENTITY column. This ID is created everytime data is inputted into our GPS table.

This column will never be blank as the number is generated when an entry is made to the table.

|||Hey all!
Thanks for your Class I help so far.
I'm involved in this project too.. The data comes in something like this:

ID VehicleID Date/Time
333 212 2006-1-12 10:45:33
334 212 2006-1-12 10:45:48
335 212 2006-1-12 10:46:32
336 212 2006-1-12 10:47:43
337 212 2006-1-12 10:47:52
<< We want to identify and generate a new trip ID here in trip table (>15mins interval) >>
338 212 2006-1-12 17:23:33
339 212 2006-1-12 17:23:58
340 212 2006-1-12 17:24:33
|||

Previous post removed as I've found a solution that can cope with gaps between values in the IDENTITY column. See below.

Chris

--Setup test data
DECLARE @.MyTable TABLE (ID INT NOT NULL, VehicleID INT, [Date/Time] DATETIME NOT NULL)

INSERT INTO @.MyTable(ID, VehicleID, [Date/Time])
SELECT 333, 212, '2006-1-12 10:45:33' UNION
SELECT 334, 212, '2006-1-12 10:45:48' UNION
SELECT 335, 212, '2006-1-12 10:46:32' UNION
SELECT 336, 212, '2006-1-12 10:47:43' UNION
SELECT 337, 212, '2006-1-12 10:47:52' UNION
--<< We want to identify and generate a new trip ID here in trip table (>15mins interval) >>
SELECT 338, 212, '2006-1-12 17:23:33' UNION
SELECT 339, 212, '2006-1-12 17:23:58' UNION
SELECT 340, 212, '2006-1-12 17:24:33' UNION
--Repeat data for a different VehicleID
SELECT 341, 100, '2006-1-12 10:45:33' UNION
SELECT 342, 100, '2006-1-12 10:45:48' UNION
SELECT 343, 100, '2006-1-12 10:46:32' UNION
SELECT 344, 100, '2006-1-12 10:47:43' UNION
SELECT 345, 100, '2006-1-12 10:47:52' UNION
--<< We want to identify and generate a new trip ID here in trip table (>15mins interval) >>
SELECT 346, 100, '2006-1-12 17:23:33' UNION
SELECT 347, 100, '2006-1-12 17:23:58' UNION
SELECT 348, 100, '2006-1-12 17:24:33'

SELECT *
FROM @.MyTable

--Return the rows between which there is a datetime gap > 15 mins
--All columns are returned purely for this example - take out any columns you don't need and add in any that you do
SELECT t1.ID, t1.[Date/Time],
t2.ID,
t2.[Date/Time],
DATEDIFF(mi, t1.[Date/Time], t2.[Date/Time]) AS [Difference (mins)],
t1.VehicleID
FROM @.MyTable t1, @.MyTable t2
WHERE t2.[Date/Time] > DATEADD(mi, 15, t1.[Date/Time])
AND t2.ID = (SELECT TOP 1 t3.ID FROM @.MyTable t3 WHERE t3.ID > t1.ID AND t3.VehicleID = t1.VehicleID ORDER BY t3.ID ASC)
AND t2.VehicleID = t1.VehicleID

--Delete a couple of rows to show we can cope with range gaps
DELETE FROM @.MyTable
WHERE id IN (338, 345)

--Repeat the query just to show we can cope with range gaps, expect different results as rows have been deleted from the source table
SELECT t1.ID, t1.[Date/Time],
t2.ID,
t2.[Date/Time],
DATEDIFF(mi, t1.[Date/Time], t2.[Date/Time]) AS [Difference (mins)],
t1.VehicleID
FROM @.MyTable t1, @.MyTable t2
WHERE t2.[Date/Time] > DATEADD(mi, 15, t1.[Date/Time])
AND t2.ID = (SELECT TOP 1 t3.ID FROM @.MyTable t3 WHERE t3.ID > t1.ID AND t3.VehicleID = t1.VehicleID ORDER BY t3.ID ASC)
AND t2.VehicleID = t1.VehicleID

|||

Hi Chris,

Thanks for your response.

I cant test it out until tomorrow as i dont have access to the database table from home.

I will try this out tomorrow and keep you informed.

Thanks a mil.

|||

Emerson, Carrics3,

How about (sorry for the double spacing, which I don't know how to control):

create table new_trips

(start_ID int not null,

vehicle_ID int,

start_time datetime

)

;

insert into new_trips

select

t2.ID,

t2.Vehicle_ID,

t2.[Date/Time]

from data_table t1

inner join data_table t2 -- you are joining a table to itself

on t2.ID = (t1.ID + 1) and

t2.vehicle_ID = t1.vehicle_ID

where DATEDIFF(mi, t1.[Date/Time], t2.[Date/Time])

order by 1, 2

;

Dan

P.S. This does not properly deal with having multiple Vehicle_ID in the same table. I will have to think a bit about how best to deal with that. It may involve placing ID and Vehicle_ID in a sorted fashion into a temporary table, and having an additional column (an IDENTITY column) in the temporary table that would be used the same way ID is being used in the above query. When Vehicle_ID changed it should also indicate a new trip -- I am guessing, anyway.

|||

In response to DanR1's post, just be aware that joining onto an identity value + 1 (and incorporating the VehicleID) will fail where either the VehicleID changes between adjacent rows or where there are gaps between identity values (where rows have been deleted or transactions containing inserts not committed).

If your data looks like either of the two examples below then this method will fail.

Chris

ID VehicleID Date/Time
333 212 2006-1-12 10:45:33
334 212 2006-1-12 10:45:48
335 212 2006-1-12 10:46:32
336 212 2006-1-12 10:47:43
337 212 2006-1-12 10:47:52
<< We want to identify and generate a new trip ID here in trip table (>15mins interval) >>
338 213 2006-1-12 17:23:33
339 212 2006-1-12 17:23:35
340 212 2006-1-12 17:23:58
341 212 2006-1-12 17:24:33

or

ID VehicleID Date/Time
333 212 2006-1-12 10:45:33
334 212 2006-1-12 10:45:48
335 212 2006-1-12 10:46:32
336 212 2006-1-12 10:47:43
337 212 2006-1-12 10:47:52
<< We want to identify and generate a new trip ID here in trip table (>15mins interval) >>
339 212 2006-1-12 17:23:33
340 212 2006-1-12 17:23:35
341 212 2006-1-12 17:23:58
342 212 2006-1-12 17:24:33

|||

Hi Chris,

I have tested out your sql statement.

It is working but its not exactly what we want to do.

We need to compare row 1 with row 2 and check if there is a gap of 15 mins

If there is we want to record that ( Instead of Checking row 1 against every record in the table)

However if there is a gap of less then 15 mins then we want to move on and query row 2 with row 3 and check if there is gap of 15 mins. And so on til we get to the end where there is no row coming after the current row.

Each time there is gap of 15 mins or greater between the current row and the next row we want to record it in our Trip table

ID VehicleID Date/Time
333 353864504523 2006-1-12 10:35:33
334 353864504523 2006-1-12 10:35:48
335 353864504523 2006-1-12 10:36:32
336 353864504523 2006-1-12 10:37:43
337 353864504523 2006-1-12 10:38:52
338 353864504523 2006-1-12 10:39:33
339 353864504523 2006-1-12 10:40:48 There is a gap of 15 mins between these 2 entries.
340 353864504523 2006-1-12 10:55:32
341 353864504523 2006-1-12 10:55:43
342 353864504523 2006-1-12 10:56:52

No comments:

Post a Comment