Sunday, February 19, 2012

committing dirty pages, recovery interval, checkpoint, etc.

Please post any responses to the group, not my e-mail...
I have a sql2k database accessed by a public consumer web site. SQL Server
has 2 GB allocated to it, recovery interval is set to 0, and logs are backed
up every 20 minutes. I have noticed that SQL Server seems to commit dirty
pages at most once during that 20 minutes before committing when the
database is checkpointed as part of the log backup, and more often only at
the checkpoint. The subsequent burst of write activity lasts as long as 30
seconds and causes a lot of queuing in the device driver (the disks are two
RAID 5 sets on a Dell PERC 2 controller). This activity causes slowness for
the consumer site, sometimes to the point of causing timeouts in OLE DB. We
bumped up the timeout in the ASP code, which decreases the number of
timeouts, but of course the consumer still has to be patient and wait for
the response.
Is this "typical"? I would prefer to spread the writes out more evenly
across the 20 minutes. I am considering checkpointing the database from a
process to run every 2 or 4 minutes. Would that be reasonable? If I set
the recovery interval to some non-zero value, might SQL Server commit dirty
pages more frequently, or would that just encourage it to commit less
frequently (which is what the docs lead me to believe)? Note that adding
disks to the RAID set and/or reconfiguring as RAID 10 is not currently an
option.
Thanks.
--
Scott NicholThe default setting of Recovery Interval in Minutes of 0 (for SQL 2k) means
that a checkpoint is done about every minute... You should be seeing page
flushing more frequently than every 20 minutes...
--
Wayne Snyder, MCDBA, SQL Server MVP
Computer Education Services Corporation (CESC), Charlotte, NC
www.computeredservices.com
(Please respond only to the newsgroups.)
I support the Professional Association of SQL Server (PASS) and it's
community of SQL Server professionals.
www.sqlpass.org
"Scott Nichol" <reply_to_newsgroup@.scottnichol.com> wrote in message
news:OIHv3WzlDHA.3024@.tk2msftngp13.phx.gbl...
> Please post any responses to the group, not my e-mail...
> I have a sql2k database accessed by a public consumer web site. SQL
Server
> has 2 GB allocated to it, recovery interval is set to 0, and logs are
backed
> up every 20 minutes. I have noticed that SQL Server seems to commit dirty
> pages at most once during that 20 minutes before committing when the
> database is checkpointed as part of the log backup, and more often only at
> the checkpoint. The subsequent burst of write activity lasts as long as
30
> seconds and causes a lot of queuing in the device driver (the disks are
two
> RAID 5 sets on a Dell PERC 2 controller). This activity causes slowness
for
> the consumer site, sometimes to the point of causing timeouts in OLE DB.
We
> bumped up the timeout in the ASP code, which decreases the number of
> timeouts, but of course the consumer still has to be patient and wait for
> the response.
> Is this "typical"? I would prefer to spread the writes out more evenly
> across the 20 minutes. I am considering checkpointing the database from a
> process to run every 2 or 4 minutes. Would that be reasonable? If I set
> the recovery interval to some non-zero value, might SQL Server commit
dirty
> pages more frequently, or would that just encourage it to commit less
> frequently (which is what the docs lead me to believe)? Note that adding
> disks to the RAID set and/or reconfiguring as RAID 10 is not currently an
> option.
> Thanks.
> --
> Scott Nichol
>|||Check the transactions which access the DB. Break
some real long transactions into smaller ones.
>--Original Message--
>Please post any responses to the group, not my e-mail...
>I have a sql2k database accessed by a public consumer web
site. SQL Server
>has 2 GB allocated to it, recovery interval is set to 0,
and logs are backed
>up every 20 minutes. I have noticed that SQL Server
seems to commit dirty
>pages at most once during that 20 minutes before
committing when the
>database is checkpointed as part of the log backup, and
more often only at
>the checkpoint. The subsequent burst of write activity
lasts as long as 30
>seconds and causes a lot of queuing in the device driver
(the disks are two
>RAID 5 sets on a Dell PERC 2 controller). This activity
causes slowness for
>the consumer site, sometimes to the point of causing
timeouts in OLE DB. We
>bumped up the timeout in the ASP code, which decreases
the number of
>timeouts, but of course the consumer still has to be
patient and wait for
>the response.
>Is this "typical"? I would prefer to spread the writes
out more evenly
>across the 20 minutes. I am considering checkpointing
the database from a
>process to run every 2 or 4 minutes. Would that be
reasonable? If I set
>the recovery interval to some non-zero value, might SQL
Server commit dirty
>pages more frequently, or would that just encourage it to
commit less
>frequently (which is what the docs lead me to believe)?
Note that adding
>disks to the RAID set and/or reconfiguring as RAID 10 is
not currently an
>option.
>Thanks.
>--
>Scott Nichol
>
>.
>|||Yes, the docs do say that a recovery interval of 0 should generate a
"checkpoint approximately every one minute for active databases". It also
says
The frequency of checkpoints in each database depends on the amount of data
modifications made, not on any time-based measure. A database used primarily
for read-only operations will not have many checkpoints. A transaction
database will have frequent checkpoints.
<<<<
While I would not characterize the database as "primarily for read-only
operations", the page cache is quite large compared to the data actually
being accessed in a 20 minute period.
One other observation that I forgot to mention. When an administrative user
runs a report that requires one or more queries that read many pages, I see
a fair number of physical writes that can only be attributed to dirty pages
being written.
--
Scott Nichol
"Wayne Snyder" <wsnyder@.computeredservices.com> wrote in message
news:eAqEFszlDHA.1960@.TK2MSFTNGP12.phx.gbl...
> The default setting of Recovery Interval in Minutes of 0 (for SQL 2k)
means
> that a checkpoint is done about every minute... You should be seeing page
> flushing more frequently than every 20 minutes...
> --
> Wayne Snyder, MCDBA, SQL Server MVP
> Computer Education Services Corporation (CESC), Charlotte, NC
> www.computeredservices.com
> (Please respond only to the newsgroups.)
> I support the Professional Association of SQL Server (PASS) and it's
> community of SQL Server professionals.
> www.sqlpass.org
>
> "Scott Nichol" <reply_to_newsgroup@.scottnichol.com> wrote in message
> news:OIHv3WzlDHA.3024@.tk2msftngp13.phx.gbl...
> > Please post any responses to the group, not my e-mail...
> >
> > I have a sql2k database accessed by a public consumer web site. SQL
> Server
> > has 2 GB allocated to it, recovery interval is set to 0, and logs are
> backed
> > up every 20 minutes. I have noticed that SQL Server seems to commit
dirty
> > pages at most once during that 20 minutes before committing when the
> > database is checkpointed as part of the log backup, and more often only
at
> > the checkpoint. The subsequent burst of write activity lasts as long as
> 30
> > seconds and causes a lot of queuing in the device driver (the disks are
> two
> > RAID 5 sets on a Dell PERC 2 controller). This activity causes slowness
> for
> > the consumer site, sometimes to the point of causing timeouts in OLE DB.
> We
> > bumped up the timeout in the ASP code, which decreases the number of
> > timeouts, but of course the consumer still has to be patient and wait
for
> > the response.
> >
> > Is this "typical"? I would prefer to spread the writes out more evenly
> > across the 20 minutes. I am considering checkpointing the database from
a
> > process to run every 2 or 4 minutes. Would that be reasonable? If I
set
> > the recovery interval to some non-zero value, might SQL Server commit
> dirty
> > pages more frequently, or would that just encourage it to commit less
> > frequently (which is what the docs lead me to believe)? Note that
adding
> > disks to the RAID set and/or reconfiguring as RAID 10 is not currently
an
> > option.
> >
> > Thanks.
> >
> > --
> > Scott Nichol
> >
> >
>|||I do not have any problem with long running transactions; DBCC OPENTRAN
typically shows nothing. Probably one time in twenty it shows something,
but running it again immediately shows no open transactions again.
--
Scott Nichol
"anonymous" <anonymous@.discussions.microsoft.com> wrote in message
news:024e01c39741$52c7a570$a401280a@.phx.gbl...
> Check the transactions which access the DB. Break
> some real long transactions into smaller ones.
> >--Original Message--
> >Please post any responses to the group, not my e-mail...
> >
> >I have a sql2k database accessed by a public consumer web
> site. SQL Server
> >has 2 GB allocated to it, recovery interval is set to 0,
> and logs are backed
> >up every 20 minutes. I have noticed that SQL Server
> seems to commit dirty
> >pages at most once during that 20 minutes before
> committing when the
> >database is checkpointed as part of the log backup, and
> more often only at
> >the checkpoint. The subsequent burst of write activity
> lasts as long as 30
> >seconds and causes a lot of queuing in the device driver
> (the disks are two
> >RAID 5 sets on a Dell PERC 2 controller). This activity
> causes slowness for
> >the consumer site, sometimes to the point of causing
> timeouts in OLE DB. We
> >bumped up the timeout in the ASP code, which decreases
> the number of
> >timeouts, but of course the consumer still has to be
> patient and wait for
> >the response.
> >
> >Is this "typical"? I would prefer to spread the writes
> out more evenly
> >across the 20 minutes. I am considering checkpointing
> the database from a
> >process to run every 2 or 4 minutes. Would that be
> reasonable? If I set
> >the recovery interval to some non-zero value, might SQL
> Server commit dirty
> >pages more frequently, or would that just encourage it to
> commit less
> >frequently (which is what the docs lead me to believe)?
> Note that adding
> >disks to the RAID set and/or reconfiguring as RAID 10 is
> not currently an
> >option.
> >
> >Thanks.
> >
> >--
> >Scott Nichol
> >
> >
> >.
> >|||> The subsequent burst of write activity lasts as long as 30
> seconds and causes a lot of queuing in the device driver (the disks are two
> RAID 5 sets on a Dell PERC 2 controller).
Having the tlog on RAID5 is not efficient at all. I suggest you investigate moving the tlog to RAID1
or 01/10.
The default setting for recovery interval is based on SQL Server not having to spend more than 1
minute to do recovery for each database at startup. As I remember it. You could try to set it to
some other value, but as the config value is in minutes, I don't see how you could lower that value.
I might have forget a bit about the changes in this setting since 6.5, so please read BOL closely on
this.
Doing regular CHECKPOINT is certainly something you can try. But I would try to "solve" the problem
instead, by having a dedicated (set of) disk(s) for the tlog.
--
Tibor Karaszi, SQL Server MVP
Archive at: http://groups.google.com/groups?oi=djq&as ugroup=microsoft.public.sqlserver
"Scott Nichol" <reply_to_newsgroup@.scottnichol.com> wrote in message
news:OIHv3WzlDHA.3024@.tk2msftngp13.phx.gbl...
> Please post any responses to the group, not my e-mail...
> I have a sql2k database accessed by a public consumer web site. SQL Server
> has 2 GB allocated to it, recovery interval is set to 0, and logs are backed
> up every 20 minutes. I have noticed that SQL Server seems to commit dirty
> pages at most once during that 20 minutes before committing when the
> database is checkpointed as part of the log backup, and more often only at
> the checkpoint. The subsequent burst of write activity lasts as long as 30
> seconds and causes a lot of queuing in the device driver (the disks are two
> RAID 5 sets on a Dell PERC 2 controller). This activity causes slowness for
> the consumer site, sometimes to the point of causing timeouts in OLE DB. We
> bumped up the timeout in the ASP code, which decreases the number of
> timeouts, but of course the consumer still has to be patient and wait for
> the response.
> Is this "typical"? I would prefer to spread the writes out more evenly
> across the 20 minutes. I am considering checkpointing the database from a
> process to run every 2 or 4 minutes. Would that be reasonable? If I set
> the recovery interval to some non-zero value, might SQL Server commit dirty
> pages more frequently, or would that just encourage it to commit less
> frequently (which is what the docs lead me to believe)? Note that adding
> disks to the RAID set and/or reconfiguring as RAID 10 is not currently an
> option.
> Thanks.
> --
> Scott Nichol
>|||"Tibor Karaszi" <tibor.please_reply_to_public_forum.karaszi@.cornerstone.se>
wrote in message news:OorWq75lDHA.2200@.TK2MSFTNGP12.phx.gbl...
> > The subsequent burst of write activity lasts as long as 30
> > seconds and causes a lot of queuing in the device driver (the disks are
two
> > RAID 5 sets on a Dell PERC 2 controller).
> Having the tlog on RAID5 is not efficient at all. I suggest you
investigate moving the tlog to RAID1
> or 01/10.
-- snip --
> Doing regular CHECKPOINT is certainly something you can try. But I would
try to "solve" the problem
> instead, by having a dedicated (set of) disk(s) for the tlog.
Sorry, I was not clear about this. The transaction log is on a RAID 1 set.
It is written to consistently during the 20 minutes. It is only the dirty
data pages that are written "all at once" when the log is backed up.
Scott Nichol

No comments:

Post a Comment