Standby Redo Logs

February 25, 2012

On the primary database, Oracle Data Guard uses the Log Writer process (LGWR), the Archiver process (ARCH), or the Log Writer Network Server process (LNSn) to collect transaction redo data and ship it to the standby. On the standby database, Data Guard uses the Remote File Server (RFS) process to receive the redo records from the primary database and the Managed Recovery Process (MRP) to apply the redo to the physical standby database.

I have often been asked why we need standby redo log files at the standby site. Here is my attempt to explain.

Without standby redo log files at the standby site, redo is shipped to the standby only after an archived log file has been generated on the primary. The MRP then reads the archived log and applies the redo data to the physical standby.

Without standby redo log files at the standby site, if the primary instance crashes or is lost, the “current” redo log (as written by the LGWR process) stays at the primary site. Because no archived log was generated on the primary for the data in that redo log file, the data is never applied on the standby, and it is lost permanently if the primary site is lost.

To avoid such scenarios, standby redo log files at the standby site come to our rescue.

Data Guard writes the primary’s current redo to a “standby redo log”, allowing complete recovery if the primary site is lost. Standby redo log files can be written using synchronous (LGWR SYNC) or asynchronous (LGWR ASYNC) redo transmission.
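As a minimal sketch of what this looks like in practice, standby redo log groups can be added on the standby database with ALTER DATABASE ADD STANDBY LOGFILE. The group numbers, file paths, and the 50M size below are assumptions for illustration only; in a real configuration they would be chosen to match the primary's online redo logs.

```sql
-- Run on the standby database.
-- Group numbers, paths and size are hypothetical placeholders.
ALTER DATABASE ADD STANDBY LOGFILE GROUP 4 ('/u01/oradata/stdbydb/srl04.log') SIZE 50M;
ALTER DATABASE ADD STANDBY LOGFILE GROUP 5 ('/u01/oradata/stdbydb/srl05.log') SIZE 50M;

-- Confirm that the standby redo log groups exist and see which one is in use.
SELECT group#, thread#, sequence#, status
FROM   v$standby_log;
```

A group that is currently receiving redo from the primary should show a status of ACTIVE in v$standby_log; idle groups show UNASSIGNED.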

Standby redo logs must be archived on the standby database before the data can be applied to the standby database (unless real-time apply, covered later in this post, is enabled). The standby archival operation occurs automatically, even if the standby database is not in ARCHIVELOG mode. However, the archiver process (ARCn) must be started on the standby database.
The SYNC attribute performs all network I/O synchronously, in conjunction with each write operation to the online redo log file, and waits for the network I/O to complete.
The ASYNC attribute performs all network I/O asynchronously and control is returned to the executing application or user immediately, without waiting for the network I/O to complete.
If you configure a destination to use the LGWR process, but for some reason the LGWR process becomes unable to archive to the destination, then redo transport will revert to using the ARCn process to complete archival operations.

Take note of the following important points to understand when configuring the standby database:

The standby database that is used to satisfy the minimum requirements for a given protection mode must be enabled and ready to receive redo data from the primary database before you can switch to that mode.

When archiving to a physical standby destination using the LGWR process, changes (transactions) being made and committed on the primary database are not instantly written to the actual database files on the standby database. In Oracle9i Release 2 and higher, when log transport services is configured for Maximum Availability mode or Maximum Protection mode, the LGWR process on the primary database will send redo data to the standby redo logs (located on the standby database) at the same time it is writing redo data to the local (online) redo logs. Keep in mind that the LGWR process is actually communicating with a Remote File Server (RFS) process on the standby database server. This RFS process on the standby database is responsible for capturing and writing the redo data it obtains from the primary database to the standby redo logs (LGWR) or the standby archived redo logs (ARCn).

The Remote File Server process runs on the standby database and can receive redo data over the network from both LGWR and ARCn. The RFS process will write the redo data it receives to either a standby redo log or to a standby archived redo log.

When a log switch occurs on the primary database, a log switch is also triggered on the standby database where the ARCH process then archives the standby redo logs to the archive destination specified on the standby database. After the archival process has completed on the standby database, the Managed Recovery Process (MRP) then writes the changes to the actual database files from the archived redo log files.
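One way to watch this archive-then-apply sequence is to query v$archived_log on the standby. This is only a sketch; the column selection is a choice made for illustration.

```sql
-- Run on the standby database.
-- Lists the archived logs registered on the standby and whether they have been applied.
SELECT sequence#, first_time, next_time, applied
FROM   v$archived_log
ORDER  BY sequence#;
```

Sequences that have arrived but still show APPLIED = NO are waiting for the MRP.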

Why is this important to point out? It illustrates the fact that the actual changes (transactions) being made and committed on the primary database do indeed make it over to the standby database, but get applied to the standby redo logs.

In Oracle9i, these changes are only made to the actual database files on the standby database when a log switch occurs on the primary.

In Oracle Database 10g, a new feature called real-time apply can be enabled which tells log apply services to apply redo data to the database files on the standby database as it is received, without waiting for the current standby redo log file to be archived. This results in faster switchover and failover times because the standby redo log files have been applied already to the standby database by the time the failover or switchover begins.
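A minimal sketch of enabling real-time apply on a physical standby follows. It assumes standby redo logs are already configured and that managed recovery is currently running (the CANCEL statement raises an error if it is not).

```sql
-- Run on the physical standby (Oracle Database 10g or later).
-- Stop the current managed recovery session, if one is running.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

-- Restart managed recovery with real-time apply: redo is applied from the
-- current standby redo log without waiting for it to be archived.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  USING CURRENT LOGFILE DISCONNECT FROM SESSION;
```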

On the primary location, log transport services use the following processes:
- Log writer process (LGWR) – This process collects transaction redo and updates the online redo logs.
- Archiver process (ARCn) – These processes create a copy of the online redo logs, either locally or remotely for standby databases.
- Fetch archive log (FAL) process (physical standby databases only) – This process provides a client/server mechanism for resolving gaps detected in the range of archived redo logs generated at the primary database and received at the standby database. The FAL client requests the transfer of archived redo log files automatically when it detects a gap in the redo logs received by the standby database. The FAL server typically runs on the primary database and services the FAL requests coming from the FAL client. The FAL client and server are configured using the FAL_CLIENT and FAL_SERVER initialization parameters which are set on the standby location.
On the standby location, log transport services use the following processes:
- Remote file server (RFS) – This process receives redo logs from the primary database.
- Archiver process (ARC) – This process archives the standby redo logs when standby redo logs and LGWR are used.
On the standby location, log apply services use the following processes:
- Managed recovery process (MRP) – For physical standby databases only, the MRP applies archived redo log information to the physical standby database.
- Logical standby process (LSP) – For logical standby databases only, the LSP applies archived redo log information to the logical standby database, using SQL interfaces.
On the primary and standby locations, the Data Guard broker uses the following processes:
- Data Guard broker monitor (DMON) process – These processes work cooperatively to manage the primary and standby databases as a unified configuration. The DMON processes work together to execute switchover and failover operations, monitor the status of the databases, and manage log transport services and log apply services.
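For the FAL mechanism described in the list above, the two initialization parameters could be set on the standby roughly as follows. The names primdb and stdbydb are assumed Oracle Net service names, used here only for illustration, and SCOPE = BOTH assumes the instance uses an spfile.

```sql
-- Run on the standby database.
-- 'primdb' and 'stdbydb' are hypothetical Oracle Net service names.
ALTER SYSTEM SET fal_server = 'primdb'  SCOPE = BOTH;
ALTER SYSTEM SET fal_client = 'stdbydb' SCOPE = BOTH;

-- Check whether the standby currently sees a gap in the archived redo logs.
SELECT thread#, low_sequence#, high_sequence#
FROM   v$archive_gap;
```

If v$archive_gap returns no rows, no gap is currently detected.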

“Asynchronous Redo Transmission” uses a new background process called LNSn.

Asynchronous redo transmission using the log writer process (LGWR ASYNC) has been improved to reduce the performance impact on the primary database. During asynchronous redo transmission, the network server (LNSn) process transmits redo data out of the online redo log files on the primary database and no longer interacts directly with the log writer process.

This change in behavior allows the log writer process to write redo data to the current online redo log file and continue processing the next request without waiting for inter-process communication or network I/O to complete.

LGWR – Specifies that LGWR rather than ARCH is responsible for transmitting redo logs to the standby. This allows redo records generated on the primary to be transmitted at the record-level, allowing for minimal data loss. Otherwise, using ARCH, a redo log switch needs to occur so the redo log can be archived and transmitted to the standby.
ASYNC=20480 – When using the primary database log writer process to archive redo logs, you can specify synchronous (SYNC) or asynchronous (ASYNC) network transmission of redo logs to archiving destinations. With ASYNC, control will be returned to the application processes immediately, even if the data has not reached the destination. This mode has a reasonable degree of data protection on the destination database, with minimal performance effect on the primary database. In general, for slower network connections, use larger block counts. ASYNC=20480 indicates to transmit the SGA network buffer in 20480 512-byte blocks. In Maximum Performance mode, this 10MB buffer size (the largest allowed) performs best in a WAN. (In a LAN ASYNC buffer size does not impact primary database throughput). Also, in a WAN, using the maximum buffer size reduces “Timing out” messages due to an async buffer full condition. This is because the smaller the buffer, the more the chance of the buffer filling up as latency increases.[1]
NOAFFIRM – Specifies to perform asynchronous log archiving disk write I/O operations on the standby database. It is not necessary for the primary database to receive acknowledgment of the availability of the modifications on the standby database in a Maximum Performance environment. This attribute applies to local and remote archive destination disk I/O operations, and to standby redo log disk write I/O operations. However, the NOAFFIRM attribute has no effect on primary database online redo log disk I/O operations.
NET_TIMEOUT=30 – Designates that if there is no reply for a network operation within 30 seconds, then the network server errors out due to the network timeout instead of stalling for the default network timeout period (TCP timeout value). A NET_TIMEOUT of 30 seconds here provided enough cushion to accommodate the latency during peak redo traffic through the dedicated NIC on the WAN.
REOPEN=15 MAX_FAILURE=10 – Denotes that if there is a connection failure, the network server reopens the connection after 15 seconds and retries up to 10 times. The maximum retry time for all failed operations is calculated as REOPEN multiplied by MAX_FAILURE, or 150 seconds (2.5 minutes).
DELAY=30 – Specifies that recovery apply is delayed for 30 minutes from the time the log is archived on the physical standby, but the redo transfer to the standby is not delayed. The correct recovery delay is important in ensuring that a user error or corruption does not get propagated to the standby database, which would compromise your disaster recovery solution. The recovery delay setting is critical for standby configurations regardless of the protection mode. The delay allows the managed recovery process (MRP) on the standby database to intentionally lag behind in applying archived redo log files. Without a recovery delay, when the standby database is in managed recovery mode, archived redo is automatically applied upon a log switch. Reducing the delay time reduces standby recovery time due to the reduced number of archived redo log files required for standby recovery. But a short delay time is possible only if you have a monitoring infrastructure that detects problems and stops the standby database within that timeframe. In the case of this client, OEM Data Guard Manager events monitored the configuration tightly enough to allow for a 30-minute delay.
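Put together, the attributes described above would appear in a single LOG_ARCHIVE_DEST_n setting on the primary, roughly as sketched here. The destination number 2 and the service name stdbydb are assumptions for illustration, and whether every attribute is accepted in this combination depends on the database release.

```sql
-- Run on the primary database.
-- Destination number and SERVICE name are hypothetical; the attribute values
-- are the ones discussed in this post.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=stdbydb LGWR ASYNC=20480 NOAFFIRM NET_TIMEOUT=30 REOPEN=15 MAX_FAILURE=10 DELAY=30'
  SCOPE = BOTH;

ALTER SYSTEM SET log_archive_dest_state_2 = ENABLE SCOPE = BOTH;

-- Check the destination status and the last error reported for it, if any.
SELECT dest_id, status, error
FROM   v$archive_dest
WHERE  dest_id = 2;
```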

A protection mode is set only on the primary database. It defines whether Oracle Data Guard optimizes the configuration for performance, availability, or protection, and so determines the maximum amount of data loss that can occur when the primary database or site fails.

Set Appropriate Database Protection Mode

You can choose between three protection modes, with different logging options. Each mode for your environment has a different impact on availability, costs, data loss, performance, and scalability. Choose one of the following depending on your service level agreements:

Maximum Protection mode with LGWR SYNC AFFIRM option for an environment that requires no data loss and no divergence. Performance overhead is incurred.

This mode offers the highest level of data protection. Data is synchronously transmitted to the standby database from the primary database and transactions are not committed on the primary database unless the redo data is available on at least one standby database configured in this mode. If the last standby database configured in this mode becomes unavailable, processing stops on the primary database. This mode ensures no-data-loss.

Maximum Availability mode with LGWR SYNC AFFIRM option for an environment that needs no data loss but tolerates divergence when sites are temporarily inaccessible.

This mode is similar to the maximum protection mode, including zero data loss. However, if a standby database becomes unavailable (for example, because of network connectivity problems), processing continues on the primary database as in Maximum Performance mode (that is, archived logs are shipped).

When the fault is corrected, the standby database is automatically resynchronized with the primary database.

However, if the fault on the standby is not corrected and the primary site is lost while the standby is still unavailable, data may be lost at the standby site. How much is lost depends on which archived logs and redo logs of the primary site are still available.

The standby can be synchronized and recovered by restoring the archived logs available from backups if the primary is lost completely. The problem lies with the online redo logs: the transactions they contain are lost along with the primary site. To avoid that, SRDF mirroring technology can be used to mirror all redo logs to different storage.

SRDF (Symmetrix Remote Data Facility) is a family of EMC products that facilitates the data replication from one Symmetrix storage array to another through a Storage Area Network or IP network.

Maximum Performance mode with the ARCH or LGWR ASYNC (AFFIRM or NOAFFIRM) option for an environment that tolerates minimal data loss and divergence when sites are temporarily inaccessible. Performance overhead is minimized.

This mode offers slightly less data protection on the primary database, but higher performance than maximum availability mode. In this mode, as the primary database processes transactions, redo data is asynchronously shipped to the standby database. The commit operation of the primary database does not wait for the standby database to acknowledge receipt of redo data before completing write operations on the primary database. If any standby destination becomes unavailable, processing continues on the primary database and there is little effect on primary database performance.

The only difference between the Maximum Protection and Maximum Performance configuration is whether LGWR writes synchronously or asynchronously, respectively. For the environment presented here, Maximum Performance mode with LGWR ASYNC NOAFFIRM was chosen based upon client requirements.
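As a sketch, the protection mode is changed on the primary with ALTER DATABASE SET STANDBY DATABASE and can be checked in v$database. MAXIMIZE AVAILABILITY is used here only as an example; depending on the release, the primary may need to be mounted rather than open when the mode is raised.

```sql
-- Run on the primary database.
-- Valid keywords are PROTECTION, AVAILABILITY and PERFORMANCE.
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;

-- PROTECTION_MODE is the configured mode; PROTECTION_LEVEL is the level
-- currently in effect.
SELECT protection_mode, protection_level
FROM   v$database;
```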

Disable ARCHIVE_LAG_TARGET

The initialization parameter ARCHIVE_LAG_TARGET limits the amount of data that can be lost and can effectively increase the availability of the standby database by forcing a log switch after a user-specified time period elapses. As with the Data Guard environment here, you would be better off disabling this time-based thread advance feature by setting it to zero to eliminate archive log switches based on time. Instead, as with any database, size redo logs such that log switches occur frequently enough to meet requirements for maximum allowable loss of data.
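A minimal sketch of disabling the parameter and then checking how often log switches actually occur, so that redo log sizing can be judged against the allowable data loss. The hourly grouping is just one convenient way to summarize the switch rate.

```sql
-- Disable time-based log switches (0 turns the feature off).
ALTER SYSTEM SET archive_lag_target = 0 SCOPE = BOTH;

-- Confirm the current setting.
SELECT name, value
FROM   v$parameter
WHERE  name = 'archive_lag_target';

-- Count log switches per hour from the log history.
SELECT TRUNC(first_time, 'HH24') AS switch_hour,
       COUNT(*)                  AS log_switches
FROM   v$log_history
GROUP  BY TRUNC(first_time, 'HH24')
ORDER  BY switch_hour;
```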

LGWR / ASYNC vs. ARCH

When configuring log transport services to use LGWR and remotely archive in ASYNC mode, the LGWR process does not wait for each network I/O to complete before proceeding. This behavior is made possible by the use of an intermediate process, known as a Log Writer Network Server Process (LNS), which performs the actual network I/O and waits for each network I/O to complete. Each LNS has a user configurable buffer that is used to accept outbound redo data from the LGWR process. This is configured by specifying the size (in 512 byte blocks) on the ASYNC attribute in the archive log parameter for the standby destination service. For example, ASYNC=2048 indicates a 1MB buffer. As long as the LNS process is able to empty this buffer faster than the LGWR can fill it, the LGWR process will never stall. If the LNS cannot keep up, then the buffer will become full and the LGWR process will stall until either sufficient buffer space is freed up by a successful network transmission or a timeout occurs.
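The buffer-size arithmetic is simply the block count multiplied by 512 bytes. The query below works it out for ASYNC=2048 and for ASYNC=20480, the largest value mentioned in this post.

```sql
-- ASYNC is specified in 512-byte blocks; convert to megabytes.
SELECT 2048  * 512 / 1024 / 1024 AS buffer_mb_2048,
       20480 * 512 / 1024 / 1024 AS buffer_mb_20480
FROM   dual;
```

The two columns evaluate to 1 and 10, that is, a 1MB and a 10MB buffer.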

When configuring log transport services to remotely archive using the ARCH attribute, redo logs are transmitted to the destination only during an archival operation. This means that the standby database does not receive any redo data until the primary database fills its current online redo log and archives it. The data received and applied to the standby database is only as current as the last archived redo log sent from the primary database. The background archiver processes (ARCn) or a foreground archival operation serves as the redo log transport service. Using ARCH to remotely archive redo data does not impact the primary database throughput as long as enough redo log groups exist so that the most recently used group can be archived before it must be reopened.

More on LNS:

Question: I am running Oracle 10gR2 with a standby database on Data Guard using ASYNC redo log transport. I am seeing these LNS wait events in my AWR report:
Event                       Waits   %Time-outs   Total Wait Time (s)   Avg wait (ms)   Waits/txn
-----------------------   -------   ----------   -------------------   -------------   ---------
LNS wait on SENDREQ           312           .0                   197             632         0.0
LNS ASYNC end of log      143,963        100.0                 1,545              11         5.8

I have a slow network with an average packet time of 25 milliseconds with busy updates at a rate of 250 block changes per second. During the period of high LNS wait on SENDREQ, my database “hangs”, not allowing anybody to sign-in. What is a wait on the Log Network Server (LNS) process, and how do I fix this issue?

Answer: The LNS wait on SENDREQ can happen with either the SYNC or ASYNC LGWR attributes.

When using ASYNC transport mode in Oracle 10g r2 and beyond, Oracle recommends allowing for sufficient I/O bandwidth for LNS read I/Os to the online redo logs of the production database.

You can choose the LGWR attributes for synchronous (LogXptMode = ‘SYNC’) or asynchronous mode (LogXptMode = ‘ASYNC’).

You can dynamically change the LGWR attributes with the Data Guard broker (DGMGRL) “edit database” command:

EDIT DATABASE 'stdbydb' SET PROPERTY LogXptMode='SYNC';

EDIT DATABASE 'primdb' SET PROPERTY LogXptMode='ASYNC';

The LNS wait on SENDREQ occurs more frequently on a busy network, when the LNS process is stuck waiting for the RFS to send an ack from the standby server. The root cause of an LNS wait on SENDREQ is a busy network, and the wait is most likely to appear when the system is under high-volume updates (heavy DML).

Log transport steps

Here are the log transport steps and the wait associated with each. The first step is manifested as a “log file parallel write” wait, while all subsequent steps fall under “LNS wait on SENDREQ”:

0 – The LGWR process writes redo from the log buffer to the online redo log files. This can be seen as a “log file parallel write” wait event.

1 – Using an FTP-type mechanism, the LNS process detects the completion of a redo log being written to the local archived redo log directory.

2 – Once the redo has completed archiving, the LNS process communicates with the remote file server (RFS) process to manage the transport of the flat file to the remote standby server, waiting for a system acknowledgement (an “ack”) from the RFS process.

3 – Once the ack is received from standby, the LNS process notifies the log writer process (LGWR) and the commit occurs.

4 – Upon arrival at the standby server, the redo updates are applied to the standby database, thereby reproducing the updates from the originating server.

For more details on the LNS wait for SENDREQ, see MOSC note 233491.1.

Possible solutions for LNS wait on SENDREQ

Possible solutions for a hung database on LNS wait on SENDREQ include:

0 – Change net_timeout: If using SYNC mode in 10gR2 and beyond, consider reducing the value of the net_timeout attribute. The value of net_timeout indicates the number of seconds that LGWR on the production database waits for Oracle*Net to respond to a LGWR request.

1 – Try setting COMMIT WRITE NOWAIT. This will allow transactions to continue processing without waiting for LGWR to post a message back stating that all redo changes are on disk.

2 – Spread out the DML load over a longer period of time by adjusting the schedule of any batch jobs.

3 – Get a faster network transport such as dark fibre.

4 – Reduce the size of the online redo logs, thereby creating smaller archived redo logs. This will create more frequent, smaller redo log transports that will complete faster.

5 – Tune the TNS layer, perhaps trying the tcp.nodelay parameter in the protocol.ora file.
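For the COMMIT WRITE NOWAIT suggestion in the list above, a statement-level sketch looks like this. The orders table and its columns are hypothetical, used only to give the commit something to commit. Note the trade-off: with NOWAIT, a transaction reported as committed can still be lost if the instance fails before LGWR has written its redo to disk.

```sql
-- Hypothetical table and values, for illustration only.
UPDATE orders
SET    status   = 'SHIPPED'
WHERE  order_id = 1001;

-- Return control without waiting for LGWR to confirm that the redo is on disk.
COMMIT WRITE NOWAIT;
```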

How the LNS wait on SENDREQ occurs

The Data Guard redo log transport may be synchronous mode (LogXptMode = ‘SYNC’) or asynchronous mode (LogXptMode = ‘ASYNC’).

The Async LNS process is a background daemon process that periodically checks for new redo entries from the log writer (LGWR) process. The ASYNC wait is a normal idle time event when you are using Data Guard asynchronous redo log transport.

The Log Network Server (LNS) is a background process that manages all archived redo log flat file transports to a standby server. This query displays the Data Guard processes and their status from the v$managed_standby view:

    select
       process,
       status
    from
       v$managed_standby;

The “LNS wait on SENDREQ” is the sum of network time and RFS I/O time. To see the network time, take the value for LNS wait on SENDREQ and subtract the “RFS write” wait event time from the standby server. You can also use the UNIX/Linux netstat utility to measure packet shipment latency.
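The two figures used in that subtraction can be read from v$system_event on each side, as in the sketch below. TIME_WAITED is reported in hundredths of a second and is cumulative since instance startup, so the two numbers are comparable only when taken over matching intervals (for example, as deltas between two samples).

```sql
-- On the primary: cumulative time spent in "LNS wait on SENDREQ".
SELECT event, total_waits, time_waited
FROM   v$system_event
WHERE  event = 'LNS wait on SENDREQ';

-- On the standby: cumulative time spent in "RFS write".
SELECT event, total_waits, time_waited
FROM   v$system_event
WHERE  event = 'RFS write';
```

Subtracting the standby's RFS write time from the primary's SENDREQ time over the same interval gives an estimate of the network component.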

The Oracle documentation notes these wait events for destinations configured with the LGWR SYNC and ASYNC attributes.

Waits with the LGWR SYNC Attribute:

Wait Event	Monitors the Amount of Time Spent By . . .
LGWR wait on LNS	The LGWR process waiting to receive messages from the LNSn process.
LNS wait on ATTACH	All network servers to spawn an RFS connection.
LNS wait on SENDREQ	All network servers to write the received redo data to disk as well as open and close the remote archived redo log files.
LNS wait on DETACH	All network servers to delete an RFS connection.

Waits with the LGWR ASYNC Attribute:

Wait Event	Monitors the Amount of Time Spent By . . .
LNS wait on DETACH	All network servers to delete an RFS connection.
LNS wait on ATTACH	All network servers to spawn an RFS connection.
LNS wait on SENDREQ	All network servers to write the received redo data to disk as well as open and close the remote archived redo log files.
True ASYNC Control FileTXN Wait	The LNSn process to get hold of the control file transaction during its lifetime.
True ASYNC Wait for ARCH log	The LNSn process waiting to see the archived redo log (if the LNSn process is archiving a current log file and the log is switched out).
Waiting for ASYNC dest activation	The LNSn process waiting for an inactive destination to become active.
True ASYNC log-end-of-file wait	The LNSn process waiting for the next bit of redo after it has reached the logical end of file.

References:

http://docs.oracle.com/cd/B19306_01/server.102/b14239/log_transport.htm

http://docs.oracle.com/cd/B19306_01/server.102/b14239/log_arch_dest_param.htm#CACHDECE

http://docs.oracle.com/cd/B19306_01/server.102/b14239/scenarios.htm#i1074892

http://www.idevelopment.info/data/Oracle/DBA_tips/Data_Guard/DG_3.shtml

http://www.dba-oracle.com/t_lns_wait_on_sendreq.htm


4 Comments

Kumar Gumala: Good information.

kapil: Question: Why do standby redo log files get archived? What is the need to archive them when the archived logs can always be obtained from the primary?
