Locks held on Oracle for hours after sessions are abnormally terminated by node failure

If a session holds database locks and is abnormally terminated (the client never sends a FIN/ACK), the locks persist until the database session is closed. With the default Linux tcp_keepalive settings this takes roughly 2 hours and 11 minutes (7200 s + 9 × 75 s = 7875 s). Abnormal termination does not include Ctrl-C or kill -9: the ojdbc6 driver apparently registers a shutdown hook thread that closes its connections during a graceful shutdown, and the OS apparently closes a process's connections when the process is killed. Abnormal termination can include power failure, firewall failure, out-of-memory conditions, an OS/kernel crash, network connection failure, or failure of a JBoss or other server node.

This was easily reproduced by creating a process that connected to the database and updated a record without committing. While the first process was waiting, a second process issued a contending update, which blocked on the first one. The first client's network cable was then disconnected. The second client's transaction waited for approximately 2 hours, until the TCP keepalive timeouts on the database server's OS expired and terminated the connection, releasing the first connection's locks. At that point the second process's transaction proceeded.

Oracle also has automatic hang resolution, which can terminate blocking sessions. This did not occur in the test, which had only a single blocker. An alert log message similar to the following was observed, and a corresponding .trc file was generated:

Errors in file /u01/app/ora/db/diag/rdbms/orac01/ORAC010/trace/ORAC010_dia0_32169.trc  (incident=30084):
ORA-32701: Possible hangs up to hang ID=0 detected
Incident details in: /u01/app/ora/db/diag/rdbms/orac01/ORAC010/incident/incdir_40082/ORAC010_dia0_32169_i30084.trc
DIA0 terminating blocker (ospid: 12451 sid: 312 ser#: 23976) of hang with ID = 15
    requested by master DIA0 process on instance 1
    Hang Resolution Reason: Automatic hang resolution was performed to free a
    significant number of affected sessions.
    by terminating session sid: 312 ospid: 12451
DIA0 successfully terminated session sid:515 ospid:12451 with status 0.
Thu Oct 01 17:12:39 2012


The default Linux tcp_keepalive values are:

$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200 (seconds of idle time before the first probe: 2 hours)
$ cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75 (seconds between probes)
$ cat /proc/sys/net/ipv4/tcp_keepalive_probes
9 (unacknowledged probes before the connection is dropped)
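The worst-case delay follows from these three values: the kernel waits tcp_keepalive_time seconds of idleness, then sends up to tcp_keepalive_probes probes spaced tcp_keepalive_intvl seconds apart before dropping the connection. Below is a minimal shell sketch of that arithmetic, assuming a Linux host that exposes these /proc entries and a socket with SO_KEEPALIVE enabled. The function names keepalive_timeout and format_hms are hypothetical helpers, and the script falls back to the default values listed above when /proc is not readable.

```shell
#!/bin/sh
# Rough worst-case time (in seconds) before Linux drops an idle TCP connection
# whose peer has silently vanished, assuming SO_KEEPALIVE is enabled:
#   tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl
keepalive_timeout() {
    # $1 = tcp_keepalive_time, $2 = tcp_keepalive_intvl, $3 = tcp_keepalive_probes
    echo $(( $1 + $3 * $2 ))
}

# Format a number of seconds as "Hh Mm Ss".
format_hms() {
    echo "$(( $1 / 3600 ))h $(( $1 % 3600 / 60 ))m $(( $1 % 60 ))s"
}

# Read the current kernel settings when available; otherwise use the defaults.
dir=/proc/sys/net/ipv4
if [ -r "$dir/tcp_keepalive_time" ]; then
    t=$(cat "$dir/tcp_keepalive_time")
    i=$(cat "$dir/tcp_keepalive_intvl")
    p=$(cat "$dir/tcp_keepalive_probes")
else
    t=7200; i=75; p=9
fi

total=$(keepalive_timeout "$t" "$i" "$p")
echo "dead peer detected after about ${total}s ($(format_hms "$total"))"
```

With the default values (7200, 75, 9) the total is 7875 s, or 2h 11m 15s, which matches the roughly two-hour wait seen in the test.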

How to set options
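On Linux, the three values can be changed with sysctl on the database server host, since it was the server OS's keepalive timers that eventually released the locks in the test. A minimal sketch, assuming root access; the values 600/60/5 are purely illustrative rather than a recommendation, and they apply host-wide to every socket that has SO_KEEPALIVE enabled.

```shell
# Illustrative values only: first probe after 10 minutes idle, then one probe
# every 60 s, dropping the connection after 5 unanswered probes
# (600 + 5 * 60 = 900 s, i.e. 15 minutes instead of about 2h 11m).
sysctl -w net.ipv4.tcp_keepalive_time=600
sysctl -w net.ipv4.tcp_keepalive_intvl=60
sysctl -w net.ipv4.tcp_keepalive_probes=5

# Settings made with sysctl -w do not survive a reboot. To persist them, add
# these lines to /etc/sysctl.conf and reload it with: sysctl -p
#   net.ipv4.tcp_keepalive_time = 60
#   net.ipv4.tcp_keepalive_intvl = 60
#   net.ipv4.tcp_keepalive_probes = 5
```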

