Many problems that arise from the database can be attributed to database transactions that have not been completed properly. This can cause the connection pool to fill to its limit and can cause other database operations, such as VACUUM, to fail to complete. Healthy pooled connections should read "idle" next to their process name, while transactions which have not yet completed usually read "idle in transaction". Other messages you are likely to see include "SELECT waiting", which is the most serious problem, because it means that an HTTP request was made but the SELECT to fetch the data never completed - probably the user got fed up with waiting and went away.
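
The same information can be read from inside the database. The query below is only a sketch, assuming the pre-9.2 pg_stat_activity layout used elsewhere on this page (procpid and current_query columns), where such backends report '<IDLE> in transaction' as their current query:

Code Block
-- count backends that are stuck "idle in transaction"
SELECT count(*)
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction';

-- list them, oldest first, to see how long each has been hanging around
SELECT procpid, usename, query_start, age(now(), query_start) AS idle_for
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start;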

...

So the objective is to get rid of these "idle in transaction" connections. There are two solutions, the first of which is just a workaround:

...

2) Find and fix the places in the code where transactions are not safely completed. This is surprisingly hard, because although the application framework initialises and closes transactions at start and finish, a variety of things can happen that cause the transaction to fail. Principally, exceptions that occur at the database level can have unfortunate consequences for the state of the transaction, so this needs to be looked at. In addition, any situation which causes a Context object to go out of scope (which can happen if additional Context objects are created in the course of processing) can leave a hanging transaction.


Determining locked tables

There is no obvious way of tracking down all these situations, but we can offer a query which will allow us to generate a "signature" for the requests that cause the problems. This is done using the following SQL:

Code Block

SELECT relation, transaction, pid, mode, granted, relname
FROM pg_locks
INNER JOIN pg_stat_user_tables ON pg_locks.relation = pg_stat_user_tables.relid
WHERE pg_locks.pid = '[pid]';

This will give us a list of the tables that a given process (identified by its pid) has locked and not released, which should hopefully allow us to identify which requests are causing the problem. If you run this query, you should be able to see whether the problem lies in your own code (accessing, for example, your own tables, or a set of tables that one of your operations uses) or whether it is a DSpace bug. If you can't debug it yourself, post this signature to the mailing list.

For example, this is an output from the above query:

...


relation | transaction | pid  | mode            | granted | relname
---------+-------------+------+-----------------+---------+----------------------
18034    |             | 4909 | AccessShareLock | t       | group2groupcache
18113    |             | 4909 | AccessShareLock | t       | metadatavalue
18046    |             | 4909 | AccessShareLock | t       | item
18008    |             | 4909 | AccessShareLock | t       | eperson
18248    |             | 4909 | AccessShareLock | t       | handle
18235    |             | 4909 | AccessShareLock | t       | epersongroup2eperson

So process 4909 has AccessShareLocks on 6 tables, and these 6 tables should help us identify where to look for the bug. In this case it is particularly unclear, but a process which uses groups, epersons, items, handles and metadata is probably a deposit process.
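
Since the query above has to be run once per pid, a variant (a sketch, not from the original page) that drops the WHERE clause and sorts by pid lets you read off the lock "signature" of every backend in one go:

Code Block
-- same join as above, but for all backends at once: each pid's block of rows
-- is its lock "signature"
SELECT pid, relname, mode, granted
FROM pg_locks
INNER JOIN pg_stat_user_tables ON pg_locks.relation = pg_stat_user_tables.relid
ORDER BY pid, relname;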

...

which produces something like:

...

datid | datname | procpid | usesysid | usename | current_query                             | query_start
------+---------+---------+----------+---------+-------------------------------------------+-------------------------------
17142 | dspace  | 1107    | 1        | dsprd   | <IDLE>                                    | 2007-06-11 11:21:36.147423+01
17142 | dspace  | 1401    | 1        | dsprd   | <IDLE>                                    | 2007-06-11 11:22:41.730301+01
17142 | dspace  | 25163   | 1        | dsprd   | <IDLE>                                    | 2007-06-11 11:21:36.097556+01
17142 | dspace  | 27973   | 1        | dsprd   | update history set creation_date = '...'  | 2007-06-11 11:25:39.687193+01
17142 | dspace  | 916     | 1        | dsprd   | <IDLE>                                    | 2007-06-11 11:21:32.759784+01

Combining queued queries with locks

A useful query acquired from a user comment on the Postgres documentation page:

http://www.postgresql.org/docs/current/interactive/monitoring-locks.html

Code Block

select pg_class.relname, pg_locks.transaction, pg_locks.mode, pg_locks.granted as "g",
substr(pg_stat_activity.current_query,1,30), pg_stat_activity.query_start,
age(now(),pg_stat_activity.query_start) as "age", pg_stat_activity.procpid
from pg_stat_activity,pg_locks
left outer join pg_class on
(pg_locks.relation = pg_class.oid)
where pg_locks.pid=pg_stat_activity.procpid
order by query_start;

Another query, from Jeff Davis, shows any queries that are waiting on a lock, together with the query that currently holds the lock on which they are waiting (not included here for brevity):

http://groups.google.com/group/pgsql.general/browse_thread/thread/10f61707d242a308?pli=1
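
That query is not reproduced here. Purely as an illustration of the idea (this is not Jeff Davis's query, and it again assumes the pre-9.2 procpid/current_query column names), a waiting lock request can be paired with the granted lock it is queued behind roughly like this:

Code Block
-- illustration only: pair each ungranted lock request with a granted lock on
-- the same relation held by a different backend
SELECT waiting.pid AS waiting_pid,
       waiting_act.current_query AS waiting_query,
       holder.pid AS holding_pid,
       holder_act.current_query AS holding_query
FROM pg_locks waiting
JOIN pg_locks holder
  ON waiting.relation = holder.relation
 AND holder.granted
 AND NOT waiting.granted
 AND holder.pid <> waiting.pid
JOIN pg_stat_activity waiting_act ON waiting_act.procpid = waiting.pid
JOIN pg_stat_activity holder_act ON holder_act.procpid = holder.pid;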

Workaround: Killing "Idle in Transaction" processes with crontab

...

Code Block
/usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | \
  /usr/bin/wc -l` -gt 20 && /usr/bin/pkill -o -f 'idle in transaction'

In English: every minute, if there are more than 20 "idle in
transaction" Postgres processes, it kills the oldest one.

...

Code Block
/bin/test `/usr/local/bin/pgrep -f '127.0.0.1' | \
  /usr/bin/wc -l` -gt 20 && /usr/local/bin/pkill -n -f '127.0.0.1'

...