...

event.consumer.name.filters - Defines a set of event filters for the named Consumer. The value is a list of "filters" which select the events this consumer will see, based on combinations of DSpace object type and action. The filter list value consists of a set of filter clauses separated by colons (:). Each clause is a set of DSpace object types, a plus sign (+), and a set of actions; the object and action lists are each separated by a vertical bar (|). Here is a rough grammar:

Panel

filter-list ::= filter [ ":" filter ]..

filter ::= object-set "+" action-set

object-set ::= object [ "|" object ]..

action-set ::= action [ "|" action ]..

object ::= "All" | "Bitstream" | "Bundle" | "Item" | "Collection" | "Community" | "Site" | "Group" | "Eperson"

action ::= "All" | "Create" | "Modify" | "Modify_Metadata" | "Add" | "Remove" | "Delete"

The special value "All" denotes all available objects or actions. The filter All+All allows all events through.
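For instance, the browse consumer configuration from the Example below contains two filter clauses under this grammar: the first selects Create, Modify, and Modify_Metadata events on Items, and the second selects Add and Remove events on Collections.

Panel

event.consumer.browse.filters = Item+Create|Modify|Modify_Metadata:Collection+Add|Remove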

...

The default version of the ActiveMQ configuration file will be in your install directory under config/activemq.xml. Copy it to the runtime config directory (e.g. dspace/config) and modify it if necessary; consult the ActiveMQ 4.0 documentation for details.

The version supplied works with a PostgreSQL database.

Example

Panel

# This default dispatcher preserves the status quo, all synchronous
# consumers of search, browse, and history:

event.dispatcher.default.class = org.dspace.event.BasicDispatcher
event.dispatcher.default.consumers = \
search:sync, \
browse:sync, \
history:sync

event.consumer.search.class = org.dspace.search.SearchConsumer
event.consumer.search.filters = Item|Collection|Community|Bundle+Create|Modify|Modify_Metadata|Delete:Bundle+Add|Remove

event.consumer.browse.class = org.dspace.browse.BrowseConsumer
event.consumer.browse.filters = Item+Create|Modify|Modify_Metadata:Collection+Add|Remove

event.consumer.history.class = org.dspace.history.HistoryConsumer
event.consumer.history.filters = all+*

# email to subscribers – run this asynchronously once a day.
event.consumer.mail.class = org.dspace.eperson.Subscribe
event.consumer.mail.filters = Item+Modify|Modify_Metadata:Collection+Add|Remove

# example of a configuration with a couple of async consumers
event.dispatcher.with-async.class = org.dspace.event.BasicDispatcher
event.dispatcher.with-async.consumers = \
search:sync, \
browse:sync, \
mail:async, \
testALL:async, \
history:async

event.consumer.testALL.class = org.dspace.event.TestConsumer
event.consumer.testALL.filters = All+All

# dispatcher chosen by Packager main()
packager.dispatcher = batch

# ActiveMQ JMS config:

jms.configuration = xbean:/activemq.xml

# local TCP-based broker, must start
jms.broker.uri = tcp://localhost:61616

Operation

Start and run applications as usual.

To see events in action, alter the default dispatcher configuration to
include the testALL consumer, and make sure your DSpace log is
recording at the INFO level (at least). Then, watch the log while
doing anything that changes the data model; look for messages from
the TestConsumer class.
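For example, something along these lines would add testALL to the default dispatcher (sync is simplest for watching the log; adjust the consumer list to match your site's configuration):

Panel

event.dispatcher.default.consumers = \
search:sync, \
browse:sync, \
history:sync, \
testALL:sync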

You can also run the test consumer as an asynch consumer in a separate
process to observe how asynchronous events are passed along in real
time, or accumulated between polls.
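Assuming testALL is listed as an asynchronous consumer in your dispatcher (as in the with-async example above), it can be run in a separate process with a command along the lines of:

Panel

/dspace/bin/dsrun org.dspace.event.AsynchEventManager -c testALL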

...

If you configure any asynchronous dispatchers, you'll have to run
the ActiveMQ broker on your server as well. There is a script
to start and stop it easily which has been added to the bin
directory of the source; it should get installed in the bin
subdirectory of the DSpace runtime hierarchy.

...

Check the default ActiveMQ configuration in
dspace-install/config/activemq.xml. The PostgreSQL
login in particular may need to be configured for your site.
ActiveMQ uses the database to keep tables of persistent events.
They are automatically maintained to discard expired events.

...
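Assuming the script takes start and stop arguments, as its description above suggests, the broker would be started with:

Panel

dspace-install/bin/asynch-broker start

and stopped with: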

Panel

dspace-install/bin/asynch-broker stop

You can make these commands part of the regular startup and shutdown
procedure of your server; they are designed to be invoked from
System-V-type "rc.d" scripts as are used on Solaris and Linux. Just
be sure to run them as the same user who owns your DSpace processes.

...

To process asynchronous events, you can run one consumer at a time
with the command:

Panel

/dspace/bin/dsrun org.dspace.event.AsynchEventManager -c CONSUMER

e.g.

Panel

/dspace/bin/dsrun org.dspace.event.AsynchEventManager -c mail

...substituting the consumer name for CONSUMER, of course. Run
the command with the -h option for help on other arguments and
options, or see the comments in the source.

...

Here are some unresolved issues and problems in the prototype.
Your comments and proposed solutions are welcome!

1. Removing duplicate events

The code in Context.addEvent() pre-filters the events by removing
any events which are duplicates – that is, identical to an
event already in the queue for this transaction in all respects except
for timestamp. The rationale is that a duplicate soaks up processing
resources and does not convey any additional information, even to
the History system, since events are so fine-grained. Furthermore,
the way in which the current applications (e.g. the WebUI) use the
data model API seems to produce a lot of extraneous duplicate events,
so this filtering does a lot of good.
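As a rough illustration of this pre-filtering (a minimal sketch only; the field names and comparison below are assumptions, not the prototype's actual Event or Context code):

Panel

import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the prototype's Event class.
class EventSketch {
    int objectType;   // e.g. Item, Collection, ...
    int action;       // e.g. Create, Modify, ...
    int objectId;
    long timestamp;

    // "Duplicate" per the text above: identical in every respect except timestamp.
    boolean sameExceptTimestamp(EventSketch other) {
        return objectType == other.objectType
            && action == other.action
            && objectId == other.objectId;
    }
}

class ContextSketch {
    private final List<EventSketch> queue = new ArrayList<EventSketch>();

    // Rough analogue of Context.addEvent(): discard duplicates up front.
    void addEvent(EventSketch candidate) {
        for (EventSketch queued : queue) {
            if (queued.sameExceptTimestamp(candidate)) {
                return; // a duplicate soaks up processing but adds no information
            }
        }
        queue.add(candidate);
    }
}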

...

Consumer code runs in a somewhat strange environment:

  • Before calling consume(), the Context object will have already called its commit() method to commit its changes to the RDBMS, although the DB connection is still open.
  • Any DB changes made by the consumer code must be committed by explicitly calling the JDBC commit() method on the DB connection.
  • Consumers must not call the Context's commit() since it would run the event dispatch again, causing an infinite loop.
  • Given the above restrictions, consumer code can modify data model objects and update them; for example, it is possible to run the MediaFilter from an event consumer to update e.g. thumbnails whenever an object is changed.
  • Since the consumer may be run asynchronously (and it has no way to tell), it must allow for the possibility that the DSpace Objects referenced in the event may no longer exist. It should test the results of every find(); see the sketch after this list.
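A minimal consumer sketch under those restrictions (the consume() signature, the Event accessor, and the Context.getDBConnection() call are assumptions about the prototype API, not confirmed signatures):

Panel

import java.sql.Connection;

import org.dspace.content.Item;
import org.dspace.core.Context;
import org.dspace.event.Event;   // assumed location of the prototype's Event class

// Hypothetical consumer; names and signatures here are assumed, not verified.
public class ExampleConsumer
{
    public void consume(Context context, Event event) throws Exception
    {
        // The event may arrive asynchronously, so the object it refers to
        // may no longer exist: always test the result of find().
        Item item = Item.find(context, event.getObjectID()); // accessor name assumed
        if (item == null)
        {
            return; // object is gone; nothing to do
        }

        // ... inspect or update the object here (e.g. regenerate a thumbnail) ...

        // Do NOT call context.commit() -- that would re-run event dispatch
        // and loop. Commit any DB changes directly on the JDBC connection.
        Connection db = context.getDBConnection();
        db.commit();
    }
}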

...

Since the Event system forms a vital part of the core DSpace server,
any failure in event processing should register as a fatal error in
the transaction. Unfortunately, the events are necessarily triggered
after the database transaction commits. (This is necessary because
otherwise there would be a race condition between asynch event processors
looking at the data model and the process generating the events: the asynch
consumer might see a pre-transaction view of the DB.)

Also, failures late in the cycle of a WebUI transaction do not
get rendered correctly as an error page because the servlet has already
generated some output and the attempt to send a different status and
an error page just sets off an illegal state exception. This problem is
inherent in the current implementation of the Web UI and would be very
hard to change.

Since the asynchronous event mechanism is the part most susceptible to
errors, depending as it does on network resources and complex configuration,
the Context code makes an attempt to exercise the dispatcher and asynch
delivery as much as possible before committing the transaction, to
flush out some fatal problems in time to abort it.

...

The ActiveMQ implementation uses shutdown hooks to terminate its
internal state, and if they are not called the result is a JVM that
hangs instead of shutting down because an ActiveMQ thread is still
waiting for input from a network peer.

The solution is simple: command-line applications must always call
System.exit() before terminating, and not just run off the end of the
main() method. (Though the fact that there is this distinction counts
as a flaw in the Java runtime, IMHO; the Unix system call they are
aping has no such restriction.)

This prototype includes patches to add System.exit() calls to all
command-line applications that can generate events. It's a good idea
to fix any application that generates events at all, whether or not
you anticipate any of those events being asynchronous.
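For instance (a trivial illustration only, not one of the actual patched applications):

Panel

public class ExampleCliApp
{
    public static void main(String[] argv) throws Exception
    {
        // ... do work that generates events ...

        // Exit explicitly so the JVM terminates (running ActiveMQ's shutdown
        // hooks) instead of hanging on a thread still waiting on the network.
        System.exit(0);
    }
}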

...

The current configuration of ActiveMQ requires a separate "broker"
listening at a well-known TCP port. For the prototype, it is started
manually.

I'll investigate other ActiveMQ broker options (some of them simply don't
work for this application, however), and also code to start it
automatically or at least through a more friendly DSpace application.

...

When an asynch Consumer runs, its Context has the same CurrentUser set
that was set in the code that generated the event. Since Consumers should
not be doing anything that requires special privileges, this probably
won't be an issue, but it's worth noting.

...

One part of the Browse index updating cannot be put into a consumer because of an architectural problem: the Browse tables have foreign keys into the Item table. Deleting an Item thus breaks those foreign key references, so the Browse tables must be updated first. By the time a Browse update would be running in an event consumer, the Item table would already have had to be updated to reflect the delete, which is impossible.

See the delete() method in org.dspace.content.Item.

This is the only case in the DSpace core code where a search or browse index update could not be moved into an event consumer.
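A rough sketch of that ordering constraint (illustrative only; these are not the actual method names in org.dspace.content.Item):

Panel

// Illustrative only -- not the actual Item.delete() code.
class ItemDeleteSketch
{
    void delete() throws Exception
    {
        // The Browse tables hold foreign keys into the Item table, so the
        // Browse rows must be removed while the Item row still exists.
        removeFromBrowseIndices();

        // Once the Item row is gone, an event consumer running after the
        // transaction could no longer do that cleanup -- hence it cannot
        // be deferred to a consumer.
        deleteItemRow();
    }

    void removeFromBrowseIndices() { /* hypothetical stub */ }
    void deleteItemRow()           { /* hypothetical stub */ }
}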

...