MultiThreadingConsiderationsInProtege3

From Protege Wiki

Deadlocks and multi-threaded writes to the knowledge base

Occasionally, developers want to create a background thread that makes changes to the Protege knowledge base. The first time they try this, they usually get deadlocks. Sometimes the deadlock happens right away, but in other cases it occurs at apparently random times. In one case (the OBO converter), the code ran fine as long as the user did not touch the mouse or keyboard. This note explains how to diagnose these deadlocks, how to avoid them, and why they happen.


How do I understand and report deadlocks when they happen?

The best tool for both understanding and reporting deadlocks is a full thread stack dump, which is described here. Often, though not always, a developer can read the thread dump and explain the deadlock without understanding the code involved. Below is an example thread dump showing a simple deadlock of a type that the JVM detects automatically and reports at the end of the dump. Each of the two threads holds one String lock and waits for the lock held by the other.


Full thread dump Java HotSpot(TM) Client VM (1.5.0_13-119 mixed mode, sharing):

"DestroyJavaVM" prio=5 tid=0x01001320 nid=0xf0801000 waiting on condition [0x00000000..0xf07ffed0]

"Bad Thread" prio=5 tid=0x0100cc90 nid=0x84c600 waiting for monitor entry [0xf0d0b000..0xf0d0bbb0]
	at thread.Deadlock$BadRunnable.run(Deadlock.java:48)
	- waiting to lock <0x295864e8> (a java.lang.String)
	- locked <0x29586520> (a java.lang.String)
	at java.lang.Thread.run(Thread.java:613)

"Good Thread" prio=5 tid=0x0100ca70 nid=0x84b800 waiting for monitor entry [0xf0c8a000..0xf0c8abb0]
	at thread.Deadlock$GoodRunnable.run(Deadlock.java:27)
	- waiting to lock <0x29586520> (a java.lang.String)
	- locked <0x295864e8> (a java.lang.String)
	at java.lang.Thread.run(Thread.java:613)

"Low Memory Detector" daemon prio=5 tid=0x0100a7a0 nid=0x806400 runnable [0x00000000..0x00000000]

"CompilerThread0" daemon prio=9 tid=0x01009d90 nid=0x81d200 waiting on condition [0x00000000..0xf0b074e0]

"Signal Dispatcher" daemon prio=9 tid=0x010098a0 nid=0x81c400 waiting on condition [0x00000000..0x00000000]

"Finalizer" daemon prio=8 tid=0x010090e0 nid=0x819200 in Object.wait() [0xf0a05000..0xf0a05bb0]
	at java.lang.Object.wait(Native Method)
	- waiting on <0x255806b0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
	- locked <0x255806b0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x01008d10 nid=0x817a00 in Object.wait() [0xf0984000..0xf0984bb0]
	at java.lang.Object.wait(Native Method)
	- waiting on <0x25580da0> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:474)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
	- locked <0x25580da0> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=9 tid=0x01008490 nid=0x816c00 runnable 

"VM Periodic Task Thread" prio=9 tid=0x0100bf10 nid=0x807200 waiting on condition 

"Exception Catcher Thread" prio=10 tid=0x01001670 nid=0x80c000 runnable 

Found one Java-level deadlock:
=============================
"Bad Thread":
  waiting to lock monitor 0x00818900 (object 0x295864e8, a java.lang.String),
  which is held by "Good Thread"
"Good Thread":
  waiting to lock monitor 0x008188dc (object 0x29586520, a java.lang.String),
  which is held by "Bad Thread"

Java stack information for the threads listed above:
===================================================
"Bad Thread":
	at thread.Deadlock$BadRunnable.run(Deadlock.java:48)
	- waiting to lock <0x295864e8> (a java.lang.String)
	- locked <0x29586520> (a java.lang.String)
	at java.lang.Thread.run(Thread.java:613)
"Good Thread":
	at thread.Deadlock$GoodRunnable.run(Deadlock.java:27)
	- waiting to lock <0x29586520> (a java.lang.String)
	- locked <0x295864e8> (a java.lang.String)
	at java.lang.Thread.run(Thread.java:613)

Found 1 deadlock.
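The dump above comes from a small test program (thread.Deadlock) whose source is not shown. The Java sketch below is a hypothetical reconstruction of the same kind of lock-ordering deadlock, not that program: two threads take the same two monitors in opposite orders, and a CountDownLatch makes sure each holds its first lock before asking for the second. The class name LockOrderDeadlock is an assumption. Instead of hanging, it asks the standard ThreadMXBean.findMonitorDeadlockedThreads() for the same kind of cycle detection that produces the "Found one Java-level deadlock" section. The threads are daemon threads so that the JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Hypothetical reproduction of a lock-ordering deadlock like the one in the dump. */
public class LockOrderDeadlock {

    /**
     * Starts two threads that take the same two monitors in opposite orders and
     * returns true once the JVM reports both of them as deadlocked.
     */
    public static boolean reproduceAndDetect() {
        final Object first = new Object();   // plays the role of one String lock
        final Object second = new Object();  // plays the role of the other
        final CountDownLatch bothHoldOneLock = new CountDownLatch(2);

        Thread good = new Thread(new Runnable() {
            public void run() {
                synchronized (first) {
                    rendezvous(bothHoldOneLock);
                    synchronized (second) { }  // never entered: "Bad Thread" holds it
                }
            }
        }, "Good Thread");

        Thread bad = new Thread(new Runnable() {
            public void run() {
                synchronized (second) {
                    rendezvous(bothHoldOneLock);
                    synchronized (first) { }   // never entered: "Good Thread" holds it
                }
            }
        }, "Bad Thread");

        // Daemon threads, so the JVM can exit even though these never finish.
        good.setDaemon(true);
        bad.setDaemon(true);
        good.start();
        bad.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 50; attempt++) {  // poll for up to ~5 seconds
            long[] ids = threads.findMonitorDeadlockedThreads();  // null if no cycle
            if (ids != null && contains(ids, good.getId()) && contains(ids, bad.getId())) {
                return true;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Wait until both threads hold their first lock.
    private static void rendezvous(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduceAndDetect());
    }
}
```

The latch is the design choice that makes the deadlock deterministic; without it, one thread could finish before the other starts, which mirrors how the Protege deadlocks appear only at "apparently random times".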

How to avoid the deadlock

The trick to avoiding the deadlock is to have the thread that changes the knowledge base turn off either event generation or event dispatch for the duration of the operation. Turning off event generation is often the best choice. The code looks like this:

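     new Thread(new Runnable() {
         public void run() {
             // model is the knowledge base being edited; it must be declared
             // final to be used inside this anonymous class.
             boolean eventGenerationEnabled = model.setEventGenerationEnabled(false);
             try {
                 // ... make changes to the knowledge base ...
             } finally {
                 // Restore the previous setting rather than blindly turning it on.
                 if (eventGenerationEnabled) {
                     model.setEventGenerationEnabled(true);
                 }
                 // Listeners missed every change, so reload the GUI. GUI work
                 // belongs on the AWT event dispatch thread.
                 SwingUtilities.invokeLater(new Runnable() {
                     public void run() {
                         ProjectManager.getProjectManager().getCurrentProjectView().reloadAll();
                     }
                 });
             }
         }
     }).start();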

The disadvantage of this approach is that components listening to the knowledge base for changes (e.g. the UI) are never informed of the individual changes. After the updates are made, the thread must therefore tell them that the knowledge base has changed. This is the purpose of the reloadAll() call in the finally clause.

The alternative approach is to turn off event dispatch. The code looks like this:

     new Thread(new Runnable() {
         public void run() {
             // model must be declared final to be used inside this anonymous class.
             boolean dispatchEnabled = model.setDispatchEventsEnabled(false);
             try {
                 // ... make changes to the knowledge base ...
             } finally {
                 if (dispatchEnabled) {
                     model.setDispatchEventsEnabled(true);
                 }
                 // Deliver the queued events on the AWT event dispatch thread.
                 SwingUtilities.invokeLater(new Runnable() {
                     public void run() {
                         model.flushEvents();
                     }
                 });
             }
         }
     }).start();

In this approach, events are still generated while the thread changes the knowledge base, but they are dispatched only when the thread has finished. The disadvantage is that event generation stays on for the whole operation, and both generating the events and flushing them at the end can be costly. In general, developers who have experimented with both approaches prefer the first.
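To make the difference between the two switches concrete, here is a toy Java model. It is hypothetical and is not Protege's implementation. ToyEventModel and its method names are illustrative. The set...Enabled methods copy the convention used in the code above of returning the previous value, so the caller can restore it. With dispatch off, each write still generates an event, which waits in a queue until flushEvents() delivers it. With generation off, no event is produced at all, which is why the first approach needs an explicit reload.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy knowledge base with separate switches for event generation
 * and event dispatch. It only illustrates the two flags.
 */
public class ToyEventModel {
    public interface Listener {
        void changed(String event);
    }

    private final List<Listener> listeners = new ArrayList<Listener>();
    private final List<String> pending = new ArrayList<String>(); // generated, not yet dispatched
    private boolean generateEvents = true;
    private boolean dispatchEvents = true;

    public synchronized void addListener(Listener listener) {
        listeners.add(listener);
    }

    /** Sets the switch and returns its previous value so the caller can restore it. */
    public synchronized boolean setGenerateEventsEnabled(boolean enabled) {
        boolean previous = generateEvents;
        generateEvents = enabled;
        return previous;
    }

    /** Same contract as setGenerateEventsEnabled, for the dispatch switch. */
    public synchronized boolean setDispatchEventsEnabled(boolean enabled) {
        boolean previous = dispatchEvents;
        dispatchEvents = enabled;
        return previous;
    }

    /** Applies a change; events are generated and dispatched according to the switches. */
    public synchronized void write(String change) {
        // ... apply the change to the stored data (omitted in this toy) ...
        if (!generateEvents) {
            return;                        // no event is computed at all
        }
        pending.add("changed " + change);  // event generation
        if (dispatchEvents) {
            flushEvents();                 // synchronous dispatch, model lock still held
        }
    }

    /** Delivers all queued events to every listener, then empties the queue. */
    public synchronized void flushEvents() {
        List<String> toSend = new ArrayList<String>(pending);
        pending.clear();
        for (String event : toSend) {
            for (Listener listener : listeners) {
                listener.changed(event);
            }
        }
    }

    public synchronized int pendingEventCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ToyEventModel model = new ToyEventModel();
        final int[] delivered = {0};
        model.addListener(new Listener() {
            public void changed(String event) {
                delivered[0]++;
            }
        });

        boolean dispatchWasEnabled = model.setDispatchEventsEnabled(false);
        model.write("a");
        model.write("b");
        System.out.println("queued=" + model.pendingEventCount() + " delivered=" + delivered[0]);
        if (dispatchWasEnabled) {
            model.setDispatchEventsEnabled(true);
        }
        model.flushEvents();
        System.out.println("queued=" + model.pendingEventCount() + " delivered=" + delivered[0]);
    }
}
```

Its main prints queued=2 delivered=0 and then queued=0 delivered=2. Note that the toy dispatches while holding its own lock, which is the same shape as the Protege 3 behavior that causes the deadlock.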

Why does this deadlock happen?

Protege 3 has a very constraining thread locking model: a single lock guards all read and write access to the knowledge base. On its own, a single lock cannot cause a deadlock (although it may limit performance). The problem comes from the way Protege 3 generates and dispatches events.

Whenever a change is made to a Protege knowledge base, Protege generates a list of events describing the change (event generation) and invokes listeners based on these events (event dispatch). Both steps happen synchronously with the change. In a typical Protege deployment, many of these listeners work on behalf of the graphical interface and invoke graphical routines, so they wait on the AWT event queue. Making a change to the knowledge base therefore involves taking the knowledge base lock and then, within it, taking the AWT event queue lock.

In addition, a typical Protege deployment contains graphical code that needs to read the knowledge base. This code holds the AWT lock and then takes the knowledge base lock in order to read it.

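The two code paths therefore take the same two locks in opposite orders, which is the pattern shown in the thread dump above: each thread can end up holding one lock while waiting for the other. This means that any code that makes changes to the Protege knowledge base must already hold the AWT event queue lock in order to avoid deadlock. This rules out multithreaded writes to Protege, which is occasionally a problem.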

To avoid this, turn off either event dispatch or event generation. Either approach stops Protege from invoking the listeners while the caller writes to the knowledge base, so the only lock taken during a write is the knowledge base lock and the deadlock is avoided.
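The argument can be checked with a simplified Java model in which two ReentrantLocks stand in for the knowledge base lock and the AWT event queue. This is a hypothetical sketch, not Protege code, and LockOrderModel and runOnce are illustrative names. With writerInvokesListeners set to true, the writer needs the AWT lock while holding the knowledge base lock, while the GUI reader holds the AWT lock and wants the knowledge base lock. Passing false models either switch being off: no listener runs during the write, so the writer takes only the knowledge base lock. Timed tryLock calls report the deadlock instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of the two lock orders described above; not Protege code. */
public class LockOrderModel {
    private final ReentrantLock kbLock = new ReentrantLock();  // the single knowledge base lock
    private final ReentrantLock awtLock = new ReentrantLock(); // stands in for the AWT event queue

    /**
     * Runs a background writer and a GUI reader concurrently. Each takes its first
     * lock, then waits until the other holds its first lock. Returns true only if
     * both threads obtained every lock they asked for.
     */
    public boolean runOnce(final boolean writerInvokesListeners) {
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        final AtomicBoolean allAcquired = new AtomicBoolean(true);

        Thread writer = new Thread(new Runnable() {
            public void run() {
                kbLock.lock();                          // write path: knowledge base lock first
                try {
                    rendezvous(bothHoldFirstLock);
                    // ... modify the knowledge base ...
                    if (writerInvokesListeners) {
                        // Synchronous dispatch: a GUI listener now needs the AWT lock.
                        acquireAndRelease(awtLock, allAcquired);
                    }
                } finally {
                    kbLock.unlock();
                }
            }
        }, "writer");

        Thread gui = new Thread(new Runnable() {
            public void run() {
                awtLock.lock();                         // GUI path: AWT lock first
                try {
                    rendezvous(bothHoldFirstLock);
                    acquireAndRelease(kbLock, allAcquired); // read the knowledge base
                } finally {
                    awtLock.unlock();
                }
            }
        }, "gui");

        writer.start();
        gui.start();
        try {
            writer.join();
            gui.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return allAcquired.get();
    }

    // A real deadlock would hang forever; the timeout turns it into a reported failure.
    private static void acquireAndRelease(ReentrantLock lock, AtomicBoolean allAcquired) {
        try {
            if (lock.tryLock(500, TimeUnit.MILLISECONDS)) {
                lock.unlock();
            } else {
                allAcquired.set(false);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            allAcquired.set(false);
        }
    }

    private static void rendezvous(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        LockOrderModel model = new LockOrderModel();
        System.out.println("listeners invoked during write: " + model.runOnce(true));
        System.out.println("events switched off:            " + model.runOnce(false));
    }
}
```

In the first run, at least one thread must time out, because each releases its first lock only after its second request returns; main should therefore print false and then true.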