MultiThreadingConsiderationsInProtege3

From Protege Wiki
Revision as of 11:37, June 2, 2008 by Tredmond (talk | contribs) (How to avoid the deadlock)


Deadlocks and multi-threaded writes to the knowledge base

Developers occasionally want a background thread to make changes to the Protege knowledge base. The first attempt usually ends in a deadlock. Sometimes the deadlock happens right away; in other cases it occurs at apparently random times. In one case (the OBO converter) the code ran fine as long as the user did not touch the mouse or keyboard. This note explains how these deadlocks can be avoided and then where they come from.


How do I understand and report deadlocks when they happen?

The best tool for both understanding and reporting deadlocks is the full thread stack dump, which is described here. Often, but not always, a developer can look at the thread dump and explain why there is a deadlock without even understanding the code. Below is an example of a thread dump showing a simple deadlock of a type that the Java virtual machine detects automatically.


Full thread dump Java HotSpot(TM) Client VM (1.5.0_13-119 mixed mode, sharing):

"DestroyJavaVM" prio=5 tid=0x01001320 nid=0xf0801000 waiting on condition [0x00000000..0xf07ffed0]

"Bad Thread" prio=5 tid=0x0100cc90 nid=0x84c600 waiting for monitor entry [0xf0d0b000..0xf0d0bbb0]
	at thread.Deadlock$BadRunnable.run(Deadlock.java:48)
	- waiting to lock <0x295864e8> (a java.lang.String)
	- locked <0x29586520> (a java.lang.String)
	at java.lang.Thread.run(Thread.java:613)

"Good Thread" prio=5 tid=0x0100ca70 nid=0x84b800 waiting for monitor entry [0xf0c8a000..0xf0c8abb0]
	at thread.Deadlock$GoodRunnable.run(Deadlock.java:27)
	- waiting to lock <0x29586520> (a java.lang.String)
	- locked <0x295864e8> (a java.lang.String)
	at java.lang.Thread.run(Thread.java:613)

"Low Memory Detector" daemon prio=5 tid=0x0100a7a0 nid=0x806400 runnable [0x00000000..0x00000000]

"CompilerThread0" daemon prio=9 tid=0x01009d90 nid=0x81d200 waiting on condition [0x00000000..0xf0b074e0]

"Signal Dispatcher" daemon prio=9 tid=0x010098a0 nid=0x81c400 waiting on condition [0x00000000..0x00000000]

"Finalizer" daemon prio=8 tid=0x010090e0 nid=0x819200 in Object.wait() [0xf0a05000..0xf0a05bb0]
	at java.lang.Object.wait(Native Method)
	- waiting on <0x255806b0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
	- locked <0x255806b0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x01008d10 nid=0x817a00 in Object.wait() [0xf0984000..0xf0984bb0]
	at java.lang.Object.wait(Native Method)
	- waiting on <0x25580da0> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:474)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
	- locked <0x25580da0> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=9 tid=0x01008490 nid=0x816c00 runnable 

"VM Periodic Task Thread" prio=9 tid=0x0100bf10 nid=0x807200 waiting on condition 

"Exception Catcher Thread" prio=10 tid=0x01001670 nid=0x80c000 runnable 

Found one Java-level deadlock:
=============================
"Bad Thread":
  waiting to lock monitor 0x00818900 (object 0x295864e8, a java.lang.String),
  which is held by "Good Thread"
"Good Thread":
  waiting to lock monitor 0x008188dc (object 0x29586520, a java.lang.String),
  which is held by "Bad Thread"

Java stack information for the threads listed above:
===================================================
"Bad Thread":
	at thread.Deadlock$BadRunnable.run(Deadlock.java:48)
	- waiting to lock <0x295864e8> (a java.lang.String)
	- locked <0x29586520> (a java.lang.String)
	at java.lang.Thread.run(Thread.java:613)
"Good Thread":
	at thread.Deadlock$GoodRunnable.run(Deadlock.java:27)
	- waiting to lock <0x29586520> (a java.lang.String)
	- locked <0x295864e8> (a java.lang.String)
	at java.lang.Thread.run(Thread.java:613)

Found 1 deadlock.
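
The same kind of cycle that the dump reports under "Found one Java-level deadlock" can also be detected from inside a program. The following is a minimal sketch, not the code that produced the dump above: the class DeadlockDemo and its helper methods are hypothetical names chosen for illustration, and plain Object locks stand in for the two String monitors. It relies only on the standard java.lang.management API. Two daemon threads take their locks in opposite order, and ThreadMXBean.findMonitorDeadlockedThreads() is polled until it reports the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDemo {

    /**
     * Starts a daemon thread that locks 'first', waits until both threads
     * hold their first lock, and then tries to lock 'second'.
     */
    static Thread startLocker(String name, final Object first, final Object second,
                              final CountDownLatch bothHoldFirst) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                synchronized (first) {
                    bothHoldFirst.countDown();
                    try {
                        bothHoldFirst.await();
                    } catch (InterruptedException e) {
                        return;
                    }
                    synchronized (second) {
                        // Never reached: the other thread already holds 'second'.
                    }
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not keep the JVM alive
        t.start();
        return t;
    }

    /** Polls the JVM until it reports a monitor deadlock or the timeout expires. */
    static long[] waitForDeadlock(long timeoutMillis) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = bean.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids;
            }
            Thread.sleep(50);
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch latch = new CountDownLatch(2);
        startLocker("Good Thread", lockA, lockB, latch);
        startLocker("Bad Thread", lockB, lockA, latch);

        long[] ids = waitForDeadlock(5000);
        if (ids == null) {
            System.out.println("No deadlock detected");
            return;
        }
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : bean.getThreadInfo(ids)) {
            System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
```

A check like this only finds cycles of threads blocked on locks. A thread that is stuck for some other reason will not be reported, so the full thread dump remains the more informative thing to attach to a bug report.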

How to avoid the deadlock

The trick to avoiding the deadlock is to have the thread that changes the knowledge base turn off either event generation or event dispatch for the duration of the operation. Often the best choice is to turn off event generation. The code looks like this:

     new Thread(new Runnable() {
        public void run() {
           // remember the previous setting so that it can be restored afterwards
           boolean eventGenerationEnabled = model.setEventGenerationEnabled(false);
           try {
              // ... make changes to the knowledge base ...
           } finally {
              if (eventGenerationEnabled) {
                 model.setEventGenerationEnabled(true);
              }
              // reload the GUI, e.g.
              ProjectManager.getProjectManager().getCurrentProjectView().reloadAll();
           }
        }
     }).start();

The disadvantage of this approach is that no events are generated for the updates. Once the changes to the knowledge base are complete, the thread must therefore tell every component that listens to the knowledge base for changes (e.g. the UI) that things have changed without their being informed. This is the purpose of the reloadAll() line in the finally clause.

The alternative approach is to turn event dispatch off. This code looks like this:

     new Thread(new Runnable() {
        public void run() {
           // remember the previous setting so that it can be restored afterwards
           boolean dispatchEnabled = model.setDispatchEventsEnabled(false);
           try {
              // ... make changes to the knowledge base ...
           } finally {
              if (dispatchEnabled) {
                 model.setDispatchEventsEnabled(true);
              }
              // deliver the queued events on the Swing event dispatch thread
              SwingUtilities.invokeLater(new Runnable() {
                 public void run() {
                    model.flushEvents();
                 }
              });
           }
        }
     }).start();

In this approach the events are calculated while the thread makes changes to the knowledge base, but they are only dispatched once the thread has finished its changes. The disadvantage of this approach is that event generation stays turned on for the duration. Both the generation of the events and the flushEvents call at the end can be costly operations. In general, people who have experimented with both approaches opt for the first one.
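
The difference between the two switches can be illustrated with a toy model. This is only a sketch of the behaviour described above and not Protege's implementation: ToyModel, its Listener interface and the change method are hypothetical stand-ins, and only the three method names setEventGenerationEnabled, setDispatchEventsEnabled and flushEvents are borrowed from the examples above. The sketch is single-threaded and leaves out all locking. With generation off, no event is ever created, so listeners learn nothing. With dispatch off, events are created and queued until flushEvents is called.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyModel {

    /** Hypothetical listener type; Protege's real listener interfaces differ. */
    public interface Listener {
        void changed(String what);
    }

    private final List<Listener> listeners = new ArrayList<Listener>();
    private final List<String> pending = new ArrayList<String>();
    private boolean generationEnabled = true;
    private boolean dispatchEnabled = true;

    public void addListener(Listener listener) {
        listeners.add(listener);
    }

    /** Returns the previous setting so that the caller can restore it. */
    public boolean setEventGenerationEnabled(boolean enabled) {
        boolean previous = generationEnabled;
        generationEnabled = enabled;
        return previous;
    }

    /** Returns the previous setting so that the caller can restore it. */
    public boolean setDispatchEventsEnabled(boolean enabled) {
        boolean previous = dispatchEnabled;
        dispatchEnabled = enabled;
        return previous;
    }

    /** Stands in for any change to the knowledge base. */
    public void change(String what) {
        if (!generationEnabled) {
            return; // no event is created, so listeners must be told by other means
        }
        pending.add(what); // the cost of creating the event is paid here
        if (dispatchEnabled) {
            flushEvents();
        }
    }

    /** Delivers all queued events to all listeners and empties the queue. */
    public void flushEvents() {
        List<String> toDeliver = new ArrayList<String>(pending);
        pending.clear();
        for (String what : toDeliver) {
            for (Listener listener : listeners) {
                listener.changed(what);
            }
        }
    }
}
```

In this toy, the first approach does no work per change but leaves listeners stale until something like reloadAll() runs, while the second pays for every queued event and then again for the flush.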

Why does this deadlock happen?