Scalability and Tuning
Scalability and Tuning in Protege
How Many Frames can Protege hold?
We have tested with "simulated" knowledge bases (generated by a program) as large as 5M frames (classes and instances). We know of users with more than 100K frames in a real knowledge base who are seeing acceptable UI performance.
Projects larger than 100K frames typically require the use of the database backend. MySQL seems to give the best performance of the RDBMSs that we have tested.
File-based projects are typically limited to 50-100K frames because of memory limitations. Exactly how much memory a frame takes up depends on factors such as how many slots with values it has, so no exact numbers can be given. A rule of thumb, though, is that if you have 2+ GB of memory in your machine then you can probably load 100K frames from a file. If you have less memory you will run into problems sooner. Having more memory than that does not help, because a 32-bit Java VM cannot make use of it.
Does Protege slow down as the number of frames goes up?
For file-based projects the answer is pretty much "not after loading". The time spent loading the project is roughly proportional to the number of frames, and can be several minutes for large projects. Loading projects in some file formats is also faster than in others. Once the project is loaded, though, you probably will not see much difference in the behavior of the UI for large projects.
For database projects the speed of loading the system is not directly related to the size of the project. Frames in a database project are loaded "on demand" and are flushed from the cache when they are no longer needed and the cache is full. Thus the first time a frame is accessed you may notice a small delay (one or two seconds). After that the frame will probably remain in memory, so subsequent accesses are fast.
Operations on a database project that can be slow are expanding a class with a lot of direct subclasses and displaying the instances of a class with many instances. Here "a lot" and "many" should probably be read as more than 1000. We have tested with a class with 2500 subclasses and with classes with 1M instances. Both cases can result in 1-10 second UI delays the first time the class is accessed. After that the response time should be "sub second".
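The load-on-demand behavior described above can be illustrated with a small sketch. This is not Protege's actual cache. The class name FrameCache, the loader function, and the least-recently-used eviction policy are all assumptions made for illustration; the text only says that frames are flushed when the cache is full and they are no longer needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical on-demand frame cache; not Protege's real implementation.
class FrameCache<K, V> {
    private final Function<K, V> loader;   // stands in for a database read
    private final Map<K, V> cache;
    private int loads = 0;                 // how many times the loader ran

    FrameCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;  // evict only once the cache is full
            }
        };
    }

    V get(K id) {
        V frame = cache.get(id);
        if (frame == null) {               // first access: slow path
            frame = loader.apply(id);
            loads++;
            cache.put(id, frame);
        }
        return frame;                      // later accesses come from memory
    }

    int loadCount() {
        return loads;
    }
}
```

With a capacity of 2, reading frame 1 twice triggers a single load. Reading frames 2 and 3 afterwards pushes frame 1 out, so the next read of frame 1 pays the load cost again. A larger cache, which in practice means a larger heap, makes such repeat loads less frequent.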
What tuning can be done to speed things up?
The most important thing to tune is the heap size (see separate page on setting the heap size). The maximum amount of memory that a 32-bit Java VM can use is about 1.6 GB on Windows XP and 2 GB on most Unix machines.
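To see which heap ceiling a VM is actually running with, a short check along the following lines may help. It is a sketch only: the class name HeapReport is made up for illustration, and it reports on the VM that runs it, so it reflects Protege's settings only if it is launched with the same VM options.

```java
// Hypothetical helper; prints the heap limits of the VM that runs it.
class HeapReport {
    // Converts a byte count to whole megabytes.
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the most memory the VM will attempt to use.
        System.out.println("Max heap (MB):     " + toMegabytes(rt.maxMemory()));
        // totalMemory() is what the VM has currently reserved.
        System.out.println("Allocated (MB):    " + toMegabytes(rt.totalMemory()));
        // freeMemory() is the unused part of the reserved amount.
        System.out.println("Free in heap (MB): " + toMegabytes(rt.freeMemory()));
    }
}
```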
You must be careful about setting the heap size parameter. If you set it too low then you will get "out of memory" errors. If you set it too high then your system will hang or you will suffer poor performance because parts of the JVM will be swapped in and out of memory. A rule of thumb is that you should not set this parameter larger than about 80% of your free physical memory. On Windows XP machines you can determine your free physical memory from the Performance tab of the Task Manager application. On Mac machines, click the Apple menu (upper left-hand corner) and choose "About This Mac". On Linux machines, you can use the proc filesystem and look at the /proc/meminfo "file".
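On Linux, the 80% rule of thumb can be worked out from /proc/meminfo with a few lines of code. The sketch below is one possible reading of the rule, not an official tool. The class name HeapAdvisor and its methods are hypothetical, and using the MemFree field as "free physical memory" is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper that applies the 80% rule of thumb on Linux.
class HeapAdvisor {
    // Returns the value of a /proc/meminfo field in kB, or -1 if absent.
    static long fieldKb(List<String> lines, String field) {
        for (String line : lines) {
            if (line.startsWith(field + ":")) {
                // Lines look like "MemFree:   1234567 kB".
                String rest = line.substring(field.length() + 1).trim();
                return Long.parseLong(rest.split("\\s+")[0]);
            }
        }
        return -1;
    }

    // 80% of the free memory, converted from kB to whole MB.
    static long recommendedHeapMb(long freeKb) {
        return freeKb * 80 / 100 / 1024;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc/meminfo"));
        long freeKb = fieldKb(lines, "MemFree");  // assumption: MemFree = free physical memory
        if (freeKb < 0) {
            System.out.println("MemFree not found in /proc/meminfo");
            return;
        }
        System.out.println("Suggested upper bound: -Xmx" + recommendedHeapMb(freeKb) + "m");
    }
}
```

For example, with 1048576 kB (1 GB) free, the suggested upper bound works out to 819 MB. The result is a ceiling rather than a target: the 32-bit VM limits described above still apply.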
Boosting the heap size parameter will allow you to read in larger file-based projects. It will also improve the performance of the database back-end since more memory is available for caching.
The most common source of very slow performance on older systems (or laptops) is having the heap size set too large. If your system does not have 100 MB of free memory then even the Protege default value is too big and you should make it smaller (or buy more memory).
Sharing the Virtual Machine
If you are short of memory and want to run Protege and other Java-based applications at the same time, you may want to try JDistro, a shared runtime and Swing desktop. Protege runs fine in multi-window mode (Wharf). It also runs fine in desktop mode (Korte), but only with release 0.37 or higher of JDistro. The only restriction found so far is that you cannot run two instances of Protege, but this should not be a problem for most users.