
Key: UIMA-1067
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Thilo Goetz
Reporter: Thilo Goetz
Votes: 0
Watchers: 0

UIMA

Remove char heap/ref heap in StringHeap of the CAS

Created: 06/Jun/08 10:21 AM   Updated: 26/Jun/08 10:54 AM
Component/s: Core Java Framework
Affects Version/s: 2.2.2
Fix Version/s: 2.3

Time Tracking:
Not Specified

Resolution Date: 26/Jun/08 10:54 AM


Description
The StringHeap class provides two ways to store strings: as Java strings, or by copying characters onto a character heap. The second option is only used for deserialization from a binary CAS. However, even when it is not used, this capability incurs a significant memory overhead.

To demonstrate this, I ran the following experiment. As the analysis engine, I used our sandbox POS tagger, which sets just one string feature on each token. As text, I used a 2.4MB input file (2x moby.txt). To run this in IBM Java 1.5.0_7 (which happens to be the JVM I'm interested in), you need to specify -Xmx135M; I checked in 5MB increments. Then I patched the StringHeap implementation to work without the additional bookkeeping overhead and ran the experiment again. I was then able to run with -Xmx115M.

This is a significant gain (20MB, or roughly 15% of the required maximum heap), particularly given that I ran so little analysis (only tokens and sentences are produced, and only a single string-valued feature is set). The new code also ran slightly faster, but not by much. One might see more improvement for analysis that is less compute-intensive than the tagger.
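To illustrate the idea, here is a minimal sketch of a string heap that keeps only Java strings and has no character heap or reference heap at all. The class name SimpleStringHeap and its methods are hypothetical and chosen for illustration; this is not the actual StringHeap code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified string heap; not the actual UIMA StringHeap class. */
final class SimpleStringHeap {

    // Slot 0 is reserved so that reference 0 can stand for the null string.
    private final List<String> strings = new ArrayList<>();

    SimpleStringHeap() {
        strings.add(null);
    }

    /** Stores a string and returns its integer reference; null maps to 0. */
    int addString(String s) {
        if (s == null) {
            return 0;
        }
        strings.add(s);
        return strings.size() - 1;
    }

    /** Looks up the string for a reference previously returned by addString. */
    String getString(int ref) {
        if (ref < 0 || ref >= strings.size()) {
            throw new IllegalArgumentException("Unknown string reference: " + ref);
        }
        return strings.get(ref);
    }

    /** Number of non-null strings stored. */
    int size() {
        return strings.size() - 1;
    }

    public static void main(String[] args) {
        SimpleStringHeap heap = new SimpleStringHeap();
        int ref = heap.addString("NN");
        System.out.println(ref + " -> " + heap.getString(ref));
    }
}
```

With this layout the only per-string cost beyond the String object itself is one list slot, which is the kind of bookkeeping reduction the experiment above measures.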

The challenge is to make sure that the serialization code still works after this change.
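One way to keep the binary format unchanged is to convert between Java strings and the flat character representation only at serialization time. The sketch below assumes a simplified layout in which the reference heap holds (offset, length) pairs into a single char array. That layout, the class name CharHeapSketch, and its methods are assumptions for illustration and do not describe the actual UIMA binary format or the StringHeapDeserializationHelper class.

```java
import java.util.Arrays;

/** Hypothetical helper; the (offset, length) layout is an assumption for illustration. */
final class CharHeapSketch {

    /** Rebuilds Java strings from a flat char heap and (offset, length) pairs. */
    static String[] toStrings(char[] charHeap, int[] refHeap) {
        if (refHeap.length % 2 != 0) {
            throw new IllegalArgumentException("refHeap must hold (offset, length) pairs");
        }
        String[] strings = new String[refHeap.length / 2];
        for (int i = 0; i < strings.length; i++) {
            int offset = refHeap[2 * i];
            int length = refHeap[2 * i + 1];
            strings[i] = new String(charHeap, offset, length);
        }
        return strings;
    }

    /** Flattens non-null strings into one char heap, as a serializer might. */
    static char[] toCharHeap(String[] strings) {
        StringBuilder sb = new StringBuilder();
        for (String s : strings) {
            sb.append(s);
        }
        return sb.toString().toCharArray();
    }

    /** Computes the (offset, length) pairs that match toCharHeap. */
    static int[] toRefHeap(String[] strings) {
        int[] refs = new int[strings.length * 2];
        int offset = 0;
        for (int i = 0; i < strings.length; i++) {
            refs[2 * i] = offset;
            refs[2 * i + 1] = strings[i].length();
            offset += strings[i].length();
        }
        return refs;
    }

    public static void main(String[] args) {
        String[] tags = {"NN", "VBD", "DT"};
        String[] roundTrip = toStrings(toCharHeap(tags), toRefHeap(tags));
        System.out.println(Arrays.equals(tags, roundTrip)); // prints true
    }
}
```

A round trip like the one in main is the sort of check that helps confirm the in-memory change does not alter what is written to or read from the serialized form.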



Repository Revision Date User Message
ASF #663948 Fri Jun 06 14:14:50 UTC 2008 twgoetz Jira UIMA-1067: remove legacy char heap/ref heap from StringHeap code.

https://issues.apache.org/jira/browse/UIMA-1067
Files Changed
MODIFY /incubator/uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/cas/impl/CASSerializer.java
MODIFY /incubator/uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/cas/impl/CASImpl.java
ADD /incubator/uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/cas/impl/StringHeapDeserializationHelper.java

Repository Revision Date User Message
ASF #663949 Fri Jun 06 14:17:36 UTC 2008 twgoetz Jira UIMA-1067: remove legacy char heap/ref heap from StringHeap code.

https://issues.apache.org/jira/browse/UIMA-1067
Files Changed
MODIFY /incubator/uima/uimaj/trunk/uimaj-core/src/main/java/org/apache/uima/cas/impl/StringHeap.java

Thilo Goetz added a comment - 06/Jun/08 02:21 PM
Fixed; all unit tests pass. Please test this change if you use (binary) serialization. It should work the same as before, since I haven't changed the serialization format in any way.

Thilo Goetz made changes - 06/Jun/08 02:21 PM
Field Original Value New Value
Status Open [ 1 ] Closed [ 6 ]
Resolution Fixed [ 1 ]
Thilo Goetz added a comment - 24/Jun/08 11:48 AM
Fix in 2.2.2 hotfix 1.

Thilo Goetz made changes - 24/Jun/08 11:48 AM
Status Closed [ 6 ] Reopened [ 4 ]
Resolution Fixed [ 1 ]
Repository Revision Date User Message
ASF #671166 Tue Jun 24 13:15:37 UTC 2008 twgoetz Jira UIMA-1067: Remove char heap in string heap
impl for 2.2.2 hotfix 1 (2.2.2-01).

https://issues.apache.org/jira/browse/UIMA-1067
Files Changed
MODIFY /incubator/uima/uimaj/branches/uimaj-2.2.2-01/uimaj-core/src/main/java/org/apache/uima/cas/impl/CASImpl.java
ADD /incubator/uima/uimaj/branches/uimaj-2.2.2-01/uimaj-core/src/main/java/org/apache/uima/cas/impl/StringHeapDeserializationHelper.java
MODIFY /incubator/uima/uimaj/branches/uimaj-2.2.2-01/uimaj-core/src/main/java/org/apache/uima/cas/impl/StringHeap.java
MODIFY /incubator/uima/uimaj/branches/uimaj-2.2.2-01/uimaj-core/src/main/java/org/apache/uima/cas/impl/CASSerializer.java

Thilo Goetz added a comment - 26/Jun/08 10:54 AM
Backported to 2.2.2-01.

Thilo Goetz made changes - 26/Jun/08 10:54 AM
Status Reopened [ 4 ] Closed [ 6 ]
Resolution Fixed [ 1 ]