First of all, thanks for publishing the code. I think you forgot one class: "org.apache.pdfbox.pdmodel.common.XrefEntry".
I took a look at  and can't find an error.
Indirect object 31 is a dictionary object with 4 key-value pairs, as follows:
The first entry has the name object "Length" and refers to indirect object 45. So you need to look up object 45 through the xref table to see the value (e.g. 45 0 obj 500 endobj).
The other three entries, named "Length1", "Length2" and "Length3", have the integer values 568, 1017 and 0.
On parsing the key-value pairs: each key is a name object beginning with / (0x2F), immediately followed by the name, without whitespace. After the key you will find a blank (0x20) and the related value. If the value is also a name object, the blank can be omitted, because the / of the value already acts as a delimiter.
So if you try to read the whole of object 31, you also need to resolve object 45.
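To make the key-value rules above a bit more concrete, here is a rough Java sketch. It is not PDFBox code: the class DictSketch and the method resolveInt are made up for illustration, the input dictionary is reconstructed from the description of object 31, and the "objects" map only stands in for a real xref lookup. It handles integers, names and indirect references, nothing else.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not PDFBox code: parses a flat dictionary whose values
// are integers, names, or indirect references such as "45 0 R".
class DictSketch {
    private final String src;
    private int pos;

    DictSketch(String src) { this.src = src; }

    // White-space characters (spec section 7.2.2, table 1).
    static boolean isWhitespace(char c) {
        return c == 0x00 || c == 0x09 || c == 0x0A || c == 0x0C || c == 0x0D || c == 0x20;
    }

    // Delimiter characters (spec section 7.2.2, table 2).
    static boolean isDelimiter(char c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    private void skipWhitespace() {
        while (pos < src.length() && isWhitespace(src.charAt(pos))) pos++;
    }

    // Reads a run of regular characters (neither whitespace nor delimiter).
    private String readToken() {
        int start = pos;
        while (pos < src.length() && !isWhitespace(src.charAt(pos)) && !isDelimiter(src.charAt(pos))) pos++;
        return src.substring(start, pos);
    }

    private void expect(String s) {
        if (!src.startsWith(s, pos)) throw new IllegalStateException("expected " + s + " at " + pos);
        pos += s.length();
    }

    Map<String, String> parse() {
        Map<String, String> dict = new LinkedHashMap<>();
        skipWhitespace();
        expect("<<");
        while (true) {
            skipWhitespace();
            if (src.startsWith(">>", pos)) { pos += 2; return dict; }
            expect("/");                 // every key is a name object
            String key = readToken();
            skipWhitespace();            // may skip nothing, e.g. in "/Type/Font"
            dict.put(key, readValue());
        }
    }

    private String readValue() {
        if (pos >= src.length()) throw new IllegalStateException("unexpected end of input");
        if (src.charAt(pos) == '/') { pos++; return "/" + readToken(); }
        String first = readToken();
        if (first.isEmpty()) throw new IllegalStateException("unsupported value at " + pos);
        // Look ahead for "generation R"; rewind if this is just a plain number.
        int mark = pos;
        skipWhitespace();
        String gen = readToken();
        skipWhitespace();
        String r = readToken();
        if (r.equals("R") && gen.matches("\\d+")) return first + " " + gen + " R";
        pos = mark;
        return first;
    }

    // Resolves an integer value, following one indirect reference if needed.
    // 'objects' is a stand-in for a real xref lookup, e.g. "45 0" -> "500".
    static int resolveInt(String value, Map<String, String> objects) {
        if (value.endsWith(" R")) {
            String target = objects.get(value.substring(0, value.length() - 2));
            if (target == null) throw new IllegalStateException("unresolved reference " + value);
            return Integer.parseInt(target);
        }
        return Integer.parseInt(value);
    }
}
```

Parsing "<< /Length 45 0 R /Length1 568 /Length2 1017 /Length3 0 >>" with this sketch keeps "45 0 R" as an unresolved reference; only resolveInt, given the content of object 45, turns it into the actual length.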
For more information about the objects, look at sections 7.3 and 7.3.7 of the spec.
Have you taken a look at the current parser? It splits the work into small parts, like parsing objects and parsing the trailer, and each object type has its own parsing rules. For example, if you find an indirect object, you parse the prefix first (number generation R), then you parse the object itself (parseObject()). The next byte will be a delimiter such as a whitespace, a linefeed or maybe a "less-than sign"; you will find more in section 7.2.2, tables 1 and 2. Inside a dictionary you then know that you will find a key beginning with a / and followed by the name. After the name you need to parse an object again.
It is hard to explain how it works properly. The current parser does a good job and should not be replaced completely. Maybe some parts can be copied.
String objects start and end with parentheses. If the text itself contains parentheses, they shall be balanced; if they are not, you need to escape them with a backslash. See section 7.3.4.2.
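A small Java sketch of the balanced-parentheses rule, as I understand it. LiteralStringSketch is a made-up name, not a PDFBox class, and the sketch leaves out octal escapes and backslash line continuation, so take it as an illustration only.

```java
// Hypothetical sketch, not PDFBox code: reads one literal string starting at '('.
class LiteralStringSketch {
    // Returns the decoded content of the literal string that starts at src[start].
    static String read(String src, int start) {
        if (src.charAt(start) != '(') throw new IllegalArgumentException("no '(' at " + start);
        StringBuilder out = new StringBuilder();
        int depth = 1;                       // the opening parenthesis is already counted
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i++);
            if (c == '\\') {
                if (i >= src.length()) break;
                char e = src.charAt(i++);
                switch (e) {
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    case 't': out.append('\t'); break;
                    case 'b': out.append('\b'); break;
                    case 'f': out.append('\f'); break;
                    // Covers \( \) and \\; octal escapes and line continuation are NOT handled here.
                    default: out.append(e);
                }
            } else if (c == '(') {
                depth++;                     // balanced inner parenthesis, part of the text
                out.append(c);
            } else if (c == ')') {
                if (--depth == 0) return out.toString();
                out.append(c);
            } else {
                out.append(c);
            }
        }
        throw new IllegalStateException("unterminated literal string");
    }
}
```

With this, "(a (b) c)" is read as one string containing the inner parentheses, while an unbalanced parenthesis only survives if it is written as \( or \).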
Is the dictionary parsed before the xref table? If you want to do it in a spec-conformant way, the first thing is to find the whole trailer together with the startxref.
Then you know where to find the root dictionary and the xref table, so you can parse the xref table first.
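The first step could look roughly like the Java sketch below. StartXrefSketch and findStartXref are made-up names, not part of PDFBox, and scanning only the last 1024 bytes is an assumption of mine; a real parser may need to search further back.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not PDFBox code: locates the xref offset via "startxref".
class StartXrefSketch {
    // Returns the byte offset written after the last "startxref" keyword.
    static long findStartXref(byte[] pdf) {
        // Assumption: the keyword sits within the last 1024 bytes of the file.
        int tailStart = Math.max(0, pdf.length - 1024);
        // ISO-8859-1 maps every byte to exactly one char, so indices stay byte-accurate.
        String tail = new String(pdf, tailStart, pdf.length - tailStart, StandardCharsets.ISO_8859_1);
        int kw = tail.lastIndexOf("startxref");
        if (kw < 0) throw new IllegalStateException("startxref not found");
        int i = kw + "startxref".length();
        while (i < tail.length() && Character.isWhitespace(tail.charAt(i))) i++;
        int start = i;
        while (i < tail.length() && tail.charAt(i) >= '0' && tail.charAt(i) <= '9') i++;
        if (i == start) throw new IllegalStateException("no offset after startxref");
        return Long.parseLong(tail.substring(start, i));
    }
}
```

The returned offset is where the xref table should start; from there the trailer dictionary gives the root entry.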
Most of the information about the document can be extracted from the trailer and the root dictionary. Inside the root dict you can find the page dictionary (I hope this can be parsed lazily), and you can also find the AcroForm field with forms and annotations. I think there is more information, but I haven't studied all of it.
Parsing the page dictionary gives you the page structure as a tree and will lead through most of the objects of the PDF. But I don't know exactly how this works. To create a lazy parser, someone needs to study this part of the spec.
I will take a look at the classes in the next days and also try to work on them. Is there an easier way to submit changes, like an extra repository? I can provide a CVS repository if that helps.
Otherwise I will try to do the RandomAccessFile-like structure for PDFBox.