Description
ChmExtractor fails with error: "TikaException: can't copy beyond array length" when calling extractChmEntry on any non-empty entry.
Upon inspection this turns out to be caused by lzxBlockOffset being incorrectly set.
This in turn is caused by the method ChmExtractor#getIndexOfContent returning the wrong entry.
This is because ChmCommons#indexOf(List, String) returns the first entry with a name containing the string "Content". The file I am trying to parse contains a file with the name Content.css, which is the entry returned by #indexOf(...), instead of the actual content entry.
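The mismatch can be reproduced with a small standalone sketch. It is not Tika's code: the class NaiveContentLookup, the plain substring match, and the sample entry names (including their order) are assumptions made only to illustrate the behaviour described above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the lookup described in this report; not Tika's actual code.
class NaiveContentLookup {

    // Returns the index of the first name that contains the pattern, or -1 if none does.
    static int indexOf(List<String> names, String pattern) {
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).contains(pattern)) {
                return i;
            }
        }
        return -1;
    }

    // Illustrative entry names (assumed); the stylesheet is listed before the content entry.
    static List<String> sampleNames() {
        return Arrays.asList(
                "/Content.css",
                "/index.html",
                "::DataSpace/Storage/MSCompressed/Content");
    }
}
```

With these sample names, NaiveContentLookup.indexOf(NaiveContentLookup.sampleNames(), "Content") selects the stylesheet at index 0 rather than the content entry at index 2.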
To fix the issue, ChmCommons#indexOf(...) should be more strict in how it detects the content entry.
According to http://www.russotto.net/chm/chmformat.html, the name of the content entry always starts with "::DataSpace/Storage/", so this prefix could be used to restrict the search to the correct entry.
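A stricter lookup along these lines might look like the sketch below. It is only an illustration under stated assumptions, not a patch against Tika: the class StrictContentLookup and its method are hypothetical, entries are modelled as plain name strings, and the additional "/Content" suffix check is my assumption on top of the prefix rule.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper sketching the proposed stricter match; not part of Tika.
class StrictContentLookup {

    // Prefix taken from the format description cited in this report.
    static final String STORAGE_PREFIX = "::DataSpace/Storage/";

    // Assumption: the content entry's name ends with "/Content".
    static final String CONTENT_SUFFIX = "/Content";

    // Returns the index of the first name with the storage prefix and the
    // content suffix, or -1 if no such name exists.
    static int indexOfContent(List<String> names) {
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i);
            if (name.startsWith(STORAGE_PREFIX) && name.endsWith(CONTENT_SUFFIX)) {
                return i;
            }
        }
        return -1;
    }

    // Illustrative entry names (assumed), with a stylesheet named like the one in this report.
    static List<String> sampleNames() {
        return Arrays.asList(
                "/Content.css",
                "/index.html",
                "::DataSpace/Storage/MSCompressed/Content");
    }
}
```

With the sample names, the stylesheet is skipped because it lacks the prefix, and the lookup returns the index of the entry under "::DataSpace/Storage/".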