Tika / TIKA-518

Attribute values are not indexed

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.6
    • Fix Version/s: None
    • Component/s: general
    • Labels:
      None

      Description

      I just switched from Jackrabbit 1.4.11 to Jackrabbit 2.1.1.

      Some of the test cases that were working in 1.4 fail in 2.1.1.
      These test cases (related to a CSW service) contain an AnyText filter and search for an attribute value. No records are returned in this case. The same search works when an element value is used.

      Looking at the Jackrabbit Content Repository project, I found issue JCR-470 ("XMLIndexFilter should index the attributes"), which was fixed for Jackrabbit 1.4.

      Did the switch to Tika cause this problem? (My version of Jackrabbit 2.1.1 uses Tika 0.6.)

      Thank you.

        Activity

        Ken Krugler added a comment -

        Hi Jukka - assigning to you, since it involves Jackrabbit. Tempted to close as not being a Tika issue, but thought there might be something interesting for you to look at first.

        Thanks,

        – Ken

        Chris A. Mattmann added a comment -
        • Not sure which component this belongs to, but might as well classify it as general.
        Jukka Zitting added a comment -

        Resolving as Won't Fix: in most common cases, XML attribute values are not interesting from a text extraction perspective. Document types for which extracting attribute values makes sense should have their own parser classes.
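
To illustrate the distinction discussed in this issue, here is a minimal sketch that uses only the JDK's SAX API. It is not code from Tika or Jackrabbit, and the class name `AttributeTextExtractor`, the `extract` method, and the sample XML are hypothetical. With `includeAttributes` set to `false` it collects only element text, which mirrors the behavior the reporter observed; with `true` it also collects attribute values, which is roughly what a format-specific parser class would need to do.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeTextExtractor {

    /** Collects element character data and, optionally, attribute values. */
    static String extract(String xml, boolean includeAttributes) throws Exception {
        StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Generic text extraction skips this step entirely.
                if (!includeAttributes) {
                    return;
                }
                for (int i = 0; i < atts.getLength(); i++) {
                    text.append(atts.getValue(i)).append(' ');
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Keep non-whitespace character data, separated by spaces.
                String chunk = new String(ch, start, length).trim();
                if (!chunk.isEmpty()) {
                    text.append(chunk).append(' ');
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical record: "abc123" and "en" exist only as attribute values.
        String xml = "<record id=\"abc123\"><title lang=\"en\">Rivers</title></record>";
        System.out.println(extract(xml, false)); // element text only
        System.out.println(extract(xml, true));  // attribute values included
    }
}
```

A full-text search over the first result cannot match `abc123`, because that value never reaches the extracted text. In an actual Tika deployment the equivalent logic would presumably live in a class implementing Tika's `Parser` interface and registered for the specific document type; that wiring is not shown here.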


          People

          • Assignee:
            Jukka Zitting
            Reporter:
            Ovidiu Cilnician
          • Votes:
            0
            Watchers:
            1
