  Nutch / NUTCH-912

MoreIndexingFilter does not parse docx and xlsx date formats


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2, 1.3, nutchgora
    • Fix Version/s: 1.3, nutchgora
    • Component/s: indexer
    • Labels: None
    • Patch Info: Patch Available

    Description

      The following error occurs in hadoop.log when MoreIndexingFilter tries to parse dates from MS Office formats:
      2010-10-08 13:56:32,555 WARN more.MoreIndexingFilter - http://ridder.uio.no/test1.xlsx: can't parse erroneous date: 2010-10-08T13:55:54Z

      This problem affects docx and xlsx files, and probably the other XML-based MS Office formats as well.
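      The log line shows the date arriving in ISO 8601 UTC form (2010-10-08T13:55:54Z), which the filter rejects. Below is a minimal sketch, assuming the date is checked against a list of patterns tried in turn, of how accepting that form could look. The class name OfficeDateParsing and the pattern list are hypothetical illustrations, not the code of MoreIndexingFilter or the attached patches.

      ```java
      import java.text.ParseException;
      import java.text.SimpleDateFormat;
      import java.util.Date;
      import java.util.Locale;
      import java.util.TimeZone;

      public class OfficeDateParsing {

          // Hypothetical pattern list for illustration only; the real filter's
          // list (and the fix in the attached patch) may differ.
          private static final String[] PATTERNS = {
              "EEE, dd MMM yyyy HH:mm:ss zzz",  // RFC 1123 style, e.g. HTTP Last-Modified
              "yyyy-MM-dd'T'HH:mm:ss'Z'"        // ISO 8601 UTC, as in the docx/xlsx log line
          };

          /** Tries each pattern in turn and returns null if none matches. */
          public static Date parse(String value) {
              for (String pattern : PATTERNS) {
                  SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
                  // The literal 'Z' carries no zone info, so interpret the fields as UTC.
                  fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
                  fmt.setLenient(false);
                  try {
                      return fmt.parse(value);
                  } catch (ParseException e) {
                      // Not this pattern; try the next one.
                  }
              }
              return null;
          }

          public static void main(String[] args) {
              Date d = parse("2010-10-08T13:55:54Z");
              System.out.println(d.getTime()); // prints 1286546154000
          }
      }
      ```

      Without the ISO 8601 entry, parse() returns null for this value, which corresponds to the "can't parse erroneous date" warning in the log.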

      Attachments

        1. NUTCH-912-v12-1.patch
          0.7 kB
          Markus Jelsma
        2. NUTCH-912-v13-1.patch
          0.7 kB
          Markus Jelsma


          People

            Assignee: Markus Jelsma (markus17)
            Reporter: Erlend Garåsen (erlendfg)
            Votes: 0
            Watchers: 0
