Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 2.0, 1.15
    • Component/s: general
    • Labels:
    • Environment:

      W2008R2

      Description

      We use ManifoldCF 1.3 and Solr 4.4 to index a shared network drive. This works fine for most of our Office file types (docx, xlsx, ...), but we also have a lot of files of type xlsb, which are not among the supported file types.
      In order to keep using this solution, it is essential to us that support for xlsb be provided in the future.
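      Until a release with xlsb support is deployed, it can help to tell .xlsb files apart from .xlsx files before they reach the indexer. Both are ZIP (OPC) packages, but the workbook part is conventionally named xl/workbook.bin in .xlsb and xl/workbook.xml in .xlsx. The Java sketch below uses a hypothetical class, XlsbSniffer, which is not part of ManifoldCF, Solr, Tika or POI. It assumes those conventional part names; a robust check would resolve the main workbook part through _rels/.rels or [Content_Types].xml instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/**
 * Hypothetical helper: rough check of whether a ZIP (OPC) package looks like an
 * .xlsb or .xlsx workbook. Assumes the conventional part names xl/workbook.bin
 * (binary workbook) and xl/workbook.xml (XML workbook).
 */
public class XlsbSniffer {

    public enum Kind { XLSB, XLSX, UNKNOWN }

    /** Scans the ZIP entries; note that this closes the supplied stream. */
    public static Kind sniff(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("xl/workbook.bin")) {
                    return Kind.XLSB;   // binary (BIFF12) workbook part
                }
                if (name.equals("xl/workbook.xml")) {
                    return Kind.XLSX;   // XML (SpreadsheetML) workbook part
                }
            }
        }
        // Not a ZIP at all, or no workbook part under the conventional name.
        return Kind.UNKNOWN;
    }
}
```

      Files classified as XLSB could, for example, be routed around the extraction step or logged, rather than failing silently.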

          Activity

          gagravarr Nick Burch added a comment -

          There is no support for .xlsb files in either Tika or POI.

          If you're interested in adding support for .xlsb files to Apache POI, join the dev list and we can give you advice. My hunch is it's several weeks' work, once you're up to speed with how the Excel file formats work.

          tpalsulich Tyler Palsulich added a comment -

          Is there interest in adding XLSB support? Or, has there been an update to support it in POI?

          gagravarr Nick Burch added a comment -

          No POI support as yet. It will take a non-trivial amount of work (single-digit days at minimum, maybe just into double-digit days) to support it properly, and I'd guess at least a day for a hacky solution. Thus far, no takers to sponsor/do the work involved.

          dominik.stadler@gmx.at Dominik Stadler added a comment -

          An official description of the format is at https://msdn.microsoft.com/en-us/library/office/cc313133(v=office.12).aspx

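          To give a feel for what the binary parts of an .xlsb package (for example xl/worksheets/sheet1.bin) look like, here is a minimal, hypothetical Java sketch that only walks BIFF12 record headers and skips their payloads. It assumes the header encoding as we read the binary format description: a record type stored in 1 to 2 bytes and a record size stored in 1 to 4 bytes, where each byte contributes its low 7 bits (least significant group first) and a set high bit means another byte follows. The class name Biff12Walker is illustrative only; decoding the payloads (cells, shared strings, formulas) is the bulk of the work estimated above and is not attempted here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: enumerate BIFF12 record headers, skipping payloads. */
public class Biff12Walker {

    /** One record header: type id, payload size and payload start offset. */
    public static final class Header {
        public final int type;
        public final int size;
        public final int offset;   // index of the first payload byte in the input

        Header(int type, int size, int offset) {
            this.type = type;
            this.size = size;
            this.offset = offset;
        }
    }

    public static List<Header> walk(byte[] data) {
        List<Header> headers = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            // Record type: at most 2 bytes, 7 data bits each.
            int type = 0;
            int shift = 0;
            for (int i = 0; i < 2; i++) {
                int b = readByte(data, pos++);
                type |= (b & 0x7F) << shift;
                shift += 7;
                if ((b & 0x80) == 0) break;
            }
            // Record size: at most 4 bytes, 7 data bits each.
            int size = 0;
            shift = 0;
            for (int i = 0; i < 4; i++) {
                int b = readByte(data, pos++);
                size |= (b & 0x7F) << shift;
                shift += 7;
                if ((b & 0x80) == 0) break;
            }
            if (size > data.length - pos) {
                throw new IllegalArgumentException("truncated payload at offset " + pos);
            }
            headers.add(new Header(type, size, pos));
            pos += size;   // skip the payload; a real parser would decode it by type
        }
        return headers;
    }

    private static int readByte(byte[] data, int pos) {
        if (pos >= data.length) {
            throw new IllegalArgumentException("truncated header at offset " + pos);
        }
        return data[pos] & 0xFF;
    }
}
```

          For example, walking the bytes 81 01 00 00 02 AA BB yields two headers: type 129 (a two-byte type id) with an empty payload at offset 3, then type 0 with a 2-byte payload at offset 5.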
          tallison@mitre.org Tim Allison added a comment -

          From Matthew Caruana Galizia via Twitter, an ASL 2.0-licensed JavaScript xlsb parser: https://github.com/SheetJS/js-xlsx

          tallison@mitre.org Tim Allison added a comment -

          https://bz.apache.org/bugzilla/show_bug.cgi?id=60826

          tallison@mitre.org Tim Allison added a comment - - edited

          Basic xlsb streaming/read-only support was just added and should be available with POI 3.16-beta3.

          mcaruanagalizia Matthew Caruana Galizia added a comment -

          Tim Allison, d'you reckon that will be out with Tika 1.15?

          tallison@mitre.org Tim Allison added a comment - - edited

          I think so, but It Depends (TM). I figure the next version of POI will be out in early/mid April, and then we could shoot for 1.15. All depends on our communities of devs, though.

          Next version of PDFBox (2.0.5) was released 4 hours ago.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Tika-trunk #1240 (See https://builds.apache.org/job/Tika-trunk/1240/)
          TIKA-1195 and TIKA-2329 (tallison: https://github.com/apache/tika/commit/67612b8f805ad5d1085db14922d3b3b6ddce19bf)

          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParser.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
          • (edit) CHANGES.txt
          • (edit) tika-parsers/pom.xml
          • (add) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFBExcelExtractorDecorator.java
          • (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFExcelExtractorDecorator.java
          tallison@mitre.org Tim Allison added a comment -

          Only 3.5 years later...

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build tika-2.x #244 (See https://builds.apache.org/job/tika-2.x/244/)
          TIKA-1195 and TIKA-2329, upgrade to POI 3.16-final and add xlsb parser (tallison: rev a847a863d1e25a9ba8209cd28c3e98be153f34a5)

          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParser.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFExcelExtractorDecorator.java
          • (add) tika-test-resources/src/test/resources/test-documents/testEXCEL_various.xlsb
          • (edit) tika-parser-modules/pom.xml
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
          • (edit) CHANGES.txt
          • (add) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFBExcelExtractorDecorator.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build tika-2.x-windows #198 (See https://builds.apache.org/job/tika-2.x-windows/198/)
          TIKA-1195 and TIKA-2329, upgrade to POI 3.16-final and add xlsb parser (tallison: rev a847a863d1e25a9ba8209cd28c3e98be153f34a5)

          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParser.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFExcelExtractorDecorator.java
          • (edit) CHANGES.txt
          • (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
          • (edit) tika-parser-modules/pom.xml
          • (add) tika-test-resources/src/test/resources/test-documents/testEXCEL_various.xlsb
          • (add) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFBExcelExtractorDecorator.java

            People

            • Assignee:
              Unassigned
              Reporter:
              Frederic-Ronny Frederic Ronny
            • Votes:
              0
              Watchers:
              7

              Dates

              • Created:
                Updated:
                Resolved:
