  1. Parquet
  2. PARQUET-4

Use LRU caching for footers in ParquetInputFormat.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: parquet-mr
    • Labels: None

      Description

      The footer caching approach needs to change because it breaks when the same ParquetInputFormat instance is reused to generate splits for different input directories. For example, Hive's FetchOperator hits this problem when it operates over more than one partition (side note: as far as I could tell, Hive has been reusing InputFormat instances in this way for quite some time). How the issue manifests in Hive is described in more detail here: https://groups.google.com/d/msg/parquet-dev/0aXql-3z7vE/Gn5m094V7PMJ

      The proposed patch can be found here: https://github.com/apache/incubator-parquet-mr/pull/2
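
For readers unfamiliar with the idea in the title, here is a minimal sketch of an LRU cache keyed by file path, so that cached entries do not depend on which input directory the InputFormat instance was last used for. It is an illustration only and is not taken from the linked patch: the class name FooterLruCache, its methods, and the use of plain strings as stand-ins for footers are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the parquet-mr implementation.
// Keys are file paths; values stand in for whatever footer object is cached.
class FooterLruCache<V> {
    private final Map<String, V> entries;

    FooterLruCache(final int maxSize) {
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // Evict the least recently used entry once the cache grows past maxSize.
                return size() > maxSize;
            }
        };
    }

    // Returns the cached value for this path, or null if it is absent or was evicted.
    synchronized V get(String path) {
        return entries.get(path);
    }

    synchronized void put(String path, V footer) {
        entries.put(path, footer);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Because entries are looked up per file, footers for files in a second partition are simply added alongside those from the first, and the least recently used ones are dropped when the (assumed) size limit is reached. A real cache would presumably also need to detect files that changed on disk, which this sketch does not attempt.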

              People

              • Assignee: Matt Martin (matt.martin)
              • Reporter: Matt Martin (matt.martin)
              • Votes: 0
              • Watchers: 2