Parquet / PARQUET-2317

parquet-format and parquet-format-structures both define Util, but with inconsistent methods

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.12.0, 1.13.0
    • Fix Version/s: None
    • Component/s: parquet-format
    • Labels: None

    Description

      I have been running into a bug due to parquet-format and parquet-format-structures both defining the org.apache.parquet.format.Util class but doing so inconsistently.

      For example, several methods that take a BlockCipher parameter are defined in the parquet-format-structures version of Util but not in the parquet-format version. Code that calls these methods, such as org.apache.parquet.hadoop.ParquetFileReader.readFooter, fails with a NoSuchMethodError if parquet-format happens to be loaded first on the classpath.
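
      For anyone trying to confirm they are hitting the same problem, a minimal sketch along the following lines should report which jar Util was resolved from and whether that copy has the decrypting overload. It uses only standard Java reflection; the class name UtilOriginCheck and its helper methods are made up for illustration and are not part of Parquet.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Parquet.
final class UtilOriginCheck {

    /** Returns the location (usually a jar URL) a class was loaded from, or null if unknown. */
    static URL originOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    /** True if cls has a public method with this name and a parameter of the named type. */
    static boolean hasMethodWithParam(Class<?> cls, String methodName, String paramTypeName) {
        for (Method method : cls.getMethods()) {
            if (!method.getName().equals(methodName)) {
                continue;
            }
            for (Class<?> param : method.getParameterTypes()) {
                if (param.getName().equals(paramTypeName)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        try {
            // Class and parameter names are taken from the stack trace in this report.
            Class<?> util = Class.forName("org.apache.parquet.format.Util");
            System.out.println("Util loaded from: " + originOf(util));
            System.out.println("Has readFileMetaData with a Decryptor parameter: "
                    + hasMethodWithParam(util, "readFileMetaData",
                            "org.apache.parquet.format.BlockCipher$Decryptor"));
        } catch (ClassNotFoundException e) {
            System.out.println("org.apache.parquet.format.Util is not on the classpath");
        }
    }
}
```

      Run with the same classpath as the failing application. If the reported location is the parquet-format jar and the second line prints false, that would match the failure described here.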

      Here is an example stack trace for a Scala Spark application.

      Caused by: java.lang.NoSuchMethodError: 'org.apache.parquet.format.FileMetaData org.apache.parquet.format.Util.readFileMetaData(java.io.InputStream, org.apache.parquet.format.BlockCipher$Decryptor, byte[])'
      at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1441) ~[parquet_hadoop.jar:1.13.1]
      at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1438) ~[parquet_hadoop.jar:1.13.1]
      at org.apache.parquet.format.converter.ParquetMetadataConverter$NoFilter.accept(ParquetMetadataConverter.java:1173) ~[parquet_hadoop.jar:1.13.1]
      at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1438) ~[parquet_hadoop.jar:1.13.1]
      at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:591) ~[parquet_hadoop.jar:1.13.1]
      at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:536) ~[parquet_hadoop.jar:1.13.1]
      at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:530) ~[parquet_hadoop.jar:1.13.1]
      at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:478) ~[parquet_hadoop.jar:1.13.1]
      ... (my application code invoking the above)
      

      Because of issues external to Parquet that I have yet to figure out (a complex Spark and dependency setup), my classpath is not deterministically ordered, and I am unable to pin parquet-format-structures ahead of parquet-format, which is why I am raising this.

      Even if that weren't the case, this is a fairly prickly edge to run into, since both modules define overlapping classes. Util does not appear to be the only class defined by both; it is just the one I have been focusing on because of this bug.
      It appears these methods were introduced no later than 1.12: https://github.com/apache/parquet-mr/commit/65b95fb72be8f5a8a193a6f7bc4560fdcd742fc7#diff-852341c99dcae06c8fa2b764bcf3d9e6860e40442d0ab1cf5b935df80a9cacb7
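
      To see the overlap on a given classpath, something like the sketch below could list every location that provides a copy of a class. It relies only on the standard ClassLoader.getResources call; DuplicateClassFinder and providersOf are hypothetical names used for illustration, and Util is the only class name I have actually checked.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Parquet.
final class DuplicateClassFinder {

    /** Lists every classpath location that provides a .class file for the given class name. */
    static List<URL> providersOf(String className, ClassLoader loader) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        // Util is the class from this report; add other suspected duplicates as needed.
        String[] candidates = {"org.apache.parquet.format.Util"};
        for (String name : candidates) {
            List<URL> providers = providersOf(name, loader);
            System.out.println(name + " is provided by " + providers.size() + " location(s):");
            for (URL url : providers) {
                System.out.println("  " + url);
            }
        }
    }
}
```

      More than one location for the same class name means the copy that wins depends on classpath order, which is the situation described above.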

          People

            Assignee: Unassigned
            Reporter: Joey Pereira (legojoey17)