Parquet / PARQUET-1974

LZ4 decoding is not working over hadoop


    Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 1.11.1
    • Fix Version/s: None
    • Component/s: parquet-mr
    • Labels:
      None

      Description

      Hello, we just tried the latest Apache Arrow version, 3.0.0, with the write example included in the low-level API example, but LZ4 still does not seem to be compatible with Hadoop. We got this error when reading, over Hadoop, a Parquet file produced with the 3.0.0 library:

       [leal@sulu parquet]$ ./hadoop-3.2.2/bin/hadoop jar apache-parquet-1.11.1/parquet-tools/target/parquet-tools-1.11.1.jar head --debug parquet_2_0_example2.parquet
      2021-02-04 21:24:36,354 INFO hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 1500001 records.
      2021-02-04 21:24:36,355 INFO hadoop.InternalParquetRecordReader: at row 0. reading next block
      2021-02-04 21:24:36,397 INFO compress.CodecPool: Got brand-new decompressor [.lz4]
      2021-02-04 21:24:36,410 INFO hadoop.InternalParquetRecordReader: block read in memory in 55 ms. row count = 434436
      org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/home/leal/parquet/parquet_2_0_example2.parquet
      at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:255)
      at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
      at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
      at org.apache.parquet.tools.command.HeadCommand.execute(HeadCommand.java:87)
      at org.apache.parquet.tools.Main.main(Main.java:223)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
      Caused by: java.lang.IllegalArgumentException
      at java.nio.Buffer.limit(Buffer.java:275)
      at org.apache.hadoop.io.compress.lz4.Lz4Decompressor.decompress(Lz4Decompressor.java:232)
      at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
      at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
      at java.io.DataInputStream.readFully(DataInputStream.java:195)
       
      Any advice? We need to write LZ4 files from C++ and read them in Hadoop jobs, but we are still stuck on this problem.
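
      For anyone trying to narrow this down: the stack trace above goes through Hadoop's BlockDecompressorStream, which, as far as we understand, expects each compressed buffer to start with a 4-byte big-endian uncompressed length, followed by chunks that are each prefixed with a 4-byte big-endian compressed length. The following is only a minimal sketch under that assumption. The function names are hypothetical, extracting the compressed page bytes from the Parquet file is not shown, and the check is a plausibility heuristic rather than a validator.

```python
import struct


def read_hadoop_block_header(buf: bytes):
    """Return (uncompressed_len, compressed_len) read from the first 8 bytes.

    Assumes Hadoop-style block framing: two 4-byte big-endian integers
    in front of the first compressed chunk.
    """
    if len(buf) < 8:
        raise ValueError("need at least 8 bytes to read the two length prefixes")
    uncompressed_len, compressed_len = struct.unpack(">II", buf[:8])
    return uncompressed_len, compressed_len


def looks_like_hadoop_framing(buf: bytes) -> bool:
    """Heuristic only: the first chunk's declared size must fit in the buffer."""
    _, compressed_len = read_hadoop_block_header(buf)
    return compressed_len <= len(buf) - 8


# Synthetic buffer with Hadoop-style prefixes: 100 bytes uncompressed, 10 compressed.
framed = struct.pack(">II", 100, 10) + b"\x00" * 10
print(read_hadoop_block_header(framed), looks_like_hadoop_framing(framed))

# Buffer starting with the LZ4 frame magic number (0x184D2204, stored little-endian).
lz4_frame = bytes.fromhex("04224d18") + b"\x60\x40\x82" + b"\x00" * 5
print(looks_like_hadoop_framing(lz4_frame))
```

      If a page buffer fails this check, the writer is probably not emitting the framing that the Hadoop codec expects. If it passes, the problem is more likely in the declared sizes themselves.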

        Attachments

        1. parquet_3_0_example2.parquet
          30.75 MB
          mario luzi


            People

            • Assignee: Unassigned
            • Reporter: mario luzi (mario.luzi)
            • Votes: 0
            • Watchers: 3
