Flink / FLINK-20221

DelimitedInputFormat does not restore compressed filesplits correctly leading to dataloss

Details

    Description

      It seems that DelimitedInputFormat cannot correctly restore input splits that belong to compressed files. When a compressed file split is restored from a position in the middle of the split, the rest of the split is never read, which leads to data loss.

      The cause of the problem is that for compressed splits that use an inflater stream, the split length is set to the magic number -1. The reopen method does not account for this value, which causes the split to go to the `end` state immediately.

      The problem and the fix are shown in this commit:
      https://github.com/gyfora/flink/commit/4adc8ba8d1989fff2db43881c9cb3799848c6e0d
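
To illustrate the mechanism described above, here is a minimal, simplified model of the bounds check. It is not Flink's actual code and not the code from the linked commit; the class name, method names, and the `WHOLE_FILE` constant are hypothetical. It only assumes what the description states: that a split length of -1 marks a compressed split, and that a restore which treats -1 as a real length concludes the split has already ended.

```java
// Simplified model of the restore problem described in this issue.
// NOT Flink's implementation: all names here are hypothetical.
final class SplitRestoreSketch {

    // Assumed sentinel: -1 means "length unknown, read the whole file",
    // as used for compressed splits read through an inflater stream.
    static final long WHOLE_FILE = -1L;

    // Buggy variant: treats the sentinel as a real length, so
    // splitStart + (-1) is smaller than any positive restored offset
    // and the split is reported as finished right away.
    static boolean endReachedBuggy(long splitStart, long splitLength, long restoredOffset) {
        return restoredOffset > splitStart + splitLength;
    }

    // Guarded variant: skips the bounds check when the length is the sentinel.
    // The end of such a split is only known once the stream is exhausted.
    static boolean endReachedGuarded(long splitStart, long splitLength, long restoredOffset) {
        if (splitLength == WHOLE_FILE) {
            return false;
        }
        return restoredOffset > splitStart + splitLength;
    }

    public static void main(String[] args) {
        // Restore a compressed split (length -1) at offset 100.
        System.out.println("buggy:   " + endReachedBuggy(0L, WHOLE_FILE, 100L));
        System.out.println("guarded: " + endReachedGuarded(0L, WHOLE_FILE, 100L));
    }
}
```

In this model, the buggy check marks the restored compressed split as finished even though records remain, which matches the data loss reported here. The guarded check leaves ordinary splits with a real length unaffected.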

          People

            gyfora Gyula Fora