  1. Flink
  2. FLINK-25311

DelimitedInputFormat cannot read compressed files correctly


Details

    Description

      This was reported on the user mailing list.

      Run the following test to reproduce this bug.

      import org.apache.flink.table.api.EnvironmentSettings;
      import org.apache.flink.table.api.TableEnvironment;
      
      import org.junit.Test;
      
      public class MyTest {
      
          @Test
          public void myTest() throws Exception {
              EnvironmentSettings settings = EnvironmentSettings.inBatchMode();
              // Use the public factory method rather than the internal TableEnvironmentImpl.
              TableEnvironment tEnv = TableEnvironment.create(settings);
              // T1 reads the uncompressed file gao.json.
              tEnv.executeSql(
                              "create table T1 ( a INT ) with ( 'connector' = 'filesystem', 'format' = 'json', 'path' = '/tmp/gao.json' )")
                      .await();
              // T2 reads the compressed file gao.gz.
              tEnv.executeSql(
                              "create table T2 ( a INT ) with ( 'connector' = 'filesystem', 'format' = 'json', 'path' = '/tmp/gao.gz' )")
                      .await();
              // Print the row count of each table.
              tEnv.executeSql("select count(*) from T1 UNION ALL select count(*) from T2").print();
          }
      }
      

      The data files used are attached to this issue.

      The result is

      +----------------------+
      |               EXPR$0 |
      +----------------------+
      |                  100 |
      |                   24 |
      +----------------------+
      

      which is obviously incorrect.

      This is because DelimitedInputFormat#fillBuffer does not handle compressed files correctly. It caps the number of uncompressed bytes it reads at splitLength, but for a compressed file splitLength is the length of the compressed bytes. The two lengths do not match, so the reader can stop before it has consumed all of the decompressed data.
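
      The mismatch can be illustrated without Flink. The following is a minimal sketch, not the actual DelimitedInputFormat code: the class SplitLengthMismatchSketch, its helper methods, and the generated sample records are all hypothetical and only assume that records are newline-delimited and the file is gzip-compressed. It reads a decompressed stream while capping the number of bytes read at the compressed length, which is the limit described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-alone illustration; none of these names exist in Flink.
class SplitLengthMismatchSketch {

    // Build 100 newline-delimited JSON records (made-up sample data).
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("{\"a\": ").append(i).append("}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Gzip-compress a byte array in memory.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(raw);
        }
        return bos.toByteArray();
    }

    // Count '\n'-terminated records, reading at most `limit` bytes from the stream.
    static int countRecords(InputStream in, long limit) throws IOException {
        int records = 0;
        long bytesRead = 0;
        int b;
        while (bytesRead < limit && (b = in.read()) != -1) {
            bytesRead++;
            if (b == '\n') {
                records++;
            }
        }
        return records;
    }

    // Decompress `compressed` and count records, capping the decompressed bytes read at `limit`.
    static int countFromGzip(byte[] compressed, long limit) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return countRecords(in, limit);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = sampleData();
        byte[] compressed = gzip(raw);

        // Limit taken from the compressed size: reading stops before the data ends.
        int capped = countFromGzip(compressed, compressed.length);
        // No effective limit: every record is seen.
        int full = countFromGzip(compressed, Long.MAX_VALUE);

        System.out.println("raw bytes = " + raw.length + ", compressed bytes = " + compressed.length);
        System.out.println("records with compressed-length cap = " + capped);
        System.out.println("records without cap = " + full);
    }
}
```

      Because the sample data compresses to fewer bytes than its uncompressed size, the capped read sees fewer records than the uncapped one, which mirrors the lower count reported for T2.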

      Attachments

        1. gao.gz (0.2 kB, Caizhi Weng)
        2. gao.json (1.0 kB, Caizhi Weng)

          People

            xiaoxingStack Jinxin.Tang
            TsReaper Caizhi Weng