Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version: 1.14.2
- Fix Version: None
Description
This was reported on the user mailing list. Run the following test to reproduce the bug.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.internal.TableEnvironmentImpl;
import org.junit.Test;

public class MyTest {

    @Test
    public void myTest() throws Exception {
        EnvironmentSettings settings = EnvironmentSettings.inBatchMode();
        TableEnvironment tEnv = TableEnvironmentImpl.create(settings);
        tEnv.executeSql(
                        "create table T1 ( a INT ) with ( 'connector' = 'filesystem', 'format' = 'json', 'path' = '/tmp/gao.json' )")
                .await();
        tEnv.executeSql(
                        "create table T2 ( a INT ) with ( 'connector' = 'filesystem', 'format' = 'json', 'path' = '/tmp/gao.gz' )")
                .await();
        tEnv.executeSql("select count(*) from T1 UNION ALL select count(*) from T2").print();
    }
}
The data files used are attached to this issue.
The result is
+----------------------+
|               EXPR$0 |
+----------------------+
|                  100 |
|                   24 |
+----------------------+
which is obviously incorrect.
This happens because DelimitedInputFormat#fillBuffer does not handle compressed files correctly: it caps the number of (uncompressed) bytes read at splitLength, but splitLength is the length of the compressed file, so the two quantities cannot match and the reader stops before the whole file has been decompressed.
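The mismatch can be illustrated with a small standalone sketch (this is not Flink code; the class name, record layout, and record count are made up for the demo): gzip a batch of newline-delimited records, then read the decompressed stream but stop once splitLength uncompressed bytes have been consumed, where splitLength is the compressed file size, mirroring what fillBuffer effectively does. The reader stops early and silently drops records.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SplitLengthDemo {

    // Gzips n newline-delimited records, then counts how many complete records
    // a reader sees if it stops after splitLength uncompressed bytes, where
    // splitLength is the size of the COMPRESSED file (the mismatch in question).
    static int recordsReadWithSplitLengthCap(int n) throws Exception {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            for (int i = 0; i < n; i++) {
                gz.write(("{\"a\": " + i + "}\n").getBytes(StandardCharsets.UTF_8));
            }
        }
        long splitLength = compressed.size(); // compressed bytes, not uncompressed

        int records = 0;
        long bytesRead = 0;
        try (InputStream in =
                new GZIPInputStream(new ByteArrayInputStream(compressed.toByteArray()))) {
            int b;
            // Stop once splitLength (uncompressed) bytes are consumed,
            // as DelimitedInputFormat#fillBuffer effectively does.
            while (bytesRead < splitLength && (b = in.read()) != -1) {
                bytesRead++;
                if (b == '\n') {
                    records++;
                }
            }
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        // Fewer than 100 records are seen, because the uncompressed data is
        // longer than splitLength.
        System.out.println("records seen: " + recordsReadWithSplitLengthCap(100));
    }
}
```

A fix would need to stop reading based on the end of the decompressed stream (for non-splittable compressed files, the whole file is one split) rather than on the compressed splitLength.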