Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version: 1.17.0
None
None
Description
Two files (a.json and b.json) and the concatenation of these two files (ab.json) produce different results for a simple SELECT when using Drill embedded.
The problem appears once the total input reaches a certain number of bytes (~102 400 000 in my case).
#!/bin/bash
# script gen.sh to reproduce the problem
for ((i=1;i<=$1;++i)); do
  echo -n '{"At":"'
  for j in {1..999}; do
    echo -n 'aaaaabbbbb'
  done
  echo '"}'
done
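As a sanity check on the sizes involved, the sketch below builds a single record the same way gen.sh does and counts its bytes. The function name `record_bytes` is a hypothetical helper and not part of the original script. The result explains why ~102 400 000 bytes corresponds to 10240 records.

```shell
#!/bin/bash
# record_bytes (hypothetical helper): emit one record exactly as gen.sh
# does and print its size in bytes.
record_bytes() {
  {
    echo -n '{"At":"'            # 7 bytes
    for j in {1..999}; do
      echo -n 'aaaaabbbbb'       # 999 * 10 = 9990 bytes
    done
    echo '"}'                    # 2 bytes + newline
  } | wc -c | tr -d ' '          # tr strips the padding some wc versions add
}

bytes=$(record_bytes)
echo "bytes per record: $bytes"
echo "records in 102400000 bytes: $((102400000 / bytes))"
```

Each record is 10000 bytes, so the threshold observed in this report is reached at 10240 records in total.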
== I ==
$ gen.sh 10000 > a.json
$ gen.sh 239 > b.json
$ wc -c *.json
100000000 a.json
2390000 b.json
102390000 total
$ bash drill-embedded
apache drill> SELECT * FROM dfs.tmp.`*.json` LIMIT 1;
+--------------------+
| At                 |
+--------------------+
| aaaaabbbbaaaaab... |
+--------------------+
=> All is fine here

== II ==
$ gen.sh 10000 > a.json
$ gen.sh 240 > b.json
$ wc -c *.json
100000000 a.json
2400000 b.json
102400000 total
$ bash drill-embedded
apache drill> SELECT * FROM dfs.tmp.`*.json` LIMIT 1;
+--------------------+
| At                 |
+--------------------+
|                    |
+--------------------+
=> Surprisingly, field `At` is empty

== III ==
$ gen.sh 10240 > ab.json
$ wc -c *.json
102400000 ab.json
$ bash drill-embedded
apache drill> SELECT * FROM dfs.tmp.`ab.json` LIMIT 1;
+--------------------+
| At                 |
+--------------------+
| aaaaabbbbaaaaab... |
+--------------------+
=> All is fine here, although the total number of lines is equal to case II
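To rule out a difference in the input data between cases II and III, one can check that the two-file input, concatenated, is byte-identical to the single file. The sketch below is an assumption-laden illustration: `gen` wraps the gen.sh loop in a function, `same_bytes` is a hypothetical helper, and it works in a temporary directory with small counts so that it runs quickly.

```shell
#!/bin/bash
# gen N: same record generator as gen.sh, wrapped in a function.
gen() {
  for ((i=1;i<=$1;++i)); do
    echo -n '{"At":"'
    for j in {1..999}; do
      echo -n 'aaaaabbbbb'
    done
    echo '"}'
  done
}

# same_bytes A B (hypothetical helper): succeeds when a.json (A records)
# followed by b.json (B records) is byte-identical to ab.json (A+B records).
same_bytes() {
  local dir
  dir=$(mktemp -d)
  gen "$1" > "$dir/a.json"
  gen "$2" > "$dir/b.json"
  gen $(( $1 + $2 )) > "$dir/ab.json"
  cat "$dir/a.json" "$dir/b.json" | cmp -s - "$dir/ab.json"
  local rc=$?                    # exit status of cmp: 0 means identical
  rm -rf "$dir"
  return $rc
}

same_bytes 10 3 && echo "identical"
```

Calling `same_bytes 10000 240` would cover the sizes from case II, at the cost of a much longer run. If the inputs are identical, the differing query results would have to come from how the files are read rather than from their contents.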
The Drill 1.17 build tested here is the latest as of 2019-11-13.
This problem does not appear with Drill embedded 1.16.