Apache Drill / DRILL-7444

JSON: blank result on SELECT when multiple files exceed a total byte size on Drill embedded


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.17.0
    • Fix Version/s: None
    • Component/s: Storage - JSON
    • Labels: None

      Description

      Two files (a.json and b.json) and the concatenation of these two files (ab.json) produce different results for a simple SELECT when using Drill embedded.

      The problem appears once the total size reaches a certain number of bytes (about 102,400,000 in my case).

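      #!/bin/bash
      # Script gen.sh to reproduce the problem.
      # Usage: gen.sh N  (writes N JSON records of 10,000 bytes each to stdout)
      for ((i=1; i<=$1; ++i));
      do
          # 7 bytes of prefix, 999 * 10 bytes of payload, 3 bytes of suffix and newline
          echo -n '{"At":"'
          for j in {1..999};
          do
              echo -n 'aaaaabbbbb'
          done
          echo '"}'
      done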
      
      == I ==
      $ gen.sh 10000 > a.json
      $ gen.sh 239 > b.json
      $ wc -c *.json
      100000000 a.json
        2390000 b.json
      102390000 total
      $ bash drill-embedded
      apache drill> SELECT * FROM dfs.tmp.`*.json` LIMIT 1;
      +--------------------+
      |       At           |
      +--------------------+
      | aaaaabbbbaaaaab... |
      +--------------------+
      => All is fine here
      
      
      == II ==
      $ gen.sh 10000 > a.json
      $ gen.sh 240 > b.json
      $ wc -c *.json
      100000000 a.json
        2400000 b.json
      102400000 total
      $ bash drill-embedded
      apache drill> SELECT * FROM dfs.tmp.`*.json` LIMIT 1;
      +--------------------+
      |       At           |
      +--------------------+
      |                    |
      +--------------------+
      => Surprisingly, field `At` is empty
      
      == III ==
      $ gen.sh 10240 > ab.json
      $ wc -c *.json 
      102400000 ab.json
      $ bash drill-embedded
      apache drill> SELECT * FROM dfs.tmp.`ab.json` LIMIT 1;
      +--------------------+ 
      |        At          |
      +--------------------+
      | aaaaabbbbaaaaab... |
      +--------------------+
      => All is fine here, although the number of lines is the same as in case II
        

      The Drill 1.17 version tested here is the latest as of 2019-11-13.
      This problem does not appear with Drill embedded 1.16.
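
      The byte counts reported by `wc -c` in the three cases follow directly from the record size that gen.sh produces. The sketch below is only a sanity check, not part of the original report: it rebuilds a single record the same way gen.sh does, measures it, and multiplies by the record counts used in cases I, II and III. The helper names `one_record` and `total_bytes` are illustrative.

```shell
#!/bin/bash
# Sketch: confirm the record size produced by gen.sh and the totals in cases I-III.

# Emit one record exactly as gen.sh does: {"At":"<999 x aaaaabbbbb>"} plus a newline.
one_record() {
    printf '{"At":"'
    for j in {1..999}; do
        printf 'aaaaabbbbb'
    done
    printf '"}\n'
}

# Bytes per record (tr strips the padding some wc implementations add).
record_bytes=$(one_record | wc -c | tr -d '[:space:]')

# Total bytes for a given number of records.
total_bytes() {
    echo $(( $1 * record_bytes ))
}

echo "record:   $record_bytes bytes"
echo "case I:   $(total_bytes $((10000 + 239))) bytes"
echo "case II:  $(total_bytes $((10000 + 240))) bytes"
echo "case III: $(total_bytes 10240) bytes"
```

      Run under bash, this reports 10000 bytes per record, 102390000 bytes for case I, and 102400000 bytes for both case II and case III. Cases II and III therefore hold the same amount of data and differ only in whether it is split across two files.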

            People

            • Assignee: Unassigned
            • Reporter: benj641 benj