Apache Drill / DRILL-3551

CTAS from a complex JSON source with a schema change is not written (and hence not read back) correctly


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: Execution - Data Types
    • Labels: None

    Description

      The source data contains:

      20K rows of the following form:
      {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}

      200 rows of the following form:
      {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
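To reproduce the problem without the attachment, the data set described above can be regenerated with a short script. This is a sketch under stated assumptions: the helper names `repro_lines` and `write_repro` are hypothetical, the output is newline-delimited JSON written to `test.json` to match the query below, and the 200 extended rows are placed after the 20K base rows (their value, "last entries only", suggests this, but the attached DRILL-3551.json may be laid out differently).

```python
import json

# Row shapes copied from the issue description.
BASE_ROW = {"some": "yes",
            "others": {"other": "true", "all": "false", "sometimes": "yes"}}
EXTRA_ROW = {"some": "yes",
             "others": {"other": "true", "all": "false", "sometimes": "yes",
                        "additional": "last entries only"}}

def repro_lines(n_base=20_000, n_extra=200):
    """Yield newline-terminated JSON records: n_base rows of the original
    shape, then n_extra rows whose nested 'others' map gains an
    'additional' key (the schema change). Ordering is an assumption."""
    for _ in range(n_base):
        yield json.dumps(BASE_ROW) + "\n"
    for _ in range(n_extra):
        yield json.dumps(EXTRA_ROW) + "\n"

def write_repro(path="test.json", **kwargs):
    """Hypothetical helper: write the data set to `path`, one object per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(repro_lines(**kwargs))
```

Calling `write_repro()` in the directory backing the Drill workspace produces a `test.json` that the CTAS statement can read.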

      Creating a table and reading it back returns incorrect data:

      CREATE TABLE testparquet as select * from `test.json`;
      SELECT * from testparquet;

      The SELECT returns:

      yes {"other":"true","all":"false","sometimes":"yes"}
      yes {"other":"true","all":"false","sometimes":"yes"}
      yes {"other":"true","all":"false","sometimes":"yes"}
      yes {"other":"true","all":"false","sometimes":"yes"}

      The "additional" field is missing from all records, including the 200 source rows that contain it.

      The Parquet metadata for the created file does not include the 'additional' field either.
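To see what the CTAS output should contain, one can compute the union of field paths across all input records, which is the schema a writer would need to capture for the schema change to survive. The sketch below is illustrative only: `field_paths` and `schema_report` are hypothetical helpers, it assumes one JSON object per line, and it inspects the JSON input rather than the Parquet footer (reading that would require a Parquet library, which is not shown).

```python
import json

def field_paths(obj, prefix=""):
    """Yield a dotted path (e.g. 'others.additional') for every leaf field
    of a possibly nested JSON object."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from field_paths(value, path)  # descend into nested maps
        else:
            yield path

def schema_report(lines):
    """Given an iterable of JSON lines, return (fields, counts): the union of
    leaf field paths seen in any record, and how many records contain each."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for path in field_paths(json.loads(line)):
            counts[path] = counts.get(path, 0) + 1
    return set(counts), counts
```

For the data described in this issue, `with open("test.json", encoding="utf-8") as f: fields, counts = schema_report(f)` should report `others.additional` in `fields` with a count of 200, whereas the Parquet metadata of the created table lacks that column entirely.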

      Attachments

        1. DRILL-3551.json (1.45 MB), attached by Parth Chandra


          People

            Chun Chang (cchang@maprtech.com)
            Parth Chandra (parthc)
            Votes: 0
            Watchers: 4
