Create a nested testdata flattener for the query generator

Details

    Description

      In order to use the query generator to test nested types, we need a way to convert a nested dataset into an equivalent flattened one that can be loaded into Postgres. Maps and Arrays should be converted into tables that can be joined with the original table to simulate the nesting.
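As a rough illustration of the joinable layout described above, the sketch below stores one array column as a separate child table and joins it back to the parent. The table and column names (`customers`, `customers_orders`, `pos`, `item`) are hypothetical, and Python's built-in `sqlite3` stands in for Postgres only so the example is self-contained.

```python
import sqlite3

# In-memory database used as a stand-in for Postgres (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Parent table: the scalar columns of the original nested table.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    -- Child table: one row per array element; id references the parent row.
    CREATE TABLE customers_orders (id INTEGER, pos INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO customers_orders VALUES (1, 0, 'pen'), (1, 1, 'ink'), (2, 0, 'cup');
""")

# Joining on the id column simulates unnesting the array.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers c JOIN customers_orders o ON o.id = c.id
    ORDER BY c.id, o.pos
""").fetchall()
print(rows)
```

Each parent row appears once per element of its array, which is the same shape a query over the nested column would produce.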

        Activity

          Alexander Behm (alex.behm) added a comment:

          commit bd6d2df7304ea395cfbce286ab2d47711e2ea5b4
          Author: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
          Date: Tue Jan 24 14:54:03 2017 -0800

          IMPALA-5527: Add nested testdata flattener

          The TableFlattener takes a nested dataset and creates an equivalent
          unnested dataset. The unnested dataset is saved as Parquet.

          When an array or map is encountered in the original table, the flattener
          creates a new table and adds an id column to it which references the row
          in the parent table. Joining on the id column should produce the
          original dataset.

          The flattened dataset should be loaded into Postgres in order to run the
          query generator (in nested types mode) on it. There is a script that
          automates generating, flattening, and loading random data into Postgres
          and Impala:
          testdata/bin/generate-load-nested.sh -f

          Testing:

          • ran ./testdata/bin/generate-load-nested.sh -f and random nested data
            was generated and flattened as expected.

          Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6
          Reviewed-on: http://gerrit.cloudera.org:8080/5787
          Reviewed-by: Alex Behm <alex.behm@cloudera.com>
          Tested-by: Impala Public Jenkins
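The flattening step described in the commit message can be sketched roughly as follows. This is not the actual TableFlattener code: the function name `flatten`, the `parent_id`, `pos`, `item`, `key` and `value` column names, and the `<parent>_<column>` table naming are all assumptions made for illustration. Rows are plain Python dicts, lists stand in for arrays, and every dict value is treated as a map.

```python
def flatten(table, rows):
    """Return {table_name: [flat_row, ...]} for a list of nested rows.

    Simplified sketch: assumes source rows have no column named "id" or
    "parent_id", and that every dict value is a map (not a struct).
    """
    tables = {}

    def add_row(name, row, parent_id=None):
        out = tables.setdefault(name, [])
        flat = {"id": len(out)}  # row id within this flat table
        if parent_id is not None:
            flat["parent_id"] = parent_id  # reference to the parent row
        out.append(flat)
        for col, value in row.items():
            if isinstance(value, list):
                # Array: one child row per element, tagged with its position.
                for pos, item in enumerate(value):
                    add_row(f"{name}_{col}", {"pos": pos, "item": item}, flat["id"])
            elif isinstance(value, dict):
                # Map: one child row per key/value pair.
                for key, val in value.items():
                    add_row(f"{name}_{col}", {"key": key, "value": val}, flat["id"])
            else:
                flat[col] = value  # scalar columns stay in place

    for row in rows:
        add_row(table, row)
    return tables


nested = [
    {"name": "alice", "orders": ["pen", "ink"], "attrs": {"tier": "gold"}},
    {"name": "bob", "orders": [], "attrs": {}},
]
tables = flatten("customers", nested)
for name, flat_rows in tables.items():
    print(name, flat_rows)
```

Because the helper recurses on each element, arrays of arrays or maps of arrays produce further child tables. Empty collections produce no child rows, so an inner join would drop such parents and an outer join would be needed to keep them.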

          People

            tarasbob Taras Bobrovytsky