IMPALA-11306

single_node_perf_run.py fails to load dataset if scale factor is 1


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version: Impala 4.0.0
    • Fix Version: Impala 4.2.0
    • Component: Infrastructure

    Description

      single_node_perf_run.py has a required argument "scale". If scale > 1, the script runs fine. But if scale = 1 and load is true, the data loading script fails because the dataset is missing. This is because the preload script omits the scale number suffix when creating the dataset directory.

      https://github.com/apache/impala/blob/6ea15409b879a1286e72848defdda8d5d8568c19/testdata/datasets/tpch/preload#L27

      For example, tpch at scale 1 creates the dataset dir "testdata/impala-data/tpch".
      generate-schema-statements.py, on the other hand, generates template SQL that refers to "testdata/impala-data/tpch1".

      https://github.com/apache/impala/blob/6ea15409b879a1286e72848defdda8d5d8568c19/testdata/bin/generate-schema-statements.py#L599 
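
The mismatch can be modeled with a minimal Python sketch. Both helper names below are hypothetical and only mirror the behavior described in this report; they are not copied from the preload script or from generate-schema-statements.py. The sketch also assumes that for scale > 1 both sides append the scale number, which is consistent with the script running fine in that case.

```python
def preload_dir_name(dataset: str, scale: int) -> str:
    """Hypothetical model of the preload script: the suffix is dropped at scale 1."""
    return dataset if scale == 1 else f"{dataset}{scale}"


def schema_dir_name(dataset: str, scale: int) -> str:
    """Hypothetical model of the generated SQL: the suffix is always appended."""
    return f"{dataset}{scale}"


def names_match(dataset: str, scale: int) -> bool:
    """True when the directory that gets created is the one the SQL refers to."""
    return preload_dir_name(dataset, scale) == schema_dir_name(dataset, scale)
```

Under these assumptions, `names_match("tpch", 1)` is the only failing case: the directory is created as "tpch" while the SQL looks for "tpch1".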

      Consider creating a symlink in the preload script when the scale factor is 1.
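
A rough sketch of the symlink idea follows, written in Python for illustration only. The preload script itself is not a Python script, and the function name `ensure_scale_suffix_link` and its parameters are hypothetical. It assumes the unsuffixed directory (for example "tpch") already exists under the data root.

```python
import os


def ensure_scale_suffix_link(data_root: str, dataset: str, scale: int) -> str:
    """Return the suffixed dataset path, e.g. <data_root>/tpch1.

    When scale is 1 and the suffixed path does not exist yet, create it as a
    symlink to the unsuffixed directory that is assumed to exist already.
    """
    suffixed = os.path.join(data_root, f"{dataset}{scale}")
    if scale == 1 and not os.path.lexists(suffixed):
        # Relative target, resolved against the directory that holds the link,
        # so the link stays valid if data_root is moved as a whole.
        os.symlink(dataset, suffixed, target_is_directory=True)
    return suffixed
```

The `os.path.lexists` check keeps the call idempotent, so rerunning a load does not fail on an existing link.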

          People

            Assignee: Riza Suminto (rizaon)
            Reporter: Riza Suminto (rizaon)