IMPALA / IMPALA-5961

Test data for TPC-DS schema contains a non-Unicode character

Details

    • Task
    • Status: Reopened
    • Major
    • Resolution: Unresolved
    • Impala 2.10.0
    • None
    • Infrastructure

    Description

      The customer table contains rows whose c_birth_country values include the byte 0xd4. In Latin-1 that byte is 'Ô' (capital O with circumflex), but it is not valid UTF-8 in that position: 0xd4 is a UTF-8 lead byte and must be followed by a continuation byte. This causes tpcds-q30 to fail. Either the tests need to change to accommodate the different character set, or the test data should change to contain the correctly encoded Unicode character.

      To reproduce, build a mini-cluster and load with test data (./buildall.sh -testdata ...), then run the query from the attached file. Find the affected rows with:
      SELECT * FROM customer WHERE c_birth_country LIKE '%IVOIRE%';
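
      The encoding problem can be illustrated outside Impala with a minimal Python sketch. The sample bytes below are an assumption inferred from the description (a 0xd4 byte inside a country name matching '%IVOIRE%'); they are not copied from the actual test data, and the helper names are hypothetical. The sketch checks whether a value is valid UTF-8 and shows one possible repair, assuming the source data is Latin-1.

```python
# Hypothetical sample value: assumed from the description, not taken
# from the real customer table.
raw = b"C\xd4TE D'IVOIRE"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret Latin-1 bytes and re-encode them as UTF-8.

    Assumes the input really is Latin-1; every byte value is a valid
    Latin-1 code point, so the decode step itself never fails.
    """
    return data.decode("latin-1").encode("utf-8")


fixed = latin1_to_utf8(raw)
print(is_valid_utf8(raw), is_valid_utf8(fixed))
```

      If the test data were regenerated this way, the affected value would carry the two-byte UTF-8 sequence for U+00D4 instead of the single 0xd4 byte. Whether to fix the data or relax the tests is the open question in this issue.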

      Attachments

        1. ttq-50.out
          20 kB
          Tim Wood

            People

              Assignee: Unassigned
              Reporter: Tim Wood (twood@cloudera.com)
              Votes: 0
              Watchers: 5
