IMPALA-11469

Ignore _spark_metadata folder in table location


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.2.0
    • Component/s: Backend
    • Labels: None

    Description

      When Spark Streaming is used to write Parquet files to an external table, a _spark_metadata folder is created inside the table's directory. Hive can deal with this directory, but Impala trips on it.

      As a result, REFRESH on the table fails because it finds a directory containing data that Impala cannot handle. A SELECT also fails because it trips on the _spark_metadata folder.

      The issue was found in CDP 7.1.7 SP1, but I suspect it affects all versions.

      Regards, Matthias
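      The report does not say how Impala should treat the folder. As a minimal sketch only, assuming a plain local directory stands in for the table location, the Python below shows one way a file lister could prune a _spark_metadata directory while collecting a table's data files. IGNORED_DIR_NAMES, list_data_files, and the rule of skipping files whose names start with "." or "_" are hypothetical illustrations, not Impala's actual fix or catalog code.

```python
import os
import tempfile
from typing import List

# Hypothetical: directory names to skip while collecting a table's data files.
IGNORED_DIR_NAMES = {"_spark_metadata"}


def is_hidden_file(name: str) -> bool:
    """Assumed convention: file names starting with '.' or '_' are not data files."""
    return name.startswith(".") or name.startswith("_")


def list_data_files(table_location: str) -> List[str]:
    """Recursively collect data file paths under table_location,
    pruning ignored directories such as _spark_metadata."""
    result = []
    for root, dirs, files in os.walk(table_location):
        # Prune in place so os.walk never descends into ignored directories.
        dirs[:] = [d for d in dirs
                   if d not in IGNORED_DIR_NAMES and not d.startswith(".")]
        for f in files:
            if not is_hidden_file(f):
                result.append(os.path.join(root, f))
    return sorted(result)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as loc:
        # Simulate a Spark Streaming sink: one data file plus a metadata log.
        os.makedirs(os.path.join(loc, "_spark_metadata"))
        open(os.path.join(loc, "_spark_metadata", "0"), "w").close()
        open(os.path.join(loc, "part-00000.parquet"), "w").close()
        print([os.path.relpath(p, loc) for p in list_data_files(loc)])
        # prints ['part-00000.parquet']
```

      Pruning dirs in place is what stops os.walk from descending into the metadata folder; filtering only the returned file list would still traverse it.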


            People

              Assignee: Quanlong Huang (stigahuang)
              Reporter: Matthias Wies (mwies)
              Votes: 0
              Watchers: 4