IMPALA-12918

Do not allow non-numeric values in Hive table stats during an alter table

    Description

      Hive table properties are strings by nature, but some of them have a special meaning and must hold numeric values, such as "totalSize", "numRows" and "rawDataSize".
      Impala currently allows these to be set to non-numeric values, including the empty string.
      Some applications (for example Spark) then fail with rather obscure "NumberFormatException" errors when they try to access such broken tables (see SPARK-47444).
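
      To illustrate the failure mode, here is a minimal Python sketch of a reader that trusts "numRows" to be numeric. The function read_row_count is hypothetical and is not Spark code; Python's ValueError stands in for Java's NumberFormatException.

```python
def read_row_count(tbl_properties):
    """Hypothetical reader that assumes 'numRows' always holds an integer string."""
    return int(tbl_properties["numRows"])


# Properties as left behind by the problematic alter table statement.
broken = {"numRows": "", "STATS_GENERATED_VIA_STATS_TASK": "true"}

try:
    read_row_count(broken)
    failed = False
except ValueError:
    # The empty string cannot be parsed as an integer, so the read fails
    # far away from the statement that wrote the bad value.
    failed = True
```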

      Impala should validate "alter table" statements and reject non-numeric values for the "totalSize", "numRows" and "rawDataSize" table properties.
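
      The requested check could look roughly like the Python sketch below. It is only an illustration, not Impala's implementation: the names NUMERIC_STATS_PROPERTIES and invalid_stats_properties are hypothetical, and accepting a leading minus sign (so that a value such as "-1" passes) is an assumption rather than something this issue specifies.

```python
import re

# Property names taken from the issue description.
NUMERIC_STATS_PROPERTIES = ("totalSize", "numRows", "rawDataSize")

# Assumption: an optional minus sign followed by ASCII digits counts as numeric.
_INTEGER_PATTERN = re.compile(r"-?[0-9]+")


def invalid_stats_properties(tbl_properties):
    """Return the stats properties whose values are not integer strings."""
    bad = {}
    for key in NUMERIC_STATS_PROPERTIES:
        if key not in tbl_properties:
            continue  # A property that is not being set needs no check.
        value = tbl_properties[key]
        if not isinstance(value, str) or not _INTEGER_PATTERN.fullmatch(value):
            bad[key] = value
    return bad
```

      A statement would be rejected whenever the returned dictionary is non-empty; properties outside the three stats keys are not inspected.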

      For example, the following statement breaks the table (afterwards it can no longer be read from Spark):

      [impalacoordinator:21000] default> alter table t1p set tblproperties('numRows'='', 'STATS_GENERATED_VIA_STATS_TASK'='true');
      

      Note: beeline/Hive already validates "numRows" and "rawDataSize" in alter table statements; "totalSize" still needs validation there too.

          People

            Assignee: Unassigned
            Reporter: Miklos Szurap (mszurap)
