Spark / SPARK-38319

Implement Strict Mode to prevent querying the entire table


Details

    Description

      We are using Spark Thrift Server as a service to run Spark SQL queries, with the Hive metastore as the metadata service.

      We would like to prevent users from querying an entire table: they should be forced to use a WHERE clause on a partition column (i.e. SELECT * FROM TABLE WHERE partition_column=<column_value>) and to LIMIT the output of the query whenever ORDER BY is used.
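      For illustration, here is a rough sketch of the intended behaviour, assuming a hypothetical Hive table events partitioned by dt (Spark does not enforce any of this today):

          // Queries that strict mode should reject:
          spark.sql("SELECT * FROM events")                              // no partition filter
          spark.sql("SELECT * FROM events ORDER BY user_id")             // ORDER BY without LIMIT

          // Queries that should still be allowed:
          spark.sql("SELECT * FROM events WHERE dt = '2022-02-24'")      // filters on the partition column
          spark.sql("SELECT * FROM events ORDER BY user_id LIMIT 100")   // ORDER BY with LIMIT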

      This behaviour is similar to what Hive exposes via the configurations

      hive.strict.checks.no.partition.filter

      hive.strict.checks.orderby.no.limit

      which are described here: https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1812

      and

      https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1816

       

      This is a fairly common use case that other tools support as well, for example BigQuery: https://cloud.google.com/bigquery/docs/querying-partitioned-tables#require_a_partition_filter_in_queries

      It would be nice to have this feature implemented in Spark when Hive support is enabled in a Spark session.
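      Until this exists natively, a rough per-deployment approximation of the partition-filter check can be wired in through the existing SparkSessionExtensions.injectCheckRule API. The sketch below is illustrative only: the class name, error message and the deliberately coarse filter detection are made up for this example, and the ORDER BY / LIMIT check is not covered.

          import org.apache.spark.SparkException
          import org.apache.spark.sql.SparkSessionExtensions
          import org.apache.spark.sql.catalyst.catalog.HiveTableRelation
          import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}

          // Hypothetical extension: fail when a partitioned Hive table is scanned and
          // no Filter in the plan references any of its partition columns.
          class StrictPartitionFilterExtension extends (SparkSessionExtensions => Unit) {
            override def apply(ext: SparkSessionExtensions): Unit = {
              ext.injectCheckRule { _ => plan => checkPartitionFilter(plan) }
            }

            private def checkPartitionFilter(plan: LogicalPlan): Unit = {
              // Columns referenced by any Filter anywhere in the plan (a coarse heuristic).
              val filteredCols: Set[String] = plan.collect {
                case Filter(condition, _) => condition.references.toSeq.map(_.name.toLowerCase)
              }.flatten.toSet

              plan.foreach {
                case r: HiveTableRelation if r.partitionCols.nonEmpty =>
                  val partCols = r.partitionCols.map(_.name.toLowerCase).toSet
                  if (partCols.intersect(filteredCols).isEmpty) {
                    throw new SparkException(
                      s"Strict mode: query over partitioned table ${r.tableMeta.identifier} " +
                        "must filter on a partition column")
                  }
                case _ => // other plan nodes are fine
              }
            }
          }

          // Registered via e.g.: --conf spark.sql.extensions=StrictPartitionFilterExtension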

      Attachments

        Activity


          People

            Assignee: Unassigned
            Reporter: dimtiris kanoute

            Dates

              Created:
              Updated:
