  1. Hive
  2. HIVE-1635

ability to check partition size for dynamic partitions

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      With dynamic partitions, it becomes very easy to create partitions.

      We have seen scenarios where corrupt data causes a lot of partitions and files to be created: a single corrupt
      row can end up creating a partition and a lot of files (as many as the number of mappers, if merge is false).

      This puts a lot of load on the cluster, and is a debugging nightmare.

      It would be good to have a configuration parameter for the minimum number of rows in a partition.
      If a partition has fewer rows than the threshold, it need not be created. The default value
      of this parameter can be zero for backward compatibility.
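
      A minimal sketch of the proposed check, not Hive code: it counts rows per dynamic partition value and keeps
      only partitions that meet the threshold. The function name `partitions_to_create`, the parameter name
      `min_rows_per_partition`, and the sample rows are all hypothetical and chosen for illustration; no such
      Hive setting is implied to exist.

```python
from collections import Counter


def partitions_to_create(rows, partition_key, min_rows_per_partition=0):
    """Return {partition_value: row_count} for partitions meeting the threshold.

    min_rows_per_partition is a hypothetical setting; the default of 0 keeps
    every partition, which matches the backward-compatible default proposed.
    """
    counts = Counter(row[partition_key] for row in rows)
    return {
        value: n
        for value, n in counts.items()
        if n >= min_rows_per_partition
    }


# Illustrative data: three valid rows and one row with a corrupt partition value.
rows = [
    {"ds": "2010-09-01", "v": 1},
    {"ds": "2010-09-01", "v": 2},
    {"ds": "2010-09-01", "v": 3},
    {"ds": "2O10-09-01", "v": 4},  # corrupt: letter O instead of digit 0
]

keep_all = partitions_to_create(rows, "ds")
filtered = partitions_to_create(rows, "ds", min_rows_per_partition=2)
```

      With the default threshold both partition values are kept; with a threshold of 2 the single-row partition
      produced by the corrupt value is skipped. What should happen to the skipped rows (drop, log, or fail the
      query) is not specified in this issue.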

            People

            • Assignee:
              nzhang Ning Zhang
              Reporter:
              namit Namit Jain
            • Votes:
              0
              Watchers:
              0
