Kudu / KUDU-2671

Change hash number for range partitioning

    Details

    • Flags:
      Important

      Description

      For our use case, Kudu's schema design isn't flexible enough.

      We create our tables with one range partition per day, such as dt='20181112', the same way we would lay out a Hive table.

      But our data size changes a lot from day to day: one day may hold 50 GB, while another holds 500 GB. This makes it hard to choose the number of hash buckets. If the number is too large, it is wasteful on most days. If it is too small, performance suffers on days with a large amount of data.

      So we propose a solution: allow the hash number to change per range partition, based on the table's historical data.

      For example:

      1. We create the schema with one estimated hash number.
      2. We collect the data size for each day range.
      3. We create each new day range partition with a hash number based on the collected daily sizes.
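
      The three steps above can be sketched roughly as follows. This is a minimal illustration, not the reporter's implementation and not a Kudu API: the names buckets_for_size and buckets_for_next_day, the 10 GiB target per bucket, the bucket limits, and the 7-day window are all assumptions chosen for the example.

```python
import math

GIB = 1024 ** 3

# Hypothetical tuning values; none of these come from Kudu itself.
TARGET_BYTES_PER_BUCKET = 10 * GIB
MIN_BUCKETS = 2
MAX_BUCKETS = 64


def buckets_for_size(size_bytes):
    """Return a hash bucket count sized for the given amount of data."""
    wanted = math.ceil(size_bytes / TARGET_BYTES_PER_BUCKET)
    return max(MIN_BUCKETS, min(MAX_BUCKETS, wanted))


def buckets_for_next_day(daily_sizes, estimated_buckets, window=7):
    """Choose the bucket count for a new day range partition.

    daily_sizes maps a day key such as '20181112' to the bytes stored for it.
    With no history yet, fall back to the estimate used at table creation.
    """
    if not daily_sizes:
        return estimated_buckets
    # Day keys in yyyymmdd form sort chronologically as strings.
    recent_days = sorted(daily_sizes)[-window:]
    # Size for the largest recent day so a heavy day is not under-provisioned.
    expected = max(daily_sizes[day] for day in recent_days)
    return buckets_for_size(expected)


history = {"20181110": 50 * GIB, "20181111": 500 * GIB, "20181112": 60 * GIB}
print(buckets_for_next_day(history, estimated_buckets=8))
```

      Sizing for the largest recent day favors performance on heavy days over saving tablets on light days; a median or same-weekday estimate would make the opposite trade-off.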

      We have used this feature for half a year, and it works well. We hope it will be useful to the community. The solution may not be complete, so please help us make it better.

              People

              • Assignee:
                mreddy Mahesh Reddy
                Reporter:
                yangz yangz
              • Votes:
                3
                Watchers:
                13
