HIVE-17124

PlanUtils: Rand() is not a failure-tolerant distribution column

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0, 3.0.0
    • Fix Version/s: None
    • Component/s: Query Planning
    • Labels: None

    Description

      else {
        // numPartitionFields = -1 means random partitioning
        partitionCols.add(TypeCheckProcFactory.DefaultExprProcessor.getFuncExprNodeDesc("rand"));
      }
      

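      This is a known cause of data corruption during failure recovery: rand() is non-deterministic, so a re-executed task attempt may send rows to different reducers than the original attempt did, and rows can be duplicated or lost.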

      ReduceSinkOperator already contains a failure-tolerant distribution function, which is used automatically when there are no partition columns:

          if (partitionEval.length == 0) {
            // If no partition cols, just distribute the data uniformly
            // to provide better load balance. If the requirement is to have a single reducer, we should
            // set the number of reducers to 1. Use a constant seed to make the code deterministic.
            if (random == null) {
              random = new Random(12345);
            }
            keyHashCode = random.nextInt();
          }
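
The difference between the two approaches can be illustrated with a small standalone simulation. This is only a sketch, not Hive code: the class RetrySafePartitioning and the helpers assign and rowsDelivered are hypothetical names, the bucket is computed directly from nextInt() as a simplification, and the retried attempt is assumed to see its rows in the same order as the original attempt. Reducers 0 and 1 read the output of the first attempt, and reducers 2 and 3 read the output of the retry.

```java
import java.util.Arrays;
import java.util.Random;

public class RetrySafePartitioning {
    // Hypothetical helper: assigns each of `rows` rows to one of `reducers` buckets.
    static int[] assign(Random rng, int rows, int reducers) {
        int[] bucket = new int[rows];
        for (int i = 0; i < rows; i++) {
            // floorMod keeps the bucket non-negative even when nextInt() is negative.
            bucket[i] = Math.floorMod(rng.nextInt(), reducers);
        }
        return bucket;
    }

    // Hypothetical helper: counts the rows received overall when reducer r
    // reads its bucket from attempt number sourceAttempt[r].
    static int rowsDelivered(int[][] attempts, int[] sourceAttempt) {
        int delivered = 0;
        for (int r = 0; r < sourceAttempt.length; r++) {
            for (int b : attempts[sourceAttempt[r]]) {
                if (b == r) {
                    delivered++;
                }
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        int rows = 1000;
        int reducers = 4;
        // Reducers 0 and 1 fetched from attempt 0; reducers 2 and 3 fetch from retry attempt 1.
        int[] source = {0, 0, 1, 1};

        // Constant seed: both attempts produce the same sequence of assignments.
        int[][] seeded = {
            assign(new Random(12345), rows, reducers),
            assign(new Random(12345), rows, reducers)
        };
        // No fixed seed: each attempt produces its own sequence.
        int[][] unseeded = {
            assign(new Random(), rows, reducers),
            assign(new Random(), rows, reducers)
        };

        System.out.println("seeded attempts identical: " + Arrays.equals(seeded[0], seeded[1]));
        System.out.println("seeded rows delivered: " + rowsDelivered(seeded, source) + " of " + rows);
        System.out.println("unseeded rows delivered: " + rowsDelivered(unseeded, source) + " of " + rows);
    }
}
```

With the constant seed, both attempts make identical assignments, so every row is delivered exactly once regardless of which attempt each reducer reads. Without a fixed seed, the delivered count generally differs from the input count, which corresponds to the duplicated or missing rows described above.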
      

      Attachments

        1. HIVE-17124.1.patch
          0.7 kB
          Gopal Vijayaraghavan

            People

              Assignee: Gopal Vijayaraghavan (gopalv)
              Reporter: Gopal Vijayaraghavan (gopalv)