  1. Hive
  2. HIVE-16865 Handle replication bootstrap of large databases
  3. HIVE-16896

Move replication load work from the semantic analysis phase to the execution phase using a task

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      We want to avoid creating too many tasks in memory during the analysis phase while loading data. Currently we load all the files in the bootstrap dump location as a FileStatus[] and then iterate over it to load objects. Instead, we should move to

      org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus>	listFiles(Path f, boolean recursive)
      

      which would internally batch and return values.
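
The difference can be sketched as follows. This is a minimal illustration of iterating a dump directory through `FileSystem.listFiles` rather than materializing a `FileStatus[]`; it is not the code from the attached patches. The class name `BootstrapDumpLister`, the `FileHandler` callback, and the method `visitFiles` are hypothetical names introduced only for this example.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class BootstrapDumpLister {

    /** Hypothetical callback standing in for whatever loads one dumped file. */
    public interface FileHandler {
        void handle(LocatedFileStatus status) throws IOException;
    }

    /**
     * Visits every file under dumpRoot one at a time and returns how many
     * files were seen. Only the iterator's current entries are held by this
     * method; no full array of statuses is built here.
     */
    public static long visitFiles(FileSystem fs, Path dumpRoot, FileHandler handler)
            throws IOException {
        long visited = 0;
        // recursive = true descends into subdirectories; only files are returned.
        RemoteIterator<LocatedFileStatus> files = fs.listFiles(dumpRoot, true);
        while (files.hasNext()) {
            handler.handle(files.next());
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the dump location on the default file system.
        FileSystem fs = FileSystem.get(new Configuration());
        long n = visitFiles(fs, new Path(args[0]), s -> System.out.println(s.getPath()));
        System.out.println(n + " files visited");
    }
}
```

Whether and how the listing is fetched in batches depends on the underlying `FileSystem` implementation, so the memory benefit should be verified against the file system actually in use.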

      Additionally, since we can't hand off partial tasks from the analysis phase to the execution phase, we are going to move the whole repl load functionality to the execution phase so we can better control the creation and execution of tasks (not related to the Hive Task; we may get rid of ReplCopyTask).
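
One way to picture the bounded task creation described above is the sketch below. It uses only the JDK and is an illustration of the idea, not Hive's implementation: the class `IncrementalLoadSketch`, its methods, and the `maxTasks` parameter are all hypothetical, and plain `Runnable`s stand in for real load tasks.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IncrementalLoadSketch {

    /**
     * Pulls at most maxTasks items from the pending iterator and wraps each
     * one as a unit of work. Items beyond the limit stay unread in the iterator.
     */
    public static List<Runnable> nextBatch(Iterator<String> pending, int maxTasks,
                                           List<String> loaded) {
        List<Runnable> batch = new ArrayList<>();
        while (batch.size() < maxTasks && pending.hasNext()) {
            String item = pending.next();
            // Stand-in for real load work: record that the item was loaded.
            batch.add(() -> loaded.add(item));
        }
        return batch;
    }

    /**
     * Alternates between creating a bounded batch of tasks and running it,
     * so no more than maxTasks tasks exist at once. Returns the number of rounds.
     */
    public static int loadAll(Iterator<String> pending, int maxTasks, List<String> loaded) {
        int rounds = 0;
        while (pending.hasNext()) {
            List<Runnable> batch = nextBatch(pending, maxTasks, loaded);
            for (Runnable task : batch) {
                task.run();
            }
            rounds++;
        }
        return rounds;
    }
}
```

Because creation and execution happen in the same phase, the next batch is only built after the previous one has run, which is the kind of control that is not available when every task must be created up front during analysis.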

      An additional consideration at the end of this JIRA is whether we want to do a multi-threaded load of the bootstrap dump.

        Attachments

        1. HIVE-16896.3.patch
          140 kB
          anishek
        2. HIVE-16896.2.patch
          120 kB
          anishek
        3. HIVE-16896.1.patch
          119 kB
          anishek

              People

              • Assignee: anishek
              • Reporter: anishek
              • Votes: 0
              • Watchers: 5
