KUDU-2814

Improve the kudu-mapreduce handling of KuduTableOutputFormat

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.9.0
    • Fix Version/s: NA
    • Component/s: java
    • Labels:
      None

      Description

      https://github.com/apache/kudu/blob/master/java/kudu-mapreduce/src/main/java/org/apache/kudu/mapreduce/KuduTableOutputFormat.java#L134-L135

       

      • The MULTITON map allows only one KuduTableOutputFormat instance per thread, so a second unit test running on the same thread fails. That seems like it would also cause problems when MapReduce's JVM reuse feature is enabled.
      • The MULTITON exists only so that a static `KuduTableOutputFormat.getKuduTable` can back the `getTableFromContext` method in KuduTableMapReduceUtil.
      • `getTableFromContext` is used by two sample mapper implementations in kudu-client-tools, importcsvmapper and importparquetmapper, which build their operations directly on the table:
        • Insert insert = this.table.newInsert();
          PartialRow row = insert.getRow();
      • The whole point of having an OutputFormat is that you do not have to worry about how your data gets written while you work on the mapper/reducer.
      • You write to the OutputFormat; you do not fetch its internals and write the data yourself.
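
The first bullet can be reproduced without Kudu or Hadoop on the classpath. The following is a minimal sketch, assuming the registry is a static map keyed by the current thread's id and that registering a second instance under an occupied key is rejected. `FakeOutputFormat`, `configure` and `MultitonDemo` are hypothetical stand-ins for illustration, not Kudu classes or methods.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an output format that registers itself in a
// static, thread-keyed registry when it is configured.
class FakeOutputFormat {
    // Static registry with one slot per thread id (assumed keying scheme).
    static final ConcurrentHashMap<String, FakeOutputFormat> MULTITON =
            new ConcurrentHashMap<>();

    final String tableName;

    FakeOutputFormat(String tableName) {
        this.tableName = tableName;
    }

    // Registers this instance under the current thread's id and returns the key.
    // Throws if this thread has already registered an instance.
    String configure() {
        String key = String.valueOf(Thread.currentThread().getId());
        FakeOutputFormat previous = MULTITON.putIfAbsent(key, this);
        if (previous != null) {
            throw new IllegalStateException(
                    "thread " + key + " already holds an instance for table "
                            + previous.tableName);
        }
        return key;
    }
}

class MultitonDemo {
    public static void main(String[] args) {
        // First "test" on this thread: registration succeeds.
        new FakeOutputFormat("first_table").configure();
        try {
            // Second "test" on the same thread: the slot is already taken.
            new FakeOutputFormat("second_table").configure();
        } catch (IllegalStateException e) {
            System.out.println("second instance rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions, any runner that executes two tests, or two tasks in a reused JVM, on one thread hits the same conflict unless the registry entry is cleared in between.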

       

      To do: remove those static methods and work out how importparquetmapper and importcsvmapper can properly use a KuduTableOutputFormat without having to rely on its internals.

      Remove the Multiton if it is no longer needed.
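
One possible direction for the to-do, sketched in plain Java: the mapper emits a neutral column-name-to-value record, and only the writer side knows about the table. This is an illustration under stated assumptions, not the Kudu API. `RowSink`, `InMemorySink`, `CsvLineMapper` and `RowSinkDemo` are hypothetical names, and a real implementation would convert each record into a Kudu operation inside the record writer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sink that owns the table handle; the mapper never sees it.
interface RowSink {
    void write(Map<String, Object> row);
}

// Hypothetical in-memory sink standing in for a table-backed record writer.
class InMemorySink implements RowSink {
    final List<Map<String, Object>> rows = new ArrayList<>();

    @Override
    public void write(Map<String, Object> row) {
        rows.add(new LinkedHashMap<>(row)); // defensive copy of the record
    }
}

// Hypothetical CSV mapper: parses one line and emits column name -> value,
// without asking the output side for its table.
class CsvLineMapper {
    private final String[] columns;

    CsvLineMapper(String... columns) {
        this.columns = columns;
    }

    void map(String line, RowSink sink) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != columns.length) {
            throw new IllegalArgumentException(
                    "expected " + columns.length + " fields, got " + fields.length);
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            row.put(columns[i], fields[i]);
        }
        sink.write(row);
    }
}

class RowSinkDemo {
    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        new CsvLineMapper("id", "name").map("1,alice", sink);
        System.out.println(sink.rows);
    }
}
```

With this split the mapper needs no static lookup of the table, so the thread-keyed registry has no remaining caller in the sketch.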

              People

              • Assignee:
                Unassigned
                Reporter:
                cvaliente Clemens Valiente