  Sqoop / SQOOP-2906

Optimization of AvroUtil.toAvroIdentifier

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.7
    • Component/s: None
    • Flags: Patch

      Description

      Hi all

      Our distributed profiler revealed inefficiencies in the AvroUtil.toAvroIdentifier method, specifically in its use of regex patterns. This is directly visible in the flame graph generated by the profiler (https://jhermans.web.cern.ch/jhermans/sqoop_avro_flamegraph.svg). We implemented an optimization and compared it with the original method. On our test machine, the optimized method in isolation is about 500% more efficient on average than the original implementation. We have not yet tested how this optimization affects the performance of user jobs.

      Any suggestions or remarks are welcome.

      Kind regards,

      Joeri

      https://github.com/apache/sqoop/pull/18

      Writeup:

      https://db-blog.web.cern.ch/blog/joeri-hermans/2016-04-hadoop-performance-troubleshooting-stack-tracing-introduction
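
      To illustrate the kind of change described above, here is a minimal sketch that contrasts a regex-based identifier cleanup with a single-pass character loop. It is not the content of diff.txt or the actual Sqoop code: the class name IdentifierSketch, both method names, the regex, and the "AVRO_" prefix rule are assumptions chosen for illustration. Both variants assume a non-empty input string.

```java
// Hypothetical sketch, not the actual Sqoop implementation or patch.
final class IdentifierSketch {

    // Assumed "before" shape: regex replacement plus a regex check on the first character.
    // Each call compiles and runs two patterns, which is where the regex cost comes from.
    static String withRegex(String candidate) {
        String formatted = candidate.replaceAll("\\W+", "_");
        if (formatted.substring(0, 1).matches("[a-zA-Z_]")) {
            return formatted;
        }
        return "AVRO_" + formatted;
    }

    // Assumed "after" shape: one pass over the characters, no regex involved.
    // Runs of non-word characters are collapsed into a single underscore.
    static String withCharLoop(String candidate) {
        StringBuilder sb = new StringBuilder(candidate.length() + 5);
        boolean inRun = false; // true while inside a run of non-word characters
        for (int i = 0; i < candidate.length(); i++) {
            char c = candidate.charAt(i);
            if (isWordChar(c)) {
                sb.append(c);
                inRun = false;
            } else if (!inRun) {
                sb.append('_');
                inRun = true;
            }
        }
        char first = sb.charAt(0);
        if (isAsciiLetter(first) || first == '_') {
            return sb.toString();
        }
        return "AVRO_" + sb;
    }

    // ASCII-only checks, mirroring the default meaning of \w in java.util.regex.
    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isWordChar(char c) {
        return isAsciiLetter(c) || (c >= '0' && c <= '9') || c == '_';
    }

    public static void main(String[] args) {
        String input = "1st column-name";
        System.out.println(withRegex(input));    // AVRO_1st_column_name
        System.out.println(withCharLoop(input)); // AVRO_1st_column_name
    }
}
```

      The loop keeps the ASCII-only semantics of \W so that both variants return the same result for the same input. Any speedup would have to be measured separately, for example with a benchmark harness; the sketch makes no claim about the figure reported above.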

        Attachments

        1. diff.txt (1 kB, Joeri Hermans)

              People

              • Assignee: Joeri Hermans (joeri.hermans)
              • Reporter: Joeri Hermans (joeri.hermans)
              • Votes: 2
              • Watchers: 9
