SQOOP-2906: Optimization of AvroUtil.toAvroIdentifier

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.7
    • Component/s: None
    • Flags:
      Patch

      Description

      Hi all

      Our distributed profiler indicated some inefficiencies in the AvroUtil.toAvroIdentifier method; specifically, in its use of regex patterns. This can be observed directly in the FlameGraph generated by the profiler (https://jhermans.web.cern.ch/jhermans/sqoop_avro_flamegraph.svg). We implemented an optimization and compared it against the original method. On our testing machine, the optimized method by itself is about 500% (on average) more efficient than the original implementation. We have yet to test how this optimization will influence the performance of user jobs.

      Any suggestions or remarks are welcome.

      Kind regards,

      Joeri

      https://github.com/apache/sqoop/pull/18

      Writeup:

      https://db-blog.web.cern.ch/blog/joeri-hermans/2016-04-hadoop-performance-troubleshooting-stack-tracing-introduction
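For reference, the two implementations being compared can be sketched as follows. The method bodies follow the diff attached to the pull request; the class name and the fallback branches for identifiers that do not start with a letter or underscore are illustrative assumptions, since the attached diff is truncated at that point.

```java
// Sketch of the regex-based original vs. the char-array rewrite discussed
// in this issue. Method bodies follow the attached diff; the class name and
// the trailing fallback branches are illustrative assumptions (the diff in
// this thread is truncated before the else branch).
public class AvroIdentifierSketch {

  // Original: a regex replace plus a regex match on every call.
  static String toAvroIdentifierRegex(String candidate) {
    String formattedCandidate = candidate.replaceAll("\\W+", "_");
    if (formattedCandidate.substring(0, 1).matches("[a-zA-Z_]")) {
      return formattedCandidate;
    }
    return "_" + formattedCandidate; // assumed fallback, not from the diff
  }

  // Optimized: a single pass over a char array, one final copy.
  static String toAvroIdentifierFast(String candidate) {
    char[] data = candidate.toCharArray();
    int stringIndex = 0;
    for (char c : data) {
      if (Character.isLetterOrDigit(c) || c == '_') {
        data[stringIndex++] = c;
      }
    }
    char initial = data[0];
    if (Character.isLetter(initial) || initial == '_') {
      return new String(data, 0, stringIndex);
    }
    return "_" + new String(data, 0, stringIndex); // assumed fallback
  }

  public static void main(String[] args) {
    System.out.println(toAvroIdentifierRegex("my-column name"));
    System.out.println(toAvroIdentifierFast("my-column name"));
  }
}
```

Note that the two variants are not byte-for-byte equivalent: the regex version replaces runs of non-word characters with `_`, while the char-array version drops them.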

      1. diff.txt
        1 kB
        Joeri Hermans

        Issue Links

          Activity

          hudson Hudson added a comment -

          FAILURE: Integrated in Sqoop-hadoop20 #1051 (See https://builds.apache.org/job/Sqoop-hadoop20/1051/)
          SQOOP-2906: Optimization of AvroUtil.toAvroIdentifier (jarcec: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=5779aec031d4bb0bccd79329923e5dbad3786280)

          • src/java/org/apache/sqoop/avro/AvroUtil.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Sqoop-hadoop200 #1055 (See https://builds.apache.org/job/Sqoop-hadoop200/1055/)
          SQOOP-2906: Optimization of AvroUtil.toAvroIdentifier (jarcec: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=5779aec031d4bb0bccd79329923e5dbad3786280)

          • src/java/org/apache/sqoop/avro/AvroUtil.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Sqoop-hadoop100 #1015 (See https://builds.apache.org/job/Sqoop-hadoop100/1015/)
          SQOOP-2906: Optimization of AvroUtil.toAvroIdentifier (jarcec: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=5779aec031d4bb0bccd79329923e5dbad3786280)

          • src/java/org/apache/sqoop/avro/AvroUtil.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Sqoop-hadoop23 #1253 (See https://builds.apache.org/job/Sqoop-hadoop23/1253/)
          SQOOP-2906: Optimization of AvroUtil.toAvroIdentifier (jarcec: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=5779aec031d4bb0bccd79329923e5dbad3786280)

          • src/java/org/apache/sqoop/avro/AvroUtil.java
          jarcec Jarek Jarcec Cecho added a comment -

          Thank you for your contribution Joeri Hermans and for your review Attila Szabo!

          jira-bot ASF subversion and git services added a comment -

          Commit 5779aec031d4bb0bccd79329923e5dbad3786280 in sqoop's branch refs/heads/trunk from Jarek Jarcec Cecho
          [ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=5779aec ]

          SQOOP-2906: Optimization of AvroUtil.toAvroIdentifier

          (Joeri Hermans via Jarek Jarcec Cecho)

          jira-bot ASF subversion and git services added a comment -

          Commit ac217a032c0755edac713ff53b3f55b7f2f46706 in sqoop's branch refs/heads/trunk from Jarek Jarcec Cecho
          [ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=ac217a0 ]

          Revert "SQOOP-2920: sqoop performance deteriorates significantly on wide datasets; sqoop 100% on cpu"

          I've mistakenly committed SQOOP-2920 and SQOOP-2906 inside this commit,
          so I'll revert it and commit them separately.

          joeri.hermans Joeri Hermans added a comment -

          No problem! Any contributions and suggestions are more than welcome. If you still have issues running it, feel free to send me an e-mail.

          maugli Attila Szabo added a comment -

          First of all,

          Joeri, thank you so much for your help. It's much clearer now! If you would be interested in having a +1 contributor, I would be more than glad to provide pull requests for this project in my free time (e.g. a direct CM investigation solution).

          The other thing:
          After running a few tests, I'm quite sure that the current version Joeri has provided works quite well and would provide the required performance boost. It is possible that later we will be able to identify some synergies between his changes and the changes related to the class writer, but I do think we should release it right now.

          Jarek Jarcec Cecho, Abraham Fine,
          Could you please also review the changeset and provide your feedback, and if it also looks alright on your side, put it into upstream?

          Thanks in advance!

          joeri.hermans Joeri Hermans added a comment -

          Hi Attila,

          It is probably partly my fault as well; I'm still working on it. I've added a build script to prepare the dependencies.

          These are actually very good questions! Is it OK if I include these in the readme?

          > How did you ensure that the extra parameters are passed to the mappers and the Sqoop1 cmd line tool?
          sqoop import -Dyarn.app.mapreduce.am.env="JAVA_HOME=/usr/lib/jvm/jdk1.8.0_60/jre" -Dmapreduce.map.java.opts="-XX:+PreserveFramePointer -XX:InlineSmallCode=200" -Dmapreduce.map.env="JAVA_HOME=/usr/lib/jvm/jdk1.8.0_60/jre" --connect [server] --username [username] --target-dir [hdfs dir] --table [table] -P -m [number of mappers]

          > In case of -c which server/service IP is needed there and in what kind of format?
          In the case of YARN, you will have to specify the REST address in order to fetch the nodes: http://[namenode]:8088/ws/v1/cluster/nodes

          > Is it enough to use only with -h?
          Yes, but the sampling frequency will default to 99 Hz, with a sampling duration of 5 seconds. These default values can be changed by specifying the corresponding parameters, or by editing hprofiler.sh.

          > If I don't wanna use SSH keys is it a valid solution if I type my root password all the time when it executes SSH?
          I think this might work, though I'm not sure, because the processes are initiated in parallel, so I don't know how stdin will handle this. However, you can create a file with your password, then cat the file and pipe the password to the stdin of the process. You can do this by editing src/host_executor.sh.

          > Should it work just out of box ( e.g. defining -h -j -f -t ) or is any postprocess needed?
          No post-processing is required. However, interpreting the results might be troublesome.

          > Did you use a hadoop distribution like CDH or just pure hdfs and sqoop1?
          Yes, we use Sqoop 1.4.6 on CDH 5.5.1.

          I hope this helps; if you still have any issues, feel free to contact me! I'm definitely willing to improve this tool.

          Kind regards,

          Joeri

          maugli Attila Szabo added a comment -

          Dear Joeri,

          I'd like to run a set of measurements on my own (to check whether your solution and mine on this issue are synergistic or not, and to check for further improvements).

          I found this hprofiler quite exciting; however, I had difficulties using it (my bad).
          Could you help with the following questions?

          • How did you ensure that the extra parameters are passed to the mappers and the Sqoop1 cmd line tool?
          • In case of -c, which server/service IP is needed there, and in what kind of format?
          • Is it enough to use only with -h?
          • If I don't want to use SSH keys, is it a valid solution if I type my root password every time it executes SSH?
          • Should it work just out of the box (e.g. defining -h -j -f -t), or is any post-processing needed?
          • Did you use a Hadoop distribution like CDH, or just pure HDFS and Sqoop1?

          Many thanks in advance!
          Maugli

          joeri.hermans Joeri Hermans added a comment -

          Hi Attila

          I'm definitely not an expert on Sqoop internals, but architectural changes are definitely better! I'm just publishing the results I obtained through profiling, and providing a fix. Of course, if we implement your suggestion, it would definitely reduce CPU consumption in general. However, for now, we could patch this first (to get the performance increase already), and then look at what needs to be done to correctly implement your idea(s).

          Kind regards,

          Joeri

          maugli Attila Szabo added a comment -

          Hi Joeri,
          I've joined the Sqoop community only a few weeks ago, so maybe I don't see all of the pitfalls, but let me raise a few suggestions/concerns:
          Your fix seems to be okay, but I would suggest a few more changes processing-wise:
          First of all, I would not do the conversion for all of the column names, but rather create a Map<String, String> containing the "original" vs. "converted" names; in most cases we would then just have to look up the name in O(1) time rather than doing the conversion every time (even if it is now much faster and cheaper).
          I would also not convert those entry.getKey() values that get a hit in the schema (schema.getField returns non-null), as in that case they're valid values; but maybe this optimization is negligible if you implement the first proposal.
          I was also considering doing the mapping in advance, before the import (after we've got the DB metadata and the Avro schema), but for non-RDBMS systems it might cause problems (e.g. different sets of columns for each row), so I'm not sure that would help; still, from an algorithmic/clean-code point of view it would be the cleanest solution, if possible.
          Would you tell me what you think about these suggestions?
          My 2 cents,
          Attila (Maugli)
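The first proposal above could be sketched roughly as follows. This is an illustrative helper, not part of the actual patch; the class name and the use of ConcurrentHashMap are assumptions, and it reuses the single-pass conversion from the pull request (leading-character handling omitted for brevity).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch of the proposed memoization: cache "original" -> "converted"
// names so repeated column names are resolved with an O(1) map lookup
// instead of re-running the conversion for every record.
public class AvroIdentifierCache {
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  // Single-pass conversion, as in the pull request's char-array version
  // (leading-character handling omitted for brevity).
  private static String convert(String candidate) {
    char[] data = candidate.toCharArray();
    int stringIndex = 0;
    for (char c : data) {
      if (Character.isLetterOrDigit(c) || c == '_') {
        data[stringIndex++] = c;
      }
    }
    return new String(data, 0, stringIndex);
  }

  public String toAvroIdentifier(String candidate) {
    // computeIfAbsent runs the conversion only on the first occurrence;
    // subsequent rows with the same column name hit the cache.
    return cache.computeIfAbsent(candidate, AvroIdentifierCache::convert);
  }
}
```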

          joeri.hermans Joeri Hermans added a comment -

          Hey Jarek Jarcec Cecho, I published the patch some time ago: https://reviews.apache.org/r/46557/

          jarcec Jarek Jarcec Cecho added a comment -

          Could you upload the patch to the review board, Joeri Hermans? I have some comments that will be better shared directly in the code.

          joeri.hermans Joeri Hermans added a comment -

          git diff HEAD

          jarcec Jarek Jarcec Cecho added a comment -

          Switching to "Patch available" to denote that this is ready for review.

          jarcec Jarek Jarcec Cecho added a comment -

          Hi Joeri Hermans, thank you very much for picking up this improvement. Sadly, the Sqoop project currently does not accept GitHub pull requests. Could you please create a text patch (a diff between the current HEAD and your latest changes) and attach it to the JIRA? Please also upload the patch to the review board.

          joeri.hermans Joeri Hermans added a comment -

          Blog post with an analysis as proof for the patch: https://db-blog.web.cern.ch/blog/joeri-hermans/2016-04-hadoop-performance-troubleshooting-stack-tracing-introduction

          Kind regards,

          Joeri
          joeri.hermans Joeri Hermans added a comment -

          Hi all

          We tested a custom JAR with this method on our cluster, and we observed a 200-300% increase in throughput (writing to Parquet), depending on the load of the cluster.

          Kind regards,

          Joeri

          githubbot ASF GitHub Bot added a comment -

          Github user JoeriHermans commented on the pull request:

          https://github.com/apache/sqoop/pull/18#issuecomment-208859666

          @stanleyxu2005 The main difference in my implementation is that I only have to do a single copy. StringBuilders are inefficient here; char arrays are a lot faster since everything is pre-allocated already. Furthermore, the overhead of constantly allocating a new object, and the fact that internally the StringBuilder will do some copying as well, make this an inefficient approach.

          For example, I implemented your approach: instead of a 500% improvement with our method, I get a 230% increase in performance with the StringBuilder approach. So the char array is definitely a lot faster, and that is what it is all about, since this is a very prominent function on the stack.

          Kind regards,

          Joeri
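The comparison described above could be reproduced roughly with a harness like the one below. This is an illustrative sketch, not the methodology actually used in this thread: it is a naive loop-based timing (no JMH, no warm-up control), and the class and method names are assumptions, so treat the numbers as indicative at best.

```java
// Naive micro-benchmark contrasting the char-array filter with the
// StringBuilder variant from the review. Treat the timings as rough
// indications only; a rigorous comparison should use JMH.
public class IdentifierBench {
  static String charArrayVariant(String candidate) {
    char[] data = candidate.toCharArray();
    int n = 0;
    for (char c : data) {
      if (Character.isLetterOrDigit(c) || c == '_') {
        data[n++] = c;
      }
    }
    return new String(data, 0, n); // single final copy
  }

  static String builderVariant(String candidate) {
    StringBuilder sb = new StringBuilder(candidate.length());
    for (char c : candidate.toCharArray()) {
      if (Character.isLetterOrDigit(c) || c == '_') {
        sb.append(c);
      }
    }
    return sb.toString(); // toString() makes a final copy of the buffer
  }

  public static void main(String[] args) {
    String input = "some-column name#1";
    long t0 = System.nanoTime();
    for (int i = 0; i < 1_000_000; i++) charArrayVariant(input);
    long t1 = System.nanoTime();
    for (int i = 0; i < 1_000_000; i++) builderVariant(input);
    long t2 = System.nanoTime();
    System.out.printf("char[]: %d ms, StringBuilder: %d ms%n",
        (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
  }
}
```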

          githubbot ASF GitHub Bot added a comment -

          Github user stanleyxu2005 commented on a diff in the pull request:

          https://github.com/apache/sqoop/pull/18#discussion_r59360059

          — Diff: src/java/org/apache/sqoop/avro/AvroUtil.java —

          ```
          @@ -114,11 +114,20 @@ public static String toAvroColumn(String column) {
              * Format candidate to avro specifics
              */
             public static String toAvroIdentifier(String candidate) {
          -    String formattedCandidate = candidate.replaceAll("\\W+", "_");
          -    if (formattedCandidate.substring(0,1).matches("[a-zA-Z_]")) {
          -      return formattedCandidate;
          +    char[] data = candidate.toCharArray();
          +    int stringIndex = 0;
          +
          +    for (char c : data) {
          +      if (Character.isLetterOrDigit(c) || c == '_') {
          +        data[stringIndex++] = c;
          +      }
          +    }
          +
          +    char initial = data[0];
          +    if (Character.isLetter(initial) || initial == '_') {
          +      return new String(data, 0, stringIndex);
          ```

          End diff –

          Your code first creates a char array and then updates chars in the array. As a result, you will create another copy as a new String. Have you thought about using a `StringBuilder` directly?
          ```
          final StringBuilder sb = new StringBuilder();
          for (char c : candidate.toCharArray()) {
            if (Character.isLetterOrDigit(c) || c == '_') {
              sb.append(c);
            }
          }
          ...
          return sb.toString();
          ```


            People

            • Assignee: joeri.hermans (Joeri Hermans)
            • Reporter: joeri.hermans (Joeri Hermans)
            • Votes: 2
            • Watchers: 9
