Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)

      I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni (as TrevniStorage).
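
      For a sense of the intended usage, a hypothetical Pig script against the rewritten loaders might look like the sketch below. This is purely illustrative; the class names come from this ticket, but where the classes finally live (and their exact signatures) is still under discussion.

      records = LOAD 'input/*.avro' USING AvroStorage();
      STORE records INTO 'output_avro' USING AvroStorage();
      -- Trevni output comes along with the rewrite
      STORE records INTO 'output_trevni' USING TrevniStorage();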

      I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.

      1. PIG-3015-22June2013.diff
        182 kB
        Cheolsoo Park
      2. PIG-3015-20May2013.diff
        179 kB
        Joseph Adler
      3. PIG-3015-12.patch
        16 kB
        Joseph Adler
      4. PIG-3015-doc-2.patch
        13 kB
        Cheolsoo Park
      5. PIG-3015-11.patch
        159 kB
        Cheolsoo Park
      6. with_dates.pig
        0.4 kB
        Joseph Adler
      7. PIG-3015-10.patch
        160 kB
        Cheolsoo Park
      8. PIG-3015-9.patch
        154 kB
        Joseph Adler
      9. PIG-3015-doc.patch
        13 kB
        Cheolsoo Park
      10. PIG-3015-7.patch
        160 kB
        Cheolsoo Park
      11. PIG-3015-6.patch
        160 kB
        Joseph Adler
      12. good.avro
        2.86 MB
        Cheolsoo Park
      13. bad.avro
        2.86 MB
        Cheolsoo Park
      14. Test.java
        2 kB
        Cheolsoo Park
      15. TestInput.java
        1 kB
        Cheolsoo Park
      16. PIG-3015-5.patch
        153 kB
        Joseph Adler
      17. PIG-3015-4.patch
        163 kB
        Cheolsoo Park
      18. PIG-3015-3.patch
        154 kB
        Cheolsoo Park
      19. PIG-3015-2.patch
        153 kB
        Cheolsoo Park

        Issue Links

          Activity

          Cheolsoo Park added a comment -

          Hi Joseph,

          Thank you very much for opening the jira. I have recently worked on AvroStorage by myself, and I totally agree with you. Since you already have code to contribute, this is even better.

          As part of the rewrite, I would also like to propose migrating AvroStorage from Piggybank to core Pig. I have two reasons for this:

          1. AvroStorage is widely used, so it makes sense to include it in the core Pig rather than in Piggybank.
          2. Until the migration is complete, we can maintain both versions (the new one in core Pig and the old one in Piggybank) to avoid breaking backward compatibility. Another motivation for the rewrite, for me, is to clean up the odd options that the current AvroStorage has, so I think breaking backward compatibility is unavoidable.

          I asked this question on the user mailing list a while ago, and nobody disagreed. But please let me know if anyone has objections.

          To start with, I am wondering if you can post your code as a patch to this jira and the review board. Assuming that we're going to move AvroStorage to the core Pig, you can probably create a new package called "org.apache.pig.backend.hadoop.avro" and add your code there. If you could break your patch into smaller pieces and attach them to sub-tasks of this jira, that would be helpful too.

          Please let me know what you think.

          Thanks!

          Joseph Adler added a comment -

          Here's the working version: https://github.com/josephadler/fast-avro-storage

          I can break that up into multiple Jira tickets, though that feels like a lot of extra work; I threw away all the existing code and started from scratch. I do think it's reasonable to separate AvroStorage and TrevniStorage for now (though they are very closely related).

          Cheolsoo Park added a comment -

          Thanks for the link.

          You can upload the entire code as a single patch if you prefer. I suggested splitting it only because big patches usually take longer to review and commit, but I will review this one in any case.

          Joseph Adler added a comment -

          Just reading through the discussion on the user list.

          I'll check out trunk, refactor/rename as needed, make sure it passes existing tests, fix bugs, then submit the patches. That will probably take me a few days to do.

          Additionally, I'd like to get a few things correct the first time. Specifically, I'm trying to figure out how to deal with the plethora of possible options for load/store functions. I want to make sure that I cover all the important use cases regarding schemas. Here's the list that I came up with:

          LoadFunc:
          (1) Read the schema from the input file(s)
          (a) Just pick the schema from the most recent file
          (b) Check all the files to make sure the schemas are compatible
          (2) Use a schema manually provided by the user

          StoreFunc:
          (1) Automatically translate the Pig schema to an Avro Schema
          (2) Use a schema manually provided by the user
          (a) Allow the user to name the records and name space
          (b) Automatically pick a record and namespace name
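
          To make these use cases concrete, here is a rough Pig sketch of what they could translate to. It is purely illustrative; the argument forms are assumptions for discussion, not a committed interface.

          -- LoadFunc (1): derive the schema from the input files
          in1 = LOAD 'events/*.avro' USING AvroStorage();
          -- LoadFunc (2): user supplies the schema explicitly
          in2 = LOAD 'events/*.avro' USING AvroStorage('{"type":"record","name":"Event","fields":[{"name":"id","type":"long"}]}');
          -- StoreFunc (1): translate the Pig schema to an Avro schema automatically
          STORE in1 INTO 'out1' USING AvroStorage();
          -- StoreFunc (2a): user names the record type and namespace
          STORE in2 INTO 'out2' USING AvroStorage('Event', 'com.example');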

          Cheolsoo Park added a comment -

          Hi Joseph,

          The list of options that you described looks like a good start. I think that we should definitely start with a small set of options, but it may be a good idea to keep in mind what options we eventually want to add. So here are my questions:

          LoadFunc

          (a) Just pick the schema from the most recent file
          (b) Check all the files to make sure the schemas are compatible

          I haven't checked out your repository, so please correct me if I am wrong. I assume that your storage converts Avro schema to Pig schema during the load? If so, how do you convert multiple (compatible but different) schemas to one Pig schema? The current storage has an option called 'multiple_schemas' to merge multiple schemas into one.

          (2) Use a schema manually provided by the user

          Do we need this option for LoadFunc? Is this for when the input Avro files do not have an embedded schema?

          Does your storage also have limits on unions and recursive records like the current storage? In fact, recursive records are now supported by PIG-2875.

          How about corrupted files? Currently, we have an option to skip corrupted files (ignore_bad_files) instead of failing on them.

          StoreFunc

          (2) Use a schema manually provided by the user

          The current storage provides three ways of specifying the output schema:

          1. A JSON string can be given (option: schema).
          2. The schema of an existing Avro file (.avro) can be used (option: same).
          3. An Avro schema file (.avsc) can be used (option: schema_file).
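
          For context, using the pair-style option syntax shown later in this thread for the 'index' option, these three ways might be invoked roughly as follows; the exact strings are assumptions rather than verified syntax:

          -- 1. JSON string
          STORE x INTO 'out1' USING org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"type":"record","name":"Rec","fields":[{"name":"f1","type":"int"}]}');
          -- 2. reuse the schema of an existing Avro file
          STORE x INTO 'out2' USING org.apache.pig.piggybank.storage.avro.AvroStorage('same', '/data/existing.avro');
          -- 3. point at an Avro schema (.avsc) file
          STORE x INTO 'out3' USING org.apache.pig.piggybank.storage.avro.AvroStorage('schema_file', '/schemas/rec.avsc');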

          Are you going to support the same?

          How about multiple stores with different output schemas? The current storage has the 'index' option, which allows the user to specify a different output schema for each store.

          Thanks!

          Joseph Adler added a comment -

          Before addressing the questions, I wanted to propose a naming scheme for the load and store functions. To be consistent with other Pig UDFs, I think it makes more sense to use different function names rather than passing different types of arguments to the UDF. Can I propose something like this:

          LoadFuncs:

          • AvroStorage. May be instantiated with zero, one, or two arguments. If called with no arguments, the function will load the schema from the most recent data file found in the specified path and use that schema. If called with one argument, the argument is a String that specifies the input schema: it may contain the schema definition itself, be a URI that refers to a file containing the input schema, or be a URI for an example data file from which to read the schema. If two arguments are specified, the first argument refers to the type of the output records (the name of the type) and the second argument may be either a JSON string, a URI for a schema definition file, or a URI for an example file that contains the definition of that type.

          This function does not check schema compatibility of input files or allow recursive schema definitions. It fails when corrupted files are encountered.

          • AvroStorage.AllowRecursive. Same as above, except this function does not check schema compatibility of input files but does allow recursive schema definitions. Recursively defined records are just defined as schemaless tuples in the Pig Schema.
          • AvroStorage.IgnoreCorrupted. Same as above, except this function does not allow recursive schema definitions but does not fail on corrupted input files.
          • AvroStorage.AllowRecursiveAndIgnoreCorrupted. Same as above, except this function allows recursive definitions and does not fail on corrupted input files.

          StoreFunc:

          • AvroStorage. May be instantiated with zero, one, or two arguments; the meaning of the arguments can be inferred from how they are specified. If called with no arguments, the function will translate the Pig schema to an Avro schema, use a default name for the record types, and not assign a namespace to the records. If called with one argument, the argument is a String that specifies either the output schema or the record name for the output records. If it specifies the output schema, the String may contain the schema definition itself, be a URI that refers to a file containing the schema, or be a URI for an example data file from which to reuse the schema. If two arguments are specified, they may refer to the name and namespace for the output records. Alternately, the first argument may refer to the type of the output records (the name of the schema), and the second argument may be either a JSON string, a URI for a schema definition file, or a URI for an example file that contains the definition of that type.
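
          As a hypothetical sketch of how these proposed names might appear in a script (the class names are taken from the proposal above; none of this is committed code, and the URIs and values are made up):

          -- load, deriving the schema from the most recent data file
          a = LOAD 'data/*.avro' USING AvroStorage();
          -- load with a schema file URI, allowing recursive definitions
          b = LOAD 'data/*.avro' USING AvroStorage.AllowRecursive('hdfs:///schemas/event.avsc');
          -- load, skipping corrupted input files
          c = LOAD 'data/*.avro' USING AvroStorage.IgnoreCorrupted();
          -- store with an explicit record name and namespace
          STORE a INTO 'out' USING AvroStorage('Event', 'com.example');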

          Answers to questions:

          LoadFunc 1a: Yes, the storage function will convert avro schemas to pig schemas, and vice versa.

          I haven't tried to convert multiple "compatible but different" schemas to one pig schema. I believe that if you manually supply a schema to the function that is a superset of all the schemas in the input data, the underlying Avro libraries will take care of this for you... though this brings up another question: what does "compatible" mean in this case? Personally, I do not think that the core Pig library should attempt to resolve this problem for users; I think it is best for users to load files with different load functions, cast and rename fields as appropriate in pig code, then take a union of the values. It's possible to miss real (and important) errors if Pig does a lot of type conversions and manipulations under the covers.

          LoadFunc 2: I think this is necessary for a few reasons: It's faster to supply a schema manually (the Pig run time doesn't have to read files from HDFS at planning time to detect the schema). By specifying the schema, you can also specify a subset of fields to de-serialize, reducing the size of the input data. Finally, by specifying a schema manually, you can read a set of files with compatible but different schemas.

          I think PIG-2875 is a design mistake. If I had been involved in the project, I would have argued hard against this. You can't specify a recursive schema in Pig, so why allow users to load files with recursive schemas in Pig? It is possible to load recursively defined records into pig, but that seems like a recipe for confusion and errors. By default, recursive schema definitions should result in an error, or at least a warning message. I'd propose that this be allowed only as an option.

          StoreFunc 2a:

          I don't think it's hard to specify those three options. It's probably OK for the StoreFunc to allow the user to specify either a schema, a URI that refers to a schema file, or a URI that refers to an example file, and then for the function to figure out what the argument means and do the right thing.

          Can you explain the use case for multiple stores with different output schemas? I'm having a hard time understanding why it makes sense to do something complicated like that.

          Cheolsoo Park added a comment -

          Hi Joseph,

          1) Using different functions sounds OK to me, but couldn't we handle them via arguments using CommandLineParser? IMHO, this is simpler and more scalable. Another advantage of using CommandLineParser is that we don't have to infer the meaning of arguments based on the number of arguments. Other built-in storages (e.g. HBaseStorage) use CommandLineParser, so why don't we do the same to provide a uniform syntax to users across the project? Thoughts?

          2) Multiple schema support

          this brings up another question: what does "compatible" mean in this case?

          Please refer to the rules listed in PIG-2579. I did this because it was requested by several people. The use case is that people define Avro schemas, but the schemas evolve over time. Since AvroStorage used to assume that all input files have exactly the same schema, such files couldn't be loaded. PIG-2579 was trying to address that inconvenience. Do you think that we should include similar functionality as an option in the new storage?

          3) Recursive record support

          You can't specify a recursive schema in Pig, so why allow users to load files with recursive schemas in Pig? By default, recursive schema definitions should result in an error, or at least a warning message. I'd propose that this be allowed only as an option.

          Agreed (and guilty). In fact, this was a feature request from one of my customers. The rationale was that people couldn't change their already-defined recursive schemas, but they wanted to do some processing on the non-recursive parts of the data. Providing it as an option sounds good to me.

          4) Multiple store support

          Can you explain the use case for multiple stores with different output schemas? I'm having a hard time understanding why it makes sense to do something complicated like that.

          I think that I wasn't clear. All I wanted to say is that if we have more than one relation to store in a script, we should be able to do it.

          set1 = load 'input1.txt' using PigStorage() as ( ... );
          store set1 into 'set1' using org.apache.pig.piggybank.storage.avro.AvroStorage('index', '1');
          
          set2 = load 'input2.txt' using PigStorage() as ( ... );
          store set2 into 'set2' using org.apache.pig.piggybank.storage.avro.AvroStorage('index', '2');
          

          The current storage supports multiple stores via the 'index' option. In fact, this is very hacky, and we should get rid of it. Nevertheless, I wanted to know whether this will still be supported. On second thought, I think that your proposal already implies multiple-store support because:

          • The output schema will be derived from the Pig schema per store, or
          • The user will specify the output schema per store.

          So I don't see any problem.

          Thanks!

          Mike Naseef added a comment -

          We are very excited about this direction, as we were considering a private re-write to AvroStorage for some of the issues you are addressing. I want to +1 passing the schema into the LoadFunc. The old AvroStorage is very slow and a resource hog when we have a directory hierarchy to scan - even when we set the no_schema_check property. Furthermore, we occasionally have issues with pig jobs picking the old schema when we have a schema update. Manually specifying the schema would fix this (option 1a should cover this as well) and give us more flexibility in defining the data we want pig to pull from a file.

          Cheolsoo Park added a comment -

          Hi Mike, thanks for your opinion. I agree that passing the input schema into the LoadFunc is a good improvement.

          Please feel free to comment on other issues too. Hopefully, we can resolve as many issues as possible while re-writing AvroStorage.

          Alan Gates added a comment -

          +1 for moving it into Pig proper. Avro is a common format and it makes sense to guarantee support for it in Pig.

          Joseph Adler added a comment -

          Started working on this now. Two questions:

          (1) I'm a new contributor. What's the best way to organize the code within Pig? I have a lot of helper classes and methods, and would like to put different classes in different files to maximize readability. Should I put the helper classes in an existing package (org.apache.pig.impl.builtin seems like the closest match, though still not quite right), create a new package for the helper classes, or do something else? I couldn't find documentation on the best way to do this.

          (2) Here's what I came up with for options: the first argument is either an explicit schema or specifies the record names if a schema is automatically generated. The second argument is a list of options (like in PigStorage):

          • -namespace Namespace for an automatically generated output schema.
          • -ignoreerrors Tells the function to ignore errors in input files.
          • -schemafile Specifies a URL for an Avro schema file from which to read the input schema (can be a local file, hdfs, url, etc.).
          • -examplefile Specifies a URL for an Avro data file from which to copy the input schema (can be a local file, hdfs, url, etc.).

          I considered providing an explicit option to pass a schema with a "-schema" flag, but I would have had to do something much more complicated to correctly parse the options if an option could include a JSON schema. (Plus, I don't think the meaning of the argument will be ambiguous: it will either be a valid JSON object describing a schema or a valid name.)
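
          Under one possible reading of this proposal, an invocation might look like the following sketch (the record name, option values, and paths are made up for illustration):

          -- load, reading the input schema from an .avsc file
          a = LOAD 'in/*.avro' USING AvroStorage('Rec', '-schemafile hdfs:///schemas/rec.avsc');
          -- store, auto-generating the output schema with record name 'Rec' and a namespace
          STORE a INTO 'out' USING AvroStorage('Rec', '-namespace com.example');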

          Cheolsoo Park added a comment -

          Hi Joseph,

          To answer your questions:

          1) If I am not mistaken, o.a.p.impl.builtin is for internal built-in UDFs. I don't know exactly what your helper classes are like, but would o.a.p.impl.util be a better place?

          Looking at the package tree, I also noticed that there is an *.impl.util package for each sub-component of Pig. So if your helper classes are AvroStorage-specific, you may want to create two new packages called o.a.p.hadoop.avro and o.a.p.hadoop.avro.util, and add AvroStorage to hadoop.avro and helper classes to hadoop.avro.util respectively.

          Please anyone correct me if I am wrong here. I am a new committer.

          2) What you propose sounds good to me.

          Thanks!

          Joseph Adler added a comment -

          I put the code in o.a.impl.util. Not a big deal to move it later if that's the preferred style.

          Russell Jurney added a comment -

          I agree that we should replace the old AvroStorage with this one, and that we should make AvroStorage a builtin.

          However, I don't think it's acceptable to break backwards compatibility with the existing AvroStorage, and having two implementations at once seems confusing. It would be best to extend this implementation with those features required to maintain compatibility with the Piggybank AvroStorage before committing it as a builtin.

          It sounds like you're on top of this, Joe and Cheolsoo. I'll be a tester.

          Cheolsoo Park added a comment -

          Hi Russell,

          Thank you very much for offering help.

          However, I don't think it's acceptable to break backwards compatibility with the existing AvroStorage, and having two implementations at once seems confusing. It would be best to extend this implementation with those features required to maintain compatibility with the Piggybank AvroStorage before committing it as a builtin.

          Sure, we can wait until the new AvroStorage is complete before committing it, and I won't insist on maintaining two versions of AvroStorage if that's confusing to others.

          But given that the new AvroStorage will have different options from the current AvroStorage, it seems unavoidable to introduce some backward incompatibility. For example, Joseph's proposed new options are very different from those of the current AvroStorage. Would that be acceptable?

          Russell Jurney added a comment -

          The existing method of storing to multiple locations is so strange... let's call that part a bug fix? We can enable storing to more than one place without the weird argument workaround using the new outputSchema interface, can't we?

          Joseph Adler added a comment -

          I hate breaking backwards compatibility. (One of the reasons for doing the rewrite is that Avro broke backwards compatibility.) But I think we have some good reasons to do so here:

          • Options for AvroStorage are very different from options for other storage functions in Pig. In moving AvroStorage to the builtins, it makes sense for AvroStorage to behave as closely as possible to PigStorage, etc.
          • The huge number of crazy options makes the code slow and complicated.
          • There are good workarounds for many changes in the options. For example, all the weird stuff about selecting a schema using an index could be easily changed to explicit schema definitions.
          • It gets harder to make changes with time. This is probably the best opportunity to make the options simpler and clearer.
          Russell Jurney added a comment -

          Actually, I reverse my position. Get this into the builtins as soon as possible. Give people one Pig version to get off the old one, and then we kill it.

          Ship it.

          Joseph Adler added a comment -

          Progress update: I merged in the code, and am now working on test cases. I plan to submit the patches for review later this week.

          Right now, I am working on unit tests for AvroStorage. Because AvroStorage is so complicated, I am trying to find ways to make the test cases easier to manage. (I don't like seeing a single test file with dozens of distinct test cases and dozens of test data files in one directory.) I feel like it's too hard to understand what is and isn't being tested, and too hard to maintain the tests. I think it's worth changing the test strategy to be more methodical and rigorous. Here's what I'm proposing:

          (1) Test files will be kept in different directories by file type: schema (AVSC) files, raw text input files, json formatted input files, uncompressed avro files, deflate compressed avro files, snappy compressed avro files, uncompressed avro output files, deflate compressed avro output files, snappy compressed output files.
          (2) Test pig scripts will be kept in discrete files, with parameters as file names. I'll modify the test runner to set the runtime parameters correctly. (I think this increases the readability of the test cases and also helps with debugging; you can always type "java -cp pig.jar org.apache.pig.Main -x local -f test_file" to run the files outside the test harness and see what happens)
          (3) I'm thinking about modifying the build process to compile human readable files (in JSON format) into avro files before running the tests.
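
          As an illustration of item (2), each discrete test script might be a small parameterized Pig file driven by the test runner through normal parameter substitution (the file name and parameters here are hypothetical):

          -- load_deflate_records.pig (hypothetical)
          in = LOAD '$INFILE' USING AvroStorage();
          STORE in INTO '$OUTFILE' USING AvroStorage();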

          What do you guys think?

          Cheolsoo Park added a comment -

          Hi Joseph,

          Thanks for the update. I support what you're proposing. I appreciate your effort to clean up the code. Just to be clear, I have the following questions:

          Test pig scripts will be kept in discrete files, with parameters as file names. I'll modify the test runner to set the runtime parameters correctly.

          Ideally, all Pig unit tests should be written this way. Currently, Pig queries are hard-coded in the test code, which is not very nice. But changing that is going to be a long-term effort. Your changes for this jira will be isolated in TestAvroStorage, won't they? If not, can you please provide more detail? I am just trying to understand the scope of your proposal.

          I'm thinking about modifying the build process to compile human readable files (in JSON format) into avro files before running the tests.

          This will be fully automated in the current framework (ant + junit), so I can run "ant test -Dtestcase=TestAvroStorage" to run the unit test cases, right? One exception might be the test case for corrupted Avro files, I guess.

          Thanks!

          Joseph Adler added a comment -

          Just TestAvroStorage, yes. I'm not trying to rewrite the whole test system, just clean up the AvroStorage tests. And yes, I'd want to either make an exception for corrupted Avro files or have a job that corrupts the files.

          Joseph Adler added a comment -

          Here is a patch with a working implementation (plus new unit tests and a bash script to generate the test data files; just run the bash script in the test/org/apache/pig/builtin/avro directory to generate all the avro files needed for testing)

          Joseph Adler added a comment -

          Here's the generated patch file.

          Cheolsoo Park added a comment -

          Hi Joseph,

          First of all, thank you so much!

          Secondly, considering the size of the patch, would you mind uploading it to the RB? This will encourage more people to review it.
          https://reviews.apache.org/

          You can choose pig-git to upload a diff file from the github repository.

          Thirdly, I haven't fully read the patch yet and will do once it's uploaded on the RB. But I have a few minor comments as below:

          • Can you please add the Apache license header to every new file?
          • Can you please remove @author tags?
          • Can you please replace System.err.println() with common.logging.log?
          • Our indentation convention is 4 spaces and no tabs. You used 2 spaces, and I see 2 tabs in directory_test.pig.

          Lastly, your bash script should probably be replaced by a Python script (or another cross-platform script) because there is an ongoing effort to port Pig to Windows (PIG-2793). In particular, once TestAvroStorage is added to the unit test suites, this will be an issue. Please feel free to open a sub-task for converting it to Python if you'd like to get help.

          Joseph Adler added a comment -

          I have made all the changes that you suggested (including rewriting the script that builds test cases in Python) and have uploaded the new version to the RB: https://reviews.apache.org/r/8104/

          Russell Jurney added a comment -

          I suggest checking out the work Jon did in PIG-2614. One bad record out of a billion killing a job is almost always absurd.

          Joseph Adler added a comment -

          I just took a look at PIG-2614. It looks like the PIG-2614 patch will be compatible with this patch; PIG-2614 simply counts errors as values are read from a LoadFunc. Am I missing something? I'd be happy to drop the option to ignore bad records; I think that would make the options for this function cleaner and easier to understand.

          Cheolsoo Park added a comment -

          PIG-2614 lets the user configure the following properties:

          public static final String BAD_RECORD_THRESHOLD_CONF_KEY = "pig.piggybank.storage.avro.bad.record.threshold";
          public static final String BAD_RECORD_MIN_COUNT_CONF_KEY = "pig.piggybank.storage.avro.bad.record.min";
          

          I agree with replacing -ignoreerrors with these properties.
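
          For example, a script could then tune the error tolerance with Pig's set command instead of a loader option; the property keys come from the PIG-2614 patch, while the values below are arbitrary and their exact semantics are defined by that patch:

          set pig.piggybank.storage.avro.bad.record.threshold 0.01;
          set pig.piggybank.storage.avro.bad.record.min 100;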

          Timothy Potter added a comment -

          Can this patch be applied to Pig 0.10?

          Joseph Adler added a comment -

          Hi Timothy:

          I have not tried the patch with Pig 0.10, but I don't know of any reason why it would not work. Give it a spin and let us know what happens.

          – Joe

          Cheolsoo Park added a comment -

          Hi Joe,

          Sorry for the delay. I made some comments in the RB. Please let me know what you think.

          Thanks!

          Joseph Adler added a comment -

          I made most of the recommended changes (thanks for looking this over), and have a follow-up question:

          I have always assumed that AvroStorage was designed to be used with Hadoop sequence files that contained a series of records, so I implemented AvroStorage to only work with a file in this format. Are there cases where the highest level schema for a file will be another type? If so... what does that mean for pig? Is there one record per file?

          Here's a specific example: suppose that we have this schema:

          {"name" : "IntArray", "type" : "array", "items" : "int"}

          Suppose that we have 3 files to load, each with this schema, each containing an array of 10 integers. Should we load this into pig as a single bag with 30 integers? A bag containing three bags (each, in turn, containing 10 integers)? Or reject this file entirely?

          Joseph Adler added a comment -

          replacing with revised patch

          Joseph Adler added a comment -

          Revised patch; reflects comments and suggestions from review board

          Joseph Adler added a comment -

          Revised patch (compiles together all changes)

          Cheolsoo Park added a comment -

          Hi Joe,

          Thanks for your prompt response!

          To answer your questions,

          I have always assumed that AvroStorage was designed to be used with Hadoop sequence files that contained a series of records, so I implemented AvroStorage to only work with a file in this format. Are there cases where the highest level schema for a file will be another type? If so... what does that mean for pig? Is there one record per file?

          This is a good question, and I see your argument. But this will be very different from what the current AvroStorage does. Currently, a non-record type is automatically wrapped in a tuple. For example, "1" is loaded as (1) in Pig. If a file includes multiple values, they are loaded as multiple tuples as follows:

          avro
          cheolsoo@localhost:~/workspace/avro $java -jar avro-tools-1.5.4.jar getschema multiple_int.avro 
          "int"
          cheolsoo@localhost:~/workspace/avro $java -jar avro-tools-1.5.4.jar tojson multiple_int.avro 
          1
          2
          3
          
          pig
          in = LOAD 'multiple_int.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
          DUMP in;
          (1)
          (2)
          (3)
          

Agreed that we can tell users that the top-level schema must be a record type, but I am afraid that people might not agree. In my experience, people tend to think that every valid Avro file should be able to be loaded by AvroStorage. Granted, there exist some restrictions (e.g. recursive records and unions), but even these restrictions have been loosened recently. Unless there is a convincing reason not to, I think that we should keep it that way.

In many cases, people already have a data pipeline in place (e.g. Flume produces Avro files => Pig consumes Avro files), and it is not guaranteed that the top-level schema is always a record type.

          Here's a specific example: suppose that we have this schema:
          {"name" : "IntArray", "type" : "array", "items" : "int"}
          Suppose that we have 3 files to load, each with this schema, each containing an array of 10 integers. Should we load this into pig as a single bag with 30 integers? A bag containing three bags (each, in turn, containing 10 integers)? Or reject this file entirely?

          Currently, they are loaded as 3 tuples, and each tuple contains a bag of 10 integers.

          ({(1),(2), ... ,(10)})
          ({(1),(2), ... ,(10)})
          ({(1),(2), ... ,(10)})
          

          Thoughts?
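
For illustration, a minimal sketch of the wrapping rule described above, assuming the loader has already converted the Avro datum into its Pig representation (the helper class and its name are hypothetical, not part of any patch here):

import org.apache.avro.Schema;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Hypothetical helper: every top-level Avro value becomes exactly one Pig tuple.
// Records are assumed to have been converted to a Tuple already; any other
// top-level type (int, string, array-as-bag, ...) is wrapped in a 1-field tuple.
public final class TopLevelWrapper {
  private static final TupleFactory TF = TupleFactory.getInstance();

  public static Tuple wrap(Object convertedDatum, Schema schema) {
    if (schema.getType() == Schema.Type.RECORD) {
      return (Tuple) convertedDatum;
    }
    return TF.newTuple(convertedDatum);
  }
}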

          Joseph Adler added a comment -

          I think that approach makes sense; each object in a file should be wrapped in a Tuple. Suppose that a file example.avro contained the data:

          {[1, 2, 3, 4, 5]} {[6, 7, 8, 9, 10]}

          and had this schema:

          {"name" : "IntArray", "type" : "array", "items" : "int"}

          , and we loaded this as

          A = LOAD 'example.avro' USING AvroStorage;

The bag A would have the Pig schema A:{(IntArray:{(int)})}; it would contain two tuples, which would in turn each contain one bag of integers. Does that sound correct? If so, I'll go implement that.

          Cheolsoo Park added a comment -

          Yes, it does. Thank you, sir!

          Johannes Schwenk added a comment -

          First of all I want to say many thanks Joseph, for all the great work on this so far! This will be very useful for my work.

By the way: you certainly know of PIG-2684 about the existing AvroStorage implementation having problems with the <alias_name>:: prefix that is added by Pig's join operations? What is your solution to this issue in the new implementation?

          Joseph Adler added a comment -

          Hi Johannes,

          As you probably know, the Avro specification limits the set of valid characters in names (see http://avro.apache.org/docs/current/spec.html#Names). Names must

          • start with [A-Za-z_]
          • subsequently contain only [A-Za-z0-9_]

So double colons aren't allowed. PIG-2684 proposes using namespaces as the solution. I think that's a poor choice; namespaces are often used for other purposes. Specifically, namespaces are essential if you are writing complicated data processing software that processes multiple types of Avro-serialized objects. In my experience, the Avro schema and protocol compilers produce much better, more usable code if you use namespaces.

          There are two good workarounds:

          • The Pig user can rename variables in a bag before storing the bag using AvroStorage
          • The Pig user can manually specify the output schema before storing the bag with AvroStorage

          So, here's a specific suggestion:

          • By default, throw an exception if the pig schema contains a name with a double-colon and the user does not specify an output schema
          • Add an option to AvroStorage to transform double colons to something else. (Maybe double underscores? Maybe storing them in the namespace?)

          What do you think?
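
For illustration, a minimal sketch of the option described above, assuming a hypothetical helper that runs over Pig field names before the output schema is built (the class and method names are made up for this example, not part of the patch):

// Hypothetical helper: reject or rewrite Pig field names containing "::"
// before they are used as Avro field names.
public final class FieldNameSanitizer {

  public static String sanitize(String pigFieldName, boolean allowRewrite) {
    if (!pigFieldName.contains("::")) {
      return pigFieldName;
    }
    if (!allowRewrite) {
      throw new IllegalArgumentException(
          "Pig field name '" + pigFieldName + "' contains '::', which is not "
          + "a valid Avro name; rename the field or specify an output schema");
    }
    // one possible rewrite discussed above: replace "::" with "__"
    return pigFieldName.replace("::", "__");
  }
}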

          Joseph Adler added a comment -

I added support for files whose top-level schema is not a record, and added an option for dealing with double colons in variable names.

          Joseph Adler added a comment -

          Added test cases for TrevniStorage (and made sure the test cases all pass)

          Russell Jurney added a comment -

See PIG-3059

          Johannes Schwenk added a comment -

          Hi Joseph,

I agree that it would be best to have an option for automatic handling of double colons in names. I think it would be better to use the namespace, since replacing them with underscores could lead to confusion, as underscores are an allowed character in names. How do you handle identical names with different namespaces when you load data?

          Russell Jurney added a comment -

          The patch only applies at p1, not p0. Isn't p0 supposed to work?

          Russell Jurney added a comment -

What Avro version does this expect? I built, and I am still on 1.5.3, which seems wrong.

          Russell Jurney added a comment -

          I saw here: https://github.com/josephadler/fast-avro-storage/blob/master/pom.xml

          that I should use this avro jar: http://www.trieuvan.com/apache/avro/avro-1.7.3/avro-src-1.7.3.tar.gz

          Proceeding with testing with workaround, recommend patching pom.xml (although this is in another JIRA, right?)

          Joseph Adler added a comment -

          I'd recommend using 1.7.3, unless you have a compelling reason to use an older Avro version. There have been several significant bug fixes.

          Russell Jurney added a comment -

          AVRO-1218 has me blocked. I'm going to edit build.xml in your patch?

          Joseph Adler added a comment -

          Ivy should be able to pull the jar from a maven repo. Do you need to build your own Avro jar from source?

          Russell Jurney added a comment -

          Ivy is pulling 1.5.3 atm. Editing.

          Russell Jurney added a comment -

I see; I had to edit ivy/libraries.properties to use 1.7.3 instead of 1.5.3. I suggest including that change in your patch.

          Cheolsoo Park added a comment -

          The current patch already includes that change:

          diff --git a/ivy/libraries.properties b/ivy/libraries.properties
          index bfbbbc0..1e7fbc8 100644
          --- a/ivy/libraries.properties
          +++ b/ivy/libraries.properties
          @@ -16,7 +16,7 @@
           #These are the versions of our dependencies (in alphabetical order)
           apacheant.version=1.7.1
           automaton.version=1.11-8
          -avro.version=1.5.3
          +avro.version=1.7.2
           commons-beanutils.version=1.7.0
           commons-cli.version=1.0
           commons-codec.version=1.4
          
          Russell Jurney added a comment -

          Strange. I am not able to apply that and get that result. I'll try downloading and applying again. Hmmmmmm...

          Having this problem loading my emails:

          grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/avro-1.7.3.jar
          grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/json-simple-1.1.jar
          grunt> REGISTER /me/Software/pig-trunk/contrib/piggybank/java/piggybank.jar
          grunt>
          grunt> rmf /tmp/sent_counts.avro
          grunt>
          grunt> messages = LOAD '/me/Data/test_inbox' USING AvroStorage();
          2012-12-11 13:01:41,690 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/avro/io/DatumReader
          2012-12-11 13:01:41,690 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.NoClassDefFoundError: org/apache/avro/io/DatumReader
          at java.lang.Class.forName0(Native Method)
          at java.lang.Class.forName(Class.java:247)
          at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:510)
          at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1206)
          at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1194)
          at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:4766)
          at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3183)
          at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
          at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
          at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
          at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
          at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
          at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1581)
          at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1554)
          at org.apache.pig.PigServer.registerQuery(PigServer.java:526)
          at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991)
          at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
          at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
          at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
          at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
          at org.apache.pig.Main.run(Main.java:535)
          at org.apache.pig.Main.main(Main.java:154)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
          Caused by: java.lang.ClassNotFoundException: org.apache.avro.io.DatumReader
          at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
          at java.security.AccessController.doPrivileged(Native Method)
          at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
          at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
          ... 27 more

          Details also at logfile: /private/tmp/pig_1355258041691.log

          The schema is:

Avro Schema:

{"fields": [
  {"doc": "", "type": ["null", "string"], "name": "message_id"},
  {"doc": "", "type": ["null", "string"], "name": "thread_id"},
  {"type": ["string", "null"], "name": "in_reply_to"},
  {"type": ["string", "null"], "name": "subject"},
  {"type": ["string", "null"], "name": "body"},
  {"type": ["string", "null"], "name": "date"},
  {"doc": "", "type": ["null", {"items": ["null", {"fields": [
    {"doc": "", "type": ["null", "string"], "name": "real_name"},
    {"doc": "", "type": ["null", "string"], "name": "address"}
  ], "type": "record", "name": "from"}], "type": "array"}], "name": "froms"},
  {"doc": "", "type": ["null", {"items": ["null", {"fields": [
    {"doc": "", "type": ["null", "string"], "name": "real_name"},
    {"doc": "", "type": ["null", "string"], "name": "address"}
  ], "type": "record", "name": "to"}], "type": "array"}], "name": "tos"},
  {"doc": "", "type": ["null", {"items": ["null", {"fields": [
    {"doc": "", "type": ["null", "string"], "name": "real_name"},
    {"doc": "", "type": ["null", "string"], "name": "address"}
  ], "type": "record", "name": "cc"}], "type": "array"}], "name": "ccs"},
  {"doc": "", "type": ["null", {"items": ["null", {"fields": [
    {"doc": "", "type": ["null", "string"], "name": "real_name"},
    {"doc": "", "type": ["null", "string"], "name": "address"}
  ], "type": "record", "name": "bcc"}], "type": "array"}], "name": "bccs"},
  {"doc": "", "type": ["null", {"items": ["null", {"fields": [
    {"doc": "", "type": ["null", "string"], "name": "real_name"},
    {"doc": "", "type": ["null", "string"], "name": "address"}
  ], "type": "record", "name": "reply_to"}], "type": "array"}], "name": "reply_tos"}
], "type": "record", "name": "Email"}

          And just to get really meta... here is a JSON output of my Avro serialized emails... one from this list:

          {u'bccs': None,
          u'body': u'\r\n [ https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447110#comment-13447110 ] \r\n\r\nDmitriy V. Ryaboy commented on PIG-2661:\r\n----------------------------------------\r\n\r\nor, you know, stick a key in MemCache. #whyishadoopsohard\r\n \r\n> Pig uses an extra job for loading data in Pigmix L9\r\n> ---------------------------------------------------\r\n>\r\n> Key: PIG-2661\r\n> URL: https://issues.apache.org/jira/browse/PIG-2661\r\n> Project: Pig\r\n> Issue Type: Improvement\r\n> Affects Versions: 0.9.0\r\n> Reporter: Jie Li\r\n> Assignee: Jie Li\r\n> Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch, PIG-2661.3.patch, PIG-2661.4.patch, PIG-2661.5.patch, PIG-2661.6.patch, PIG-2661.7.patch, PIG-2661.8.patch, PIG-2661.plan.txt\r\n>\r\n>\r\n> See https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155\r\n\r\n--\r\nThis message is automatically generated by JIRA.\r\nIf you think it was sent incorrectly, please contact your JIRA administrators\r\nFor more information on JIRA, see: http://www.atlassian.com/software/jira\r\n',
          u'ccs': None,
          u'date': u'2012-09-03T15:22:07',
u'froms': [{u'address': u'jira@apache.org', u'real_name': u'Dmitriy V. Ryaboy (JIRA)'}],
          u'in_reply_to': u'52638020.7802.1335237294825.JavaMail.tomcat@hel.zones.apache.org',
          u'message_id': u'762728135.29484.1346646127701.JavaMail.jiratomcat@arcas',
u'reply_tos': [{u'address': u'dev@pig.apache.org', u'real_name': None}],
          u'subject': u'[jira] [Commented] (PIG-2661) Pig uses an extra job for loading\r\n data in Pigmix L9',
          u'thread_id': u'1400097807569590118',
u'tos': [{u'address': u'pig-dev@hadoop.apache.org', u'real_name': None}]}

          Russell Jurney added a comment -

I got a clean clone of trunk and applied the patch, then attempted to load the file again:

          grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/avro-1.7.2.jar
          grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/avro-ipc-1.7.2.jar
          grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/avro-mapred-1.7.2.jar
          grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/avro-tools-1.7.2.jar
          grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/trevni-avro-1.7.2.jar
          grunt> REGISTER /me/Software/pig-trunk/build/ivy/lib/Pig/json-simple-1.1.jar
          grunt> REGISTER /me/Software/pig-trunk/contrib/piggybank/java/piggybank.jar
          grunt> 
          grunt> 
          grunt> messages = LOAD '/me/Data/test_inbox' USING AvroStorage();
          2012-12-11 15:32:36,989 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/avro/io/DatumReader
          2012-12-11 15:32:36,989 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.NoClassDefFoundError: org/apache/avro/io/DatumReader
          	at java.lang.Class.forName0(Native Method)
          	at java.lang.Class.forName(Class.java:247)
          	at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:510)
          	at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1206)
          	at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1194)
          	at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:4766)
          	at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3183)
          	at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
          	at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
          	at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
          	at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
          	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
          	at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1581)
          	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1554)
          	at org.apache.pig.PigServer.registerQuery(PigServer.java:526)
          	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991)
          	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
          	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
          	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
          	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
          	at org.apache.pig.Main.run(Main.java:535)
          	at org.apache.pig.Main.main(Main.java:154)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          	at java.lang.reflect.Method.invoke(Method.java:597)
          	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
          Caused by: java.lang.ClassNotFoundException: org.apache.avro.io.DatumReader
          	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
          	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
          	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
          	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
          	... 27 more
          
          Russell Jurney added a comment -

          Pardon the comment blitz:

          Russells-MacBook-Pro:pig-trunk rjurney$ grep -iR 'org.apache.avro.io.DatumReader' *
          PIG-3015.patch:+import org.apache.avro.io.DatumReader;
          PIG-3015.patch:+import org.apache.avro.io.DatumReader;
          Binary file build/classes/org/apache/pig/builtin/AvroStorage.class matches
          Binary file build/classes/org/apache/pig/impl/util/avro/AvroArrayReader.class matches
          Binary file build/classes/org/apache/pig/impl/util/avro/AvroRecordReader.class matches
          Binary file build/ivy/lib/Pig/avro-1.7.2.jar matches
          Binary file build/ivy/lib/Pig/avro-tools-1.7.2.jar matches
          Binary file build/pig-0.12.0-SNAPSHOT-withdependencies.jar matches
          Binary file contrib/piggybank/java/build/classes/org/apache/pig/piggybank/storage/avro/AvroStorage.class matches
          Binary file contrib/piggybank/java/build/classes/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.class matches
          Binary file pig.jar matches
          src/org/apache/pig/impl/util/avro/AvroArrayReader.java:import org.apache.avro.io.DatumReader;
          src/org/apache/pig/impl/util/avro/AvroRecordReader.java:import org.apache.avro.io.DatumReader;
          

          I'm puzzled... checking my classpath.

          Russell Jurney added a comment -

Nothing I do, even after most tests pass on the latest patch, can get around this error. I'll check back.

          Cheolsoo Park added a comment -

          Hi Joe,

          Can you please add the missing files to the patch?

          Exception in thread "main" java.io.FileNotFoundException: data/json/recordsWithDoubleUnderscores.json (No such file or directory)
          Exception in thread "main" java.io.FileNotFoundException: data/json/arrays.json (No such file or directory)
          Exception in thread "main" java.io.FileNotFoundException: data/json/arraysAsOutputByPig.json (No such file or directory)
          

          I can't run your test cases.

          Joseph Adler added a comment -

          My apologies; forgot to add those to the patch. Replaced the patch version.

          Cheolsoo Park added a comment -

          Hi Joe,

          I found a few errors with the new patch.

          • python createtests.py fails with the following errors:
            creating data/avro/deflate/records.avro
            sh: data/avro/deflate/records.avro: No such file or directory
            creating data/avro/deflate/recordsAsOutputByPig.avro
            sh: data/avro/deflate/recordsAsOutputByPig.avro: No such file or directory
            creating data/avro/snappy/records.avro
            sh: data/avro/snappy/records.avro: No such file or directory
            creating data/avro/snappy/recordsAsOutputByPig.avro
            sh: data/avro/snappy/recordsAsOutputByPig.avro: No such file or directory
            

            These errors are due to a typo at line 85 in createtests.py: "data/avro/" should be "data/avro/compressed/".

          • After fixing this typo, I get the following failures in TestAvroStorage:
            Testcase: testStoreSnappyCompressedRecords took 4.439 sec
                FAILED
            Testcase: testLoadDeflateCompressedRecords took 0.007 sec
                FAILED
            Testcase: testStoreDeflateCompressedRecords took 3.557 sec
                FAILED
            Testcase: testLoadSnappyCompressedRecords took 0.004 sec
                FAILED
            

    These errors are due to typos in TestAvroStorage.java: "data/avro/deflate/" and "data/avro/snappy/" should be "data/avro/compressed/deflate/" and "data/avro/compressed/snappy/" respectively.

• Lastly, I realized that createtests.py doesn't generate the input Avro file for testPartialLoadGlob, and I saw your comment:
            Please copy and run this command manually (doesn't work correctly from python right now... arg...
            

            In fact, I haven't been able to fix this myself yet. When I run the command manually, testPartialLoadGlob passes.

          Joseph Adler added a comment -

          Revised patch to include bug fixes (thanks Cheolsoo Park!)

          Cheolsoo Park added a comment -

          Hi Joe,

          I am attaching a patch that replaces createtests.py with Java code in TestAvroStorage.java. I did this for two reasons:

• Problem with generating evenFileNameTestDirectoryCounts.avro. Using the Avro Tools API addresses it.
• Better integration with JUnit. Now we can run the unit test with a single command (ant clean test -Dtestcase=TestAvroStorage). This auto-generates and deletes input files before and after each test run.

          Please let me know what you think.
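
For reference, a rough sketch of how a test input file can be generated programmatically with the standard Avro file-writer API (the schema, codec, and file name here are illustrative, not the ones the patch uses):

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

// Write a small test file with an optional codec, comparable to what
// createtests.py produced via the command-line tools.
public class WriteTestFile {
  public static void main(String[] args) throws IOException {
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"TestRecord\","
        + "\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}");

    DataFileWriter<GenericRecord> writer =
        new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
    writer.setCodec(CodecFactory.deflateCodec(5));   // or CodecFactory.snappyCodec()
    writer.create(schema, new File("records.avro"));
    for (int i = 0; i < 10; i++) {
      GenericRecord r = new GenericData.Record(schema);
      r.put("id", i);
      writer.append(r);
    }
    writer.close();
  }
}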

          Cheolsoo Park added a comment -

          I am attaching a new patch that includes the following fixes:

          1. hadoop-core.jar is pulled as it's a dependency of trevni-core.jar. This causes compile errors with hadoop-2.0.x. I fixed this as follows:
            +    <dependency org="org.apache.avro" name="trevni-avro" rev="${avro.version}"
            +      conf="compile->default;checkstyle->master">
            +      <exclude org="org.apache.hadoop" module="hadoop-core"/>
            +    </dependency>
            
          2. avro-tools.jar contains hadoop classes such as Configuration. This causes compile errors with hadoop-2.0.x. I fixed this as follows:
            +    <dependency org="org.apache.avro" name="avro-tools" rev="${avro.version}"
            +      conf="test->default">
            +      <artifact name="nodeps" type="jar"/>
            +    </dependency>
            

          Now I can run TestAvroStorage with -Dhadoopversion=23, but all test cases currently fail. I haven't investigated yet.

          Cheolsoo Park added a comment -

          I am uploading a new patch that includes the following changes:

          • I forgot to pass the --codec option to the Avro tool when generating compressed .avro files, so I fixed it.
          • I found that the unit test fails with MR2 because mapred.output.compress is set to true while mapred.output.compression.codec is not set.
            ERROR 0: 'mapred.output.compress' is set but no value is specified for 'mapred.output.compression.codec'.
            

What's worse is that any test case that runs after testStoreSnappyCompressedRecords fails for the same reason. Apparently, this property remains enabled and affects other test cases. For now, I removed the following line from identity_codec.pig:

            SET mapred.output.compress true
            

With this change, I can actually run test jobs, but they still fail. The reason is that the output files do not match the expected ones. I didn't investigate why this happens with MR2.

• Lastly, I also found that the current patch does not implement compression support. Test cases such as testStoreSnappyCompressedRecords and testStoreDeflateCompressedRecords do not check avro.codec. Please correct me if I am wrong.

          Thanks!

          Joseph Adler added a comment -

          Hi Cheolsoo,

You're totally right; I don't check the compression properties. I know that the Avro mapred library does check those parameters (org.apache.avro.mapred.AvroOutputFormat), but I don't use that output format. Fixing and testing; I will follow up with a patch.
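
For illustration, a minimal sketch of the kind of check this implies, assuming the store function opens a DataFileWriter itself. The property names below mirror the ones the Avro mapred library looks at and are assumptions here, not necessarily what the final patch will use:

import org.apache.avro.file.CodecFactory;
import org.apache.hadoop.conf.Configuration;

// Pick a codec from the job configuration before opening the DataFileWriter.
public final class CodecFromConf {
  public static CodecFactory resolve(Configuration conf) {
    if (!conf.getBoolean("mapred.output.compress", false)) {
      return CodecFactory.nullCodec();
    }
    String codec = conf.get("avro.output.codec", "deflate");
    if ("deflate".equals(codec)) {
      return CodecFactory.deflateCodec(conf.getInt("avro.mapred.deflate.level", 1));
    }
    return CodecFactory.fromString(codec);   // e.g. "snappy"
  }
}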

          – Joe

          Joseph Adler added a comment -

          Added fixes for compression (and other metadata)

          Cheolsoo Park added a comment -

          Hi Joe, I think you forgot to add new files to the patch. Do you mind uploading the patch again?

          Joseph Adler added a comment -

          Oops, this one contains the changes.

          Russell Jurney added a comment -

          Joe, some comments on handling errors in PIG-3059:

          Regarding Avro, in reading https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/file/DataFileReader.java - it looks like you can still sync to the next record under most bad reads. We should do so.

You're right about a bad sync halting things, but in the case of a bad sync you might try advancing by some amount using seek() and then sync'ing again? I think this would work. I could be wrong, but from looking at how seeks work, I think that would be OK. Kinda neat, maybe? Worst case, we would only throw out input splits on a bad sync(), not a bad read(). length() should help, as might pastSync(), skip() and available().

          Joseph Adler added a comment -

          Hi Russ,

          I think you're right... it looks like you could do something like this in AvroRecordReader.nextKeyValue:

  @Override
  public boolean nextKeyValue() throws IOException, InterruptedException {

    // stop once we have read past the end of this input split
    if (reader.pastSync(end)) {
      return false;
    }

    try {
      currentRecord = reader.next(new GenericData.Record(schema));
    } catch (NoSuchElementException e) {
      // no more records in the file
      return false;
    } catch (IOException ioe) {
      // try to resynchronize past the bad record before rethrowing,
      // so a subsequent read can resume at the next sync marker
      reader.sync(reader.tell()+1);
      throw ioe;
    }

    return true;
  }
          

          Let me test this out to make sure it runs correctly on uncorrupted files. Would you mind creating a corrupted test file that I can use for testing?
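
One simple way to produce such a file, sketched here under the assumption that clobbering a few bytes in the middle of a copy of a valid file is enough to break a sync marker (file name and byte count are illustrative):

import java.io.RandomAccessFile;

// Overwrite a few bytes in the middle of an existing Avro file so that
// readers hit an "Invalid sync!" style failure partway through.
public class CorruptFile {
  public static void main(String[] args) throws Exception {
    RandomAccessFile f = new RandomAccessFile(args[0], "rw");
    long middle = f.length() / 2;
    f.seek(middle);
    f.write(new byte[] {0, 0, 0, 0, 0, 0, 0, 0});  // clobber 8 bytes
    f.close();
  }
}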

          Cheolsoo Park added a comment -

I am uploading a test program that I wrote. What it does is:

          • Reads an Avro file and prints out records to stdout.
• Each time it prints out a record, it also prints out the current sync position (i.e. tell()).
          • When encountering an exception during a next(), it does sync(tell() + 1).

Unfortunately, it doesn't seem that Avro files have sync positions between records. Instead, the file only has a single sync position at the end.
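
Roughly, the read loop looks like the following sketch (a simplified illustration of the behavior described above, not the exact attached Main.java):

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;

// Print each record and the current sync position; on failure, try to
// resynchronize one byte past the current position.
public class ReadWithRecovery {
  public static void main(String[] args) throws IOException {
    DataFileReader<Object> reader =
        new DataFileReader<Object>(new File(args[0]), new GenericDatumReader<Object>());
    System.out.println("tell(): " + reader.tell());
    while (true) {
      try {
        if (!reader.hasNext()) {
          break;
        }
        System.out.println("next(): " + reader.next());
        System.out.println("tell(): " + reader.tell());
      } catch (Exception e) {
        System.out.println("hasNext() or next() failed");
        long pos = reader.tell();
        reader.sync(pos + 1);             // attempt to skip past the bad block
        System.out.println("tell(): " + reader.tell());
        if (reader.tell() <= pos) {
          break;                          // no forward progress; give up
        }
      }
    }
    reader.close();
  }
}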

          You can run my program as follows:

• Untar Test.tar.gz; it contains the following files:
            bad.avro
            good.avro
            Main.java
            Test.jar
            
          • There are two Avro files: good.avro and bad.avro:
            good.avro
            java -jar avro-tool.jar tojson good.avro
            "0"
            ...
            "999"
            
            bad.avro
            java -jar avro-tool.jar tojson bad.avro
            Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
            	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
            	at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:64)
            	at org.apache.avro.tool.Main.run(Main.java:74)
            	at org.apache.avro.tool.Main.main(Main.java:63)
            
            
          • Run "java -jar Test.jar good.avro" gives the following output:
            lengh(): 3969
            tell(): 3969
            next(): 0
            ...
            next(): 999
            tell(): 3969
            

            As can be seen, the sync position never changes between records.

          • Run "java -jar Test.jar bad.avro" gives the following output:
            lengh(): 3969
            tell(): 3969
            hasNext() or next() failed
            tell(): 3970
            

            The sync(tell()+1) call sets the sync position past the end of the file, so the program ends.

          In summary, I don't think that we can recover from a bad sync() in Avro. Please correct me if my test program has a bug.

          Thanks!

          Russell Jurney added a comment -

          Linking to PIG-3059

          Russell Jurney added a comment -

          I just asked user@avro.apache.org how to recover from bad records when using DataFileReader.

          Joseph Adler added a comment -

          Hi Cheolsoo:

          What size file are you using? You can configure the sync interval with the parameter avro.mapred.sync.interval (defined in org.apache.avro.mapred.AvroOutputFormat); support for it is implemented in my latest patch (the one from last week).
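
          For example, the property can be set on the Hadoop configuration that the writer picks up (a minimal, hedged sketch; the interval value is arbitrary, and presumably the same key can also be set from a Pig script with the SET command):

            import org.apache.hadoop.conf.Configuration;

            public class SyncIntervalConfig {
              public static void main(String[] args) {
                Configuration conf = new Configuration();
                // Smaller intervals add sync markers more often, which limits how much
                // data a single corrupted block can take out.
                conf.setInt("avro.mapred.sync.interval", 1024);
                System.out.println(conf.get("avro.mapred.sync.interval"));
              }
            }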

          – Joe

          Cheolsoo Park added a comment -

          Hi Joe,

          My file is pretty small. It makes sense that the sync interval is configurable.

          Please let me try to set the interval and run another test.

          Thanks!

          Cheolsoo Park added a comment -

          OK, I made two changes to my test program (a writer sketch follows the list):

          • Set the sync interval to 32 bytes (which seems to be the minimum possible interval, unless I misread the Avro source code).
          • Increased the file size to ~10 MB.
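
          A minimal sketch of writing such a test file (assuming integer records; the actual TestInput.java attached here may differ, and the file name is illustrative):

            import java.io.File;
            import org.apache.avro.Schema;
            import org.apache.avro.file.DataFileWriter;
            import org.apache.avro.generic.GenericDatumWriter;

            public class WriteTestInput {
              public static void main(String[] args) throws Exception {
                Schema schema = Schema.create(Schema.Type.INT);
                DataFileWriter<Integer> writer =
                    new DataFileWriter<Integer>(new GenericDatumWriter<Integer>(schema));
                writer.setSyncInterval(32);   // near the minimum allowed value
                writer.create(schema, new File("good.avro"));
                for (int i = 0; i < 1000000; i++) {
                  writer.append(i);           // sequential integers
                }
                writer.close();
              }
            }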

          I can see that sync points are added roughly every 32 bytes, and tell() returns increasing values with the good file.

          I am mimicking a bad file by deleting a random byte in a sync point. Running avro-tool tojson gives me an invalid sync exception after it reads up to that corrupted sync point, so I assume the bad file is created correctly.

          However, I still cannot recover from a bad read. I catch the exception from next() and call sync(tell() + 1). The next tell() seems to correctly return the next valid sync point, but next() still fails. In fact, it continues to fail until it hits the end of the file.

          next(): 9999
          tell(): 82133
          hasNext() or next() failed 
          tell(): 82196
          hasNext() or next() failed 
          tell(): 82250
          ...
          hasNext() or next() failed
          tell(): 10424205
          hasNext() or next() failed
          tell(): 10424258
          end of the file
          tell(): 10424259
          past the end of the file
          

          I am uploading my test program. TestInput.java generates input files, and Test.java runs the test.

          Does anyone have an idea what I am doing wrong?

          Scott Carey added a comment -

          Try corrupting the file at a point inside the data block instead of inside the sync marker. The ability to recover from a corrupted file was added in response to corrupted data, not corrupted sync.

          Cheolsoo Park added a comment -

          Hi Scott,

          Thank you very much. That makes sense. After several trials and errors, I managed to "correctly" corrupt a data block and was able to verify the recovery.

          The output from 'avro-tool.jar tojson bad.avro' is as follows:

          Caused by: java.io.IOException: Block read partially, the data may be corrupt
          	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:194)
          	... 3 more
          

          The output from my test program is as follows:

          next(): 685
          tell(): 8196
          next(): 686
          tell(): 8196
          hasNext() or next() failed
          tell(): 8240
          next(): 2656
          tell(): 16432
          next(): 2657
          tell(): 16432
          

          The data are sequential integers (0 ~ 1M). Here is the number of lost integers due to a single corrupted data block with different sync intervals:

          Sync interval (bytes)    Number of lost values
          32                       1,970
          16,000                   5,389

          In summary,

          • Avro can recover from a data block corruption but cannot from a sync marker corruption.
          • The amount of data loss depends on the sync interval. By default, it's 16KB, but it can vary from 32 to 2^30 bytes. The larger the sync interval, the greater the data loss.

          I am attaching my test program and input files if anyone's interested.

          Thanks!

          Doug Cutting added a comment -

          > Avro can recover from a data block corruption but cannot from a sync marker corruption.

          BTW, I think that's probably a bug in Avro's Java implementation that (separately) we should fix.

          Russell Jurney added a comment -

          Note the lack of Trevni Boolean support: AVRO-1229

          Joseph Adler added a comment -

          Just got bitten by PIG-2266 while doing some performance testing with this ticket. I'm going to add that fix to this patch so that AvroStorage and TrevniStorage actually work.

          Cheolsoo Park added a comment -

          Hi Joe,

          Do you mind attaching the fix in PIG-2266 as a separate patch? I will review and commit it right away. I just think that putting everything into a single patch will slow down progress.

          Thanks,
          Cheolsoo

          Russell Jurney added a comment -

          Please note, Avro will have Boolean support soon: AVRO-1229

          Santhosh Srinivasan added a comment -

          Created and linked AVRO-1235.

          Joseph Adler added a comment -

          Some additional bug fixes:

          • Now correctly identifies recursive schema definitions
          • TrevniStorage was not correctly flushing output buffers before closing, causing files to be corrupted
          Cheolsoo Park added a comment -

          Joseph Adler, thank you for the update. I just made very minor changes to the patch as follows:

          • Took out PIG-2266 and posted it in a separate patch in PIG-2266. I will commit it after running tests.
          • Parameterized the snappy version in build.xml and ivy.xml.
          • Downgraded the Avro version to 1.7.3. I noticed that you bumped it to 1.7.4-SNAPSHOT, but when I tried your patch, 1.7.4-SNAPSHOT didn't exist on the Maven repo. Did you want Avro 1.7.4-SNAPSHOT because of recent updates in Avro?
          Joseph Adler added a comment -

          Sorry, I didn't mean to submit a patch with Avro 1.7.4-SNAPSHOT. I added a couple of optimizations to Trevni so that the performance was comparable with Avro. (I'll submit that patch to Avro.)

          Cheolsoo Park added a comment -

          I think the patch is very close to being committed. Two main obstacles are:

          1. Tests do not pass with Hadoop-2.0.x (i.e. ant clean test -Dtestcase=TestAvroStorage -Dhadoopversion=23).
          2. Documentation is missing.

          I will take another shot at debugging #1 when I get more time, but any help would be appreciated!

          Russell Jurney added a comment -

          I'll start testing this again.

          Joseph Adler added a comment -

          Let me know what help you need. I can work on the documentation as well. Is early next week enough time? (Also, check out AVRO-1241. I couldn't get adequate performance without it.)

          Cheolsoo Park added a comment -

          Joseph Adler, if you could add documentation, that would be awesome!

          Joseph Adler added a comment -

          Added a description of AvroStorage and TrevniStorage to the documentation. (Not finished editing yet, but I wanted to share what I'd written so far.)

          Joseph Adler added a comment -

          I think the method setLocation for AvroStorage is marked as "final." Does anyone object to removing the "final" modifier?

          Russell Jurney added a comment -

          Which patch should I be testing?

          Cheolsoo Park added a comment -

          Hi Russell,

          PIG-3015-7.patch is the latest patch.

          PIG-3015-8.patch is a doc patch. In fact, I will rename it since it's confusing.

          Cheolsoo Park added a comment -

          Renaming PIG-3015-8.patch to PIG-3015-doc.patch.

          Russell Jurney added a comment -

          Prerequisite for PIG-3111, which will refer to this in its implementation.

          Russell Jurney added a comment -

          Initial experience is very positive, which makes me highly motivated to get Pig 0.12 out in three months or so, if possible.

          Russell Jurney added a comment -

          illustrate fails

          Russell Jurney added a comment -

          Loading data without going to Piggybank is amazing. However, TrevniStorage fails to store my emails. The schema is:

          You can reproduce this data with your own gmail emails (just need a few) with these instructions: https://github.com/rjurney/Agile_Data_Code/tree/master/ch03

          grunt> describe emails
          emails: {message_id: chararray,thread_id: chararray,in_reply_to: chararray,subject: chararray,body: chararray,date: chararray,from: (real_name: chararray,address: chararray),tos: {to: (real_name: chararray,address: chararray)},ccs: {cc: (real_name: chararray,address: chararray)},bccs: {bcc: (real_name: chararray,address: chararray)},reply_tos: {reply_to: (real_name: chararray,address: chararray)}}

          Error:

          2013-02-17 18:03:31,574 [Thread-6] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
          2013-02-17 18:03:31,680 [Thread-6] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
          2013-02-17 18:03:31,680 [Thread-6] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
          2013-02-17 18:03:31,699 [Thread-6] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
          2013-02-17 18:03:31,713 [Thread-6] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: emails[2,9],null[-1,-1],null[-1,-1],token_records[-1,-1],doc_word_totals[5,18],1-84[5,27] C: doc_word_totals[5,18],1-84[5,27] R: doc_word_totals[5,18]
          2013-02-17 18:03:31,748 [Thread-6] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
          org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.LuceneTokenize)[bag] - scope-19 Operator Key: scope-19) children: null at []]: java.lang.NullPointerException
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:370)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:298)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPreCombinerLocalRearrange.getNext(POPreCombinerLocalRearrange.java:126)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:242)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:263)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
          at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
          at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
          at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
          Caused by: java.lang.NullPointerException
          at org.apache.lucene.analysis.standard.std31.StandardTokenizerImpl31.zzRefill(StandardTokenizerImpl31.java:795)
          at org.apache.lucene.analysis.standard.std31.StandardTokenizerImpl31.getNextToken(StandardTokenizerImpl31.java:1002)
          at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:180)
          at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:49)
          at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
          at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:50)
          at org.apache.pig.builtin.LuceneTokenize.exec(LuceneTokenize.java:70)
          at org.apache.pig.builtin.LuceneTokenize.exec(LuceneTokenize.java:51)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:336)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:380)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:341)
          ... 18 more
          2013-02-17 18:03:31,811 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
          2013-02-17 18:03:31,811 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases 1-84,doc_word_totals,emails,token_records
          2013-02-17 18:03:31,811 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: emails[2,9],null[-1,-1],null[-1,-1],token_records[-1,-1],doc_word_totals[5,18],1-84[5,27] C: doc_word_totals[5,18],1-84[5,27] R: doc_word_totals[5,18]
          2013-02-17 18:03:31,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
          2013-02-17 18:03:31,817 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
          2013-02-17 18:03:31,817 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
          2013-02-17 18:03:31,817 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
          2013-02-17 18:03:31,818 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
          2013-02-17 18:03:31,818 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
          2013-02-17 18:03:31,819 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

          HadoopVersion PigVersion UserId StartedAt FinishedAt Features
          1.0.3 0.12.0-SNAPSHOT rjurney 2013-02-17 18:03:31 2013-02-17 18:03:31 HASH_JOIN,GROUP_BY

          Failed!

          Failed Jobs:
          JobId Alias Feature Message Outputs
          job_local_0001 1-84,doc_word_totals,emails,token_records MULTI_QUERY,COMBINER Message: Job failed! Error - NA

          Input(s):
          Failed to read data from "/me/Data/test_mbox"

          Output(s):

          Job DAG:
          job_local_0001 -> null,
          null -> null,null,
          null -> null,
          null -> null,
          null -> null,
          null

          2013-02-17 18:03:31,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!

          Joseph Adler added a comment -

          Russell Jurney: Reading through the stack trace that you posted, it does not look like the null pointer exception was occurring in TrevniStorage. (It looks like it was occurring in the Tokenizer.) Does your script work correctly if you use it with another format, like PigStorage?

          Joseph Adler added a comment -

          Added support for Pig dates to AvroStorage and TrevniStorage (they're translated to longs when storing values). Also added a new test case.

          Cheolsoo Park added a comment -

          Joseph Adler, sorry for the late reply, but you seem to have forgotten to include "with_dates.pig" in PIG-3015-9.patch. Do you mind uploading it?

          In addition, none of the ivy changes are included in PIG-3015-9.patch; I will include them in a new patch.

          Cheolsoo Park added a comment -

          Attaching PIG-3015-10.patch, which puts back the ivy changes.

          Joseph Adler added a comment -

          Missing test file (not a patch)

          Cheolsoo Park added a comment -

          I got the unit tests working with both Hadoop 20 and 23. In fact, the problem was very simple. The option parser code was not correct (i.e. CommandLine.getOptionValue() takes the option name, not the long option name), and thus the "namespace" of output Avro files was not set properly.

          What surprised me is how this issue only showed up with Hadoop 23. We need better test coverage.

          PIG-3015-11.patch includes the following changes:

          • Fixed the option parser code.
          • Removed commented-out code.
          • Added "with_dates.pig".
          Cheolsoo Park added a comment -

          PIG-3015-doc-2.patch fixes "ant docs" errors.

          Aaron Klish added a comment -

          For Multiple Schema Support, I don't think we should follow the rules in PIG-2579. While this was a good first attempt at supporting multiple data files with different schemas, it has problems:

          1) The rules only support schema evolution for flat records. Nested records are not supported.
          2) The AVRO specification already defines a set of rules for schema evolution. In fact, it already has code inside GenericDatumReader that will resolve data between two different schemas. We should be using that logic instead of writing logic that only works in very limited situations.
          3) The idea of a merged schema is not as powerful as letting the user define the 'table' schema they expect the data to conform to. I would recommend looking at how LinkedIn added support for AVRO to Hive. Every partition can have a different schema, but there is a single 'table' schema. Hive will map the file/partition schema(s) to the 'table' schema using the logic already inside AVRO.

          Joseph Adler added a comment -

          Aaron Klish: I agree completely.

          I don't think this version of AvroStorage does anything special with multiple schemas. If you don't specify a schema, it picks the schema from the most recently written Avro file. If you do specify a schema, it uses that schema. Provided that you follow the rules for evolving schemas, everything should work correctly.

          Cheolsoo Park added a comment -

          Here is what I am going to do. I will try to commit this patch as it is now because the patches are getting huge. I just need a +1 from another committer because I modified the patches.

          Then I will open follow-up jiras to implement the missing features, including multiple schema support, etc. We can continue our discussion there.

          Does this make sense to everyone?

          Russell Jurney added a comment -

          Makes sense. Let's get this in.

          Dmitriy V. Ryaboy added a comment -

          Serious question: is there a reason to put this in Pig rather than keep elsewhere, where you can iterate without being tied to Pig's release cycle?

          Jakob Homan added a comment -

          Serious question: is there a reason to put this in Pig rather than keep elsewhere, where you can iterate without being tied to Pig's release cycle?

          Having tried that with the Avro SerDe/Haivvreo, I'd say the code is better treated as part of Hive, since it wasn't getting the attention it deserved on GitHub. There's a definite cost to keeping the components in sync, but there's a strong benefit to making it easy for people to interact with Avro through Pig right out of the box.

          Cheolsoo Park added a comment -

          The main reason is that Avro is a file format Pig should support, and it's more convenient to use if AvroStorage is a built-in storage function. Here is the past discussion:
          http://mail-archives.apache.org/mod_mbox/pig-user/201208.mbox/%3C27EE5059-F811-4E19-B1A3-951B4BB3BDDF%40hortonworks.com%3E

          Frederic Rechtenstein added a comment -

          Hi,

          This looks very nice; I am really looking forward to it being released.

          Does it make sense to have an option to include the input file path with each tuple (like -tagsource in PigStorage)?

          I understand that this would add one more item to an already long list of options, but there are real use cases for this feature, and it would make AvroStorage more similar to PigStorage.

          Joseph Adler added a comment -

          I like the -tagsource option idea. Should we allow the user to provide a name for the "tag source" field? (If we picked a name like "tagSource", and there was already a field in the Avro schema called "tagSource", I'm concerned that we'd have to deal with that conflict. I think it would be cleaner to let the end user resolve the naming issue.)

          Prashant Kommireddi added a comment -

          Joseph Adler, note that the "tagsource" option on PigStorage is deprecated in 0.12 and replaced with "tagFile". Additionally, there is an option "tagPath" for getting the entire path and not just the filename. See PIG-2857.

          Jonathan Coveney added a comment -

          Joseph, out of gratitude for the effort you've put into this, I spent a chunk of my day giving it some eyes. Thank you for this! It is also timely, as I need to be ramping up on Avro myself, so good stuff. I gave some comments in the RB, though nothing major: mostly style stuff, some error handling, etc. Expect a +1 soon.

          I think it should be in the main Pig branch.

          As an aside, what is the plan w.r.t. Pig types that are not Avro types?

          Joseph Adler added a comment -

          Sorry to have taken so long to reply.

          I map any Pig type to a union of an Avro Type and Null. Here are the type mappings that I implemented:

          Bag -> Array
          Big Chararray -> String
          Byte Array -> Bytes
          Chararray -> String
          Datetime -> Long
          Double -> Double
          Float -> Float
          Integer -> Int
          Map -> Map
          Null -> Null
          Tuple -> Record

          Byte, Error, Generic Writable, Internal Map, Unknown aren't mapped to anything yet. Do we need to store these as well?
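
          As a hedged illustration of the "union with null" convention described above (not code from the patch itself), a nullable string field could be built with the Avro Schema API like this:

            import java.util.Arrays;
            import org.apache.avro.Schema;

            public class NullableFieldExample {
              public static void main(String[] args) {
                // A Pig chararray maps to the union ["null", "string"] so that null values can be stored.
                Schema nullableString = Schema.createUnion(
                    Arrays.asList(Schema.create(Schema.Type.NULL), Schema.create(Schema.Type.STRING)));
                System.out.println(nullableString);  // prints ["null","string"]
              }
            }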

          Rohini Palaniswamy added a comment -

          Joseph,
          Thanks for the good work. I am planning to go over it in more detail this week. At a glance, I had a question: why doesn't AvroStorage (even the old one) implement LoadPushDown (for column pruning)? If it's just a miss, we can create a separate jira for it.
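
          For context, LoadPushDown is the Pig interface a loader implements to receive the list of columns a script actually uses. A minimal, hedged sketch of its shape (illustrative only; a real loader also extends LoadFunc and would remember the pushed fields, e.g. in the UDFContext):

            import java.util.Arrays;
            import java.util.List;
            import org.apache.pig.LoadPushDown;
            import org.apache.pig.impl.logicalLayer.FrontendException;

            public abstract class ColumnPruningLoader implements LoadPushDown {
              @Override
              public List<OperatorSet> getFeatures() {
                // Advertise that this loader can handle projection push-down.
                return Arrays.asList(OperatorSet.PROJECTION);
              }

              @Override
              public RequiredFieldResponse pushProjection(RequiredFieldList requiredFieldList)
                  throws FrontendException {
                // Record which fields Pig needs so the record reader can skip the rest,
                // then tell Pig the request was honored.
                return new RequiredFieldResponse(true);
              }
            }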

          Joseph Adler added a comment -

          Rohini Palaniswamy: Great question. I definitely implemented that interface in an earlier version; I'm not sure what happened to the code. Let me go through the patches to figure that one out.

          Joseph Adler added a comment -

          Rohini Palaniswamy: OK, it looks like I implemented the helper functions and the functionality for Trevni, but didn't implement it for AvroStorage. I will follow up with a patch.

          Rohini Palaniswamy added a comment -

          Thanks. It would be very nice to have it. I just saw that Jonathan asked the same question earlier in his review comments.

          Paul Mazak added a comment -

          One simple workaround for us was to override AvroStorage's checkSchema this way.

          /**
           * In a Pig script do:
           * REGISTER 'lib/this.jar'
           * DEFINE AvroStorage com.this.JoinableAvroStorage;
           */
          public class JoinableAvroStorage extends AvroStorage {

            @Override
            public void checkSchema(ResourceSchema s) throws IOException {
              try {
                super.checkSchema(s);
              } catch (SchemaParseException spe) {
                // Field names produced by joins look like "alias::field", which is not a
                // legal Avro name. Strip the alias prefix and try again.
                ResourceFieldSchema[] pigFields = s.getFields();
                for (int i = 0; i < pigFields.length; i++) {
                  String outname = pigFields[i].getName();
                  if (outname.contains("::")) {
                    String newOutname = outname.split("::")[1];
                    pigFields[i].setName(newOutname);
                  }
                }
                super.checkSchema(s);
              }
            }
          }
          
          Joseph Adler added a comment -

          Incremental patch that adds support for push-down projections, fixes some bugs with options, and gets all the test cases working again.

          Joseph Adler added a comment -

          I'm getting confused by the names of the diffs. This one is a diff from trunk, as of now.

          Cheolsoo Park added a comment -

          I updated the patch again hoping that this will be committed - PIG-3015-22June2013.diff.

          The patch that Joe uploaded (PIG-3015-20May2013.diff) was missing a couple of test files and included a few unnecessary changes. I added the missing files and cleaned up the patch a bit. The unit tests pass on Hadoop 20 and 23.

          AFAICT, most comments in the jira have been incorporated, so I think we should commit this patch. I am maintaining the code in my private repo at work now, but I'd love to see this go in...

          Dmitriy V. Ryaboy added a comment -

          +1

          If we find more stuff, we can open other jiras. Let's get this into trunk.

          Cheolsoo Park added a comment -

          Thank you Dmitriy! I will commit it today!

          Cheolsoo Park added a comment -

          It's committed to trunk!

          Thank you Joe for the great contribution! Thanks everyone who reviewed/tested the patch!

          Anup Ahire added a comment -

          Shouldn't we lose the timezone information if we convert datetime to long?

          After going through the jira for datetime, it appears that datetime -> long wasn't supported precisely to avoid losing timezone information.
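
          A minimal sketch of the concern, using Joda-Time (which Pig's datetime type is based on); the values here are illustrative:

            import org.joda.time.DateTime;
            import org.joda.time.DateTimeZone;

            public class TimezoneLossExample {
              public static void main(String[] args) {
                DateTime original = new DateTime(2013, 7, 1, 12, 0, DateTimeZone.forID("Asia/Seoul"));
                long asLong = original.getMillis();            // the instant survives the conversion...
                DateTime roundTripped = new DateTime(asLong);  // ...but comes back in the JVM default zone
                System.out.println(original);                  // 2013-07-01T12:00:00.000+09:00
                System.out.println(roundTripped);              // same instant, original zone is gone
              }
            }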

          Thanks !!

          Cheolsoo Park added a comment -

          Anup Ahire, can you please open a separate jira? We can address your concern there.

          Anup Ahire added a comment -

          Done: PIG-3391. Thanks!


            People

            • Assignee:
              Joseph Adler
              Reporter:
              Joseph Adler
            • Votes:
              6
              Watchers:
              32
