Pig / PIG-2632

Create a SchemaTuple which generates efficient Tuples via code gen


    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:


      This work builds on Dmitriy's PrimitiveTuple work. The idea is that, since the Schema is known on the frontend, we can code-generate Tuples that can be used for fun and profit. In rudimentary tests, memory efficiency is 2-4x better, and serialized Tuples are ~15% smaller (this depends heavily on the data, though). Still need get/set tests, but assuming get/set performance is on par with (or even faster than) Tuple, the memory gain is huge.
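      To make the idea concrete, here is a minimal, hypothetical Java sketch (not the code in the attached patches) contrasting a generic tuple that stores boxed Objects in a list with the kind of class a generator might emit for the schema (name:chararray, age:int, score:double). The class names GenericTuple, NameAgeScoreTuple and SchemaTupleSketch, the null bitmask, and the type-tag wire format are all assumptions for illustration. The schema-specific class keeps int and double unboxed, and because its field layout is fixed by the schema it can skip the field count and per-field type tags when serializing.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic tuple: every field is a boxed Object in a list, so the
// serializer writes a field count and a type tag before each value.
class GenericTuple {
    private final List<Object> fields = new ArrayList<>();

    void append(Object o) { fields.add(o); }

    void write(DataOutputStream out) throws IOException {
        out.writeInt(fields.size());
        for (Object o : fields) {
            if (o == null) { out.writeByte(0); }
            else if (o instanceof Integer) { out.writeByte(1); out.writeInt((Integer) o); }
            else if (o instanceof Double) { out.writeByte(2); out.writeDouble((Double) o); }
            else if (o instanceof String) { out.writeByte(3); out.writeUTF((String) o); }
            else { throw new IllegalArgumentException("unsupported: " + o.getClass()); }
        }
    }
}

// Hypothetical class a generator might emit for (name:chararray, age:int,
// score:double): primitives stay unboxed, and the fixed layout means only a
// null bitmask precedes the values (no field count, no type tags).
class NameAgeScoreTuple {
    private static final int SIZE = 3;
    private String name;
    private int age;
    private double score;
    private byte nulls = 0b111; // bit i set => field i is null; all null until set

    int getAge() { return age; } // typed accessor, avoids boxing

    Object get(int i) {
        checkIndex(i);
        if ((nulls & (1 << i)) != 0) return null;
        switch (i) {
            case 0: return name;
            case 1: return age;
            default: return score;
        }
    }

    void set(int i, Object o) {
        checkIndex(i);
        if (o == null) {
            nulls |= 1 << i;
            return;
        }
        switch (i) { // assign first so a bad cast leaves the null bit untouched
            case 0: name = (String) o; break;
            case 1: age = (Integer) o; break;
            default: score = (Double) o; break;
        }
        nulls &= ~(1 << i);
    }

    private static void checkIndex(int i) {
        if (i < 0 || i >= SIZE) throw new IndexOutOfBoundsException("field " + i);
    }

    void write(DataOutputStream out) throws IOException {
        out.writeByte(nulls);
        if ((nulls & 1) == 0) out.writeUTF(name);
        if ((nulls & 2) == 0) out.writeInt(age);
        if ((nulls & 4) == 0) out.writeDouble(score);
    }
}

class SchemaTupleSketch {
    interface ByteSink { void write(DataOutputStream out) throws IOException; }

    // Serialize into an in-memory buffer and report how many bytes were written.
    static int sizeOf(ByteSink sink) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            sink.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        GenericTuple g = new GenericTuple();
        g.append("alice");
        g.append(30);
        g.append(1.5);

        NameAgeScoreTuple s = new NameAgeScoreTuple();
        s.set(0, "alice");
        s.set(1, 30);
        s.set(2, 1.5);

        System.out.println("generic: " + sizeOf(g::write)
                + " bytes, schema-specific: " + sizeOf(s::write) + " bytes");
    }
}
```

      For ("alice", 30, 1.5) this prints `generic: 26 bytes, schema-specific: 20 bytes`. In this toy format the whole gap comes from dropping the field count and type tags, so real savings will vary with the data, as noted above.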

      Need to clean up the code and add tests.

      Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDFs. The next step is to make a SchemaBag, which is where I think the serialization savings will be really huge.
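      Generating one class per schema can be sketched as plain source emission. The following is a hypothetical, much-simplified Java generator (SchemaTupleSourceGen, its Pig-to-Java type mapping, and the ordered Map standing in for a Pig Schema are assumptions, not Pig's API): it emits one typed field per schema column plus a positional get(int). A real generator would also have to emit set, serialization and null handling, and the resulting source would still need to be compiled and loaded, for example via javax.tools.JavaCompiler.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, much-simplified generator: given an ordered field -> Pig type map
// (a stand-in for a Pig Schema), emit Java source for a schema-specific class.
class SchemaTupleSourceGen {
    // Map an assumed subset of Pig type names to Java field types.
    static String javaType(String pigType) {
        switch (pigType) {
            case "int": return "int";
            case "long": return "long";
            case "double": return "double";
            case "chararray": return "String";
            default: throw new IllegalArgumentException("unsupported type: " + pigType);
        }
    }

    static String generate(String className, Map<String, String> schema) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        // One typed (unboxed where possible) field per schema column.
        for (Map.Entry<String, String> f : schema.entrySet()) {
            src.append("    private ").append(javaType(f.getValue()))
               .append(' ').append(f.getKey()).append(";\n");
        }
        // Positional getter: one switch case per column, in schema order.
        src.append("    public Object get(int i) {\n        switch (i) {\n");
        int pos = 0;
        for (String name : schema.keySet()) {
            src.append("            case ").append(pos++)
               .append(": return ").append(name).append(";\n");
        }
        src.append("            default: throw new IndexOutOfBoundsException(\"field \" + i);\n");
        src.append("        }\n    }\n}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("name", "chararray");
        schema.put("age", "int");
        schema.put("score", "double");
        System.out.print(generate("GeneratedTuple_0", schema));
    }
}
```

      Running main prints the source of a GeneratedTuple_0 class for (name:chararray, age:int, score:double). Caching generated classes keyed by schema would avoid regenerating the same class for UDFs that share an inputSchema or outputSchema.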

      1. PIG-2632-0.patch (55 kB, Jonathan Coveney)
      2. PIG-2632-1.patch (69 kB, Jonathan Coveney)
      3. PIG-2632-3.patch (100 kB, Jonathan Coveney)
      4. schematuple benchmarking.pptx (104 kB, Jonathan Coveney)
      5. PIG-2632-4.patch (219 kB, Jonathan Coveney)
      6. PIG-2632-5.patch (222 kB, Jonathan Coveney)
      7. PIG-2632-6.patch (232 kB, Jonathan Coveney)
      8. PIG-2632-7.patch (236 kB, Jonathan Coveney)
      9. PIG-2632-8.patch (242 kB, Jonathan Coveney)
      10. schematuple benchmarking.pdf (70 kB, Jonathan Coveney)
      11. PIG-2632-9.patch (67 kB, Jonathan Coveney)
      12. PIG-2632-9.patch (284 kB, Jonathan Coveney)
      13. PIG-2632-10.patch (318 kB, Jonathan Coveney)
      14. PIG-2632-10.patch (319 kB, Jonathan Coveney)

            • Assignee: Jonathan Coveney
            • Votes: 1
            • Watchers: 10


              • Created: