Pig / PIG-2632

Create a SchemaTuple which generates efficient Tuples via code gen

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels: None

      Description

      This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema on the frontend, we can code-generate Tuples specialized to that Schema. In rudimentary tests, memory efficiency is 2-4x better, and the serialized form is ~15% smaller (though this depends heavily on the data). Get/set tests still need to be done, but assuming performance is on par with (or even faster than) Tuple, the memory gain is huge.
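To illustrate where the memory gain could come from, here is a minimal sketch contrasting a generic, list-backed tuple with a class written the way a generated one might look. All class and method names below are hypothetical and are not taken from the patches; the schema `(id: int, score: long, name: chararray)` is also just an assumed example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these classes are Pig's actual API.
final class SchemaTupleSketch {

    // Generic tuple: every field is stored as a boxed Object inside a list.
    static final class GenericTuple {
        private final List<Object> fields = new ArrayList<>();

        void append(Object value) { fields.add(value); }

        Object get(int index) { return fields.get(index); }

        int size() { return fields.size(); }
    }

    // What a class generated for the assumed schema
    // (id: int, score: long, name: chararray) might look like:
    // primitives are stored directly as fields, with no per-field boxing.
    static final class IntLongStringTuple {
        private int id;
        private long score;
        private String name;

        void setId(int value) { this.id = value; }

        void setScore(long value) { this.score = value; }

        void setName(String value) { this.name = value; }

        int getId() { return id; }

        long getScore() { return score; }

        String getName() { return name; }

        // Positional access, boxing only when the caller asks for an Object.
        Object get(int index) {
            switch (index) {
                case 0: return id;
                case 1: return score;
                case 2: return name;
                default: throw new IndexOutOfBoundsException("No field " + index);
            }
        }

        int size() { return 3; }
    }

    public static void main(String[] args) {
        GenericTuple generic = new GenericTuple();
        generic.append(7);      // boxed to Integer
        generic.append(42L);    // boxed to Long
        generic.append("pig");

        IntLongStringTuple typed = new IntLongStringTuple();
        typed.setId(7);
        typed.setScore(42L);
        typed.setName("pig");

        System.out.println(generic.get(1) + " " + typed.getScore());
    }
}
```

The sketch ignores null handling: primitive fields cannot hold null, so a real implementation would need some separate way to track which fields are null.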

      Need to clean up the code and add tests.

      Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDFs. The next step is to make a SchemaBag, where I think the serialization savings will be even larger.
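As a rough sketch of what "generating a SchemaTuple for a schema" could involve, the snippet below builds Java source text for a tuple class from an ordered map of field names to Java types. The `TupleSourceGenerator` class, its `generate` method, and the map-based schema representation are all assumptions made for illustration; the patches may generate and compile code quite differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator; not the mechanism used in the attached patches.
final class TupleSourceGenerator {

    // Builds Java source for a class with one typed field, getter, and setter
    // per schema column. The schema maps field name -> Java type name, and
    // its iteration order determines the field order in the output.
    static String generate(String className, Map<String, String> schema) {
        StringBuilder src = new StringBuilder();
        src.append("public final class ").append(className).append(" {\n");
        for (Map.Entry<String, String> field : schema.entrySet()) {
            String name = field.getKey();
            String type = field.getValue();
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            src.append("    private ").append(type).append(' ').append(name).append(";\n");
            src.append("    public ").append(type).append(" get").append(cap)
               .append("() { return ").append(name).append("; }\n");
            src.append("    public void set").append(cap).append('(').append(type)
               .append(" value) { this.").append(name).append(" = value; }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in declaration order.
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("id", "int");
        schema.put("score", "long");
        System.out.println(generate("Tuple_id_score", schema));
    }
}
```

The generated source would still have to be compiled and loaded before use, which this sketch leaves out.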

        Attachments

        1. PIG-2632-0.patch (55 kB, Jonathan Coveney)
        2. PIG-2632-1.patch (69 kB, Jonathan Coveney)
        3. PIG-2632-10.patch (319 kB, Jonathan Coveney)
        4. PIG-2632-10.patch (318 kB, Jonathan Coveney)
        5. PIG-2632-3.patch (100 kB, Jonathan Coveney)
        6. PIG-2632-4.patch (219 kB, Jonathan Coveney)
        7. PIG-2632-5.patch (222 kB, Jonathan Coveney)
        8. PIG-2632-6.patch (232 kB, Jonathan Coveney)
        9. PIG-2632-7.patch (236 kB, Jonathan Coveney)
        10. PIG-2632-8.patch (242 kB, Jonathan Coveney)
        11. PIG-2632-9.patch (284 kB, Jonathan Coveney)
        12. PIG-2632-9.patch (67 kB, Jonathan Coveney)
        13. schematuple benchmarking.pdf (70 kB, Jonathan Coveney)
        14. schematuple benchmarking.pptx (104 kB, Jonathan Coveney)

              People

              • Assignee: Jonathan Coveney (jcoveney)
              • Reporter: Jonathan Coveney (jcoveney)
              • Votes: 1
              • Watchers: 10
