Pig
  1. Pig
  2. PIG-2764

Add a biginteger and bigdecimal type to pig

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None

      Description

      I think it would be useful for applications where precision is more important than speed to have the option of using java's bigdecimal and biginteger types natively.

      1. PIG-2764-0.patch
        172 kB
        Jonathan Coveney
      2. PIG-2764-1.patch
        169 kB
        Jonathan Coveney
      3. fixedpoint.patch
        45 kB
        Alan Gates
      4. PIG-2764-2_nows.patch
        176 kB
        Jonathan Coveney
      5. PIG-2764-2.patch
        369 kB
        Jonathan Coveney
      6. PIG-2764-3.patch
        376 kB
        Jonathan Coveney
      7. PIG-2764-4.patch
        405 kB
        Jonathan Coveney
      8. PIG-2764-5.patch
        404 kB
        Jonathan Coveney
      9. PIG-2764-6.patch
        359 kB
        Jonathan Coveney
      10. PIG-2764-hotfix-0.patch
        1 kB
        Jonathan Coveney

        Issue Links

          Activity

          Hide
          Rohini Palaniswamy added a comment -

          Actually I am going to backport this feature to pig-0.11.

          I am assuming you are doing this for a internal version of 0.11 that you maintain in git. Since this is a feature, it cannot be ported to apache pig 0.11 branch. Only critical and major bugs can go into the branch.

          Show
          Rohini Palaniswamy added a comment - Actually I am going to backport this feature to pig-0.11. I am assuming you are doing this for a internal version of 0.11 that you maintain in git. Since this is a feature, it cannot be ported to apache pig 0.11 branch. Only critical and major bugs can go into the branch.
          Hide
          fang fang chen added a comment -

          Hi Cheolsoo,

          Many thanks for your information~
          Actually I am going to backport this feature to pig-0.11.

          Thanks

          Show
          fang fang chen added a comment - Hi Cheolsoo, Many thanks for your information~ Actually I am going to backport this feature to pig-0.11. Thanks
          Hide
          Cheolsoo Park added a comment -

          Hi Fangfang,

          1. This is a new feature, so there is no fixed issue.
          2. The patch is committed to trunk (0.12-SNAPSHOT).
          3. I recommend you get the diff on git (or svn) log since the attached file is not exactly the same as what's committed:
            commit 95d8342c020ccbbc7eedced55fc639a5f55ef809
            Author: Jonathan Coveney <jcoveney@apache.org>
            Date:   Mon Jan 28 01:28:52 2013 +0000
            
                Fix bad, unused import brought in with PIG-2764 (jcoveney)
                
                git-svn-id: https://svn.apache.org/repos/asf/pig/trunk@1439220 13f79535-47bb-0310-9956-ffa450edef68
            
            commit 20c923fa8f9861408c43c14f4a742a0f15e02d95
            Author: Jonathan Coveney <jcoveney@apache.org>
            Date:   Fri Jan 25 19:57:56 2013 +0000
            
                PIG-2764: Add a biginteger and bigdecimal type to pig (jcoveney)
                
                git-svn-id: https://svn.apache.org/repos/asf/pig/trunk@1438674 13f79535-47bb-0310-9956-ffa450edef68
            

          In addition, if you're going to backport this feature to a previous version of Pig, you should backport the following patches too:

          • PIG-3156: TestSchemaTuple fails in trunk
          • PIG-3155: TestTypeCheckingValidatorNewLP.testSortWithInnerPlan3 fails in trunk
          • PIG-3154: TestPackage.testOperator fails in trunk
          • PIG-3218: Add support for biginteger/bigdecimal type in Groovy UDF
          Show
          Cheolsoo Park added a comment - Hi Fangfang, This is a new feature, so there is no fixed issue. The patch is committed to trunk (0.12-SNAPSHOT). I recommend you get the diff on git (or svn) log since the attached file is not exactly the same as what's committed: commit 95d8342c020ccbbc7eedced55fc639a5f55ef809 Author: Jonathan Coveney <jcoveney@apache.org> Date: Mon Jan 28 01:28:52 2013 +0000 Fix bad, unused import brought in with PIG-2764 (jcoveney) git-svn-id: https: //svn.apache.org/repos/asf/pig/trunk@1439220 13f79535-47bb-0310-9956-ffa450edef68 commit 20c923fa8f9861408c43c14f4a742a0f15e02d95 Author: Jonathan Coveney <jcoveney@apache.org> Date: Fri Jan 25 19:57:56 2013 +0000 PIG-2764: Add a biginteger and bigdecimal type to pig (jcoveney) git-svn-id: https: //svn.apache.org/repos/asf/pig/trunk@1438674 13f79535-47bb-0310-9956-ffa450edef68 In addition, if you're going to backport this feature to a previous version of Pig, you should backport the following patches too: PIG-3156 : TestSchemaTuple fails in trunk PIG-3155 : TestTypeCheckingValidatorNewLP.testSortWithInnerPlan3 fails in trunk PIG-3154 : TestPackage.testOperator fails in trunk PIG-3218 : Add support for biginteger/bigdecimal type in Groovy UDF
          Hide
          fang fang chen added a comment -

          Hi,

          I have some questions, please help to answer:
          1. which version is this issue found in?
          2. And which version pig is the patched based on?
          3. which file in attachment is the final available patch?

          Thanks

          Show
          fang fang chen added a comment - Hi, I have some questions, please help to answer: 1. which version is this issue found in? 2. And which version pig is the patched based on? 3. which file in attachment is the final available patch? Thanks
          Hide
          Cheolsoo Park added a comment -

          +1 for PIG-2764-hotfix-0.patch. Thanks for the quick fix!

          Show
          Cheolsoo Park added a comment - +1 for PIG-2764 -hotfix-0.patch. Thanks for the quick fix!
          Hide
          Jonathan Coveney added a comment -

          Attaching the fix. I have no clue how it got in there, but it was only referenced in a javadoc, so it was easy to remove. I'm going to proactively commit this (given it is a hot fix to trunk, and so minor), but please take a look to make sure there are no remaining issues.

          Thanks for bringing this to my attention, Cheolsoo.

          Show
          Jonathan Coveney added a comment - Attaching the fix. I have no clue how it got in there, but it was only referenced in a javadoc, so it was easy to remove. I'm going to proactively commit this (given it is a hot fix to trunk, and so minor), but please take a look to make sure there are no remaining issues. Thanks for bringing this to my attention, Cheolsoo.
          Hide
          Cheolsoo Park added a comment -

          Jonathan Coveney, trunk doesn't compile. Would you mind fixing it?

              [javac] Compiling 799 source files to /home/cheolsoo/workspace/pig-git/build/classes
              [javac] /home/cheolsoo/workspace/pig-git/src/org/apache/pig/newplan/logical/visitor/TypeCheckingExpVisitor.java:78: package sun.tools.java does not exist
              [javac] import sun.tools.java.BinaryExceptionHandler;
              [javac]                      ^
          

          Looks like your IDE auto-added some import statement.

          Show
          Cheolsoo Park added a comment - Jonathan Coveney , trunk doesn't compile. Would you mind fixing it? [javac] Compiling 799 source files to /home/cheolsoo/workspace/pig-git/build/classes [javac] /home/cheolsoo/workspace/pig-git/src/org/apache/pig/newplan/logical/visitor/TypeCheckingExpVisitor.java:78: package sun.tools.java does not exist [javac] import sun.tools.java.BinaryExceptionHandler; [javac] ^ Looks like your IDE auto-added some import statement.
          Hide
          Russell Jurney added a comment -

          Reviewed, albeit late. There was a typo, and some lint questions.

          Show
          Russell Jurney added a comment - Reviewed, albeit late. There was a typo, and some lint questions.
          Hide
          Jonathan Coveney added a comment -

          It's in! Thank you all for the eyes.

          Show
          Jonathan Coveney added a comment - It's in! Thank you all for the eyes.
          Hide
          Jonathan Coveney added a comment -

          This is the patch I applied to trunk (there were some very minor merge conflicts, is all)

          Show
          Jonathan Coveney added a comment - This is the patch I applied to trunk (there were some very minor merge conflicts, is all)
          Hide
          Alan Gates added a comment -

          Sorry, didn't realize you were waiting for me. +1, since you addressed all my comments. By the way, thanks for doing this. I think it's huge for Pig.

          Show
          Alan Gates added a comment - Sorry, didn't realize you were waiting for me. +1, since you addressed all my comments. By the way, thanks for doing this. I think it's huge for Pig.
          Hide
          Jonathan Coveney added a comment -

          Alan, would love your final eyes on this.

          Show
          Jonathan Coveney added a comment - Alan, would love your final eyes on this.
          Hide
          Jonathan Coveney added a comment -

          Updated RB, and attaching here.

          Show
          Jonathan Coveney added a comment - Updated RB, and attaching here.
          Hide
          Jonathan Coveney added a comment -

          PS thanks to Alan, Matthias, and Cheolsoo for the eyes. Definitely let me know if you have anything you'd like to me look at. This was a big patch.

          Show
          Jonathan Coveney added a comment - PS thanks to Alan, Matthias, and Cheolsoo for the eyes. Definitely let me know if you have anything you'd like to me look at. This was a big patch.
          Hide
          Jonathan Coveney added a comment -

          Good comments all around. I just responded and made the tickets, and will update the latest patch shortly once test-commit runs (changes were fairly minor). The comments are mainly discussions of how TextLoasder works etc.

          Show
          Jonathan Coveney added a comment - Good comments all around. I just responded and made the tickets, and will update the latest patch shortly once test-commit runs (changes were fairly minor). The comments are mainly discussions of how TextLoasder works etc.
          Hide
          Alan Gates added a comment -

          Posted my comments on RB. Except for two function signatures that I think are wrong I'm +1. As noted on RB we need a separate JIRA to track e2e tests. We also need a JIRA to update the docs (forgot to put that on RB). Both should be blockers for 0.12. You can assigned the e2e tests one to me.

          Show
          Alan Gates added a comment - Posted my comments on RB. Except for two function signatures that I think are wrong I'm +1. As noted on RB we need a separate JIRA to track e2e tests. We also need a JIRA to update the docs (forgot to put that on RB). Both should be blockers for 0.12. You can assigned the e2e tests one to me.
          Hide
          Jonathan Coveney added a comment -

          I updated this based on RB comments, will update RB shortly.

          Show
          Jonathan Coveney added a comment - I updated this based on RB comments, will update RB shortly.
          Hide
          Jonathan Coveney added a comment -

          Pulled master, put patch here and in RB: https://reviews.apache.org/r/9012/

          Show
          Jonathan Coveney added a comment - Pulled master, put patch here and in RB: https://reviews.apache.org/r/9012/
          Hide
          Alan Gates added a comment -

          I can review it. I'll try to get to it tomorrow (1/18).

          Show
          Alan Gates added a comment - I can review it. I'll try to get to it tomorrow (1/18).
          Hide
          Jonathan Coveney added a comment -

          Would love a review. I understand patches like this can be monsters (I can upload to rb), but is anyone interesting in working with me to make sure it is all good?

          Show
          Jonathan Coveney added a comment - Would love a review. I understand patches like this can be monsters (I can upload to rb), but is anyone interesting in working with me to make sure it is all good?
          Hide
          Alan Gates added a comment -

          I'm +1 with going with BigNumber. There's no reason to spin our own implementation for only a small speed up. That only makes sense if you can significantly improve it.

          Show
          Alan Gates added a comment - I'm +1 with going with BigNumber. There's no reason to spin our own implementation for only a small speed up. That only makes sense if you can significantly improve it.
          Hide
          Jonathan Coveney added a comment -

          Updated! And I fixed a couple of errors. Merging in the DateTime patch took a little tlc, but it should be fully functional now. I don't know that I love that the DateTime patch let's you cast ints to DateTime and so on, but I followed that convention.

          I would love some eyes on this. It really needs more tests...I can tackle that eventually, but wouldn't mind some guidance one what is critical to test.

          Show
          Jonathan Coveney added a comment - Updated! And I fixed a couple of errors. Merging in the DateTime patch took a little tlc, but it should be fully functional now. I don't know that I love that the DateTime patch let's you cast ints to DateTime and so on, but I followed that convention. I would love some eyes on this. It really needs more tests...I can tackle that eventually, but wouldn't mind some guidance one what is critical to test.
          Hide
          Olga Natkovich added a comment -

          I agree with using standard type.

          Show
          Olga Natkovich added a comment - I agree with using standard type.
          Hide
          Mathias Herberts added a comment -

          Support for BigDecimal would be nice, and I think there is no need to have a separate BigInteger type, it suffice to be smart in the way casts are done.

          Show
          Mathias Herberts added a comment - Support for BigDecimal would be nice, and I think there is no need to have a separate BigInteger type, it suffice to be smart in the way casts are done.
          Hide
          Jonathan Coveney added a comment -

          I would happy to make a cut against trunk, since this patch is probably out of date. Would love to have people weigh in on criteria for inclusion. IMHO (re: Alan's previous comments) I would rather go with the well known (even if a little slower) BigNumber types...rolling our own will mean we'll continue adding features until we converge on a crappier version of them, imho.

          Show
          Jonathan Coveney added a comment - I would happy to make a cut against trunk, since this patch is probably out of date. Would love to have people weigh in on criteria for inclusion. IMHO (re: Alan's previous comments) I would rather go with the well known (even if a little slower) BigNumber types...rolling our own will mean we'll continue adding features until we converge on a crappier version of them, imho.
          Hide
          Olga Natkovich added a comment -

          I think having support for BigInteger would be very helpful. We have asks within Yahoo for it.

          Show
          Olga Natkovich added a comment - I think having support for BigInteger would be very helpful. We have asks within Yahoo for it.
          Hide
          Jonathan Coveney added a comment -

          I would have no problem tidying this up for inclusion, but all of the comments are pretty vague. Would love some concrete feedback on whether or not this deserves to be in Pig, and how it should be or could be different.

          I think that more broadly speaking it would be huge to make type support a lot easier to add, but that's a separate JIRA.

          Show
          Jonathan Coveney added a comment - I would have no problem tidying this up for inclusion, but all of the comments are pretty vague. Would love some concrete feedback on whether or not this deserves to be in Pig, and how it should be or could be different. I think that more broadly speaking it would be huge to make type support a lot easier to add, but that's a separate JIRA.
          Hide
          Olga Natkovich added a comment -

          Is anybody working on this or planning to in the near future?

          Show
          Olga Natkovich added a comment - Is anybody working on this or planning to in the near future?
          Hide
          Patrick Salami added a comment -

          +1

          Show
          Patrick Salami added a comment - +1
          Hide
          Jeremy Hanna added a comment -

          It would be nice to have an expanded set of basic java types so that when people use alternate data stores, they can more easily convert data types to native pig types. For example, Cassandra has an BigInteger based data type validator and it would streamline conversion to pig types to have something native.

          Show
          Jeremy Hanna added a comment - It would be nice to have an expanded set of basic java types so that when people use alternate data stores, they can more easily convert data types to native pig types. For example, Cassandra has an BigInteger based data type validator and it would streamline conversion to pig types to have something native.
          Hide
          Alan Gates added a comment -

          I was hoping we could build something faster since BigDecimal is dreadfully slow. But at the moment what I put in there only beats it by about 20%, so not worth it. We can do it with BigDecimal for now, but I may keep playing with tuning this stuff to see if I can get it faster.

          Show
          Alan Gates added a comment - I was hoping we could build something faster since BigDecimal is dreadfully slow. But at the moment what I put in there only beats it by about 20%, so not worth it. We can do it with BigDecimal for now, but I may keep playing with tuning this stuff to see if I can get it faster.
          Hide
          Jonathan Coveney added a comment -

          Alan, I'm curious why you think managing our own fixed point implementation is better than just leveraging BigDecimal and BigInteger, and the arbitrary precision they provide?

          I mean, I can benchmark things to see how radical the performance difference is, but I'm just curious.

          Show
          Jonathan Coveney added a comment - Alan, I'm curious why you think managing our own fixed point implementation is better than just leveraging BigDecimal and BigInteger, and the arbitrary precision they provide? I mean, I can benchmark things to see how radical the performance difference is, but I'm just curious.
          Hide
          Alan Gates added a comment -

          First pass at a class to implement fixed point types.

          Show
          Alan Gates added a comment - First pass at a class to implement fixed point types.
          Hide
          Alan Gates added a comment -

          I don't know of any good libraries. And everything Google turned up looked old and unsupported. I took a quick crack at it (just the fixed point part, not integrating with Pig's data type system). I'll attach it as a separate patch.

          Show
          Alan Gates added a comment - I don't know of any good libraries. And everything Google turned up looked old and unsupported. I took a quick crack at it (just the fixed point part, not integrating with Pig's data type system). I'll attach it as a separate patch.
          Hide
          Jonathan Coveney added a comment -

          Alan: it seemed like something that would be good to support, since I know there are financial users who could potentially leverage something like this.

          Agreed that a fast, fixed point type would be excellent as well. Know any good libraries?

          This was all done rogue because I wanted to know how hard it is to implement a type (answer: tedious and lots of corner cases, but not hard per se), and I thought it would be useful to have.

          Show
          Jonathan Coveney added a comment - Alan: it seemed like something that would be good to support, since I know there are financial users who could potentially leverage something like this. Agreed that a fast, fixed point type would be excellent as well. Know any good libraries? This was all done rogue because I wanted to know how hard it is to implement a type (answer: tedious and lots of corner cases, but not hard per se), and I thought it would be useful to have.
          Hide
          Alan Gates added a comment -

          Do you really need biginteger and bigdecimal, or are you trying to get fixed point processing cheaply? Pig definitely needs a fixed point type.

          Show
          Alan Gates added a comment - Do you really need biginteger and bigdecimal, or are you trying to get fixed point processing cheaply? Pig definitely needs a fixed point type.
          Hide
          Jonathan Coveney added a comment -

          Fixed a couple of small things. Also, put up a review request:
          https://reviews.apache.org/r/5468/

          Based on this and the datetime stuff, we should put together a primer on how to add a type (and I have some refactoring in mind that could make it much better).

          Show
          Jonathan Coveney added a comment - Fixed a couple of small things. Also, put up a review request: https://reviews.apache.org/r/5468/ Based on this and the datetime stuff, we should put together a primer on how to add a type (and I have some refactoring in mind that could make it much better).
          Hide
          Jonathan Coveney added a comment -

          So here is a first pass at this.[1] It works in the command line, but I'm not sure how to really test it. Would love to have eyes on it. I will upload to revieboard later.

          There are a couple of choices that are suboptimal, but aren't really related to the core of the patch, so they can be fixed later (mainly just that I serialize it in a way that can be optimized...but given these aren't for performance but for accuracy, I'm not as worried).

          [1] I needed a short respite from SchemaTuple work And I also wanted to see how difficult adding types is[2]

          [2] difficult

          Show
          Jonathan Coveney added a comment - So here is a first pass at this. [1] It works in the command line, but I'm not sure how to really test it. Would love to have eyes on it. I will upload to revieboard later. There are a couple of choices that are suboptimal, but aren't really related to the core of the patch, so they can be fixed later (mainly just that I serialize it in a way that can be optimized...but given these aren't for performance but for accuracy, I'm not as worried). [1] I needed a short respite from SchemaTuple work And I also wanted to see how difficult adding types is [2] [2] difficult

            People

            • Assignee:
              Jonathan Coveney
              Reporter:
              Jonathan Coveney
            • Votes:
              2 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development