Pig
  1. Pig
  2. PIG-2582

Store size in bytes (not mbytes) in ResourceStatistics

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available
    • Release Note:
      Committed, thanks Prashant!

      Description

      In ResourceStatistics.java we see mBytes is public, and has a public getter/setter.

      47	    public Long mBytes; // size in megabytes
      
      196	    public Long getmBytes() {
      197	        return mBytes;
      198	    }
      199	    public ResourceStatistics setmBytes(Long mBytes) {
      200	        this.mBytes = mBytes;
      201	        return this;
      202	    }
      

      Typically sizes are stored as bytes, potentially having convenience functions to return with different units.

      If mBytes can be marked private without causing woes it might be worth storing size as bytes instead.

      1. PIG-2582_2.patch
        7 kB
        Prashant Kommireddi
      2. PIG-2582_1.patch
        8 kB
        Prashant Kommireddi
      3. PIG-2582.patch
        6 kB
        Prashant Kommireddi

        Activity

        Hide
        Prashant Kommireddi added a comment -

        Thanks Bill.

        Show
        Prashant Kommireddi added a comment - Thanks Bill.
        Hide
        Bill Graham added a comment -

        +1 LGTM. I can commit after the weekend unless anyone else has comments.

        Show
        Bill Graham added a comment - +1 LGTM. I can commit after the weekend unless anyone else has comments.
        Hide
        Prashant Kommireddi added a comment -

        Thanks for keeping an eye out on this, Prasanth!

        Show
        Prashant Kommireddi added a comment - Thanks for keeping an eye out on this, Prasanth!
        Hide
        Prasanth J added a comment -

        Once this is committed.. i will fix PIG-2831 to reflect the change..

        Show
        Prasanth J added a comment - Once this is committed.. i will fix PIG-2831 to reflect the change..
        Hide
        Prashant Kommireddi added a comment -

        Making member variables private, and eliminating mBytes. Reviewers, please take a look.

        Show
        Prashant Kommireddi added a comment - Making member variables private, and eliminating mBytes. Reviewers, please take a look.
        Hide
        Prashant Kommireddi added a comment -

        If we are ok with abusing this setter for now, we need to definitely open a JIRA to completely get rid of the deprecated method in 0.12.

        Show
        Prashant Kommireddi added a comment - If we are ok with abusing this setter for now, we need to definitely open a JIRA to completely get rid of the deprecated method in 0.12.
        Hide
        Jonathan Coveney added a comment -

        Couldn't it just set realBytes = mBytes * 1024 * 1024 ? There would be no loss of precision. And if they happened to do getMBytes we just divided (in this case, we can in fact even just do shifts).

        Show
        Jonathan Coveney added a comment - Couldn't it just set realBytes = mBytes * 1024 * 1024 ? There would be no loss of precision. And if they happened to do getMBytes we just divided (in this case, we can in fact even just do shifts).
        Hide
        Prashant Kommireddi added a comment -

        What would we have the deprecated mBytes setter set the value on?

        Show
        Prashant Kommireddi added a comment - What would we have the deprecated mBytes setter set the value on?
        Hide
        Jonathan Coveney added a comment -

        As long as the instance variables are private, then how come we can't get rid of the mBytes variable?

        Show
        Jonathan Coveney added a comment - As long as the instance variables are private, then how come we can't get rid of the mBytes variable?
        Hide
        Prashant Kommireddi added a comment -

        Making instance variables private. The setter on mBytes is being made deprecated, however getter can be thought of as a convenience method going forward and the implementation changed once we get rid of "mBytes" variable in the next iteration. We cannot safely get rid of mBytes until the setter is completely taken out of this class.

        Show
        Prashant Kommireddi added a comment - Making instance variables private. The setter on mBytes is being made deprecated, however getter can be thought of as a convenience method going forward and the implementation changed once we get rid of "mBytes" variable in the next iteration. We cannot safely get rid of mBytes until the setter is completely taken out of this class.
        Hide
        Prashant Kommireddi added a comment -

        Also, there is a common theme of returning the object "return this;" on all setters. I don't think this should exist, but we should probably tackle that in the next release to make sure we are fine with the existing changes at first.

        Show
        Prashant Kommireddi added a comment - Also, there is a common theme of returning the object "return this;" on all setters. I don't think this should exist, but we should probably tackle that in the next release to make sure we are fine with the existing changes at first.
        Hide
        Jonathan Coveney added a comment -

        Given it is unstable, I say just go ahead with the plan to make it private, and have the deprecated getter/setter

        Show
        Jonathan Coveney added a comment - Given it is unstable, I say just go ahead with the plan to make it private, and have the deprecated getter/setter
        Hide
        Rohini Palaniswamy added a comment -

        Agree with Bill. It would be good to make them private. Also the getters and setters were already available and the interface is marked unstable. We can wait another week to see if anyone disagrees, else we can go ahead with a patch that makes the member variables private but retains the getter/setter for mBytes marked deprecated.

        setmBytes also needs to set the bytes. If we are making the variables private we can remove mbytes variable altogether and only retain bytes.

        Show
        Rohini Palaniswamy added a comment - Agree with Bill. It would be good to make them private. Also the getters and setters were already available and the interface is marked unstable. We can wait another week to see if anyone disagrees, else we can go ahead with a patch that makes the member variables private but retains the getter/setter for mBytes marked deprecated. setmBytes also needs to set the bytes. If we are making the variables private we can remove mbytes variable altogether and only retain bytes.
        Hide
        Bill Graham added a comment -

        I didn't notice the Unstable annotation. I'm ok changing scope if others agree.

        Show
        Bill Graham added a comment - I didn't notice the Unstable annotation. I'm ok changing scope if others agree.
        Hide
        Bill Graham added a comment -

        As much as we'd love to make these members private, we should resist the urge and keep them public for backward compatibility.

        Show
        Bill Graham added a comment - As much as we'd love to make these members private, we should resist the urge and keep them public for backward compatibility.
        Hide
        Prashant Kommireddi added a comment -

        Hey Prasanth, the reason is that we don't want to break backward compatibility in case these member variables are accessed directly and not through getters. There are no such references from within the Pig project, but I'm being wary of any users who use this from outside of it.

        The interface stability is marked Unstable on this, so I am ok if all decide its cool to change the scope of these variables

        Show
        Prashant Kommireddi added a comment - Hey Prasanth, the reason is that we don't want to break backward compatibility in case these member variables are accessed directly and not through getters. There are no such references from within the Pig project, but I'm being wary of any users who use this from outside of it. The interface stability is marked Unstable on this, so I am ok if all decide its cool to change the scope of these variables
        Hide
        Prasanth J added a comment -

        Hi Prashant

        Looks like some of the class member variables are still public. Is there any reason to leave it public?

        Show
        Prasanth J added a comment - Hi Prashant Looks like some of the class member variables are still public. Is there any reason to leave it public?
        Hide
        Jonathan Coveney added a comment -

        Thanks for handling this, ya'll. I'll +1 and commit, pending this making Travis happy.

        Show
        Jonathan Coveney added a comment - Thanks for handling this, ya'll. I'll +1 and commit, pending this making Travis happy.
        Hide
        Prashant Kommireddi added a comment -

        Hi Travis Crawford, can you please review the patch when you get a chance?

        Show
        Prashant Kommireddi added a comment - Hi Travis Crawford , can you please review the patch when you get a chance?
        Hide
        Prashant Kommireddi added a comment -

        Sure.

        "So as per the implementation logic, only if mBytes!=null and numRecords!=null it will return the size in MB else it will return whatever it contains which in my case works fine." - this works but might be confusing/inconsistent though. I will wait for Travis/others to comment on this patch and let's stay in touch on how this might affect PIG-2831.

        Show
        Prashant Kommireddi added a comment - Sure. "So as per the implementation logic, only if mBytes!=null and numRecords!=null it will return the size in MB else it will return whatever it contains which in my case works fine." - this works but might be confusing/inconsistent though. I will wait for Travis/others to comment on this patch and let's stay in touch on how this might affect PIG-2831 .
        Hide
        Prasanth J added a comment -

        I am not using setmBytes() or mBytes at all. So as per the implementation logic, only if mBytes!=null and numRecords!=null it will return the size in MB else it will return whatever it contains which in my case works fine. I also din't see any places using this. Please let me know if there is going to be any changes to the APIs, so that I will modify the patch accordingly.

        Show
        Prasanth J added a comment - I am not using setmBytes() or mBytes at all. So as per the implementation logic, only if mBytes!=null and numRecords!=null it will return the size in MB else it will return whatever it contains which in my case works fine. I also din't see any places using this. Please let me know if there is going to be any changes to the APIs, so that I will modify the patch accordingly.
        Hide
        Prashant Kommireddi added a comment -

        Hi Prasanth, I looked at PIG-2831. The current impl of getAvgRecordSize in trunk returns size in MB. That is not what you want?
        Also, it might be better to access the values from getters instead of directly accessing them. That will allow us to clean-up the class further in future. Those members should really be private.

        Show
        Prashant Kommireddi added a comment - Hi Prasanth, I looked at PIG-2831 . The current impl of getAvgRecordSize in trunk returns size in MB. That is not what you want? Also, it might be better to access the values from getters instead of directly accessing them. That will allow us to clean-up the class further in future. Those members should really be private.
        Hide
        Prasanth J added a comment -

        I am using ResourceStatistics to store intermediate stats in PIG-2831. I am not using getmBytes() method but I use getAvgRecordSize() for storing and returning average record size in bytes.

        Show
        Prasanth J added a comment - I am using ResourceStatistics to store intermediate stats in PIG-2831 . I am not using getmBytes() method but I use getAvgRecordSize() for storing and returning average record size in bytes.
        Hide
        Prashant Kommireddi added a comment -

        Should methods like getAvgRecordSize() be returning size in bytes? Its not called from within the project, but not sure if such a change would be acceptable to users.

        Show
        Prashant Kommireddi added a comment - Should methods like getAvgRecordSize() be returning size in bytes? Its not called from within the project, but not sure if such a change would be acceptable to users.
        Hide
        Travis Crawford added a comment -

        Sounds good!

        Show
        Travis Crawford added a comment - Sounds good!
        Hide
        Prashant Kommireddi added a comment -

        I agree on storing size in bytes, just wanted to make sure I understand all the reasons you had in mind. It wouldn't be a huge change to make it happen, but changing the scope might be tricky if someone is using it outside of the Pig project. What do you think about marking the setter "setmBytes(Long)" deprecated and creating a new setter for bytes? To start with, we can atleast have Pig refer to byte-based methods.

        Show
        Prashant Kommireddi added a comment - I agree on storing size in bytes, just wanted to make sure I understand all the reasons you had in mind. It wouldn't be a huge change to make it happen, but changing the scope might be tricky if someone is using it outside of the Pig project. What do you think about marking the setter "setmBytes(Long)" deprecated and creating a new setter for bytes? To start with, we can atleast have Pig refer to byte-based methods.
        Hide
        Travis Crawford added a comment -

        Hey Prashant Kommireddi, this came up while working on PIG-2573. Basically it felt a bit janky to round the size to MB rather than just keep the size in bytes. Feel free to close if the consensus is to keep it as-is.

        Show
        Travis Crawford added a comment - Hey Prashant Kommireddi , this came up while working on PIG-2573 . Basically it felt a bit janky to round the size to MB rather than just keep the size in bytes. Feel free to close if the consensus is to keep it as-is.
        Hide
        Prashant Kommireddi added a comment -

        Hi Travis, trying to understand the goal here. Seems like ResourceStatistics has a method getSizeInBytes(), what would be the advantage of storing size in bytes and instead have getmBytes() do the reverse calculation? Is it for clarity and conforming with the norm?

        Show
        Prashant Kommireddi added a comment - Hi Travis, trying to understand the goal here. Seems like ResourceStatistics has a method getSizeInBytes(), what would be the advantage of storing size in bytes and instead have getmBytes() do the reverse calculation? Is it for clarity and conforming with the norm?

          People

          • Assignee:
            Prashant Kommireddi
            Reporter:
            Travis Crawford
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development