Commons Math / MATH-742

Please make PolynomialSplineFunction Serializable

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.2
    • Fix Version/s: None
    • Labels: None

      Description

      PolynomialSplineFunction is not Serializable, while the very similar PolynomialFunction class in the same package is. All that needs to be done is to add the import:

      import java.io.Serializable;

      and change this:

      public class PolynomialSplineFunction implements DifferentiableUnivariateRealFunction

      to this:

      public class PolynomialSplineFunction implements DifferentiableUnivariateRealFunction, Serializable

      I made exactly that modification to a local copy and it serialized successfully. Before the change, I got serialization errors.
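The before-and-after behaviour described here can be reproduced without Commons Math. The following is a minimal sketch, not the library code: `PlainCurve`, `SerializableCurve` and `Result` are hypothetical stand-ins, and the only difference between the two curve classes is the `implements Serializable` marker.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a class that does not implement Serializable.
class PlainCurve {
    final double[] knots;
    PlainCurve(double[] knots) { this.knots = knots.clone(); }
}

// The same stand-in with the marker interface added.
class SerializableCurve implements Serializable {
    private static final long serialVersionUID = 1L;
    final double[] knots;
    SerializableCurve(double[] knots) { this.knots = knots.clone(); }
}

// A user-defined, serializable result object that carries a curve as a field.
class Result implements Serializable {
    private static final long serialVersionUID = 1L;
    final Object curve;
    Result(Object curve) { this.curve = curve; }
}

class MarkerInterfaceDemo {
    /** Returns true if default Java serialization can write the object. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            // Thrown when the object graph contains a non-serializable instance.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        double[] knots = {0.0, 1.0, 2.0};
        System.out.println(canSerialize(new Result(new PlainCurve(knots))));        // false
        System.out.println(canSerialize(new Result(new SerializableCurve(knots)))); // true
    }
}
```

Writing fails for the first `Result` only because of the field it holds, which is the situation a user-defined class containing a non-serializable function object runs into.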

      Thanks.

        Activity

        Leandro Ariel Pezzente added a comment -

        Does anyone have any thoughts about this issue?

        Luc Maisonobe added a comment -

        Sounds good to me.
        Note that once a class becomes serializable, it should have a serialization ID. We recently decided to use the date of the source file change as the ID, in short ISO-8601 format. So if the change were made today, the ID would be 20120210.
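As an illustration of that convention, here is a minimal sketch with a hypothetical class, `DatedExample`, that is not part of Commons Math. The date is written as a `long` literal, and `ObjectStreamClass.lookup` is used only to read back the ID that the serialization runtime will use.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class used only to illustrate the ID convention.
class DatedExample implements Serializable {
    // Date of the source change in short ISO-8601 form (yyyyMMdd).
    private static final long serialVersionUID = 20120210L;

    private final double value;

    DatedExample(double value) { this.value = value; }

    double getValue() { return value; }
}

class SerialIdDemo {
    /** Returns the serialization ID the runtime associates with the class. */
    static long declaredId(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(declaredId(DatedExample.class)); // prints 20120210
    }
}
```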

        Leandro Ariel Pezzente added a comment -

        OK. There, can you check whether the changes have been made properly?

        Gilles added a comment -

        In principle, I'm against applying the Serializable pseudo-interface.

        In this case, as with most of what is in CM, we deal with an object whose main purpose is to compute something, not to store data. I know that the border between data and computation is not clear-cut, especially in the OO paradigm...

        But the main rationale is to not induce the users to think that it is good practice to store data using the default serialization system. If I'm not mistaken, this feature was meant to ease some of the plumbing needed for distributed applications.
        "Serializable" is certainly not to be taken as a substitute for a stable API aimed at safe "long-term" storage. It is for this reason that I think that the default should be to not implement "Serializable" unless there is a good clear case (i.e. not one that displays a bad practice) that needs it. E.g. there was the convincing case for the exception infrastructure (cf. package "exception.util") to be "Serializable": Whenever some remote application using CM needed to report a failure and could not unless the given exception could be serialized.

        Luc Maisonobe added a comment -

        Instead of a full file, we often prefer to get patches in the form of a diff. It simplifies checking the changes and applying them.

        The problem with having non-serializable classes is that this completely prevents them from being used as fields in user-defined classes which may need to be serializable. Serialization is useful in many situations which are not related to long-term storage and we should not prevent it. So as long as there is no identified problem to have serializable classes (such as very big data structures or singletons), then serializable could be added liberally.

        Neil Roeth added a comment -

        Luc, that is precisely my situation. I have a grid computing application where my user-defined objects are returned from the grid, and they contain a PolynomialSplineFunction that is a critical part of the calculated result. Gilles, this is "the plumbing needed for distributed applications" in my case.

        Gilles added a comment -

        > this completely prevents them from being used as fields in user-defined classes which may need to be serializable

        The issue (and my question) is: Is this field needed in the serialization? If not, the user can add the "transient" keyword to the instance variable declaration.
        If it is needed, then again, is "Serializable" the correct way? I know that it is the quick-and-dirty solution.
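For reference, this is what the "transient" option looks like in practice. It is a minimal sketch with hypothetical classes (`FittedCurve`, `GridResult`), not code from Commons Math: the transient field is skipped when the object is written, so it comes back as `null` after a round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a non-serializable function object.
class FittedCurve {
    final double slope;
    FittedCurve(double slope) { this.slope = slope; }
}

// A user-defined result that skips the curve during serialization.
class GridResult implements Serializable {
    private static final long serialVersionUID = 1L;
    final double value;
    transient FittedCurve curve; // not written to the stream

    GridResult(double value, FittedCurve curve) {
        this.value = value;
        this.curve = curve;
    }
}

class TransientDemo {
    /** Serializes to a byte array and reads the object back. */
    static GridResult roundTrip(GridResult original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (GridResult) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        GridResult copy = roundTrip(new GridResult(42.0, new FittedCurve(0.5)));
        System.out.println(copy.value);         // the plain field survives
        System.out.println(copy.curve == null); // the transient field does not
    }
}
```

This only helps when the field can be dropped or recomputed on the receiving side. If the field is part of the result, it has to be transferred some other way.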

        Finally, if we don't care about advertising good (or bad) practice in this area, then let's globally add "Serializable" to every class...
        Is there a CheckStyle rule to detect such a requirement? Can anyone come up with a script that would automatically do the job?

        Neil Roeth added a comment -

        Gilles,

        > The issue (and my question) is: Is this field needed in the serialization?

        Yes.

        > If it is needed, then again, is "Serializable" the correct way?

        As you said, "[Serializable] was meant to ease some of the plumbing needed for distributed applications", so since my application is distributed and needs this plumbing, the answer is again yes.

        I raised this issue to add Serializable to this one particular class; if you want to discuss globally adding Serializable to every class, that's a different issue. Could you please open a new issue to discuss that topic? Thanks.

        Luc Maisonobe added a comment -

        After a discussion on the developers list, the consensus reached was not to make this kind of object Serializable. See the mailing list archives for the complete thread.

        There are several workarounds you can use to solve your problem.

        You could use a custom derived class that implements Serializable. The only code you would have to write is the constructors, which would call the constructors of the base class.

        An alternative solution would be to keep the existing non-serializable class as is but implement serialization by custom code on the application level. This would work of course only if you have access to the serialization framework code.
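One caveat with the derived-class workaround is worth checking before relying on it. Under default Java serialization, the first non-serializable superclass must provide an accessible no-argument constructor; otherwise the object can be written but not read back. The sketch below assumes a base class whose only constructor takes arguments. `BaseSpline` and `NaiveSerialSpline` are hypothetical stand-ins, not the Commons Math classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical non-serializable base class with no no-argument constructor.
class BaseSpline {
    private final double[] knots;
    BaseSpline(double[] knots) { this.knots = knots.clone(); }
    double[] getKnots() { return knots.clone(); }
}

// Derived class: only a constructor plus the marker interface.
class NaiveSerialSpline extends BaseSpline implements Serializable {
    private static final long serialVersionUID = 1L;
    NaiveSerialSpline(double[] knots) { super(knots); }
}

class DerivedClassDemo {
    /** Writes the object, reads it back, and reports what happened. */
    static String tryRoundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o); // succeeds: the base class state is simply skipped
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();    // fails: no constructor to initialize BaseSpline
            }
            return "ok";
        } catch (InvalidClassException e) {
            return "read failed: InvalidClassException";
        } catch (IOException | ClassNotFoundException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRoundTrip(new NaiveSerialSpline(new double[] {0.0, 1.0})));
    }
}
```

Even where a no-argument constructor exists, the fields of the non-serializable base class are not written by default, so the derived class would still have to save and restore that state itself.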

        Neil Roeth added a comment -

        I'm kind of surprised at this result. One developer out of three objected, and that's a "consensus"? That developer basically said this class should only be Serializable if X, Y and Z were true. I responded that yes, indeed, it was precisely X, Y and Z that were true and in fact were what led to my request. I'd have thought that, all objections having been answered, it would have been a done deal then. Particularly since PolynomialFunction, a class in the same package, is already Serializable. Instead, "Won't Fix"? What objections were raised that were not answered?

        Luc, the workarounds you propose are poorer practice than using Serializable exactly as it was intended. The workarounds both boil down to "implement serialization of PolynomialSplineFunction through home-grown custom code instead of simply adding the Java standard 'implements Serializable'". That is not a better solution, so I am going to keep my locally customized version of the class and hope you someday change your mind.

        Gilles added a comment -

        The following is what you need to serialize a PolynomialSplineFunction object:

        import java.io.Serializable;
        import org.apache.commons.math3.analysis.polynomials.PolynomialSplineFunction;
        import org.apache.commons.math3.analysis.polynomials.PolynomialFunction;

        // Serializable subclass that uses the "serialization proxy" idiom.
        public class SerialPolynomialSpline extends PolynomialSplineFunction
            implements Serializable {

            public SerialPolynomialSpline(double[] knots,
                                          PolynomialFunction[] coeff) {
                super(knots, coeff);
            }

            // Called by the serialization mechanism: the proxy is written
            // to the stream in place of this object.
            private Object writeReplace() {
                return new SerializationProxy(this);
            }

            // Holds only the data needed to rebuild the spline.
            private static class SerializationProxy
                implements Serializable {
                final double[] knots;
                final double[][] coefficients;

                public SerializationProxy() {
                    knots = null;
                    coefficients = null;
                }
                public SerializationProxy(SerialPolynomialSpline spline) {
                    knots = spline.getKnots();

                    // Store each polynomial as its plain coefficient array.
                    final PolynomialFunction[] p = spline.getPolynomials();
                    coefficients = new double[p.length][];
                    for (int i = 0; i < p.length; i++) {
                        coefficients[i] = p[i].getCoefficients();
                    }
                }

                // Called after the proxy has been read: the spline is rebuilt
                // through its constructor, so the precondition checks run again.
                Object readResolve() {
                    final PolynomialFunction[] p = new PolynomialFunction[coefficients.length];
                    for (int i = 0; i < p.length; i++) {
                        p[i] = new PolynomialFunction(coefficients[i]);
                    }
                    return new SerialPolynomialSpline(knots, p);
                }
            }
        }
        

        If you only need it for PolynomialSplineFunction, you'll write that once, you'll test it (or not) and off you go.
        In the case of a library, consistency is an important quality; thus, we would need to write that for all the classes, and test them all because by implementing "Serializable", we advertise that the class can be used robustly in any application that would make use of that feature. We did not make that promise, and one of the developers indeed pointed out that supporting "Serializable" is not trivial in terms of maintenance.

        As a matter of fact, your issue raised the question of the CM policy with respect to "Serializable". I said that it could be that we implement "Serializable" for everything, in the right way. Dismissing that point as unrelated to your issue was not very constructive.
        However, if you want to go in that direction, you are welcome to contribute.
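The proxy idiom can be exercised end to end without the library on the classpath. The sketch below assumes a hypothetical base class, `KnotCurve`, that validates its input in the constructor and is not Serializable. `SerialKnotCurve` and its nested `State` class follow the same `writeReplace`/`readResolve` pattern.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical non-serializable base class with a validating constructor.
class KnotCurve {
    private final double[] knots;
    KnotCurve(double[] knots) {
        if (knots == null || knots.length < 2) {
            throw new IllegalArgumentException("at least two knots required");
        }
        this.knots = knots.clone();
    }
    double[] getKnots() { return knots.clone(); }
}

class SerialKnotCurve extends KnotCurve implements Serializable {
    private static final long serialVersionUID = 1L;

    SerialKnotCurve(double[] knots) { super(knots); }

    // The proxy is written to the stream instead of this object.
    private Object writeReplace() { return new State(getKnots()); }

    private static class State implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double[] knots;

        State(double[] knots) { this.knots = knots; }

        // Rebuild through the constructor so that its checks run again.
        private Object readResolve() { return new SerialKnotCurve(knots); }
    }
}

class ProxyRoundTripDemo {
    /** Serializes to a byte array and reads the object back. */
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(new SerialKnotCurve(new double[] {0.0, 1.0, 2.0}));
        System.out.println(copy instanceof SerialKnotCurve);
        System.out.println(Arrays.toString(((SerialKnotCurve) copy).getKnots()));
    }
}
```

Because only the proxy is ever read from the stream, the base class needs no no-argument constructor, and a stream with invalid data is rejected by the same constructor checks as any other caller.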

        Leandro Ariel Pezzente added a comment -

        I wonder if there is any way to take your example and turn it into a generic class. That way you could create a Serializable wrapper for any proper subclass object, without having to duplicate code for every class you want to serialize.

        Neil Roeth added a comment -

        Gilles,

        In many cases, the default serialization mechanism is perfectly adequate, as in this case where no one, not even the library maintainers, actually needs to do anything nearly as complex as this code you supplied. Is there any possible problem that adding "implements Serializable" in this one class could cause? The only possible problem you raised is that someone, somewhere, might possibly save this class as long term storage. How about you make this one class Serializable, and if you get a bug report that it doesn't work when someone uses it for long term storage, you just tell that person "Don't do that"? Meanwhile, the rest of us can go ahead and use the Serializable class properly, in exactly the circumstances that you stated serialization was for, i.e., passing data around in distributed processing.

        There is no reason to saddle my simple, isolated issue with the burden of proving that the Commons Math team must commit to making the whole library Serializable. It doesn't have to commit to that. It is perfectly fine to make only part of the library Serializable. The last time I checked, only part of the Java API was Serializable. Just add "implements Serializable" to this class. Then, put a big, bold statement on the web page that says, "We made some parts of Commons Math Serializable where it was trivial to do so, but we do not plan at this time to tackle the parts that are not trivial. Patches to do the latter are welcome, as long as they do it IN THE RIGHT WAY." Then declare success and be happy that you made the library more useful.

        Gilles added a comment -

        > It is perfectly fine to make only part of the library Serializable.

        And the policy must provide a clear decision-making procedure to select between the part that will be "Serializable" and the one that won't.
        We are already in trouble, because of this lack of policy, for several classes for which "Serializable" is a supposedly innocuous feature (e.g. fields that should be transient cannot be because no explicit serialization is implemented).
        In the Commons project, there is a commitment that minor releases must be fully compatible; that implies that relying on default serialization would prevent any change to the internal structure of a class.
        Maybe you are not aware of the constraints imposed by "Serializable"; maybe you don't care because in your use case you'll never be confronted with the problem of a wrong serialized form. But another user's use case might bring him here complaining about our inconsistent support of "Serializable". Would he be less right than you?
        It is always trivial to add "implements Serializable" to a class definition. But it is not trivial to do the implementation in the right way; just adding "implements Serializable" is never the right way. Again, it could be good enough for some purpose, but the wish for CM to be an example of good Java programming is not compatible with statements such as "We know it's sloppy, so don't use it whenever you need something that works...".
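The point about internal structure can be made concrete: with default serialization, every non-static, non-transient field becomes part of the serialized form, private or not. The sketch below uses a hypothetical class, `CachedPolynomial`, and the standard `ObjectStreamClass` API to list the fields that would be written.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class relying on default serialization.
class CachedPolynomial implements Serializable {
    private static final long serialVersionUID = 1L;

    private final double[] coefficients;
    private double cachedValueAtZero;      // implementation detail, still serialized
    private transient int evaluationCount; // excluded from the serialized form

    CachedPolynomial(double[] coefficients) {
        this.coefficients = coefficients.clone();
        this.cachedValueAtZero = coefficients.length > 0 ? coefficients[0] : 0.0;
    }
}

class SerializedFormDemo {
    /** Returns the names of the fields that default serialization writes. */
    static Set<String> serializedFieldNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (ObjectStreamField field : ObjectStreamClass.lookup(type).getFields()) {
            names.add(field.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [cachedValueAtZero, coefficients]
        System.out.println(serializedFieldNames(CachedPolynomial.class));
    }
}
```

Renaming or removing `cachedValueAtZero` in a later release would therefore change what a stream written by the older version means to the newer class, even though the field is private.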

        The purpose of CM is to provide robust implementations of mathematical utilities; supporting distributed applications is a totally different game.
        Easing the use of CM in distributed applications is a worthy enhancement, but it should not put the primary goal at risk (like blocking development because it would be tied by "puny" considerations of backwards compatibility not even related to the "core business").
        Hence I think that it has to be thought about with more care than was done up to now: You rightly pointed out the inconsistency of tagging "PolynomialFunction" as serializable while "PolynomialSplineFunction" is not. Personally, I'm really not attached to backwards compatibility! So I wouldn't mind making that class "Serializable" until we decide to clean up the mess. Thus, if I have it my way, that would mean that in, say, version 4.0, all of the CM classes (not obviously meant as data storage) will be stripped of their "trivial serializability".

        Neil Roeth added a comment -

        First, thanks for continuing to respond. I do want to understand the reasoning behind this decision and you are helping make it clear.

        > We are already in trouble, because of this lack of policy, for several classes for which "Serializable" is a supposedly innocuous feature (e.g. fields that should be transient cannot be because no explicit serialization is implemented).

        Do you mean that if you corrected this and made them transient, you would introduce an incompatibility with objects serialized using older versions of the class? If not that, then could you explain?

        > In the Commons project, there is a commitment that minor releases must be fully compatible; that implies that relying on default serialization would prevent any change to the internal structure of a class.

        I understand completely about not making changes in minor releases. Since that commitment is for minor releases, that implies that it could be done at the next major release, right?

        > Maybe you are not aware of the constraints imposed by "Serializable"; maybe you don't care because in your use case you'll never be confronted with the problem of a wrong serialized form. But another user's use case might bring him here complaining about our inconsistent support of "Serializable". Would he be less right than you?

        "Not aware" is more likely than "don't care". I may be ignorant but I'm not callous. I understand that suddenly changing how a class is serialized could break things for use cases other than mine and that is no less right than mine. Another reason to restrict the change to a major version.

        > It is always trivial to add "implements Serializable" to a class definition. But it is not trivial to do the implementation in the right way; just adding "implements Serializable" is never the right way. Again, it could be good enough for some purpose, but the wish for CM to be an example of good Java programming is not compatible with statements such as "We know it's sloppy, so don't use it whenever you need something that works...".

        This is the core of the issue. I can see that implementing Serializable might require special code in some cases, but why is just adding "implements Serializable" never the right way? You provided an example of wrapper code to do serialization for a class that is not serializable, but would you provide an example of how to transform a class that is not serializable directly into one that is? I.e., not with a wrapper, but by adding "implements Serializable" and then adding the explicit serialization methods to the class itself? I would like to see an example of "good Java programming" with regard to serialization.

        > The purpose of CM is to provide robust implementations of mathematical utilities; supporting distributed applications is a totally different game.

        In my business, we often do massive calculations that require hundreds or thousands of cpu-hours that need to be completed overnight, so they need to be distributed. Regardless of how robust the implementation is otherwise, if it does not support distribution, then that alone might make it unusable.

        Gilles added a comment -

        Do you mean that if you corrected this and made them transient, you would introduce an incompatibility with objects serialized using older versions of the class? If not that, then could you explain? [...]

        It was an example of a case where the appropriate serialization code might be missing. In other words, because it is so simple to add "Serializable" just in case, we create problems for later.

        My main point is that CM should not spend its scarce human resources on providing features (in this case: support for serialization) that require careful planning.
        But if people want to contribute, they will bring the feature, together with the necessary unit testing. Personally, I think that there should be a commitment to support the feature...

        However, as you've understood, my preference would really be to drop all "Serializable" but the indispensable (I was convinced by the remotely-generated exception use case).
        This case will also give you the example of implementing the explicit serialization methods (if that's what you asked): Have a look at the "ExceptionContext" class (in package "o.a.c.m.exception.util").

        In CM, the trend is currently towards "immutable" classes. The constructors' precondition checks guarantee the state of the object, and people who want serialization to provide the same guarantee (where "good Java programming" includes robustness) have come up with the idiom which I've adapted above for "PolynomialSplineFunction".

        Neil Roeth added a comment -

        It was an example of a case where the appropriate serialization code might be missing. In other words, because it is so simple to add "Serializable" just in case, we create problems for later.

        I am not understanding what problems this could cause - can you give some explicit examples? I don't know what your criteria are for "appropriate serialization code" - can you describe a case where the default serialization code is not appropriate and why it is not appropriate?

        As far as I can tell, the writeObject() and readObject() methods in ExceptionContext are unnecessary because they simply do what the default mechanism does anyway (except that it replaces throwing an exception for a non-serializable object with a String that says it is non-serializable). What am I missing?

        Your code for a class that serializes PolynomialSplineFunction pulls coefficients[] out of the underlying PolynomialFunction class and serializes them. Why do that instead of just using PolynomialFunction.writeObject()? Why is it better programming practice to create separate serialization classes that have deep knowledge of the class structure of each of its data members rather than encapsulate the serialization in each of those classes? If PolynomialFunction implements Serializable, then either PolynomialFunction.writeObject() will do the right thing or it is a bug in PolynomialFunction's implementation of writeObject() - a serializer for PolynomialSplineFunction shouldn't have to take on the responsibility of serializing the guts of PolynomialFunction. I don't see how breaking encapsulation like this is an improvement over the default mechanism, which doesn't break it.

        Gilles added a comment -

        I am not understanding what problems this could cause - can you give some explicit examples? I don't know what your criteria are for "appropriate serialization code" - can you describe a case where the default serialization code is not appropriate and why it is not appropriate?

        I guess that the most trivial problem is serializing with some version of the library and deserializing with another where the internal structure has changed in the meantime (e.g. substituting one instance variable with another that would provide the same basic functionality).

        Please note that the position which I am defending here is not based on "my" criteria for how to handle "Serializable". I'm reporting concerns very well explained in the book "Effective Java" by J. Bloch. Those concerns might be far-fetched and we are totally entitled to ignore them in standalone applications, but in the case of a library, we should not presume the innocuousness of anything we put in.

        As far as I can tell, the writeObject() and readObject() methods in ExceptionContext are unnecessary because they simply do what the default mechanism does anyway (except that it replaces throwing an exception for a non-serializable object with a String that says it is non-serializable). What am I missing?

        That the exception context is part of the exceptions defined in CM: If a non-"Serializable" object is stored in the context, it must be detected, lest the default serialization generates another exception, the result of which would be that the original exception cannot be propagated remotely.
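
        To make the point above concrete, here is a minimal sketch of the kind of explicit serialization being described. It is not the actual "ExceptionContext" code: the class "SketchContext", its methods and the placeholder text are all hypothetical, and the check only looks at whether the stored value itself implements "Serializable" (a value that does, but holds a non-serializable field, would still fail).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a context object carried by an exception. */
class SketchContext implements Serializable {
    private static final long serialVersionUID = 20120210L;

    /** Values may be of any type, so the map is written by hand. */
    private transient Map<String, Object> values = new LinkedHashMap<>();

    void put(String key, Object value) { values.put(key, value); }

    Object get(String key) { return values.get(key); }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(values.size());
        for (Map.Entry<String, Object> e : values.entrySet()) {
            out.writeObject(e.getKey());
            Object v = e.getValue();
            if (v == null || v instanceof Serializable) {
                out.writeObject(v);
            } else {
                // Write a placeholder so that serializing the context cannot
                // itself fail and mask the original exception.
                out.writeObject("[not serializable: " + v.getClass().getName() + "]");
            }
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int n = in.readInt();
        values = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            String key = (String) in.readObject();
            values.put(key, in.readObject());
        }
    }
}

/** Hypothetical helper that serializes to a byte array and reads it back. */
class SketchContextDemo {
    static SketchContext roundTrip(SketchContext c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(c);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchContext) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SketchContext c = new SketchContext();
        c.put("index", 3);
        c.put("raw", new Object()); // java.lang.Object is not Serializable
        SketchContext copy = roundTrip(c);
        System.out.println(copy.get("index"));
        System.out.println(copy.get("raw"));
    }
}
```

        With the default mechanism, the "raw" entry would make the whole write fail with a "NotSerializableException"; here it is replaced by a descriptive string and the rest of the context survives.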

        Concerning your last series of points, there are answers at several different levels.

        • Policy (as a means toward consistency of code design, implementation and maintenance): Cf. previous posts.
        • Resources: CM's core business is mathematical utilities programmed in clear OO Java code. This alone is already too much for a small team. Adding more administrivia is IMHO unreasonable ATM.
        • There is no encapsulation breaking: The code uses only "public" accessors in order to extract the objectively meaningful data that are needed to define the concept represented by "PolynomialSplineFunction" (the knots and the set of coefficients of the polynomial functions that are connected at the knots).
        • In some sense, the default serialization breaks the encapsulation because the object is deserialized without undergoing the constructor's precondition checks, while the "SerializationProxy" actually enforces encapsulation by passing the deserialized data to the constructor.
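
        The last two points can be sketched as follows. This is only an illustration of the serialization proxy idiom under stated assumptions: "SketchKnots" is a hypothetical immutable class standing in for the knots part of a spline, not the code proposed for "PolynomialSplineFunction", and the precondition (strictly increasing knots) is chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical immutable class whose constructor enforces a precondition. */
final class SketchKnots implements Serializable {
    private static final long serialVersionUID = 20120210L;

    private final double[] knots;

    SketchKnots(double[] knots) {
        if (knots == null || knots.length < 2) {
            throw new IllegalArgumentException("at least two knots are required");
        }
        for (int i = 1; i < knots.length; i++) {
            if (!(knots[i] > knots[i - 1])) {
                throw new IllegalArgumentException("knots must be strictly increasing");
            }
        }
        this.knots = knots.clone();
    }

    double[] getKnots() { return knots.clone(); }

    /** Serialize the proxy instead of this object. */
    private Object writeReplace() {
        return new SerializationProxy(getKnots());
    }

    /** Refuse streams that claim to contain this class directly. */
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("serialization proxy required");
    }

    /** Carries only the logical state, obtained through the accessor. */
    private static final class SerializationProxy implements Serializable {
        private static final long serialVersionUID = 20120210L;

        private final double[] knots;

        SerializationProxy(double[] knots) { this.knots = knots; }

        /** Rebuild through the constructor, so the checks run again. */
        private Object readResolve() {
            return new SketchKnots(knots);
        }
    }
}

/** Hypothetical helper that serializes to a byte array and reads it back. */
class SketchKnotsDemo {
    static SketchKnots roundTrip(SketchKnots k) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(k);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchKnots) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SketchKnots copy = roundTrip(new SketchKnots(new double[] {0.0, 1.0, 2.5}));
        System.out.println(java.util.Arrays.toString(copy.getKnots()));
    }
}
```

        Because the deserialized data goes back through the constructor, a stream whose knots have been altered so that they are no longer increasing is rejected instead of producing an invalid object.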
        Luc Maisonobe added a comment -

        I guess that the most trivial problem is serializing with some version of the library and deserializing with another where the internal structure has changed in the meantime (e.g. substituting one instance variable with another that would provide the same basic functionality).

        I think we can simply say this is not supported. Serialization should not be used for long-term storage, and even if it can be used for distributed computation with different versions between client and server, we can simply say we don't support it.

        Those concerns might be far-fetched and we are totally entitled to ignore them in standalone applications, but in the case of a library, we should not presume the innocuousness of anything we put in.

        +1 for the concerns of a library, but there are many cases when a simple "implements Serializable" is sufficient and does not imply too much work.

        Could we go for a case-by-case policy? There are some classes where serialization could help users and not be too cumbersome. This is a trade-off between forbidding serialization completely and supporting it for all classes in a guaranteed way with cross-version support.

        Neil Roeth added a comment -

        Responding to Luc:

        I think we can simply say this is not supported. Serialization should not be used for long-term storage, and even if it can be used for distributed computation with different versions between client and server, we can simply say we don't support it.

        Completely reasonable.

        +1 for the concerns of a library, but there are many cases when a simple "implements Serializable" is sufficient and does not imply too much work.

        Yes, agreed. I think this is one of them.

        Responding to Gilles:

        There is no encapsulation breaking: The code uses only "public" accessors in order to extract the objectively meaningful data that are needed to define the concept represented by "PolynomialSplineFunction" (the knots and the set of coefficients of the polynomial functions that are connected at the knots).

        Well, yes, but add a private field critical to the state of PolynomialFunction and the idiom breaks down. There would be no breakdown if the class itself were responsible for ensuring that readObject() and writeObject() did the right thing.

        Here's an example: Suppose you have a random number generator class RngGilles that has a constructor that takes a single long integer as an argument. The constructor precondition returns true for any value of the long integer. With that long integer, it initializes a private table of several hundred long integers and a private marker of its current position in the table. When RngGilles.nextRandom() is called, it calculates the value to be returned from the table and its current place in the table, then updates the table and its current place, then returns the value. It does this a billion times, then needs to be serialized and deserialized in order to continue to use that same random number sequence in another process. The idiom would fail in that case, while the default writeObject() and readObject() would succeed.
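
        A toy sketch of that scenario, to make it concrete. "SketchRng" is a hypothetical stand-in for RngGilles; the table size, lag and seeding constants are arbitrary choices for the illustration and this is not a generator anyone should use. The only point is that the private table and position are captured by the default serialization, with no accessors involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Toy generator whose state cannot be rebuilt from the seed alone. */
class SketchRng implements Serializable {
    private static final long serialVersionUID = 20120210L;

    private static final int SIZE = 55;
    private static final int LAG = 24;

    /** Private state with no public accessor. */
    private final long[] table = new long[SIZE];
    private int index;

    SketchRng(long seed) {
        // Fill the table from the seed with a simple linear congruential step.
        long state = seed;
        for (int i = 0; i < SIZE; i++) {
            state = state * 6364136223846793005L + 1442695040888963407L;
            table[i] = state;
        }
    }

    long nextRandom() {
        // Combine two table entries, store the result, advance the position.
        long value = table[index] + table[(index + LAG) % SIZE];
        table[index] = value;
        index = (index + 1) % SIZE;
        return value;
    }
}

/** Hypothetical helper that serializes to a byte array and reads it back. */
class SketchRngDemo {
    static SketchRng roundTrip(SketchRng rng) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(rng);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchRng) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SketchRng original = new SketchRng(42L);
        for (int i = 0; i < 1000; i++) {
            original.nextRandom();
        }
        // The copy resumes mid-sequence, exactly where the original stands.
        SketchRng copy = roundTrip(original);
        System.out.println(original.nextRandom() == copy.nextRandom());
    }
}
```

        Rebuilding the object through its constructor would only have the seed to work with, so it would restart the sequence unless the class exposed its table and position (or the draws were replayed).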

        In some sense, the default serialization breaks the encapsulation because the object is deserialized without undergoing the constructor's precondition checks, while the "SerializationProxy" actually enforces encapsulation by passing the deserialized data to the constructor.

        I understand what you are saying, but the concept of serialization is that when you are done deserializing, you have an object that represents the exact same state as the object that was serialized, so since it already satisfied any constructor preconditions when it was first created, there is no need to check them again. It is, of course, possible to subvert this for some particular class, e.g., through poorly written explicit writeObject() and readObject() methods, or by failing to mark some fields transient and making methods that use the transient field handle that, but I'd call that a bug in that particular object's serialization implementation, not a general failure of serialization that needs to be handled outside the class by serialization wrappers. ISTM that the idiom basically boils down to deconstructing objects to their primitive fields and then reconstructing them from those primitives, completely bypassing the whole Serialization mechanism except for those primitive fields.

        Gilles added a comment -

        According to the general convention, this discussion should be ported to the "dev" ML, possibly with a subject prefix of "[ALL]" so that we can get opinions of people who are much more knowledgeable than me on the pros and cons of supporting serialization.

        After I respond to the last points raised here, we could maybe suspend the comments in this forum...

        Adding an "implements Serializable" on a case-by-case basis is not a substitute for a policy (i.e. defining what would be the ideal state for CM). Personally, I think that it will not help users in the long term, because they will know (because we'll tell them) that they should not rely on a serialization policy that reads "Serializable is added for your convenience but is otherwise unsupported".

        I don't understand why the idiom would break down with the addition of a private field; the only requirement is that a class provides accessors to everything one needs to reconstruct an identical object.

        Could you give a concrete example where one would need to serialize a RNG (where it must continue drawing from the same sequence)?

        One subversion which you did not mention (and the one that usually comes up when the experts talk about serialization) is the possible corruption (accidental or intentional) that could happen while the data goes over the wire. That's what the idiom can thwart.
        On the other hand, you enumerate many sources of possible bugs in the implementation of serialization: That's exactly why I say that we should not implement it, unless we have people on the team who can commit to supporting it! But that would come back to the issue of policy, having a discussion on the "dev" ML about what should be the serialized form, which classes should be serializable and which not, how to decide for border-line cases, etc.
        That's a moderately ambitious project. IMHO, it is people like you, who would benefit from it, who should primarily contribute to it, even if just to start the discussion on the ML to get on the right track.
        As I said at the beginning, the issue is not simply to add "implements Serializable" to "PolynomialSplineFunction", even if the current lack of consistency in CM would have you believe that it was, for which we apologize.

        Neil Roeth added a comment -

        Of course, I'm fine moving the general discussion elsewhere. IIRC, that suggestion was made earlier in this thread by someone. To answer your specific questions:

        I don't understand why the idiom would break down with the addition of a private field; the only requirement is that a class provides accessors to everything one needs to reconstruct an identical object.

        I meant private fields with no public accessors, and the requirement you state is precisely the one that would make the class impossible to serialize with the idiom while it would be possible with the other methods.

        Could you give a concrete example where one would need to serialize a RNG (where it must continue drawing from the same sequence)?

        Pretty much any RNG should continue using the same sequence rather than restarting from a new seed if the sequences are meant to model the same random element. So, a calculation where you do part of the calculation, do some processing, then do some calculation might be an example. However, evaluating whether this particular case needs to be serialized is missing the point. The point is that there are classes (like this RNG) where the constructor preconditions that the idiom depends upon are nowhere near sufficient to guarantee that an identical object can be created. If you prefer, think of class A with a constructor that takes an argument of class B, then creates from that an internal instance of class C that is required to completely define the state of an instance of A. There is no public accessor for C because that field has no meaning outside of the class itself. The instance of class B used to construct the instance of A is useless after the object is constructed, so it is not saved and therefore is not available at serialization/deserialization time to construct a new instance. The idiom cannot handle that, but the default serialization or explicit writeObject() and readObject() methods can (as long as class C is made Serializable).

        I would be happy to respond to the other, more general points, but since you asked to suspend further comments on the general issue and others are probably tired of the discussion, I'll not do so.


          People

          • Assignee:
            Unassigned
            Reporter:
            Neil Roeth