Avro
  1. Avro
  2. AVRO-150

Ant tasks re-generate code even if it's up-to-date

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.3
    • Fix Version/s: 1.4.0
    • Component/s: java
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The Ant tasks for Java's protocol and schema should skip generation when the generated files are newer than their source.

      1. AVRO-150.patch
        3 kB
        John Yu
      2. AVRO-150.patch
        4 kB
        John Yu

        Activity

        Hide
        John Yu added a comment -

        Hi, we are a group of CMU master students that is thinking of contributing to Avro. We plan to tackle this issue first, and would just like to check-in with you beforehand to see if this is ok.

        Show
        John Yu added a comment - Hi, we are a group of CMU master students that is thinking of contributing to Avro. We plan to tackle this issue first, and would just like to check-in with you beforehand to see if this is ok.
        Hide
        Jeff Hammerbacher added a comment -

        Hey John,

        Welcome! It's certainly okay to tackle this issue. Excited to have new contributors in the community.

        Thanks,
        Jeff

        Show
        Jeff Hammerbacher added a comment - Hey John, Welcome! It's certainly okay to tackle this issue. Excited to have new contributors in the community. Thanks, Jeff
        Hide
        John Yu added a comment -

        Hi Jeff,

        Thanks for the reply. Because we are new to this, what is our next step? Will you assign this issue to us or anyone can submit patch to it?

        John

        Show
        John Yu added a comment - Hi Jeff, Thanks for the reply. Because we are new to this, what is our next step? Will you assign this issue to us or anyone can submit patch to it? John
        Hide
        Jeff Hammerbacher added a comment -

        Hey John,

        The next step is for you to attach a patch to this issue. See https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute for more detailed instructions on how to generate the patch.

        Thanks,
        Jeff

        Show
        Jeff Hammerbacher added a comment - Hey John, The next step is for you to attach a patch to this issue. See https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute for more detailed instructions on how to generate the patch. Thanks, Jeff
        Hide
        John Yu added a comment -

        Hi Jeff,

        I have attached a patch we have worked on today. Can you give any comments or feedbacks of it? Thanks

        John

        Show
        John Yu added a comment - Hi Jeff, I have attached a patch we have worked on today. Can you give any comments or feedbacks of it? Thanks John
        Hide
        Doug Cutting added a comment -

        Unfortunately I think this is a little harder than that. The compiler's destination is a root directory. If you compile a protocol named org.foo.Bar then it will generate an interface in org/foo/Bar.java. If that protocol uses a record named com.baz.Goop then it will also write com/baz/Goop.java. Etc.

        If we assume that org.foo.Bar lives in a file named $

        {src}

        /org/foo/Bar.avpr, then we can simply check the dates between that and $

        {dest}

        /org/foo/Bar.java, but I'm not sure whether we should assume that.

        An alternative would be to change SpecificCompiler#compileToDestination() to compare dates and write nothing when the existing file is newer. The SpecificCompiler still has to parse the input file and generate output, but, if no new files are written it should prevent subsequent javac and jar processing.

        Show
        Doug Cutting added a comment - Unfortunately I think this is a little harder than that. The compiler's destination is a root directory. If you compile a protocol named org.foo.Bar then it will generate an interface in org/foo/Bar.java. If that protocol uses a record named com.baz.Goop then it will also write com/baz/Goop.java. Etc. If we assume that org.foo.Bar lives in a file named $ {src} /org/foo/Bar.avpr, then we can simply check the dates between that and $ {dest} /org/foo/Bar.java, but I'm not sure whether we should assume that. An alternative would be to change SpecificCompiler#compileToDestination() to compare dates and write nothing when the existing file is newer. The SpecificCompiler still has to parse the input file and generate output, but, if no new files are written it should prevent subsequent javac and jar processing.
        Hide
        John Yu added a comment -

        Hi,

        Thanks for the reply. First, I think we should not make that assumption because in quickstart tutorial, the ant task scan all "avpr" and "avsc" files in input folder "*/.avpr".

        Second, if we change SpecificCompiler#compileToDestination(), that means everytime we need to pay computation cost to parse the schema and protocol file. Most of the computation time are not saved. But it actually can avoid running later tasks, such as jar. I want to go with the second choice. What do you think?

        Show
        John Yu added a comment - Hi, Thanks for the reply. First, I think we should not make that assumption because in quickstart tutorial, the ant task scan all "avpr" and "avsc" files in input folder "* / .avpr". Second, if we change SpecificCompiler#compileToDestination(), that means everytime we need to pay computation cost to parse the schema and protocol file. Most of the computation time are not saved. But it actually can avoid running later tasks, such as jar. I want to go with the second choice. What do you think?
        Hide
        Doug Cutting added a comment -

        I agree that fixing this in compileToDestination() is preferable.

        Show
        Doug Cutting added a comment - I agree that fixing this in compileToDestination() is preferable.
        Hide
        John Yu added a comment -

        Hello,

        Just to double check and clarify the following two things.

        First, we will do the checking of timestamp in SpecificCompiler#compileToDestination() and write nothing when the existing file is newer. Out of curiosity, couldn't we write what we have into disk, since we have already spent the computation power to parse the protocol and schema? Or would doing that cause logical problems or because writing to disk could also be very resource intensive in this case?

        Second, in the comment you have mentioned "if no new files are written it should prevent subsequent javac and jar processing." We are thinking that while the protocols and schemas may not have changed, other java source files might have. So we are not sure if we should really prevent the javac and jar from running if no new parsing for avpr/avsc has happened. In addition, since we can't assume the order in which the tasks are called in Ant, would it be ok if we stop Ant right then and there?

        Thanks!

        Show
        John Yu added a comment - Hello, Just to double check and clarify the following two things. First, we will do the checking of timestamp in SpecificCompiler#compileToDestination() and write nothing when the existing file is newer. Out of curiosity, couldn't we write what we have into disk, since we have already spent the computation power to parse the protocol and schema? Or would doing that cause logical problems or because writing to disk could also be very resource intensive in this case? Second, in the comment you have mentioned "if no new files are written it should prevent subsequent javac and jar processing." We are thinking that while the protocols and schemas may not have changed, other java source files might have. So we are not sure if we should really prevent the javac and jar from running if no new parsing for avpr/avsc has happened. In addition, since we can't assume the order in which the tasks are called in Ant, would it be ok if we stop Ant right then and there? Thanks!
        Hide
        Doug Cutting added a comment -

        > couldn't we write what we have into disk, since we have already spent the computation power to parse the protocol and schema?

        That would trigger further re-compilation, which we are trying to avoid. We might, when the file is present but with a newer date, check its size. If it's not the same size, then we should re-write it anyway, as the schema has probably changed even though the date is wrong. We could even compare the content before re-writing, and only re-write if the content is different, but that's more expensive and probably overkill. If a workspace becomes confused and folks are seeing unexpected problems, 'ant clean' should always repair things. So I'd suggest checking the size as a double-check, but wouldn't bother checking the content.

        > we are not sure if we should really prevent the javac and jar from running

        You shouldn't have to do anything explicit to stop these. The javac task will only compile things if .java files are newer than corresponding .class files. For jar, if we pass update="true" then it will incrementally update changed files in the jar rather than re-writing the entire jar.

        Show
        Doug Cutting added a comment - > couldn't we write what we have into disk, since we have already spent the computation power to parse the protocol and schema? That would trigger further re-compilation, which we are trying to avoid. We might, when the file is present but with a newer date, check its size. If it's not the same size, then we should re-write it anyway, as the schema has probably changed even though the date is wrong. We could even compare the content before re-writing, and only re-write if the content is different, but that's more expensive and probably overkill. If a workspace becomes confused and folks are seeing unexpected problems, 'ant clean' should always repair things. So I'd suggest checking the size as a double-check, but wouldn't bother checking the content. > we are not sure if we should really prevent the javac and jar from running You shouldn't have to do anything explicit to stop these. The javac task will only compile things if .java files are newer than corresponding .class files. For jar, if we pass update="true" then it will incrementally update changed files in the jar rather than re-writing the entire jar.
        Hide
        John Yu added a comment -

        Hello everyone,

        1. I accidentally set the status of this issue to "patch available", which is a mistake. Can someone who has the privileges unset it for me?

        2. A new patch that reflects the discussion with Doug has been uploaded.

        Thanks!

        Show
        John Yu added a comment - Hello everyone, 1. I accidentally set the status of this issue to "patch available", which is a mistake. Can someone who has the privileges unset it for me? 2. A new patch that reflects the discussion with Doug has been uploaded. Thanks!
        Hide
        Doug Cutting added a comment -

        I just committed this. I fixed a few other things in build.xml so that, e.g., javacc files are not re-generated either. Now when running 'ant compile; ant compile' the second compilation does nothing. The patch also failed to correctly detect unchanged protocol files, so I fixed that too, and fixed the test to remove its files before running, so that it can be run twice without 'ant clean' between.

        Show
        Doug Cutting added a comment - I just committed this. I fixed a few other things in build.xml so that, e.g., javacc files are not re-generated either. Now when running 'ant compile; ant compile' the second compilation does nothing. The patch also failed to correctly detect unchanged protocol files, so I fixed that too, and fixed the test to remove its files before running, so that it can be run twice without 'ant clean' between.

          People

          • Assignee:
            John Yu
            Reporter:
            Doug Cutting
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development