Avro
  1. Avro
  2. AVRO-149

"avrotool" runner to execute avro commands from command-line

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: None
    • Labels:
      None

      Description

      There's already an ant task to generate java code based on a schema, but you can't do it from the command-line, with any ease. I will shortly upload a patch that does just that.

      1. AVRO-149.patch.txt
        10 kB
        Philip Zeyliger
      2. AVRO-149.patch.v2.txt
        12 kB
        Philip Zeyliger
      3. AVRO-149.patch.txt
        12 kB
        Philip Zeyliger
      4. AVRO-149.patch.txt
        10 kB
        Philip Zeyliger

        Activity

        Hide
        Philip Zeyliger added a comment -

        The attached patch introdues "avrotool", a wrapper for calling tools, for Avro, written in Java. Right now there's only one "tool", called "compile", for compiling schemas and protocols. "binary2text", "text2binary", "validate_schema", etc. all make sense, and can be further work.

        To walk you through the patch:

        • There are two shell scripts: avrotool.sh and avrotool-dev.sh. The latter is intended to be called from a checked-out tree, and should work after an "ant compile-java". The former is copied into build/avro-1.2.0-devel/avrotool, has a VERSION tag replaced (to point to the current jar), and is set executable. The scripts are essentially the same, and both are just a few lines. I chose to have two scripts to make it way obvious what's going on.
        • There's a new interface called "Tool" to run tools, a new package (avro.tool), and a runner class (avro.tool.Main()). I hope all of this is considered "evolving" in terms of interfaces: ideally we'd standardize on command-line parsing, help text, return codes, etc., but I wanted to get the most minimal reasonable thing working first. There's certainly much to be done with SpecificCompiler in terms of usability, but, hey, at least now you can compile both schemas and protocols without changing the code
        • I added a dependency on google collections for avrotool. We already pull it in somewhere (it shows up in lib/), and I'm using its ImmutableMap.Builder. If we want to avoid the extra dependency, I'm happy to blow it away.
        • At some point, we might want to separate the runtime and the tools into their own jars. I don't think that point is this change. The jar is "only" 221k.
        • In the future, it would be very reasonable to write some of the tools in C or python. Do you think sitting on the name "avrotool" is bad? I could do "avrojtool"?

        There aren't yet tests. I did the following manually and checked that the temporary directories were filled with java files.

        # From the directory created by "ant package"
        $./avrotool compile protocol src/test/schemata/namespace.avpr /tmp/bar
        $./avrotool compile schema src/test/schemata/interop.avsc /tmp/foo
        # From the checkout directory
        $src/scripts/avrotool-dev.sh  compile schema src/test/schemata/interop.avsc /tmp/foo
        $src/scripts/avrotool-dev.sh  compile protocol src/test/schemata/simple.avpr /tmp/bar
        # Check some bad arguments.  These should output errors.
        $ src/scripts/avrotool-dev.sh 
        $ src/scripts/avrotool-dev.sh  foo
        $ src/scripts/avrotool-dev.sh  compile protocol notenoughargs
        

        I feel bad that the tests aren't automated: suggestions? Shell script invoked by ant?

        Show
        Philip Zeyliger added a comment - The attached patch introdues "avrotool", a wrapper for calling tools, for Avro, written in Java. Right now there's only one "tool", called "compile", for compiling schemas and protocols. "binary2text", "text2binary", "validate_schema", etc. all make sense, and can be further work. To walk you through the patch: There are two shell scripts: avrotool.sh and avrotool-dev.sh. The latter is intended to be called from a checked-out tree, and should work after an "ant compile-java". The former is copied into build/avro-1.2.0-devel/avrotool, has a VERSION tag replaced (to point to the current jar), and is set executable. The scripts are essentially the same, and both are just a few lines. I chose to have two scripts to make it way obvious what's going on. There's a new interface called "Tool" to run tools, a new package (avro.tool), and a runner class (avro.tool.Main()). I hope all of this is considered "evolving" in terms of interfaces: ideally we'd standardize on command-line parsing, help text, return codes, etc., but I wanted to get the most minimal reasonable thing working first. There's certainly much to be done with SpecificCompiler in terms of usability, but, hey, at least now you can compile both schemas and protocols without changing the code I added a dependency on google collections for avrotool. We already pull it in somewhere (it shows up in lib/), and I'm using its ImmutableMap.Builder. If we want to avoid the extra dependency, I'm happy to blow it away. At some point, we might want to separate the runtime and the tools into their own jars. I don't think that point is this change. The jar is "only" 221k. In the future, it would be very reasonable to write some of the tools in C or python. Do you think sitting on the name "avrotool" is bad? I could do "avrojtool"? There aren't yet tests. I did the following manually and checked that the temporary directories were filled with java files. # From the directory created by "ant package" $./avrotool compile protocol src/test/schemata/namespace.avpr /tmp/bar $./avrotool compile schema src/test/schemata/interop.avsc /tmp/foo # From the checkout directory $src/scripts/avrotool-dev.sh compile schema src/test/schemata/interop.avsc /tmp/foo $src/scripts/avrotool-dev.sh compile protocol src/test/schemata/simple.avpr /tmp/bar # Check some bad arguments. These should output errors. $ src/scripts/avrotool-dev.sh $ src/scripts/avrotool-dev.sh foo $ src/scripts/avrotool-dev.sh compile protocol notenoughargs I feel bad that the tests aren't automated: suggestions? Shell script invoked by ant?
        Hide
        Doug Cutting added a comment -

        This is a great addition!

        > I chose to have two scripts to make it way obvious what's going on.

        I think this will prove confusing. Can you please fold these together? We can add build/classes and build/lib when they exist. This is what Hadoop's done for years.

        > I added a dependency on google collections for avrotool.

        If this is easy to avoid then I'd prefer it. I don't know how we're getting that, but it's not currently an explicit dependency and we should add those more cautiously.

        > Do you think sitting on the name "avrotool" is bad? I could do "avrojtool"?

        I'd vote for either just 'avro' or perhaps 'avroj'. This will be the primary (java-based) command-line interface to Avro, so we needn't further qualify it.

        > I feel bad that the tests aren't automated: suggestions? Shell script invoked by ant?

        Guilt is good! A shell script invoked by ant that checks return status and output would be fabulous.

        Show
        Doug Cutting added a comment - This is a great addition! > I chose to have two scripts to make it way obvious what's going on. I think this will prove confusing. Can you please fold these together? We can add build/classes and build/lib when they exist. This is what Hadoop's done for years. > I added a dependency on google collections for avrotool. If this is easy to avoid then I'd prefer it. I don't know how we're getting that, but it's not currently an explicit dependency and we should add those more cautiously. > Do you think sitting on the name "avrotool" is bad? I could do "avrojtool"? I'd vote for either just 'avro' or perhaps 'avroj'. This will be the primary (java-based) command-line interface to Avro, so we needn't further qualify it. > I feel bad that the tests aren't automated: suggestions? Shell script invoked by ant? Guilt is good! A shell script invoked by ant that checks return status and output would be fabulous.
        Hide
        Philip Zeyliger added a comment -

        I think this will prove confusing. Can you please fold these together? We can add build/classes and build/lib when they exist. This is what Hadoop's done for years.

        Done. (I actually find Hadoop's mode confusing, because I don't always know whether i'm using the jars or the classes dir, so I left in an echo if you're using the "development" version.)

        If this is easy to avoid then I'd prefer it. I don't know how we're getting that, but it's not currently an explicit dependency and we should add those more cautiously.

        Gone. It's from checkstyle ("ant -v clean"), btw, so only a compile-time dependency:

        [ivy:retrieve] == resolving dependencies checkstyle#checkstyle;5.0->com.google.collections#google-collections;0.9 [compile->runtime(*)]

        I'd vote for either just 'avro' or perhaps 'avroj'. This will be the primary (java-based) command-line interface to Avro, so we needn't further qualify it.

        I gave it avroj.

        Guilt is good! A shell script invoked by ant that checks return status and output would be fabulous.

        Did something simple. Will attach new patch soon.

        Show
        Philip Zeyliger added a comment - I think this will prove confusing. Can you please fold these together? We can add build/classes and build/lib when they exist. This is what Hadoop's done for years. Done. (I actually find Hadoop's mode confusing, because I don't always know whether i'm using the jars or the classes dir, so I left in an echo if you're using the "development" version.) If this is easy to avoid then I'd prefer it. I don't know how we're getting that, but it's not currently an explicit dependency and we should add those more cautiously. Gone. It's from checkstyle ("ant -v clean"), btw, so only a compile-time dependency: [ivy:retrieve] == resolving dependencies checkstyle#checkstyle;5.0->com.google.collections#google-collections;0.9 [compile->runtime(*)] I'd vote for either just 'avro' or perhaps 'avroj'. This will be the primary (java-based) command-line interface to Avro, so we needn't further qualify it. I gave it avroj. Guilt is good! A shell script invoked by ant that checks return status and output would be fabulous. Did something simple. Will attach new patch soon.
        Hide
        Philip Zeyliger added a comment -

        Here's a new version with concerns addressed. I had to do some ant stuff to make the shell scripts executable in the tarball.

        Show
        Philip Zeyliger added a comment - Here's a new version with concerns addressed. I had to do some ant stuff to make the shell scripts executable in the tarball.
        Hide
        Philip Zeyliger added a comment -

        BTW, if you want to just follow the latest changes, my history is up at http://github.com/philz/avro/commits/tool .

        Show
        Philip Zeyliger added a comment - BTW, if you want to just follow the latest changes, my history is up at http://github.com/philz/avro/commits/tool .
        Hide
        Doug Cutting added a comment -

        > BTW, if you want to just follow the latest changes, my history is up at http://github.com/philz/avro/commits/tool

        I actually prefer to read the full patch again each time. Really!

        Also, when you upload a new version to Jira, please don't add a version number to the file name. Jira will datestamp it, and that way we don't end up with a big list of files at the top of the page to choose among. (The top one is not always the most recent.)

        You renamed the script, but you did not rename the various comments that still use the term "avrotool". These should all be changed to "avroj", no?

        Most other projects commonly use the bin/ directory for executable scripts, and do not have two copies in releases. I'd prefer that. Is there a reason you've avoided that? I don't see the advantage of having the jar's version in the script: the script runs the code in a relative directory, giving preference to the classes directory over any avro-XX.jar file found.

        The warnings generated by the script will be annoying to developers like me, who will use these scripts a lot. They should at least be prefixed "Warning", but I'd really prefer they were removed. As I said above, the contract of the script is to run the code in the tree where the script lives, giving precedence to built code over shipped code. If an end-user hacks some of our code and rebuilds, the script should run their hacked code without complaint. Many users hack open source code. This is normal, and does not warrant warnings.

        Finally, how about naming it 'avroj' rather than 'avroj.sh'? Ant, forrest, findbugs, etc. all name their scripts without a '.sh'.

        Show
        Doug Cutting added a comment - > BTW, if you want to just follow the latest changes, my history is up at http://github.com/philz/avro/commits/tool I actually prefer to read the full patch again each time. Really! Also, when you upload a new version to Jira, please don't add a version number to the file name. Jira will datestamp it, and that way we don't end up with a big list of files at the top of the page to choose among. (The top one is not always the most recent.) You renamed the script, but you did not rename the various comments that still use the term "avrotool". These should all be changed to "avroj", no? Most other projects commonly use the bin/ directory for executable scripts, and do not have two copies in releases. I'd prefer that. Is there a reason you've avoided that? I don't see the advantage of having the jar's version in the script: the script runs the code in a relative directory, giving preference to the classes directory over any avro-XX.jar file found. The warnings generated by the script will be annoying to developers like me, who will use these scripts a lot. They should at least be prefixed "Warning", but I'd really prefer they were removed. As I said above, the contract of the script is to run the code in the tree where the script lives, giving precedence to built code over shipped code. If an end-user hacks some of our code and rebuilds, the script should run their hacked code without complaint. Many users hack open source code. This is normal, and does not warrant warnings. Finally, how about naming it 'avroj' rather than 'avroj.sh'? Ant, forrest, findbugs, etc. all name their scripts without a '.sh'.
        Hide
        Philip Zeyliger added a comment -

        I'll update the patch with your suggestions as soon as I have a chance. I have no objections to "bin/avroj", except that it hides the fact that scripts are, well, source, and not binary, and they have a language, so they should have an extension. What I did was make "src/scripts/avroj.sh" be filtered (to replace @VERSION@), renamed (to avroj), and executable in the build/avro-VERSION directory as part of "ant package". Should tests go in "src/test/bin"?

        Anyway, I'll make it bin/avroj as soon as I have a chance.

        Show
        Philip Zeyliger added a comment - I'll update the patch with your suggestions as soon as I have a chance. I have no objections to "bin/avroj", except that it hides the fact that scripts are, well, source, and not binary, and they have a language, so they should have an extension. What I did was make "src/scripts/avroj.sh" be filtered (to replace @VERSION@), renamed (to avroj), and executable in the build/avro-VERSION directory as part of "ant package". Should tests go in "src/test/bin"? Anyway, I'll make it bin/avroj as soon as I have a chance.
        Hide
        Philip Zeyliger added a comment -

        One more question:

        please don't add a version number to the file name

        I was doing this because I saw people doing it on the Hadoop JIRA. I tried it for AVRO-154, and JIRA still has a long list of files, you just have to scrutinize the date column to figure out which to click. Is that what you prefer?

        Show
        Philip Zeyliger added a comment - One more question: please don't add a version number to the file name I was doing this because I saw people doing it on the Hadoop JIRA. I tried it for AVRO-154 , and JIRA still has a long list of files, you just have to scrutinize the date column to figure out which to click. Is that what you prefer?
        Hide
        Doug Cutting added a comment -

        > I tried it for AVRO-154, and JIRA still has a long list of files, you just have to scrutinize the date column to figure out which to click. Is that what you prefer?

        Yes, I prefer that. The older version is grayed out, making it easy to find the most recent.

        Show
        Doug Cutting added a comment - > I tried it for AVRO-154 , and JIRA still has a long list of files, you just have to scrutinize the date column to figure out which to click. Is that what you prefer? Yes, I prefer that. The older version is grayed out, making it easy to find the most recent.
        Hide
        Philip Zeyliger added a comment -

        Back from travels; clearing out the queue!

        You renamed the script, but you did not rename the various comments that still use the term "avrotool". These should all be changed to "avroj", no?

        Yep, fixed. The package is still org.apache.avro.tool, which I prefer over "avroj", but if you have a strong preference, the search and replace would be trivial.

        Most other projects commonly use the bin/ directory for executable scripts, and do not have two copies in releases.

        I've moved the script into bin/avroj. I've removed the VERSION substitution.

        The warnings generated by the script will be annoying to developers like me, who will use these scripts a lot.

        Removed the warning.

        avroj vs. avroj.sh

        It's "avroj" now. The test script still has a ".sh" suffix on it, since it's really in the src/ tree.

        BTW, are you aware of anything that auto-generates stubs like "avroj"? It seems like every Java project in the world has to have one of these. Inevitably, we'll be adding things similar to $HADOOP_CLASSPATH to this script soon. I'd use a canned script if I knew of one that was liked.

        The diff is also updated to fix some trivial merge conflicts in build.xml.

        Show
        Philip Zeyliger added a comment - Back from travels; clearing out the queue! You renamed the script, but you did not rename the various comments that still use the term "avrotool". These should all be changed to "avroj", no? Yep, fixed. The package is still org.apache.avro.tool, which I prefer over "avroj", but if you have a strong preference, the search and replace would be trivial. Most other projects commonly use the bin/ directory for executable scripts, and do not have two copies in releases. I've moved the script into bin/avroj. I've removed the VERSION substitution. The warnings generated by the script will be annoying to developers like me, who will use these scripts a lot. Removed the warning. avroj vs. avroj.sh It's "avroj" now. The test script still has a ".sh" suffix on it, since it's really in the src/ tree. BTW, are you aware of anything that auto-generates stubs like "avroj"? It seems like every Java project in the world has to have one of these. Inevitably, we'll be adding things similar to $HADOOP_CLASSPATH to this script soon. I'd use a canned script if I knew of one that was liked. The diff is also updated to fix some trivial merge conflicts in build.xml.
        Hide
        Philip Zeyliger added a comment -

        Doug,

        Care to take another look?

        Thanks!

        Show
        Philip Zeyliger added a comment - Doug, Care to take another look? Thanks!
        Hide
        Doug Cutting added a comment -

        Philip, this looks good to me. Thanks!

        Before we commit this, should we consider how AVRO-163 impacts it? If our primary release is to a src tarball, should we also make an avroj binary release? We could distribute avroj as a giant tarball with all required libraries included. Then we almost wouldn't need a shell script, since it would just be 'java -jar avroj.jar ...'. Thoughts?

        Show
        Doug Cutting added a comment - Philip, this looks good to me. Thanks! Before we commit this, should we consider how AVRO-163 impacts it? If our primary release is to a src tarball, should we also make an avroj binary release? We could distribute avroj as a giant tarball with all required libraries included. Then we almost wouldn't need a shell script, since it would just be 'java -jar avroj.jar ...'. Thoughts?
        Hide
        Philip Zeyliger added a comment -

        Doug,

        I'm +1 on the approach in AVRO-163. (Quoting you

        We could package avroj as an executable jar that includes all of the jars we depend on, and release that as a separate binary artifact.

        Yep, I think having a jar file that contains everything would be a boon.

        I'd suggest we commit this as is, and then create a new JIRA to re-formulate avroj as avroj.jar. Seems like that would be easier after the dust in AVRO-163 settles.

        – Philip

        Show
        Philip Zeyliger added a comment - Doug, I'm +1 on the approach in AVRO-163 . (Quoting you We could package avroj as an executable jar that includes all of the jars we depend on, and release that as a separate binary artifact. Yep, I think having a jar file that contains everything would be a boon. I'd suggest we commit this as is, and then create a new JIRA to re-formulate avroj as avroj.jar. Seems like that would be easier after the dust in AVRO-163 settles. – Philip
        Hide
        Philip Zeyliger added a comment -

        I'm working on updating the build.xml file to create the "big jar" that includes everything, so stay tuned.

        Show
        Philip Zeyliger added a comment - I'm working on updating the build.xml file to create the "big jar" that includes everything, so stay tuned.
        Hide
        Philip Zeyliger added a comment -

        Attached patch drops the shell script and combines everything in build/lib and build/classes into one jar, called "avroj-$

        {version}

        .jar". Nothing on the java side has changed (with the possible exception of my cleaning up some trailing whitespace on lines added in this patch) but build.xml has changed.

        Show
        Philip Zeyliger added a comment - Attached patch drops the shell script and combines everything in build/lib and build/classes into one jar, called "avroj-$ {version} .jar". Nothing on the java side has changed (with the possible exception of my cleaning up some trailing whitespace on lines added in this patch) but build.xml has changed.
        Hide
        Doug Cutting added a comment -

        I just committed this. Thanks, Philip!

        I changed two javadoc comments, to make them more user-focused and less developer-focused.

        Show
        Doug Cutting added a comment - I just committed this. Thanks, Philip! I changed two javadoc comments, to make them more user-focused and less developer-focused.
        Hide
        Doug Cutting added a comment -

        It just occurred to me that, if we're going to distribute this binary, we should include the Apache license and the licenses of the included libraries. Also, a "version" command would be nice to have, that prints the current version. Shall I file these as separate issues?

        Show
        Doug Cutting added a comment - It just occurred to me that, if we're going to distribute this binary, we should include the Apache license and the licenses of the included libraries. Also, a "version" command would be nice to have, that prints the current version. Shall I file these as separate issues?
        Hide
        Philip Zeyliger added a comment -

        I filed AVRO-173 (version) and AVRO-174 (license). If you're aware of any Apache-precedent for the license files, please advice in AVRO-174.

        Thanks.

        Show
        Philip Zeyliger added a comment - I filed AVRO-173 (version) and AVRO-174 (license). If you're aware of any Apache-precedent for the license files, please advice in AVRO-174 . Thanks.

          People

          • Assignee:
            Philip Zeyliger
            Reporter:
            Philip Zeyliger
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development