Avro
  1. Avro
  2. AVRO-886

Support doc strings in IDL for protocols and message interfaces

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.0
    • Component/s: java
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      IDL parsing updated to support capturing doc strings for both Protocol and Message entities.
    • Tags:
      IDL, Code generation

      Description

      I would like to be able to add documentation to the IDL that will get parsed with more types than Enum, Fixed and Record. Specifically, I'd like to support doc strings for protocols and message interfaces. One purpose is to be able to write the documentation into the generated code.

      The specific use case is to use information in the documentation strings to auto generate java annotations in the generated Java code. This is done with our own specifics compiler.

      Here is an example 'marked up' IDL file...

      /** class=@AccessControl(group="normal") */
      @namespace("com.aol.interfaces.echo")
      protocol EchoService {

      import idl "Errors.avdl";

      /** Message structure for the echo service */
      record Message

      { /** the string to be echo'd */ string echome; map<string> echoes; }

      /** method=@AccessControl(source="MyService") */
      string echoString(string msg) throws com.aol.interfaces.error.ServiceError;
      Message echoMessage(Message msg) throws com.aol.interfaces.error.ServiceError;

      void publishMessage(string msg) oneway;
      }

      1. AVRO-866.patch
        14 kB
        Doug Cutting
      2. avro-doc-v3.patch
        14 kB
        George Fletcher
      3. idl-escapes.patch
        1 kB
        Doug Cutting
      4. avroTrace-bug.patch
        0.6 kB
        George Fletcher
      5. newline-in-doc-test.patch
        1 kB
        Doug Cutting
      6. avro-doc-v2.patch
        19 kB
        George Fletcher

        Activity

        Hide
        Doug Cutting added a comment -

        I think getting protocol and message documentation into javadoc would be a good thing.

        That said, if generating protocol annotations is the desired end-goal, then it does not seem appropriate to use documentation strings. Rather, I'd suggest that protocols and messages in Java be extended to support arbitrary properties (like schemas and fields already do). Then the IDL parser would need to be changed to parse these (as it already does for schemas and fields). Finally, we could alter the templates to emit annotations from the value of the "java-annotations" property of protocols, messages, schemas and fields. Does that make sense?

        Show
        Doug Cutting added a comment - I think getting protocol and message documentation into javadoc would be a good thing. That said, if generating protocol annotations is the desired end-goal, then it does not seem appropriate to use documentation strings. Rather, I'd suggest that protocols and messages in Java be extended to support arbitrary properties (like schemas and fields already do). Then the IDL parser would need to be changed to parse these (as it already does for schemas and fields). Finally, we could alter the templates to emit annotations from the value of the "java-annotations" property of protocols, messages, schemas and fields. Does that make sense?
        Hide
        George Fletcher added a comment -

        Yes, that makes a lot of sense. Ideally, I'd like to see both options implemented. I realize that leveraging the doc block as a way to get annotations is a work around (a.k.a. hack... when I started down that path, I was hoping it was something I could do without changing the avro code.

        So it seems there are really two things to tackle...
        1. Modify the IDL parser to capture doc for both the protocols and messages and save it.
        2. Modify the IDL parser to allow properties for the protocol and message definitions and then expose these properties via the parsed code.

        I'm assuming that the property information would need to appear in the JSON version of the protocol? This then would affect all languages that depend on the JSON encoding (e.g. python). Correct?

        Show
        George Fletcher added a comment - Yes, that makes a lot of sense. Ideally, I'd like to see both options implemented. I realize that leveraging the doc block as a way to get annotations is a work around (a.k.a. hack ... when I started down that path, I was hoping it was something I could do without changing the avro code. So it seems there are really two things to tackle... 1. Modify the IDL parser to capture doc for both the protocols and messages and save it. 2. Modify the IDL parser to allow properties for the protocol and message definitions and then expose these properties via the parsed code. I'm assuming that the property information would need to appear in the JSON version of the protocol? This then would affect all languages that depend on the JSON encoding (e.g. python). Correct?
        Hide
        Doug Cutting added a comment -

        The JSON is meant to be extensible:

        "Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data."

        from http://avro.apache.org/docs/current/spec.html#schemas

        So no changes are required to other implementations if we start using a "java-annotations" property.

        Show
        Doug Cutting added a comment - The JSON is meant to be extensible: "Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data." from http://avro.apache.org/docs/current/spec.html#schemas So no changes are required to other implementations if we start using a "java-annotations" property.
        Hide
        George Fletcher added a comment -

        So I believe I've got code working against 1.6.0-SNAPSHOT (revision 1170735) to handle doc strings for protocol and message entities. I'll submit a patch.

        I ran into a couple interesting issues that are probably undiscovered bugs in the existing implementations.

        1. The SpecificCompiler.java javaEscape() method is currently escaping double-quotes even if they are already escaped in the string. I changed the replace() call with replaceAll("([\\\\])\"", "$1\\\\\"").replaceFirst("\"", "\\\\\"") which basically will replace a double-quote with \" as long as it is not already escaped.

        2. JSON doc strings need newlines to be escaped. If a doc string was multi-line and contained newlines, the JSON parser complained. I fixed these by escaping the newlines when ever generated the JSON objects (e.g. toJson()).

        Show
        George Fletcher added a comment - So I believe I've got code working against 1.6.0-SNAPSHOT (revision 1170735) to handle doc strings for protocol and message entities. I'll submit a patch. I ran into a couple interesting issues that are probably undiscovered bugs in the existing implementations. 1. The SpecificCompiler.java javaEscape() method is currently escaping double-quotes even if they are already escaped in the string. I changed the replace() call with replaceAll("([ \\\\])\"", "$1\\\\\"").replaceFirst(" \"", "\\\\\"") which basically will replace a double-quote with \" as long as it is not already escaped. 2. JSON doc strings need newlines to be escaped. If a doc string was multi-line and contained newlines, the JSON parser complained. I fixed these by escaping the newlines when ever generated the JSON objects (e.g. toJson()).
        Hide
        George Fletcher added a comment -

        Did not attache the patch

        Show
        George Fletcher added a comment - Did not attache the patch
        Hide
        George Fletcher added a comment -

        Patch to avro 1.6.0 that improves the IDL parser to capture doc strings for both protocols and messages.

        Show
        George Fletcher added a comment - Patch to avro 1.6.0 that improves the IDL parser to capture doc strings for both protocols and messages.
        Hide
        George Fletcher added a comment -

        Attached file avro-doc.patch to this jira ticket

        Show
        George Fletcher added a comment - Attached file avro-doc.patch to this jira ticket
        Hide
        George Fletcher added a comment -

        Minor update to the patch that restores newlines in the multi-line doc strings when written by the velocity templates.

        Show
        George Fletcher added a comment - Minor update to the patch that restores newlines in the multi-line doc strings when written by the velocity templates.
        Hide
        George Fletcher added a comment -

        Not sure how to delete attached files, so if interested in the patch, please use avro-doc-v2.patch to pick up all the latest tweaks. Thanks.

        Show
        George Fletcher added a comment - Not sure how to delete attached files, so if interested in the patch, please use avro-doc-v2.patch to pick up all the latest tweaks. Thanks.
        Hide
        Doug Cutting added a comment -

        > JSON doc strings need newlines to be escaped.

        Jackson does not usually emit JSON it cannot parse. Here's a patch that adds a test of parsing newlines in a doc string. The modified test passes for me. Where are you seeing failures?

        Show
        Doug Cutting added a comment - > JSON doc strings need newlines to be escaped. Jackson does not usually emit JSON it cannot parse. Here's a patch that adds a test of parsing newlines in a doc string. The modified test passes for me. Where are you seeing failures?
        Hide
        George Fletcher added a comment -

        Right and that is the problem I ran into with multi-line comments in the IDL. In the patch you provided, the newline (\n) is escaped as in
        n. As long as the JSON string has the newlines escaped everything is fine. The problem that I found is that the IDL parsing logic does NOT escape the newlines in a multi-line comment. This is true for enum, fixed and record schemas.

        The code generation phased worked fine, but when it came time to compile the generated java code, the Protocol.parse() method failed because the "doc" string in the JSON structure did not have the newlines properly escaped.

        For example, try adding the following multi-line comment to the avroTrace.avdl file.

        /**

        • An individual span is the basic unit of testing.
        • The record is used by both client and server.
          */
          record Span {

        The some of the tests in the ipc module will fail because the doc string contains an un-escaped newline. I've uploaded the simple patch for the avroTrace.avdl file.

        Hence in my patch, I explicitly escape any newlines in the doc strings when generating the JSON format, and convert them back to newlines when "getting" the doc (protocol.getDoc()) so that comments in the generated code look much better.

        Show
        George Fletcher added a comment - Right and that is the problem I ran into with multi-line comments in the IDL. In the patch you provided, the newline (\n) is escaped as in n. As long as the JSON string has the newlines escaped everything is fine. The problem that I found is that the IDL parsing logic does NOT escape the newlines in a multi-line comment. This is true for enum, fixed and record schemas. The code generation phased worked fine, but when it came time to compile the generated java code, the Protocol.parse() method failed because the "doc" string in the JSON structure did not have the newlines properly escaped. For example, try adding the following multi-line comment to the avroTrace.avdl file. /** An individual span is the basic unit of testing. The record is used by both client and server. */ record Span { The some of the tests in the ipc module will fail because the doc string contains an un-escaped newline. I've uploaded the simple patch for the avroTrace.avdl file. Hence in my patch, I explicitly escape any newlines in the doc strings when generating the JSON format, and convert them back to newlines when "getting" the doc (protocol.getDoc()) so that comments in the generated code look much better.
        Hide
        George Fletcher added a comment -

        Argh... looks like I need wiki help The first sentence of my last comment should read...

        Right and that is the problem I ran into with multi-line comments in the IDL. In the patch you provided, the newline (\n) is escaped as in

        \\n
        Show
        George Fletcher added a comment - Argh... looks like I need wiki help The first sentence of my last comment should read... Right and that is the problem I ran into with multi-line comments in the IDL. In the patch you provided, the newline (\n) is escaped as in \\n
        Hide
        George Fletcher added a comment -

        For good measure, if you make the comment of the Span record in the avroTrace.avdl file look like this

            /**
             * An individual span is the basic unit of testing.
             * The record is used by both \"client\" and \"server\".
             */
            record Span {
        

        you will trigger the other bug I ran into while adding doc support for Protocols and Messages. Bascially, the SpecificCompiler.java in it's javaEscape() method doesn't ignore already escaped double-quotes and hence replaces

        \" with \\"

        which causes an improperly escaped string in the generated java sources. In the case the code doesn't even compile. The resulting generated avroTrace.java file contains this statement.

        public static final org.apache.avro.Protocol PROTOCOL = org.apache.avro.Protocol.parse("{\"protocol\":\"AvroTrace\",\"namespace\":\"org.apache.avro.ipc.trace\",\"types\":[{\"type\":\"enum\",\"name\":\"SpanEvent\",\"symbols\":[\"SERVER_RECV\",\"SERVER_SEND\",\"CLIENT_RECV\",\"CLIENT_SEND\"]},{\"type\":\"fixed\",\"name\":\"ID\",\"size\":8},{\"type\":\"record\",\"name\":\"TimestampedEvent\",\"fields\":[{\"name\":\"timeStamp\",\"type\":\"long\"},{\"name\":\"event\",\"type\":[\"SpanEvent\",\"string\"]}]},{\"type\":\"record\",\"name\":\"Span\",\"doc\":\"* An individual span is the basic unit of testing.\n   * The record is used by both \\\\"client\\\\" and \\\\"server\\\\".\",\"fields\":[{\"name\":\"traceID\",\"type\":\"ID\"},{\"name\":\"spanID\",\"type\":\"ID\"},{\"name\":\"parentSpanID\",\"type\":[\"ID\",\"null\"]},{\"name\":\"messageName\",\"type\":\"string\"},{\"name\":\"requestPayloadSize\",\"type\":\"long\"},{\"name\":\"responsePayloadSize\",\"type\":\"long\"},{\"name\":\"requestorHostname\",\"type\":[\"string\",\"null\"]},{\"name\":\"responderHostname\",\"type\":[\"string\",\"null\"]},{\"name\":\"events\",\"type\":{\"type\":\"array\",\"items\":\"TimestampedEvent\"}},{\"name\":\"complete\",\"type\":\"boolean\"}]}],\"messages\":{\"getAllSpans\":{\"request\":[],\"response\":{\"type\":\"array\",\"items\":\"Span\"}},\"getSpansInRange\":{\"request\":[{\"name\":\"start\",\"type\":\"long\"},{\"name\":\"end\",\"type\":\"long\"}],\"response\":{\"type\":\"array\",\"items\":\"Span\"}}}}");
        

        Scroll until you find the "doc" string for the "Span" record. I used a fresh checkout of 1.6.0 for this test that did NOT contain my patch.

        I'm sorry that my initial comments weren't very clear.

        Show
        George Fletcher added a comment - For good measure, if you make the comment of the Span record in the avroTrace.avdl file look like this /** * An individual span is the basic unit of testing. * The record is used by both \"client\" and \"server\". */ record Span { you will trigger the other bug I ran into while adding doc support for Protocols and Messages. Bascially, the SpecificCompiler.java in it's javaEscape() method doesn't ignore already escaped double-quotes and hence replaces \" with \\" which causes an improperly escaped string in the generated java sources. In the case the code doesn't even compile. The resulting generated avroTrace.java file contains this statement. public static final org.apache.avro.Protocol PROTOCOL = org.apache.avro.Protocol.parse("{\"protocol\":\"AvroTrace\",\"namespace\":\"org.apache.avro.ipc.trace\",\"types\":[{\"type\":\"enum\",\"name\":\"SpanEvent\",\"symbols\":[\"SERVER_RECV\",\"SERVER_SEND\",\"CLIENT_RECV\",\"CLIENT_SEND\"]},{\"type\":\"fixed\",\"name\":\"ID\",\"size\":8},{\"type\":\"record\",\"name\":\"TimestampedEvent\",\"fields\":[{\"name\":\"timeStamp\",\"type\":\"long\"},{\"name\":\"event\",\"type\":[\"SpanEvent\",\"string\"]}]},{\"type\":\"record\",\"name\":\"Span\",\"doc\":\"* An individual span is the basic unit of testing.\n * The record is used by both \\\\"client\\\\" and \\\\"server\\\\".\",\"fields\":[{\"name\":\"traceID\",\"type\":\"ID\"},{\"name\":\"spanID\",\"type\":\"ID\"},{\"name\":\"parentSpanID\",\"type\":[\"ID\",\"null\"]},{\"name\":\"messageName\",\"type\":\"string\"},{\"name\":\"requestPayloadSize\",\"type\":\"long\"},{\"name\":\"responsePayloadSize\",\"type\":\"long\"},{\"name\":\"requestorHostname\",\"type\":[\"string\",\"null\"]},{\"name\":\"responderHostname\",\"type\":[\"string\",\"null\"]},{\"name\":\"events\",\"type\":{\"type\":\"array\",\"items\":\"TimestampedEvent\"}},{\"name\":\"complete\",\"type\":\"boolean\"}]}],\"messages\":{\"getAllSpans\":{\"request\":[],\"response\":{\"type\":\"array\",\"items\":\"Span\"}},\"getSpansInRange\":{\"request\":[{\"name\":\"start\",\"type\":\"long\"},{\"name\":\"end\",\"type\":\"long\"}],\"response\":{\"type\":\"array\",\"items\":\"Span\"}}}}"); Scroll until you find the "doc" string for the "Span" record. I used a fresh checkout of 1.6.0 for this test that did NOT contain my patch. I'm sorry that my initial comments weren't very clear.
        Hide
        Doug Cutting added a comment -

        George: you're right, there's a bug in the escaping of schemas included in specific compiler output. I've attached a one-line change that fixes this. With this, I don't believe you'll need to make all the escape-related changes in your patch.

        Should I file a separate issue for this, or would you just like to include this change in your patch?

        Show
        Doug Cutting added a comment - George: you're right, there's a bug in the escaping of schemas included in specific compiler output. I've attached a one-line change that fixes this. With this, I don't believe you'll need to make all the escape-related changes in your patch. Should I file a separate issue for this, or would you just like to include this change in your patch?
        Hide
        George Fletcher added a comment -

        I'm happy to update my patch to include this simplification.

        Show
        George Fletcher added a comment - I'm happy to update my patch to include this simplification.
        Hide
        George Fletcher added a comment -

        Simplified JSON string escaping logic (thanks Doug!)

        Show
        George Fletcher added a comment - Simplified JSON string escaping logic (thanks Doug!)
        Hide
        George Fletcher added a comment -

        Ok, new patch file (avro-doc-v3.patch) uploaded.

        Show
        George Fletcher added a comment - Ok, new patch file (avro-doc-v3.patch) uploaded.
        Hide
        George Fletcher added a comment -

        This time with ASF license grant

        Show
        George Fletcher added a comment - This time with ASF license grant
        Hide
        Doug Cutting added a comment -

        This is looking great. A few nits:

        • The changes to output/unicode.avpr were garbled.
        • There were some whitespace/indendation changes in Protocol.java that weren't needed.

        I've attached a new version of a patch that fixes these details. I'll commit this tomorrow unless there are objections.

        Show
        Doug Cutting added a comment - This is looking great. A few nits: The changes to output/unicode.avpr were garbled. There were some whitespace/indendation changes in Protocol.java that weren't needed. I've attached a new version of a patch that fixes these details. I'll commit this tomorrow unless there are objections.
        Hide
        Doug Cutting added a comment -

        I committed this. Thanks, George!

        Show
        Doug Cutting added a comment - I committed this. Thanks, George!

          People

          • Assignee:
            George Fletcher
            Reporter:
            George Fletcher
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development