Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 1.7.0
    • Component/s: c++
    • Labels:

      Description

      This patch adds support for reading and resolving schemas that use namespaces for named types. Looking for a code review and application of the patch to codebase.

      Node is modified to take an additional NamespaceConcept attribute in the template parameters. An additional stack has been added to the compiler context to track when namespaces have been entered or left for resolving further types/symbols. Symbolic resolution is now done against the "fullname" of the type and not just by the name. Some string gymnastics were needed in other areas of codegen in order to handle the new symbols.

      Added very trivial tests to AvrogencppTests for schema generation by avrogencpp and added a test schema (tweet) that has namespaces, a record definition inside a namespace, and then a later symbolic reference by name within an outer namespace. Also patched to work with gen-cppcode.py output. Note that NodeImpl::printBasicInfo intentionally does not output the namespace since that caused downstream breakage of gen-cppcode.py expected format.

      github project was forked at https://github.com/spindlelabs/avro

      1. AVRO-1026.patch
        32 kB
        Keh-Li Sheng
      2. AVRO-1026_json.patch
        1 kB
        Keh-Li Sheng
      3. AVRO-1026_with_AVRO-956.patch
        38 kB
        Keh-Li Sheng
      4. AVRO-1026-2.patch
        33 kB
        Thiruvalluvan M. G.

        Issue Links

          Activity

          Hide
          Steve Roehrs added a comment -

          This change has introduced a bug in NodeImpl.cc that corrupts JSON schemas when writing them to disk or dumping to a stream. This manifested when using DataFileWriter to generate a data file, and the schema stored in the header was corrupted from the original schema provided.
          e.g

          This:

          { “type”: “fixed”, “name”: “Unsigned16”, “size”: 2 }

          Becomes this when written out :

          { “type”: “fixed”, “size”: 2, “name” : “Unsigned16”, }

          Note the extraneous comma after the ‘name’ attribute, which makes it invalid JSON, and results in the data file being unreadable by AVRO.

          Since this change appears closed, I will raise a new ticket for the bug.

          Show
          Steve Roehrs added a comment - This change has introduced a bug in NodeImpl.cc that corrupts JSON schemas when writing them to disk or dumping to a stream. This manifested when using DataFileWriter to generate a data file, and the schema stored in the header was corrupted from the original schema provided. e.g This: { “type”: “fixed”, “name”: “Unsigned16”, “size”: 2 } Becomes this when written out : { “type”: “fixed”, “size”: 2, “name” : “Unsigned16”, } Note the extraneous comma after the ‘name’ attribute, which makes it invalid JSON, and results in the data file being unreadable by AVRO. Since this change appears closed, I will raise a new ticket for the bug.
          Hide
          Keh-Li Sheng added a comment -

          Thiruvalluvan M. G. - thanks for the patch! Looking forward to it landing in 1.7.0

          Show
          Keh-Li Sheng added a comment - Thiruvalluvan M. G. - thanks for the patch! Looking forward to it landing in 1.7.0
          Hide
          Thiruvalluvan M. G. added a comment -

          Committed revision 1340768.

          Show
          Thiruvalluvan M. G. added a comment - Committed revision 1340768.
          Hide
          Thiruvalluvan M. G. added a comment -

          Verified that the new patch works on:

          • Windows 7 (Visual C++ Express 2010)
          • Cygwin on Windows 7
          • Ubuntu 12.04
          Show
          Thiruvalluvan M. G. added a comment - Verified that the new patch works on: Windows 7 (Visual C++ Express 2010) Cygwin on Windows 7 Ubuntu 12.04
          Hide
          Thiruvalluvan M. G. added a comment -

          Here is a modified version of Keh-Li Sheng's patch, where instead of introducing a new template parameter, I've changed the NameConcept to use a new class Name instead of std::string.

          I think the namespace belongs to the NameConcept instead of its own concept. Hence this change.

          The original tests pass on Windows 7 and Cygwin.

          The code generator does not currently encode the namespace into the C++ types it generates. It should not be a problem in most cases; but if there is a name clash the generated code will not compile because of duplicate type definitions. I tried to encode the namespace into the names of the generated types (by replacing dot with underscore). It results in too many long names, which would be too tiresome for the users to use. We have to solve it in some other way.

          Show
          Thiruvalluvan M. G. added a comment - Here is a modified version of Keh-Li Sheng's patch, where instead of introducing a new template parameter, I've changed the NameConcept to use a new class Name instead of std::string. I think the namespace belongs to the NameConcept instead of its own concept. Hence this change. The original tests pass on Windows 7 and Cygwin. The code generator does not currently encode the namespace into the C++ types it generates. It should not be a problem in most cases; but if there is a name clash the generated code will not compile because of duplicate type definitions. I tried to encode the namespace into the names of the generated types (by replacing dot with underscore). It results in too many long names, which would be too tiresome for the users to use. We have to solve it in some other way.
          Hide
          Keh-Li Sheng added a comment -

          Adding submit patch tag for more visibility. Previously commented on by @thiru_mg but nothing has happened since.

          Show
          Keh-Li Sheng added a comment - Adding submit patch tag for more visibility. Previously commented on by @thiru_mg but nothing has happened since.
          Hide
          Keh-Li Sheng added a comment -

          @thiru_mg: I merged latest the latest upstream trunk and reapplied the changes from the last patch. Please try AVRO-1026withAVRO-956.patch on your trunk.

          I've also thought a bit more about your suggestion to push name into namespace and I could go either way again. In some cases it's nice to know if there's a namespace but in others it is a burden to inspect for it.

          Show
          Keh-Li Sheng added a comment - @thiru_mg: I merged latest the latest upstream trunk and reapplied the changes from the last patch. Please try AVRO-1026 with AVRO-956 .patch on your trunk. I've also thought a bit more about your suggestion to push name into namespace and I could go either way again. In some cases it's nice to know if there's a namespace but in others it is a burden to inspect for it.
          Hide
          Keh-Li Sheng added a comment -

          I forked from github and didn't update to the latest trunk before creating the patch. I'm resolving the merge now and will see if I can produce a clean patch.

          I debated this a bit when implementing it and I chose in favor of keeping name/namespace/fullname as separate concepts since there were a lot of downstream effects to pushing namespaces into the name field - primarily in avrogencpp and gen-cppcode.py where name is used quite frequently as a field name or class/struct specifier (in which case the '.' in namespaces would be an invalid character). That meant additional logic would be required to strip the name of its fully qualified path, whereas it is more straightforward to know how to compose the full path when needed.

          Additionally, if there was a future desire to "nest" declarations within C++ namespaces that tracked the schema namespace, having a separate namespace attribute would make that easier as well.

          I'm not quite sure I understand what you mean by "backward compatibility" since this change does not require namespace attributes from existing schemas (all the existing schemas in jsonschemas seem to parse and validate and tests run the same as before), it merely knows what to do with them when it encounters them.

          Show
          Keh-Li Sheng added a comment - I forked from github and didn't update to the latest trunk before creating the patch. I'm resolving the merge now and will see if I can produce a clean patch. I debated this a bit when implementing it and I chose in favor of keeping name/namespace/fullname as separate concepts since there were a lot of downstream effects to pushing namespaces into the name field - primarily in avrogencpp and gen-cppcode.py where name is used quite frequently as a field name or class/struct specifier (in which case the '.' in namespaces would be an invalid character). That meant additional logic would be required to strip the name of its fully qualified path, whereas it is more straightforward to know how to compose the full path when needed. Additionally, if there was a future desire to "nest" declarations within C++ namespaces that tracked the schema namespace, having a separate namespace attribute would make that easier as well. I'm not quite sure I understand what you mean by "backward compatibility" since this change does not require namespace attributes from existing schemas (all the existing schemas in jsonschemas seem to parse and validate and tests run the same as before), it merely knows what to do with them when it encounters them.
          Hide
          Thiruvalluvan M. G. added a comment -

          The patch does not apply cleanly on the current trunk. I think this patch was generated before AVRO-956. We have made two important changes in the recent past to get rid of Flex/Bison dependency and to make it compile natively on Windows using Visual Studio.

          I've not studied this patch deep. So my question may be naive.

          Is it possible to make the name attribute include the namespace as well instead of adding a new attribute? If the namespace is not given, the name will fall back to the simple name as it is now. If namespace is explicitly given or implied, it will have a fully qualified name. That way backward compatibility with existing schemas and code will be maintained.

          Show
          Thiruvalluvan M. G. added a comment - The patch does not apply cleanly on the current trunk. I think this patch was generated before AVRO-956 . We have made two important changes in the recent past to get rid of Flex/Bison dependency and to make it compile natively on Windows using Visual Studio. I've not studied this patch deep. So my question may be naive. Is it possible to make the name attribute include the namespace as well instead of adding a new attribute? If the namespace is not given, the name will fall back to the simple name as it is now. If namespace is explicitly given or implied, it will have a fully qualified name. That way backward compatibility with existing schemas and code will be maintained.
          Hide
          Keh-Li Sheng added a comment -

          Adding a patch for JsonCodec.cc to understand namespace/fullname symbolic references inside a union type (ie. the JSON field name is the type name).

          Contains within it a typo patch from https://issues.apache.org/jira/browse/AVRO-954

          Show
          Keh-Li Sheng added a comment - Adding a patch for JsonCodec.cc to understand namespace/fullname symbolic references inside a union type (ie. the JSON field name is the type name). Contains within it a typo patch from https://issues.apache.org/jira/browse/AVRO-954
          Hide
          Keh-Li Sheng added a comment -

          Parallels improvement to C implementation

          Show
          Keh-Li Sheng added a comment - Parallels improvement to C implementation
          Hide
          Keh-Li Sheng added a comment -

          Attaching patch

          Show
          Keh-Li Sheng added a comment - Attaching patch

            People

            • Assignee:
              Keh-Li Sheng
              Reporter:
              Keh-Li Sheng
            • Votes:
              2 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development