Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: c
    • Labels:
      None

      Description

      Embedded JSON Parser for C

      • Re-entrant and thread safe
      • Supports multiple parsers running at the same time
      • Uses APR memory pools for memory management
      • No dependencies on yacc/bison/flex/etc
      • Creates a light-weight DOM of JSON text
      • Completely written by me so there's no licensing issues (using the public domain lemon code generator)
      1. AVRO-58.patch
        194 kB
        Matt Massie

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Patch Available Patch Available
        3m 56s 1 Matt Massie 20/Jun/09 03:46
        Patch Available Patch Available Resolved Resolved
        3d 17h 32m 1 Doug Cutting 23/Jun/09 21:18
        Resolved Resolved Closed Closed
        21d 2h 32m 1 Doug Cutting 14/Jul/09 23:50
        Doug Cutting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Matt Massie made changes -
        Attachment AVRO-60.patch [ 12413469 ]
        Matt Massie made changes -
        Attachment AVRO-60.patch [ 12413469 ]
        Doug Cutting made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Assignee Matt Massie [ massie ]
        Fix Version/s 1.0.0 [ 12313858 ]
        Resolution Fixed [ 1 ]
        Hide
        Doug Cutting added a comment -

        I just committed this. Thanks, Matt!

        Show
        Doug Cutting added a comment - I just committed this. Thanks, Matt!
        Hide
        Matt Massie added a comment -

        Created AVRO-60 assigned to myself to make sure this issue is tracked (shouldn't take more than an hour or so).

        I was suggesting that it should print it to a string, then read the original file into a string [ ... ]

        Good idea. Right now, the JSON printer (JSON_print()) is used solely for debugging.

        There are many things I could add to the JSON parser but I wanted to keep it light for now and let my AVRO schema work inform where I focus my JSON work. When I start working on the JSON import/export function I'm going to replace JSON_print() with code that uses my I/O abstraction anyway (so that I can write JSON to files, sockets or disk) instead of using a FILE pointer. I'm sure I'll be adding many more JSON unit tests as development moves forward.

        Show
        Matt Massie added a comment - Created AVRO-60 assigned to myself to make sure this issue is tracked (shouldn't take more than an hour or so). I was suggesting that it should print it to a string, then read the original file into a string [ ... ] Good idea. Right now, the JSON printer (JSON_print()) is used solely for debugging. There are many things I could add to the JSON parser but I wanted to keep it light for now and let my AVRO schema work inform where I focus my JSON work. When I start working on the JSON import/export function I'm going to replace JSON_print() with code that uses my I/O abstraction anyway (so that I can write JSON to files, sockets or disk) instead of using a FILE pointer. I'm sure I'll be adding many more JSON unit tests as development moves forward.
        Hide
        Doug Cutting added a comment -

        > I planned to include the fix in subsequent patches [ ... ]

        That's fine. Maybe file the issue in Jira now, assigned to you and depending on this issue?

        > I found it too verbose during the unit test [ ... ]

        I was suggesting that it should print it to a string, then read the original file into a string, and compare the strings for equality. Yes, you'd need to make sure that your test JSON files use the same whitespace convention as your printer for this to work, but that should not be too hard. The point is to not just test the parser but also to test that the printer generates the correct, parseable output.

        Show
        Doug Cutting added a comment - > I planned to include the fix in subsequent patches [ ... ] That's fine. Maybe file the issue in Jira now, assigned to you and depending on this issue? > I found it too verbose during the unit test [ ... ] I was suggesting that it should print it to a string, then read the original file into a string, and compare the strings for equality. Yes, you'd need to make sure that your test JSON files use the same whitespace convention as your printer for this to work, but that should not be too hard. The point is to not just test the parser but also to test that the printer generates the correct, parseable output.
        Hide
        Matt Massie added a comment -

        It doesn't look like this generates or parses string escapes correctly yet. That's important for default values.

        I have a TODO in my tokenizer to handle string escapes fully. I planned to include the fix in subsequent patches; however, if you like, I can resubmit this patch with string escapes fully implemented. I'm currently working on the AVRO DOM code and planned to add JSON updates with that patch but I'll context switch if necessary. I submitted the JSON code to prevent submitting a jumbo patch.

        The unit tests would be stronger if they didn't just parse but also printed the parsed value and compared it to the input.

        If you look at line 95 of test_json_parser.c you'll see that the line

        /* JSON_print (stderr, value); */
        

        which pretty prints the JSON DOM. I found it too verbose during the unit test but it was helpful for debugging.

        In the future, I plan to update the unit test to output the column and line of the syntax error. It would also be straight-forward to add a function for comparing JSON DOMs for equivalence but comparing text with text isn't as useful because of whitespace.

        Show
        Matt Massie added a comment - It doesn't look like this generates or parses string escapes correctly yet. That's important for default values. I have a TODO in my tokenizer to handle string escapes fully. I planned to include the fix in subsequent patches; however, if you like, I can resubmit this patch with string escapes fully implemented. I'm currently working on the AVRO DOM code and planned to add JSON updates with that patch but I'll context switch if necessary. I submitted the JSON code to prevent submitting a jumbo patch. The unit tests would be stronger if they didn't just parse but also printed the parsed value and compared it to the input. If you look at line 95 of test_json_parser.c you'll see that the line /* JSON_print (stderr, value); */ which pretty prints the JSON DOM. I found it too verbose during the unit test but it was helpful for debugging. In the future, I plan to update the unit test to output the column and line of the syntax error. It would also be straight-forward to add a function for comparing JSON DOMs for equivalence but comparing text with text isn't as useful because of whitespace.
        Hide
        Doug Cutting added a comment -

        Overall this looks good to me. A few comments:

        • It doesn't look like this generates or parses string escapes correctly yet. That's important for default values.
        • The unit tests would be stronger if they didn't just parse but also printed the parsed value and compared it to the input.
        Show
        Doug Cutting added a comment - Overall this looks good to me. A few comments: It doesn't look like this generates or parses string escapes correctly yet. That's important for default values. The unit tests would be stronger if they didn't just parse but also printed the parsed value and compared it to the input.
        Hide
        Matt Massie added a comment -

        Please let me know if there is any difficulty applying this patch. While I did generate this patch using git, it is just a standard patch which can be applied using the "patch" command like so...

        % patch -p1 < AVRO-58.patch

        The -p1 parameter to patch just tells it to strip the prefix "a/" and "b/" from the patch. Subversion generated patches are -p0. In the future, I'll make sure my patches are -p0 to prevent this confusion. I'm happy to reformat the patch; I haven't submitted a reformatted patch yet in order to avoid having two functionally identical patches.

        Show
        Matt Massie added a comment - Please let me know if there is any difficulty applying this patch. While I did generate this patch using git, it is just a standard patch which can be applied using the "patch" command like so... % patch -p1 < AVRO-58 .patch The -p1 parameter to patch just tells it to strip the prefix "a/" and "b/" from the patch. Subversion generated patches are -p0. In the future, I'll make sure my patches are -p0 to prevent this confusion. I'm happy to reformat the patch; I haven't submitted a reformatted patch yet in order to avoid having two functionally identical patches.
        Hide
        Scott Banachowski added a comment -

        Sounds good. Will investigate using your parser for future revisions, thanks Matt.

        By the way, I did not have a version of flex/bison with "pure" option, so I found another way to get re-entrancy is the c++ option. With that, it puts all the context into a class instead of static vars, so you can create multiple independent instances (which is the approach I used, of course that option doesn't help in the C case).

        Show
        Scott Banachowski added a comment - Sounds good. Will investigate using your parser for future revisions, thanks Matt. By the way, I did not have a version of flex/bison with "pure" option, so I found another way to get re-entrancy is the c++ option. With that, it puts all the context into a class instead of static vars, so you can create multiple independent instances (which is the approach I used, of course that option doesn't help in the C case).
        Hide
        Matt Massie added a comment -

        Yes, it will work with C++. I forgot to put the

        extern "C" {

        declaration in but once that's done I see no reason for this parser to not work in a C++ application (although I haven't tested it). The reason I didn't use bison/flex is that I wanted the parser to be re-entrant, thread-safe and embeddable since we need to support long-running apps that are processing Avro data from sockets, files and disk simultaneously. Newer versions of bison/flex have a %pure parser option (if I remember correctly) but then you have to manage bison/flex version dependencies.

        You could test the parser pretty easily by linking to libavro and including the json.h header file. You'll need to add the

        extern "C" {

        declaration. There is no documentation on the JSON parser (for now) but all the public methods you would need are in "json.h". The test_json_parser unit test should be helpful as well. I'm happy to answer any questions or help with code if you need it. If you like, I could even do C++ tests myself.

        Show
        Matt Massie added a comment - Yes, it will work with C++. I forgot to put the extern "C" { declaration in but once that's done I see no reason for this parser to not work in a C++ application (although I haven't tested it). The reason I didn't use bison/flex is that I wanted the parser to be re-entrant, thread-safe and embeddable since we need to support long-running apps that are processing Avro data from sockets, files and disk simultaneously. Newer versions of bison/flex have a %pure parser option (if I remember correctly) but then you have to manage bison/flex version dependencies. You could test the parser pretty easily by linking to libavro and including the json.h header file. You'll need to add the extern "C" { declaration. There is no documentation on the JSON parser (for now) but all the public methods you would need are in "json.h". The test_json_parser unit test should be helpful as well. I'm happy to answer any questions or help with code if you need it. If you like, I could even do C++ tests myself.
        Hide
        Scott Banachowski added a comment -

        Will this parser be able to work for the C++ version as well? Currently C++ is using bison/flex, but we should leverage the same parser for both if possible.

        Show
        Scott Banachowski added a comment - Will this parser be able to work for the C++ version as well? Currently C++ is using bison/flex, but we should leverage the same parser for both if possible.
        Matt Massie made changes -
        Description Embedded JSON Parser for C

        * Re-entrant and thread safe
        * Supports multiple parsers running at the same time
        * Uses APR memory pools for memory management
        * No dependencies on yacc/bison/flex/etc
        * Creates a light-weight DOM of JSON text
        * Completely written by me so there's no licensing issues (using the public domain lemon parser)


        Embedded JSON Parser for C

        * Re-entrant and thread safe
        * Supports multiple parsers running at the same time
        * Uses APR memory pools for memory management
        * No dependencies on yacc/bison/flex/etc
        * Creates a light-weight DOM of JSON text
        * Completely written by me so there's no licensing issues (using the public domain lemon code generator)


        Hide
        Matt Massie added a comment -

        I'm sorry for the confusion.

        It will apply by running

        % patch -p1 < AVRO-58.patch

        Show
        Matt Massie added a comment - I'm sorry for the confusion. It will apply by running % patch -p1 < AVRO-58 .patch
        Hide
        Thiruvalluvan M. G. added a comment -

        The patch seems to be git-diff rather than the usual svn-diff. How does apply this patch?

        Show
        Thiruvalluvan M. G. added a comment - The patch seems to be git-diff rather than the usual svn-diff. How does apply this patch?
        Matt Massie made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Matt Massie added a comment -

        JSON Parser for C

        Show
        Matt Massie added a comment - JSON Parser for C
        Matt Massie made changes -
        Field Original Value New Value
        Attachment AVRO-58.patch [ 12411297 ]
        Hide
        Matt Massie added a comment -

        This patch contains the code for a JSON parser in C. This parser creates a light-weight DOM which will be used for processing AVRO metadata and JSON import/export. There are a number of unit tests for the code and adding new JSON tests is a easy as dropping a new test into a directory.

        Show
        Matt Massie added a comment - This patch contains the code for a JSON parser in C. This parser creates a light-weight DOM which will be used for processing AVRO metadata and JSON import/export. There are a number of unit tests for the code and adding new JSON tests is a easy as dropping a new test into a directory.
        Matt Massie created issue -

          People

          • Assignee:
            Matt Massie
            Reporter:
            Matt Massie
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 1h
              1h
              Remaining:
              Remaining Estimate - 1h
              1h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development