Avro / AVRO-464

Rework internals of records and schemas for greater performance

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 1.5.2, 1.6.0
    • Component/s: c
    • Labels:
      None

      Description

      As far as I can tell, there's no need for the integer->string hash tables in schemas or records ... they can be arrays instead.
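      To illustrate the idea, here is a minimal sketch of keeping field names in a dense, growable array indexed by position rather than in an integer->string hash table. The `field_list` type and its functions are hypothetical names made up for this example; they are not part of the Avro C API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: field names stored densely, so the name of
 * the field at position i is simply names[i]. */
typedef struct {
    char **names;     /* owned copies of the field names */
    size_t count;     /* number of slots in use */
    size_t capacity;  /* number of slots allocated */
} field_list;

void field_list_init(field_list *fl)
{
    fl->names = NULL;
    fl->count = 0;
    fl->capacity = 0;
}

/* Appends a copy of name. Returns the new field's index, or -1 if
 * memory allocation fails. */
long field_list_append(field_list *fl, const char *name)
{
    assert(fl != NULL && name != NULL);
    if (fl->count == fl->capacity) {
        size_t new_cap = fl->capacity ? fl->capacity * 2 : 4;
        char **grown = realloc(fl->names, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        fl->names = grown;
        fl->capacity = new_cap;
    }
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);
    fl->names[fl->count] = copy;
    return (long)fl->count++;
}

/* Constant-time lookup by position; NULL when index is out of range. */
const char *field_list_get(const field_list *fl, size_t index)
{
    return index < fl->count ? fl->names[index] : NULL;
}

void field_list_free(field_list *fl)
{
    for (size_t i = 0; i < fl->count; i++)
        free(fl->names[i]);
    free(fl->names);
    field_list_init(fl);
}
```

      Because entries are only ever appended, every index from 0 to `count - 1` is occupied, so lookup needs no hashing or probing.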

        Activity

        Douglas Creager added a comment -

        The generic value implementation (AVRO-837) is much more compact than the old (and now deprecated) datum implementation, so I'm going to mark this as won't-fix. The generic value implementation is also quite a bit faster — tests/performance is showing it an order of magnitude faster.

        Doug Cutting added a comment -

        What's the status of this issue? There was a commit against it, but it's still open.

        Bruce Mitchener added a comment -

        This ended up including a first pass at the atom table stuff as well. Going to check in the current iteration of this shortly...

        Bruce Mitchener added a comment -

        I am broadening the scope of this. I have some larger changes underway and it will be easier as a single patch.

        The initial changes reduced the time to serialize the same record object 10,000,000 times, without validation, from 4.2 to 3.7 seconds. I think we can do better though.

        Bruce Mitchener added a comment -

        But right now, the code iterates from 0 to the number of entries, so that still would not work. My thought was to progressively rework this from its current form into an open-addressed hash table, which would be more efficient and could still support sparse entries in the future. I think we can get rid of a fair bit of overhead in the current implementation.

        I will have an experimental patch soon...
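        For reference, an open-addressed hash table keyed by integer index could look roughly like the sketch below. It uses linear probing and doubles in size at 50% load. All names (`int_map`, `int_map_put`, and so on) are hypothetical and chosen for illustration; this is not the patch discussed here, and removal is left out because it would need tombstones or re-insertion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int64_t key;
    void *value;
    int used;          /* 0 = empty slot, 1 = occupied */
} slot;

typedef struct {
    slot *slots;
    size_t capacity;   /* always a power of two */
    size_t count;      /* number of occupied slots */
} int_map;

/* capacity_pow2 must be a non-zero power of two. Returns 0 on success. */
int int_map_init(int_map *m, size_t capacity_pow2)
{
    assert(capacity_pow2 && (capacity_pow2 & (capacity_pow2 - 1)) == 0);
    m->slots = calloc(capacity_pow2, sizeof *m->slots);
    m->capacity = capacity_pow2;
    m->count = 0;
    return m->slots ? 0 : -1;
}

/* Multiplicative hash, masked down to a slot index. */
static size_t int_map_hash(int64_t key, size_t capacity)
{
    uint64_t h = (uint64_t)key * 0x9E3779B97F4A7C15ULL;
    return (size_t)(h >> 32) & (capacity - 1);
}

/* Linear probing: returns the slot holding key, or the first empty slot
 * on its probe path. Terminates because the table is never full. */
static slot *int_map_probe(const int_map *m, int64_t key)
{
    size_t i = int_map_hash(key, m->capacity);
    while (m->slots[i].used && m->slots[i].key != key)
        i = (i + 1) & (m->capacity - 1);
    return &m->slots[i];
}

/* Doubles the table and re-inserts every occupied slot. */
static int int_map_grow(int_map *m)
{
    int_map bigger;
    if (int_map_init(&bigger, m->capacity * 2) != 0)
        return -1;
    for (size_t i = 0; i < m->capacity; i++) {
        if (m->slots[i].used) {
            *int_map_probe(&bigger, m->slots[i].key) = m->slots[i];
            bigger.count++;
        }
    }
    free(m->slots);
    *m = bigger;
    return 0;
}

/* Inserts or overwrites. Keys may be sparse. Returns 0 on success. */
int int_map_put(int_map *m, int64_t key, void *value)
{
    if ((m->count + 1) * 2 > m->capacity && int_map_grow(m) != 0)
        return -1;
    slot *s = int_map_probe(m, key);
    if (!s->used) {
        s->used = 1;
        s->key = key;
        m->count++;
    }
    s->value = value;
    return 0;
}

/* Returns the stored value, or NULL when key is absent. */
void *int_map_get(const int_map *m, int64_t key)
{
    slot *s = int_map_probe(m, key);
    return s->used ? s->value : NULL;
}
```

        Unlike a dense array, this layout has no requirement that keys run contiguously from 0, which is what would allow sparse entries. The trade-off is that iteration has to skip empty slots.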

        Matt Massie added a comment -

        That's correct given the current API that only allows you to append elements.

        My thought was that we might want to support sparse arrays in the future and the current integer hash tables easily support that.


          People

          • Assignee: Douglas Creager
          • Reporter: Bruce Mitchener
          • Votes: 0
          • Watchers: 1
