Apache Avro / AVRO-3253

Speed up primitive type array creation

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • None
    • None
    • Component: c
    • Environment: Linux x86_64

    Description

      I want to speed up the array creation for primitive types.

      For example, when my array has 100 000 or more elements, the current interface for appending individual elements is not efficient: I have to loop over the elements and assign a value to each one individually. I would like to just memcpy() the source buffer contents into the Avro value instead.

      I've been looking at the C source code and found `test_data_structures.c`. The raw array functions looked like a good candidate to start hacking on. I did this in test_array():

      avro_raw_array_ensure_size(&array, count);
      void *ptr = avro_raw_array_get_raw(&array, 0);
      /* Copy only the bytes buf holds; allocated_size may exceed that. */
      memcpy(ptr, buf, count * array.element_size);
      array.element_count = count;
      

      With buf as the data source containing 1 000 000 longs, I see a 5x improvement in the time it takes to populate the array with this code. Is there a reason why such an approach would be bad?
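
      To make the comparison concrete, here is a minimal, self-contained sketch of the two fill strategies. It does not use the Avro C API: `raw_array_t` and the `raw_array_*` functions are hypothetical stand-ins that only mimic a growable raw buffer with `element_size`, `element_count` and `allocated_size` fields, and the doubling growth policy is an assumption. The bulk path sizes its copy by `count * element_size` rather than by the allocated size, so it never reads past the source buffer if the array has over-allocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable raw array; NOT the Avro C API. */
typedef struct {
    size_t element_size;    /* bytes per element */
    size_t element_count;   /* elements currently stored */
    size_t allocated_size;  /* bytes currently allocated */
    void  *data;
} raw_array_t;

static void raw_array_init(raw_array_t *a, size_t element_size)
{
    a->element_size = element_size;
    a->element_count = 0;
    a->allocated_size = 0;
    a->data = NULL;
}

static void raw_array_done(raw_array_t *a)
{
    free(a->data);
    raw_array_init(a, a->element_size);
}

/* Grow the buffer to hold at least `count` elements. Returns 0 on success. */
static int raw_array_ensure_size(raw_array_t *a, size_t count)
{
    if (a->element_size != 0 && count > SIZE_MAX / a->element_size)
        return -1;                      /* byte size would overflow */
    size_t required = count * a->element_size;
    if (a->allocated_size >= required)
        return 0;
    /* Assumed growth policy: double, or jump straight to the request. */
    size_t new_size = a->allocated_size ? a->allocated_size * 2
                                        : 16 * a->element_size;
    if (new_size < required)
        new_size = required;
    void *p = realloc(a->data, new_size);
    if (p == NULL)
        return -1;
    a->data = p;
    a->allocated_size = new_size;
    return 0;
}

/* Element-wise path: one capacity check and one small copy per value. */
static int raw_array_append_long(raw_array_t *a, int64_t value)
{
    assert(a->element_size == sizeof value);
    if (raw_array_ensure_size(a, a->element_count + 1) != 0)
        return -1;
    memcpy((char *)a->data + a->element_count * a->element_size,
           &value, sizeof value);
    a->element_count++;
    return 0;
}

/* Bulk path: one capacity check and one memcpy for the whole buffer.
 * Replaces any existing contents of the array. */
static int raw_array_fill(raw_array_t *a, const void *src, size_t count)
{
    assert(src != NULL || count == 0);
    if (raw_array_ensure_size(a, count) != 0)
        return -1;
    if (count > 0)
        memcpy(a->data, src, count * a->element_size);
    a->element_count = count;
    return 0;
}
```

      Both paths end with the same bytes in the array. The bulk path only works when the source buffer already has the in-memory layout the array expects (same element size and byte order), which holds for a plain array of longs on one machine.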

      I'm not sure how to use the resulting array yet; I might need to dig a bit deeper into the code.
      I might look into AVRO_GENERIC_ARRAY_CLASS instead and try to abuse the set_bytes/give_bytes methods, currently set to NULL, to provide the new interface.

          People

            Assignee: Unassigned
            Reporter: Hinko Kocevar (hinxx)