Apache Arrow / ARROW-5274

[JavaScript] Wrong array type for countBy

Details

    Description

      The countBy function does not return correct histograms. It seems to select an array type that is too small to hold the counts.

      The following line in countBy seems to be causing the problems:

      const countByteLength = Math.ceil(Math.log(vector.dictionary.length) / Math.log(256));

      For example, if the dictionary length is 3 but the indices length is 1 million, this expression evaluates to 1, so a Uint8Array is used for the counts. Any count above 255 then overflows.
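The overflow can be reproduced without Arrow. The following is a minimal sketch, not the library code: the helper name `countByteLength` and the data (every row pointing at dictionary entry 0) are made up for illustration.

```typescript
// Byte width as computed by the expression quoted above.
// The function name is hypothetical; only the formula comes from the report.
function countByteLength(n: number): number {
  return Math.ceil(Math.log(n) / Math.log(256));
}

const dictionaryLength = 3;
const indicesLength = 1000000;

// log(3) / log(256) is about 0.198, so the width rounds up to 1 byte.
const width = countByteLength(dictionaryLength);

// A 1-byte width means a Uint8Array, which holds at most 255 per slot.
const counts = new Uint8Array(dictionaryLength);

// Hypothetical data: all one million rows refer to dictionary entry 0.
for (let i = 0; i < indicesLength; i++) {
  counts[0]++; // wraps around modulo 256 instead of throwing
}

console.log(width, counts[0]); // the count is 1000000 mod 256, not 1000000
```

The count silently wraps, so the histogram looks plausible but is wrong.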

      Codepen example
      https://codepen.io/Yngve92/pen/mYdWrr

      If I switch the expression to use the vector length instead:

      const countByteLength = Math.ceil(Math.log(vector.length) / Math.log(256));

      it seems to work correctly, but I am not sure whether this is the right fix.
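The idea behind that change is that the largest possible count is bounded by the number of rows, not by the number of dictionary entries. Below is a rough sketch of that idea, assuming plain typed arrays rather than Arrow vectors. `makeCounts` and `countByIndices` are hypothetical names, and the sketch uses explicit range checks in place of the logarithm, which is a substitution and not necessarily what the PR does.

```typescript
type CountsArray = Uint8Array | Uint16Array | Uint32Array;

// Hypothetical helper: pick the narrowest unsigned array whose slots
// can hold `maxCount` without wrapping.
function makeCounts(size: number, maxCount: number): CountsArray {
  if (maxCount <= 0xff) return new Uint8Array(size);
  if (maxCount <= 0xffff) return new Uint16Array(size);
  return new Uint32Array(size);
}

// Simplified stand-in for countBy: `indices` are dictionary keys, one per row.
function countByIndices(indices: ArrayLike<number>, dictionaryLength: number): CountsArray {
  // A single key can occur at most once per row, so the row count is the bound.
  const counts = makeCounts(dictionaryLength, indices.length);
  for (let i = 0; i < indices.length; i++) {
    counts[indices[i]]++;
  }
  return counts;
}

// Same hypothetical data as in the failing case: one million rows, all key 0.
const rowKeys = new Uint8Array(1000000);
const histogram = countByIndices(rowKeys, 3);
console.log(histogram[0]); // no wrap-around this time
```

Sizing by range check also covers the boundary where the row count is exactly 256, a case in which a one-byte array could still overflow.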

      The expression appears on lines 63 and 189 of src/compute/dataframe.ts.

       

      PR submitted: https://github.com/apache/arrow/pull/4265 

            People

              yngve-sk Yngve Kristiansen

                Time Tracking

                  Estimated: 5m
                  Remaining: 0h
                  Logged: 1h