Apache Arrow / ARROW-12911

[Python] Export scalar aggregate options to pc.sum (sum of zero rows gives null; should give 0)

Details

    Description

      >>> import pyarrow as pa
      >>> import pyarrow.compute
      >>> pa.compute.sum(pa.array([], pa.int64()))
      <pyarrow.Int64Scalar: None>

      I'd expect 0.

      I can't think of any reason to return NULL, except that SQL does, and I can't figure out why SQL returns NULL either. Does anybody know? Every textbook definition, and https://en.wikipedia.org/wiki/Summation, gives 0 for the empty sum.
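
One way to see why 0 is the usual convention: per-chunk sums only add up to the sum of the whole when an empty chunk contributes 0. The sketch below uses Python's built-in `sum` rather than Arrow, purely as an illustration; the `values` and `chunks` names are made up for the example.

```python
# Split a list into chunks (one of them empty) and compare the
# sum of the per-chunk sums with the sum of the whole list.
values = [3, 1, 4, 1, 5]
chunks = [values[:2], [], values[2:]]  # the middle chunk is empty

partial_sums = [sum(chunk) for chunk in chunks]
print(partial_sums)       # [4, 0, 10]
print(sum(partial_sums))  # 14, same as sum(values)
```

If the empty chunk produced a null instead of 0, the partial sums could not be combined without special-casing it.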

      pandas and NumPy return 0. The Apache Arrow c_glib implementation also returns 0, and even tests for it: https://github.com/apache/arrow/blob/master/c_glib/test/test-int8-array.rb#L60

      The workaround is to replace all NULL results with 0 after running the computation.

            People

              yibocai Yibo Cai
              adamhooper Adam Hooper

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 20m