Beam / BEAM-12495

DataFrame API: groupby(dropna=False) still drops NAs when grouping on multiple columns or indexes


    Details

      Description

      df.groupby(['foo', 'bar'], dropna=False).sum()
      

      Rows whose 'foo' or 'bar' value is NA are still dropped from the output, as if dropna=False had not been passed.
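      For reference, the following sketch shows what plain pandas (1.1 or later, where groupby gained the dropna parameter) returns when grouping on multiple columns. The sample data is hypothetical; 'foo' and 'bar' are the grouping keys from the snippet above, each with one missing value.

```python
import pandas as pd

# Hypothetical sample data: both grouping keys contain a missing value
# (pandas treats None in an object column as NA).
df = pd.DataFrame({
    "foo": ["a", "a", None],
    "bar": ["x", None, "y"],
    "value": [1, 2, 3],
})

# Default (dropna=True): every row with an NA key is excluded.
dropped = df.groupby(["foo", "bar"]).sum()

# dropna=False: NA keys form their own groups, so all three rows survive.
kept = df.groupby(["foo", "bar"], dropna=False).sum()

print(dropped)
print(kept)
```

      The dropna=False result is the behavior the Beam DataFrame API would be expected to match.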

      This is due to pandas bug 36470 "BUG: groupby(..., dropna=False) excludes NA values when grouping on MultiIndex levels".

      We implement groupby by moving all grouping keys into the index and requiring Index() partitioning, so the grouping always happens on index levels. As a result we always hit this pandas bug, even when the user groups on columns rather than index levels.
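      A simplified sketch of the index-based strategy in plain pandas. The helper groupby_via_index is hypothetical and is not Beam's actual implementation; it only mirrors the shape described here: move the grouping columns into the index, then group on those index levels. Whether the NA groups survive on this path depends on the installed pandas version, so no row count is claimed for it.

```python
import pandas as pd


def groupby_via_index(df, keys):
    """Hypothetical model of an index-based groupby (not Beam's code).

    The grouping columns are moved into the index, and the grouping is
    then done on those index levels instead of on columns.
    """
    indexed = df.set_index(keys)
    # Grouping on index levels is the code path covered by pandas issue
    # 36470: in affected pandas versions dropna=False is not honored here,
    # and rows with NA keys are dropped.
    return indexed.groupby(level=keys, dropna=False).sum()


df = pd.DataFrame({
    "foo": ["a", "a", None],
    "bar": ["x", None, "y"],
    "value": [1, 2, 3],
})

# Grouping on columns keeps the NA groups (pandas >= 1.1).
via_columns = df.groupby(["foo", "bar"], dropna=False).sum()

# Grouping on index levels: the row count may be 1 or 3 depending on
# whether the installed pandas version is affected by the bug.
via_index = groupby_via_index(df, ["foo", "bar"])

print(len(via_columns), len(via_index))
```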

              People

              • Assignee: Unassigned
              • Reporter: Brian Hulette (bhulette)
              • Votes: 0
              • Watchers: 2

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 2h 10m