Beam / BEAM-12367

SeriesGroupBy corr and cov do not raise the expected error at pipeline construction time

Details

    Description

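      SeriesGroupBy.corr (and likewise cov) should raise an error at pipeline construction time, because it needs a second Series passed as the 'other' argument. In pandas, calling it without that argument fails immediately: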

      In [4]: df.groupby('A').B.corr()
      ---------------------------------------------------------------------------
      TypeError                                 Traceback (most recent call last)
      <ipython-input-4-d760b6077290> in <module>
      ----> 1 df.groupby('A').B.corr()
      
      ~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in wrapper(*args, **kwargs)
          815                 return self.apply(curried)
          816 
      --> 817             return self._python_apply_general(curried, self._obj_with_exclusions)
          818 
          819         wrapper.__name__ = name
      
      ~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in _python_apply_general(self, f, data)
          926             data after applying f
          927         """
      --> 928         keys, values, mutated = self.grouper.apply(f, data, self.axis)
          929 
          930         return self._wrap_applied_output(
      
      ~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/ops.py in apply(self, f, data, axis)
          236             # group might be modified
          237             group_axes = group.axes
      --> 238             res = f(group)
          239             if not _is_indexed_like(res, group_axes, axis):
          240                 mutated = True
      
      ~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in curried(x)
          804 
          805             def curried(x):
      --> 806                 return f(x, *args, **kwargs)
          807 
          808             # preserve the name so we can detect it when calling plot methods,
      
      TypeError: corr() missing 1 required positional argument: 'other'
      

      However, pandas does not raise this error when the same call is made on an empty dataset (possibly an upstream bug). Proxy generation therefore does not raise it either, and the call does not fail until the pipeline is running.
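The difference between the two cases can be checked directly in pandas. The following is a minimal sketch, not Beam code: the helper name `corr_error` and the sample data are made up for illustration, and the empty frame only stands in for a proxy object. The result for the empty frame is printed rather than asserted, since it appears to depend on the installed pandas version.

```python
import pandas as pd


def corr_error(df):
    """Call SeriesGroupBy.corr() without `other`; return the exception raised, or None."""
    try:
        df.groupby('A').B.corr()
    except Exception as exc:  # broad on purpose: we only want to report what happened
        return exc
    return None


# Non-empty frame: pandas rejects the call because `other` is missing.
df = pd.DataFrame({'A': ['x', 'x', 'y'], 'B': [1.0, 2.0, 3.0]})
non_empty_error = corr_error(df)

# Empty frame with the same columns (illustrative stand-in for a proxy).
# Whether this raises may vary between pandas versions.
empty = df.iloc[:0]
empty_error = corr_error(empty)

print('non-empty:', type(non_empty_error).__name__)
print('empty:    ', type(empty_error).__name__ if empty_error else 'no error')
```

If the second line prints "no error", the installed pandas version shows the behavior described above, and a check that relies only on running the operation against empty data would miss the problem.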

          People

            Assignee: Unassigned
            Reporter: Brian Hulette (bhulette)