BEAM-12531

ib.show does not handle deferred dataframe instances

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.31.0
    • Fix Version/s: 2.32.0
    • Component/s: dsl-dataframe
    • Labels: None

      Description

      When passed a deferred dataframe instance, e.g. ib.show(counts.nlargest(20, keep='all')), ib.show tries to iterate over it (see iter(pcoll_container) in the traceback). That ends up calling len() on the frame, which raises a WontImplementError:

      ---------------------------------------------------------------------------
      WontImplementError                        Traceback (most recent call last)
      <ipython-input-9-56c2dd81898d> in <module>
      ----> 1 ib.show(counts.nlargest(20, keep='all'))
      
      2 frames
      /usr/local/lib/python3.7/dist-packages/apache_beam/runners/interactive/utils.py in run_within_progress_indicator(*args, **kwargs)
          245   def run_within_progress_indicator(*args, **kwargs):
          246     with ProgressIndicator('Processing...', 'Done.'):
      --> 247       return func(*args, **kwargs)
          248 
          249   return run_within_progress_indicator
      
      /usr/local/lib/python3.7/dist-packages/apache_beam/runners/interactive/interactive_beam.py in show(include_window_info, visualize_data, n, duration, *pcolls)
          441     else:
          442       try:
      --> 443         flatten_pcolls.extend(iter(pcoll_container))
          444       except TypeError:
          445         raise ValueError(
      
      /usr/local/lib/python3.7/dist-packages/apache_beam/dataframe/frames.py in __len__(self)
          695         "len(df) is not currently supported because it produces a non-deferred "
          696         "result. Consider using df.length() instead.",
      --> 697         reason="non-deferred-result")
          698 
          699   @property  # type: ignore
      
      WontImplementError: len(df) is not currently supported because it produces a non-deferred result. Consider using df.length() instead.
      For more information see https://s.apache.org/dataframe-non-deferred-result.
      

      We should support this case, or at least fail gracefully.
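
      One way to read "support this case, or at least fail gracefully" is to check for a deferred frame before trying to iterate over the argument. The sketch below is self-contained and does not import Beam: WontImplementError and FakeDeferredFrame are stand-ins, and flatten_inputs and its convert_deferred parameter are hypothetical names, not the code in interactive_beam.py and not necessarily how the fix was implemented. In real Beam code the converter might be something like apache_beam.dataframe.convert.to_pcollection, but that is an assumption.

```python
class WontImplementError(NotImplementedError):
    """Stand-in for Beam's error of the same name (not imported from Beam)."""


class FakeDeferredFrame:
    """Hypothetical stand-in for a deferred dataframe: len() is refused."""

    def __init__(self, name):
        self.name = name

    def __len__(self):
        raise WontImplementError("len(df) is not currently supported")

    def __iter__(self):
        # Mimics the reported failure mode: iterating ends up asking for len().
        return iter(range(len(self)))


def flatten_inputs(*containers, convert_deferred=None):
    """Flatten show()-style arguments, checking for deferred frames first.

    convert_deferred is an optional callable (an assumption of this sketch)
    that turns a deferred frame into something that can be shown.
    """
    flat = []
    for container in containers:
        if isinstance(container, FakeDeferredFrame):
            # Handle deferred frames explicitly instead of iterating over them.
            if convert_deferred is None:
                raise ValueError(
                    f"Deferred dataframe {container.name!r} cannot be shown "
                    "directly; convert it to a PCollection first.")
            flat.append(convert_deferred(container))
            continue
        try:
            # Lists, tuples, and other iterables are expanded in place.
            flat.extend(iter(container))
        except TypeError:
            # Non-iterable arguments are kept as single items.
            flat.append(container)
    return flat
```

      With a converter supplied, the deferred frame is converted rather than iterated; without one, the caller gets a ValueError that names the offending argument instead of the WontImplementError raised from deep inside len().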

              People

              • Assignee: Sam Rohde (rohdesam)
              • Reporter: Brian Hulette (bhulette)

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 6h