Beam / BEAM-6356

Python FileBasedCacheManager does not respect PCoder for PCollection being cached


    Details

    • Type: Improvement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: examples-python
    • Labels:
      None

      Description

      FileBasedCacheManager, used by Python's InteractiveRunner, does not preserve the PCoder of a PCollection's elements when it caches them on disk. I suggest changing the on-disk cache format to TFRecords (which Beam supports) and having FileBasedCacheManager store the desired PCoder for each cached collection.
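A minimal sketch of the proposal, assuming the standard TFRecord framing: a little-endian uint64 length, a masked CRC-32C of the length, the payload, then a masked CRC-32C of the payload. Beam's Python SDK already exposes this format through `beam.io.WriteToTFRecord` and `beam.io.ReadFromTFRecord`, both of which take a `coder` argument; the pure-Python code below avoids a Beam dependency so it runs on its own. `CoderAwareCache`, `Utf8Coder`, `write_records` and `read_records` are hypothetical names for illustration, not part of FileBasedCacheManager's actual API. The cache remembers one coder per cached label and uses it both to encode elements on write and to decode them on read, so no pickling of elements is involved.

```python
import os
import struct
import tempfile
from typing import Any, BinaryIO, Dict, Iterable, Iterator, List

# CRC-32C (Castagnoli) lookup table; TFRecord checksums use CRC-32C, not zlib's CRC-32.
_CRC32C_TABLE: List[int] = []
for _i in range(256):
    _c = _i
    for _ in range(8):
        _c = (_c >> 1) ^ 0x82F63B78 if _c & 1 else _c >> 1
    _CRC32C_TABLE.append(_c)


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # TFRecord stores a rotated-and-offset ("masked") CRC-32C.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(f: BinaryIO, payloads: Iterable[bytes]) -> None:
    # Each record: uint64 length | masked CRC of length | payload | masked CRC of payload.
    for data in payloads:
        header = struct.pack("<Q", len(data))
        f.write(header)
        f.write(struct.pack("<I", masked_crc(header)))
        f.write(data)
        f.write(struct.pack("<I", masked_crc(data)))


def read_records(f: BinaryIO) -> Iterator[bytes]:
    while True:
        header = f.read(8)
        if not header:
            return  # clean end of file
        (length,) = struct.unpack("<Q", header)
        (header_crc,) = struct.unpack("<I", f.read(4))
        if header_crc != masked_crc(header):
            raise ValueError("corrupt record length")
        data = f.read(length)
        (data_crc,) = struct.unpack("<I", f.read(4))
        if data_crc != masked_crc(data):
            raise ValueError("corrupt record payload")
        yield data


class Utf8Coder:
    """Hypothetical stand-in for a Beam coder: str <-> UTF-8 bytes."""

    def encode(self, value: str) -> bytes:
        return value.encode("utf-8")

    def decode(self, data: bytes) -> str:
        return data.decode("utf-8")


class CoderAwareCache:
    """Hypothetical cache that remembers which coder each cached collection uses."""

    def __init__(self, directory: str) -> None:
        self._directory = directory
        self._coders: Dict[str, Any] = {}

    def _path(self, label: str) -> str:
        return os.path.join(self._directory, label + ".tfrecord")

    def write(self, label: str, elements: Iterable[Any], coder: Any) -> None:
        self._coders[label] = coder  # keep the coder together with the cached data
        with open(self._path(label), "wb") as f:
            write_records(f, (coder.encode(e) for e in elements))

    def read(self, label: str) -> List[Any]:
        coder = self._coders[label]  # decode with the same coder, no pickling
        with open(self._path(label), "rb") as f:
            return [coder.decode(r) for r in read_records(f)]


with tempfile.TemporaryDirectory() as tmp:
    cache = CoderAwareCache(tmp)
    cache.write("words", ["héllo", "beam"], Utf8Coder())
    print(cache.read("words"))  # ['héllo', 'beam']
```

Keeping the coder in memory per label is the simplest choice for a sketch; a cache that must be reopened later would also need to persist the coder information next to the record files.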
      Currently, dynamically generated protocol buffer messages cannot be used in InteractiveRunner mode because of pickling errors.
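The issue does not include a traceback, so the sketch below only illustrates one general mechanism by which runtime-generated classes defeat pickling, using a plain class as a hypothetical stand-in for a dynamically generated protobuf message. Python's `pickle` stores a class by reference (module plus qualified name), so an instance of a class that cannot be looked up under that name fails to pickle. A coder that is constructed with the class itself, in the way Beam's `ProtoCoder` is constructed with a message type, only needs to write field data. `make_message_class` and `TypedJsonCoder` are hypothetical; Beam's SDK also uses `dill` in places, which may behave differently from plain `pickle` here.

```python
import json
import pickle


def make_message_class(name: str) -> type:
    # Hypothetical stand-in for a runtime-generated protobuf message class:
    # it is created on the fly and never bound to an importable name.
    return type(name, (), {"__module__": __name__})


DynamicMsg = make_message_class("DynamicMsg_v1")
msg = DynamicMsg()
msg.text = "hi"

# pickle saves the class by reference (module + qualified name); the lookup of
# "DynamicMsg_v1" fails, so pickling the instance raises PicklingError.
pickle_error = None
try:
    pickle.dumps(msg)
except pickle.PicklingError as err:
    pickle_error = err
print(type(pickle_error).__name__)  # PicklingError


class TypedJsonCoder:
    """Hypothetical coder that holds the class and writes only field values."""

    def __init__(self, cls: type) -> None:
        self._cls = cls

    def encode(self, value) -> bytes:
        return json.dumps(vars(value), sort_keys=True).encode("utf-8")

    def decode(self, data: bytes):
        obj = self._cls.__new__(self._cls)  # the coder already knows the class
        obj.__dict__.update(json.loads(data))
        return obj


coder = TypedJsonCoder(DynamicMsg)
encoded = coder.encode(msg)
print(encoded)  # b'{"text": "hi"}'
restored = coder.decode(encoded)
print(type(restored) is DynamicMsg, restored.text)  # True hi
```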


            People

            • Assignee:
              Unassigned
              Reporter:
              leontyev_google Hennadiy Leontyev
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated:
                168h
                Remaining:
                163h 50m
                Logged:
                4h 10m