Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.4.3
- Fix Version/s: None
- Component/s: None
Description
After importing pyspark, cloudpickle is no longer able to properly serialize objects that inherit from collections.namedtuple: all other base classes are dropped, so subsequent isinstance checks against them fail.
Here's a minimal reproduction of the issue:
import collections
import cloudpickle
import pyspark

class A(object):
    pass

class B(object):
    pass

class C(A, B, collections.namedtuple('C', ['field'])):
    pass

c = C(1)

def print_bases(obj):
    bases = obj.__class__.__bases__
    for base in bases:
        print(base)

print('original objects:')
print_bases(c)

print('\ncloudpickled objects:')
c2 = cloudpickle.loads(cloudpickle.dumps(c))
print_bases(c2)
This prints:
original objects:
<class '__main__.A'>
<class '__main__.B'>
<class 'collections.C'>
cloudpickled objects:
<class 'tuple'>
This effectively drops all of the other base types. The issue appears to be caused by PySpark's _hijack_namedtuple function, which replaces the class generated by collections.namedtuple with a different one.
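A minimal stdlib-only sketch (independent of PySpark internals) of why replacing the generated class loses the other bases:

```python
import collections

class A(object):
    pass

# The original class mixes A into a namedtuple subclass.
class C(A, collections.namedtuple('C', ['field'])):
    pass

# If serialization substitutes a freshly generated namedtuple for C
# (as a _hijack_namedtuple-style rewrite effectively does), the
# rebuilt class knows nothing about A.
Rebuilt = collections.namedtuple('C', ['field'])

assert A in C.__mro__            # the original class keeps A
assert A not in Rebuilt.__mro__  # the rebuilt class has lost it
assert not isinstance(Rebuilt(1), A)
```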
Note that I can work around this issue by setting collections.namedtuple.__hijack = 1 before importing pyspark, so I feel pretty confident this is what's causing the issue.
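For reference, a sketch of that workaround (the attribute name is taken from this report; pyspark itself is not imported here):

```python
import collections

# Per the report, setting this flag before importing pyspark causes
# its _hijack_namedtuple patching to be skipped. Functions accept
# arbitrary attributes, so this is a plain attribute assignment.
collections.namedtuple.__hijack = 1

# import pyspark  # must happen only after the flag is set
```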
This issue comes up when working with TensorFlow feature columns, which derive from collections.namedtuple among other classes.
Cloudpickle itself supports serializing collections.namedtuple subclasses, but doesn't appear to need to replace the class. Possibly PySpark could take a similar approach?
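One possible shape for such an approach, sketched with a hypothetical helper (_rebuild_namedtuple_subclass is not a real PySpark or cloudpickle API): record the field names together with the other bases, and rebuild the class with its full MRO instead of substituting a bare namedtuple.

```python
import collections

def _rebuild_namedtuple_subclass(name, extra_bases, fields):
    """Hypothetical reducer target: recreate a namedtuple subclass
    while keeping its non-namedtuple bases intact."""
    nt = collections.namedtuple(name, fields)
    return type(name, tuple(extra_bases) + (nt,), {})

class A(object):
    pass

# Simulate unpickling: rebuild 'C' from its recorded pieces.
C2 = _rebuild_namedtuple_subclass('C', [A], ['field'])
c2 = C2(1)

assert isinstance(c2, A)  # the extra base survives
assert c2.field == 1      # namedtuple behavior survives
```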
Attachments
Issue Links
- duplicates
  - SPARK-32079 PySpark <> Beam pickling issues for collections.namedtuple (Resolved)
- relates to
  - SPARK-22674 PySpark breaks serialization of namedtuple subclasses (Resolved)