Spark / SPARK-47202

AttributeError: module 'pandas' has no attribute 'Timstamp'


Details

    Description

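      When a pyspark.sql.types.TimestampType value is a datetime.datetime object with a tzinfo, conversion fails with an AttributeError. The cause is a typo in convert_timestamp (pyspark/sql/pandas/types.py), which calls pd.Timstamp instead of pd.Timestamp.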

       

      I believe this commit introduced the bug 9 months ago.

       

      Full stack trace below:

       

        File "/databricks/spark/python/pyspark/worker.py", line 1490, in main
          process()
        File "/databricks/spark/python/pyspark/worker.py", line 1482, in process
          serializer.dump_stream(out_iter, outfile)
        File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 531, in dump_stream
          return ArrowStreamSerializer.dump_stream(
        File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 107, in dump_stream
          for batch in iterator:
        File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 525, in init_stream_yield_batches
          batch = self._create_batch(series)
        File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 511, in _create_batch
          arrs.append(self._create_array(s, t, arrow_cast=self._arrow_cast))
        File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 284, in _create_array
          series = conv(series)
        File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1060, in <lambda>
          return lambda pser: pser.apply( # type: ignore[return-value]
        File "/databricks/python/lib/python3.10/site-packages/pandas/core/series.py", line 4771, in apply
          return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
        File "/databricks/python/lib/python3.10/site-packages/pandas/core/apply.py", line 1123, in apply
          return self.apply_standard()
        File "/databricks/python/lib/python3.10/site-packages/pandas/core/apply.py", line 1174, in apply_standard
          mapped = lib.map_infer(
        File "pandas/_libs/lib.pyx", line 2924, in pandas._libs.lib.map_infer
        File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1061, in <lambda>
          lambda x: conv(x) if x is not None else None # type: ignore[misc]
        File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 889, in convert_array
          return [
        File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 890, in <listcomp>
          _element_conv(v) if v is not None else None # type: ignore[misc]
        File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1010, in convert_struct
          return {
        File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1011, in <dictcomp>
          name: conv(v) if conv is not None and v is not None else v
        File "/databricks/spark/python/pyspark/sql/pandas/types.py", line 1032, in convert_timestamp
          ts = pd.Timstamp(value)
        File "/databricks/python/lib/python3.10/site-packages/pandas/__init__.py", line 264, in __getattr__
          raise AttributeError(f"module 'pandas' has no attribute '{name}'")
      AttributeError: module 'pandas' has no attribute 'Timstamp'
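The failing frame is convert_timestamp, where pd.Timstamp(value) should read pd.Timestamp(value). The following is a minimal sketch, not PySpark's code, showing that the misspelled name raises AttributeError while the correct call handles a tz-aware datetime.datetime. The function name convert_timestamp_sketch is hypothetical. Its behavior of normalizing aware values to naive UTC is also an assumption made only for illustration.

```python
import datetime

import pandas as pd


def convert_timestamp_sketch(value):
    # Hypothetical, simplified converter; NOT PySpark's implementation.
    # Assumption: tz-aware values are normalized to naive UTC timestamps.
    if value is None:
        return None
    # Correct spelling: pd.Timestamp (pd.Timstamp does not exist).
    ts = pd.Timestamp(value)
    if ts.tzinfo is not None:
        # Convert to UTC, then drop the timezone to get a naive timestamp.
        ts = ts.tz_convert("UTC").tz_localize(None)
    return ts


# Looking up the misspelled name raises AttributeError, as in the trace.
try:
    getattr(pd, "Timstamp")
except AttributeError as exc:
    print(exc)

# A tz-aware datetime.datetime (UTC+2) converts without error.
aware = datetime.datetime(
    2024, 2, 28, 14, 0, tzinfo=datetime.timezone(datetime.timedelta(hours=2))
)
print(convert_timestamp_sketch(aware))  # 2024-02-28 12:00:00
```

Only the one-character spelling fix matters for this issue. The UTC normalization step is one plausible way to handle the tzinfo and may differ from what PySpark actually does.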
       

       

          People

            arzavjain Arzav Jain
            Votes: 0
            Watchers: 2

              Time Tracking

                Estimated: 0.5h
                Remaining: 0.5h
                Logged: Not Specified