Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
Description
The nightly "kartothek" integration builds are failing.
More specifically, the test_update_dataset_from_ddf_empty test is failing with:
=================================== FAILURES ===================================
___________________ test_update_dataset_from_ddf_empty[True] ___________________

store_factory = functools.partial(<function get_store_from_url at 0x7f1434733050>, 'hfs:///tmp/pytest-of-root/pytest-0/test_update_dataset_from_ddf_e0/store')
shuffle = True

    @pytest.mark.parametrize("shuffle", [True, False])
    def test_update_dataset_from_ddf_empty(store_factory, shuffle):
        with pytest.raises(ValueError, match="Cannot store empty datasets"):
            update_dataset_from_ddf(
>               dask.dataframe.from_delayed([], meta=(("a", int),)),
                store_factory,
                dataset_uuid="output_dataset_uuid",
                table="core",
                shuffle=shuffle,
                partition_on=["a"],
            ).compute()

tests/io/dask/dataframe/test_update.py:57:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

dfs = [], meta = (('a', <class 'int'>),), divisions = None
prefix = 'from-delayed', verify_meta = True

    @insert_meta_param_description
    def from_delayed(
        dfs, meta=None, divisions=None, prefix="from-delayed", verify_meta=True
    ):
        """Create Dask DataFrame from many Dask Delayed objects

        Parameters
        ----------
        dfs : list of Delayed
            An iterable of ``dask.delayed.Delayed`` objects, such as come from
            ``dask.delayed`` These comprise the individual partitions of the
            resulting dataframe.
        $META
        divisions : tuple, str, optional
            Partition boundaries along the index.
            For tuple, see https://docs.dask.org/en/latest/dataframe-design.html#partitions
            For string 'sorted' will compute the delayed values to find index
            values. Assumes that the indexes are mutually sorted.
            If None, then won't use index information
        prefix : str, optional
            Prefix to prepend to the keys.
        verify_meta : bool, optional
            If True check that the partitions have consistent metadata, defaults to True.
        """
        from dask.delayed import Delayed

        if isinstance(dfs, Delayed):
            dfs = [dfs]
        dfs = [
            delayed(df) if not isinstance(df, Delayed) and hasattr(df, "key") else df
            for df in dfs
        ]
        for df in dfs:
            if not isinstance(df, Delayed):
                raise TypeError("Expected Delayed object, got %s" % type(df).__name__)

>       parent_meta = delayed(make_meta)(dfs[0]).compute()
E       IndexError: list index out of range

/opt/conda/envs/arrow/lib/python3.7/site-packages/dask/dataframe/io/io.py:591: IndexError
(from https://github.com/ursacomputing/crossbow/runs/2756067090)
I am not sure whether this is a kartothek issue or a pyarrow issue, so I also created an issue on their side: https://github.com/JDASoftwareGroup/kartothek/issues/475
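For reference, the dask call the test exercises can be isolated outside of kartothek. This is a minimal reproduction sketch; the behaviour described in the comments is taken from the traceback above and assumes the dask release used in the nightly build (around dask 2021.6):

# Minimal reproduction sketch of the failing call from the traceback above.
# Assumes a dask release where from_delayed() derives parent_meta from the
# first delayed partition (dfs[0]) and therefore raises IndexError on an
# empty list before kartothek can raise its "Cannot store empty datasets"
# ValueError.
import dask.dataframe

try:
    ddf = dask.dataframe.from_delayed([], meta=(("a", int),))
    print("from_delayed accepted the empty list:", ddf)
except IndexError as exc:
    # On the affected dask version this prints: list index out of range
    print("IndexError from dask:", exc)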
Issue Links
- duplicates: ARROW-12977 [CI] [Python] Error: Cannot store empty datasets (Closed)
- links to