Apache Arrow / ARROW-14318

[Doc][Python] Error building dataset docs

Details

    Description

      I get this error when building the docs locally, even after removing what appear to be leftovers from previous doc builds:

      >>>-------------------------------------------------------------------------
      Exception in /home/antoine/arrow/dev/docs/source/python/dataset.rst at block ending on line 522
      Specify :okexcept: as an option in the ipython:: block to suppress this message
      ---------------------------------------------------------------------------
      ArrowInvalid                              Traceback (most recent call last)
      <ipython-input-58-affbef2c47b2> in <module>
      ----> 1 ds.write_dataset(table, dataset_root, format="parquet")
      
      ~/arrow/dev/python/pyarrow/dataset.py in write_dataset(data, base_dir, basename_template, format, partitioning, partitioning_flavor, schema, filesystem, file_options, use_threads, max_partitions, file_visitor)
          859         scanner = data
          860 
      --> 861     _filesystemdataset_write(
          862         scanner, base_dir, basename_template, filesystem, partitioning,
          863         file_options, max_partitions, file_visitor
      
      ~/arrow/dev/python/pyarrow/_dataset.pyx in pyarrow._dataset._filesystemdataset_write()
      
      ~/arrow/dev/python/pyarrow/error.pxi in pyarrow.lib.check_status()
      
      ArrowInvalid: Could not write to /tmp/sample_dataset as the directory is not empty and existing_data_behavior is to error
      /home/antoine/arrow/dev/cpp/src/arrow/dataset/dataset_writer.cc:508  EnsureDestinationValid(write_options)
      /home/antoine/arrow/dev/cpp/src/arrow/dataset/file_base.cc:424  internal::DatasetWriter::Make(write_options)
      /home/antoine/arrow/dev/cpp/src/arrow/compute/exec/exec_plan.cc:433  MakeExecNode(this->factory_name, plan, std::move(inputs), *this->options, registry)
      /home/antoine/arrow/dev/cpp/src/arrow/dataset/file_base.cc:395  compute::Declaration::Sequence( { {"scan", ScanNodeOptions{dataset, scanner->options()}}, {"filter", compute::FilterNodeOptions{scanner->options()->filter}}, {"project", compute::ProjectNodeOptions{std::move(exprs), std::move(names)}}, {"write", WriteNodeOptions{write_options, scanner->options()->projected_schema}}, }) .AddToPlan(plan.get())
      
      <<<-------------------------------------------------------------------------
      

    People

      Assignee: Joris Van den Bossche (jorisvandenbossche)
      Reporter: Antoine Pitrou (apitrou)
      Votes: 0
      Watchers: 3

    Time Tracking

      Estimated: Not Specified
      Remaining: 0h
      Logged: 5h 20m