Apache Arrow / ARROW-17217

[Docs][Python] Building documentation requires pandas

Details

    Description

      The documentation build instructions direct users to install the packages listed in conda_env_sphinx.txt, but that file does not include pandas. Without pandas installed, the build fails with the following error:

      (test-nightlies) todd@pop-os:~/arrow/docs$ make html
      sphinx-build -b html -d _build/doctrees  -j8 source _build/html
      Running Sphinx v5.1.0
      [autosummary] generating autosummary for: c_glib/index.rst, cpp/api.rst, cpp/api/array.rst, cpp/api/async.rst, cpp/api/builder.rst, cpp/api/c_abi.rst, cpp/api/compute.rst, cpp/api/cuda.rst, cpp/api/dataset.rst, cpp/api/datatype.rst, ..., python/json.rst, python/memory.rst, python/numpy.rst, python/orc.rst, python/pandas.rst, python/parquet.rst, python/plasma.rst, python/timestamps.rst, r/index.rst, status.rst
      loading intersphinx inventory from https://docs.python.org/3/objects.inv...
      loading intersphinx inventory from https://numpy.org/doc/stable/objects.inv...
      loading intersphinx inventory from https://pandas.pydata.org/docs/objects.inv...
      building [mo]: targets for 0 po files that are out of date
      building [html]: targets for 842 source files that are out of date
      updating environment: [new config] 842 added, 0 changed, 0 removed
      reading sources... [  7%] cpp/examples/compute_and_write_example .. developers/cpp/fuzzing
      Sphinx parallel build error:
      RuntimeError: Non Expected exception in `/home/todd/arrow/docs/source/python/pandas.rst` line 38
      make: *** [Makefile:81: html] Error 2
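
The error above does not name the missing package. It only points at python/pandas.rst, which presumably runs pandas code while Sphinx reads it. As a minimal sketch (not part of the Arrow build), a pre-check like the following could confirm that the needed modules are importable in the active environment before running make html. The helper name missing_modules and the list of modules checked are illustrative assumptions.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list of modules the docs build needs at import time.
    missing = missing_modules(["sphinx", "pandas"])
    if missing:
        print("Missing before docs build:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

Running this in the environment from the report would be expected to flag pandas before the fix, assuming pandas was not already installed by other means.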

      After adding pandas to conda_env_sphinx.txt and reinstalling the packages from that file, the build succeeds:

      (test-nightlies) todd@pop-os:~/arrow/docs$ conda install -c conda-forge --file ../ci/conda_env_sphinx.txt 
      Collecting package metadata (current_repodata.json): done
      Solving environment: done
      ## Package Plan ##
        environment location: /home/todd/miniconda3/envs/test-nightlies
        added / updated specs:
          - breathe
          - doxygen
          - ipython
          - numpydoc
          - pandas
          - pydata-sphinx-theme==0.8
          - pytest-cython
          - sphinx-copybutton
          - sphinx-design
          - sphinx[version='>=4.2']
      The following NEW packages will be INSTALLED:
        pandas             conda-forge/linux-64::pandas-1.4.3-py39h1832856_0
        python-dateutil    conda-forge/noarch::python-dateutil-2.8.2-pyhd8ed1ab_0
      Proceed ([y]/n)? y
      Preparing transaction: done
      Verifying transaction: done
      Executing transaction: done
      (test-nightlies) todd@pop-os:~/arrow/docs$ make html
      sphinx-build -b html -d _build/doctrees  -j8 source _build/html
      Running Sphinx v5.1.0
      [autosummary] generating autosummary for: c_glib/index.rst, cpp/api.rst, cpp/api/array.rst, cpp/api/async.rst, cpp/api/builder.rst, cpp/api/c_abi.rst, cpp/api/compute.rst, cpp/api/cuda.rst, cpp/api/dataset.rst, cpp/api/datatype.rst, ..., python/json.rst, python/memory.rst, python/numpy.rst, python/orc.rst, python/pandas.rst, python/parquet.rst, python/plasma.rst, python/timestamps.rst, r/index.rst, status.rst
      loading intersphinx inventory from https://docs.python.org/3/objects.inv...
      loading intersphinx inventory from https://numpy.org/doc/stable/objects.inv...
      loading intersphinx inventory from https://pandas.pydata.org/docs/objects.inv...
      building [mo]: targets for 0 po files that are out of date
      building [html]: targets for 842 source files that are out of date
      updating environment: [new config] 842 added, 0 changed, 0 removed
      reading sources... [  7%] cpp/examples/compute_and_write_example .. developers
      reading sources... [ 18%] python/api/flight .. python/generated/pyarrow.Date32
      reading sources... [ 22%] python/generated/pyarrow.Date32Scalar .. python/gene
      reading sources... [ 25%] python/generated/pyarrow.HadoopFileSystem.get_space_
      reading sources... [ 29%] python/generated/pyarrow.MonthDayNanoIntervalArray .
      reading sources... [ 33%] python/generated/pyarrow.TimestampType .. python/gen
      reading sources... [ 37%] python/generated/pyarrow.compute.MatchSubstringOptio
      reading sources... [ 40%] python/generated/pyarrow.compute.all .. python/gener
      reading sources... [ 44%] python/generated/pyarrow.compute.ascii_trim_whitespa
      reading sources... [ 48%] python/generated/pyarrow.compute.day_of_week .. pyth
      reading sources... [ 51%] python/generated/pyarrow.compute.is_null .. python/g
      reading sources... [ 55%] python/generated/pyarrow.compute.milliseconds_betwee
      reading sources... [ 59%] python/generated/pyarrow.compute.second .. python/ge
      reading sources... [ 62%] python/generated/pyarrow.compute.us_week .. python/g
      reading sources... [ 66%] python/generated/pyarrow.compute.variance .. python/
      reading sources... [ 70%] python/generated/pyarrow.dataset.CsvFileFormat .. py
      reading sources... [ 74%] python/generated/pyarrow.date64 .. python/generated/
      reading sources... [ 77%] python/generated/pyarrow.flight.FlightServerError ..
      reading sources... [ 81%] python/generated/pyarrow.fs.LocalFileSystem .. pytho
      reading sources... [ 85%] python/generated/pyarrow.ipc.open_stream .. python/g
      reading sources... [ 88%] python/generated/pyarrow.parquet.ParquetLogicalType
      reading sources... [ 92%] python/generated/pyarrow.struct .. python/generated/
      reading sources... [ 96%] python/generated/pyarrow.types.is_signed_integer ..
      reading sources... [100%] python/json .. status
      /home/todd/miniconda3/envs/test-nightlies/lib/python3.9/site-packages/pyarrow/parquet/__init__.py:docstring of pyarrow.parquet.write_to_dataset:94: WARNING: Literal block ends without a blank line; unexpected unindent.
      WARNING: don't know which module to import for autodocumenting 'BufferReader' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
      WARNING: don't know which module to import for autodocumenting 'BufferWriter' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
      WARNING: don't know which module to import for autodocumenting 'Context' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
      WARNING: don't know which module to import for autodocumenting 'CudaBuffer' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
      WARNING: don't know which module to import for autodocumenting 'HostBuffer' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
      WARNING: don't know which module to import for autodocumenting 'IpcMemHandle' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
      WARNING: don't know which module to import for autodocumenting 'new_host_buffer' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
      WARNING: don't know which module to import for autodocumenting 'read_message' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
      WARNING: don't know which module to import for autodocumenting 'read_record_batch' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
      WARNING: don't know which module to import for autodocumenting 'serialize_record_batch' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
      /home/todd/arrow/docs/source/cpp/api/dataset.rst:62: WARNING: Parsing of expression failed. Using fallback parser. Error was:
      Error in postfix expression, expected primary expression or type.
      If primary expression:
        Invalid C++ declaration: Expected identifier in nested name. [error at 59]
          std::function< Status(FileWriter *)> writer_pre_finish   = [](FileWriter*) {returnStatus::OK();}
          -----------------------------------------------------------^
      If type:
        Invalid C++ declaration: Expected identifier in nested name. [error at 59]
          std::function< Status(FileWriter *)> writer_pre_finish   = [](FileWriter*) {returnStatus::OK();}
          -----------------------------------------------------------^
      /home/todd/arrow/docs/source/cpp/api/dataset.rst:62: WARNING: Parsing of expression failed. Using fallback parser. Error was:
      Error in postfix expression, expected primary expression or type.
      If primary expression:
        Invalid C++ declaration: Expected identifier in nested name. [error at 60]
          std::function< Status(FileWriter *)> writer_post_finish   = [](FileWriter*) {returnStatus::OK();}
          ------------------------------------------------------------^
      If type:
        Invalid C++ declaration: Expected identifier in nested name. [error at 60]
          std::function< Status(FileWriter *)> writer_post_finish   = [](FileWriter*) {returnStatus::OK();}
          ------------------------------------------------------------^
      /home/todd/arrow/docs/source/cpp/api/dataset.rst:69: WARNING: Duplicate C++ declaration, also defined at cpp/api/dataset:69.
      Declaration is '.. cpp:function:: virtual Result< std::shared_ptr< FileFragment > > MakeFragment (FileSource source, compute::Expression partition_expression, std::shared_ptr< Schema > physical_schema)'.
      /home/todd/arrow/docs/source/cpp/api/flight.rst:159: WARNING: Duplicate C++ declaration, also defined at cpp/api/flight:159.
      Declaration is '.. cpp:function:: virtual arrow::Result< FlightPayload > GetSchemaPayload ()=0'.
      /home/todd/arrow/docs/source/cpp/api/flightsql.rst:48: WARNING: doxygenfunction: Unable to resolve function "arrow::flight::sql::CreateStatementQueryTicket" with arguments "None".
      Candidate function could not be parsed. Parsing error is
      Error when parsing function declaration.
      If the function has no return type:
        Error in declarator or parameters-and-qualifiers
        Invalid C++ declaration: Expecting "(" in parameters-and-qualifiers. [error at 24]
          ARROW_FLIGHT_SQL_EXPORT arrow::Result< std::string > CreateStatementQueryTicket (const std::string &statement_handle)
          ------------------------^
      If the function has a return type:
        Error in declarator or parameters-and-qualifiers
        If pointer to member declarator:
          Invalid C++ declaration: Expected '::' in pointer to member (function). [error at 53]
            ARROW_FLIGHT_SQL_EXPORT arrow::Result< std::string > CreateStatementQueryTicket (const std::string &statement_handle)
            -----------------------------------------------------^
        If declarator-id:
          Invalid C++ declaration: Expecting "(" in parameters-and-qualifiers. [error at 53]
            ARROW_FLIGHT_SQL_EXPORT arrow::Result< std::string > CreateStatementQueryTicket (const std::string &statement_handle)
            -----------------------------------------------------^
      looking for now-outdated files... none found
      pickling environment... done
      checking consistency... done
      preparing documents... done
      writing output... [  8%] cpp/examples/row_columnar_conversion .. developers/gu
      writing output... [ 24%] python/generated/pyarrow.DurationScalar .. python/gen
      writing output... [ 28%] python/generated/pyarrow.Int64Array .. python/generat
      writing output... [ 32%] python/generated/pyarrow.SerializedPyObject .. python
      writing output... [ 36%] python/generated/pyarrow.compress .. python/generated
      writing output... [ 40%] python/generated/pyarrow.compute.StrptimeOptions .. p
      writing output... [ 44%] python/generated/pyarrow.compute.ascii_lpad .. python
      writing output... [ 48%] python/generated/pyarrow.compute.cos .. python/genera
      writing output... [ 52%] python/generated/pyarrow.compute.indices_nonzero .. p
      writing output... [ 56%] python/generated/pyarrow.compute.max_element_wise ..
      writing output... [ 60%] python/generated/pyarrow.compute.round .. python/gene
      writing output... [ 64%] python/generated/pyarrow.compute.unique .. python/gen
      writing output... [ 68%] python/generated/pyarrow.compute.week .. python/gener
      writing output... [ 72%] python/generated/pyarrow.dataset.DirectoryPartitionin
      writing output... [ 76%] python/generated/pyarrow.deserialize_components .. py
      writing output... [ 80%] python/generated/pyarrow.flight.FlightWriteSizeExceed
      writing output... [ 84%] python/generated/pyarrow.get_include .. python/genera
      writing output... [ 88%] python/generated/pyarrow.large_list .. python/generat
      writing output... [ 92%] python/generated/pyarrow.parquet.read_table .. python
      writing output... [ 96%] python/generated/pyarrow.types.is_float16 .. python/g
      writing output... [100%] python/generated/pyarrow.uint64 .. status
      generating indices... genindex done
      highlighting module code... [100%] pyarrow.types                              
      writing additional pages... search done
      copying images... [ 47%] developers/images/python_tutorial_jira_description.jp
      copying images... [ 57%] developers/images/python_tutorial_github_find_in_file
      copying images... [ 60%] developers/images/python_tutorial_github_pr_notice.jp
      copying images... [ 97%] format/FlightSql/CommandPreparedStatementQuery.mmd.sv
      copying images... [100%] python/py_arch_overview.svg
      copying downloadable files... [100%] ../../python/examples/parquet_encryption/sample_vault_kms_client.py
      copying static files... done
      copying extra files... done
      dumping search index in English (code: en)... done
      dumping object inventory... done
      build succeeded, 16 warnings.
      The HTML pages are in _build/html.
      Build finished. The HTML pages are in _build/html.
      (test-nightlies) todd@pop-os:~/arrow/docs$ 
       

      Note that docs/requirements.txt does not include pandas either. I haven't tested the pip dependency path, but I presume it is similarly affected and should be updated at the same time.
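
To check both dependency files at once, something like the following sketch could be used. It is an illustration only: the helper lists_package is hypothetical, it assumes one requirement per line with optional version specifiers and # comments, and it expects to be run from the root of an arrow checkout.

```python
import re
from pathlib import Path


def lists_package(requirements_text, package):
    """Return True if any non-comment line names `package` as a requirement."""
    for line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # The name is everything before a version specifier, extras, or marker.
        name = re.split(r"[\s<>=!~\[;]", line, maxsplit=1)[0]
        if name.lower() == package.lower():
            return True
    return False


if __name__ == "__main__":
    # Paths taken from the report, relative to the root of an arrow checkout.
    for path in ("ci/conda_env_sphinx.txt", "docs/requirements.txt"):
        file = Path(path)
        if file.exists():
            found = lists_package(file.read_text(), "pandas")
            print(f"{path} {'lists' if found else 'does not list'} pandas")
        else:
            print(f"{path} not found")
```

Matching on the parsed requirement name rather than a plain substring avoids false positives from comments or from packages whose names merely start with pandas.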

            People

              toddfarmer Todd Farmer

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m