Apache Arrow / ARROW-18123

[Python] Cannot use multi-byte characters in file names in write_table

Details

    Description

      pyarrow.parquet.write_table raises ArrowInvalid when the file path contains multi-byte characters.

      For example, with 例.parquet as the file path:

      Python 3.10.7 (main, Oct  5 2022, 14:33:54) [GCC 10.2.1 20210110] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import pandas as pd
      >>> import numpy as np
      >>> import pyarrow as pa
      >>> df = pd.DataFrame({'one': [-1, np.nan, 2.5],
      ...                    'two': ['foo', 'bar', 'baz'],
      ...                    'three': [True, False, True]},
      ...                    index=list('abc'))
      >>> table = pa.Table.from_pandas(df)
      >>> import pyarrow.parquet as pq
      >>> pq.write_table(table, '例.parquet')
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 2920, in write_table
          with ParquetWriter(
        File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 911, in __init__
          filesystem, path = _resolve_filesystem_and_path(
        File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/fs.py", line 184, in _resolve_filesystem_and_path
          filesystem, path = FileSystem.from_uri(path)
        File "pyarrow/_fs.pyx", line 463, in pyarrow._fs.FileSystem.from_uri
        File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
        File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
      pyarrow.lib.ArrowInvalid: Cannot parse URI: '例.parquet'
      

            People

              milesgranger Miles Granger
              eitsupi SHIMA Tatsuya
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 10m