Apache Arrow / ARROW-1445

[Python] Segfault when using libhdfs3 in pyarrow using latest API

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6.0
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None

      Description

      I'm encountering a segfault when using libhdfs3 with pyarrow.

      My script is:

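      from __future__ import print_function  # same output on Python 2.7 and 3

      import pyarrow

      def main():
          # 1. JNI-based libhdfs via the newer connect() API: lists correctly
          hdfs = pyarrow.hdfs.connect("<host>", <port>, "<username>", driver='libhdfs')
          print(hdfs.ls('<my path>'))
          # 2. libhdfs3 via the older HdfsClient class: lists correctly
          hdfs3a = pyarrow.HdfsClient("<host>", <port>, "<username>", driver='libhdfs3')
          print(hdfs3a.ls('<my path>'))
          # 3. libhdfs3 via the newer connect() API: segfaults
          hdfs3b = pyarrow.hdfs.connect("<host>", <port>, "<username>", driver='libhdfs3')
          print(hdfs3b.ls('<my path>'))

      if __name__ == '__main__':
          main()
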

      The first two HDFS connections return the expected listing. The third crashes with:

      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGSEGV (0xb) at pc=0x00007f69c0c8b57f, pid=88070, tid=140092200666880
      #
      # JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
      # Problematic frame:
      # C  [libc.so.6+0x13357f]  __strlen_sse42+0xf
      

      The JVM also writes an error report file.
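To confirm which call crashes without one segfault aborting the whole script, each variant can be run in its own interpreter. This is a minimal, hypothetical sketch: `VARIANTS`, `build_snippet` and `run_isolated` are illustrative names, not pyarrow APIs. It only uses `subprocess.Popen`/`communicate`, so it runs on both Python 2.7 (the interpreter in this environment) and Python 3.

```python
import subprocess
import sys

# Hypothetical table of the three calls from the script above. The {host},
# {port} and {user} fields are filled in by build_snippet(); the driver
# arguments mirror the report.
VARIANTS = [
    ("hdfs.connect + libhdfs",
     "pyarrow.hdfs.connect({host!r}, {port}, {user!r}, driver='libhdfs')"),
    ("HdfsClient + libhdfs3",
     "pyarrow.HdfsClient({host!r}, {port}, {user!r}, driver='libhdfs3')"),
    ("hdfs.connect + libhdfs3",
     "pyarrow.hdfs.connect({host!r}, {port}, {user!r}, driver='libhdfs3')"),
]


def build_snippet(connect_expr, path, host, port, user):
    """Return Python source that connects with one variant and lists `path`."""
    connect = connect_expr.format(host=host, port=int(port), user=user)
    return "import pyarrow\nfs = {0}\nprint(fs.ls({1!r}))\n".format(connect, path)


def run_isolated(source):
    """Run `source` in a fresh interpreter; return (returncode, stdout, stderr).

    A crash in the child cannot take down the caller. On POSIX, a child
    killed by a signal reports a negative return code (minus the signal number).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    return proc.returncode, out, err
```

To use it, substitute your own host, port, user and path, then call `run_isolated(build_snippet(expr, path, host, port, user))` for each `(label, expr)` in `VARIANTS` and print the label next to the return code. Treat any non-zero code as a failure, since the JVM's crash handler may not exit with exactly the SIGSEGV signal number.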

      I created my conda environment with:

      conda create -n parquet
      source activate parquet
      conda install pyarrow libhdfs3 -c conda-forge
      

      The packages used are:

      # Name                    Version                   Build  Channel
      arrow-cpp                 0.6.0               np113py27_1    conda-forge
      boost-cpp                 1.64.0                        1    conda-forge
      bzip2                     1.0.6                         1    conda-forge
      ca-certificates           2017.7.27.1                   0    conda-forge
      certifi                   2017.7.27.1              py27_0    conda-forge
      curl                      7.54.1                        0    conda-forge
      icu                       58.1                          1    conda-forge
      krb5                      1.14.2                        0    conda-forge
      libgcrypt                 1.8.0                         0    conda-forge
      libgpg-error              1.27                          0    conda-forge
      libgsasl                  1.8.0                         1    conda-forge
      libhdfs3                  2.3                           0    conda-forge
      libiconv                  1.14                          4    conda-forge
      libntlm                   1.4                           0    conda-forge
      libssh2                   1.8.0                         1    conda-forge
      libuuid                   1.0.3                         1    conda-forge
      libxml2                   2.9.4                         4    conda-forge
      mkl                       2017.0.3                      0  
      ncurses                   5.9                          10    conda-forge
      numpy                     1.13.1                   py27_0  
      openssl                   1.0.2l                        0    conda-forge
      pandas                    0.20.3                   py27_1    conda-forge
      parquet-cpp               1.3.0.pre                     1    conda-forge
      pip                       9.0.1                    py27_0    conda-forge
      protobuf                  3.3.2                    py27_0    conda-forge
      pyarrow                   0.6.0               np113py27_1    conda-forge
      python                    2.7.13                        1    conda-forge
      python-dateutil           2.6.1                    py27_0    conda-forge
      pytz                      2017.2                   py27_0    conda-forge
      readline                  6.2                           0    conda-forge
      setuptools                36.2.2                   py27_0    conda-forge
      six                       1.10.0                   py27_1    conda-forge
      sqlite                    3.13.0                        1    conda-forge
      tk                        8.5.19                        2    conda-forge
      wheel                     0.29.0                   py27_0    conda-forge
      xz                        5.2.3                         0    conda-forge
      zlib                      1.2.11                        0    conda-forge
      

      I've set my ARROW_LIBHDFS_DIR to point at the location of the libhdfs3.so file.
      I've populated my CLASSPATH as per the documentation.
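Because the setup depends on these two environment settings, a quick pre-flight check can rule out a misconfigured path before digging into the crash. This is a hypothetical helper (`check_hdfs_env` is not part of pyarrow). It assumes the variable should name the directory that contains the library rather than the `.so` file itself, and takes the variable name as a parameter so other locations can be checked too. CLASSPATH only matters for the JNI-based `libhdfs` driver; libhdfs3 is a native client that does not start a JVM.

```python
import os


def check_hdfs_env(env=None, dir_var="ARROW_LIBHDFS_DIR", lib_name="libhdfs3.so"):
    """Return a list of problems found in the HDFS-related environment.

    Hypothetical helper. Assumes `dir_var` names a directory that contains
    `lib_name`; pass a different `dir_var` to check another variable.
    """
    env = os.environ if env is None else env
    problems = []

    lib_dir = env.get(dir_var)
    if not lib_dir:
        problems.append("{0} is not set".format(dir_var))
    elif not os.path.isdir(lib_dir):
        # e.g. the variable points at the .so file itself
        problems.append("{0}={1!r} is not a directory".format(dir_var, lib_dir))
    elif not os.path.isfile(os.path.join(lib_dir, lib_name)):
        problems.append("{0} not found in {1!r}".format(lib_name, lib_dir))

    # CLASSPATH is only needed by the JNI-based 'libhdfs' driver.
    if not env.get("CLASSPATH"):
        problems.append("CLASSPATH is not set")

    return problems
```

Calling `print(check_hdfs_env())` before `pyarrow.hdfs.connect(...)` prints an empty list when both settings look sane.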

      Please advise.

    People

    • Assignee: Unassigned
    • Reporter: James Porritt (jporritt)
    • Votes: 0
    • Watchers: 5
