  1. Hadoop Distributed Data Store
  2. HDDS-2443

Python client/interface for Ozone

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Ozone Client
    • Labels:
      None

      Description

      This Jira tracks development of a Python client/interface for Ozone.

      Original idea: item #25 in https://cwiki.apache.org/confluence/display/HADOOP/Ozone+project+ideas+for+new+contributors

      Ozone client (Python) for data science notebooks such as Jupyter.

      1. Size: Large
      2. PyArrow: https://pypi.org/project/pyarrow/
      3. Python -> libhdfs, the HDFS JNI library (HDFS, S3,...) -> Java client API. Impala uses libhdfs.
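
      The Python -> libhdfs -> Java client chain in item 3 can be sketched with PyArrow's filesystem API. This is only a rough sketch under stated assumptions, not a tested Ozone setup: it assumes libhdfs is discoverable by PyArrow, that the Hadoop and Ozone filesystem jars are on the CLASSPATH, and that fs.defaultFS in core-site.xml points at an Ozone filesystem URI. The helper names connect_default_fs and write_then_read are hypothetical.

```python
def connect_default_fs():
    """Open the filesystem named by fs.defaultFS in core-site.xml via libhdfs.

    Assumption: core-site.xml points fs.defaultFS at an Ozone filesystem URI
    and the matching Ozone filesystem jar is on the Java CLASSPATH.
    """
    # Imported lazily so the helper below can be used without PyArrow installed.
    from pyarrow import fs

    # host="default" with port 0 tells libhdfs to use fs.defaultFS.
    return fs.HadoopFileSystem("default", 0)


def write_then_read(filesystem, path, payload):
    """Write bytes to `path`, then read them back through the same filesystem."""
    with filesystem.open_output_stream(path) as out:
        out.write(payload)
    with filesystem.open_input_stream(path) as src:
        return src.read()


# Usage (requires a reachable cluster and a working libhdfs setup):
# ozone_fs = connect_default_fs()
# print(write_then_read(ozone_fs, "/tmp/hello.txt", b"hello ozone"))
```

      Every call in this path crosses the JNI boundary into a JVM started inside the Python process, which is the cost the other paths listed in this issue try to avoid.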

      Paths to try:

      1. S3 interface: Ozone S3 gateway (already supported) + AWS Python client (boto3)
      2. Python native RPC
      3. pyarrow + libhdfs, which uses the Java client under the hood.
      4. Python + C interface of a Go / Rust Ozone library. I created POC Go / Rust clients earlier, which can be improved if the libhdfs interface is not good enough. [By Marton Elek]
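
      The first path (S3 gateway + boto3) could look roughly like the sketch below. It is not the attached OzoneS3.py. The endpoint http://localhost:9878, the placeholder credentials, the region value, and the helper names make_s3_client and put_and_get are all assumptions for illustration. A secured cluster would need real S3 credentials issued by Ozone.

```python
def make_s3_client(endpoint_url="http://localhost:9878",
                   access_key="placeholder", secret_key="placeholder"):
    """Build a boto3 S3 client that talks to an Ozone S3 gateway.

    Assumptions: the gateway listens on endpoint_url, and the cluster is
    unsecured so placeholder credentials are accepted.
    """
    # Imported lazily so put_and_get can be exercised without boto3 installed.
    import boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name="us-east-1",  # boto3 wants a region; assumed to be ignored here
    )


def put_and_get(client, bucket, key, data):
    """Create a bucket, upload one object, and read it back."""
    # Note: create_bucket may raise if the bucket already exists.
    client.create_bucket(Bucket=bucket)
    client.put_object(Bucket=bucket, Key=key, Body=data)
    response = client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


# Usage (requires a running Ozone S3 gateway):
# s3 = make_s3_client()
# print(put_and_get(s3, "bucket1", "key1", b"hello ozone"))
```

      This path needs no JVM on the Python side, since boto3 speaks plain HTTP to the gateway.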

        Attachments

        1. pyarrow_ozone_test.docx
          16 kB
          mingchao zhao
        2. pyarrow_ozone_test.docx
          18 kB
          mingchao zhao
        3. Ozone with pyarrow.html
          33 kB
          YiSheng Lien
        4. OzoneS3.py
          2 kB
          Li Cheng

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              timmylicheng Li Cheng
            • Votes:
              0
              Watchers:
              12

              Dates

              • Created:
                Updated: