  1. Apache Arrow
  2. ARROW-13034

[Python][Docs] Update outdated examples for hdfs/azure on the Parquet doc page

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0.0
    • Component/s: Python

    Description

      From https://github.com/apache/arrow/issues/10492

      • The chapter "Writing to Partitioned Datasets" still presents a solution based on "hdfs.connect". Since that function is documented as deprecated, the example should no longer recommend it.
      • The chapter "Reading a Parquet File from Azure Blob storage" relies on an old version of the "azure.storage.blob" package; the current "azure-sdk-for-python" no longer provides methods such as get_blob_to_stream(). This part could be updated to use the current Blob storage API, and a second example could cover the same concept for Delta Lake (the principle is similar, but there are differences ...).

            People

              Assignee: Joris Van den Bossche (jorisvandenbossche)
              Reporter: Joris Van den Bossche (jorisvandenbossche)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h