Apache Arrow / ARROW-5825

[Python] Exceptions swallowed in ParquetManifest._visit_directories


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python
    • Labels:

      Description

      ParquetManifest._visit_directories uses a ThreadPoolExecutor to visit the directories of partitioned Parquet datasets concurrently. It waits for the submitted tasks to finish but doesn't check whether the corresponding futures have failed. This makes the problem tricky to detect and debug: an exception is either raised later as a side effect or (perhaps worse) passes silently.

      Observed on 0.12.1, but the same code appears to be present on the latest master too.
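      A minimal, self-contained sketch of the failure mode follows. It is not the actual pyarrow code: `visit`, `visit_all_swallowing` and `visit_all_checked` are hypothetical names, and the failing `year=2019` directory is simulated. It relies only on the standard `concurrent.futures` module, where `wait()` returns the sets of done and pending futures without re-raising worker exceptions, whereas `Future.result()` re-raises them.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def visit(path):
    # Hypothetical per-directory worker; fails for one path to simulate an I/O error.
    if path == "year=2019":
        raise OSError(f"cannot list {path}")
    return path


def visit_all_swallowing(paths):
    # Pattern described in the issue: wait for completion but never inspect the futures.
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(visit, p) for p in paths]
        wait(futures)  # returns (done, not_done); does NOT re-raise worker exceptions
    return "finished"  # reached even though one worker failed


def visit_all_checked(paths):
    # One possible fix: call result() on each future, which re-raises a worker exception.
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(visit, p) for p in paths]
        return [f.result() for f in futures]


if __name__ == "__main__":
    paths = ["year=2018", "year=2019"]
    print(visit_all_swallowing(paths))  # prints "finished": the OSError is lost
    try:
        visit_all_checked(paths)
    except OSError as exc:
        print("propagated:", exc)  # prints "propagated: cannot list year=2019"
```

      Calling `result()` (or checking `exception()`) on every future is one way to surface such errors; whether a real fix should re-raise the first failure or collect all of them is a design choice this sketch leaves open.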

            People

            • Assignee:
              Unassigned
              Reporter:
              gsakkis George Sakkis
            • Votes:
              0
              Watchers:
              3