Apache Drill / DRILL-6609

Investigate Creation of FileSystem Configuration for Hive Parquet Files: FileNotFoundException when reading a parquet file


    Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Currently, when reading a Parquet file through the Hive storage plugin, we try to speed things up by doing a native Parquet scan with HiveDrillNativeParquetRowGroupScan. When HiveDrillNativeParquetRowGroupScan.getFsConf builds the FileSystem Configuration for this scan, it uses all the properties defined for the HiveStoragePlugin. As a result, a misconfiguration in the HiveStoragePlugin can leak into the configuration of our FileSystem.
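      The concern can be illustrated with a minimal Java sketch. It rests on stated assumptions: the configuration is modeled as a plain String-to-String map rather than Hadoop's Configuration, and the class and method names (FsConfSketch, mergeAll, mergeSkippingFsKeys), the hdfs://namenode:8020/ default, and the metastore URI are all hypothetical. It shows how copying every plugin property lets the plugin's fs.default.name override the cluster's default filesystem, next to one possible filtered alternative; it is not a decided fix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a FileSystem configuration is represented as a plain
// String-to-String map. This is NOT Drill's or Hadoop's actual API.
public class FsConfSketch {

    // Behavior described in the issue: every storage-plugin property is copied
    // over the cluster defaults, so plugin values (including fs.* keys) win.
    static Map<String, String> mergeAll(Map<String, String> clusterDefaults,
                                        Map<String, String> pluginProps) {
        Map<String, String> conf = new HashMap<>(clusterDefaults);
        conf.putAll(pluginProps);
        return conf;
    }

    // One possible alternative (an assumption, not a decided fix): skip
    // filesystem keys so the plugin cannot redirect the default filesystem.
    static Map<String, String> mergeSkippingFsKeys(Map<String, String> clusterDefaults,
                                                   Map<String, String> pluginProps) {
        Map<String, String> conf = new HashMap<>(clusterDefaults);
        for (Map.Entry<String, String> e : pluginProps.entrySet()) {
            if (!e.getKey().startsWith("fs.")) {
                conf.put(e.getKey(), e.getValue());
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical values chosen for illustration only.
        Map<String, String> cluster = Map.of("fs.default.name", "hdfs://namenode:8020/");
        Map<String, String> plugin = Map.of(
                "fs.default.name", "file:///",
                "hive.metastore.uris", "thrift://metastore:9083");

        System.out.println(mergeAll(cluster, plugin).get("fs.default.name"));            // file:///
        System.out.println(mergeSkippingFsKeys(cluster, plugin).get("fs.default.name")); // hdfs://namenode:8020/
    }
}
```

      In Hadoop itself, fs.default.name is the deprecated alias of fs.defaultFS. The map model ignores that aliasing, which is one reason the sketch filters on the "fs." prefix rather than on a single key.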

      It is currently unclear whether this behavior is intentional. If it is, we need to document why it was done; if it is not, we need to fix it.

      This may be the root cause of the issue discovered by chun.

      To reproduce the issue:
        1. Use a cluster of two or more nodes.
        2. Enable impersonation.
        3. Set "fs.default.name": "file:///" in the Hive storage plugin configuration.
        4. Restart the Drillbits.
        5. As a regular user on node A, drop the table/file.
        6. Recreate the table/file with CTAS, using a sufficiently large Hive table as the source.
        7. Query the table from node A; this should succeed.
        8. Query the table from node B as the same user; this should reproduce the issue (FileNotFoundException).
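      One plausible reading of these steps is that the leaked "fs.default.name": "file:///" makes scheme-less paths resolve to node-local disk, so a node that does not hold the file locally fails to open it. The Java sketch below models that idea and nothing more. Everything in it is hypothetical: java.net.URI.resolve stands in for Hadoop's path qualification, each node's local disk is a set of paths, and the file path and the premise that the file is visible only on node A's local disk are illustrative assumptions, not facts confirmed by the issue.

```java
import java.io.FileNotFoundException;
import java.net.URI;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the reproduction steps. java.net.URI stands in for
// Hadoop path qualification; each node's local disk is modeled as a set of paths.
public class DefaultFsResolution {

    // Qualify a scheme-less path against the configured default filesystem URI.
    static URI qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path);
    }

    // Simulated open(): a file: URI can only be served from the node's own disk.
    static void open(URI file, Set<String> localDisk) throws FileNotFoundException {
        if ("file".equals(file.getScheme()) && !localDisk.contains(file.getPath())) {
            throw new FileNotFoundException(file.toString());
        }
    }

    public static void main(String[] args) {
        String path = "/data/t1/0_0_0.parquet";          // hypothetical table file
        // Assumption for illustration: only node A happens to hold the file locally.
        Map<String, Set<String>> disks = Map.of(
                "nodeA", Set.of(path),
                "nodeB", Set.of());

        URI leaked = qualify("file:///", path);           // default FS leaked from the plugin
        for (String node : new String[] {"nodeA", "nodeB"}) {
            try {
                open(leaked, disks.get(node));
                System.out.println(node + ": ok");
            } catch (FileNotFoundException e) {
                System.out.println(node + ": FileNotFoundException " + e.getMessage());
            }
        }
    }
}
```

      Qualifying the same path against a distributed default filesystem instead (for example a hypothetical hdfs://namenode:8020/) yields an hdfs URI, which the model never routes to a node's local disk.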


            People

            • Assignee: Timothy Farkas (timothyfarkas)
            • Reporter: Timothy Farkas (timothyfarkas)
            • Votes: 0
            • Watchers: 2
