  1. Apache Drill
  2. DRILL-2424

Ignore hidden files in directory path


Details

    Description

      When streaming data to the DFS, records in the most recently written file(s) can be incomplete during the temporary write phase. These files typically have a different extension, such as '.tmp', or are marked hidden with a '.' prefix.

      Querying the directory path with Drill then causes a query error, because some records in the temporary files may be incomplete. The ability to have Drill ignore hidden files and/or read only files with a designated extension in the workspace would resolve this problem.

      An example is using Flume to stream JSON files to a directory structure. The HDFS sink creates .tmp files (which can be hidden with a '.' prefix) that contain incomplete JSON objects until the file is closed and the .tmp extension (or prefix) is removed. Attempting to query the directory structure with Drill then results in errors due to the incomplete JSON object(s) in the .tmp files.
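The requested behavior can be illustrated with a small sketch. This is not Drill's implementation; it only shows the kind of filter the description asks for, applied to a plain list of paths. The function names `is_queryable` and `select_files`, the `allowed_extensions` parameter, and the sample paths are all hypothetical.

```python
import posixpath

# Assumption: "hidden" means the file name starts with '.', as in the description.
HIDDEN_PREFIX = "."


def is_queryable(path, allowed_extensions=None):
    """Return True if the file at `path` should be read by a query.

    Only the final path component is inspected; hidden parent
    directories are not considered in this sketch.
    """
    name = posixpath.basename(path)
    if not name or name.startswith(HIDDEN_PREFIX):
        return False  # skip hidden files such as '.events.json.tmp'
    if allowed_extensions is None:
        return True  # no extension restriction requested
    # Compare only the last extension, so 'events.json.tmp' counts as 'tmp'.
    ext = posixpath.splitext(name)[1].lstrip(".").lower()
    return ext in allowed_extensions


def select_files(paths, allowed_extensions=None):
    """Keep only the paths that pass is_queryable, preserving order."""
    return [p for p in paths if is_queryable(p, allowed_extensions)]


# Hypothetical directory listing while a stream is still being written.
listing = [
    "/data/events/a.json",        # closed, complete file
    "/data/events/b.json.tmp",    # still being written, visible
    "/data/events/.c.json.tmp",   # still being written, hidden
]

complete_only = select_files(listing, allowed_extensions={"json"})
```

Ignoring hidden files alone is enough when the sink hides its in-progress files with a '.' prefix. The extension filter additionally covers the case where the in-progress files are visible and differ only by the '.tmp' suffix.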


          People

            cchang@maprtech.com Chun Chang
            aengelbrecht Andries Engelbrecht
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved:
