  1. Apache Drill
  2. DRILL-3625

Dynamic Format Detection in DFS backend for unmapped file extensions / files without extensions

Details

    Description

      When querying a JSON file whose extension is not ".json" (for example ".log"), I get this exception:

      0: jdbc:drill:zk=local> select * from dfs.down.`auditOut.log` limit 1;
      Aug 11, 2015 4:01:38 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
      SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'dfs.down.auditOut.log' not found
      Aug 11, 2015 4:01:38 PM org.apache.calcite.runtime.CalciteException <init>
      SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 17: Table 'dfs.down.auditOut.log' not found
      Error: PARSE ERROR: From line 1, column 15 to line 1, column 17: Table 'dfs.down.auditOut.log' not found
      
      [Error Id: 5610210b-3eb2-497f-9443-c725b29733b6 on <host>:31010] (state=,code=0)
      

      However, after renaming the file to have a .json extension, the query succeeds.

      I could reconfigure the DFS plugin to map all files with a .log extension to JSON, but that doesn't seem like the right thing to do. I could also rename the file to have a .json extension, which is the better option, but this raises another question: why doesn't this just work as-is?

      Hence I'd like to raise this as a feature request: when Drill encounters an unmapped extension or a file without any extension, it should run a few quick checks to determine the file type and then use the appropriate storage backend for the file.
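
      To make the "few quick checks" concrete, here is a minimal sketch of content-based detection. It is an illustration only, not Drill code: the function name `detect_format`, the `sample_size` parameter, and the returned format names are all hypothetical, and the JSON check is a cheap heuristic (a leading `{` or `[`), not a full parse.

```python
PARQUET_MAGIC = b"PAR1"  # Parquet files start (and end) with these four bytes


def detect_format(path, sample_size=4096):
    """Guess a format name from a file's leading bytes.

    Hypothetical helper for illustration; this is not Drill's implementation.
    """
    with open(path, "rb") as f:
        head = f.read(sample_size)

    if head.startswith(PARQUET_MAGIC):
        return "parquet"
    if not head or b"\x00" in head:
        # Empty file, or binary content that matches no known signature.
        return "unknown"

    # Heuristic: JSON documents begin with an object or an array.
    first = head.lstrip()[:1]
    if first in (b"{", b"["):
        return "json"
    return "text"
```

      With a check like this, a file such as auditOut.log that contains JSON records would be reported as "json" regardless of its name, while a plain delimited file would fall through to "text".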

      Adding this "Dynamic Format Detection", as I have dubbed it, would tie in nicely with Drill's style and with existing features such as the dynamic schema detection already used for JSON.

      This may also come in handy for outputs from MapReduce jobs, where files may be named part-m-NNNNN or part-r-NNNNN without any extension. If those files were text, for example, Drill could invoke the text storage backend on them immediately.
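
      As a rough sketch of when detection would kick in, the check below looks up a file name's extension in a mapping and returns None when there is no match, which covers both unmapped extensions and extension-less names such as MapReduce part files. The mapping contents and the name `format_from_name` are assumptions made for illustration, not Drill's actual configuration or API.

```python
import os

# Assumed extension-to-format mapping, for illustration only.
MAPPED_EXTENSIONS = {"json": "json", "csv": "text", "parquet": "parquet"}


def format_from_name(filename, mapped=MAPPED_EXTENSIONS):
    """Return the mapped format for a file name, or None if there is no mapping."""
    # splitext yields "" for names without an extension, e.g. "part-m-00000".
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return mapped.get(ext)
```

      Under these assumptions, `format_from_name("part-m-00000")` and `format_from_name("auditOut.log")` both return None, which is the point where content-based detection would be tried instead of reporting the table as not found.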

          People

            Assignee: Unassigned
            Reporter: Hari Sekhon (harisekhon)