      Description

      If you have some JSON data like this...

          {
            "0.0.1":{
              "version":"0.0.1",
              "date_created":"2014-03-15"
            },
            "0.1.2":{
              "version":"0.1.2",
              "date_created":"2014-05-21"
            }
          }
      

      ... there is no way to select any of the rows, since their identifiers contain dots; when trying to select them, Drill throws the following error:

      Error: SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference "0.0.1"; a field reference identifier must not have the form of a qualified name

      This must be fixed, since many JSON data files contain dots in some of their keys (e.g. when specifying version numbers, etc.).

          Activity

          zfong Zelaine Fong added a comment -

          Although the error is slightly different, DRILL-3922 looks like it might be related.

          arina Arina Ielchiieva added a comment -

          Some useful notes provided by Paul Rogers:

          1. Escape mechanism for dots in names. (See this for how it is done in JSON. Can we support table[“1.2.3”] syntax in Drill?)

          2. Dots in column names: Identify the issue (Drill attaches meaning to the dots.) Research SQL escape characters. How do other products handle this? Given that these names come from JSON, can we use Javascript syntax (table[“1.2.3”])? How do we ensure current behavior does not break? Experiment to see what works.

          vvysotskyi Volodymyr Vysotskyi added a comment - - edited

          Currently Drill behaves inconsistently when querying a file that has dots in its field names. The query

          select * from test_table t
          

          fails, but the query

          select `rk.q` as `rk.q` from test_table t
          

          returns the correct result for the file

          {"rk.q": "a", "m": {"a.b":"1", "a":{"b":"2"}, "c":"3"}}
          

          The difference between these two cases is that in the second case the field reference was created using the method FieldReference.getWithQuotedRef(field.getName()), which does not check the field name. In the first case a constructor that performs the check was used.
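
          For illustration, a minimal sketch of the two code paths (a sketch based on the behaviour described above, not the exact Drill source):

          // Sketch: getWithQuotedRef() skips the name check, so the dot stays part of the name,
          // while the plain constructor validates the name and rejects dotted identifiers.
          FieldReference quoted = FieldReference.getWithQuotedRef("rk.q");  // succeeds
          FieldReference checked = new FieldReference("rk.q");              // throws "Unhandled field reference ..."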

          A nested field may be selected in a few ways:
          t.m.c or t.m['c'].
          Without the field-name check, the query

          select t.m.`a.b`, t.m.a.b, t.m['a.b'] from test_table t
          

          returns the correct result.
          MySQL, for example, also allows quoted fields with dots.

          The preferred solution is to remove the check for fields with dots.
          However, a user may forget to add quotes around a field with dots, so a query may return a result the user does not expect.

          Another solution is to add a session option that allows fields with dots and, depending on this option, to check the field name or not. By default this option would be disabled (the same behaviour as now), so the user would be responsible for queries with forgotten quotes.
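
          A purely hypothetical sketch of such an option gate (the option name, the optionManager accessor, and the placement are illustrative only, not actual Drill APIs):

          // Hypothetical: only enforce the dot check when the session option is disabled.
          boolean allowDotsInNames = optionManager.getOption(ALLOW_DOTS_IN_FIELD_NAMES);
          if (!allowDotsInNames) {
            checkSimpleString(fieldName);  // current behaviour: reject names containing dots
          }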

          paul-rogers Paul Rogers added a comment -

          In the example case, what happens for:

          SELECT `rk.q`, `m.a.b` FROM test_table;
          

          For this to work, is it necessary to do this:

          SELECT `rk.q`, m.`a.b` FROM test_table;
          

          That is, is a dot inside back-ticks considered part of the name, but those outside considered separators?

          I ask because I often do the following, and it works:

          select `name`, `monisid`, `validitydate` from `dfs.data`.`gen.json`
          

          That is, in the table name, dots inside backticks are, in fact, separators. So, do column and table names have different syntax rules?

          Can we spell out the syntax rules for these three cases:

          • Column names in the planner
          • Table names in the planner
          • Column names discovered at runtime

          All that said, we should certainly make SELECT * work as the name expansion is done in the execution engine, not the planner.

          vvysotskyi Volodymyr Vysotskyi added a comment -

          Drill handles field names and schema paths in different ways. For schema paths Drill uses Calcite. The required schema is returned by the method getSchema(). So for the input `dfs.data` the needed schema is returned in the first iteration of the loop. For the case dfs.data, the schema dfs is resolved in the first iteration and the schema data in the next iteration.
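
          An illustrative sketch of that resolution loop (simplified; the schema classes come from Calcite, and the loop shape is an assumption based on the description above):

          // Resolve a schema path such as ["dfs", "data"] one component per iteration.
          // With the quoted form `dfs.data` the whole string is a single component,
          // so the schema is found in the first iteration.
          SchemaPlus schema = rootSchema;
          for (String component : schemaPathComponents) {
            schema = schema.getSubSchema(component);
            if (schema == null) {
              break;  // no such schema
            }
          }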

          Query

          SELECT `rk.q`, `m.a.b` FROM test_table;
          

          returns

          ---------------------------------------------------------------------------------------------------------
          | rk.q<VARCHAR(OPTIONAL)>                           | m.a.b<INT(OPTIONAL)>                              |
          ---------------------------------------------------------------------------------------------------------
          | a                                                 | null                                              |
          ---------------------------------------------------------------------------------------------------------
          

          In this case Drill looks for a field named m.a.b.

          Query

          SELECT t.`rk.q`, t.m.`a.b` FROM test_table t;
          

          returns the correct result:

          ---------------------------------------------------------------------------------------------------------
          | rk.q<VARCHAR(OPTIONAL)>                           | EXPR$1<VARCHAR(OPTIONAL)>                         |
          ---------------------------------------------------------------------------------------------------------
          | a                                                 | 1                                                 |
          ---------------------------------------------------------------------------------------------------------
          

          So yes, a dot inside backticks is considered part of the field name, and a dot outside is considered a separator.

          paul-rogers Paul Rogers added a comment -

          Putting on my user hat, I don't think users of Drill (or even non-planner developers like me) will understand that column names behave differently than table/schema names.

          For example, from the description, I cannot predict what happens here:

          Data: dfs.ds.foo.json, contents: { "a" : { "b.c": 10 } }
          
          SELECT `dfs`.`ds`.`foo.json`.`a`.`b.c` FROM `dfs`.`ds`.`foo.json` (1)
          SELECT `dfs.ds.foo.json.a`.`b.c` FROM `dfs.ds`.`foo.json` (2)
          SELECT `dfs.ds.foo.json.a.b.c` FROM `dfs.ds.foo.json` (3)
          

          Case (1) should work, right? Each component of the path name is enclosed in back-ticks, and the dots are unquoted. `b.c` is quoted so it is a complete field name. Similarly, the file name `foo.json` is quoted, so clearly .json is part of the file name.

          By this logic, case (2) should not work. The quotes enclose parts of a path and so the dots in those names should not be treated as delimiters, but rather as part of the name. That is, the SELECT list should be:

          Table: `dfs.ds.foo.json.a`
          Column: `b.c`
          Schema: `dfs.ds`
          Table: `foo.json`

          Since the two tables do not agree, the query should fail in the planner. Even if it didn't, it should return null because no table column matches `b.c`. And yet your explanation suggests that some part of the quoted name will be considered separate components.

          If so, then Drill is magic: it knows when dots are part of the name (file name), and so (3) should work also. But it won't, for the reasons you state.

          Ideally, we'd enforce form (2): dots inside quotes are part of the name; they are not separators. But, it seems if we do that we might break existing queries. Or, can this actually work?

          Let me throw in two more complications:

          • The directory containing foo.json has a dot: "my.files"
          • The workspace name itself contains a dot: "my.ws"

          Can I do the above? If not, why not? What SQL syntax would I use? Maybe:

          SELECT `my.ws`.`/my.files/foo.json`.`a`.`b.c` FROM `dfs`.`ds`.`foo.json` (4)
          

          Seems we've rather gotten ourselves into a muddle by allowing separator dots in names.

          About here I guess we should ask, what do Hive and Parquet do? They must have solved this issue.

          vvysotskyi Volodymyr Vysotskyi added a comment -

          On the one hand, the user knows that the tmp workspace is inside the dfs plugin, so the user expects that the query

          use dfs.tmp;
          

          should work. On the other hand, the query

          show schemas;
          

          returns the schema name dfs.tmp. So the user also expects that a query with such a schema name should work:

          use `dfs.tmp`;
          

          Both these cases work since the schema name and its path are the same.

          Schema names with dots currently do not work in Drill. This is due to the way schema paths are handled (as described above).

          Queries

          SELECT `dfs`.`ds`.`foo.json`.`a`.`b.c` FROM `dfs`.`ds`.`foo.json` (1)
          SELECT `dfs.ds.foo.json.a`.`b.c` FROM `dfs.ds`.`foo.json` (2)
          SELECT `dfs.ds.foo.json.a.b.c` FROM `dfs.ds.foo.json` (3)
          

          will not work, since Drill allows only table names or aliases before field names.
          Considering only the FROM clause, the third case would not work, since Drill assumes that `dfs.ds.foo.json` is the schema name only.
          Queries on directories with dots and on table names with dots also work correctly.

          Hive has an option that allows dots in column names (by default, dots in column names are allowed).
          Parquet also allows field names with dots.
          Also, the current version of Drill can create Parquet files with dots in field names, but Drill will fail when querying such a file.

          paul-rogers Paul Rogers added a comment -

          Thanks for the explanation! Can you suggest a solution (from the user's view) that offers the best trade-off between a number of requirements?

          Requirements:

          • Allow dots in any name (schema, workspace, table, column, directory, file).
          • Consistent behavior for all names.
          • Compatible with existing queries.
          • Familiar to users of similar systems (Hive, Impala, etc.)

          Maybe propose a syntax and user-visible rules that achieve the above (as best we can.) Once we agree on those rules, we can move on to discuss how we'll implement the rules given Calcite and the Drill execution classes.

          vvysotskyi Volodymyr Vysotskyi added a comment -

          I think we shouldn't allow dots in the names of plugins and workspaces, since the Drill user can control these names and schema names are stored without quotes. For example, if we had a plugin named `df.s` plus a plugin named `df` with a workspace `s` in it, the schema name df.s would refer to two schemas.

          For table, directory, or file names, dots are allowed. The path to the table in the query is specified using quotes. The same goes for table names with dots.

          Currently, dots are allowed in column aliases, and they are quoted as well.
          But Drill may create a table using a column alias with dots and will then fail at the execution stage when it tries to do a select *.

          The solution that I proposed in my comment does not change the current syntax of Drill queries; I think the current syntax is fine. The proposed solution ensures that Drill does not fail when it discovers such columns in a table.

          paul-rogers Paul Rogers added a comment -

          Let's see if I understand the proposal:

          • Dots not allowed in storage plugin or workspace names.
          • As a result, dots in quoted workspace names are assumed to be separators: `dfs.x` = `dfs`.`x`
          • Dots in table names must be quoted. `dfs.x`.`my.table.json`
          • Dots in columns must be quoted. SELECT a, b.c, b.`d.e` ...

          Is this correct?

          If so, then it seems fine.

          Are you also proposing to support array syntax? SELECT a, b.c, b[`d.e`] ...?

          You mentioned we use one piece of code for names in the planner, another (SchemaPath) at runtime. What rules do we use at runtime? Can that code handle column names with dots? That is, do we ever start with a map "a" and a column "d.e", combine them to get "a.d.e" and try to split them again to get "a", "d" and "e"? If so, how do we fix that?

          Once we get all these questions resolved, I'd suggest posting a summary of the rules to the dev list so that others can take a look.

          vvysotskyi Volodymyr Vysotskyi added a comment -

          Yes, these statements are correct.

          Drill also supports array syntax b['d.e']. I used it as an example of using field names with dots in the query.

          No, I meant that we are using the same FieldReference class, but in the planner we use the checkSimpleString() method, and at the execution stage we use the FieldReference() constructor, which checks the input string for dots.
          The code does not combine the map name and nested field names. Nested field names are stored as children of the NameSegment rootSegment in SchemaPath (which is a superclass of FieldReference).

          So I propose to remove that check.

          paul-rogers Paul Rogers added a comment -

          On the planner side, we represent field references with the FieldReference class you mentioned. FieldReference extends SchemaPath. These classes break names down into one object per name part.

          Assume we have SELECT a.b.c, a."d.e" ...

          Within the FieldReference itself, we hold onto the name using a PathSegment which has two subclasses: ArraySegment and NameSegment. So, as you noted, in the planner, we can tell the difference between the two cases (using functional notation):

          a.b.c: FieldReference(NameSegment("a", NameSegment("b", NameSegment("c"))))
          a."d.e": FieldReference(NameSegment("a", NameSegment("d.e")))
          

          So far so good. But SchemaPath provides the getAsUnescapedPath() method, which concatenates the parts of the name using dots. We end up with two FieldReference instances. Calling getAsUnescapedPath() on each produces a.b.c and a.d.e. So, if anything actually uses this unescaped path, we end up with an ambiguity: does "a.d.e" represent one field, two fields or three fields? We cannot tell.
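
          A small sketch of that ambiguity (getAsUnescapedPath() is quoted above; getCompoundPath() is assumed here to build a path from individual name parts):

          // Two different field references collapse to the same unescaped string.
          SchemaPath threeParts = SchemaPath.getCompoundPath("a", "d", "e");  // three name segments
          SchemaPath twoParts = SchemaPath.getCompoundPath("a", "d.e");       // map "a" with child "d.e"
          // Both calls below return the string "a.d.e", so the segment structure is lost.
          String s1 = threeParts.getAsUnescapedPath();
          String s2 = twoParts.getAsUnescapedPath();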

          Now, if this method were only used for debugging (like toString()), it would be fine. But, in fact, many operators refer to this method, especially when creating the run-time representation of a field schema: MaterializedField:

          From StreamingAggBatch:

                final MaterializedField outputField = MaterializedField.create(
                              ne.getRef().getAsUnescapedPath(), expr.getMajorType());
          

          In our examples, we end up with two materialized fields: one called "a.b.c", the other "a.d.e", so the ambiguity persists.

          As it turns out, each MaterializedField represents one field or structured column. So, our map "a" is represented by a MaterializedField, "b" by another, "c" by yet another and "d.e" by another. So, each should correspond to a single name part.

          But the code doesn't work that way; it actually builds up the full unescaped name.

          Now, I suspect that the code here is old and inconsistent. It should be that creating a materialized field pulls out only one name part. But, the code actually concatenates. My suspicion increases when I see methods like these in MaterializedField:

            public String getPath() { return getName(); }
            public String getLastName() { return getName(); }
            public String getName() { return name; }
          

          That first one really worries me: it is asking for the "path", which means the dotted name. There are many references to this name. Does this mean the code expects to get a string (not a NameSegment) that holds the composite name? If so, we are in trouble.

          Now, as it turns out, it seems that the "modern" form of MaterializedField is that each holds just one name part. So:

          MaterializedField(name="a", children = (
            MaterializedField(name="b", children = (
              MaterializedField(name="c"))),
            MaterializedField(name="d.e")))
          

          I wonder, because the code appears to be written assuming that a MaterializedField had a path name, does any code still rely on this fact, then split the name at dots to get fields?

          If not, can we remove the getPath() and getLastName() methods to make it clear that each MaterializedField corresponds to a single NameSegment?

          And, if we do that, should we remove all calls to SchemaPath.getAsUnescapedPath() to make it clear that we never (except for display) want a dotted, combined path name?

          By carefully looking at the above issues, we can be sure that no old code in Drill tries to concatenate "a" and "d.e" to get the name "a.d.e" which it then splits into "a", "d" and "e".

          A quick search for ".split(" found a number of places where we split names on a dot, including in the Parquet Metadata file:

                  public Object deserializeKey(String key, com.fasterxml.jackson.databind.DeserializationContext ctxt)
                      throws IOException, com.fasterxml.jackson.core.JsonProcessingException {
                    return new Key(key.split("\\."));
                  }
          

          Are there others? Do these need to be fixed?

          vvysotskyi Volodymyr Vysotskyi added a comment -

          Thanks for such a detailed analysis.

          I agree with you that such deserializing of ColumnTypeMetadata_v3.Key objects will cause problems for fields that contain dots in their names. To solve this issue I propose to change the structure of the ColumnTypeMetadata_v3.Key class: instead of using an array with the components of the field name, we should use SchemaPath and serialise it as the string obtained by calling SchemaPath.toExpr(). With this change, we should also update the Parquet metadata version.
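
          A hedged sketch of what the updated deserializer could look like (the Key(SchemaPath) constructor is an assumed change; SchemaPath.parseFromString() is the parsing method added later in the pull request):

                public Object deserializeKey(String key, com.fasterxml.jackson.databind.DeserializationContext ctxt)
                    throws IOException {
                  // parse the serialized SchemaPath expression (e.g. `a`.`d.e`) instead of splitting on dots
                  return new Key(SchemaPath.parseFromString(key));
                }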

          A more complex problem is connected with the MaterializedField class. SchemaPath was removed from the MaterializedField class in PR-373. One of the reasons for this refactoring was the assumption that MaterializedField should have no knowledge of its parents. Some code in Drill assumes that MaterializedField.getPath() returns the field path including its parents.
          For example, in this line a MaterializedField instance is created with the name col.getAsUnescapedPath(). In this line the name with parent field names was used. Using only the field name in the MaterializedField would cause problems, since a field at the root level may have the same name as a field nested in a map.
          So the full field path should be used in the MaterializedField class in this case.

          The SchemaPath.getSimplePath(field.getPath()) code is used in many places, but it does not return the same SchemaPath that was used to create the MaterializedField instance.
          We should change the implementation of MaterializedField in such a way that this code returns the same SchemaPath that was used to create the MaterializedField instance.

          I think we should store a separate String path field in the MaterializedField class with the value of SchemaPath.toExpr() and replace all SchemaPath.getAsUnescapedPath() calls with SchemaPath.toExpr() (a sketch follows the list below).

          • When the MaterializedField instance is created using the path SchemaPath.toExpr(), the name will be taken from the last name segment of the SchemaPath.
          • When the MaterializedField instance is created using the name, the path will be the same as the name wrapped in backticks.
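
          A rough sketch of that shape (illustrative only, not the actual implementation):

          // Sketch: MaterializedField keeps both the escaped full path and the leaf name.
          class MaterializedFieldSketch {
            private final String path;  // SchemaPath.toExpr() form, e.g. `a`.`d.e`
            private final String name;  // last name segment only, e.g. d.e

            MaterializedFieldSketch(String path, String name) {
              this.path = path;
              this.name = name;
            }

            String getPath() { return path; }  // full escaped path
            String getName() { return name; }  // single name part
          }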

          The less preferred solution is to revert commit PR-373. In this case dots in field names would be handled correctly, but such a solution would make the transition to Apache Arrow more complex (although MaterializedField was replaced there by the Flatbuffer Field, so the transition is already quite complex).

          paul-rogers Paul Rogers added a comment - - edited

          Wonderful detailed analysis! You caught many detailed issues that my quick scan missed.

          The solution for Parquet metadata seems good. I'm not an expert in that area, but a few unit tests will validate the change once you make it. Bumping the version number will solve the forward/backward compatibility issues (using the mechanism from DRILL-5660.)

          The MaterializedField issue is harder. Fortunately, some of the nested-name issues might not be actual issues.

          For example, your example of ScanBatch.Mutator:362 should be OK as long as the caller knows to call this method only for top-level columns. This line is used to build up a record batch during reading, such as in the JSON or Parquet readers. The problem is if the container is a map. In this case, the caller should be calling AbstractMapVector.addOrGet() to add the field rather than adding it at the top level using the Mutator.

          Are there other cases where the code assembles a path then tears it down again? Or, parses a path?

          Otherwise, we can find all uses of MaterializedField.getPath(), verify that they really only use the leaf name, and replace them with getName(). The same is true of getLastName().

          mandoskippy John Omernik added a comment -

          Lots to process here...

          Let me add a simple thing though...

          "As a user I have a new data set, I have no idea what's in it, from Drill (that's the key) I should be able to select * from directory, and if it's a known format (JSON, Parquet, CSV etc) I should get results back. I know I can do select `field.one`, `field.two` from directory and get it, but say it's a parquet file created in Spark... there is no way for me explore that data in Drill ... I need select *

          vvysotskyi Volodymyr Vysotskyi added a comment -

          John Omernik, the goal of this Jira is to fix the issue you have mentioned.
          With the fix for this Jira, the query

          select * from `test.json`;
          

          where test.json is the file from the Jira description (it also has dots in its field names), will return the correct result:

          +--------------------------------------------------+--------------------------------------------------+
          |                      0.0.1                       |                      0.1.2                       |
          +--------------------------------------------------+--------------------------------------------------+
          | {"version":"0.0.1","date_created":"2014-03-15"}  | {"version":"0.1.2","date_created":"2014-05-21"}  |
          +--------------------------------------------------+--------------------------------------------------+
          
          githubbot ASF GitHub Bot added a comment -

          GitHub user vvysotskyi opened a pull request:

          https://github.com/apache/drill/pull/909

          DRILL-4264: Allow field names to include dots

          1. Removed the check of the field name for dots.
          2. Replaced the `SchemaPath.getAsUnescapedPath()` method with `SchemaPath.getRootSegmentPath()` and `SchemaPathUtil.getMaterializedFieldFromSchemaPath()` where needed.
          3. Replaced the `MaterializedField.getPath()` and `MaterializedField.getLastName()` methods with the `MaterializedField.getName()` method and checked the correctness of the behaviour (see the sketch after this list).
          4. Added tests.
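
          A minimal illustration of change (3) above (a sketch of the direction, not the actual diff):

          // Before: the full dotted path, ambiguous for names that contain dots.
          String columnPath = field.getPath();
          // After: a single name part only.
          String columnName = field.getName();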

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/vvysotskyi/drill DRILL-4264

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/drill/pull/909.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #909


          commit 4ba59488a96fb79455b192ed960a728481ceaf93
          Author: Volodymyr Vysotskyi <vvovyk@gmail.com>
          Date: 2017-07-05T19:08:59Z

          DRILL-4264: Allow field names to include dots


          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134539810

          — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java —
          @@ -359,30 +361,109 @@ public Mutator(OperatorExecContext oContext, BufferAllocator allocator, VectorCo
            public <T extends ValueVector> T addField(MaterializedField field,
                                                      Class<T> clazz) throws SchemaChangeException {
              // Check if the field exists.
          -   ValueVector v = fieldVectorMap.get(field.getPath());
          -   if (v == null || v.getClass() != clazz) {
          +   ValueVector vector = fieldVectorMap.get(field.getName());
          +   ValueVector childVector = vector;
          +   // if a vector does not exist yet, creates value vector, or if it exists and has map type, omit this code
          +   if (vector == null || (vector.getClass() != clazz
          +       && (vector.getField().getType().getMinorType() != MinorType.MAP
          +       || field.getType().getMinorType() != MinorType.MAP))) {
                // Field does not exist--add it to the map and the output container.
          -     v = TypeHelper.getNewVector(field, allocator, callBack);
          -     if (!clazz.isAssignableFrom(v.getClass())) {
          +     vector = TypeHelper.getNewVector(field, allocator, callBack);
          +     childVector = vector;
          +     // gets inner field if the map was created the first time
          +     if (field.getType().getMinorType() == MinorType.MAP) {
          +       childVector = getChildVectorByField(vector, field);
          +     } else if (!clazz.isAssignableFrom(vector.getClass())) {
                  throw new SchemaChangeException(
                      String.format(
                          "The class that was provided, %s, does not correspond to the "
                              + "expected vector type of %s.",
          -               clazz.getSimpleName(), v.getClass().getSimpleName()));
          +               clazz.getSimpleName(), vector.getClass().getSimpleName()));
                }
          -     final ValueVector old = fieldVectorMap.put(field.getPath(), v);
          +     final ValueVector old = fieldVectorMap.put(field.getName(), vector);
                if (old != null) {
                  old.clear();
                  container.remove(old);
                }
          -     container.add(v);
          +     container.add(vector);
                // Added new vectors to the container--mark that the schema has changed.
                schemaChanged = true;
              }
          +   // otherwise, checks that field and existing vector have a map type
          — End diff –

          I was suggesting that the work here might be performed on the nested fields through the map.
          I agree with you that it would be correct to deal with the desired field directly. Thanks for pointing this out; I reverted the changes in this method.

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134540459

          — Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/CompareFunctionsProcessor.java —
          @@ -147,10 +147,10 @@ public Boolean visitCastExpression(CastExpression e, LogicalExpression valueArg)

            @Override
            public Boolean visitConvertExpression(ConvertExpression e, LogicalExpression valueArg) throws RuntimeException {
          -   if (e.getConvertFunction() == ConvertExpression.CONVERT_FROM) {
          +   if (ConvertExpression.CONVERT_FROM.equals(e.getConvertFunction())) {
          — End diff –

          Since both these classes are almost the same, I moved the shared code to an abstract class.

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134795741

          — Diff: logical/src/main/java/org/apache/drill/common/expression/SchemaPath.java —
          @@ -115,6 +112,33 @@ public static SchemaPath create(NamePart namePart) {
            }

          + /**
          +  * Parses input string and returns {@code SchemaPath} instance.
          +  *
          +  * @param expr input string to be parsed
          +  * @return {@code SchemaPath} instance
          +  */
          + public static SchemaPath parseFromString(String expr) {
          — End diff –

          It parses a string using the same rules that are used for a field in a query. If the string contains a dot outside backticks, or there are no backticks in the string, a `SchemaPath` is created whose `NameSegment` contains another `NameSegment`, and so on. If the string contains [], an `ArraySegment` is created.
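
          A hedged usage sketch of the new method (the segment structures in the comments follow the rules described above):

          // Parsing follows the same quoting rules as a field in a query.
          SchemaPath dotted = SchemaPath.parseFromString("a.b");    // NameSegment "a" with child NameSegment "b"
          SchemaPath quoted = SchemaPath.parseFromString("`a.b`");  // a single NameSegment "a.b"
          SchemaPath array = SchemaPath.parseFromString("a[0]");    // NameSegment "a" with a child ArraySegment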

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134764401

          — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/SchemaPathUtil.java —
          @@ -0,0 +1,59 @@
          +/*
          +* Licensed to the Apache Software Foundation (ASF) under one or more
          +* contributor license agreements. See the NOTICE file distributed with
          +* this work for additional information regarding copyright ownership.
          +* The ASF licenses this file to you under the Apache License, Version 2.0
          +* (the "License"); you may not use this file except in compliance with
          +* the License. You may obtain a copy of the License at
          +*
          +* http://www.apache.org/licenses/LICENSE-2.0
          +*
          +* Unless required by applicable law or agreed to in writing, software
          +* distributed under the License is distributed on an "AS IS" BASIS,
          +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          +* See the License for the specific language governing permissions and
          +* limitations under the License.
          +*/
          +package org.apache.drill.exec.vector.complex;
          +
          +import org.apache.drill.common.expression.PathSegment;
          +import org.apache.drill.common.expression.SchemaPath;
          +import org.apache.drill.common.types.TypeProtos;
          +import org.apache.drill.common.types.Types;
          +import org.apache.drill.exec.record.MaterializedField;
          +
          +public class SchemaPathUtil {
          — End diff –

          Removed this class, since all the code where it was used now uses a simple name path, so it is not needed anymore.

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134788919

          — Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/TupleAccessor.java —
          @@ -48,9 +48,21 @@

          MaterializedField column(int index);

-  MaterializedField column(String name);
+  /**
+   * Returns {@code MaterializedField} instance from schema using the name path specified in param.
+   *
+   * @param name full name path of the column in the schema
+   * @return {@code MaterializedField} instance
+   */
+  MaterializedField column(String[] name);

— End diff –

          Thanks, reverted my changes.

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134778939

          — Diff: exec/java-exec/src/test/java/org/apache/drill/exec/expr/TestSchemaPathMaterialization.java —
          @@ -93,4 +93,23 @@ public void testProjectionMultipleFiles() throws Exception

          { .go(); }

          + @Test //DRILL-4264
          + public void testFieldNameWithDot() throws Exception {
          — End diff –

          Added more tests
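For context, the added tests are presumably of roughly this shape; the sketch below is illustrative only, with a hypothetical resource file and baseline values, and it assumes the enclosing class extends Drill's `BaseTestQuery`, which supplies `testBuilder()` (only the trailing `.go()` call is visible in the quoted diff).

    import org.apache.drill.BaseTestQuery;
    import org.junit.Test;

    public class TestFieldWithDotSketch extends BaseTestQuery {
      @Test // illustrative only, not the actual DRILL-4264 test
      public void testFieldNameWithDotSketch() throws Exception {
        // hypothetical resource jsoninput/field_with_dot.json: {"field.with.dot": "value1"}
        testBuilder()
            .sqlQuery("select t.`field.with.dot` as col1 from cp.`jsoninput/field_with_dot.json` t")
            .unOrdered()
            .baselineColumns("col1")
            .baselineValues("value1")
            .go();
      }
    }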

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134689741

          — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java —
          @@ -362,16 +363,16 @@ protected boolean setupNewSchema() throws SchemaChangeException

          { final TransferPair tp = vvIn.makeTransferPair(vvOut); transfers.add(tp); }
-      } else if (value != null && value.intValue() > 1) { // subsequent wildcards should do a copy of incoming valuevectors
+      } else if (value != null && value > 1) { // subsequent wildcards should do a copy of incoming valuevectors
         int k = 0;
         for (final VectorWrapper<?> wrapper : incoming) {
           final ValueVector vvIn = wrapper.getValueVector();
-          final SchemaPath originalPath = SchemaPath.getSimplePath(vvIn.getField().getPath());
-          if (k > result.outputNames.size()-1) {
+          final SchemaPath originalPath = SchemaPath.getSimplePath(vvIn.getField().getName());
+          if (k > result.outputNames.size() - 1) {
             assert false;
           }
           final String name = result.outputNames.get(k++); // get the renamed column names
-          if (name == EMPTY_STRING) {
+          if (EMPTY_STRING.equals(name)) {

— End diff –

Thanks, replaced it with `name.isEmpty()`, but `EMPTY_STRING` is used in other places, so I left it in the class.
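As background for the change above: in Java, `==` on strings compares object references, not contents, so the old `name == EMPTY_STRING` check only matched when the exact constant instance was stored in `outputNames`. A minimal standalone illustration (plain Java, unrelated to Drill itself):

    public class StringEqualityExample {
      private static final String EMPTY_STRING = "";

      public static void main(String[] args) {
        // Deliberately a distinct object with the same (empty) contents.
        String name = new String("");

        System.out.println(name == EMPTY_STRING);      // false: different references
        System.out.println(EMPTY_STRING.equals(name)); // true: compares contents, constant-first avoids NPE
        System.out.println(name.isEmpty());            // true: clearest intent, but throws NPE if name is null
      }
    }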

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134788014

          — Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetSchema.java —
          @@ -94,12 +102,20 @@ private void updateStructure(int index, PhysicalSchema children) {
          */

          public static class NameSpace<T> {

-    private final Map<String,Integer> nameSpace = new HashMap<>();
+    private final Map<String, Integer> nameSpace = new HashMap<>();
     private final List<T> columns = new ArrayList<>();

-    public int add(String key, T value) {
+    /**
+     * Adds column path with specified value to the columns list
+     * and returns the index of the column in the list.
+     *
+     * @param key full name path of the column in the schema
+     * @param value value to be added to the list
+     * @return index of the column in the list
+     */
+    public int add(String[] key, T value) {
       int index = columns.size();
-      nameSpace.put(key, index);
+      nameSpace.put(SchemaPath.getCompoundPath(key).toExpr(), index);

— End diff –

          Thanks for the explanation, reverted my changes.
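Incidentally, one likely reason the `String[]` key is flattened through `SchemaPath.getCompoundPath(key).toExpr()` before the `put()` in the diff above is that Java arrays use identity-based `equals()`/`hashCode()`, so they make poor `HashMap` keys; a short illustration of that pitfall (plain Java, not Drill code):

    import java.util.HashMap;
    import java.util.Map;

    public class ArrayKeyExample {
      public static void main(String[] args) {
        Map<String[], Integer> byArray = new HashMap<>();
        byArray.put(new String[] {"a", "b"}, 0);
        // Arrays inherit Object.equals/hashCode, so an equal-looking array is a different key.
        System.out.println(byArray.get(new String[] {"a", "b"})); // null

        Map<String, Integer> byString = new HashMap<>();
        // The real patch uses toExpr(), which also escapes segments that themselves contain dots.
        byString.put(String.join(".", "a", "b"), 0);
        System.out.println(byString.get("a.b")); // 0
      }
    }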

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134782398

          — Diff: exec/java-exec/src/test/java/org/apache/drill/exec/record/TestMaterializedField.java —
          @@ -84,4 +89,22 @@ public void testClone() {

          }

          + @Test // DRILL-4264
          + public void testSchemaPathToMaterializedFieldConverting() {
          — End diff –

This test was designed to check the `SchemaPathUtil.getMaterializedFieldFromSchemaPath()` method. Since that method was removed, I removed this test as well.

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134686740

          — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java —
          @@ -359,30 +361,109 @@ public Mutator(OperatorExecContext oContext, BufferAllocator allocator, VectorCo
          public <T extends ValueVector> T addField(MaterializedField field,
          Class<T> clazz) throws SchemaChangeException {
          // Check if the field exists.

          • ValueVector v = fieldVectorMap.get(field.getPath());
          • if (v == null || v.getClass() != clazz) {
            + ValueVector vector = fieldVectorMap.get(field.getName());
            + ValueVector childVector = vector;
            + // if a vector does not exist yet, creates value vector, or if it exists and has map type, omit this code
            + if (vector == null || (vector.getClass() != clazz
            + && (vector.getField().getType().getMinorType() != MinorType.MAP
            + || field.getType().getMinorType() != MinorType.MAP))) {
            // Field does not exist--add it to the map and the output container.
          • v = TypeHelper.getNewVector(field, allocator, callBack);
          • if (!clazz.isAssignableFrom(v.getClass())) {
            + vector = TypeHelper.getNewVector(field, allocator, callBack);
            + childVector = vector;
            + // gets inner field if the map was created the first time
            + if (field.getType().getMinorType() == MinorType.MAP) { + childVector = getChildVectorByField(vector, field); + }

            else if (!clazz.isAssignableFrom(vector.getClass()))

            { throw new SchemaChangeException( String.format( "The class that was provided, %s, does not correspond to the " + "expected vector type of %s.", - clazz.getSimpleName(), v.getClass().getSimpleName())); + clazz.getSimpleName(), vector.getClass().getSimpleName())); }
          • final ValueVector old = fieldVectorMap.put(field.getPath(), v);
            + final ValueVector old = fieldVectorMap.put(field.getName(), vector);
            if (old != null) { old.clear(); container.remove(old); }
          • container.add(v);
            + container.add(vector);
            // Added new vectors to the container--mark that the schema has changed.
            schemaChanged = true;
            }
            + // otherwise, checks that field and existing vector have a map type
            + // and adds child fields from the field to the vector
            + else if (field.getType().getMinorType() == MinorType.MAP
            + && vector.getField().getType().getMinorType() == MinorType.MAP
            + && !field.getChildren().isEmpty()) { + // an incoming field contains only single child since it determines + // full name path of the field in the schema + childVector = addNestedChildToMap((MapVector) vector, Iterables.getLast(field.getChildren())); + schemaChanged = true; + }
          • return clazz.cast(v);
            + return clazz.cast(childVector);
            + }
            +
            + /**
            + * Finds and returns value vector which path corresponds to the specified field.
            + * If required vector is nested in the map, gets and returns this vector from the map.
            + *
            + * @param valueVector vector that should be checked
            + * @param field field that corresponds to required vector
            + * @return value vector whose path corresponds to the specified field
            + *
            + * @throws SchemaChangeException if the field does not correspond to the found vector
            + */
            + private ValueVector getChildVectorByField(ValueVector valueVector,
            + MaterializedField field) throws SchemaChangeException {
            + if (field.getChildren().isEmpty())
{ + if (valueVector.getField().equals(field)) { + return valueVector; + } else { + throw new SchemaChangeException( + String.format( + "The field that was provided, %s, does not correspond to the " + + "expected vector type of %s.", + field, valueVector.getClass().getSimpleName())); + } + }

            else

            { + // an incoming field contains only single child since it determines + // full name path of the field in the schema + MaterializedField childField = Iterables.getLast(field.getChildren()); + return getChildVectorByField(((MapVector) valueVector).getChild(childField.getName()), childField); + }

            + }
            +
            + /**
            + * Adds new vector with the specified field to the specified map vector.
            + * If a field has few levels of nesting, corresponding maps will be created.
            + *
            + * @param valueVector a map where new vector will be added
            + * @param field a field that corresponds to the vector which should be created
            + * @return vector that was created
            + *
            + * @throws SchemaChangeException if the field has a child and its type is not map
            + */
            + private ValueVector addNestedChildToMap(MapVector valueVector,

— End diff –

This method was designed to handle the case when we have a map inside another map and we want to add a field to the nested map. But considering your second comment, this method is also not needed.

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134683465

          — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java —
          @@ -359,30 +361,109 @@ public Mutator(OperatorExecContext oContext, BufferAllocator allocator, VectorCo
          public <T extends ValueVector> T addField(MaterializedField field,
          Class<T> clazz) throws SchemaChangeException {
          // Check if the field exists.

          • ValueVector v = fieldVectorMap.get(field.getPath());
          • if (v == null || v.getClass() != clazz) {
            + ValueVector vector = fieldVectorMap.get(field.getName());
            + ValueVector childVector = vector;
            + // if a vector does not exist yet, creates value vector, or if it exists and has map type, omit this code
            + if (vector == null || (vector.getClass() != clazz
            + && (vector.getField().getType().getMinorType() != MinorType.MAP
            + || field.getType().getMinorType() != MinorType.MAP))) {
            // Field does not exist--add it to the map and the output container.
          • v = TypeHelper.getNewVector(field, allocator, callBack);
          • if (!clazz.isAssignableFrom(v.getClass())) {
            + vector = TypeHelper.getNewVector(field, allocator, callBack);
            + childVector = vector;
            + // gets inner field if the map was created the first time
            + if (field.getType().getMinorType() == MinorType.MAP) { + childVector = getChildVectorByField(vector, field); + }

            else if (!clazz.isAssignableFrom(vector.getClass()))

            { throw new SchemaChangeException( String.format( "The class that was provided, %s, does not correspond to the " + "expected vector type of %s.", - clazz.getSimpleName(), v.getClass().getSimpleName())); + clazz.getSimpleName(), vector.getClass().getSimpleName())); }
          • final ValueVector old = fieldVectorMap.put(field.getPath(), v);
            + final ValueVector old = fieldVectorMap.put(field.getName(), vector);
            if (old != null) { old.clear(); container.remove(old); }
          • container.add(v);
            + container.add(vector);
            // Added new vectors to the container--mark that the schema has changed.
            schemaChanged = true;
            }
            + // otherwise, checks that field and existing vector have a map type
            + // and adds child fields from the field to the vector
            + else if (field.getType().getMinorType() == MinorType.MAP
            + && vector.getField().getType().getMinorType() == MinorType.MAP
            + && !field.getChildren().isEmpty()) { + // an incoming field contains only single child since it determines + // full name path of the field in the schema + childVector = addNestedChildToMap((MapVector) vector, Iterables.getLast(field.getChildren())); + schemaChanged = true; + }
          • return clazz.cast(v);
            + return clazz.cast(childVector);
            + }
            +
            + /**
            + * Finds and returns value vector which path corresponds to the specified field.
            + * If required vector is nested in the map, gets and returns this vector from the map.
            + *
            + * @param valueVector vector that should be checked
            + * @param field field that corresponds to required vector
            + * @return value vector whose path corresponds to the specified field
            + *
            + * @throws SchemaChangeException if the field does not correspond to the found vector
            + */
            + private ValueVector getChildVectorByField(ValueVector valueVector,
            + MaterializedField field) throws SchemaChangeException {
            + if (field.getChildren().isEmpty())
{ + if (valueVector.getField().equals(field)) { + return valueVector; + } else { + throw new SchemaChangeException( + String.format( + "The field that was provided, %s, does not correspond to the " + + "expected vector type of %s.", + field, valueVector.getClass().getSimpleName())); + } + }

            else

            { + // an incoming field contains only single child since it determines + // full name path of the field in the schema + MaterializedField childField = Iterables.getLast(field.getChildren()); + return getChildVectorByField(((MapVector) valueVector).getChild(childField.getName()), childField); + }
— End diff –

This method was designed to be called only when the vector has just been created, so its field does not contain `$bits$` or `$offsets$` fields in the children list. Its goal was to return the innermost vector that corresponds to the last nested field child.
But considering the previous comment, this method is also no longer needed.

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134790779

          — Diff: logical/src/main/java/org/apache/drill/common/expression/PathSegment.java —
          @@ -17,11 +17,15 @@
          */
          package org.apache.drill.common.expression;

          -public abstract class PathSegment{
          +public abstract class PathSegment {

-  PathSegment child;
+  private PathSegment child;

— End diff –

          This field has a setter.

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134688368

          — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatch.java —
          @@ -293,7 +294,7 @@ private StreamingAggregator createAggregatorInternal() throws SchemaChangeExcept
          continue;
          }
          keyExprs[i] = expr;

-      final MaterializedField outputField = MaterializedField.create(ne.getRef().getAsUnescapedPath(), expr.getMajorType());
+      final MaterializedField outputField = SchemaPathUtil.getMaterializedFieldFromSchemaPath(ne.getRef(), expr.getMajorType());

— End diff –

Yes, it should. But the `MaterializedField` class is in the `vector` module, the `SchemaPath` class is in the `drill-logical` module, and the `vector` module does not have a dependency on the `drill-logical` module.
Replaced this code with `MaterializedField.create(ne.getRef().getLastSegment().getNameSegment().getPath(), expr.getMajorType());` since a simple name path is used here.

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r134787569

          — Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetSchema.java —
          @@ -83,7 +85,13 @@ private void updateStructure(int index, PhysicalSchema children) {
          public boolean isMap()

          { return mapSchema != null; }

          public PhysicalSchema mapSchema()

          { return mapSchema; }

          public MaterializedField field()

          { return field; }
          • public String fullName() { return fullName; }

            +
            + /**
            + * Returns full name path of the column.

— End diff –

          I reverted these changes. Also, I commented out the test where this code is used with the map fields.

          githubbot ASF GitHub Bot added a comment -

          Github user paul-rogers commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r135359090

          — Diff: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/binary/CompareFunctionsProcessor.java —
          @@ -82,466 +45,37 @@ public static CompareFunctionsProcessor process(FunctionCall call, boolean nullC
          LogicalExpression swapArg = valueArg;
          valueArg = nameArg;
          nameArg = swapArg;

          • evaluator.functionName = COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName);
            + evaluator.setFunctionName(COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName));
            }
          • evaluator.success = nameArg.accept(evaluator, valueArg);
            + evaluator.setSuccess(nameArg.accept(evaluator, valueArg));
            } else if (nullComparatorSupported && call.args.get(0) instanceof SchemaPath) { - evaluator.success = true; - evaluator.path = (SchemaPath) nameArg; + evaluator.setSuccess(true); + evaluator.setPath((SchemaPath) nameArg); }

          return evaluator;
          }

          • public CompareFunctionsProcessor(String functionName) { - this.success = false; - this.functionName = functionName; - this.isEqualityFn = COMPARE_FUNCTIONS_TRANSPOSE_MAP.containsKey(functionName) - && COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName).equals(functionName); - this.isRowKeyPrefixComparison = false; - this.sortOrderAscending = true; - }

            -

          • public byte[] getValue() { - return value; - }

            -

          • public boolean isSuccess() { - return success; - }

            -

          • public SchemaPath getPath() { - return path; - }

            -

          • public String getFunctionName() { - return functionName; - }

            -

          • public boolean isRowKeyPrefixComparison() { - return isRowKeyPrefixComparison; - }

            -

          • public byte[] getRowKeyPrefixStartRow() { - return rowKeyPrefixStartRow; - }

            -

          • public byte[] getRowKeyPrefixStopRow() { - return rowKeyPrefixStopRow; - }

            -

          • public Filter getRowKeyPrefixFilter() { - return rowKeyPrefixFilter; - }

            -

          • public boolean isSortOrderAscending() { - return sortOrderAscending; - }

            -
            @Override

          • public Boolean visitCastExpression(CastExpression e, LogicalExpression valueArg) throws RuntimeException {
          • if (e.getInput() instanceof CastExpression || e.getInput() instanceof SchemaPath) { - return e.getInput().accept(this, valueArg); - }
          • return false;
          • }
            -
          • @Override
          • public Boolean visitConvertExpression(ConvertExpression e, LogicalExpression valueArg) throws RuntimeException {
          • if (e.getConvertFunction() == ConvertExpression.CONVERT_FROM) {
            -
          • String encodingType = e.getEncodingType();
          • int prefixLength = 0;
            -
          • // Handle scan pruning in the following scenario:
          • // The row-key is a composite key and the CONVERT_FROM() function has byte_substr() as input function which is
          • // querying for the first few bytes of the row-key(start-offset 1)
          • // Example WHERE clause:
          • // CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'DATE_EPOCH_BE') < DATE '2015-06-17'
          • if (e.getInput() instanceof FunctionCall) {
            -
          • // We can prune scan range only for big-endian encoded data
          • if (encodingType.endsWith("_BE") == false) { - return false; - }
            -
            - FunctionCall call = (FunctionCall)e.getInput();
            - String functionName = call.getName();
            - if (!functionName.equalsIgnoreCase("byte_substr")) { - return false; - }

            -

          • LogicalExpression nameArg = call.args.get(0);
          • LogicalExpression valueArg1 = call.args.size() >= 2 ? call.args.get(1) : null;
          • LogicalExpression valueArg2 = call.args.size() >= 3 ? call.args.get(2) : null;
            -
          • if (((nameArg instanceof SchemaPath) == false) ||
          • (valueArg1 == null) || ((valueArg1 instanceof IntExpression) == false) ||
          • (valueArg2 == null) || ((valueArg2 instanceof IntExpression) == false)) { - return false; - }
            -
            - boolean isRowKey = ((SchemaPath)nameArg).getAsUnescapedPath().equals(DrillHBaseConstants.ROW_KEY);
            - int offset = ((IntExpression)valueArg1).getInt();
            -
            - if (!isRowKey || (offset != 1)) { - return false; - }

            -

          • this.path = (SchemaPath)nameArg;
          • prefixLength = ((IntExpression)valueArg2).getInt();
          • this.isRowKeyPrefixComparison = true;
          • return visitRowKeyPrefixConvertExpression(e, prefixLength, valueArg);
          • }
            -
          • if (e.getInput() instanceof SchemaPath) {
          • ByteBuf bb = null;
            -
          • switch (encodingType) {
          • case "INT_BE":
          • case "INT":
          • case "UINT_BE":
          • case "UINT":
          • case "UINT4_BE":
          • case "UINT4":
          • if (valueArg instanceof IntExpression
          • && (isEqualityFn || encodingType.startsWith("U"))) { - bb = newByteBuf(4, encodingType.endsWith("_BE")); - bb.writeInt(((IntExpression)valueArg).getInt()); - }
          • break;
          • case "BIGINT_BE":
          • case "BIGINT":
          • case "UINT8_BE":
          • case "UINT8":
          • if (valueArg instanceof LongExpression
          • && (isEqualityFn || encodingType.startsWith("U"))) { - bb = newByteBuf(8, encodingType.endsWith("_BE")); - bb.writeLong(((LongExpression)valueArg).getLong()); - }
          • break;
          • case "FLOAT":
          • if (valueArg instanceof FloatExpression && isEqualityFn) { - bb = newByteBuf(4, true); - bb.writeFloat(((FloatExpression)valueArg).getFloat()); - }
          • break;
          • case "DOUBLE":
          • if (valueArg instanceof DoubleExpression && isEqualityFn) { - bb = newByteBuf(8, true); - bb.writeDouble(((DoubleExpression)valueArg).getDouble()); - }
          • break;
          • case "TIME_EPOCH":
          • case "TIME_EPOCH_BE":
          • if (valueArg instanceof TimeExpression) { - bb = newByteBuf(8, encodingType.endsWith("_BE")); - bb.writeLong(((TimeExpression)valueArg).getTime()); - }
          • break;
          • case "DATE_EPOCH":
          • case "DATE_EPOCH_BE":
          • if (valueArg instanceof DateExpression) { - bb = newByteBuf(8, encodingType.endsWith("_BE")); - bb.writeLong(((DateExpression)valueArg).getDate()); - }
          • break;
          • case "BOOLEAN_BYTE":
          • if (valueArg instanceof BooleanExpression) { - bb = newByteBuf(1, false /* does not matter */); - bb.writeByte(((BooleanExpression)valueArg).getBoolean() ? 1 : 0); - }
          • break;
          • case "DOUBLE_OB":
          • case "DOUBLE_OBD":
          • if (valueArg instanceof DoubleExpression) {
          • bb = newByteBuf(9, true);
          • PositionedByteRange br = new SimplePositionedMutableByteRange(bb.array(), 0, 9);
          • if (encodingType.endsWith("_OBD")) { - org.apache.hadoop.hbase.util.OrderedBytes.encodeFloat64(br, - ((DoubleExpression)valueArg).getDouble(), Order.DESCENDING); - this.sortOrderAscending = false; - }

            else

            { - org.apache.hadoop.hbase.util.OrderedBytes.encodeFloat64(br, - ((DoubleExpression)valueArg).getDouble(), Order.ASCENDING); - }

            + protected ByteBuf getByteBuf(LogicalExpression valueArg, String encodingType) {

— End diff –

Sorry, it is not at all clear why we deleted the big wad of code. Nor is it clear why, when resolving functions, we start looking at byte buffers. Resolving a function is a setup task; byte buffers are a runtime concern. Can you explain this a bit?

          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r135753493

          — Diff: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/binary/CompareFunctionsProcessor.java —
          @@ -82,466 +45,37 @@ public static CompareFunctionsProcessor process(FunctionCall call, boolean nullC
          LogicalExpression swapArg = valueArg;
          valueArg = nameArg;
          nameArg = swapArg;

          • evaluator.functionName = COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName);
            + evaluator.setFunctionName(COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName));
            }
          • evaluator.success = nameArg.accept(evaluator, valueArg);
            + evaluator.setSuccess(nameArg.accept(evaluator, valueArg));
            } else if (nullComparatorSupported && call.args.get(0) instanceof SchemaPath) { - evaluator.success = true; - evaluator.path = (SchemaPath) nameArg; + evaluator.setSuccess(true); + evaluator.setPath((SchemaPath) nameArg); }

          return evaluator;
          }

          • public CompareFunctionsProcessor(String functionName) { - this.success = false; - this.functionName = functionName; - this.isEqualityFn = COMPARE_FUNCTIONS_TRANSPOSE_MAP.containsKey(functionName) - && COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName).equals(functionName); - this.isRowKeyPrefixComparison = false; - this.sortOrderAscending = true; - }

            -

          • public byte[] getValue() { - return value; - }

            -

          • public boolean isSuccess() { - return success; - }

            -

          • public SchemaPath getPath() { - return path; - }

            -

          • public String getFunctionName() { - return functionName; - }

            -

          • public boolean isRowKeyPrefixComparison() { - return isRowKeyPrefixComparison; - }

            -

          • public byte[] getRowKeyPrefixStartRow() { - return rowKeyPrefixStartRow; - }

            -

          • public byte[] getRowKeyPrefixStopRow() { - return rowKeyPrefixStopRow; - }

            -

          • public Filter getRowKeyPrefixFilter() { - return rowKeyPrefixFilter; - }

            -

          • public boolean isSortOrderAscending() { - return sortOrderAscending; - }

            -
            @Override

          • public Boolean visitCastExpression(CastExpression e, LogicalExpression valueArg) throws RuntimeException {
          • if (e.getInput() instanceof CastExpression || e.getInput() instanceof SchemaPath) { - return e.getInput().accept(this, valueArg); - }
          • return false;
          • }
            -
          • @Override
          • public Boolean visitConvertExpression(ConvertExpression e, LogicalExpression valueArg) throws RuntimeException {
          • if (e.getConvertFunction() == ConvertExpression.CONVERT_FROM) {
            -
          • String encodingType = e.getEncodingType();
          • int prefixLength = 0;
            -
          • // Handle scan pruning in the following scenario:
          • // The row-key is a composite key and the CONVERT_FROM() function has byte_substr() as input function which is
          • // querying for the first few bytes of the row-key(start-offset 1)
          • // Example WHERE clause:
          • // CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'DATE_EPOCH_BE') < DATE '2015-06-17'
          • if (e.getInput() instanceof FunctionCall) {
            -
          • // We can prune scan range only for big-endian encoded data
          • if (encodingType.endsWith("_BE") == false) { - return false; - }
            -
            - FunctionCall call = (FunctionCall)e.getInput();
            - String functionName = call.getName();
            - if (!functionName.equalsIgnoreCase("byte_substr")) { - return false; - }

            -

          • LogicalExpression nameArg = call.args.get(0);
          • LogicalExpression valueArg1 = call.args.size() >= 2 ? call.args.get(1) : null;
          • LogicalExpression valueArg2 = call.args.size() >= 3 ? call.args.get(2) : null;
            -
          • if (((nameArg instanceof SchemaPath) == false) ||
          • (valueArg1 == null) || ((valueArg1 instanceof IntExpression) == false) ||
          • (valueArg2 == null) || ((valueArg2 instanceof IntExpression) == false)) { - return false; - }
            -
            - boolean isRowKey = ((SchemaPath)nameArg).getAsUnescapedPath().equals(DrillHBaseConstants.ROW_KEY);
            - int offset = ((IntExpression)valueArg1).getInt();
            -
            - if (!isRowKey || (offset != 1)) { - return false; - }

            -

          • this.path = (SchemaPath)nameArg;
          • prefixLength = ((IntExpression)valueArg2).getInt();
          • this.isRowKeyPrefixComparison = true;
          • return visitRowKeyPrefixConvertExpression(e, prefixLength, valueArg);
          • }
            -
          • if (e.getInput() instanceof SchemaPath) {
          • ByteBuf bb = null;
            -
          • switch (encodingType) {
          • case "INT_BE":
          • case "INT":
          • case "UINT_BE":
          • case "UINT":
          • case "UINT4_BE":
          • case "UINT4":
          • if (valueArg instanceof IntExpression
          • && (isEqualityFn || encodingType.startsWith("U"))) { - bb = newByteBuf(4, encodingType.endsWith("_BE")); - bb.writeInt(((IntExpression)valueArg).getInt()); - }
          • break;
          • case "BIGINT_BE":
          • case "BIGINT":
          • case "UINT8_BE":
          • case "UINT8":
          • if (valueArg instanceof LongExpression
          • && (isEqualityFn || encodingType.startsWith("U"))) { - bb = newByteBuf(8, encodingType.endsWith("_BE")); - bb.writeLong(((LongExpression)valueArg).getLong()); - }
          • break;
          • case "FLOAT":
          • if (valueArg instanceof FloatExpression && isEqualityFn) { - bb = newByteBuf(4, true); - bb.writeFloat(((FloatExpression)valueArg).getFloat()); - }
          • break;
          • case "DOUBLE":
          • if (valueArg instanceof DoubleExpression && isEqualityFn) { - bb = newByteBuf(8, true); - bb.writeDouble(((DoubleExpression)valueArg).getDouble()); - }
          • break;
          • case "TIME_EPOCH":
          • case "TIME_EPOCH_BE":
          • if (valueArg instanceof TimeExpression) { - bb = newByteBuf(8, encodingType.endsWith("_BE")); - bb.writeLong(((TimeExpression)valueArg).getTime()); - }
          • break;
          • case "DATE_EPOCH":
          • case "DATE_EPOCH_BE":
          • if (valueArg instanceof DateExpression) { - bb = newByteBuf(8, encodingType.endsWith("_BE")); - bb.writeLong(((DateExpression)valueArg).getDate()); - }
          • break;
          • case "BOOLEAN_BYTE":
          • if (valueArg instanceof BooleanExpression) { - bb = newByteBuf(1, false /* does not matter */); - bb.writeByte(((BooleanExpression)valueArg).getBoolean() ? 1 : 0); - }
          • break;
          • case "DOUBLE_OB":
          • case "DOUBLE_OBD":
          • if (valueArg instanceof DoubleExpression) {
          • bb = newByteBuf(9, true);
          • PositionedByteRange br = new SimplePositionedMutableByteRange(bb.array(), 0, 9);
          • if (encodingType.endsWith("_OBD")) { - org.apache.hadoop.hbase.util.OrderedBytes.encodeFloat64(br, - ((DoubleExpression)valueArg).getDouble(), Order.DESCENDING); - this.sortOrderAscending = false; - }

            else

            { - org.apache.hadoop.hbase.util.OrderedBytes.encodeFloat64(br, - ((DoubleExpression)valueArg).getDouble(), Order.ASCENDING); - }

            + protected ByteBuf getByteBuf(LogicalExpression valueArg, String encodingType) {

— End diff –

The `org.apache.drill.exec.store.mapr.db.binary.CompareFunctionsProcessor` and `org.apache.drill.exec.store.hbase.CompareFunctionsProcessor` classes contain almost the same code, so I moved the common code to the abstract class `AbstractCompareFunctionsProcessor`.
The `visitConvertExpression()` method in `mapr.db.binary.CompareFunctionsProcessor` contains `UTF8_OB` and `UTF8_OBD` cases in the switch block, but the same method in the `hbase.CompareFunctionsProcessor` class does not, so I added a default case to the method in the abstract class which calls the `getByteBuf()` method that accounts for this difference between the two implementations.

As I understand, this code is used for scan pruning when the `convert_from` function is used, which is why byte buffers are involved there.
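To make the shape of that refactoring easier to follow, here is a minimal, self-contained sketch of the structure described above; the class names, parameter types, and bodies are heavily simplified stand-ins (the real processors work on `LogicalExpression` arguments and Netty `ByteBuf`s), not the actual Drill code.

    // Shared logic lives in the base class; encoding cases that only one
    // subclass understands go through an overridable hook.
    abstract class AbstractCompareProcessorSketch {
      protected byte[] encode(Object valueArg, String encodingType) {
        switch (encodingType) {
          case "INT":
          case "INT_BE":
            // ... encodings common to both processors handled once here ...
            return new byte[4];
          default:
            return encodeSubclassSpecific(valueArg, encodingType);
        }
      }

      // Hook for subclass-specific encodings; the HBase processor keeps the default.
      protected byte[] encodeSubclassSpecific(Object valueArg, String encodingType) {
        return null;
      }
    }

    class MapRDbBinaryProcessorSketch extends AbstractCompareProcessorSketch {
      @Override
      protected byte[] encodeSubclassSpecific(Object valueArg, String encodingType) {
        if ("UTF8_OB".equals(encodingType) || "UTF8_OBD".equals(encodingType)) {
          return String.valueOf(valueArg).getBytes();
        }
        return null;
      }
    }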

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user paul-rogers commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r135868684

          — Diff: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/binary/CompareFunctionsProcessor.java —
          @@ -82,466 +45,37 @@ public static CompareFunctionsProcessor process(FunctionCall call, boolean nullC
                   LogicalExpression swapArg = valueArg;
                   valueArg = nameArg;
                   nameArg = swapArg;
          -        evaluator.functionName = COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName);
          +        evaluator.setFunctionName(COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName));
                 }
          -      evaluator.success = nameArg.accept(evaluator, valueArg);
          +      evaluator.setSuccess(nameArg.accept(evaluator, valueArg));
               } else if (nullComparatorSupported && call.args.get(0) instanceof SchemaPath) {
          -      evaluator.success = true;
          -      evaluator.path = (SchemaPath) nameArg;
          +      evaluator.setSuccess(true);
          +      evaluator.setPath((SchemaPath) nameArg);
               }

               return evaluator;
             }

          -  public CompareFunctionsProcessor(String functionName) {
          -    this.success = false;
          -    this.functionName = functionName;
          -    this.isEqualityFn = COMPARE_FUNCTIONS_TRANSPOSE_MAP.containsKey(functionName)
          -        && COMPARE_FUNCTIONS_TRANSPOSE_MAP.get(functionName).equals(functionName);
          -    this.isRowKeyPrefixComparison = false;
          -    this.sortOrderAscending = true;
          -  }
          -
          -  public byte[] getValue() {
          -    return value;
          -  }
          -
          -  public boolean isSuccess() {
          -    return success;
          -  }
          -
          -  public SchemaPath getPath() {
          -    return path;
          -  }
          -
          -  public String getFunctionName() {
          -    return functionName;
          -  }
          -
          -  public boolean isRowKeyPrefixComparison() {
          -    return isRowKeyPrefixComparison;
          -  }
          -
          -  public byte[] getRowKeyPrefixStartRow() {
          -    return rowKeyPrefixStartRow;
          -  }
          -
          -  public byte[] getRowKeyPrefixStopRow() {
          -    return rowKeyPrefixStopRow;
          -  }
          -
          -  public Filter getRowKeyPrefixFilter() {
          -    return rowKeyPrefixFilter;
          -  }
          -
          -  public boolean isSortOrderAscending() {
          -    return sortOrderAscending;
          -  }
          -
             @Override
          -  public Boolean visitCastExpression(CastExpression e, LogicalExpression valueArg) throws RuntimeException {
          -    if (e.getInput() instanceof CastExpression || e.getInput() instanceof SchemaPath) {
          -      return e.getInput().accept(this, valueArg);
          -    }
          -    return false;
          -  }
          -
          -  @Override
          -  public Boolean visitConvertExpression(ConvertExpression e, LogicalExpression valueArg) throws RuntimeException {
          -    if (e.getConvertFunction() == ConvertExpression.CONVERT_FROM) {
          -
          -      String encodingType = e.getEncodingType();
          -      int prefixLength = 0;
          -
          -      // Handle scan pruning in the following scenario:
          -      // The row-key is a composite key and the CONVERT_FROM() function has byte_substr() as input function which is
          -      // querying for the first few bytes of the row-key(start-offset 1)
          -      // Example WHERE clause:
          -      // CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'DATE_EPOCH_BE') < DATE '2015-06-17'
          -      if (e.getInput() instanceof FunctionCall) {
          -
          -        // We can prune scan range only for big-endian encoded data
          -        if (encodingType.endsWith("_BE") == false) {
          -          return false;
          -        }
          -
          -        FunctionCall call = (FunctionCall)e.getInput();
          -        String functionName = call.getName();
          -        if (!functionName.equalsIgnoreCase("byte_substr")) {
          -          return false;
          -        }
          -
          -        LogicalExpression nameArg = call.args.get(0);
          -        LogicalExpression valueArg1 = call.args.size() >= 2 ? call.args.get(1) : null;
          -        LogicalExpression valueArg2 = call.args.size() >= 3 ? call.args.get(2) : null;
          -
          -        if (((nameArg instanceof SchemaPath) == false) ||
          -            (valueArg1 == null) || ((valueArg1 instanceof IntExpression) == false) ||
          -            (valueArg2 == null) || ((valueArg2 instanceof IntExpression) == false)) {
          -          return false;
          -        }
          -
          -        boolean isRowKey = ((SchemaPath)nameArg).getAsUnescapedPath().equals(DrillHBaseConstants.ROW_KEY);
          -        int offset = ((IntExpression)valueArg1).getInt();
          -
          -        if (!isRowKey || (offset != 1)) {
          -          return false;
          -        }
          -
          -        this.path = (SchemaPath)nameArg;
          -        prefixLength = ((IntExpression)valueArg2).getInt();
          -        this.isRowKeyPrefixComparison = true;
          -        return visitRowKeyPrefixConvertExpression(e, prefixLength, valueArg);
          -      }
          -
          -      if (e.getInput() instanceof SchemaPath) {
          -        ByteBuf bb = null;
          -
          -        switch (encodingType) {
          -        case "INT_BE":
          -        case "INT":
          -        case "UINT_BE":
          -        case "UINT":
          -        case "UINT4_BE":
          -        case "UINT4":
          -          if (valueArg instanceof IntExpression
          -              && (isEqualityFn || encodingType.startsWith("U"))) {
          -            bb = newByteBuf(4, encodingType.endsWith("_BE"));
          -            bb.writeInt(((IntExpression)valueArg).getInt());
          -          }
          -          break;
          -        case "BIGINT_BE":
          -        case "BIGINT":
          -        case "UINT8_BE":
          -        case "UINT8":
          -          if (valueArg instanceof LongExpression
          -              && (isEqualityFn || encodingType.startsWith("U"))) {
          -            bb = newByteBuf(8, encodingType.endsWith("_BE"));
          -            bb.writeLong(((LongExpression)valueArg).getLong());
          -          }
          -          break;
          -        case "FLOAT":
          -          if (valueArg instanceof FloatExpression && isEqualityFn) {
          -            bb = newByteBuf(4, true);
          -            bb.writeFloat(((FloatExpression)valueArg).getFloat());
          -          }
          -          break;
          -        case "DOUBLE":
          -          if (valueArg instanceof DoubleExpression && isEqualityFn) {
          -            bb = newByteBuf(8, true);
          -            bb.writeDouble(((DoubleExpression)valueArg).getDouble());
          -          }
          -          break;
          -        case "TIME_EPOCH":
          -        case "TIME_EPOCH_BE":
          -          if (valueArg instanceof TimeExpression) {
          -            bb = newByteBuf(8, encodingType.endsWith("_BE"));
          -            bb.writeLong(((TimeExpression)valueArg).getTime());
          -          }
          -          break;
          -        case "DATE_EPOCH":
          -        case "DATE_EPOCH_BE":
          -          if (valueArg instanceof DateExpression) {
          -            bb = newByteBuf(8, encodingType.endsWith("_BE"));
          -            bb.writeLong(((DateExpression)valueArg).getDate());
          -          }
          -          break;
          -        case "BOOLEAN_BYTE":
          -          if (valueArg instanceof BooleanExpression) {
          -            bb = newByteBuf(1, false /* does not matter */);
          -            bb.writeByte(((BooleanExpression)valueArg).getBoolean() ? 1 : 0);
          -          }
          -          break;
          -        case "DOUBLE_OB":
          -        case "DOUBLE_OBD":
          -          if (valueArg instanceof DoubleExpression) {
          -            bb = newByteBuf(9, true);
          -            PositionedByteRange br = new SimplePositionedMutableByteRange(bb.array(), 0, 9);
          -            if (encodingType.endsWith("_OBD")) {
          -              org.apache.hadoop.hbase.util.OrderedBytes.encodeFloat64(br,
          -                  ((DoubleExpression)valueArg).getDouble(), Order.DESCENDING);
          -              this.sortOrderAscending = false;
          -            } else {
          -              org.apache.hadoop.hbase.util.OrderedBytes.encodeFloat64(br,
          -                  ((DoubleExpression)valueArg).getDouble(), Order.ASCENDING);
          -            }
          +  protected ByteBuf getByteBuf(LogicalExpression valueArg, String encodingType) {
          — End diff –

          Thanks!

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user paul-rogers commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r135872295

          — Diff: logical/src/main/java/org/apache/drill/common/expression/SchemaPath.java —
          @@ -115,6 +112,33 @@ public static SchemaPath create(NamePart namePart) {
             }

             /**
          +   * Parses input string and returns {@code SchemaPath} instance.
          +   *
          +   * @param expr input string to be parsed
          +   * @return {@code SchemaPath} instance
          +   */
          +  public static SchemaPath parseFromString(String expr) {
          — End diff –

          Thanks for the explanation. Can you copy it into the Javadoc for this method? Will help others in the future.
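          For future readers, a small usage sketch of the method under discussion (hedged: the dotted column name is just an example taken from this issue's JSON, and the exact escaping rules are those documented in the Javadoc added by the PR):

            import org.apache.drill.common.expression.SchemaPath;

            public class ParseFromStringExample {
              public static void main(String[] args) {
                // Assumption: unquoted dots split path segments, while a back-ticked part
                // such as `0.0.1` stays a single column name.
                SchemaPath versionColumn = SchemaPath.parseFromString("`0.0.1`.`date_created`");
                System.out.println(versionColumn);
              }
            }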

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi commented on a diff in the pull request:

          https://github.com/apache/drill/pull/909#discussion_r136016798

          — Diff: logical/src/main/java/org/apache/drill/common/expression/SchemaPath.java —
          @@ -115,6 +112,33 @@ public static SchemaPath create(NamePart namePart) {
             }

             /**
          +   * Parses input string and returns {@code SchemaPath} instance.
          +   *
          +   * @param expr input string to be parsed
          +   * @return {@code SchemaPath} instance
          +   */
          +  public static SchemaPath parseFromString(String expr) {
          — End diff –

          Done

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user paul-rogers commented on the issue:

          https://github.com/apache/drill/pull/909

          Thanks much for the great work!
          +1

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user amansinha100 commented on the issue:

          https://github.com/apache/drill/pull/909

          Thanks @vvysotskyi for the PR and @paul-rogers for reviewing the proposal and code.

          Hide
          amansinha100 Aman Sinha added a comment -

          Fixed in d105950a7a9fb2ff3acd072ee65a51ef1fca120e.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user amansinha100 commented on the issue:

          https://github.com/apache/drill/pull/909

          Merged in d105950a7a9fb2ff3acd072ee65a51ef1fca120e. @vvysotskyi pls close the PR (for some reason github is not showing me the option to close).

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user vvysotskyi closed the pull request at:

          https://github.com/apache/drill/pull/909


            People

            • Assignee: vvysotskyi Volodymyr Vysotskyi
            • Reporter: AleCaste Alex
            • Reviewer: Paul Rogers
            • Votes: 0
            • Watchers: 8
