Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0-incubating
    • Component/s: None
    • Labels:

      Description

      Some systems, such as HBase, have a named group to which columns belong (i.e. a column family in this case). To disambiguate column references (i.e. if the same column name is used in different column families), it's useful to allow an optional qualifier. This is similar to the RECORD field mechanism that Calcite already supports, but in this case the record name would be optional, since it's not necessary if the column names are unique. For example: http://phoenix.apache.org/language/index.html#column_ref

        Issue Links

          Activity

          Hide
          julianhyde Julian Hyde added a comment -

          This is related to CALCITE-356, which is also about resolving long identifiers like a.b.c.d.e.f.

          What happens if a is a schema and b is a table not used in the FROM clause? What happens if a is a table alias and there is a column-group named b and also a column called b in a column-group z?

          I think it is worth solving this and CALCITE-356 simultaneously, with lots and lots of tests for the corner cases.

          I think we should model column-groups as records, and allow qualifying field names to be omitted if the inner field is unique. The alternative, adding qualifiers to the names in the list of RelDataTypeField that come from Table.getRowType(RelDataTypeFactory).getFieldList(), would complicate a lot of things.

          Show
          julianhyde Julian Hyde added a comment - This is related to CALCITE-356 , which is also about resolving long identifiers like a.b.c.d.e.f. What happens if a is a schema and b is a table not used in the FROM clause? What happens if a is a table alias and there is a column-group named b and also a column called b in a column-group z? I think it is worth solving this and CALCITE-356 simultaneously, with lots and lots of tests for the corner cases. I think we should model column-groups as records, and allow qualifying field names to be omitted if the inner field is unique. The alternative, adding qualifiers to the names in the list of RelDataTypeField that come from Table.getRowType(RelDataTypeFactory).getFieldList(), would complicate a lot of things.
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          we should model column-groups as records

          I think it makes sense to consider the following:
          1) List of 'columns' in group might depend on the actual row.
          2) It might require non-trivial IO get the list of columns (see #1).

          BTW: it would be nice if jdbcschema could perform lazy-loading of required table/column names.

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - we should model column-groups as records I think it makes sense to consider the following: 1) List of 'columns' in group might depend on the actual row. 2) It might require non-trivial IO get the list of columns (see #1). BTW: it would be nice if jdbcschema could perform lazy-loading of required table/column names.
          Hide
          jamestaylor James Taylor added a comment -

          FWIW, in Phoenix the user declares the list of columns in a column family at DDL time, so no IO would be required. If there are other column qualifiers outside of this list that want to be accessed, then the user must declare them at query time (using the syntax outlined in CALCITE-493).

          Show
          jamestaylor James Taylor added a comment - FWIW, in Phoenix the user declares the list of columns in a column family at DDL time, so no IO would be required. If there are other column qualifiers outside of this list that want to be accessed, then the user must declare them at query time (using the syntax outlined in CALCITE-493 ).
          Hide
          julianhyde Julian Hyde added a comment -

          Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/175d0705.

          In the change I have implemented, if a field is a record type, you can access its fields (and recursively their fields) without qualifiers if the field names are unique.

          So, this is useful for any document-oriented schema that is strongly typed, not just Phoenix. Phoenix would just use two levels (a column group is represented as a record, and each column in that group is a field of that record), but we could handle any number of levels, and we can handle a row type whose fields are a mixture of scalars and records.

          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/175d0705 . In the change I have implemented, if a field is a record type, you can access its fields (and recursively their fields) without qualifiers if the field names are unique. So, this is useful for any document-oriented schema that is strongly typed, not just Phoenix. Phoenix would just use two levels (a column group is represented as a record, and each column in that group is a field of that record), but we could handle any number of levels, and we can handle a row type whose fields are a mixture of scalars and records.
          Hide
          jamestaylor James Taylor added a comment -

          Julian Hyde - Couple of questions on this one:

          • What happens if a field is not unique among records and you try to reference it by it's name? For example:
                CREATE TABLE T(k INTEGER PRIMARY KEY, a.v VARCHAR, b.v VARCHAR);
                SELECT v FROM T;
                

            FWIW, Phoenix would throw a SQLException in this case

          • What about the case in which there's a column name that conflicts with the record field name, like this?
                CREATE TABLE T(k INTEGER PRIMARY KEY, a.k VARCHAR);
                SELECT k FROM T;
                

            FWIW, Phoenix allows this with the column name having preference.

          Show
          jamestaylor James Taylor added a comment - Julian Hyde - Couple of questions on this one: What happens if a field is not unique among records and you try to reference it by it's name? For example: CREATE TABLE T(k INTEGER PRIMARY KEY, a.v VARCHAR, b.v VARCHAR); SELECT v FROM T; FWIW, Phoenix would throw a SQLException in this case What about the case in which there's a column name that conflicts with the record field name, like this? CREATE TABLE T(k INTEGER PRIMARY KEY, a.k VARCHAR); SELECT k FROM T; FWIW, Phoenix allows this with the column name having preference.
          Hide
          julianhyde Julian Hyde added a comment -

          > What happens if a field is not unique among records and you try to reference it by it's name?

          If it's not unique at a given depth, it will find neither. I will say 'column v not found' or similar.

          > What about the case in which there's a column name that conflicts with the record field name, like this?

          If it's unique at a given depth Calcite will not look deeper. So, select k from T will use k, not a.k.

          Sounds like this is inconsistent with Phoenix. But I didn't want to bake in a 2 level scheme, where records (column families) are inferior to columns.

          Show
          julianhyde Julian Hyde added a comment - > What happens if a field is not unique among records and you try to reference it by it's name? If it's not unique at a given depth, it will find neither. I will say 'column v not found' or similar. > What about the case in which there's a column name that conflicts with the record field name, like this? If it's unique at a given depth Calcite will not look deeper. So, select k from T will use k, not a.k. Sounds like this is inconsistent with Phoenix. But I didn't want to bake in a 2 level scheme, where records (column families) are inferior to columns.
          Hide
          jamestaylor James Taylor added a comment -

          If it's unique at a given depth Calcite will not look deeper. So, select k from T will use k, not a.k.

          It's actually consistent with Phoenix

          If it's not unique at a given depth, it will find neither. I will say 'column v not found' or similar.

          Would it be possible to indicate in some way that it's ambiguous (i.e. differentiate this from the case where none are found)? It might be confusing otherwise.

          Show
          jamestaylor James Taylor added a comment - If it's unique at a given depth Calcite will not look deeper. So, select k from T will use k, not a.k. It's actually consistent with Phoenix If it's not unique at a given depth, it will find neither. I will say 'column v not found' or similar. Would it be possible to indicate in some way that it's ambiguous (i.e. differentiate this from the case where none are found)? It might be confusing otherwise.
          Hide
          julianhyde Julian Hyde added a comment - - edited

          Would it be possible to indicate in some way that it's ambiguous (i.e. differentiate this from the case where none are found)? It might be confusing otherwise.

          Yes; please open a new issue.

          Show
          julianhyde Julian Hyde added a comment - - edited Would it be possible to indicate in some way that it's ambiguous (i.e. differentiate this from the case where none are found)? It might be confusing otherwise. Yes; please open a new issue.
          Hide
          jamestaylor James Taylor added a comment -

          Filed CALCITE-555 for this new enhancement request.

          Show
          jamestaylor James Taylor added a comment - Filed CALCITE-555 for this new enhancement request.
          Hide
          julianhyde Julian Hyde added a comment -

          Closing now that 1.0.0-incubating has been released.

          Show
          julianhyde Julian Hyde added a comment - Closing now that 1.0.0-incubating has been released.

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              jamestaylor James Taylor
            • Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development