IMPALA / IMPALA-5154

catalogd hangs trying to load an unpartitioned Kudu table

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Catalog
    • Labels:
      None

      Description

      I created an unpartitioned Kudu table through the API (using setRangePartitions({}) from the Java client). This is allowed (it creates a single-tablet table), but the lack of a partitioning scheme seems to confuse Impala. The catalogd is logging:

      TProtocolException: Required field 'partition_by' was not present! Struct: TKuduTable(table_name:IntegrationTestBigLinkedList, master_addresses:[xxx], key_columns:[key1, key2], partition_by:null)
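The exception comes from Thrift's required-field check: a struct whose required field is unset (null) cannot be serialized, and Thrift's generated Java code runs a comparable validate() check when writing a struct. The minimal Python sketch below mimics that check. TKuduTableSketch and RequiredFieldMissing are hypothetical stand-ins, not the generated TKuduTable class or Thrift's own TProtocolException.

```python
from dataclasses import dataclass
from typing import List, Optional


class RequiredFieldMissing(Exception):
    """Hypothetical stand-in for Thrift's TProtocolException."""


@dataclass
class TKuduTableSketch:
    # Hypothetical mirror of the fields printed in the log line above.
    table_name: str
    master_addresses: List[str]
    key_columns: List[str]
    # The real field is marked required; None models "not set".
    partition_by: Optional[list] = None

    def validate(self) -> None:
        # Reject the struct if the required field was never set.
        if self.partition_by is None:
            raise RequiredFieldMissing(
                "Required field 'partition_by' was not present!")


t = TKuduTableSketch("IntegrationTestBigLinkedList", ["xxx"], ["key1", "key2"])
try:
    t.validate()
except RequiredFieldMissing as e:
    print("rejected:", e)

t.partition_by = []  # an empty list counts as "present", so validation passes
t.validate()
print("accepted")
```

The point of the sketch is that an empty list and a missing list are different things to the serializer: only the latter is rejected.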

        Activity

        mjacobs Matthew Jacobs added a comment -

        Hm, I wasn't able to repro this easily using the Python client to create the table. Do you expect this Python code to set up the table the right way?

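            from kudu.client import Partitioning
            from kudu.schema import INT64, SchemaBuilder

            # kudu_client and unique_database come from the test harness
            # (assumed: a connected Kudu client and a fresh database name).
            schema_builder = SchemaBuilder()
            column_spec = schema_builder.add_column("id", INT64)
            column_spec.nullable(False)
            schema_builder.set_primary_keys(["id"])
            schema = schema_builder.build()
            name = "%s.single_partition" % unique_database

            # Range-partition on "id" without adding any range bounds.
            kudu_client.create_table(name, schema,
                partitioning=Partitioning().set_range_partition_columns(["id"]))
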

        Then I am able to create an external table in Impala for this table, and it seems to be OK.

        The Kudu web UI shows that this has only a single tablet:

        Partition Schema
        RANGE (id) (
            PARTITION UNBOUNDED
        )
        
        Tablets
        Tablet ID	RANGE (id) PARTITION	State	Message	Peers
        541caeee02a64e8cacc27ca35b68995f	UNBOUNDED	Running	Tablet reported with an active leader
        
        tlipcon Todd Lipcon added a comment -

        Try using .set_range_partition_columns([]) instead?
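The difference between the two repro attempts is whether the range-partition column list is empty. Both tables end up with a single tablet, but only the empty list leaves the table with no partitioning component at all. The sketch below models this with hypothetical plain-Python types (PartitionSchemaSketch, partition_by_params); it is not the Kudu client API or Impala's code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionSchemaSketch:
    # Hypothetical model of a Kudu partition schema: hash components
    # as (columns, num_buckets) pairs, plus the range-partition columns.
    hash_components: List[Tuple[List[str], int]] = field(default_factory=list)
    range_columns: List[str] = field(default_factory=list)


def partition_by_params(schema: PartitionSchemaSketch) -> List[str]:
    """Return one readable 'partition by' entry per partitioning component."""
    params = []
    for cols, buckets in schema.hash_components:
        params.append("HASH (%s) PARTITIONS %d" % (", ".join(cols), buckets))
    if schema.range_columns:
        params.append("RANGE (%s)" % ", ".join(schema.range_columns))
    return params


# First repro: range-partitioned on "id" with one unbounded range.
print(partition_by_params(PartitionSchemaSketch(range_columns=["id"])))  # → ['RANGE (id)']
# Suggested repro: no range columns and no hash components.
print(partition_by_params(PartitionSchemaSketch(range_columns=[])))  # → []
```

Code that assumes the resulting list has at least one entry (for example, by reading its first element) works for the first table and breaks on the second.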

        mjacobs Matthew Jacobs added a comment -

        commit 58286bda7a62251f3841fa431013bdb009c6bfaf
        Author: Matthew Jacobs <mj@cloudera.com>
        Date: Tue Apr 4 10:00:55 2017 -0700

        IMPALA-5154: Handle 'unpartitioned' Kudu tables

        The catalogd was hanging trying to load an unpartitioned
        Kudu table created outside of Impala. This fixes an
        assumption made in KuduTable.java that the list of
        'partition by' expressions is not empty. Regardless, the
        list on the thrift structure must be created because the
        field is marked required.

        Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
        Reviewed-on: http://gerrit.cloudera.org:8080/6560
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Tested-by: Impala Public Jenkins
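A minimal sketch of the pattern the commit message describes, assuming a hypothetical to_thrift_dict helper: the partition_by list is always created, even for a table with no partitioning, and nothing assumes it is non-empty. This is a Python illustration, not the actual change to KuduTable.java.

```python
from typing import Dict, List


def to_thrift_dict(table_name: str, master_addresses: List[str],
                   key_columns: List[str],
                   partition_exprs: List[str]) -> Dict[str, object]:
    """Build a hypothetical TKuduTable-like dict for a Kudu table.

    partition_exprs may be empty for a table created outside Impala
    without any hash or range partitioning.
    """
    # Always create the list, even if it stays empty: the Thrift field is
    # required, so leaving it as None would fail serialization.
    partition_by: List[str] = list(partition_exprs)
    return {
        "table_name": table_name,
        "master_addresses": list(master_addresses),
        "key_columns": list(key_columns),
        "partition_by": partition_by,
    }


unpartitioned = to_thrift_dict("IntegrationTestBigLinkedList", ["xxx"],
                               ["key1", "key2"], [])
print(unpartitioned["partition_by"])  # → []
```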


          People

          • Assignee:
            mjacobs Matthew Jacobs
            Reporter:
            tlipcon Todd Lipcon
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
