STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: psftepm_20180715132049_37efeb68-4ca7-408b-8780-f355f5608e95:5
      Edges:
        Map 1 <- Map 2 (CUSTOM_EDGE)
      DagName: psftepm_20180715132049_37efeb68-4ca7-408b-8780-f355f5608e95:5
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: my_fact
                  Statistics: Num rows: 13893263 Data size: 4052475726 Basic stats: COMPLETE Column stats: NONE
                  GatherStats: false
                  Filter Operator
                    isSamplingPred: false
                    predicate: join_col is not null (type: boolean)
                    Statistics: Num rows: 13893263 Data size: 4052475726 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: bucket_col (type: string), join_col (type: string), accounting_period (type: string)
                      outputColumnNames: _col0, _col1, _col3
                      Statistics: Num rows: 13893263 Data size: 4052475726 Basic stats: COMPLETE Column stats: NONE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        Estimated key counts: Map 2 => 81
                        keys:
                          0 _col1 (type: string)
                          1 _col0 (type: string)
                        outputColumnNames: _col0, _col3, _col4
                        input vertices:
                          1 Map 2
                        Position of Big Table: 0
                        Statistics: Num rows: 15282589 Data size: 4457723395 Basic stats: COMPLETE Column stats: NONE
                        BucketMapJoin: true
                        Select Operator
                          expressions: _col4 (type: string), _col3 (type: string), _col0 (type: string)
                          outputColumnNames: _col0, _col1, _col2
                          Statistics: Num rows: 15282589 Data size: 4457723395 Basic stats: COMPLETE Column stats: NONE
                          File Output Operator
                            compressed: false
                            GlobalTableId: 0
                            directory: hdfs://rcostgvlnx80:9000/tmp/hive/psftepm/111a132d-ad3b-4e55-8457-c98dc710565c/hive_2018-07-15_13-20-49_654_7270903714763671473-1/-mr-10001/.hive-staging_hive_2018-07-15_13-20-49_654_7270903714763671473-1/-ext-10002
                            NumFilesPerFileSink: 1
                            Statistics: Num rows: 15282589 Data size: 4457723395 Basic stats: COMPLETE Column stats: NONE
                            Stats Publishing Key Prefix: hdfs://rcostgvlnx80:9000/tmp/hive/psftepm/111a132d-ad3b-4e55-8457-c98dc710565c/hive_2018-07-15_13-20-49_654_7270903714763671473-1/-mr-10001/.hive-staging_hive_2018-07-15_13-20-49_654_7270903714763671473-1/-ext-10002/
                            table:
                                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                                properties:
                                  columns _col0,_col1,_col2
                                  columns.types string:string:string
                                  escape.delim \
                                  hive.serialization.extend.additional.nesting.levels true
                                  serialization.escape.crlf true
                                  serialization.format 1
                                  serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                            TotalFiles: 1
                            GatherStats: false
                            MultiFileSpray: false
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Path -> Alias:
              hdfs://rcostgvlnx80:9000/user/hive/warehouse/my_fact/fiscal_year=2015/accounting_period=10 [my_fact]
            Path -> Partition:
              hdfs://rcostgvlnx80:9000/user/hive/warehouse/my_fact/fiscal_year=2015/accounting_period=10
                Partition
                  base file name: accounting_period=10
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  partition values:
                    accounting_period 10
                    fiscal_year 2015
                  properties:
                    COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"amt":"true","bucket_col":"true","join_col":"true"}}
                    bucket_count 10
                    bucket_field_name bucket_col
                    file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    location hdfs://rcostgvlnx80:9000/user/hive/warehouse/my_fact/fiscal_year=2015/accounting_period=10
                    name default.my_fact
                    numFiles 10
                    numRows 13893263
                    partition_columns fiscal_year/accounting_period
                    partition_columns.types string:string
                    rawDataSize 4052475726
                    serialization.ddl struct my_fact { decimal(20,3) amt, string bucket_col, string join_col}
                    serialization.format 1
                    serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    totalSize 31236106
                    transient_lastDdlTime 1531682018
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde

                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    properties:
                      bucket_count 10
                      bucket_field_name bucket_col
                      bucketing_version 2
                      column.name.delimiter ,
                      columns amt,bucket_col,join_col
                      columns.comments
                      columns.types decimal(20,3):string:string
                      file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      location hdfs://rcostgvlnx80:9000/user/hive/warehouse/my_fact
                      name default.my_fact
                      partition_columns fiscal_year/accounting_period
                      partition_columns.types string:string
                      serialization.ddl struct my_fact { decimal(20,3) amt, string bucket_col, string join_col}
                      serialization.format 1
                      serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      transient_lastDdlTime 1531681712
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: default.my_fact
                  name: default.my_fact
            Truncated Path -> Alias:
              /my_fact/fiscal_year=2015/accounting_period=10 [my_fact]
        Map 2
            Map Operator Tree:
                TableScan
                  alias: t4
                  Statistics: Num rows: 1635 Data size: 304110 Basic stats: COMPLETE Column stats: NONE
                  GatherStats: false
                  Filter Operator
                    isSamplingPred: false
                    predicate: ((filter_col) IN ('VAL1', 'VAL2') and join_col is not null) (type: boolean)
                    Statistics: Num rows: 818 Data size: 152148 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: join_col (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 818 Data size: 152148 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        null sort order: a
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 818 Data size: 152148 Basic stats: COMPLETE Column stats: NONE
                        tag: 1
                        auto parallelism: false
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Path -> Alias:
              hdfs://rcostgvlnx80:9000/user/hive/warehouse/my_dim [t4]
            Path -> Partition:
              hdfs://rcostgvlnx80:9000/user/hive/warehouse/my_dim
                Partition
                  base file name: my_dim
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  properties:
                    COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"filter_col":"true","join_col":"true"}}
                    bucket_count -1
                    bucketing_version 2
                    column.name.delimiter ,
                    columns join_col,filter_col
                    columns.comments
                    columns.types string:string
                    file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    location hdfs://rcostgvlnx80:9000/user/hive/warehouse/my_dim
                    name default.my_dim
                    numFiles 1
                    numRows 1635
                    rawDataSize 304110
                    serialization.ddl struct my_dim { string join_col, string filter_col}
                    serialization.format 1
                    serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    totalSize 5149
                    transient_lastDdlTime 1531682744
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde

                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    properties:
                      COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"filter_col":"true","join_col":"true"}}
                      bucket_count -1
                      bucketing_version 2
                      column.name.delimiter ,
                      columns join_col,filter_col
                      columns.comments
                      columns.types string:string
                      file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      location hdfs://rcostgvlnx80:9000/user/hive/warehouse/my_dim
                      name default.my_dim
                      numFiles 1
                      numRows 1635
                      rawDataSize 304110
                      serialization.ddl struct my_dim { string join_col, string filter_col}
                      serialization.format 1
                      serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      totalSize 5149
                      transient_lastDdlTime 1531682744
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: default.my_dim
                  name: default.my_dim
            Truncated Path -> Alias:
              /my_dim [t4]

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
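For reference, below is a query shape consistent with this plan. It is a reconstruction, not the original statement: the alias f and the exact SELECT list are inferred from the Select and Map Join operators above, while my_fact, my_dim, t4, join_col, filter_col, and the partition values fiscal_year=2015 / accounting_period=10 appear verbatim in the plan. The SET statements are an assumption about session configuration; they are the settings that normally gate a Tez bucket map join and are not shown in the plan itself.

-- Assumed session settings (not part of the original plan output):
SET hive.auto.convert.join=true;
SET hive.convert.join.bucket.mapjoin.tez=true;

-- Reconstructed query shape; alias `f` and the column order are guesses
-- derived from the operator tree, everything else is taken from the plan.
SELECT t4.join_col,
       f.accounting_period,
       f.bucket_col
FROM   default.my_fact f
JOIN   default.my_dim  t4
  ON   f.join_col = t4.join_col
WHERE  f.fiscal_year = '2015'
  AND  f.accounting_period = '10'
  AND  t4.filter_col IN ('VAL1', 'VAL2');

Reading the plan against this query: the CUSTOM_EDGE from Map 2 into Map 1, together with BucketMapJoin: true and Position of Big Table: 0, shows the join was planned as a Tez bucket map join with my_fact as the big side (bucket_count 10 in the partition properties) and the filtered my_dim rows (estimated 818 of 1635) fed to Map 1 over the custom edge; only the single pruned partition fiscal_year=2015/accounting_period=10 of my_fact is scanned.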