Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: Table API & SQL
    • Labels:
      None

      Description

      We define two structures to hold statistics:
      1. TableStats: contains table-level stats; currently only one element: rowCount
      2. ColumnStats: contains column-level stats:
      for numeric column types: ndv, nullCount, max, min, histogram
      for string type: ndv, nullCount, avgLen, maxLen
      for boolean: ndv, nullCount, trueCount, falseCount
      for date/time/timestamp: ndv, nullCount, max, min, histogram
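
As a rough illustration of the two structures described above, here is a minimal Scala sketch. The names `TableStatsSketch` and `ColumnStatsSketch` are hypothetical, and the field types and defaults are assumptions chosen for the example; only the field names (rowCount, ndv, nullCount, avgLen, maxLen, max, min) come from this issue. Histogram and the boolean counters are left out to keep it short.

```scala
// Illustrative sketch only: class names, types, and defaults are assumptions,
// not the code merged for this issue.
final case class ColumnStatsSketch(
    ndv: Long,                     // number of distinct values
    nullCount: Long,               // number of nulls
    avgLen: Option[Double] = None, // string columns only (assumed type)
    maxLen: Option[Int] = None,    // string columns only (assumed type)
    max: Option[Any] = None,       // numeric and date/time columns
    min: Option[Any] = None)       // numeric and date/time columns

// Table-level stats; the description lists rowCount as the only element.
final case class TableStatsSketch(rowCount: Long)

object StatsSketchDemo {
  def main(args: Array[String]): Unit = {
    val table = TableStatsSketch(rowCount = 1000L)
    val price = ColumnStatsSketch(ndv = 120L, nullCount = 3L, max = Some(99.5), min = Some(0.5))
    val name = ColumnStatsSketch(ndv = 80L, nullCount = 0L, avgLen = Some(12.0), maxLen = Some(40))
    println(s"rows=${table.rowCount}, price=$price, name=$name")
  }
}
```

Fields that do not apply to a column type simply stay `None` in this sketch, so one class can cover numeric, string, and date/time columns.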

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user kaibozhou opened a pull request:

          https://github.com/apache/flink/pull/3357

          FLINK-5566 [table] Exception when do filter after join a UDTF which returns a POJO type

          Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
          If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
          In addition to going through the list, please provide a meaningful description of your changes.

          • [x] General
          • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
          • The pull request addresses only one issue
          • Each commit in the PR has a meaningful commit message (including the JIRA id)
          • [ ] Documentation
          • Documentation has been added for new functionality
          • Old documentation affected by the pull request has been updated
          • JavaDoc for public methods has been added
          • [x] Tests & Build
          • Functionality added by the pull request is covered by tests
          • `mvn clean verify` has been executed successfully locally or a Travis build has passed

          This PR fixes the case where a filter is applied after joining a UDTF that returns a POJO type.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/kaibozhou/flink flink-5827

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3357.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3357


          commit 0abc4f471f93a5f68ec67c27e8e94f531c36de94
          Author: 宝牛 <baoniu@taobao.com>
          Date: 2017-02-20T05:43:07Z

          FLINK-5566 [table] Exception when do filter after join a udtf which returns a POJO type


          fhueske Fabian Hueske added a comment -

          Implemented with 663c1e3f773ab1a19f8fb87a8fb5a7f95496cc36

          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3196

          Merged.
          Thanks @beyond1920!

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3196

          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3196

          Thanks for the update @beyond1920!
          PR is good to merge.

          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3196#discussion_r100984668

          — Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala —
          @@ -0,0 +1,52 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.table.plan.stats
          +
          +/**
          + * column statistics
          + *
          + * @param ndv number of distinct values
          + * @param nullCount number of nulls
          + * @param avgLen average length of column values
          + * @param maxLen max length of column values
          + * @param max max value of column values
          + * @param min min value of column values
          + */
          +case class ColumnStats(
          + ndv: Long,
          + nullCount: Long,
          + avgLen: Long,
          + maxLen: Long,
          + max: Option[Any],
          + min: Option[Any]) {
          — End diff –

          Yes, I think you are right @beyond1920

          githubbot ASF GitHub Bot added a comment -

          Github user beyond1920 commented on the issue:

          https://github.com/apache/flink/pull/3196

          @fhueske, thanks for your review. I modified the code based on your advice, including compatibility with Java and the column stats field types.

          githubbot ASF GitHub Bot added a comment -

          Github user beyond1920 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3196#discussion_r100958745

          — Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala —
          @@ -0,0 +1,52 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.table.plan.stats
          +
          +/**
          + * column statistics
          + *
          + * @param ndv number of distinct values
          + * @param nullCount number of nulls
          + * @param avgLen average length of column values
          + * @param maxLen max length of column values
          + * @param max max value of column values
          + * @param min min value of column values
          + */
          +case class ColumnStats(
          + ndv: Long,
          + nullCount: Long,
          + avgLen: Long,
          + maxLen: Long,
          + max: Option[Any],
          + min: Option[Any]) {
          — End diff –

          It makes sense to add a field to denote whether the stats are precise or approximate, and a field to hold the timestamp at which the stats were generated. But I'm not sure how these two fields would affect the optimized plan, because we prefer to use the provided stats even if they are estimated or a little stale. So I haven't added these two fields for now; they may be added later. What do you think?
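
For readers following this exchange, the two fields under discussion could look roughly like the sketch below. This is purely hypothetical: `StatsProvenance`, `StatsProvenanceOps`, `approximate`, `generatedAtMillis`, and `isStale` are names invented for the illustration, and none of them appears in the ColumnStats class shown in the diff.

```scala
// Hypothetical metadata for a set of statistics; not part of the PR under review.
final case class StatsProvenance(
    approximate: Boolean,            // true if the stats are estimates
    generatedAtMillis: Option[Long]) // epoch millis when the stats were produced, if known

object StatsProvenanceOps {
  // Stats with an unknown generation time are treated as not stale in this sketch.
  def isStale(p: StatsProvenance, nowMillis: Long, maxAgeMillis: Long): Boolean =
    p.generatedAtMillis.exists(t => nowMillis - t > maxAgeMillis)

  def main(args: Array[String]): Unit = {
    val old = StatsProvenance(approximate = true, generatedAtMillis = Some(0L))
    println(isStale(old, nowMillis = 100L, maxAgeMillis = 50L))
  }
}
```

An optimizer that always prefers provided stats, as described in the comment above, would not need to consult either field, which is the argument for leaving them out initially.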

          githubbot ASF GitHub Bot added a comment -

          Github user beyond1920 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3196#discussion_r100951579

          — Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala —
          @@ -0,0 +1,52 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.table.plan.stats
          +
          +/**
          + * column statistics
          + *
          + * @param ndv number of distinct values
          + * @param nullCount number of nulls
          + * @param avgLen average length of column values
          + * @param maxLen max length of column values
          + * @param max max value of column values
          + * @param min min value of column values
          + */
          +case class ColumnStats(
          + ndv: Long,
          + nullCount: Long,
          + avgLen: Long,
          — End diff –

          ok

          githubbot ASF GitHub Bot added a comment -

          Github user beyond1920 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3196#discussion_r100950729

          — Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala —
          @@ -0,0 +1,52 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.table.plan.stats
          +
          +/**
          + * column statistics
          + *
          + * @param ndv number of distinct values
          + * @param nullCount number of nulls
          + * @param avgLen average length of column values
          + * @param maxLen max length of column values
          + * @param max max value of column values
          + * @param min min value of column values
          + */
          +case class ColumnStats(
          + ndv: Long,
          — End diff –

          @fhueske, there is no need to make all stats optional. If there are no statistics for ndv/nullCount/avgLen/maxLen, we could give them an invalid value, e.g., -1. But that does not work for max/min, because max/min values could legitimately be negative.
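
The distinction drawn in this comment can be sketched as follows. The helper names (`SentinelVsOption`, `Unknown`, `knownCount`) are hypothetical and exist only for this illustration; the point is that -1 can never be a real count, whereas it can be a real minimum, so min/max need `Option`.

```scala
object SentinelVsOption {
  // Hypothetical sentinel for "no statistic available" on count-like fields.
  val Unknown: Long = -1L

  // Counts and lengths are never negative, so a negative value can only mean "unknown".
  def knownCount(v: Long): Option[Long] = if (v >= 0) Some(v) else None

  def main(args: Array[String]): Unit = {
    println(knownCount(Unknown)) // sentinel decodes to None
    println(knownCount(42L))     // real count decodes to Some(42)

    // For min/max, -1 is a perfectly valid value (e.g. a temperature column),
    // so absence has to be expressed with Option instead of a sentinel.
    val minTemperature: Option[Any] = Some(-1)
    val minMissing: Option[Any] = None
    println(minTemperature != minMissing)
  }
}
```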

          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3196#discussion_r100851670

          — Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala —
          @@ -0,0 +1,52 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.table.plan.stats
          +
          +/**
          + * column statistics
          + *
          + * @param ndv number of distinct values
          + * @param nullCount number of nulls
          + * @param avgLen average length of column values
          + * @param maxLen max length of column values
          + * @param max max value of column values
          + * @param min min value of column values
          + */
          +case class ColumnStats(
          + ndv: Long,
          + nullCount: Long,
          + avgLen: Long,
          — End diff –

          I think `Int` should be sufficient for value length.

          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3196#discussion_r100851830

          — Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala —
          @@ -0,0 +1,52 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.table.plan.stats
          +
          +/**
          + * column statistics
          + *
          + * @param ndv number of distinct values
          + * @param nullCount number of nulls
          + * @param avgLen average length of column values
          + * @param maxLen max length of column values
          + * @param max max value of column values
          + * @param min min value of column values
          + */
          +case class ColumnStats(
          + ndv: Long,
          + nullCount: Long,
          + avgLen: Long,
          + maxLen: Long,
          + max: Option[Any],
          + min: Option[Any]) {
          — End diff –

          Does it make sense to denote whether stats are precise or approximate? Also, an optional field could hold a timestamp of when the stats were generated.

          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3196#discussion_r100851572

          — Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala —
          @@ -0,0 +1,52 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.table.plan.stats
          +
          +/**
          + * column statistics
          + *
          + * @param ndv number of distinct values
          + * @param nullCount number of nulls
          + * @param avgLen average length of column values
          + * @param maxLen max length of column values
          + * @param max max value of column values
          + * @param min min value of column values
          + */
          +case class ColumnStats(
          + ndv: Long,
          — End diff –

          Make all stats optional?
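One way to read this suggestion, as a minimal sketch rather than the code that was eventually merged: wrap every field in `Option` so that a statistic that was never collected is represented as `None` instead of a placeholder number. The field names follow the diff above; the class name `OptionalColumnStats`, the default values, and the `nullFraction` helper are hypothetical and only serve the illustration.

```scala
// Hypothetical variant of ColumnStats in which every statistic is optional.
// Field names follow the diff under review; the Option wrappers, defaults,
// and the helper method are assumptions made for this sketch.
case class OptionalColumnStats(
    ndv: Option[Long] = None,
    nullCount: Option[Long] = None,
    avgLen: Option[Long] = None,
    maxLen: Option[Long] = None,
    max: Option[Any] = None,
    min: Option[Any] = None) {

  /** Fraction of nulls; defined only if nullCount is known and rowCount is positive. */
  def nullFraction(rowCount: Long): Option[Double] =
    nullCount.filter(_ => rowCount > 0).map(_.toDouble / rowCount)
}

object OptionalColumnStatsExample {
  // A string column: length statistics are known, min/max were not collected.
  val nameStats: OptionalColumnStats = OptionalColumnStats(
    ndv = Some(100L),
    nullCount = Some(5L),
    avgLen = Some(12L),
    maxLen = Some(40L))

  // A column for which no statistics have been collected yet.
  val unknownStats: OptionalColumnStats = OptionalColumnStats()
}
```

With this shape, a consumer has to handle the missing case explicitly (for example `unknownStats.nullFraction(20L)` yields `None`), at the cost of more verbose call sites.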

          githubbot ASF GitHub Bot added a comment -

          GitHub user beyond1920 opened a pull request:

          https://github.com/apache/flink/pull/3196

          FLINK-5566 [Table API & SQL] Introduce structure to hold table and column level statistics

          This PR introduces structures to hold table-level and column-level statistics.
          TableStats: responsible for holding table-level statistics.
          ColumnStats: responsible for holding column-level statistics.
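As a rough sketch of how the two structures could fit together: `ColumnStats` below mirrors the fields shown in the diff under review, and `rowCount` comes from the issue description. The `colStats` map keyed by column name, its default value, and the sample numbers are assumptions made for illustration and are not taken from the PR.

```scala
// Column-level statistics, with the fields shown in the diff under review.
case class ColumnStats(
    ndv: Long,
    nullCount: Long,
    avgLen: Long,
    maxLen: Long,
    max: Option[Any],
    min: Option[Any])

// Table-level statistics. rowCount is named in the issue description;
// the colStats map (column name -> ColumnStats) is an assumed layout.
case class TableStats(
    rowCount: Long,
    colStats: Map[String, ColumnStats] = Map.empty)

object TableStatsExample {
  // Made-up numbers for a table with a single numeric column "age".
  val stats: TableStats = TableStats(
    rowCount = 1000L,
    colStats = Map(
      "age" -> ColumnStats(
        ndv = 80L,
        nullCount = 10L,
        avgLen = 4L,
        maxLen = 4L,
        max = Some(99),
        min = Some(0))))
}
```

Looking up a column that has no statistics, such as `TableStatsExample.stats.colStats.get("missing")`, returns `None` in this layout.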

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/alibaba/flink flink-5566

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3196.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3196


          commit cadc16eefb0e0a9002e536a48b4b9f6824b6ab23
          Author: 槿瑜 <jinyu.zj@alibaba-inc.com>
          Date: 2017-01-24T06:34:01Z

          Introduce structure to hold table and column level statistics



            People

            • Assignee:
              jinyu.zj jingzhang
              Reporter:
              ykt836 Kurt Young
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:

                Development