Details
Sub-task
Status: Open
Major
Resolution: Unresolved
3.4.0
None
None
Description
Normally, we name the fields after the corresponding LogicalPlan or DataFrame API, but the names are not consistent across the protos. For example, fields that hold column names are named differently in each message:
message UnresolvedRegex {
  // (Required) The column name used to extract column with regex.
  string col_name = 1;
}

message Alias {
  // (Required) The expression that alias will be added on.
  Expression expr = 1;

  // (Required) a list of name parts for the alias.
  //
  // Scalar columns only has one name that presents.
  repeated string name = 2;

  // (Optional) Alias metadata expressed as a JSON map.
  optional string metadata = 3;
}

// Relation of type [[Deduplicate]] which have duplicate rows removed, could consider either only
// the subset of columns or all the columns.
message Deduplicate {
  // (Required) Input relation for a Deduplicate.
  Relation input = 1;

  // (Optional) Deduplicate based on a list of column names.
  //
  // This field does not co-use with `all_columns_as_keys`.
  repeated string column_names = 2;

  // (Optional) Deduplicate based on all the columns of the input relation.
  //
  // This field does not co-use with `column_names`.
  optional bool all_columns_as_keys = 3;
}

// Computes basic statistics for numeric and string columns, including count, mean, stddev, min,
// and max. If no columns are given, this function computes statistics for all numerical or
// string columns.
message StatDescribe {
  // (Required) The input relation.
  Relation input = 1;

  // (Optional) Columns to compute statistics on.
  repeated string cols = 2;
}
We should probably unify the naming:
single column -> `column`
multiple columns -> `columns`
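
As a rough illustration of the proposal, the three messages that name column fields might look like the sketch below after the rename. This is only a sketch under stated assumptions, not the actual change: the package name and the empty `Relation` placeholder are hypothetical and exist only to make the file self-contained, and `Alias` is left out because its `name` field holds the parts of the alias name rather than input columns, so it is assumed to be out of scope.

```protobuf
syntax = "proto3";

// Hypothetical package, used only to keep this sketch self-contained.
package example.naming;

// Placeholder standing in for the real Relation message.
message Relation {}

message UnresolvedRegex {
  // (Required) The column name used to extract column with regex.
  // Was: string col_name = 1;
  string column = 1;
}

message Deduplicate {
  // (Required) Input relation for a Deduplicate.
  Relation input = 1;

  // (Optional) Deduplicate based on a list of column names.
  // Was: repeated string column_names = 2;
  repeated string columns = 2;

  // (Optional) Deduplicate based on all the columns of the input relation.
  optional bool all_columns_as_keys = 3;
}

message StatDescribe {
  // (Required) The input relation.
  Relation input = 1;

  // (Optional) Columns to compute statistics on.
  // Was: repeated string cols = 2;
  repeated string columns = 2;
}
```

The field numbers are kept unchanged in this sketch, so the binary wire format would stay the same. The generated accessors and the JSON field names would change, so any client or server code that reads these fields would need to be updated together with the protos.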