Pig
  1. Pig
  2. PIG-992

[zebra] Separate Schema-related files into a "Schema" package

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.6.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      The patch file needs to be applied after Jira 986's patch has been applied.

      Description

      The hope is to facilitate future sharing of the Schema codes between different modules and/or products.

      1. SchemaPackageChange.patch
        378 kB
        Yan Zhou
      2. SchemaPackageChange.patch
        378 kB
        Yan Zhou
      3. SchemaPackageChange.patch
        366 kB
        Yan Zhou

        Issue Links

          Activity

          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12421180/SchemaPackageChange.patch
          against trunk revision 821101.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 183 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/59/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12421180/SchemaPackageChange.patch against trunk revision 821101. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 183 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/59/console This message is automatically generated.
          Hide
          Yan Zhou added a comment -

          Additional changes to some test files to avoid use of non-default lzo compression.

          Show
          Yan Zhou added a comment - Additional changes to some test files to avoid use of non-default lzo compression.
          Hide
          Chao Wang added a comment -

          Patch reviewed. +1

          Show
          Chao Wang added a comment - Patch reviewed. +1
          Hide
          Hong Tang added a comment -

          Comments:

          • In many places, both types.ParseException and schema.ParseException are thrown. Do you really want both?
          • In the following
            +public enum ColumnType implements Writable {
            

            Is the Writable interface actually used? You have rather odd pattern of asymmetric readFields and write:

            +  @Override
            +  public void readFields(DataInput in) throws IOException {
            +    // no op, instantiated by the caller
            +  }
            +
            +  @Override
            +  public void write(DataOutput out) throws IOException {
            +    Utils.writeString(out, name);
            +  }
            
          • In the following code
            +  public static class ColumnSchema {
            +    public String name;
            +    public ColumnType type;
            +    public Schema schema;
            +    public int index; // field index in schema
            

            Exposing fields as all-public seems like a bad idea.

          • Is there a specific usage case to allow schema to be mutable at any time? (minor nit: the comment says add a field, but the code seems to add a column to the schema).
            +  /**
            +   * add a field
            +   */
            +  public void add(ColumnSchema f) throws ParseException
            +  {
            +    add(f, false);
            +  }
            
          • Why Schema.equals(Object) is not implemented on top of the static version of the method (or vice versa)?
          • In Schema.readFields(), the Version string from the input is not checked for compatibility.
          • In the following
            +  private void init(String[] columnNames) throws ParseException {
            +    // the arg must be of type or they will be treated as the default type
            +    // TODO: verify column names don't contain COLUMN_DELIMITER
            

            It seems that the TODO should not involve too much work and please consider not deferring it later.

          • Need more detailed documentation on the spec of the parameter for Schema.getColumnSchema(String name)
            +  /**
            +   * Get a column's schema
            +   */
            +  public ColumnSchema getColumnSchema(String name) throws ParseException
            +  {
            
          • Schema.getColumnSchemaOnParsedName and Schema.getColumnSchema seems to be copy/paste code.
          • Schema.getColumnSchema(ParsedName pn) has side effect of modifying the parameter pn. The javadoc reads cryptic to me.
          • There are many classes generated by JavaCC. It is probably better not including them in the patch (and put the generated source under build/src).

          Other minor issues:

          • Typically contrib projects should use the version string as the parent project.
          • Style: there are some very long lines.
          • There are a few white space changes. That should be avoided if possible.
          • In the following
            +    } catch (org.apache.hadoop.zebra.schema.ParseException e) {
            +      throw new AssertionError("Invalid Projection: "+e.getMessage());
            

            consider change AssertionError to IllegalArgumentException.

          • In the following:
            +  /*
            +   * helper class to parse a column name string one section at a time and find the required
            +   * type for the parsed part.
            +   */
            +  public static class ParsedName {
            +    public String mName;
            +    int mKeyOffset; // the offset where the keysstring starts
            +    public ColumnType mDT = ColumnType.ANY; // parent's type
            

            The description seems to indicate that this should not be a public class. I tried to understand the body of the class and do not feel that it serves a general purpose.

          • The following seems like useless assignment:
            +  private long mVersion = schemaVersion;
            
          •   /**
            +   * Normalize the schema string.
            +   * 
            +   * @param value
            +   *          the input string representation of the schema.
            +   * @return the normalized string representation.
            +   */
            +  public static String normalize(String value) {
            +    String result = new String();
            +
            +    if (value == null || value.trim().isEmpty())
            +      return result;
            +
            +    StringBuilder sb = new StringBuilder();
            +    String[] parts = value.trim().split(COLUMN_DELIMITER);
            +    for (int nx = 0; nx < parts.length; nx++) {
            +      if (nx > 0) sb.append(COLUMN_DELIMITER);
            +      sb.append(parts[nx].trim());
            +    }
            +    return sb.toString();
            +  }
            
            

            There is a wasted value.trim().

          • In Schema.equals(Object), instead of comparing class equality, using "instanceof" is typically better.
          • Use StringBuilder instead in the following code:
            +    String merged = new String();
            +    for (int i = 0; i < columnNames.length; i++) {
            +      if (i > 0) merged += ",";
            +      merged += columnNames[i];
            +    }
            
          • There are a few indentation problems.
          Show
          Hong Tang added a comment - Comments: In many places, both types.ParseException and schema.ParseException are thrown. Do you really want both? In the following +public enum ColumnType implements Writable { Is the Writable interface actually used? You have rather odd pattern of asymmetric readFields and write: + @Override + public void readFields(DataInput in) throws IOException { + // no op, instantiated by the caller + } + + @Override + public void write(DataOutput out) throws IOException { + Utils.writeString(out, name); + } In the following code + public static class ColumnSchema { + public String name; + public ColumnType type; + public Schema schema; + public int index; // field index in schema Exposing fields as all-public seems like a bad idea. Is there a specific usage case to allow schema to be mutable at any time? (minor nit: the comment says add a field, but the code seems to add a column to the schema). + /** + * add a field + */ + public void add(ColumnSchema f) throws ParseException + { + add(f, false); + } Why Schema.equals(Object) is not implemented on top of the static version of the method (or vice versa)? In Schema.readFields(), the Version string from the input is not checked for compatibility. In the following + private void init(String[] columnNames) throws ParseException { + // the arg must be of type or they will be treated as the default type + // TODO: verify column names don't contain COLUMN_DELIMITER It seems that the TODO should not involve too much work and please consider not deferring it later. Need more detailed documentation on the spec of the parameter for Schema.getColumnSchema(String name) + /** + * Get a column's schema + */ + public ColumnSchema getColumnSchema(String name) throws ParseException + { Schema.getColumnSchemaOnParsedName and Schema.getColumnSchema seems to be copy/paste code. Schema.getColumnSchema(ParsedName pn) has side effect of modifying the parameter pn. The javadoc reads cryptic to me. There are many classes generated by JavaCC. It is probably better not including them in the patch (and put the generated source under build/src). Other minor issues: Typically contrib projects should use the version string as the parent project. Style: there are some very long lines. There are a few white space changes. That should be avoided if possible. In the following + } catch (org.apache.hadoop.zebra.schema.ParseException e) { + throw new AssertionError("Invalid Projection: "+e.getMessage()); consider change AssertionError to IllegalArgumentException. In the following: + /* + * helper class to parse a column name string one section at a time and find the required + * type for the parsed part. + */ + public static class ParsedName { + public String mName; + int mKeyOffset; // the offset where the keysstring starts + public ColumnType mDT = ColumnType.ANY; // parent's type The description seems to indicate that this should not be a public class. I tried to understand the body of the class and do not feel that it serves a general purpose. The following seems like useless assignment: + private long mVersion = schemaVersion; /** + * Normalize the schema string. + * + * @param value + * the input string representation of the schema. + * @return the normalized string representation. + */ + public static String normalize(String value) { + String result = new String(); + + if (value == null || value.trim().isEmpty()) + return result; + + StringBuilder sb = new StringBuilder(); + String[] parts = value.trim().split(COLUMN_DELIMITER); + for (int nx = 0; nx < parts.length; nx++) { + if (nx > 0) sb.append(COLUMN_DELIMITER); + sb.append(parts[nx].trim()); + } + return sb.toString(); + } There is a wasted value.trim(). In Schema.equals(Object), instead of comparing class equality, using "instanceof" is typically better. Use StringBuilder instead in the following code: + String merged = new String(); + for (int i = 0; i < columnNames.length; i++) { + if (i > 0) merged += ","; + merged += columnNames[i]; + } There are a few indentation problems.
          Hide
          Yan Zhou added a comment -

          Mostly address the review comments by Hong. Most significant changes include removal of JavaCC generated sources from the repository, unification of the ParseException class generated by both the storage parser and schema parser, hide internal members of the Schema.ColumnSchema class.

          Show
          Yan Zhou added a comment - Mostly address the review comments by Hong. Most significant changes include removal of JavaCC generated sources from the repository, unification of the ParseException class generated by both the storage parser and schema parser, hide internal members of the Schema.ColumnSchema class.
          Hide
          Alan Gates added a comment -

          Canceling old patches.

          Show
          Alan Gates added a comment - Canceling old patches.
          Hide
          Alan Gates added a comment -

          Resubmitting patch so hudson will rerun

          Show
          Alan Gates added a comment - Resubmitting patch so hudson will rerun
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12421879/SchemaPackageChange.patch
          against trunk revision 824446.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 183 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/18/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/18/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/18/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12421879/SchemaPackageChange.patch against trunk revision 824446. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 183 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/18/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/18/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/18/console This message is automatically generated.
          Hide
          Alan Gates added a comment -

          Patch checked in.

          Show
          Alan Gates added a comment - Patch checked in.

            People

            • Assignee:
              Yan Zhou
              Reporter:
              Yan Zhou
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development