Cassandra / CASSANDRA-3745

contrib/PIG example fails when column metadata exists for CF

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Duplicate
    • None
    • None
    • Normal

    Description

      I have a sandbox CF for prototyping with 17 secondary indexes defined. When I run the contrib/pig example with Cassandra 1.0.6, using Pig 0.8.1 (and also the Pig 0.8.3 jar), I receive the following error from the second line of the example script [ cols = FOREACH rows GENERATE flatten(columns); ]:

      2012-01-14 06:54:27,551 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1007: Found duplicates in schema. : 18 columns. Please alias the columns with unique names.

      I dropped all of the indexes and tried again, with the same error. On further inspection, show schema showed that the column metadata created for the indexes still existed on the CF. I ran the following to clear it:

      update column family user with column_metadata = [];

      I can now run the full contrib/pig example against my CF.

      If I select another CF with 2 secondary indexes, the same behaviour occurs:

      2012-01-14 08:34:31,413 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1007: Found duplicates in schema. : 3 columns. Please alias the columns with unique names.

      grunt> describe users;
      2012-01-14 08:36:58,227 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
      users: {key: bytearray,columns: {T: (name: chararray,value: bytearray,column_family: chararray,value: bytearray,owner_id: chararray,value: bytearray)}}
      grunt>

      grunt> dump users;
      <-- removed INFO/WARN output -->

      HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
      0.20.2         0.8.1       sasha   2012-01-14 08:37:24  2012-01-14 08:37:43  UNKNOWN

      Success!

      Job Stats (time in seconds):
      JobId           Alias  Feature   Outputs
      job_local_0001  users  MAP_ONLY  file:/tmp/temp-1366421017/tmp-1001688304,

      Input(s):
      Successfully read records from: "cassandra://sdo/entity_relations"

      Output(s):
      Successfully stored records in: "file:/tmp/temp-1366421017/tmp-1001688304"

      Job DAG:
      job_local_0001

      (d1540edc-cb16-47dd-96e3-90e1657c2d77:a721966c6026ee85ef35f2108b75d3784b52bf1217f0b62564bdefe67b9504d9,{(content_id,d1540edc-cb16-47dd-96e3-90e1657c2d77:a721966c6026ee85ef35f2108b75d3784b52bf1217f0b62564bdefe67b9504d9),(owner_id,d1540edc-cb16-47dd-96e3-90e1657c2d77)})
      grunt>

      I have also tried this with Pig 0.9.1, but there I run into a different issue: https://issues.apache.org/jira/browse/CASSANDRA-3371
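
      A likely reading of ERROR 1007 is that the tuple schema reported by describe users repeats the alias value once per column that has metadata, so flattening it yields several fields with the same name. The sketch below is only an illustration of that check, not part of Pig or Cassandra: the field list is copied by hand from the describe output, and the helper name duplicate_aliases is hypothetical.

```python
from collections import Counter

# Field names of the inner tuple T, copied by hand from the
# `describe users` output in this report (types omitted).
tuple_fields = ["name", "value", "column_family", "value", "owner_id", "value"]


def duplicate_aliases(fields):
    """Return {alias: count} for every alias that appears more than once."""
    counts = Counter(fields)
    return {alias: n for alias, n in counts.items() if n > 1}


# Hypothetical helper: reports which aliases would collide after a flatten.
duplicates = duplicate_aliases(tuple_fields)
print(duplicates)
```

      For the schema above this reports value as occurring three times, which would be consistent with the "3 columns" in the error message, although the report itself does not confirm that this is how Pig counts them.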

            People

              Assignee: Unassigned
              Reporter: Sasha Dolgy (sdolgy)