CASSANDRA-1047

Get hadoop ColumnFamily metadata from describe_keyspace instead of config file

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.6.4
    • Component/s: None
    • Labels:
      None

      Description

      Requiring the Hadoop job to contain a copy of the Cassandra configuration file is clunky and error-prone. Instead, the Hadoop job should get an IP and port to contact for the range map and for CF metadata (with describe_keyspace).
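The idea in the description can be sketched as follows: the job configuration carries only a contact host and port, and everything else is fetched from the cluster at runtime. This is a minimal Python sketch, not the actual ConfigHelper code. The property keys and the function names `set_thrift_contact` and `get_thrift_contact` are assumptions made for illustration, and a plain dict stands in for the Hadoop job configuration.

```python
# Hypothetical property keys; the keys used by the real ConfigHelper may differ.
THRIFT_HOST_KEY = "cassandra.thrift.address"
THRIFT_PORT_KEY = "cassandra.thrift.port"


def set_thrift_contact(job_conf, host, port):
    """Store the single node the job should contact for the range map
    and ColumnFamily metadata (via describe_keyspace)."""
    if not host:
        raise ValueError("host must be a non-empty string")
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    # Job configurations hold string values, so the port is stored as text.
    job_conf[THRIFT_HOST_KEY] = host
    job_conf[THRIFT_PORT_KEY] = str(port)


def get_thrift_contact(job_conf):
    """Return (host, port) as stored by set_thrift_contact."""
    return job_conf[THRIFT_HOST_KEY], int(job_conf[THRIFT_PORT_KEY])


# Usage: the job needs no copy of the cluster's configuration file.
conf = {}
set_thrift_contact(conf, "10.0.0.1", 9160)
host, port = get_thrift_contact(conf)
```

With only these two values in the job, there is nothing on the client side that has to be kept in sync with the cluster's own configuration.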

      1. 1047.txt
        27 kB
        Jonathan Ellis

          Activity

          jeromatron Jeremy Hanna added a comment -

          I've got the seed host and rpc port into the ConfigHelper. I can also get certain keyspace information from describe_keyspace for what's needed in ColumnFamilyInputFormat.

          However, there are two other things that may require some more configuration - ColumnFamilyInputFormat also needs the Authenticator and the Partitioner. I'll try to mock those in the ConfigHelper as well and see where it gets me.

          jeromatron Jeremy Hanna added a comment -

          Need to add the reconciler as part of the keyspace attributes... just so I don't forget.

          jeromatron Jeremy Hanna added a comment -

          Jonathan - we may still need a Cassandra configuration file for things that use Hadoop. I have it working without the configuration file for the word_count example. But when I looked at what needed to happen for the Pig code, I found that there really isn't any code involved. It just uses cassandra.yaml (on trunk) to get the configuration and uses Pig with the Cassandra storage to do everything - no code required.

          So in effect we would have to write another configuration mechanism, other than cassandra.yaml, for just those bits that Pig or MapReduce needs.

          I can do that - maybe a simpler version of the configuration file, or something that is passed in so that it never touches DatabaseDescriptor. I just wasn't sure it was worth it, since it appears we'll need some sort of configuration file anyway.

          jeromatron Jeremy Hanna added a comment -

          This add-reconciler patch is just the diff of adding the reconciler to the attributes in the map that describe_keyspace returns.

          I kept it separate so it can be applied independently of the rest.

          jbellis Jonathan Ellis added a comment -

          So Pig expects anyone running a job to keep a local copy of the cluster config in sync with the real one?

          jeromatron Jeremy Hanna added a comment - - edited

          No, sorry, it's just that with the word_count, there is a java program that needs to be written for the map reduce anyway, so I could put the configuration information into the config helper for that case.

          For the Pig stuff, all that's needed is the CassandraStorage file - which is probably going to move into core. Other than that it just relies on libraries, and you get to run Pig scripts or the Pig shell - no other Java programming needed. So for that, either we use the Cassandra configuration file that's already there (because of the static initialization of the vars in DatabaseDescriptor), or we have to create a mechanism to configure it.

          Does that make more sense?

          jeromatron Jeremy Hanna added a comment -

          I wonder if there could be some way to have just the subset of configuration information that it needs in a cassandra.yaml file. I can look into that - that way, users don't have to specify all the info that they don't really need, and it's still in a configuration file, so no code would be required in the Pig case.

          jeromatron Jeremy Hanna added a comment -

          Since this issue is on hold for a little while, I created another issue to get the reconciler info.

          jbellis Jonathan Ellis added a comment -

          patch for 0.6 attached; adds describe_partitioner method

          jeromatron Jeremy Hanna added a comment -

          +1 - again - we'll have to see how Pig will be configured without the storage-conf.xml.

          jbellis Jonathan Ellis added a comment -

          env variables for pig make sense to me.
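The environment-variable approach could look roughly like this: since a Pig script has no Java driver program in which to set job properties, the contact host and port are read from the process environment instead. This is a hedged Python sketch. The variable names `PIG_INITIAL_ADDRESS` and `PIG_RPC_PORT` and the function `contact_from_env` are placeholders chosen for illustration; this thread does not fix any names.

```python
import os

# Placeholder variable names; the thread does not specify the real ones.
HOST_VAR = "PIG_INITIAL_ADDRESS"
PORT_VAR = "PIG_RPC_PORT"


def contact_from_env(environ=None):
    """Read the Cassandra contact host and RPC port from the environment.

    Raises KeyError naming every variable that is missing, so a
    misconfigured shell fails early with a clear message.
    """
    if environ is None:
        environ = os.environ
    missing = [name for name in (HOST_VAR, PORT_VAR) if name not in environ]
    if missing:
        raise KeyError("missing environment variable(s): " + ", ".join(missing))
    return environ[HOST_VAR], int(environ[PORT_VAR])


# Usage with an explicit mapping standing in for the real environment.
host, port = contact_from_env({HOST_VAR: "localhost", PORT_VAR: "9160"})
```

Passing the mapping as a parameter keeps the lookup testable without touching the real process environment.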

          jbellis Jonathan Ellis added a comment -

          created CASSANDRA-1322 to follow up w/ Pig. the rest is committed to 0.6.4


            People

            • Assignee:
              jbellis Jonathan Ellis
              Reporter:
              jbellis Jonathan Ellis