Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Trivial Trivial
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: master
    • Labels:
      None

      Description

      We should add a table description field, probably in zookeeper, to store an optional field end users can use to provide their relevant information on the table.

      1. ACCUMULO-650-2.patch
        50 kB
        Kevin Faro
      2. ACCUMULO-650.patch
        48 kB
        Kevin Faro

        Issue Links

          Activity

          Hide
          Kevin Faro added a comment -

          So far I have added a desc command to the shell to display the descriptions, added a flag to tables to display the tables and the descriptions, added a flag to createtable to set the description.

          the desc command will either display the description for the table or it will update it using a flag.

          EX:

          user@instance table> desc
          table             [description of table]
          user@instance table> desc -u 'new description'
          table             [new description]
          

          Should I add the ability to set the description when importing a table or a directory? Also, should I clone the description of the table when the table is cloned?

          Show
          Kevin Faro added a comment - So far I have added a desc command to the shell to display the descriptions, added a flag to tables to display the tables and the descriptions, added a flag to createtable to set the description. the desc command will either display the description for the table or it will update it using a flag. EX: user@instance table> desc table [description of table] user@instance table> desc -u 'new description' table [new description] Should I add the ability to set the description when importing a table or a directory? Also, should I clone the description of the table when the table is cloned?
          Hide
          David Medinets added a comment -

          The new description can be "Clone of XXXX; description ... "

          Show
          David Medinets added a comment - The new description can be "Clone of XXXX; description ... "
          Hide
          Josh Elser added a comment -

          I'm be more of the mindset that clone/import would fully replicate the old table, including the description in this case. Just my gut feeling, though.

          Changing descriptions every time a directory is imported could be really messy for an import-heavy workload.

          Show
          Josh Elser added a comment - I'm be more of the mindset that clone/import would fully replicate the old table, including the description in this case. Just my gut feeling, though. Changing descriptions every time a directory is imported could be really messy for an import-heavy workload.
          Hide
          David Medinets added a comment -

          Does a cloned table have any provenance? Maybe a cloned table should have "Cloned from XXXX at YYYY-MM-DD HH:MM:SS" appended to the end of the description? I haven't thought about this deeply. Is there any UUID associated with a table? On the other hand, since cloned tables are mutable, does it matter?

          Show
          David Medinets added a comment - Does a cloned table have any provenance? Maybe a cloned table should have "Cloned from XXXX at YYYY-MM-DD HH:MM:SS" appended to the end of the description? I haven't thought about this deeply. Is there any UUID associated with a table? On the other hand, since cloned tables are mutable, does it matter?
          Hide
          Josh Elser added a comment -

          Table ids aren't anything more than a hex one-up counter. Also, a clone is lazy in that doesn't make a copy of the data until it changes as compared to the table it was cloned from (as in, both tables will point to the same file on disk right after the clone).

          Personally, I would expect a user doing something like..

          > createtable foo
          foo> desc -u 'full of foo!'
          foo> insert bar bar bar bar
          foo> clonetable foo bar
          bar> desc -u 'full of bar!'
          

          I guess I envision the common use case would be that someone clone a table, and then want to update the description with why they cloned it. Mostly thinking out loud here too

          Show
          Josh Elser added a comment - Table ids aren't anything more than a hex one-up counter. Also, a clone is lazy in that doesn't make a copy of the data until it changes as compared to the table it was cloned from (as in, both tables will point to the same file on disk right after the clone). Personally, I would expect a user doing something like.. > createtable foo foo> desc -u 'full of foo!' foo> insert bar bar bar bar foo> clonetable foo bar bar> desc -u 'full of bar!' I guess I envision the common use case would be that someone clone a table, and then want to update the description with why they cloned it. Mostly thinking out loud here too
          Hide
          David Medinets added a comment -

          Since we're thinking. I've used clone in the following manner. I
          import a lot of stuff into one table. Then clone it so that other
          people can work with that data. Then I continue to work with the
          original table. My changes don't affect the other people. Dumb
          question time ... can clones be used like git for data? Can a clone be
          thought of as a tag or branch?

          Show
          David Medinets added a comment - Since we're thinking. I've used clone in the following manner. I import a lot of stuff into one table. Then clone it so that other people can work with that data. Then I continue to work with the original table. My changes don't affect the other people. Dumb question time ... can clones be used like git for data? Can a clone be thought of as a tag or branch?
          Hide
          Kevin Faro added a comment -

          I am going to take the path of least resistance and just default the descriptions of the imported and cloned tables to be:

          CLONE of <srcTableName>
          IMPORTED from path: <path>
          

          We can create another ticket to make that more advanced later on if needed.

          Show
          Kevin Faro added a comment - I am going to take the path of least resistance and just default the descriptions of the imported and cloned tables to be: CLONE of <srcTableName> IMPORTED from path: <path> We can create another ticket to make that more advanced later on if needed.
          Hide
          Kevin Faro added a comment -

          Please take a careful look at the patch - there is a lot going on here. It was a bigger patch that I expected it to be.

          Also, I am not sure why there was a change to TabletServerStatus. I don't think I changed anything in the thrift files that would cause a field to be optional instead of required ... but I left it in the patch since it was compiled from thrift.

          Show
          Kevin Faro added a comment - Please take a careful look at the patch - there is a lot going on here. It was a bigger patch that I expected it to be. Also, I am not sure why there was a change to TabletServerStatus. I don't think I changed anything in the thrift files that would cause a field to be optional instead of required ... but I left it in the patch since it was compiled from thrift.
          Hide
          Eric Newton added a comment - - edited

          Table ids aren't anything more than a hex one-up counter.

          They are base-36.

          Show
          Eric Newton added a comment - - edited Table ids aren't anything more than a hex one-up counter. They are base-36.
          Hide
          Eric Newton added a comment -

          Kevin, can you add some unit tests?

          Show
          Eric Newton added a comment - Kevin, can you add some unit tests?
          Hide
          Kevin Faro added a comment -

          Sure, no problem.

          Show
          Kevin Faro added a comment - Sure, no problem.
          Hide
          Keith Turner added a comment -

          I reviewed the patch and compiled some comments. Then I thought of a totally different way to go about this. I decided to include the comments as they may be informative.

          • this brings the # of create table methods to 6. I am thinking it would be good to introduce a config object (maybe class NewTableConfig) thats passed to create table. Like BatchWriterConfig that was introduced in 1.5. This way as we add options to create table in the future, we do not need to add new create methods.
          • I do not think the FATE operation needs to get the tableNameLock, its getting the table write lock on the table. The table name lock is special lock in the master for table names, the table name is not being changed.
          • Do not need get the old table description and check that in the zookeeper mutate call. Do not even need to call mutate, just set the desciption. Mutate is useful for applying a function to existing data in zookeeper, that is not the case here. For example, mutate would be useful for adding one to something in zookeeper. Use mutate for compare and set operations.
          • should check that the user has alter_table permission for updateDesc
          • do we want to the user to have a permission (like read table, to get the desciption?)
          • If you deserialize an old TableInfo object (e.g. from FATE op create in 1.5), then tableDescription may be null. So would need to handle null. Same for ImportedTableInfo.
          • Export table output some informational data in the export file. I think it would be nice to include the table desc. I think it would be good to do this as a sub-task, after this initial task is done.

          While reviewing the shell code, I started thinking of an entirely different way to go about doing this. Use a new per table config. For example, could do the following in the shell.

            config -t table_foo -s table.description=test
          

          This would naturally be copied by export and clone. Setting a config already requires alter table.

          Show
          Keith Turner added a comment - I reviewed the patch and compiled some comments. Then I thought of a totally different way to go about this. I decided to include the comments as they may be informative. this brings the # of create table methods to 6. I am thinking it would be good to introduce a config object (maybe class NewTableConfig) thats passed to create table. Like BatchWriterConfig that was introduced in 1.5. This way as we add options to create table in the future, we do not need to add new create methods. I do not think the FATE operation needs to get the tableNameLock, its getting the table write lock on the table. The table name lock is special lock in the master for table names, the table name is not being changed. Do not need get the old table description and check that in the zookeeper mutate call. Do not even need to call mutate, just set the desciption. Mutate is useful for applying a function to existing data in zookeeper, that is not the case here. For example, mutate would be useful for adding one to something in zookeeper. Use mutate for compare and set operations. should check that the user has alter_table permission for updateDesc do we want to the user to have a permission (like read table, to get the desciption?) If you deserialize an old TableInfo object (e.g. from FATE op create in 1.5), then tableDescription may be null. So would need to handle null. Same for ImportedTableInfo. Export table output some informational data in the export file. I think it would be nice to include the table desc. I think it would be good to do this as a sub-task, after this initial task is done. While reviewing the shell code, I started thinking of an entirely different way to go about doing this. Use a new per table config. For example, could do the following in the shell. config -t table_foo -s table.description=test This would naturally be copied by export and clone. Setting a config already requires alter table.
          Hide
          Kevin Faro added a comment -

          I added a few unit test to the ShellTests.

          Also, I added the check to make sure the user had alterTable permissions ... that was my bad, just missed that.

          I had the same thought with populating a TableConfig object of some sort, but didn't know your guy's convention and didn't want to introduce a new layer of abstraction if not neeeded.

          I guess we should figure out our path forward whether it should be an entry in ZooKeeper or a config option before I proceed any further ...

          Show
          Kevin Faro added a comment - I added a few unit test to the ShellTests. Also, I added the check to make sure the user had alterTable permissions ... that was my bad, just missed that. I had the same thought with populating a TableConfig object of some sort, but didn't know your guy's convention and didn't want to introduce a new layer of abstraction if not neeeded. I guess we should figure out our path forward whether it should be an entry in ZooKeeper or a config option before I proceed any further ...
          Hide
          Keith Turner added a comment -

          I think the per table config route would be best. Ultimately its still sored in zookeeper, just in a more systematic way. This would involve adding an enum to o.a.a.core.conf.Property

          could still have API methods in table operation to set and get desc. These would be similar to methods for manipulating locality groups and iterator settings, which are also per table configs. Also, if you are still thinking of adding something to create table, I still think we should add a NewTableConfig method.

          Would still have an option on tables command to list descriptions. Would we still need the desc command? I do not really like the name, but I am uncertain about the functionality. As far as functionality, it may make things slightly simpler for the user rather than using the config command. Also having a specific shell command for setting it puts a layer of indirection between the implementation and the user.

          Show
          Keith Turner added a comment - I think the per table config route would be best. Ultimately its still sored in zookeeper, just in a more systematic way. This would involve adding an enum to o.a.a.core.conf.Property could still have API methods in table operation to set and get desc. These would be similar to methods for manipulating locality groups and iterator settings, which are also per table configs. Also, if you are still thinking of adding something to create table, I still think we should add a NewTableConfig method. Would still have an option on tables command to list descriptions. Would we still need the desc command? I do not really like the name, but I am uncertain about the functionality. As far as functionality, it may make things slightly simpler for the user rather than using the config command. Also having a specific shell command for setting it puts a layer of indirection between the implementation and the user.
          Hide
          Christopher Tubbs added a comment -

          ACCUMULO-2841 incorporates this feature, in a more generic way.

          Show
          Christopher Tubbs added a comment - ACCUMULO-2841 incorporates this feature, in a more generic way.
          Hide
          Christopher Tubbs added a comment -

          Do we want to include this specific specialized per-table property in addition to the user-defined properties added in ACCUMULO-2841? I think 2841 pretty much satisfies all the use-cases. Any objections to closing?

          Show
          Christopher Tubbs added a comment - Do we want to include this specific specialized per-table property in addition to the user-defined properties added in ACCUMULO-2841 ? I think 2841 pretty much satisfies all the use-cases. Any objections to closing?
          Hide
          Christopher Tubbs added a comment -

          Closing this due to lack of activity, and because it is effectively superceded by ACCUMULO-2841.

          Show
          Christopher Tubbs added a comment - Closing this due to lack of activity, and because it is effectively superceded by ACCUMULO-2841 .

            People

            • Assignee:
              Kevin Faro
              Reporter:
              John Vines
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development