  1. Hadoop HDFS
  2. HDFS-9962

Erasure Coding: need a way to test multiple EC policies

Details

    • Type: Test
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Now that we support multiple EC policies, we need a way to test them to catch potential issues.

      UPDATE:
      Since many existing EC unit tests need changes to use multiple EC policies, it would be good to create a jira for every few tests. Based on the discussion in HDFS-7866 and HDFS-9962, this task follows the approach below.

      • Default policy (RS-6-3) always gets tested.
      • If the test is long-running (e.g. it uses minicluster many times), test with a randomly chosen EC policy.
      • If the test is not long-running, a parameterized test covering all policies is better.
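
      The selection rule above can be sketched in plain Java. This is only an illustration under stated assumptions: the class EcPolicySelection, its methods, and the policy name strings are hypothetical and not part of Hadoop, and it reads the rule as "the default policy plus one random non-default policy" for long-running tests. A real test would obtain the policies from the HDFS API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical helper; all names here are illustrative, not Hadoop APIs. */
public class EcPolicySelection {
    // Illustrative policy names based on the policies mentioned in this issue.
    public static final String DEFAULT_POLICY = "RS-6-3";
    public static final List<String> ALL_POLICIES =
        Arrays.asList("RS-6-3", "RS-10-4", "RS-3-2", "XOR-2-1");

    /** Long-running tests: the default policy plus one random non-default policy. */
    public static List<String> policiesForLongTest(Random random) {
        List<String> others = new ArrayList<>(ALL_POLICIES);
        others.remove(DEFAULT_POLICY);
        List<String> chosen = new ArrayList<>();
        chosen.add(DEFAULT_POLICY);
        chosen.add(others.get(random.nextInt(others.size())));
        return chosen;
    }

    /** Short tests: every policy, e.g. as the parameter list of a parameterized test. */
    public static List<String> policiesForShortTest() {
        return new ArrayList<>(ALL_POLICIES);
    }

    public static void main(String[] args) {
        long seed = System.nanoTime();
        // Print the seed so that a failing random run can be reproduced.
        System.out.println("seed=" + seed);
        System.out.println("long-running test: " + policiesForLongTest(new Random(seed)));
        System.out.println("short test: " + policiesForShortTest());
    }
}
```

      Printing the seed is one way to keep a randomized run reproducible; whether the actual subtasks do this is not stated in this issue.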

      If you are interested in this work, please join it and create a jira under this jira.

          Activity

            lirui Rui Li added a comment -

            As I proposed in HDFS-7866, one possible solution is to randomly choose a policy in each test. Kai also suggested making sure the default policy always gets tested.
            drankye, zhz, let's use this JIRA to reach an agreement. Would like to know your opinions. Thanks.

            andrew.wang Andrew Wang added a comment -

            It looks like we still lack complete test coverage. I see that RS (10,4), RS (6,3), and XOR (2,1) are covered by subclasses of TestDFSStripedInputStream, but we're still missing RS (3,2). Not sure how comprehensive other tests are either.


            tasanuma Takanobu Asanuma added a comment -

            Hi, thank you for creating this jira, lirui, and thank you for organizing the EC jiras lately, andrew.wang!

            Do you plan to work on this topic? If not, can I work on this jira?
            lirui Rui Li added a comment -

            tasanuma0829, feel free to take the JIRA. Thanks.


            tasanuma Takanobu Asanuma added a comment -

            lirui, thanks for your reply. I'd like to take it.
            Sammi Sammi Chen added a comment -

            Hi tasanuma0829, do you still plan to work on this?


            tasanuma Takanobu Asanuma added a comment -

            Hi Sammi, yes, I still plan to work on it. Sorry for the delay.

            tasanuma Takanobu Asanuma added a comment -

            Created HDFS-11823 for extending TestDFSStripedInputStream/TestDFSStripedOutputStream/TestDFSStripedOutputStreamWithFailure.

            tasanuma Takanobu Asanuma added a comment -

            It seems many existing EC unit tests need some changes if they use non-default EC policies. I think it would be good to create a jira for every few tests. I made this jira the umbrella jira. If you are interested in this work, please join it and create a jira under this jira. If you have any opinions, please let me know.

            tasanuma Takanobu Asanuma added a comment -

            Hello andrew.wang and drankye,

            While writing the patches for the subtasks of this jira, I came to think it is better to check all EC policies every time with a parameterized test, rather than choosing an EC policy randomly. Compared with the random policy test (see HDFS-12547), the parameterized test (see HDFS-12587) is idempotent and keeps the code simple. As a disadvantage, the parameterized test may become flakier (especially when using minicluster).

            I'd like to hear your opinions. Which do you think is better?
            andrew.wang Andrew Wang added a comment -

            Hi tasanuma0829, thanks for working on this,

            Is your concern about flakiness related to test timeouts? We've been struggling with that recently, since increasing the cell size from 64k to 1MB greatly increased test runtimes. The approach we've been taking is to split long running tests into separate classes, or "parameterizing" by creating a lot of subclasses.

            We randomized the policy tests to reduce the runtime of the test suite, under the belief that there wasn't much incremental benefit from running all of them every time. I think that's still true, and would prefer we randomize any long-running tests. If the tests run quickly, then it's okay to be exhaustive.
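
            The subclass style of "parameterizing" mentioned above can be sketched roughly as follows. This is a hedged illustration, not the actual Hadoop test code: SubclassPerPolicy, BaseStripedTest, the subclass names, and the policy name strings are all hypothetical, and run() is a stand-in for a real test body that would start a minicluster.

```java
/** Hypothetical sketch; class and method names are illustrative, not Hadoop's. */
public class SubclassPerPolicy {

    /** The base class holds the test logic; subclasses only choose the policy. */
    public abstract static class BaseStripedTest {
        public abstract String getEcPolicyName();

        /** Stand-in for a real test body that would exercise the chosen policy. */
        public String run() {
            return "ran with " + getEcPolicyName();
        }
    }

    /** Covers the default policy. */
    public static class DefaultPolicyTest extends BaseStripedTest {
        @Override
        public String getEcPolicyName() {
            return "RS-6-3";
        }
    }

    /** Covers one non-default policy as a separate class. */
    public static class Rs10x4Test extends BaseStripedTest {
        @Override
        public String getEcPolicyName() {
            return "RS-10-4";
        }
    }

    /** Covers another non-default policy as a separate class. */
    public static class Xor2x1Test extends BaseStripedTest {
        @Override
        public String getEcPolicyName() {
            return "XOR-2-1";
        }
    }

    public static void main(String[] args) {
        BaseStripedTest[] tests = {
            new DefaultPolicyTest(), new Rs10x4Test(), new Xor2x1Test()
        };
        for (BaseStripedTest test : tests) {
            System.out.println(test.getClass().getSimpleName() + ": " + test.run());
        }
    }
}
```

            With this layout each policy becomes its own test class, so one long run is split into several shorter ones.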


            tasanuma Takanobu Asanuma added a comment -

            Thanks for your comments, andrew.wang. Yes, test timeouts may increase if we use parameterized tests.

            I understand. I will use random policy tests for long-running tests, and parameterized tests covering all EC policies when the test doesn't use minicluster.

            People

              Assignee: tasanuma Takanobu Asanuma
              Reporter: lirui Rui Li
              Votes: 0
              Watchers: 8
