Hadoop HDFS / HDFS-268

Distinguishing file missing/corruption for low replication files


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      In PIG-856, there is a discussion about reducing the replication factor for intermediate files passed between jobs.
      I've seen users do the same in MapReduce jobs and get some speedup. (I believe their outputs were too small to benefit from pipelining.)
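
      For concreteness, lowering replication on existing intermediate data goes through FileSystem.setReplication (or "hadoop fs -setrep" from the shell). A minimal sketch, assuming a hypothetical /tmp/job-chain/intermediate directory:

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class LowerIntermediateReplication {
            public static void main(String[] args) throws Exception {
              FileSystem fs = FileSystem.get(new Configuration());

              // Hypothetical intermediate output of the previous job in the chain.
              Path intermediate = new Path("/tmp/job-chain/intermediate");

              // Lower the target replication to 1; the NameNode schedules
              // deletion of the now-excess replicas of existing blocks.
              for (FileStatus stat : fs.listStatus(intermediate)) {
                if (!stat.isDir()) {
                  fs.setReplication(stat.getPath(), (short) 1);
                }
              }
            }
          }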

      The problem is that when users start setting replication to 1 (or 2), ops starts seeing alerts from fsck and HADOOP-4103 after even a single datanode failure.
      There is also the problem of the NameNode not getting out of safe mode when restarted: with so few replicas, blocks whose only copies lived on a dead or not-yet-reporting datanode never get reported, so the NameNode may never reach its safe-mode block threshold.

      My answer so far has been to ask users, "please don't set the replication factor below 3."
      But is this the right approach?
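
      For reference, the distinction the summary asks for can be sketched with the public client API alone: a block with zero reported locations is genuinely missing, while a block with fewer live locations than its file's target replication is merely under-replicated (and recoverable). The sketch below is illustrative only, not how fsck is implemented; the class name and command-line handling are made up:

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.BlockLocation;
          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class ClassifyLowReplicationFile {
            public static void main(String[] args) throws Exception {
              FileSystem fs = FileSystem.get(new Configuration());
              FileStatus stat = fs.getFileStatus(new Path(args[0]));
              short target = stat.getReplication();  // configured replication factor

              BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
              for (BlockLocation blk : blocks) {
                int live = blk.getHosts().length;  // replicas currently reported
                if (live == 0) {
                  System.out.println("MISSING block at offset " + blk.getOffset());
                } else if (live < target) {
                  System.out.println("UNDER-REPLICATED (" + live + "/" + target
                      + ") block at offset " + blk.getOffset());
                }
              }
            }
          }

      A file created with replication 1 jumps straight from healthy to missing on a single datanode failure, which is exactly why fsck's report cannot tell intentional low replication apart from real data loss.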


            People

              Assignee: Unassigned
              Reporter: Koji Noguchi (knoguchi)
              Votes: 0
              Watchers: 3
