In the current implementation of HDFS HA using QJOURNAL, there is no way to add a new JournalNode (JN) to an existing JN quorum or to reinstall a failed JN machine.
The current process to populate JN directories is:
- Start JN daemons on multiple machines (usually an odd number, such as 3 or 5)
- Shut down the NameNode
- Run hdfs namenode -initializeSharedEdits, which populates the JNs
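The preference for an odd number of JNs follows from majority-quorum arithmetic: edits must reach a majority of the JNs, so a fourth node tolerates no more failures than three do. A minimal sketch of that arithmetic, where the function name tolerated_failures is hypothetical and not part of Hadoop:

```python
def tolerated_failures(journal_nodes: int) -> int:
    """Return how many JNs can be down while a majority is still reachable."""
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    majority = journal_nodes // 2 + 1  # smallest group that is more than half
    return journal_nodes - majority


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} JNs: majority {n // 2 + 1}, tolerates {tolerated_failures(n)} down")
```

With 3 JNs one failure is tolerated, with 4 still only one, and with 5 two.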
Once the JNs are populated, if a machine is reinstalled after a hardware failure, or a new set of machines is added to expand the JN quorum, the NameNode will not populate the new JN machines unless the process described above is repeated.
Because that process requires shutting down the NameNode, any JN maintenance causes downtime on a cluster that operates 24x7.
As a workaround, one can follow the steps below:
1. Install a new JN or reinstall an existing JN machine.
2. Create the required JN directory structure.
3. Copy the VERSION file from an existing JN to the new JN's current directory.
4. Manually create the paxos directory under the new JN's current directory.
5. Start the JN daemon.
6. Add the new set of JNs to hdfs-site.xml and restart the NN.
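Steps 2 to 4 and the configuration value for step 6 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names seed_journal_dir and qjournal_uri are hypothetical, the on-disk layout is assumed to be <edits dir>/<journal id>/current, the VERSION file is assumed to have already been fetched from an existing JN to a locally readable path, and 8485 is assumed as the JN RPC port.

```python
import shutil
from pathlib import Path


def seed_journal_dir(existing_current: Path, new_edits_dir: Path, journal_id: str) -> Path:
    """Steps 2-4: create <edits dir>/<journal id>/current, copy VERSION, add paxos/.

    existing_current is a local copy of an existing JN's current directory
    (assumption: it was fetched beforehand, for example with scp).
    """
    version_src = existing_current / "VERSION"
    if not version_src.is_file():
        raise FileNotFoundError(f"no VERSION file in {existing_current}")

    new_current = new_edits_dir / journal_id / "current"
    # Creating paxos/ with parents=True also creates the directories above it.
    (new_current / "paxos").mkdir(parents=True, exist_ok=True)
    shutil.copy2(version_src, new_current / "VERSION")
    return new_current


def qjournal_uri(hosts, journal_id, port=8485):
    """Step 6: build a value for dfs.namenode.shared.edits.dir in hdfs-site.xml."""
    if not hosts:
        raise ValueError("at least one JournalNode host is required")
    return "qjournal://" + ";".join(f"{h}:{port}" for h in hosts) + f"/{journal_id}"
```

The sketch does not set file ownership or start the daemon; the new directories still need to be accessible to the user that runs the JN, and steps 5 and 6 (starting the JN and restarting the NN) remain manual.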