How to append to an HDFS file on an extremely small cluster (3 nodes or less)

I am trying to append to a file on HDFS on a single-node cluster. I also tried on a two-node cluster but got the same exceptions.

In hdfs-site.xml, I have dfs.replication set to 1. If I set dfs.client.block.write.replace-datanode-on-failure.policy to DEFAULT, I get the following exception:

java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[10.10.37.16:50010], original=[10.10.37.16:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.

If I follow the recommendation in the comment for that configuration in hdfs-default.xml for extremely small clusters (3 nodes or less) and set dfs.client.block.write.replace-datanode-on-failure.policy to NEVER, I get the following exception instead:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot append to file/user/hadoop/test. Name node is in safe mode.
The reported blocks 1277 has reached the threshold 1.0000 of total blocks 1277. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 3 seconds.
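Note that this SafeModeException is transient: the message itself says safe mode will be turned off automatically in a few seconds, so retrying the append after a short delay may be enough. The namenode's safe-mode state can also be inspected or exited manually with the standard dfsadmin command (shown as a sketch; it requires a running cluster):

```shell
# Check whether the namenode is currently in safe mode
hdfs dfsadmin -safemode get

# Force the namenode to leave safe mode (use with care on a real cluster)
hdfs dfsadmin -safemode leave
```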

Here’s how I try to append:

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://MY-MACHINE:8020/user/hadoop");
conf.set("hadoop.job.ugi", "hadoop");

FileSystem fs = FileSystem.get(conf);
OutputStream out = fs.append(new Path("/user/hadoop/test"));

PrintWriter writer = new PrintWriter(out);
writer.print("hello world");
writer.close();

Is there something I am doing wrong in the code? Maybe something is missing in the configuration? Any help would be appreciated!

EDIT

Even though dfs.replication is set to 1, when I check the status of the file with

FileStatus[] status = fs.listStatus(new Path("/user/hadoop"));

I find that status[i].block_replication is set to 3. I don't think this is the problem, because when I changed the value of dfs.replication to 0 I got a relevant exception, so it does appear to honor the value of dfs.replication. But to be on the safe side, is there a way to change the block_replication value per file?
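To answer the per-file question directly: the FileSystem API exposes setReplication(Path, short), so the replication factor of a single file can be changed programmatically as well. A minimal sketch, assuming the same MY-MACHINE host and file path as in the question:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://MY-MACHINE:8020");
        FileSystem fs = FileSystem.get(conf);

        // Ask the namenode to change the replication factor of one file.
        // Returns true if the request was accepted; the actual re-replication
        // happens asynchronously in the background.
        boolean accepted = fs.setReplication(new Path("/user/hadoop/test"), (short) 1);
        System.out.println("replication change accepted: " + accepted);
    }
}
```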

Answer

As I mentioned in the edit, even though dfs.replication is set to 1, fileStatus.block_replication is set to 3.

A possible solution is to run:

hadoop fs -setrep -w 1 -R /user/hadoop/

which will recursively change the replication factor of every file in the given directory. The command is documented in the Hadoop FileSystem shell documentation under setrep.

What remains is to find out why the value in hdfs-site.xml is ignored, and how to make 1 the default.

EDIT

It turns out that the dfs.replication property has to be set in the Configuration instance as well; otherwise the client requests the default replication factor for the file, which is 3, regardless of the value set in hdfs-site.xml.

Adding the following statement to the code solves it:

conf.set("dfs.replication", "1");
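Putting it together, the append code from the question with the replication fix applied would look roughly like this (a sketch; MY-MACHINE and the file path are placeholders carried over from the question, and try-with-resources is used so the streams are closed even on failure):

```java
import java.io.OutputStream;
import java.io.PrintWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppendExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://MY-MACHINE:8020");
        // Without this, the client requests the default replication
        // factor (3) regardless of what hdfs-site.xml says on the server.
        conf.set("dfs.replication", "1");

        FileSystem fs = FileSystem.get(conf);
        try (OutputStream out = fs.append(new Path("/user/hadoop/test"));
             PrintWriter writer = new PrintWriter(out)) {
            writer.print("hello world");
        }
    }
}
```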