HADOOP - What is the meaning of replication factor?

asked SRVMTrainings October 11, 2014 07:18 AM  

What is the meaning of replication factor?


5 Answers

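The replication factor defines the number of copies (replicas) of each data block kept in the cluster. The default replication factor is 3. It can be changed, for example cluster-wide through the dfs.replication property or per file with hdfs dfs -setrep.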

The replication factor describes how data is replicated across multiple nodes, which is how HDFS achieves high fault tolerance and high availability.

That means when a DataNode goes down, a copy of the same data still exists on another DataNode, so the cluster can keep serving clients without interruption or data loss.
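
To make the fault-tolerance point concrete, here is a minimal Python sketch (not HDFS code). The block IDs, DataNode names, and the helper `readable_blocks` are made up for illustration; it only checks whether each block still has at least one replica on a live node.

```python
def readable_blocks(block_locations, dead_nodes):
    """Return {block_id: True/False} telling whether a live replica remains.

    block_locations: dict mapping a block ID to the DataNodes holding a replica.
    dead_nodes: collection of DataNode names that are currently down.
    Both structures are hypothetical stand-ins, not real HDFS metadata.
    """
    dead = set(dead_nodes)
    return {
        block_id: bool(set(nodes) - dead)  # any replica on a node that is still up?
        for block_id, nodes in block_locations.items()
    }


# Two blocks, each stored with replication factor 3 (illustrative layout).
locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn2", "dn3", "dn4"},
}

one_down = readable_blocks(locations, {"dn2"})
three_down = readable_blocks(locations, {"dn1", "dn2", "dn3"})
print(one_down)
print(three_down)
```

With one DataNode down, every block is still readable. A block becomes unavailable only when all nodes holding its replicas are down at the same time.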

The replication factor in HDFS defines how many times each data block is replicated in the cluster.

HDFS uses a rack-aware replica placement policy. With the default replication factor of 3, this policy stores 2 copies of each data block on one rack and 1 copy on a different rack.
Two copies of a block should be on one rack and the other copy on another rack. This rack information should be maintained by a rack-awareness script.
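
The placement described above can be sketched roughly as follows. This is a simplified illustration in Python, not the actual HDFS implementation: the `topology` dictionary, the node names, and the function `place_replicas` are assumptions for the example. It assumes replication factor 3, puts the first replica on the writer's node, and puts the other two on two different nodes of one other rack.

```python
import random


def place_replicas(topology, writer_node, rng=random):
    """Pick (rack, node) pairs for 3 replicas: 1 on the writer's rack, 2 on another rack.

    topology: dict mapping a rack name to the list of nodes in it (hypothetical).
    writer_node: the node writing the block; it receives the first replica.
    """
    # Find the rack that contains the writer.
    local_rack = next(rack for rack, nodes in topology.items() if writer_node in nodes)

    # Candidate remote racks need at least two nodes to hold two replicas.
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("need another rack with at least two nodes")

    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)  # two distinct nodes
    return [(local_rack, writer_node), (remote_rack, second), (remote_rack, third)]


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
placement = place_replicas(topology, "n1")
print(placement)
```

Losing a whole rack then still leaves at least one copy of the block on the other rack.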
The replication factor defines the number of times a given data block is stored in the cluster. The default replication factor is 3. This also means that you need 3 times the amount of storage that the data itself requires. Each file is split into data blocks, which are spread across the cluster.
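
As a rough worked example of the storage cost, the Python sketch below computes the number of blocks, the total number of block replicas, and the raw storage used for one file. The function name `raw_storage_needed` is made up, and the 128 MB block size is an assumption (it is a common default in recent Hadoop versions, but your cluster may differ).

```python
import math


def raw_storage_needed(file_size_mb, replication=3, block_size_mb=128):
    """Return (blocks, total_replicas, raw_storage_mb) for a single file.

    block_size_mb=128 is an assumed default; check your cluster's setting.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)  # file is split into blocks
    total_replicas = blocks * replication             # each block is stored `replication` times
    raw_storage_mb = file_size_mb * replication       # raw disk space used across the cluster
    return blocks, total_replicas, raw_storage_mb


blocks, total_replicas, raw_mb = raw_storage_needed(1024)
print(blocks, total_replicas, raw_mb)  # 8 24 3072
```

So a 1 GB file with replication factor 3 is split into 8 blocks, stored as 24 block replicas, and takes about 3 GB of raw disk space.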
