The number of mappers is usually driven by the distributed file system's block size. For instance, if the input file is 2 GB and the block size is 64 MB, the number of maps will be set to 32 (2048 MB / 64 MB). This is a good solution because each mapper task gets its own block to work on, which preserves data locality. We followed this approach in the WordCount experiments; the Hadoop documentation recommends that each machine execute between 10 and 100 mapper tasks.
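The mapper-count calculation above can be sketched as follows; the input and block sizes are the illustrative values from the example, not taken from any particular cluster:

```python
# Number of map tasks follows from input size and HDFS block size:
# one map task per block of the input file.
input_size_mb = 2048   # a 2 GB input file (example value)
block_size_mb = 64     # HDFS block size (example value)

num_mappers = input_size_mb // block_size_mb
print(num_mappers)  # 32 map tasks, one per block
```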
The correct number of reducers must also be considered before executing a job. Setting the number of reducers to 1.75 × (number of nodes × mapred.tasktracker.reduce.tasks.maximum) makes the faster nodes finish their initial round of reduces first and immediately start a second round, which provides load balancing. We followed this approach in the WordCount experiments.
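A minimal sketch of this reducer-count heuristic; the node count and per-node slot value are assumed example figures, while mapred.tasktracker.reduce.tasks.maximum is the Hadoop property named above:

```python
# Reducer-count heuristic: 1.75 * (nodes * reduce slots per node).
# The factor 1.75 oversubscribes slightly so fast nodes pick up a
# second round of reduce tasks, balancing load across the cluster.
nodes = 10                  # example cluster size (assumption)
reduce_slots_per_node = 2   # mapred.tasktracker.reduce.tasks.maximum (assumption)

num_reducers = int(1.75 * nodes * reduce_slots_per_node)
print(num_reducers)  # 35 reducers
```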