
HADOOP - Hadoop optimizations

asked Experts-976 November 16, 2014 02:59 AM  


1 Answer

answered By Experts-976
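  1. For many small files, use CombineFileInputFormat (for example its concrete subclass CombineTextInputFormat) instead of a plain FileInputFormat such as TextInputFormat. It packs several files into each input split, so fewer map tasks are launched.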

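  2. To get the name of the file being processed, read the map.input.file property from the job configuration (renamed mapreduce.map.input.file in newer Hadoop releases). With the newer mapreduce API, the path is also available from the FileSplit returned by context.getInputSplit().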

  3. To give every line a unique ID, combine the line's byte offset (the key that TextInputFormat passes to the mapper) with the full file path. The pair is unique across the whole file system.

  4. Use KeyValueTextInputFormat to read a file of key/value lines. The separator defaults to a tab; you can change it via the key.value.separator.in.input.line property (mapreduce.input.keyvaluelinerecordreader.key.value.separator in newer releases).

  5. To read sequence files as input, use SequenceFileInputFormat.

  6. To change the delimiter between key and value in text output, set the mapred.textoutputformat.separator property (mapreduce.output.textoutputformat.separator in newer releases).

  7. Use MultipleTextOutputFormat to write output to multiple files, with the output file name chosen per key/value pair.
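
To make tips 3 and 4 a bit more concrete, here is a minimal sketch in plain Java. It does not call any Hadoop classes; it only imitates the logic described above. The class name LineKeySketch and the helpers uniqueLineId, splitKeyValue and lineIds are hypothetical names made up for this illustration, and the ":" used to join path and offset is an arbitrary choice. The split rule (first separator only; whole line becomes the key when the separator is missing) follows the behaviour documented for KeyValueTextInputFormat.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: plain Java, no Hadoop classes. All names are hypothetical.
final class LineKeySketch {

    // Tip 3: the file path plus the line's byte offset identifies a line uniquely.
    static String uniqueLineId(String filePath, long byteOffset) {
        return filePath + ":" + byteOffset;
    }

    // Tip 4: split on the first separator only. If the separator is absent,
    // the whole line is the key and the value is empty.
    static String[] splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    // Walks '\n'-terminated lines and tracks the byte offset at which each
    // line starts, similar to the offset key a text record reader reports.
    static List<String> lineIds(String filePath, String content) {
        List<String> ids = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            ids.add(uniqueLineId(filePath, offset));
            // Advance past the line's bytes plus one byte for the '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return ids;
    }

    public static void main(String[] args) {
        String[] kv = splitKeyValue("user1\tclicked\thome", '\t');
        System.out.println(kv[0] + " -> " + kv[1]);
        System.out.println(lineIds("/data/a.txt", "a\tb\nccc\n"));
    }
}
```

In a real job the mapper would get the offset from its input key and the path from the job configuration or input split, so a helper like uniqueLineId would just join those two values.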
