For many small files, use CombineFileInputFormat instead of FileInputFormat, so that a single split (and therefore a single map task) can cover several files.
To get the name of the file being processed, read the map.input.file property from the job configuration.
To give each line a unique identifier, combine the line's byte offset with the file name; the pair is unique across the whole file system.
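The idea can be sketched in plain Java without any Hadoop dependency. With TextInputFormat the map key is the byte offset at which the line starts, so the sketch below computes those offsets for newline-terminated UTF-8 text and joins each one to a file path. The class LineIds, its methods, the ":" joiner, and the sample path are all hypothetical names chosen for illustration, not part of Hadoop.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Hadoop class.
final class LineIds {

    // Joins the file path and the byte offset of a line into one identifier.
    // The ":" joiner is an arbitrary choice for this sketch.
    static String uniqueLineId(String filePath, long byteOffset) {
        return filePath + ":" + byteOffset;
    }

    // Returns the byte offset at which each '\n'-terminated line starts,
    // assuming the content is encoded as UTF-8.
    static List<Long> lineOffsets(String content) {
        List<Long> offsets = new ArrayList<>();
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        long start = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                offsets.add(start);
                start = i + 1;
            }
        }
        // A final line without a trailing newline still counts as a line.
        if (start < bytes.length) {
            offsets.add(start);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String path = "/data/input/part-00000"; // hypothetical path
        for (long offset : lineOffsets("alpha\nbeta\ngamma\n")) {
            System.out.println(uniqueLineId(path, offset));
        }
    }
}
```

In a real mapper the path would come from the job configuration and the offset from the input key, so only the joining step would be needed.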
Use KeyValueTextInputFormat to read a file of key-value pairs separated by a tab. To use a different separator, set the key.value.separator.in.input.line property.
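As a rough picture of how such a line is split, the plain-Java sketch below cuts a line at the first occurrence of the separator: everything before it becomes the key, everything after it the value, and a line with no separator becomes a key with an empty value. This mirrors the behaviour of KeyValueTextInputFormat as I understand it; the class KeyValueSplit and its method are hypothetical stand-ins, not the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical stand-in for illustration only; not a Hadoop class.
final class KeyValueSplit {

    // Splits a line at the FIRST occurrence of the separator character.
    // Later occurrences of the separator stay inside the value.
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator: the whole line is the key, the value is empty.
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> record = split("user42\tclicks\t17", '\t');
        System.out.println("key=" + record.getKey());
        System.out.println("value=" + record.getValue());
    }
}
```

The separator is modelled as a single character here because, to my knowledge, only the first character of the configured separator string is used.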
To read sequence files as input, use SequenceFileInputFormat.
To change the delimiter written between key and value in the output, set the mapred.textoutputformat.separator property (the default is a tab).
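The effect of the separator on each output line can be sketched in plain Java. The sketch assumes the usual text output behaviour: key, separator, value, newline, with the separator omitted when either side is missing. The class TextRecordFormat and its method are hypothetical, and null stands in for a missing key or value.

```java
// Hypothetical stand-in for illustration only; not a Hadoop class.
final class TextRecordFormat {

    // Builds one output line from a key, a value, and the configured separator.
    // A null key or value is skipped, and then no separator is written.
    static String formatRecord(String key, String value, String separator) {
        boolean hasKey = key != null;
        boolean hasValue = value != null;
        if (!hasKey && !hasValue) {
            return ""; // nothing to write for an empty record
        }
        StringBuilder sb = new StringBuilder();
        if (hasKey) {
            sb.append(key);
        }
        if (hasKey && hasValue) {
            sb.append(separator);
        }
        if (hasValue) {
            sb.append(value);
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // With "," as the separator instead of the default tab.
        System.out.print(formatRecord("user42", "17", ","));
    }
}
```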
Use MultipleTextOutputFormat to write output to multiple files, choosing the file name for each record from its key and value.