Home > Software > BIGDATA > HADOOP
Interview Questions   Tutorials   Discussions   Programs   Videos   Discussion   

HADOOP - What is the InputFormat ?

asked SRVMTrainings February 14, 2014 06:38 AM  

What is the InputFormat ?


3 Answers

answered By   0  

TextInputFormat, each line in the text file is a record. Key is the byte offset of the line and value is the content of the line.

For instance,Key: LongWritable, value: Text.

   add comment

answered By   0  
The default InputFormat is the TextInputFormat. This treats each line of each input file as a separate record, and performs no parsing. This is useful for unformatted data or line-based records like log files. A more interesting input format is the KeyValueInputFormat. This format also treats each line of input as a separate record. While the TextInputFormat treats the entire line as the value, the KeyValueInputFormat breaks the line itself into the key and value by searching for a tab character. This is particularly useful for reading the output of one MapReduce job as the input to another, as the default OutputFormat (described in more detail below) formats its results in this manner. Finally, the SequenceFileInputFormat reads special binary files that are specific to Hadoop. These files include many features designed to allow data to be rapidly read into Hadoop mappers. Sequence files are block-compressed and provide direct serialization and deserialization of several arbitrary data types (not just text). Sequence files can be generated as the output of other MapReduce tasks and are an efficient intermediate representation for data that is passing from one MapReduce job to anther.
   add comment

answered By   0  

The InputFormat defines how to read data from a file into the Mapper instances. Hadoop comes with several implementations of InputFormat; some work with text files and describe different ways in which the text files can be interpreted. Others, like SequenceFileInputFormat, are purpose-built for reading particular binary file formats. These types are described in more detail in Module 4.

More powerfully, you can define your own InputFormat implementations to format the input to your programs however you want. For example, the default TextInputFormat reads lines of text files. The key it emits for each record is the byte offset of the line read (as a LongWritable), and the value is the contents of the line up to the terminating \'\\n\' character (as a Text object). If you have multi-line records each separated by a $ character, you could write your own InputFormat that parses files into records split on this character instead.

   add comment

Your answer

Join with account you already have



 Write A Tutorials
Online-Classroom Classes

  1 person following this question

  4 people following this tag

  Question tags

hadoop × 7

Asked 4 years and 13 days ago ago
Number of Views -438
Number of Answers -3
Last updated
2 years and 8 months ago ago

  Similar questions

Ready to start your tutorial with us? That's great! Send us an email and we will get back to you as soon as possible!