
HADOOP - How to write output to multiple named files in Hadoop




asked Experts-976 November 16, 2014 02:54 AM  

How do I write output to multiple named files in Hadoop using MultipleTextOutputFormat?


           

1 Answer



 
answered By Experts-976   0  

This can be done in Hadoop by using the MultipleTextOutputFormat class from the old org.apache.hadoop.mapred API. The following is a simple example implementation that reads an input file whose lines have the form key:value (as the split on ":" in the mapper implies) and creates two output files, Name and Age. The code where the action happens is marked with >

package org.myorg;

import java.io.*;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapred.lib.*;


public class mult{
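        // Mapper: splits each "key:value" line and emits (key, value).
        static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
                public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
                        String[] dall = value.toString().split(":");
                        if (dall.length < 2) {
                                return; // skip lines without a key:value pair
                        }
                        output.collect(new Text(dall[0]), new Text(dall[1]));
                }
        }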

        // Reducer: passes every (key, value) pair through unchanged.
        static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
                public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
                        while (values.hasNext()) {
                                output.collect(key, values.next());
                        }
                }
        }




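>       // Names each output file after the record's key instead of the default part-NNNNN name.
>       static class MultiFileOutput extends MultipleTextOutputFormat<Text, Text> {
>               @Override
>               protected String generateFileNameForKeyValue(Text key, Text value, String name) {
>                       return key.toString();
>               }
>       }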


        public static void main(String[] args) throws Exception {
                String InputFiles=args[0];
                String OutputDir=args[1];

                Configuration mycon=new Configuration();
                JobConf conf = new JobConf(mycon,mult.class);

                conf.setOutputKeyClass(Text.class);
                conf.setMapOutputKeyClass(Text.class);
                conf.setOutputValueClass(Text.class);

                conf.setMapperClass(Map.class);
                conf.setReducerClass(Reduce.class);

                conf.setInputFormat(TextInputFormat.class);


>               conf.setOutputFormat(MultiFileOutput.class);

                FileInputFormat.setInputPaths(conf,InputFiles);
                FileOutputFormat.setOutputPath(conf,new Path(OutputDir));
                JobClient.runJob(conf);

        }
}

The output would be two files, Name and Age. The file Name contains:

Name    Nish
Name    Dash

The file Age contains:

Age     27
Age     29
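
To see how records end up in the two files without running a cluster, here is a minimal plain-Java sketch of the same routing idea. It is not Hadoop code: the class MultiFileRoutingSketch and its methods fileNameFor and route are hypothetical names made up for this illustration, and the sample input lines (Name:Nish and so on) are an assumption inferred from the output shown above. Skipping lines that have no ":" is a choice made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only: no Hadoop classes are involved.
class MultiFileRoutingSketch {

    // Plays the role of generateFileNameForKeyValue: the file name is the key.
    static String fileNameFor(String key, String value) {
        return key;
    }

    // Splits each line on ":" and groups the records by target file name.
    static Map<String, List<String>> route(List<String> lines) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(":");
            if (parts.length < 2) {
                continue; // sketch-only choice: ignore lines without key:value
            }
            String key = parts[0];
            String value = parts[1];
            // Text output writes each record as key, tab, value.
            files.computeIfAbsent(fileNameFor(key, value), k -> new ArrayList<>())
                 .add(key + "\t" + value);
        }
        return files;
    }

    public static void main(String[] args) {
        // Assumed sample input, inferred from the output files shown above.
        List<String> input = Arrays.asList("Name:Nish", "Name:Dash", "Age:27", "Age:29");
        route(input).forEach((file, records) -> {
            System.out.println("File " + file + ":");
            records.forEach(r -> System.out.println("  " + r));
        });
    }
}
```

In the real job the grouping is done by the framework: every record reaching the output format is passed to generateFileNameForKeyValue, and the returned string is used as the file name under the output directory.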