
HADOOP - How would you modify that solution to count only the number of unique words in all the documents?

asked SRVMTrainings October 30, 2014 06:32 PM  

How would you modify that solution (the WordCount program) to count only the number of unique words in all the documents?


4 Answers

answered By   0  

for (IntWritable val : values) { sum += val.get(); break; }
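To see what the `break` does, here is a minimal Hadoop-free sketch. The class and method names (`FirstValueReduce`, `firstValueOnly`, `onePerKey`) are made up for illustration, and plain `Integer` lists stand in for the `IntWritable` values a reducer receives. It assumes each unique word reaches the reduce step exactly once. One caveat: if a combiner has already summed the values, the first value is not necessarily 1, so emitting a constant 1 per key is probably the safer variant.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the reduce step; not part of the Hadoop API.
class FirstValueReduce {

    // Mirrors the loop with "break": only the first value of the group is added.
    static int firstValueOnly(Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
            break; // stop after the first value
        }
        return sum;
    }

    // Assumed safer variant: ignore the values and report 1 for every key.
    static int onePerKey(Iterable<Integer> values) {
        return 1;
    }

    public static void main(String[] args) {
        // Grouped values as a reducer would see them, one entry per unique word.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        grouped.put("hadoop", Arrays.asList(1, 1, 1));
        grouped.put("the", Arrays.asList(1, 1));
        grouped.put("word", Arrays.asList(3)); // pre-summed, as a combiner might produce

        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + "\t" + firstValueOnly(e.getValue())
                    + "\t" + onePerKey(e.getValue()));
        }
    }
}
```

Either way the output has one record per unique word, so the number of output records is the unique-word count.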


answered By   0  

With the existing WordCount code, applying toLowerCase() or toUpperCase() to the key (the word) before emitting it ensures that only distinct words are counted. Otherwise, case variants such as "the" and "The" are treated as two different words.
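A small sketch of the case issue, without Hadoop: the class `CaseNormalizer` and its methods are hypothetical names, and a `HashSet` stands in for the grouping by key that the framework would do. It tokenizes with `StringTokenizer` like the WordCount mapper and uses `Locale.ROOT` so lower-casing does not depend on the machine's locale (that choice is an assumption, not something from the thread).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical helper; shows how lower-casing the key merges "The" and "the".
class CaseNormalizer {

    // Normalize a token before it is used as a key.
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Count distinct tokens across all lines, optionally ignoring case.
    static int countDistinct(List<String> lines, boolean ignoreCase) {
        Set<String> seen = new HashSet<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String token = tokenizer.nextToken();
                seen.add(ignoreCase ? normalize(token) : token);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("The cat saw the dog", "the dog ran");
        System.out.println(countDistinct(lines, false)); // prints 6 ("The" and "the" counted separately)
        System.out.println(countDistinct(lines, true));  // prints 5
    }
}
```

In the mapper, the equivalent change would be to normalize the token before calling `word.set(...)`.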


answered By   0  
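 // Imports needed by the two nested classes below (placed at the top of the enclosing file)
 import java.io.IOException;
 import java.util.StringTokenizer;
 import org.apache.hadoop.io.*;
 import org.apache.hadoop.mapreduce.*;
 public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken()); // load the next token into the output key
            context.write(word, one);        // emit (word, 1)
        }
    }
 }
 public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context) 
      throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();                // add up the occurrences of this word
        }
        context.write(key, new IntWritable(sum)); // one output record per unique word
    }
 }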
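Since the reducer writes one record per key, the unique-word count across all documents is just the number of distinct keys that come out of the shuffle. A rough plain-Java simulation of that flow is below. `UniqueWordPipeline` and its method names are hypothetical, a `TreeMap` stands in for the shuffle/sort, and nothing here touches the real Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical simulation of map -> shuffle -> reduce for counting unique words.
class UniqueWordPipeline {

    // Map phase: emit every token of every document as a key.
    static List<String> mapPhase(List<String> documents) {
        List<String> emitted = new ArrayList<>();
        for (String doc : documents) {
            StringTokenizer tokenizer = new StringTokenizer(doc);
            while (tokenizer.hasMoreTokens()) {
                emitted.add(tokenizer.nextToken());
            }
        }
        return emitted;
    }

    // Shuffle + reduce: group identical keys and sum their occurrences.
    static Map<String, Integer> shuffleAndReduce(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // The unique-word count is the number of distinct keys after grouping.
    static int countUniqueWords(List<String> documents) {
        return shuffleAndReduce(mapPhase(documents)).size();
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList("a b a", "b c");
        System.out.println(shuffleAndReduce(documents.isEmpty()
                ? new ArrayList<String>() : mapPhase(documents)));
        System.out.println(countUniqueWords(documents)); // prints 3
    }
}
```

On a real cluster, one option (an assumption about your setup) is to read the job's built-in "Reduce input groups" counter, or to run a small second job that counts the output records of the first.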

answered By   0  
 It is in the WordCount program.
