
HADOOP - What is the meaning of speculative execution in Hadoop? Why is it important?




asked by SRVMTrainings, April 8, 2015 02:03 AM

What is the meaning of speculative execution in Hadoop? Why is it important?


           

6 Answers



 
answered By

If a node appears to be running slowly, the master node can redundantly execute another instance of the same task on a different node, and the first output produced is the one that is taken. This process is called speculative execution.


 
answered By Lalapeta77

If a task is running on a particular DataNode and that node becomes slow or goes down, the JobTracker (which receives heartbeats from the DataNodes and TaskTrackers every 3 seconds, along with periodic block reports) will assign the same task to another active node that holds a replica of the data. This is called "Speculative Execution".


 
answered By Tushar

If a task is taking more time to complete than expected, Hadoop creates one more copy of the same task; this is called speculative execution. Whichever copy finishes first wins, and the other one is killed. Speculative execution is on by default, but a Hadoop admin can turn it off depending on the cluster's needs.


 

 
answered By marvit

One problem with the Hadoop system is that by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program.

Tasks may be slow for various reasons, including hardware degradation, or software mis-configuration, but the causes may be hard to detect since the tasks still complete successfully, albeit after a longer time than expected. Hadoop doesn’t try to diagnose and fix slow-running tasks; instead, it tries to detect when a task is running slower than expected and launches another, equivalent, task as a backup. This is termed speculative execution of tasks.

For example, if one node has a slow disk controller, it may be reading its input at only 10% of the speed of the other nodes. So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than all the others.

Because tasks run in isolation from one another, individual tasks do not know where their inputs come from; they trust the Hadoop platform to deliver the appropriate input. The same input can therefore be processed multiple times in parallel, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform schedules redundant copies of the remaining tasks across several nodes which do not have other work to perform. This process is known as speculative execution. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon those tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully first.

Speculative execution is enabled by default. With the old API, you can disable it for the mappers and reducers by setting the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false, respectively; with the newer API, the corresponding properties are mapreduce.map.speculative and mapreduce.reduce.speculative.
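
As a quick illustration, here is a minimal Java sketch of turning speculative execution off with the newer-API property names mentioned above (the class name and job name are placeholders, not from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Minimal sketch: disable speculative execution with the newer API.
public class DisableSpeculation {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Newer-API property names; both default to true.
        conf.setBoolean("mapreduce.map.speculative", false);
        conf.setBoolean("mapreduce.reduce.speculative", false);

        // "my-job" is a placeholder; set the mapper, reducer and
        // input/output paths as usual before submitting.
        Job job = Job.getInstance(conf, "my-job");
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}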

So, to answer your question: the speculative copy does start afresh, and has nothing to do with how much the other task has already done or completed.


 
answered By pallav
Hadoop uses speculative execution to mitigate the slow-task problem. Hadoop does not wait for a task to fail: whenever a task is seen to be running slowly, the platform schedules redundant copies of that task across several nodes which do not have other work to perform. The parallel copies are monitored, and as soon as one of them finishes successfully, Hadoop uses its output and kills the other copies. This process is termed speculative execution.

The Boolean configuration parameters for enabling and disabling speculative execution of map and reduce tasks are mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution. By default, both are set to true.
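
For completeness, a hedged sketch of the old-API side, using the JobConf options named above (the class name is a placeholder):

import org.apache.hadoop.mapred.JobConf;

// Old-API sketch: the same Boolean options set through JobConf.
public class OldApiSpeculation {
    public static void main(String[] args) {
        JobConf jobConf = new JobConf();
        jobConf.setBoolean("mapred.map.tasks.speculative.execution", false);
        jobConf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        // JobConf also exposes equivalent convenience setters:
        jobConf.setMapSpeculativeExecution(false);
        jobConf.setReduceSpeculativeExecution(false);
    }
}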




 
answered By
If a TaskTracker stops sending heartbeats to the JobTracker, the tasks it was running are rescheduled on another machine. Strictly speaking, that is failure handling; speculative execution refers to launching backup copies of slow-running tasks. Speculative execution is enabled by default in Hadoop, but it can be disabled.
