
HADOOP - What is Distributed Cache in Hadoop?




asked by SRVMTrainings, March 12, 2015 04:44 AM

What is Distributed Cache in Hadoop?



4 Answers



 
answered by Lalapeta77

Distributed Cache is a facility provided by the MapReduce framework to cache files (text, archives, jars and so on) needed by applications during execution of the job. The framework will copy the necessary files to the slave node before any tasks for the job are executed on that node.
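A minimal sketch of how this is typically wired up with the org.apache.hadoop.mapreduce API (the class names, the /data/lookup.txt path, the #lookup.txt symlink and the tab-separated record layout are all assumptions made for illustration, not part of the original answer): the driver registers the file with Job.addCacheFile(), and the mapper reads its localized copy in setup().

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DistributedCacheExample {

    public static class LookupMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        private final Map<String, String> lookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // The cached file is localized on the task node and, because of the
            // '#lookup.txt' fragment used in the driver, symlinked into the
            // task's working directory, so a plain relative path works here.
            try (BufferedReader in = new BufferedReader(new FileReader("lookup.txt"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] parts = line.split("\t", 2);   // assumed key<TAB>value layout
                    if (parts.length == 2) {
                        lookup.put(parts[0], parts[1]);
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Enrich each input record with the side data loaded in setup().
            String id = value.toString().split("\t")[0];
            context.write(new Text(id), new Text(lookup.getOrDefault(id, "UNKNOWN")));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "distributed cache example");
        job.setJarByClass(DistributedCacheExample.class);

        // Ask the framework to copy this HDFS file to every task node before any
        // task of the job runs there (hypothetical path).
        job.addCacheFile(new URI("/data/lookup.txt#lookup.txt"));

        job.setMapperClass(LookupMapper.class);
        job.setNumReduceTasks(0);                 // map-only job for brevity
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The same effect can be achieved without code changes by passing -files /data/lookup.txt on the command line, provided the driver is run through ToolRunner so that the generic options are parsed.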


 
answered by vijaya

Rather than serializing side data in the job configuration, it is preferable to distribute datasets using Hadoop’s distributed cache mechanism. This provides a service for copying files and archives to the task nodes in time for the tasks to use them when they run. To save network bandwidth, files are normally copied to any particular node once per job.

You can use the distributed cache for copying files that do not fit in memory. MapFiles are very useful in this regard, since they serve as an on-disk lookup format. Because MapFiles are a collection of files with a defined directory structure, you should put them into an archive format (JAR, ZIP, TAR, or gzipped TAR) and add them to the cache using the -archives option.
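To make the -archives idea concrete, here is a rough sketch; the stations.zip archive, the stations/lookup MapFile directory inside it, and the Text key/value types are all assumptions for illustration. If the driver goes through ToolRunner so that generic options are parsed, the job could be submitted with something like: hadoop jar job.jar MyDriver -archives stations.zip#stations input output. The archive is then unpacked on each task node and a mapper can open the MapFile as an on-disk lookup:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ArchiveLookupMapper
        extends Mapper<LongWritable, Text, Text, Text> {

    private MapFile.Reader reader;

    @Override
    protected void setup(Context context) throws IOException {
        // 'stations' is the symlink to the unpacked archive in the task's
        // working directory; 'lookup' is the MapFile directory inside it
        // (assumed layout).
        reader = new MapFile.Reader(new Path("stations/lookup"),
                context.getConfiguration());
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Treat each input line as a station id (assumption) and look it up
        // against the on-disk MapFile instead of loading everything into memory.
        Text stationName = new Text();
        if (reader.get(new Text(value.toString().trim()), stationName) != null) {
            context.write(value, stationName);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        reader.close();
    }
}

Because the MapFile is read from local disk rather than held in the heap, the side dataset can be much larger than what would fit in a task's memory.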


 
answered by Kumar-1003

Can you please provide an example of how to use the distributed cache?


 
answered by

To run a Pipes job, we need to run Hadoop in pseudo-distributed mode (where all daemons run on the local machine). Pipes doesn't run in standalone (local) mode, since it relies on Hadoop's distributed cache mechanism, which works only when HDFS is running.

