
HADOOP - The Most Important Points of HDFS

asked experts May 11, 2015 01:26 PM  



1 Answer

answered By experts   0  
1) A 1 MB file stored with a 128 MB block size and replication factor 3 is stored as one block with 3 replicas, and uses 3 * 1 MB = 3 MB of disk, not 3 * 128 MB = 384 MB. Each replica reports a block size of 128 MB, but that is just an abstraction for the metadata kept in the namenode, not the actual disk space used.
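A minimal sketch of the arithmetic in point 1, using the sizes from the answer (1 MB file, 128 MB block, replication 3); the variable names here are illustrative, not HDFS terminology:

```python
# Sketch: physical vs. "advertised" storage for a small file in HDFS.
# A block only occupies as much disk as the data written into it;
# the 128 MB block size is namenode metadata, not reserved space.
import math

file_size_mb = 1
block_size_mb = 128
replication = 3

num_blocks = math.ceil(file_size_mb / block_size_mb)   # 1 block
physical_mb = file_size_mb * replication               # real disk usage
naive_mb = num_blocks * block_size_mb * replication    # the mistaken view

print(physical_mb)  # 3
print(naive_mb)     # 384
```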

2) There is no way to store more than one file in a single block. Each file is stored in its own block(s).

3) For handling small files, refer to the link below:

4)Maximum number of files in hadoop:

The maximum number of files in HDFS depends on the amount of memory
available to the namenode. Each file object and each block object
takes about 150 bytes of memory. Thus, if you have 1 million files
with one block each, you would need about 300 MB of memory for the
namenode.
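The estimate above can be sketched as a small helper; the 150-bytes-per-object figure is the rule of thumb quoted in the answer, not an exact measurement, and the function name is made up for illustration:

```python
# Rough namenode heap estimate: ~150 bytes per file object and per
# block object (a commonly quoted rule of thumb, not an exact figure).
BYTES_PER_OBJECT = 150

def namenode_memory_bytes(num_files, blocks_per_file=1):
    """Approximate namenode memory needed for the given file count."""
    objects = num_files + num_files * blocks_per_file  # files + blocks
    return objects * BYTES_PER_OBJECT

# 1 million single-block files -> 2 million objects -> ~300 MB
print(namenode_memory_bytes(1_000_000))  # 300000000
```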

5) Maximum upper limit for block size:
In certain older versions of Hadoop, the limit was 2 GB
(see https://issues.apache.org/jira/browse/HDFS-96).
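For reference, the block size is a per-cluster (and per-file) setting rather than a hard-coded constant; a hedged sketch of the standard `hdfs-site.xml` property, with 256 MB chosen purely as an example value:

```xml
<!-- hdfs-site.xml: default block size for new files, in bytes.
     268435456 = 256 MB, an example value only; some older Hadoop
     versions capped block size at 2 GB as noted above. -->
<property>
  <name>dfs.blocksize</name>
  <value>268435456</value>
</property>
```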

6)Hadoop for processing non-splittable very large binary files?
You can write a custom InputSplit for your files, and the more files you process in one batch (hundreds, say), the more worthwhile Hadoop becomes. Even so, it won't really be ideal here: unless your HDFS block size matches your file size, the files will be spread all around the cluster and there will be network overhead. If you do make the block size match the file size, you lose the benefit of spreading I/O across all your cluster's disks. Bottom line: Hadoop may not be the best tool for this.
