1) A 1 MB file stored with a 128 MB block size and replication factor 3 is stored as one block with 3 replicas, using 3*1 = 3 MB of disk, not 3*128 = 384 MB. The 128 MB block size reported for it is just an abstraction for the metadata kept in the namenode, not the actual disk space used.
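A quick sketch of the arithmetic above, using the figures from the text (1 MB file, 128 MB block size, replication 3):

```python
MB = 1024 * 1024

file_size = 1 * MB      # actual size of the file
block_size = 128 * MB   # configured HDFS block size (metadata only)
replication = 3

# A block occupies only as much disk as the data written into it,
# so a 1 MB file in a 128 MB block uses 1 MB per replica.
actual_disk_usage = file_size * replication
nominal_reservation = block_size * replication  # NOT what HDFS consumes

print(actual_disk_usage // MB)    # 3   (MB actually used on disk)
print(nominal_reservation // MB)  # 384 (MB only if blocks were padded, which they are not)
```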
2) There is no way to store more than one file in a single block. Each file is stored in its own block (or blocks).
3) For handling small files, refer to the link below:
4) Maximum number of files in Hadoop:
The maximum number of files in HDFS depends on the amount of memory
available to the namenode. Each file object and each block object
takes about 150 bytes of memory. Thus, if you have 10 million files,
each occupying one block, that is 20 million objects, and you would
need about 3 GB of memory for the namenode.
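The rule of thumb above (about 150 bytes per file object and per block object) can be turned into a rough estimator; the helper name and the 150-byte constant are taken from the text, not from any Hadoop API:

```python
BYTES_PER_OBJECT = 150  # rule-of-thumb namenode cost per file/block object

def namenode_memory_bytes(num_files, blocks_per_file=1):
    """Rough namenode heap needed: one file object plus one object per block."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# 10 million single-block files -> 20 million objects -> about 3 GB
mem = namenode_memory_bytes(10_000_000)
print(mem)  # 3000000000 bytes, i.e. roughly 3 GB
```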
5) Maximum upper limit for block size:
In certain older versions of Hadoop, the limit was 2 GB
- see https://issues.apache.org/jira/browse/HDFS-96
6) Hadoop for processing non-splittable, very large binary files?
You can write a custom InputSplit for your files, but the more files you process in one batch (hundreds or more), the more worthwhile Hadoop becomes. It won't really be ideal here: unless your HDFS block size matches your file size, your files will be spread all around the cluster and there will be network overhead. And if you do make the block size match the file size, you're not getting the benefit of all your cluster's disks. The bottom line is that Hadoop may not be the best tool for this.