You want to perform analysis on a large collection of images. You want to store this data in HDFS and process it with MapReduce but you also want to give your data analysts and data scientists the ability to process the data directly from HDFS with an interpreted high-level programming language like Python. Which format should you use to store this data in HDFS?
Which HDFS command displays the contents of the file x in the user's HDFS home directory?
You need to move a file titled "weblogs" into HDFS. When you try to copy the file, the operation fails. You know you have ample space on your DataNodes. Which action should you take to resolve this situation and store more files in HDFS?
Analyze each scenario below and identify which best describes the behavior of the default partitioner.
The default partitioner computes a hash value for the key and assigns the partition based on this result.
The default Partitioner implementation is called HashPartitioner. It uses the hashCode() method of the key object, modulo the total number of partitions, to determine which partition a given (key, value) pair is sent to.
In Hadoop, the default partitioner is HashPartitioner, which hashes a record's key to determine which partition (and thus which reducer) the record belongs in. The number of partitions is equal to the number of reduce tasks for the job.
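The comments above describe the key-hashing rule; the sketch below expresses that rule in Java against Hadoop's public Partitioner API. It is an illustration of the technique, not the library's exact source (the class name SimpleHashPartitioner is made up for the example).

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative hash-based partitioner along the lines of Hadoop's
// HashPartitioner: the key's hashCode(), masked to a non-negative value,
// modulo the number of reduce tasks selects the target partition.
public class SimpleHashPartitioner<K, V> extends Partitioner<K, V> {

  @Override
  public int getPartition(K key, V value, int numReduceTasks) {
    // Mask off the sign bit so the modulo result is never negative,
    // then map the key into one of numReduceTasks partitions.
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
```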
You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt. How many files will be processed by the FileInputFormat.setInputPaths() command when it's given a Path object representing this directory?
Files starting with '_' are considered 'hidden', like Unix files starting with '.'.
# characters are allowed in HDFS file names.
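Taken together, the two comments above suggest that only second.txt and #data.txt would be picked up. A minimal sketch, assuming Hadoop's org.apache.hadoop.fs.PathFilter interface, of a filter expressing that rule; FileInputFormat applies an equivalent default filter when listing input paths (the class name VisibleFilesOnly is made up for the example):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Mirrors the hidden-file convention described above: names beginning with
// '_' or '.' are skipped; anything else, including names containing '#',
// is accepted.
public class VisibleFilesOnly implements PathFilter {

  @Override
  public boolean accept(Path path) {
    String name = path.getName();
    return !name.startsWith("_") && !name.startsWith(".");
  }
}
```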