Hadoop Cluster Performance Evaluation
Step 1: Benchmarking IO Performance
$ hadoop jar $HADOOP_HOME/hadoop-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
Once this is complete, run the read test:
$ hadoop jar $HADOOP_HOME/hadoop-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
Each run also writes a log file to the directory from which you issue the command.
These commands write their data to the /benchmarks folder on HDFS. You can delete it by issuing
$ hadoop jar $HADOOP_HOME/hadoop-test.jar TestDFSIO -clean
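To compare runs, it helps to pull the headline numbers out of the log file. The sketch below assumes the log is named TestDFSIO_results.log and contains lines labelled "Throughput mb/sec" and "Average IO rate mb/sec"; check your own log first, since the file name and labels may differ between Hadoop versions. The function name dfsio_summary is made up for this example.

```shell
# Print the test header, throughput and average IO rate lines from a TestDFSIO log.
# Assumption: the log uses "label: value" lines and the default file name below.
dfsio_summary() {
  log_file="${1:-TestDFSIO_results.log}"
  grep -E 'TestDFSIO|Throughput mb/sec|Average IO rate mb/sec' "$log_file" |
    sed -e 's/^[[:space:]]*//'   # strip the leading padding used to align labels
}
```

Run it from the directory where you issued the benchmark, for example: dfsio_summary TestDFSIO_results.log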
Step 2: Benchmarking MapReduce using the Sort/TeraSort Examples
- Invoke RandomWriter (in the examples JAR) to write random data to a directory called random-data:
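$ hadoop jar $HADOOP_INSTALL/hadoop-examples.jar randomwriter -Dtest.randomwrite.bytes_per_map=5000000 -Dtest.randomwrite.total_bytes=50000000 random-data
(The -D options go before the output directory so that Hadoop's generic option parser picks them up.)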
- Next run the sort example to sort the data in “random-data” and store it into “sorted-data”. The time to sort this data will give you an idea of how efficient your cluster is.
$ hadoop jar $HADOOP_INSTALL/hadoop-examples.jar sort random-data sorted-data
- A final verification can be done using the testmapredsort (SortValidator) program.
$ hadoop jar $HADOOP_INSTALL/hadoop-test.jar testmapredsort -sortInput random-data -sortOutput sorted-data
The output should be of the form:
SUCCESS! Validated the MapReduce framework's 'sort' successfully.
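To turn the sort time into a number you can compare across runs, divide the amount of data sorted by the elapsed time. This is a minimal sketch, assuming you time the sort job yourself (for example with the shell's time keyword) and that the input size equals the test.randomwrite.total_bytes value passed to RandomWriter. The function name sort_throughput is hypothetical.

```shell
# Compute sort throughput in MB/s (1 MB = 1,000,000 bytes here).
# Usage: sort_throughput <bytes_sorted> <elapsed_seconds>
sort_throughput() {
  awk -v bytes="$1" -v secs="$2" 'BEGIN {
    if (secs <= 0) {
      print "elapsed time must be positive" > "/dev/stderr"
      exit 1
    }
    printf "%.2f MB/s\n", bytes / 1000000 / secs
  }'
}

# Example workflow (the 25 seconds is an invented figure, not a measurement):
#   time hadoop jar $HADOOP_INSTALL/hadoop-examples.jar sort random-data sorted-data
#   sort_throughput 50000000 25
```

With the 50,000,000 bytes generated earlier and a made-up elapsed time of 25 seconds, this prints 2.00 MB/s. Wall-clock time includes job startup overhead, so very small inputs will understate what the cluster can do.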