The results and ramblings of research


Hadoop Cluster Performance Evaluation


Step 1: Benchmarking IO Performance

$ hadoop jar $HADOOP_HOME/hadoop-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 

Once the write test completes, run the matching read test, which reads back the same 10 files of 1000 MB each:

$ hadoop jar $HADOOP_HOME/hadoop-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

Each run also appends its results to a log file (TestDFSIO_results.log by default) in the local directory from which you issue the command.
These commands write their test data to the /benchmarks directory on HDFS. You can delete it by issuing

$ hadoop jar $HADOOP_HOME/hadoop-test.jar TestDFSIO -clean
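
Because the results log collects one block per run, it holds both the write and the read figures once the two tests have finished. Here is a minimal sketch for pulling the throughput numbers out of it. It assumes the log contains a header line of the form "----- TestDFSIO ----- : write" and a line of the form "Throughput mb/sec: <number>", which is what the TestDFSIO builds of this era print; check your own log if nothing comes out. The function name dfsio_throughput is made up for this example.

```shell
# Print "<test type> <throughput in MB/s>" for every run recorded in a
# TestDFSIO results log.
# Assumption: each result block has a "----- TestDFSIO ----- : <type>" header
# and a "Throughput mb/sec: <number>" line.
dfsio_throughput() {
    awk -F': *' '
        /----- TestDFSIO -----/ { kind = $2 }        # remember write/read
        /Throughput mb\/sec/    { print kind, $2 }   # emit type and MB/s
    ' "$1"
}
```

Call it as `dfsio_throughput TestDFSIO_results.log` from the directory where you ran the tests (adjust the file name if you passed -resFile). It prints one line per recorded run.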

Step 2: Benchmarking MapReduce using the Sort/TeraSort Examples

  • Invoke RandomWriter (in the examples JAR) to write random data to a directory called random-data. The -D options go before the output directory, since Hadoop only treats -D as a configuration option when it precedes the program's own arguments:
$ hadoop jar $HADOOP_HOME/hadoop-examples.jar randomwriter -Dtest.randomwrite.bytes_per_map=5000000 -Dtest.randomwrite.total_bytes=50000000 random-data
  • Next, run the sort example to sort the data in “random-data” and store the result in “sorted-data”. The time this job takes gives you a rough measure of your cluster's MapReduce performance.
$ hadoop jar $HADOOP_HOME/hadoop-examples.jar sort random-data sorted-data
  • Finally, verify the result using the testmapredsort (SortValidator) program.
$ hadoop jar $HADOOP_HOME/hadoop-test.jar testmapredsort -sortInput random-data -sortOutput sorted-data

If the sort was correct, the validator's output ends with a line of the form

SUCCESS! Validated the MapReduce framework's 'sort' successfully.
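
Since the point of this step is the time the sort takes, it helps to record it instead of reading it off the job output. Below is a small sketch of two helper functions: one that times any command, and one that checks the validator output for the success line shown above. Both names (time_step, sort_validated) are made up for this example, and the timing is wall-clock seconds as seen from the client, not the job's internal counters.

```shell
# Run a command and report its wall-clock time in seconds on stderr.
# Usage: time_step LABEL COMMAND [ARGS...]
time_step() {
    label=$1
    shift
    start=$(date +%s)
    status=0
    "$@" || status=$?          # keep the exit status of the timed command
    end=$(date +%s)
    echo "$label took $((end - start)) s (exit status $status)" >&2
    return "$status"
}

# Read validator output on stdin; succeed only if the success line is present.
sort_validated() {
    grep -qF "Validated the MapReduce framework's 'sort' successfully"
}
```

For example, prefix the sort command with `time_step sort` to time it, and pipe the testmapredsort command's output through `2>&1 | sort_validated && echo OK` to check the result. The `2>&1` is there because I am not certain which stream the validator prints its message to.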

Written by anujjaiswal

April 7, 2011 at 12:10 am

Posted in Hadoop
