Hadoop Cluster Performance Evaluation

Step 1: Benchmarking IO Performance

$ hadoop jar $HADOOP_HOME/hadoop-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

Once the write test completes, run the read test:

$ hadoop jar $HADOOP_HOME/hadoop-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

Each test also writes a log file to the directory from which you issue the command.
These commands write their test data to the /benchmarks folder on HDFS. You can delete it by issuing

$ hadoop jar $HADOOP_HOME/hadoop-test.jar TestDFSIO -clean
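
To compare runs, the throughput figure can be pulled out of the log file with a small helper. This is only a sketch: it assumes the log is named TestDFSIO_results.log and contains a line of the form "Throughput mb/sec: <number>", so check your own log and adjust the pattern if it differs. The function name dfsio_throughput is made up for illustration and is not part of Hadoop.

```shell
# dfsio_throughput is a hypothetical helper, not part of Hadoop.
# Assumption: the log has lines like "Throughput mb/sec: <number>".
dfsio_throughput() {
  # $1: path to the log file (defaults to the assumed file name)
  _log="${1:-TestDFSIO_results.log}"
  # Print the value after the colon on every throughput line
  awk -F': *' '/Throughput mb\/sec/ { print $2 }' "$_log"
}
```

Run it from the directory where you issued the TestDFSIO commands, before and after a configuration change, to see whether the numbers moved.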

Step 2: Benchmarking Map-Reduce using Sort / TeraSort Examples

Invoke RandomWriter (in the examples JAR) to write random data to a directory called random-data. The -D options go before the output directory so that they are picked up as generic options:

$ hadoop jar $HADOOP_INSTALL/hadoop-examples.jar randomwriter -Dtest.randomwrite.bytes_per_map=5000000 -Dtest.randomwrite.total_bytes=50000000 random-data
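
As a quick sanity check on the two properties: as far as I can tell, RandomWriter derives the number of map tasks by dividing the total bytes by the bytes per map, so the values in the command above should give 10 maps of about 5 MB each. The variable names below are just for illustration.

```shell
# Values taken from the randomwriter command
bytes_per_map=5000000
total_bytes=50000000

# Assumption: number of maps = total bytes / bytes per map (integer division)
num_maps=$((total_bytes / bytes_per_map))
echo "$num_maps"
```

Scaling total_bytes up while keeping bytes_per_map fixed is the simplest way to spread the write over more map tasks.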

Next, run the sort example to sort the data in “random-data” and store the result in “sorted-data”. The time taken to sort this data will give you an idea of how efficient your cluster is.

$ hadoop jar $HADOOP_INSTALL/hadoop-examples.jar sort random-data sorted-data
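
Since the sort time is the number you care about here, it helps to record it explicitly. A minimal sketch of a timing wrapper follows; time_cmd is a made-up helper name, and the shell's own time keyword would do the same job if you only need the figure on screen.

```shell
# time_cmd is a hypothetical helper: run a command, then report
# its wall-clock duration in whole seconds.
time_cmd() {
  _start=$(date +%s)
  "$@"                      # run the command exactly as given
  _status=$?                # remember its exit code
  _end=$(date +%s)
  echo "elapsed_seconds=$((_end - _start))"
  return "$_status"         # pass the command's exit code through
}

# Example (needs a working Hadoop install):
# time_cmd hadoop jar $HADOOP_INSTALL/hadoop-examples.jar sort random-data sorted-data
```

Keeping the exit code means a failed sort is not mistaken for a fast one.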

A final verification can be done using the testmapredsort (SortValidator) program.

$ hadoop jar $HADOOP_INSTALL/hadoop-test.jar testmapredsort -sortInput random-data -sortOutput sorted-data

The output should be of the form

SUCCESS! Validated the MapReduce framework's 'sort' successfully.
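
If you run the benchmark from a script, you can check for that line automatically instead of reading the output by eye. The sketch below assumes the message appears verbatim somewhere in the program's output; sort_validated is a made-up helper name.

```shell
# sort_validated is a hypothetical helper: it succeeds only if the
# SUCCESS line from testmapredsort appears on standard input.
sort_validated() {
  # -F matches the text literally, -q suppresses output
  grep -qF "SUCCESS! Validated the MapReduce framework's 'sort' successfully."
}

# Example (needs a working Hadoop install):
# hadoop jar $HADOOP_INSTALL/hadoop-test.jar testmapredsort \
#   -sortInput random-data -sortOutput sorted-data 2>&1 | sort_validated && echo "sort OK"
```

The 2>&1 in the example merges both output streams, so the check works whichever one the message is printed to.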

Written by anujjaiswal

April 7, 2011 at 12:10 am

Posted in Hadoop
