The results and ramblings of research


General Scripting Commands (To be continued)


As a small startup, we use AWS extensively. Most of our files are stored in S3, and downloading them one by one is painful. Fortunately, the Linux command line makes bulk operations easy. I am listing some of the commands that I find especially useful. Please note that s3cmd must be installed and configured correctly first.
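For reference, s3cmd ships with an interactive one-time setup that writes its settings to ~/.s3cfg; a quick sketch (you supply your own AWS keys at the prompts):

```shell
# One-time interactive setup: prompts for your AWS access key, secret key,
# and a few options, then writes the configuration to ~/.s3cfg.
s3cmd --configure

# Quick sanity check that the configuration works (lists your buckets):
s3cmd ls
```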

  1. Creating a list of files to download  
     s3cmd ls --recursive s3://bucket/ | grep -i condition > files.txt
  2. Downloading each file listed in files.txt
     while read line; do s3cmd get $line; done < files.txt
  3. However, spaces in filenames will break the unquoted loop above. So the following works like a charm. Notice the quotes 🙂
     while read line; do s3cmd -v get "${line}"; done < files.txt
  4. Count the files in a bucket folder (top level, then recursive); wc -l counts lines, one per file
     s3cmd ls s3://bucket/ | wc -l

     s3cmd ls --recursive s3://bucket/ | wc -l
  5. Get a list of S3 file URLs and save them to a file. This is especially useful when passing a file that lists all S3 files to be processed, e.g., to Hadoop with NLineInputFormat.
     s3cmd ls --recursive s3://bucket/ | awk '{print $4}' > output_file.txt 
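The quoting pattern in items 2 and 3 applies to any line-by-line loop, not just s3cmd. A minimal, self-contained sketch (the s3:// names below are hypothetical; in practice files.txt would come from item 1, and the echo would be replaced by s3cmd -v get "${line}"):

```shell
# Build a hypothetical file list; note the space in the first name.
printf '%s\n' 's3://bucket/a file.txt' 's3://bucket/b.txt' > files.txt

# IFS= and -r keep leading whitespace and backslashes intact;
# quoting "${line}" keeps a filename with spaces as a single argument.
while IFS= read -r line; do
  echo "would fetch: ${line}"   # replace echo with: s3cmd -v get "${line}"
done < files.txt
# prints:
# would fetch: s3://bucket/a file.txt
# would fetch: s3://bucket/b.txt
```

Without the quotes, the shell would split "a file.txt" into two arguments and the download would fail.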

Written by anujjaiswal

June 26, 2013 at 11:51 am

Posted in AWS, Bash
