Downloading .gz files from Amazon S3 and extracting them

This article demonstrates how to download .gz files from an Amazon S3 bucket. The files are automatically extracted and stored in a local folder.

Configuration

  1. Create a new channel.
  2. General: give the channel a title (e.g. 'extract Amazon S3 gz').
  3. Schedule: select periodic scheduling and set an interval, e.g. every 10 minutes.
    Note: selecting Detect files automatically could check the bucket one or several times a second and might increase your AWS costs. Prefer periodic scheduling or a manual check (= don't schedule).
  4. Input: select Amazon S3. Select your Region and provide your Access Key ID and Secret Access Key. Finally, fill in your Bucket name and, optionally, a specific Folder name.
    The Folder name can be left empty to locate files in the root of the bucket.
  5. Input Filter: For demo purposes we'll configure a filter to only search for .gz files (GNU zip).
    a. Click the green add button.
    b. Select Property.
    c. Select File name and Match Regex and fill in (?i).*[.]gz
  6. Conversion: click Add Converter and select Extract files.
  7. Impersonation: not used in this example.
  8. Output: select Local/Network and browse to the folder where the extracted files should be stored.
  9. Post-Action: leave it as default (Process -> Delete (input file)). This removes the file from your Amazon S3 bucket after it has been processed; otherwise the same file would be picked up and extracted again on every check.
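
For readers who want to see what the channel does in terms of plain code, the steps above can be approximated by the following Python sketch. It is not how the product is implemented, only an illustration under stated assumptions: the filter regex from step 5 is applied to the whole file name, the extracted file keeps the original name without the .gz extension, and the S3 object is deleted only after a successful extraction. The helper names (is_gz_key, extract_gz, process_bucket, run_once), the bucket name, the region and the folder names are all hypothetical.

```python
import gzip
import re
import shutil
from pathlib import Path

# Same pattern as the Input Filter in step 5. Using fullmatch() is an
# assumption: the whole file name has to match the pattern.
GZ_PATTERN = re.compile(r"(?i).*[.]gz")


def is_gz_key(key: str) -> bool:
    """Return True when the file-name part of an S3 key matches the filter."""
    file_name = key.rsplit("/", 1)[-1]
    return GZ_PATTERN.fullmatch(file_name) is not None


def extract_gz(src: Path, dest_dir: Path) -> Path:
    """Decompress src into dest_dir, dropping the trailing .gz extension."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name[:-3]  # "report.csv.gz" -> "report.csv"
    with gzip.open(src, "rb") as fin, open(target, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return target


def process_bucket(s3, bucket: str, prefix: str, work_dir: Path, out_dir: Path) -> list:
    """Download, extract, then delete every matching object under prefix."""
    work_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not is_gz_key(key):
                continue  # step 5: ignore everything that is not a .gz file
            local = work_dir / key.rsplit("/", 1)[-1]
            s3.download_file(bucket, key, str(local))
            extracted.append(extract_gz(local, out_dir))
            local.unlink()  # remove the temporary download
            # Step 9 (Post-Action): delete the source object so it is not
            # picked up again on the next check.
            s3.delete_object(Bucket=bucket, Key=key)
    return extracted


def run_once() -> None:
    """Example wiring; all names below are placeholders."""
    import boto3  # third-party package, imported here so the helpers work without it

    s3 = boto3.client("s3", region_name="eu-west-1")  # hypothetical region
    for path in process_bucket(s3, "my-bucket", "", Path("work"), Path("extracted")):
        print("extracted", path)
```

Calling run_once() requires the boto3 package and AWS credentials configured in the usual way. Running it on a timer (for example every 10 minutes) rather than in a tight loop corresponds to the periodic scheduling recommended in step 3.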

Finding the Amazon S3 Access Key ID and Secret Access Key

  1. Sign in to the Amazon AWS Portal.
  2. In the upper right corner of the AWS Management Console, click your company name and choose My Security Credentials from the menu.
  3. Select Access keys (access key ID and secret access key).
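  4. You can Create New Access Key there or view the existing AWS access keys. Note that the secret access key is only shown when the key is created; if you didn't save it at that time, it cannot be retrieved later and you will need to create a new access key.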