Downloading .gz files from Amazon S3 and extracting them
This article demonstrates how to download .gz files from an Amazon S3 bucket. The files are automatically extracted and stored in a local folder.
Configuration
- Create a new channel.
- General: give the channel a title (e.g. 'extract Amazon S3 gz').
- Schedule: select periodic scheduling and check, for example, every 10 minutes.
Note: selecting Detect files automatically could check one or several times a second and might increase your Amazon AWS costs! It is better to use periodic scheduling or a manual check (= don't schedule).
- Input: select Amazon S3. Select your Region and provide your Access Key ID and Secret Access Key. Finally, fill in your Bucket name and, optionally, the specific Folder name. The Folder name can be left empty to locate files in the root.
- Input Filter: for demo purposes we'll configure a filter that only searches for .gz files (GNU zip).
a. Click the green add button.
b. Select Property.
c. Select File name and Match Regex, and fill in (?i).*[.]gz
- Conversion: click Add Converter and select Extract files.
- Impersonation: not used in this example.
- Output: select Local/Network and browse to the path where the extracted files should be stored.
- Post-Action: leave it as default (Process -> Delete (input file)). This removes the file from your Amazon S3 bucket after it has been processed; otherwise the same file would be picked up and extracted again on every check.
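To make the filter, extraction and post-action steps more concrete, here is a minimal Python sketch that mimics them on a local folder. It is not how the product works internally and it leaves out the Amazon S3 download entirely. The function names `is_gz` and `extract_and_delete` are hypothetical, and using a full match for the regex is an assumption about how the Input Filter is applied.

```python
import gzip
import re
import shutil
from pathlib import Path

# Same pattern as the Input Filter. Requiring the whole file name to match
# (fullmatch) is an assumption, not documented behavior of the filter.
GZ_PATTERN = re.compile(r"(?i).*[.]gz")


def is_gz(file_name: str) -> bool:
    """Return True when the file name passes the .gz filter."""
    return GZ_PATTERN.fullmatch(file_name) is not None


def extract_and_delete(input_dir: Path, output_dir: Path) -> list:
    """Extract every matching .gz file into output_dir, then delete the input.

    Returns the names of the extracted files (hypothetical helper).
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for src in sorted(input_dir.iterdir()):
        if not src.is_file() or not is_gz(src.name):
            continue  # does not pass the filter, so it is left untouched
        target = output_dir / src.name[:-3]  # drop the 3-character ".gz" suffix
        with gzip.open(src, "rb") as fin, open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        src.unlink()  # mirrors the "Delete (input file)" post-action
        extracted.append(target.name)
    return extracted
```

Calling `extract_and_delete` a second time on the same folder returns an empty list, because the processed inputs are gone. Without the delete step, every check would extract the same files again, which is the loop the default Post-Action prevents.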
Finding the Amazon S3 Access Key ID and Secret Access Key
- Sign in to the Amazon AWS Portal.
- In the upper right corner of the AWS Management Console, click your company name and choose My Security Credentials from the menu.
- Select Access keys (access key ID and secret access key).
- There you can Create New Access Key or view the existing AWS access keys. Remember that the secret access key is only shown when the key is created; it cannot be retrieved later if you didn't save it.