Setting up Amazon Aurora to read from S3

Raj Samuel
5 min readApr 14, 2019

--

After having found it very easy to use Amazon Athena to read from S3, I underestimated the difficulty of getting Aurora MySQL to do the same. The documentation sometimes doesn’t offer help all the way. Note that Aurora runs on RDS although it’s not marketed that way exactly.

Key thing to note is that Aurora serverless cannot read from S3, as of April 2019.

This is the official documentation for setting up S3 load:

Steps to load Aurora MySQL from S3.

  1. Initial steps:

There are 3 initial steps listed as a,b,c below:

a. Create and launch Aurora MySQL Database following the simple navigation on AWS Console. Do not choose serverless option.

b. If Aurora was created in a private subnet, as would be the case with most enterprise systems, set up a tunnel from your local machine (127.0.0.1) to Aurora cluster through a NAT Gateway. See appendix at bottom on how to do this.

c. Download a MySQL client like MySQL Workbench.

Note that only an older version of MySQL Workbench, version 6.3, works with Aurora.

2. Create IAM Role for Aurora’s S3 access.

Go to IAM, Roles and search for the role “RDS — CloudHSM and Directory Service”, select it and detach the existing policies attached to this role. Now attach either of these policies to the role: AmazonS3ReadOnlyAccess or AmazonS3FullAccess.

This will grant Aurora access to all buckets in the account. If Aurora shouldn’t have access to all buckets, create a specific policy for the bucket and attach it to the role as explained below.

3. Create IAM policy for a specific bucket (optional)

Go to IAM, Policy and Create Policy. Choose Service and select S3. For Bucket permissions choose “ListBucket”. For Object permissions choose “GetObject” and “GetObjectVersion”.

Go to Resources and Add ARN for bucket. ARN looks like arn:aws:s3:::your-bucket-name. Add object ARNs similarly or you can leave it empty in which case all objects (files) in the bucket can be read by Aurora. ARN will look like arn:aws:s3:::*/* in the latter case.

4. Add IAM Role to Aurora cluster

This involves two steps listed in a and b below:

a. Add role to cluster

As of April 2019, AWS Console for Aurora has removed the option to add IAM Roles to cluster. Hence this needs to be done via AWS CLI for the first time, and then Aurora Console will show the option for subsequent role management.

Command to do this via CLI:

PROMPT> aws rds add-role-to-db-cluster --db-cluster-identifier myfirstcluster --role arn:aws:iam::1234567890:role/my_role_for_s3access

b. Add role to cluster parameter group

Default Cluster Parameter Group or default Parameter Group cannot be modified.

If Aurora cluster was created with default Cluster Parameter Group, create a new Cluster Parameter Group. Add role’s ARN to the value of parameter aurora_load_from_s3_role.

Go to cluster, choose Modify and attach the new Cluster Parameter Group to the cluster. Choose to apply changes immediately.

5. Establish a VPC endpoint

Endpoint helps Aurora cluster sitting in a private subnet to communicate with other AWS services.

Go to VPC, Endpoints and create a new endpoint if there isn’t one already created. After it’s created, go to Route Tables of the Endpoint, add Route Table ID through “Manage Route Tables”. Remember to add the Route Table ID that’s associated with the subnet in which Aurora cluster was created.

If there are many Route Table ID’s and you’re not sure which one to associate to the Endpoint, try running LOAD DATA FROM S3 command from MySQL Workbench after adding one Route Table ID. If all above steps are done properly, and LOAD doesn’t work, try associating another Route Table ID.

On the Route table’s list of routes, there should be an entry where destination is S3 (com.amazonaws.ca-central-1.s3) and target is the VPC Endpoint.

If this step isn’t done properly, MySQL Workbench will return errors like “Unable to initialize S3 stream” or “Cannot instantiate S3 stream”.

Now log in from MySQL Workbench with the master user of Aurora to local host 127.0.0.1 and the source port of the tunnel. If this is not master user, then master user should have run this command beforehand:
GRANT LOAD FROM S3 ON *.* TO <non-master-user@domain-or-ipaddress>

It should connect now. Try the LOAD command:

LOAD DATA FROM S3 FILE ‘S3-ca-central-1://my-bucket/myfolder/myfile.csv’
INTO TABLE my_aurora_table
FIELDS TERMINATED BY ‘,’;

If MySQL Workbench times out, try this: Go to Edit, Preferences, SQL Editor. Under MySQL Session, increase all time out intervals to a bigger number, say 600 seconds. This will return a meaningful error message from Aurora.

APPENDIX: Set up a tunnel to Amazon Aurora

AWS resources are usually set up to be blocked from direct outside access by securing them inside a private subnet. For this reason, users who try to directly connect to those services should first connect to an NAT server (Network Address Translation) first, which will channel (tunnel) the connection securely to the target service.

This assumes that the AWS Admin has set up an NAT Gateway or an EC2 NAT instance already.

1. Get the key pair (.ppk file) from Admin and save it locally.

2. Open putty, go to SSH, Auth. Under “private key for authentication”, Browse to the above key pair file and select it.

3. Go to SSH, Tunnels:

a. Provide a source port. This is the port on your local machine. Eg: 4000 (Any port that’s not used by another service on the local machine will do).

b. Provide a destination. This is the endpoint of Aurora cluster’s master (also known as writer or read-write endpoint), combined with MySQL port. Use the default port for MySQL which is 3306. Eg: myauroracluster.cijftcgjx1hk.ca-central-1.rds.amazonaws.com:3306

c. Click Add.

4. Go back to Session, enter the Public DNS (or Public IP) of NAT instance in the box for Hostname. Your AWS Admin can give you this. Port# is 22. Save the session with a name.

5. Click Open to connect to NAT via SSH.

6. At the login prompt, type user name ec2-user. It should log you in without prompting for password.

Now we have opened a tunnel from local host to Aurora. This means that any connection made to the source port of the tunnel (Eg. 4000 of local machine) will be forwarded to the destination port which is 3306 of Aurora cluster.

7. Now connect to Aurora from MySQL Workbench. Hostname is 127.0.0.1 (local host) and port number is what you selected above, for example 4000. Provide user name and password for the master user of Aurora (or someone who was granted permission by master user). Try LOAD command now.

--

--

Raj Samuel
Raj Samuel

Written by Raj Samuel

I write because I forget. (PS: if you take what I wrote and post it as your own please try not to edit it and post rubbish. CTRL+C, CTRL+V is your friend.)

Responses (2)