AWS RDS quirks and limitations
This write-up is about Oracle on RDS, but some of these points apply to any RDS database.
A note on PaaS and SaaS
Before highly autonomous cloud databases like Snowflake and Aurora Serverless, RDS was the easiest no-DBA cloud database. It's a PaaS solution, which means it doesn't give you any control over the host machine or operating system. At the same time, it lets you control all (or most) of the configurable properties of the product hosted on it (the database) in order to "administer" it. That configurability is what makes it a non-SaaS solution. In that sense, autonomous databases like Snowflake are SaaS solutions, much like Microsoft Office 365.
External tables pointing to S3 won’t detect new data
Unlike on-premises Oracle databases, RDS doesn't let you load files onto Oracle's host machine to be used as external tables. It does, however, let you specify an S3 location and create an external table over it. As of this writing, that mechanism has a bug: when the file's contents change, Oracle's external table won't detect it.
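The external table itself is plain Oracle DDL over a directory object; the S3 part only governs how the file lands in that directory. A minimal sketch, assuming a directory object INPUT_DIR already holds my_file.csv and that the two columns shown match the CSV (the table and column names here are hypothetical):
CREATE TABLE my_s3_feed (
    id    NUMBER,
    name  VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY INPUT_DIR
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL
    )
    LOCATION ('my_file.csv')
)
REJECT LIMIT UNLIMITED;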
The only way to overcome the stale-content bug is to delete Oracle's reference to the file and re-establish it whenever you expect new contents. AWS Support's response was "you can add a feature request". Huh! Not a bug report? Here's the workaround in detail:
1. When a new file is added to S3
Run the proc download_from_s3(). This tells Oracle that a new file (with a new file name) has arrived in S3.
2. When the contents of an existing file change
(a) Run this proc: exec utl_file.fremove('INPUT_DIR','my_file.csv');
This removes the file handle from Oracle only; it does not delete the actual file in S3. You need permission on UTL_FILE to run it.
(b) Re-run the proc download_from_s3()
At this point you will see the changes in the file contents. Both calls are sketched just after this list.
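Put together, the refresh looks roughly like this. A sketch, assuming S3 integration is enabled for the instance and the directory object is INPUT_DIR; the bucket name and object key are placeholders, and the full name of the download proc on RDS is rdsadmin.rdsadmin_s3_tasks.download_from_s3 (referred to as download_from_s3() in the steps above):
-- 2(a): drop Oracle's handle on the stale copy (the S3 object is untouched)
exec utl_file.fremove('INPUT_DIR', 'my_file.csv');
-- 2(b): pull the file down from S3 again so the external table sees the new contents
SELECT rdsadmin.rdsadmin_s3_tasks.download_from_s3(
           p_bucket_name    => 'my-bucket',    -- placeholder bucket name
           p_s3_prefix      => 'my_file.csv',  -- placeholder object key
           p_directory_name => 'INPUT_DIR')
       AS task_id
FROM dual;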
Expired archive logs won’t get cleared out
On Oracle instances running in archivelog mode, RDS keeps a number of archived redo log files for as long as you tell it to (only one redo log file is ever the active one). RDS appears to be a no-DBA PaaS solution, but I needed a smart DBA to figure the whole thing out.
To see archive log file retention:
set serveroutput on;
exec rdsadmin.rdsadmin_util.show_configuration;

-- output:
NAME:archivelog retention hours
VALUE:24
DESCRIPTION:ArchiveLog expiration specifies the duration in hours before archive/redo log files are automatically deleted.
NAME:tracefile retention
VALUE:10080
DESCRIPTION:tracefile expiration specifies the duration in minutes before tracefiles in bdump are automatically deleted.
This setting can be changed if you want to, for instance, keep logs longer for a replication process that uses Oracle's LogMiner API (such as AWS DMS). This retains logs for 48 hours:
exec rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours', 48);
However, RDS may not actually delete the expired log files, which keep accumulating. To see expired logs and their delete status, run this query:
select trunc(completion_time,'DD'), deleted, count(*)
from v$archived_log
group by trunc(completion_time,'DD'), deleted
order by 1 desc;
To delete expired archive logs, set the p_delete_expired flag to TRUE:
BEGIN
rdsadmin.rdsadmin_rman_util.crosscheck_archivelog(
p_delete_expired => TRUE,
p_rman_to_dbms_output => FALSE);
END;
/
But because log files keep getting generated, you will have to schedule this. Oracle can schedule it itself, without any external scheduler tools:
BEGIN
    DBMS_SCHEDULER.CREATE_JOB (
        job_name            => '"ADMIN"."JOB_CLEAN_ARCHIVELOGS"',
        job_type            => 'PLSQL_BLOCK',
        job_action          => 'BEGIN
                                    rdsadmin.rdsadmin_rman_util.crosscheck_archivelog(
                                        p_delete_expired      => TRUE,
                                        p_rman_to_dbms_output => FALSE);
                                END;',
        number_of_arguments => 0,
        start_date          => TO_TIMESTAMP_TZ('2021-01-20 10:24:22.521234000 EUROPE/BELGRADE', 'YYYY-MM-DD HH24:MI:SS.FF TZR'),
        repeat_interval     => 'FREQ=DAILY;BYHOUR=12;BYMINUTE=0;BYSECOND=0',  -- run daily at 12:00
        end_date            => NULL,
        enabled             => FALSE,
        auto_drop           => FALSE,
        comments            => '');
    DBMS_SCHEDULER.SET_ATTRIBUTE(
        name      => '"ADMIN"."JOB_CLEAN_ARCHIVELOGS"',
        attribute => 'store_output',
        value     => TRUE);
    DBMS_SCHEDULER.SET_ATTRIBUTE(
        name      => '"ADMIN"."JOB_CLEAN_ARCHIVELOGS"',
        attribute => 'logging_level',
        value     => DBMS_SCHEDULER.LOGGING_OFF);
    DBMS_SCHEDULER.enable(
        name => '"ADMIN"."JOB_CLEAN_ARCHIVELOGS"');
END;
/
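The script above turns detailed run logging off, so to confirm the job is registered and firing on schedule, a quick check against the scheduler's job view is enough (assuming your admin user can read the DBA views, which it normally can on RDS):
select job_name, enabled, state, last_start_date, next_run_date
from   dba_scheduler_jobs
where  job_name = 'JOB_CLEAN_ARCHIVELOGS';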
RDS procs are case sensitive
If you want to grant SELECT on SYS objects to users, you have to call Amazon-provided stored procedures in the rdsadmin schema. For instance:
begin
    rdsadmin.rdsadmin_util.grant_sys_object(
        p_obj_name  => 'V_$SESSION',
        p_grantee   => 'USERNAME',
        p_privilege => 'SELECT');
end;
/
What the RDS documentation doesn't mention is that these quoted parameter values are case sensitive: they must be in upper case. I learned that the hard way (the banging-my-head-against-the-wall way).
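If a grant fails and you suspect casing, one quick sanity check (assuming your admin user can read DBA_OBJECTS) is to look up the exact name Oracle stores before passing it to the proc:
select owner, object_name, object_type
from   dba_objects
where  upper(object_name) = upper('v_$session');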
More external table quirks
I had queries fired at an external table that ran in an infinite loop for weeks, keeping CPU utilization pinned at 99%. If a file has been removed from S3, Oracle won't know it unless the rdsadmin procs are run to remove the reference to that file from the Oracle engine. So Oracle thinks the file is still there, and instead of failing or giving a warning, the query just sits on the CPU. Those sessions had to be killed once they were found.
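A useful check when this happens is to compare what Oracle thinks is sitting in the directory against what is actually in S3. RDS exposes a listing function for directory objects; INPUT_DIR is the same placeholder directory as earlier:
select filename, filesize, mtime
from   table(rdsadmin.rds_file_util.listdir(p_directory => 'INPUT_DIR'))
order  by mtime desc;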
Killing sessions
Like a lot of administrative tasks, killing a user session needs an rdsadmin stored procedure. Per the RDS documentation, all you need to do is run this:
begin
    rdsadmin.rdsadmin_util.kill(
        sid    => sid,
        serial => serial_number);
end;
/
There is a third parameter, method, which defaults to IMMEDIATE. But that never kills the process; it simply sets a bit and hopes Oracle will use it to kill the session. If you want the session killed right away, you have to kill the process, which you do by overriding IMMEDIATE with PROCESS in the third parameter:
begin
    rdsadmin.rdsadmin_util.kill(
        sid    => sid,
        serial => serial_number,
        method => 'PROCESS');
end;
/
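To find the sid and serial# to pass in, V$SESSION (exposed to non-SYS users as V_$SESSION, as in the grant example earlier) is enough; for instance, filtering on the offending user:
select sid, serial#, username, status, sql_id, last_call_et
from   v$session
where  username = 'USERNAME'   -- placeholder, same as the grant example
and    status   = 'ACTIVE';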
Non-availability of SYS/SYSTEM
The SYS and SYSTEM users are locked out by default, and there is no access to administrative privileges like SYSDBA either. Instead there is an admin user, and admin tasks are exposed as RDS custom procedures stored in the rdsadmin schema.
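If you are hunting for the rdsadmin equivalent of a SYS task, you can list what the rdsadmin packages expose to your user, for example:
select object_name as package_name, procedure_name
from   all_procedures
where  owner = 'RDSADMIN'
and    procedure_name is not null
order  by object_name, procedure_name;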
This is OK most of the time, with some exceptions. One problem I came across is when the DB runs out of maximum threshold storage. Normally you could log in as SYS to fix the issue, but RDS doesn't allow it. The only way out is to forcefully allocate storage in the AWS Console (you need sufficient Console privileges from the AWS admin). Not even the admin user can log in to Oracle until this is done from the Console. See below.
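Because you cannot log in to fix things once the threshold is hit, it is worth watching space from inside the database beforehand; a simple sketch of a datafile usage check:
select tablespace_name,
       round(sum(bytes)    / 1024 / 1024 / 1024, 1) as allocated_gb,
       round(sum(maxbytes) / 1024 / 1024 / 1024, 1) as max_gb
from   dba_data_files
group  by tablespace_name
order  by allocated_gb desc;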
Storage quirks
A note about AWS storage terminology:
IOPS = I/O operations per second
Throughput = IOPS x block size
With 1024 KB reads and 1000 IOPS you can have a throughput of almost 1 GB/s. However, throughput gets throttled at the point where hardware limits it: disk controller bandwidth, disk bus width, etc.
Allocated storage = the storage you want to allocate to the DB (e.g. 1 TB)
Max storage threshold = a limit up to which RDS can auto-allocate more storage as data volume grows (e.g. 3 TB)
When the initial 1 TB fills up, RDS auto-allocates more storage, but only until it reaches 3 TB, at which point the DB will fail. At that point the only way out is to go to the AWS Console and allocate more storage.
The other quirk is that in a provisioned IOPS setup, allocated storage and maximum storage threshold have some relationship that AWS doesn't seem to have documented anywhere, not sure why! For instance, if you allocate 1024 GB (1 TB), the max threshold apparently has to be some multiple of 1 TB; simply specifying 3 TB won't succeed. Also, allocated storage cannot be de-allocated; the only option is to create a new database.
Snapshots as backup
RDS snapshots are extremely easy to use; no product knowledge is required. They are kept in an S3 location that AWS manages internally. There is no way to download them, back them up to your own S3 bucket, or access them through the S3 console. In other words, the only way to see snapshots is through the RDS Console, and you pray AWS doesn't lose them. Given S3's high durability, though, losing data is unlikely.