The lost art of database space estimation

Raj Samuel
3 min readMay 2, 2021

Database space estimation is a relic of the past. Space estimation is usually done after Physical Data Model has been designed. Architects had to fill these estimations on Excel sheets and convince infrastructure managers (and sometimes finance) that they need the requested GBs to run their applications. Then they wait.

Today we start from a guesstimate and let the serverless database auto-increment storage based on demand. The most we do is point and click on our cloud provider’s console on the browser.

But it’s good to learn it due to the off chance that an old school architect asks this in an interview. Here’s a non-comprehensive list of items to think about. This is made for SQL Server .

  1. Find average row size of tables.
    Since DB-wide average size won’t be accurate, classify tables into 3–5 groups based on average row size to avoid the trouble of measuring row size for individual tables. For example, Group1 = tables having at least 250 bytes in a row, Group2 = ~150 bytes, Group3 = ~50 bytes.
  2. Consider Group1. How many 250 byte rows can go in one DB page? (see notes below on Page). Page size in SQL Server is 8KB. So,
    8KB page size / 250 byte row = number of rows per page
    8192/250 = 32.768 rows per page
  3. Round this down (because the rows won’t be scattered in fractions — a row is always inside a single page). So 32 rows per page.
  4. Let’s say initial load has 100,000 records into the Group1 tables. How many pages will be needed?
    100,000 records / 32 = total number of pages needed
    100,000/32 = 3125 pages
  5. If no. of pages is a fraction, round it up because pages are always 8K, no fractions. So if we had 3125.16 pages it would be rounded up to 3126.
  6. Let’s say in subsequent months (or years or quarters), there is 20% data growth expected. So 20,000 new records on second month (or year or quarter)
  7. Repeat the page calculation for second month.
    20,000/32 = 625 pages
  8. Add the number of pages for all months (or periods).
    3125 + 625= 3750 pages
  9. 3750 pages * 8KB page size = Total no. of bytes needed for Group1 tables for first 2 months.
    6875 * 8 *1024 bytes = 30,720,000 bytes
    30.72 MB disk space is needed for Group1 tables for first 2 months

Repeat above steps for the remaining table groups and indexes.

Note on indexes:

In SQL Server, a clustered index store the table itself (which resembles Index Organized Tables in Oracle). But secondary indexes are stored separately with two items for each row of the original table: column being indexed plus a pointer to the row of the table (and some housekeeping bytes).

If the index is a covering index, all columns being covered need to be considered for spacing. Filtered indexes may not hold all rows from table based on the filter condition but we can err on the side of caution and assume they have all the rows for the purpose of space estimation.

Special notes to keep in mind:

  1. Page is the basic storage unit for disk operations in SQL Server. It’s 8KB for all pages in database. Data is always read and written in pages even if only one row (out of many rows in the page) is needed.
  2. Each page requires close to 100 bytes (96 bytes in SQL Server) for housekeeping — depends on the product.
  3. Each varchar needs 2 extra bytes for housekeeping.
  4. Must allocate space for tempDB, MasterDB and a file for ResourceDB.
  5. tempDB has an 8MB data file and 1MB log file at database creation. tempDB is by default configured to grow (allocate space) 10%. Restarting database will clear tempDB, and starts at 8MB again.
  6. Large Object columns (LOBs) if present need to be accounted for. Whether LOBs are stored locally in the row or stored separately from the row via a pointer depends on the database product. SQL Server by default stores a pointer but you can change this behavior.
  7. Always add buffer for immediate future growth and things that you didn’t think of (for example additional indexes that you didn’t foresee). My rule of thumb for total size is 2–3 times the estimated size.

Disclaimer: I am rusty on SQL Server. When I last worked on it Nadella was running cloud and Azure wasn’t a player. Things may have changed a little but the ideas presented here are still valid.

--

--

Raj Samuel

I write because I forget. (PS: if you take what I wrote and post it as your own please try not to edit it and post rubbish. CTRL+C, CTRL+V is your friend.)