Multi-dimensional Cube concepts for dummies

Raj Samuel
3 min readFeb 8, 2021

--

OLAP cube or multi-dimensional cube is equivalent to implementing aggregated tables on DB. Data is pre-calculated and stored in the cube. Users running analysis don’t have to join and group by.

Cubes are not necessarily a feature of the database engine, it is commonly built by the BI tool. Cubes can be built very easily without having to do any data modeling. If you know what measures and dimensions are the users interested in, you can drag-and-drop facts and dimensions to build a cube using a GUI tool like Microsoft SSAS.

Basic objects are: cubes, measures and dimensions. Measures are columns from a fact table (claim_amount), dimensions are columns from a dimension (country_code) and cubes are the pre-calculated materialization of those two.

Pic credit: https://www.olapcube.com/mdxhelp/KeyConcepts.htm

This cube has a measure (actually two measures — but let’s ignore the measure Last Date and consider only Packages sold).

This measure is pre-calculated across 3 dimensions — time, region (source), transportation channel (route). For instance the no. of packages sold (measure) at the end of first quarter (dimension 1) in North America (dimension 2) by way of Air transport (dimension 3) is 3056. In a normal dimensional model this is equivalent to :

  • joining fact table to 3 dimensions (time, route, region) (FROM clause)
  • aggregating on the 3 dimensions (GROUP BY clause)
  • filtering by North America, 1st Quarter, Air (WHERE clause)
  • calculating count on the quantity field. (SELECT sum())

Cube has this pre-calculated.

Cube is useful when users want to run analyses quickly without having to join and aggregate on the fly. For the same reason they are relatively faster when dimensions have complexities: snowflakes or hierarchies (product hierarchy or geographical hierarchy for instance).

The drawback of cubes is that it takes time to build as measures have to be aggregated across dimensions (even if they all are not going to be used). Add to this the time for ETL to load the data warehouse before cube can be built for that day.

The query language used for querying cube is called MDX (Multi-Dimensional Expressions — Microsoft invention), which is somewhat similar to SQL. Note that unlike a geometric cube an OLAP cube can have many dimensions and can have dimensions (sides) with unequal lengths.

OLAP cubes have almost become obsolete with computing becoming cheaper/powerful and lack of developer enthusiasm for MDX. Data warehouses have moved to columnar storage which optimizes for selective reads and compression — both useful in analytics.

Compute and storage capacity of a warehouse has become pay-per-minute or pay-per-usage, so permanently storing pre-calculated values doesn’t make sense. You can pay for a very high compute capacity when analyzing data and turn it off right after. You can also analyze real-time data (streams) as opposed to batch based cubes. Snowflake is a great example of this (it has columnar storage, pay-per-usage, scale-up/down or scale-out/in on demand, built-in data streaming). So why build cubes?

--

--

Raj Samuel

I write because I forget. (PS: if you take what I wrote and post it as your own please try not to edit it and post rubbish. CTRL+C, CTRL+V is your friend.)