Data modelling

Normalization

  • normalized

    • reduce data(field) duplication

    • data storage is more efficient for write performance

    • simple queries run faster

    • data is scattered across multiple tables

  • denormalized

    • duplicate data across tables to improve read performance over write performance

    • data exists in 1 table, as opposed to multiple tables

Transactional Schema (OLTP)

  • normalized schema

  • used for OLTP - transactional workloads

Star Schema (OLAP)

  • easiest schema design for data warehouse/mart

  • data is organized into a central fact table that contains the measures of interest, surrounded by dimension tables that describe the attributes of the measures.

  • used for OLAP - analytical workloads

  • fact table: usually numeric values that can be aggregated

  • dimentional table: groups of hierarchies and descriptors that define the facts (attributes)

  • data is joined through FKs

  • data can be denormalized, to improve reporting reads

  • fact table examples: sales transactions, weather measurements

  • dimension table examples: client info, seller info, product info

Snowflake Schema (OLAP)

  • snowflake schema is a generalization of a star schema

  • similar to star schema, but dimensions are normalized into multiple related tables (linked through FKs)

  • some database developers compromise by creating an underlying snowflake schema with views built on top of it that perform many of the necessary joins to simulate a star schema.

  • Some OLAP multidimensional database modeling tools are optimized for snowflake schemas.[3]

  • Normalizing attributes results in storage savings, the tradeoff being additional complexity in source query joins.

Last updated

Was this helpful?