Data modelling

Normalization

normalized
- reduce data(field) duplication
- data storage is more efficient for write performance
- simple queries run faster
- data is scattered across multiple tables
denormalized
- duplicate data across tables to improve read performance over write performance
- data exists in 1 table, as opposed to multiple tables
check https://en.wikipedia.org/wiki/Database_normalization#Normal_forms

easiest schema design for data warehouse/mart
data is organized into a central fact table that contains the measures of interest, surrounded by dimension tables that describe the attributes of the measures.
used for OLAP - analytical workloads
fact table: usually numeric values that can be aggregated
dimentional table: groups of hierarchies and descriptors that define the facts (attributes)
data is joined through FKs
data can be denormalized, to improve reporting reads
fact table examples: sales transactions, weather measurements
dimension table examples: client info, seller info, product info

snowflake schema is a generalization of a star schema
similar to star schema, but dimensions are normalized into multiple related tables (linked through FKs)
some database developers compromise by creating an underlying snowflake schema with views built on top of it that perform many of the necessary joins to simulate a star schema.
Some OLAP multidimensional database modeling tools are optimized for snowflake schemas.[3]
Normalizing attributes results in storage savings, the tradeoff being additional complexity in source query joins.

Last updated 5 months ago

Was this helpful?