# Data modelling

## Normalization

* normalized
  * reduces data (field) duplication
  * each value is stored once, so storage is more efficient and writes are cheaper
  * simple single-table queries run faster; queries that combine entities need joins
  * data is scattered across multiple tables
* denormalized
  * duplicates data across tables, trading write performance for read performance
  * related data exists in 1 table, as opposed to multiple tables
* for the normal forms, check <https://en.wikipedia.org/wiki/Database_normalization#Normal_forms>
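
The trade-off above can be sketched with Python's built-in `sqlite3` module. The table and column names (`customers`, `orders`, `orders_flat`) and the sample rows are hypothetical, chosen only for illustration: the normalized layout updates one row when a customer moves, while the denormalized layout has to touch every duplicated row but can be read without a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- normalized: the customer's city is stored exactly once
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    -- denormalized: customer fields are repeated on every order row
    CREATE TABLE orders_flat (
        id INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_city TEXT,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ana', 'Cluj');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
    INSERT INTO orders_flat VALUES
        (10, 'Ana', 'Cluj', 25.0),
        (11, 'Ana', 'Cluj', 40.0);
""")

# Write path: the same logical change touches 1 row vs. every duplicated row.
normalized_update = conn.execute(
    "UPDATE customers SET city = 'Iasi' WHERE id = 1"
)
flat_update = conn.execute(
    "UPDATE orders_flat SET customer_city = 'Iasi' WHERE customer_name = 'Ana'"
)
print(normalized_update.rowcount, flat_update.rowcount)

# Read path: the normalized layout needs a join, the flat one does not.
normalized_read = conn.execute("""
    SELECT o.id, c.city, o.amount
    FROM orders o JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
flat_read = conn.execute(
    "SELECT id, customer_city, amount FROM orders_flat ORDER BY id"
).fetchall()
print(normalized_read == flat_read)
```

Both reads return the same rows; the difference lies in how many rows a write has to change and whether a read has to join.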

## Transactional Schema (OLTP)

* normalized schema
* used for OLTP - transactional workloads

## Star Schema (OLAP)

* simplest schema design for a data warehouse/mart
* data is organized into a **central fact table** that contains the measures of interest, surrounded by **dimension tables** that describe the attributes of the measures.
* used for OLAP - analytical workloads
* fact table: usually numeric values that can be aggregated
* dimension table: groups of hierarchies and descriptors that define the facts (attributes)
* fact rows are joined to dimension rows through FKs
* dimension tables can be denormalized, to improve reporting reads
* fact table examples: sales transactions, weather measurements
* dimension table examples: client info, seller info, product info
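
A minimal star schema sketch, again using Python's `sqlite3`. The names `fact_sales`, `dim_client`, and `dim_product`, and all sample values, are assumptions made up for this example: one fact table holds the numeric measures, and each dimension is a single table reached through one FK join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- dimension tables: descriptive attributes, one table per dimension
    CREATE TABLE dim_client (
        client_id INTEGER PRIMARY KEY, name TEXT, country TEXT
    );
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT
    );
    -- fact table: numeric measures plus one FK per dimension
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        client_id INTEGER REFERENCES dim_client(client_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity INTEGER,
        amount REAL
    );
    INSERT INTO dim_client VALUES (1, 'Ana', 'RO'), (2, 'Bob', 'UK');
    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Electronics'),
        (2, 'Desk', 'Furniture');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 1, 1000.0),
        (2, 1, 2, 2, 300.0),
        (3, 2, 1, 1, 1100.0);
""")

# Typical analytical query: aggregate a measure, grouped by a dimension attribute.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```

Because `category` is kept directly on `dim_product` (denormalized), the report needs only one join per dimension.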

## Snowflake Schema (OLAP)

* snowflake schema is a generalization of a star schema
* similar to star schema, but dimensions are [normalized](https://en.wikipedia.org/wiki/Normalization_\(database\)) into multiple related tables (linked through FKs)
* some database developers compromise by creating an underlying snowflake schema with [views](https://en.wikipedia.org/wiki/View_\(database\)) built on top of it that perform many of the necessary joins to simulate a star schema.
* Some [OLAP](https://en.wikipedia.org/wiki/OLAP) multidimensional database modeling tools are optimized for snowflake schemas.
* [Normalizing](https://en.wikipedia.org/wiki/Database_normalization) attributes results in storage savings, the tradeoff being additional complexity in source query joins.
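
The compromise described above (a snowflake schema underneath, with a view that simulates a star dimension) might look like the following `sqlite3` sketch. All table, view, and column names here (`dim_category`, `dim_product`, `dim_product_star`, `fact_sales`) and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- snowflake: the product dimension is normalized into two tables
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        amount REAL
    );
    -- view that performs the dimension join, exposing a star-like dimension
    CREATE VIEW dim_product_star AS
        SELECT p.product_id, p.name, c.name AS category
        FROM dim_product p
        JOIN dim_category c ON c.category_id = p.category_id;

    INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
    INSERT INTO dim_product VALUES (1, 'Laptop', 1), (2, 'Desk', 2);
    INSERT INTO fact_sales VALUES (1, 1, 1000.0), (2, 2, 300.0), (3, 1, 1100.0);
""")

# Reports query the view as if it were a single denormalized dimension table.
revenue_by_category = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(revenue_by_category)
```

The category name is stored once in `dim_category` (the storage saving), while the extra join is hidden inside the view rather than repeated in every report query.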

