Introduction

Defined by the 3 V's

  • volume (data size): terabytes (TB) to petabytes (PB) of data

  • velocity (data/event speed): GB/sec, or around 1 million events/sec

  • variety (data formats)

    • structured: tabular (SQL, CSV, Excel)

    • semi-structured: JSON, XML, binary formats

    • unstructured: text, PDFs, images, videos, binary blobs
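To see how velocity turns into volume, here is a back-of-envelope calculation. The 1 KB average event size is an assumption for illustration, not a figure from these notes.

```python
# Back-of-envelope sizing: how velocity turns into volume.
# ASSUMPTION: an average event size of 1 KB (1,000 bytes); real events vary widely.

EVENTS_PER_SEC = 1_000_000      # the "1 million events/sec" figure above
AVG_EVENT_BYTES = 1_000         # hypothetical average event size
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_volume_tb(events_per_sec: int, avg_event_bytes: int) -> float:
    """Return the raw data produced per day, in decimal terabytes (10**12 bytes)."""
    bytes_per_sec = events_per_sec * avg_event_bytes
    return bytes_per_sec * SECONDS_PER_DAY / 10**12


throughput_gb_per_sec = EVENTS_PER_SEC * AVG_EVENT_BYTES / 10**9
print(f"{throughput_gb_per_sec:.1f} GB/sec")                              # 1.0 GB/sec
print(f"{daily_volume_tb(EVENTS_PER_SEC, AVG_EVENT_BYTES):.1f} TB/day")   # 86.4 TB/day
```

At that rate a year of raw events comes to roughly 31.5 PB, which is why high velocity usually implies high volume.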
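A minimal sketch of what each variety means when the data is read, using only the Python standard library. The three sample inputs are made up for illustration.

```python
import csv
import io
import json

# Hypothetical sample inputs, one per variety category.
structured = "id,name,amount\n1,alice,10.5\n2,bob,7.25\n"
semi_structured = '{"id": 3, "name": "carol", "tags": ["vip"], "address": {"city": "Oslo"}}'
unstructured = "Customer called to say the parcel arrived damaged."

# Structured: one fixed schema (the CSV header) applies to every row.
rows = list(csv.DictReader(io.StringIO(structured)))
print(rows[0]["amount"])          # 10.5 (read as the string '10.5' until cast)

# Semi-structured: self-describing and nested; fields may differ per record.
doc = json.loads(semi_structured)
print(doc["address"]["city"])     # Oslo
print(doc.get("amount"))          # None: this record simply has no 'amount'

# Unstructured: no schema at all; any structure must be extracted (here, naive tokenising).
words = unstructured.lower().rstrip(".").split()
print("damaged" in words)         # True
```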

Storing data

  • lakehouse

    • a data lake plus warehouse features (data lake ++)

    • data is transactionally consistent, using optimistic concurrency control

    • uses an open table format such as Apache Iceberg

  • data swamp

    • raw/all data

    • stored so that the data can be reprocessed, or processed in the future

  • data lake

    • all data

    • used in analytics & reporting

    • data is stored in the hope that future analytics will process it

    • cleansed & standardised data

    • data governance (data catalogue, lineage, security, metadata)

    • smaller than the data swamp

  • data warehouse

    • enterprise-wide data

    • a subset of day-to-day and historical data, used in reporting

    • smaller than the data lake

  • data mart

    • department-wide data

    • used in reporting

    • smaller than the data warehouse
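The storage tiers can be pictured as successive refinement steps, each holding less data than the one before. The sketch below is a toy, in-memory illustration assuming hypothetical order records; the field names, departments and cleansing rule are invented, and real platforms run these steps with distributed engines over files or tables.

```python
# Raw "swamp" zone: everything as it arrived, kept so it can be reprocessed later.
# ASSUMPTION: hypothetical order events; field and department names are illustrative.
swamp = [
    {"order_id": "1", "dept": "Sales", "amount": " 10.50 ", "debug": "x"},
    {"order_id": "2", "dept": "sales", "amount": "7.25"},
    {"order_id": "3", "dept": "Support", "amount": "n/a"},  # malformed, but still kept raw
    {"order_id": "4", "dept": "SUPPORT", "amount": "3"},
]


def cleanse(record):
    """Lake step: standardise types and casing; return None for unusable records."""
    try:
        amount = float(record["amount"])
    except ValueError:
        return None
    return {"order_id": int(record["order_id"]), "dept": record["dept"].lower(), "amount": amount}


lake = [r for r in map(cleanse, swamp) if r is not None]

# Warehouse step: keep only the columns the enterprise reports on.
warehouse = [{"dept": r["dept"], "amount": r["amount"]} for r in lake]

# Mart step: one department's slice, pre-aggregated for its own reports.
sales_mart = {"dept": "sales", "total": sum(r["amount"] for r in warehouse if r["dept"] == "sales")}

print(len(swamp), len(lake), len(warehouse))  # 4 3 3
print(sales_mart)                             # {'dept': 'sales', 'total': 17.75}
```

The raw record with `"amount": "n/a"` never reaches the lake, but it stays in the swamp, so a better cleansing rule could recover it later by reprocessing.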
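The lakehouse's transactional guarantee typically relies on optimistic concurrency: a writer prepares its new data files without taking locks, then publishes a new table snapshot only if nobody else has committed since it started, retrying otherwise. Iceberg-style table formats do this by atomically swapping a pointer to the table's metadata. The toy `Table` class and `append_files` helper below are hypothetical stand-ins that show the idea; they are not Iceberg's API.

```python
import threading


class Table:
    """Toy table whose history is a list of immutable snapshots (hypothetical stand-in for table metadata)."""

    def __init__(self):
        self._lock = threading.Lock()  # guards only the atomic pointer swap
        self.snapshots = [()]          # snapshot 0: empty table

    def current(self):
        version = len(self.snapshots) - 1
        return version, self.snapshots[version]

    def compare_and_swap(self, expected_version, new_files):
        """Publish a new snapshot only if nobody committed since expected_version."""
        with self._lock:
            if len(self.snapshots) - 1 != expected_version:
                return False           # conflict: another writer committed first
            self.snapshots.append(new_files)
            return True


def append_files(table, files, max_retries=3):
    """Optimistic commit: do the work without locks, then validate-and-swap; retry on conflict."""
    for _ in range(max_retries):
        version, files_now = table.current()
        new_snapshot = files_now + tuple(files)  # builds on the snapshot just read
        if table.compare_and_swap(version, new_snapshot):
            return version + 1
    raise RuntimeError("too many concurrent commits")


t = Table()
v, base = t.current()                                 # writer A reads snapshot 0
append_files(t, ["b1.parquet"])                       # writer B commits snapshot 1 meanwhile
print(t.compare_and_swap(v, base + ("a1.parquet",)))  # False: A's view is stale
print(append_files(t, ["a1.parquet"]))                # 2: A retries on top of B's snapshot
print(t.current()[1])                                 # ('b1.parquet', 'a1.parquet')
```

Because conflicts are checked only at commit time, writers never block each other while doing the expensive work; the price is a retry when two commits race.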

Processing data

  • streaming

    • row-, event- or micro-batch-based processing

    • mostly for preparing data, or for short-term decisions/analytics (e.g. a decision engine)

  • batch

    • processes data in large chunks at a time

    • mostly for long-term analytics (finding connections between data)

  • lambda

    • stream and batch processing run in parallel, and their results are merged

  • kappa

    • stream only (reprocessing is done by replaying the event stream)
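A minimal sketch contrasting the four approaches on a hypothetical page-view log, with plain Python standing in for real stream and batch engines. `CUTOFF` is an invented marker for the point where the last batch run finished.

```python
from collections import Counter

# Hypothetical append-only event log (page views per user), oldest first.
event_log = ["alice", "bob", "alice", "carol", "alice", "bob"]


def batch_view(events):
    """Batch: recompute the full result over a large chunk of (historical) data."""
    return Counter(events)


def stream_update(state, event):
    """Streaming: update the running result one event at a time."""
    state[event] += 1
    return state


# Lambda: a batch layer covers history up to a cut-off, a speed (stream) layer
# covers the newer events, and the two views are merged.
CUTOFF = 4  # hypothetical: the last batch run saw the first 4 events
batch = batch_view(event_log[:CUTOFF])
speed = Counter()
for e in event_log[CUTOFF:]:
    stream_update(speed, e)
lambda_result = batch + speed

# Kappa: no batch layer; reprocessing means replaying the whole log through the same stream code.
kappa_result = Counter()
for e in event_log:
    stream_update(kappa_result, e)

print(lambda_result == kappa_result)  # True
print(kappa_result["alice"])          # 3
```

Both give the same answer here; the difference is that lambda maintains two code paths (batch and stream) that must agree, while kappa keeps a single streaming path but relies on a log that can be replayed from the start.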

Last updated 4 months ago
