Big Data Principles

Data sharding/replication

Replication duplicates the same data over several system instances. Sharding splits the data into partitions, each stored on a different instance.
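The difference between placing a key on one instance (sharding) and copying it to several (replication) can be sketched as below. This is a minimal illustration, not any particular database's placement algorithm: the node names, `shard_for`, and `replicas_for` are hypothetical, and simple hash-modulo placement is assumed.

```python
import hashlib

# Hypothetical instance names, used only for illustration.
NODES = ["node-a", "node-b", "node-c"]

def _stable_index(key, count):
    """Map a key to a stable index in [0, count) using SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % count

def shard_for(key, nodes=NODES):
    """Sharding: each key lives on exactly one node."""
    return nodes[_stable_index(key, len(nodes))]

def replicas_for(key, nodes=NODES, copies=2):
    """Replication: the same key is stored on `copies` distinct nodes."""
    start = _stable_index(key, len(nodes))
    copies = min(copies, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]
```

Real systems often combine both: the data is sharded, and each shard is replicated.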

High availability

  • Server high availability

At least one server/instance is always available for processing.

Unlike fault tolerance, high availability still allows some (minimal) downtime, for example while failing over to another instance.

  • Data high availability

Data is replicated/sharded over several systems so that the requested data can always (or with minimal downtime) be provided.
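One way to picture data high availability is a read that falls back to the next replica when one is down. The sketch below assumes replicas are plain callables; `ReplicaDown`, `read_with_failover`, `down_replica`, and `healthy_replica` are hypothetical names invented for this example.

```python
class ReplicaDown(Exception):
    """Raised when a replica cannot serve the request."""

def read_with_failover(key, replicas):
    """Try each replica in order and return the first successful read."""
    last_error = None
    for replica in replicas:
        try:
            return replica(key)
        except ReplicaDown as exc:
            last_error = exc  # remember the failure and try the next replica
    raise ReplicaDown(f"all replicas failed for {key!r}") from last_error

# Two toy replicas: one unavailable, one holding the data.
def down_replica(key):
    raise ReplicaDown("replica is unreachable")

def healthy_replica(key):
    return {"k": "v"}[key]
```

Calling `read_with_failover("k", [down_replica, healthy_replica])` returns the value from the second replica. The time spent on the failed attempt is the kind of brief disruption that HA tolerates.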

Fault tolerance

The system keeps functioning (possibly in a degraded state) when failures occur, or falls back to a backup system, without any downtime. This is stricter than HA.

Requirements

  • no single point of failure

  • fault isolation

TL;DR: there is a backup of everything.

FT vs HA

http://www.pbenson.net/2014/02/the-difference-between-fault-tolerance-high-availability-disaster-recovery/

Scalability

  • Vertical

Add more resources (CPU, RAM, disk) to a single machine

  • Horizontal

Add more machines to the cluster

Elasticity

Easily grow or shrink the number of service instances depending on usage
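A minimal sketch of an elastic scaling rule, assuming load is measured in requests per second and every instance handles a fixed amount. The function name, parameters, and default bounds are hypothetical and do not come from any specific autoscaler.

```python
import math

def desired_instances(current_load, capacity_per_instance,
                      min_instances=1, max_instances=10):
    """Return how many instances are needed for the current load,
    clamped between min_instances and max_instances."""
    needed = math.ceil(current_load / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))
```

With these assumptions, a load of 250 requests/s at 100 requests/s per instance calls for 3 instances. When the load drops to zero, the count shrinks back to the minimum of 1.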

CAP Theorem

  • Consistency

  • Availability

  • Partition tolerance

In the presence of a network partition, pick Consistency or Availability. You can't have all three!
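The trade-off can be illustrated with a toy store made of two replicas. This is a simplified sketch, not a real distributed system: `TwoReplicaStore`, `Unavailable`, and the `"CP"`/`"AP"` mode flag are hypothetical, and the partition is simulated with a boolean.

```python
class Unavailable(Exception):
    """Raised when the store refuses a request to stay consistent."""

class TwoReplicaStore:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"a": None, "b": None}  # one value per replica
        self.partitioned = False              # simulated network partition

    def write(self, replica, value):
        peer = "b" if replica == "a" else "a"
        if not self.partitioned:
            # Healthy network: both replicas are updated together.
            self.values[replica] = value
            self.values[peer] = value
        elif self.mode == "CP":
            # Consistency first: refuse the write rather than diverge.
            raise Unavailable("peer unreachable, write rejected")
        else:
            # Availability first: accept the write, replicas now diverge.
            self.values[replica] = value

    def read(self, replica):
        return self.values[replica]
```

During a partition, the CP store rejects writes, so both replicas still agree but the service is unavailable for writes. The AP store keeps accepting writes, so it stays available but the two replicas may return different values.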
