Big Data Principles

Data sharding/replication

Replication duplicates the same data over several system instances. Sharding splits the data so that each instance holds only a part of it.
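As a rough illustration of how data can be spread over several instances, here is a minimal sketch. The function names (`shard_for`, `replicas_for`) and the "primary plus next nodes" placement scheme are assumptions made for this example, not the scheme of any particular system.

```python
import hashlib
from typing import List


def shard_for(key: str, num_nodes: int) -> int:
    """Map a key to one node index using a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def replicas_for(key: str, num_nodes: int, replication_factor: int) -> List[int]:
    """Return the nodes that hold a copy of the key.

    The first entry is the shard that owns the key; the remaining entries
    are the following nodes in ring order, which store the duplicates.
    """
    if replication_factor > num_nodes:
        raise ValueError("replication_factor cannot exceed num_nodes")
    primary = shard_for(key, num_nodes)
    return [(primary + i) % num_nodes for i in range(replication_factor)]
```

With 5 nodes and a replication factor of 3, every key lives on 3 distinct nodes, so losing one node does not make the key unavailable.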

High availability

Server high availability

At least one server/instance is always available for processing.

Unlike fault tolerance, high availability still allows for some downtime.

Data high availability

Data is replicated/sharded over several systems so that the requested data can always be provided (or with minimal downtime).

Fault tolerance

The system keeps functioning when failures occur, either in a degraded state or by falling back to a back-up system, without any downtime. This is stricter than HA.

Requirements

no single point of failure
fault isolation

TL;DR: there is a back-up of everything
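The "fall back to a back-up system" idea can be sketched as a simple failover loop. This is only an illustration: `call_with_failover` and the idea of modelling each backend as a Python callable are assumptions for the example, and a real fault-tolerant system also needs health checks and state replication.

```python
from typing import Callable, Optional, Sequence


def call_with_failover(backends: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each backend in order and return the first successful answer.

    A backend signals failure by raising ConnectionError. Because failures
    are caught per backend, one broken instance does not take down the call
    (fault isolation), and no single backend is a single point of failure.
    """
    last_error: Optional[Exception] = None
    for backend in backends:
        try:
            return backend(request)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next backend
    raise RuntimeError("all backends failed") from last_error
```

The caller only sees an error when every backend in the list has failed.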

FT vs HA

http://www.pbenson.net/2014/02/the-difference-between-fault-tolerance-high-availability-disaster-recovery/

Scalability

Vertical

Add more resources to 1 machine

Horizontal

Add more machines to the cluster

Elasticity

Easily grow or shrink the service instances depending on the usage
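A minimal sketch of an elasticity rule, assuming load and per-instance capacity are measured in the same unit (for example requests per second). The function name, parameters, and default bounds are hypothetical.

```python
import math


def desired_instances(
    current_load: float,
    capacity_per_instance: float,
    min_instances: int = 1,
    max_instances: int = 10,
) -> int:
    """Return how many instances are needed for the current load.

    The result grows with the load and shrinks when the load drops,
    but always stays within [min_instances, max_instances].
    """
    needed = math.ceil(current_load / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))
```

For example, a load of 450 with a capacity of 100 per instance needs 5 instances, while an idle service shrinks back to the minimum of 1.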

CAP Theorem

Consistency
Availability
Partition tolerance

In the presence of network failures (partitions), pick Consistency or Availability. You can't have all three!
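The trade-off can be shown with a toy replica. This is a deliberately simplified sketch: the `Replica` class, its `mode` flag, and the `partitioned` attribute are made up for the example and do not model a real replication protocol.

```python
class Replica:
    """Toy replica that reacts differently to a network partition.

    mode "CP": refuse writes while partitioned (stays consistent, loses availability).
    mode "AP": accept writes while partitioned (stays available, may diverge from peers).
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data = {}
        self.partitioned = False  # True when the replica cannot reach its peers

    def write(self, key: str, value: object) -> bool:
        """Return True if the write was accepted, False if it was rejected."""
        if self.partitioned and self.mode == "CP":
            return False  # cannot confirm the write with peers, so reject it
        self.data[key] = value
        return True
```

When there is no partition, both modes accept the write. The choice only matters once the network fails.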
