Big Data Principles
Data sharding/replication
Replication duplicates the same data across several system instances; sharding splits the data into partitions, each stored on a different instance.
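The difference between the two can be sketched in a few lines. This is a minimal illustration, not how any particular database does it: the function names (shard_for, replicas_for), the hash-modulo scheme, and the "next nodes in a ring" replica placement are all assumptions made for the example.

```python
import hashlib
from typing import List


def shard_for(key: str, num_shards: int) -> int:
    """Sharding: map a key to exactly one shard using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(shard: int, num_nodes: int, replication_factor: int) -> List[int]:
    """Replication: list the nodes that each hold a full copy of the shard.

    Hypothetical placement rule: the shard's own node plus the next
    nodes in ring order.
    """
    if replication_factor > num_nodes:
        raise ValueError("replication factor cannot exceed the number of nodes")
    return [(shard + i) % num_nodes for i in range(replication_factor)]


if __name__ == "__main__":
    shard = shard_for("user:42", num_shards=4)
    print("shard:", shard)
    print("nodes holding a copy:", replicas_for(shard, num_nodes=4, replication_factor=2))
```

Sharding decides which slice of the data a key belongs to; replication decides how many copies of that slice exist and where they live.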
High availability
Server high availability
At least one server/instance is always available for processing.
Unlike fault tolerance, some downtime is still accepted (for example, while failing over to another instance).
Data high availability
Data is replicated/sharded over several systems so that the requested data can always be provided (or with minimal downtime).
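As a rough sketch of data high availability, a client can try each replica in turn until one answers. The Replica class, the ReplicaDown exception, and read_with_failover are hypothetical names invented for this example; real systems add timeouts, health checks, and retries.

```python
class ReplicaDown(Exception):
    """Raised when a replica cannot serve requests."""


class Replica:
    def __init__(self, name, data, up=True):
        self.name = name
        self.data = dict(data)
        self.up = up

    def get(self, key):
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try replicas in order; return (value, name of the replica that served it)."""
    for replica in replicas:
        try:
            return replica.get(key), replica.name
        except ReplicaDown:
            continue  # this replica is down, fail over to the next one
    raise ReplicaDown("no replica available")


if __name__ == "__main__":
    copies = {"order:1": "shipped"}
    replicas = [Replica("a", copies, up=False), Replica("b", copies)]
    print(read_with_failover(replicas, "order:1"))
```

The short delay spent discovering that a replica is down is the kind of minimal downtime that HA tolerates.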
Fault tolerance
The system keeps functioning (possibly in a degraded state) when failures occur, or falls back to a back-up system, without any downtime. This makes it stricter than HA.
Requirements
No single point of failure
Fault isolation
TL;DR: there is a back-up of everything
FT vs HA
Scalability
Vertical
Add more resources to 1 machine
Horizontal
Add more machines to the cluster
Elasticity
Easily grow or shrink the number of service instances depending on usage
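Elasticity is essentially horizontal scaling driven by a usage metric. A minimal sketch of such a scaling rule follows, assuming a proportional policy: the function name desired_instances, the 60% target, and the instance limits are illustrative assumptions, not values from any specific platform.

```python
def desired_instances(current, avg_utilization_pct, target_pct=60,
                      min_instances=1, max_instances=10):
    """Return how many instances to run so average utilization nears the target.

    Utilization values are integer percentages, which keeps the arithmetic exact.
    """
    if current < 1 or target_pct <= 0:
        raise ValueError("current must be >= 1 and target_pct must be > 0")
    # Ceiling division: scale the instance count in proportion to the load.
    wanted = -(-(current * avg_utilization_pct) // target_pct)
    # Clamp to the allowed range so the service never scales to zero or runs away.
    return max(min_instances, min(max_instances, wanted))


if __name__ == "__main__":
    print(desired_instances(4, 90))  # busy: grow
    print(desired_instances(4, 30))  # quiet: shrink
```

With 4 instances at 90% utilization the rule asks for 6; at 30% it asks for 2.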
CAP Theorem
Consistency
Availability
Partition tolerance
In the presence of a network partition, pick Consistency or Availability. You can't have all three!
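The trade-off can be illustrated with a toy two-node store. This is a deliberately simplified sketch: the TwoNodeStore class, its "CP"/"AP" mode flag, and the Unavailable exception are hypothetical, and real systems use quorums and conflict resolution rather than a boolean switch.

```python
class Unavailable(Exception):
    """Raised when the store refuses a request to stay consistent."""


class TwoNodeStore:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [{}, {}]      # each dict is one node's copy of the data
        self.partitioned = False   # True when the nodes cannot talk to each other

    def write(self, node, key, value):
        if not self.partitioned:
            for replica in self.nodes:  # healthy network: update both copies
                replica[key] = value
            return
        if self.mode == "CP":
            # Consistency chosen: reject the write rather than let copies diverge.
            raise Unavailable("cannot reach the other node")
        # Availability chosen: accept the write locally; copies now disagree.
        self.nodes[node][key] = value

    def read(self, node, key):
        return self.nodes[node].get(key)


if __name__ == "__main__":
    ap = TwoNodeStore("AP")
    ap.write(0, "x", 1)
    ap.partitioned = True
    ap.write(0, "x", 2)
    print(ap.read(0, "x"), ap.read(1, "x"))  # the two nodes return different values
```

In CP mode the same write raises Unavailable instead, so both nodes keep returning the last agreed value.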