Big Data Principles

Data sharding/replication

Sharding splits the data across several system instances; replication duplicates the same data on several instances.
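As a minimal sketch of the two ideas together (node names and the replication factor are made up), a key can be hashed to a primary shard, then copied to the next nodes in the ring:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members
REPLICATION_FACTOR = 2                  # each key lives on 2 nodes

def shard_for(key: str) -> int:
    """Hash the key to pick its primary shard (sharding)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % len(NODES)

def replicas_for(key: str) -> list[str]:
    """Primary node plus the next nodes in the ring (replication)."""
    start = shard_for(key)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

print(replicas_for("user:42"))  # e.g. two distinct nodes from NODES
```

The same hash always maps a key to the same replica set, so reads know where to look without a lookup table.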

High availability

  • Server high availability

At least one server/instance is always available for processing.

Some downtime can still occur, in contrast to fault tolerance.

  • Data high availability

Data is replicated/sharded over several systems so that the requested data is always available (or with minimal downtime).

Fault tolerance

The system can still function (possibly in a degraded state) in the event of failures, or fails over to a back-up system without any downtime. This is stricter than HA.

Requirements

  • no single point of failure

  • fault isolation

TL;DR: there is a back-up of everything.
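A no-single-point-of-failure read path can be sketched as a failover loop over replicas; the server names and `fetch` callback below are hypothetical:

```python
def query_with_failover(servers, fetch):
    """Try each replica in turn; fail only if every one is down (no SPOF)."""
    errors = []
    for server in servers:
        try:
            return fetch(server)
        except ConnectionError as exc:
            errors.append((server, exc))  # fault is isolated to this replica
    raise RuntimeError(f"all replicas failed: {errors}")

# Simulated backends: the primary is down, the backup answers.
backends = {"primary": None, "backup": "row-data"}

def fetch(server):
    value = backends[server]
    if value is None:
        raise ConnectionError(f"{server} unreachable")
    return value

print(query_with_failover(["primary", "backup"], fetch))  # → row-data
```

The caller never sees the primary's failure: that is the fault-isolation requirement in miniature.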

FT vs HA

High availability minimizes downtime; fault tolerance eliminates it by backing up every component, which makes FT the stricter guarantee.

Scalability

  • Vertical

Add more resources (CPU, RAM, disk) to a single machine.

  • Horizontal

Add more machines to the cluster

Elasticity

Easily grow or shrink the number of service instances depending on usage.
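A toy autoscaling rule illustrates the idea; the 60% target utilization and the instance bounds are arbitrary assumptions, not from any real autoscaler:

```python
def desired_instances(current, load_per_instance, target=0.6, min_n=1, max_n=10):
    """Grow when instances run hot, shrink when idle (hypothetical thresholds)."""
    if load_per_instance > target:
        current += 1                     # scale out
    elif load_per_instance < target / 2 and current > min_n:
        current -= 1                     # scale in
    return max(min_n, min(max_n, current))

print(desired_instances(3, 0.9))  # hot cluster  → 4
print(desired_instances(3, 0.2))  # idle cluster → 2
```

Real systems add cooldown periods and smoothing so the fleet does not oscillate on every load spike.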

CAP Theorem

  • Consistency

  • Availability

  • Partition tolerance

In the presence of a network partition, pick Consistency or Availability. You can't have all three!
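One way to see the trade-off is a two-replica toy store that, during a partition, either rejects the write (the CP choice) or accepts it on one side and diverges (the AP choice); the class and mode names here are made up for illustration:

```python
class Replica:
    def __init__(self):
        self.value = None

def write(replicas, partitioned, value, mode):
    """Under a partition, a CP system refuses the write rather than diverge;
    an AP system accepts it on the reachable replica only."""
    if partitioned:
        if mode == "CP":
            raise RuntimeError("partition: write rejected to stay consistent")
        replicas[0].value = value   # AP: accept locally, replicas now diverge
    else:
        for r in replicas:
            r.value = value         # no partition: both C and A are achievable

a, b = Replica(), Replica()
write([a, b], partitioned=True, value="v1", mode="AP")
print(a.value, b.value)  # → v1 None  (available but inconsistent)
```

When the network is healthy, the same write reaches every replica, which is why the trade-off only bites during a partition.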


http://www.pbenson.net/2014/02/the-difference-between-fault-tolerance-high-availability-disaster-recovery/