Hadoop tools
Examples

Average

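// given: a CSV file with the schema: registration_dttm,id,first_name,last_name,email,gender,ip_address,cc,country,birthdate,salary,title,comments
// when: averaging salary per gender
// then: should return a map of (gender -> average salary)

// splitting every line into its columns; the -1 limit keeps trailing empty fields
val users = sc.textFile("file:///userdata.csv").map(_.split(",", -1))
// removing empty salaries and genders (gender is column 5, salary is column 10)
// note: if the file starts with a header row, drop it first, otherwise toFloat fails on "salary"
val usersClean = users.filter(x => !x(5).isEmpty && !x(10).isEmpty)
val usersGend = usersClean.map(x => (x(5), x(10).toFloat))
// adding a counter for each item, then summing salaries and counters per gender
val usersGendSumCount = usersGend.mapValues((_, 1)).reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))
// dividing each sum by its counter gives the average salary per gender
val usersGendAvgSal = usersGendSumCount.mapValues { case (sum, count) => sum / count }
// collectAsMap returns a Map on the driver (collect would return an Array of pairs)
usersGendAvgSal.collectAsMap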
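
The example carries a (sum, count) pair per gender because an average cannot be reduced directly: the average of two averages is only correct when both groups have the same size. The sketch below checks that logic without a cluster, using plain Scala collections in place of an RDD. The sample rows are hypothetical and not taken from userdata.csv, and groupBy followed by reduce only stands in for what reduceByKey does in Spark.

```scala
// Hypothetical sample rows (gender, salary); not taken from userdata.csv.
val sample = Seq(("Male", 100.0f), ("Female", 300.0f), ("Male", 200.0f), ("Female", 500.0f))

// Pair each salary with a counter of 1, as mapValues((_, 1)) does on the RDD.
val withCount = sample.map { case (gender, salary) => (gender, (salary, 1)) }

// Merge the (sum, count) pairs per gender; this stands in for reduceByKey.
val sumAndCount = withCount
  .groupBy(_._1)
  .map { case (gender, rows) =>
    gender -> rows.map(_._2).reduce((x, y) => (x._1 + y._1, x._2 + y._2))
  }

// Divide each sum by its counter to get the average salary per gender.
val averages = sumAndCount.map { case (gender, (sum, count)) => gender -> sum / count }

println(averages)
```

With these rows, Male averages (100 + 200) / 2 = 150 and Female averages (300 + 500) / 2 = 400.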

Last updated 5 years ago
