Spark - Python

Create

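```python
from pyspark.sql import SparkSession

# In the pyspark shell `spark` already exists; in a standalone script, create it.
spark = SparkSession.builder.getOrCreate()

# Column names are given as the schema; Spark infers the types (string, string, long, long).
df = spark.createDataFrame(
    [
        ['red', 'banana', 1, 10],
        ['blue', 'banana', 2, 20],
        ['red', 'carrot', 3, 30],
        ['blue', 'grape', 4, 40],
        ['red', 'carrot', 5, 50],
        ['black', 'carrot', 6, 60],
        ['red', 'banana', 7, 70],
        ['red', 'grape', 8, 80],
    ],
    schema=['color', 'fruit', 'v1', 'v2'],
)
df.show()

# spark.read.json expects JSON Lines by default (one JSON object per line);
# pass multiLine=True for a pretty-printed file or a top-level JSON array.
# A separate name keeps the fruit DataFrame `df` available for the grouping example.
zipcodes_df = spark.read.json("resources/zipcodes.json")
```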

Select


Transformations

Grouping

```python
# avg() with no arguments averages every numeric column (v1, v2) within each color
df.groupby('color').avg().show()
```

Read/Write

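```python
# DataFrameWriter.json requires an output path; Spark writes a directory of
# JSON Lines part files there. "output/fruits_json" is a placeholder path.
df.write.mode("overwrite").json("output/fruits_json")
```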


Last updated 4 months ago
