Stack Overflow Big Data Processing
Used the Python API for Apache Spark (PySpark) to process data from the Stack Overflow Annual Developer Survey 2020, and ran the processing job with spark-submit.
Spun up an Amazon Elastic MapReduce (EMR) cluster with Spark installed and wrote a Spark application in Python (PySpark) to process the data on EMR.
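One way to launch such a cluster programmatically is to pass a request to the EMR `RunJobFlow` API, for example via boto3's `run_job_flow`. The sketch below only builds that request and does not contact AWS, so it runs offline. The S3 URIs, instance types and counts, and release label are hypothetical placeholders, and the project may well have created its cluster from the console or the AWS CLI instead. The step uses EMR's `command-runner.jar` to invoke `spark-submit` on a PySpark script stored in S3.

```python
from typing import Any, Dict


def build_emr_request(
    script_uri: str,
    input_uri: str,
    output_uri: str,
    log_uri: str,
    release_label: str = "emr-6.15.0",  # assumption: any Spark-enabled EMR release works
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR client.run_job_flow (hypothetical helper)."""
    return {
        "Name": "stackoverflow-survey-2020",
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "LogUri": log_uri,
        "Instances": {
            # Placeholder sizing: one primary node and two core nodes.
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Terminate the cluster once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "pyspark-survey-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    # command-runner.jar runs an arbitrary command on the primary node.
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster",
                             script_uri, input_uri, output_uri],
                },
            }
        ],
        # Default IAM roles created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   boto3.client("emr").run_job_flow(**build_emr_request(...))
```

Setting `KeepJobFlowAliveWhenNoSteps` to `False` makes this a transient cluster that shuts down after the spark-submit step, which avoids paying for idle nodes in a one-off batch job.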