After fighting with AWS Glue, I realized that in order to use Glue as anything more than a dumb scheduler, I need to learn the hell out of Spark.
So now I am setting up Spark on my Mac, and I am in the process of writing a hello world query that creates a dataset and adds up three column values.
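Here is roughly what I mean by that hello world query, as a minimal sketch rather than the final thing. It assumes PySpark (and a Java runtime) is installed and that Spark runs in local mode. The column names a, b, c and total, the sample rows, and the helper names are all made up for illustration. In PySpark the "dataset" is a DataFrame.

```python
# Made-up sample rows; the three columns are placeholders named a, b, c.
ROWS = [(1, 2, 3), (10, 20, 30)]


def expected_totals(rows):
    """Plain-Python reference: the sum of the three values in each row."""
    return [a + b + c for a, b, c in rows]


def spark_totals(rows):
    """Compute the same per-row sums with a local Spark session."""
    # Imported here so the reference function above loads without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = (
        SparkSession.builder
        .master("local[*]")        # assumption: run on this machine, no cluster
        .appName("hello-spark")    # hypothetical app name
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(rows, ["a", "b", "c"])
        # Add a fourth column holding the sum of the other three.
        result = df.withColumn("total", col("a") + col("b") + col("c"))
        result.show()
        return [row["total"] for row in result.collect()]
    finally:
        spark.stop()
```

If Spark is set up correctly, calling spark_totals(ROWS) should print a small table and return the same numbers as expected_totals(ROWS), possibly in a different order.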
I found a great article by a fellow Mac user.
I got stuck, however, trying to get to my system domain in PyCharm. Every time I tried to "Add Content Root" I got a lot of GUI crap and no system files.
GUI Crap
I figured out how to get beyond the GUI crap by using Tags. I browsed using Finder to the PySpark folder and added an "Orange" tag to my target.
Tagging Content Root in Orange
Now I am able to use the Orange tag to get this file. This is why I love Macs!
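If you are not sure where the PySpark folder lives in the first place, Python can tell you. This is a small sketch that assumes the pyspark package is importable by the interpreter you run it with (for example, installed with pip). If PySpark only exists inside a downloaded Spark distribution, it will just report None. The helper name package_folder is my own.

```python
import importlib.util
from pathlib import Path


def package_folder(name):
    """Return the folder of an importable package, or None if there isn't one."""
    spec = importlib.util.find_spec(name)
    # Plain modules and built-ins have no package folder to point at.
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


# Prints the folder to browse to in Finder, or None if pyspark is not importable.
print(package_folder("pyspark"))
```

The printed path is the folder to browse to in Finder and tag.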