The Author Enjoying some Time at Patriot Place
Maybe I'm just old, but AI algorithm playlists like Apple Music, Spotify, and Pandora leave me feeling empty and bored. I need a human on the other picking the music and I want multiple people with different tastes picking my playlists.
KEXP is has been my soundtrack to the pandemic. Whether it was International Clash Day World Goth Day, or my favorite, Music Heals, KEXP was playing in the background while I holed up in my home office, writing data pipeline code. The mantra of, "you are not alone" was a comfort for all those times when I felt disconnected and missed the excitement of building software as a team.
I took it as an auspicious omen when I heard my favorite DJ, Cheryl Waters, play the "War on Drugs" as I drove to the NO THC drug test for my new job. I didn't look up NO THC until afterward and then realized that I could have done bong hits in the parking lot and would have been fine (Not that I would have, I don't even drink). Wow, the world is changing!
I like personal projects because they give me a handy repository of interesting coding snippets. After too many design patterns, I'm glad to fall onto just one:
Pull a data to a data lake using an API
Transform the data so we can understand it
Use that data to make our lives better
Here is a quick example of using a small amount of serverless Python code, a data lake in S3, a Snowflake Data Warehouse, a Business Analytics tool called Metabase, and Slack (my new UI) to make my life better by bringing new music into my life.
Building a Data Pipe from Web Data - Network Discovery
A powerful tool for Web Data Discovery is built right into the Chrome Browser. Within the system option "Developer Tools," there is a tab called "Network." This tab tells you everything that is going on with a web page. You can view the list of commands output and find out how to pull the data using code rather than through your browser.
Look for the API by checking for "format=json" Parameters
Pull out the API URL and feed it into Python requests and the JSON library.
page = requests.get("https://api.kexp.org/v2/plays/?format=json&limit=10&ordering=-airdate")
json_response = json.loads(page.text)
print(json.dumps(json_response, indent=2, sort_keys=True))
The Pretty Printer will write the file out.
Transforming the Data - Give the Data Shape
Snowflake is an effective tool to Transform data from JSON files into shaped data that we can query using SQL. I wrote about this previously in this article.
For the KEXP data, suppose we copy it into our staging bucket. The following select statement will turn the output results into data we can query. To follow the process, uncomment from the last column to the first to see more and more about the JSON file structure.
select tab.value:airdate::TIMESTAMP_LTZ airdate,
tab.value:album::STRING album,
tab.value:artist::STRING artist,
tab.value:song::STRING song,
tab.value,
stg."$1":results,
metadata$filename filename,
metadata$file_row_number
from @owlmtn.stage.AZRIUS_STAGE_TEST (
pattern =>'stage/kexp/.*',
file_format => stage.json_file_format) stg,
table(flatten(stg.$1:results)) tab;
From this point, we can build data warehouse staging tables on top of the API data output and start copying the data into the database.
Commentaires