Python JSON with S3: Parsing Directly from S3
The previous post on parsing JSON may have seemed completely pointless. Here is what I did:
Wrote a test to assert a string literal equals a parsed JSON value
Wrote a wrapper program called "core.py" that just called another program called "awsdkjson.py"
Wrote a program that executed "json.load(json_filename)"
How about that for pointless? I could have written a single line of code that loaded the JSON from a string.
The reason I did it that way is that I want to create a component that builds on that simple act of parsing a file, and I want it to read directly from S3.
Step 0: Ensure That You Have Created Your S3 Environment
Review the post entitled "Setting up your first AWS Remote File".
Step 1: Write a Test That Uses the S3 Location as the File URL
Write a test similar to the one for configure_json, but pass the S3 URL "s3://awsdk-book/json-files/table_of_contents.json" to indicate that the file should be pulled from S3. The test should fail, because nothing handles S3 URLs yet.
Step 2: Fix the Failure by Implementing the Method
Here is a habit that makes great developers: cause a failure, then fix it. Fix this failure by implementing a wrapper around configure_json that chooses where to read the file from based on the URL.
def configure_json(json_file):
    if json_file.startswith("s3://"):
        (bucket, key) = split_s3_path(json_file)
        session = awsdks3.create_aws_session()
        s3 = session.resource('s3')
        file_content = s3.Object(bucket, key).get()['Body'].read().decode('utf-8')
        return awsdkjson.loads(file_content)
    else:
        return awsdkjson.configure_json(json_file)
Note here that we create a separate package to handle S3 specifically. In the future we may want to extend this to Google Cloud, Azure, and so on, and a separate package keeps each backend focused on its own dependencies. S3 is specific in that it addresses data by a "bucket, key" pair, so we implement a method that splits the URL into those two parts:
def split_s3_path(s3_path):
    path_parts = s3_path.replace("s3://", "").split("/")
    bucket = path_parts.pop(0)
    key = "/".join(path_parts)
    return bucket, key
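If you want to try the same idea without the book's packages, here is a self-contained sketch under a few assumptions. The names split_s3_url, fetch_s3_text, load_json and fake_fetch are mine, not from the repository. It uses boto3's default credentials instead of awsdks3.create_aws_session(), and it strips only the leading "s3://" rather than replacing every occurrence. The fetch parameter is there so the S3 branch can be exercised without touching AWS.

```python
import json

S3_PREFIX = "s3://"


def split_s3_url(s3_url):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    if not s3_url.startswith(S3_PREFIX):
        raise ValueError(f"not an S3 URL: {s3_url!r}")
    # Strip only the leading scheme, then cut at the first slash.
    bucket, _, key = s3_url[len(S3_PREFIX):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URL needs a bucket and a key: {s3_url!r}")
    return bucket, key


def fetch_s3_text(bucket, key):
    """Download one S3 object and decode it as UTF-8 text."""
    import boto3  # imported lazily so local-only use does not require boto3

    body = boto3.resource("s3").Object(bucket, key).get()["Body"]
    return body.read().decode("utf-8")


def load_json(source, fetch=fetch_s3_text):
    """Parse JSON from an s3:// URL or from a local file path."""
    if source.startswith(S3_PREFIX):
        return json.loads(fetch(*split_s3_url(source)))
    with open(source, encoding="utf-8") as handle:
        return json.load(handle)


def fake_fetch(bucket, key):
    """Test double: echo the bucket and key back as a JSON document."""
    return json.dumps({"bucket": bucket, "key": key})


print(load_json("s3://awsdk-book/json-files/table_of_contents.json", fetch=fake_fetch))
```

Passing the fetcher in as an argument is a design choice, not something the original code does. It lets a unit test cover the URL handling and the JSON parsing without network access or credentials.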
Step 3: Execute the Working Test to Validate
Once the tests are in place and the code is implemented and passing, we are done: we now have a method that seamlessly loads either local JSON files or remote files from S3.
The code for this example is available on GitHub under "Chapter 2".