Splitting json.dumps into smaller files

Question

I am extracting email attachments from Outlook using the Microsoft Graph API and loading them into an S3 bucket in AWS.

main = 'https://graph.microsoft.com/v1.0/me/mailFolders/inbox/messages?$expand=attachments&$search="hasAttachments:true"&Top=10'
  
response = requests.get(main, headers={'Authorization': 'Bearer ' + result['access_token']})
if response.status_code != 200:
    raise Exception(response.json())

response_json = response.json()

emails = response_json['value']
for email in emails:
    if email['hasAttachments']:
        email_id = email['id']
        download_email_attachments(email_id, headers)
        print(email['subject'])
        print(email['hasAttachments'])

    s3 = boto3.client('s3')
    bucket ='demo-bucket'
    fileName = email['subject'] + '.json'
    fileContent = bytes(json.dumps(graph_data, indent=2).encode('UTF-8'))

    s3.put_object(Bucket=bucket, Key=fileName, Body=fileContent)
    print('Upload Complete')

graph_data is an endpoint that calls the API to pull the data, similar to main, but with additional search criteria.

This is the block of code I'm using to take the file content and upload it to the S3 bucket. However, when I run it, all of the data ends up in every file, while a separate file is still created for each name. So I'll have 10 files with the exact same data but different names.

I’m getting the below sample but if I pull 10 emails, I get 10 different file names with all 10 emails worth of data in each one.

Sample of data from 1 email:

{
    "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#users('48d31887-5fad-4d73-a9f5-3c356e68a038')/mailFolders('inbox')/messages(attachments())",
    "value": [
        {
            "@odata.etag": "W/\"CQAAABYAAAAiIsqMbYjsT5e/T7KzowPTAASWWffQ\"",
            "id": "AAMkAGVmMDEzMTM4LTZmYWUtNDdkNC1hMDZiLTU1OGY5OTZhYmY4OABGAAAAAAAiQ8W967B7TKBjgx9rVEURBwAiIsqMbYjsT5e-T7KzowPTAAAAAAEMAAAiIsqMbYjsT5e-T7KzowPTAASXFxfBAAA=",
            "createdDateTime": "2022-06-24T09:31:45Z",
            "lastModifiedDateTime": "2022-06-24T09:31:46Z",
            "changeKey": "CQAAABYAAAAiIsqMbYjsT5e/T7KzowPTAASWWffQ",
            "categories": [],
            "receivedDateTime": "2022-06-24T09:31:45Z",
            "sentDateTime": "2022-06-24T09:31:45Z",
            "hasAttachments": true,
            "internetMessageId": "<SJ0PR15MB5245AA459418AC65A5B18C6ACDB49@SJ0PR15MB5245.namprd15.prod.outlook.com>",
            "subject": "Voice Mail (25 seconds)",
            "bodyPreview": "You received a voice mail from Developer desginer at [email protected]:   917573933439Email:  [email protected]________________________________Thank you for using Transcription! If you don't see a transcr",
            "importance": "normal",
            "parentFolderId": "AAMkAGVmMDEzMTM4LTZmYWUtNDdkNC1hMDZiLTU1OGY5OTZhYmY4OAAuAAAAAAAiQ8W967B7TKBjgx9rVEURAQAiIsqMbYjsT5e-T7KzowPTAAAAAAEMAAA=",
            "conversationId": "AAQkAGVmMDEzMTM4LTZmYWUtNDdkNC1hMDZiLTU1OGY5OTZhYmY4OAAQABWnJa-v201IgLwcCubGPfM=",
            "conversationIndex": "AQHYh604Faclr+/bTUiAvBwK5sY98w==",
            "isDeliveryReceiptRequested": false,
            "isReadReceiptRequested": false,
            "isRead": false,
            "isDraft": false,
            "webLink": "https://outlook.office365.com/owa/?ItemID=AAMkAGVmMDEzMTM4LTZmYWUtNDdkNC1hMDZiLTU1OGY5OTZhYmY4OABGAAAAAAAiQ8W967B7TKBjgx9rVEURBwAiIsqMbYjsT5e%2FT7KzowPTAAAAAAEMAAAiIsqMbYjsT5e%2FT7KzowPTAASXFxfBAAA%3D&exvsurl=1&viewmodel=ReadMessageItem",
            "inferenceClassification": "focused",
            "body": {
                "contentType": "html",
                "content": "<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><style type="text/css"><!--a:link{color:#0563C1}a:visited{color:#954F72}a:active{color:#954F72}--></style></head><body><style type="text/css"><!--a:link{color:#0563C1}a:visited{color:#954F72}a:active{color:#954F72}--></style><div style="font-family:'Segoe UI',Arial,sans-serif; background-color:#ffffff; color:#16233A; font-size:10.5pt"><div id="UM-call-info" lang="en"><div style="font-family:'Segoe UI',Arial,sans-serif; font-size:9pt; color:#595959">You received a voice mail from Developer desginer at <a href="sip:[email protected]" style="color:#0070C0">[email protected]</a>.</div><br><table border="0" style="width:100%; table-layout:auto"><tbody><tr><td width="15%" nowrap="" style="font-family:'Segoe UI',Arial,sans-serif; color:#595959; font-size:9pt; border-width:0in">Work:</td><td width="85%" style="font-family:'Segoe UI',Arial,sans-serif; color:#000000; border-width:0in; font-size:9pt; vertical-align:top; padding-left:10px; padding-right:10px"><a href="tel:917573933439" style="color:#3366CC">917573933439</a></td></tr><tr><td width="15%" nowrap="" style="font-family:'Segoe UI',Arial,sans-serif; color:#595959; font-size:9pt; border-width:0in">Email:</td><td width="85%" style="font-family:'Segoe UI',Arial,sans-serif; color:#000000; border-width:0in; font-size:9pt; vertical-align:top; padding-left:10px; padding-right:10px"><a href="mailto:[email protected]" style="color:#3366CC">[email protected]</a></td></tr></tbody></table><br><br></div><div><hr style="width:75%; background-color:#bfcddb; border:0 none; text-align:left; margin-left:0px"><br></div><div lang="en" dir="ltr" style="font-family:'Segoe UI',Arial,sans-serif; font-size:9pt; color:#595959; font-weight:bold">Thank you for using Transcription! 
If you don't see a transcript above, it's because the audio quality was not clear enough to transcribe.</div><br><div lang="en" dir="ltr" style="font-family:'Segoe UI',Arial,sans-serif; font-size:9pt; color:#595959"><a href="https://aka.ms/vmsettings" style="font-size:9pt; color:#0070C0">Set Up Voice Mail</a></div></div></body></html>"
            },
            "sender": {
                "emailAddress": {
                    "name": "[email protected]",
                    "address": "[email protected]"
                }
            },
            "from": {
                "emailAddress": {
                    "name": "[email protected]",
                    "address": "[email protected]"
                }
            },
            "toRecipients": [
                {
                    "emailAddress": {
                        "name": "Megan Bowen",
                        "address": "[email protected]"
                    }
                }
            ],
            "ccRecipients": [],
            "bccRecipients": [],
            "replyTo": [],
            "flag": {
                "flagStatus": "notFlagged"
            },
            "attachments": [
                {
                    "@odata.type": "#microsoft.graph.fileAttachment",
                    "@odata.mediaContentType": "audio/mp3",
                    "id": "AAMkAGVmMDEzMTM4LTZmYWUtNDdkNC1hMDZiLTU1OGY5OTZhYmY4OABGAAAAAAAiQ8W967B7TKBjgx9rVEURBwAiIsqMbYjsT5e-T7KzowPTAAAAAAEMAAAiIsqMbYjsT5e-T7KzowPTAASXFxfBAAABEgAQAFbg-4hu7BZIrggwQ9N5ikk=",
                    "lastModifiedDateTime": "2022-06-24T09:31:45Z",
                    "name": "audio.mp3",
                    "contentType": "audio/mp3",
                    "size": 168986,
                    "isInline": false,
                    "contentId": null,
                    "contentLocation": null,
                    "contentBytes": "//OIxAAAAAAAAAAAAFhpbmcAAAAPAAACwAACkvgAAwcJDQ8TFhk"

How can I split this data so each attachment has its own file with the data related to the attachment?

Asked By: RunRabbit

Answer 1

"graph_data is an endpoint"

So it is not a JSON value. Then why pass it to json.dumps?

"so each attachment has its own file with the data related to the attachment"

Loop over the attachments of each email and upload each one on its own.
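
The splitting step can be sketched without S3 at all. This is a minimal sketch, assuming the response has the shape shown in the question (a `value` list whose messages each carry an `attachments` list). The helper name `split_attachments` and the key layout `<message id>/<attachment name>.json` are illustrative assumptions, not part of the original code. Prefixing the key with the message id keeps attachments that share a name (for example `audio.mp3` on every voicemail) from overwriting each other.

```python
import json


def split_attachments(response_json):
    """Yield (key, body) pairs, one per attachment in a Graph messages response."""
    for email in response_json.get('value', []):
        # Messages without attachments contribute nothing.
        for attachment in email.get('attachments', []):
            # Hypothetical key layout: the message id prefix keeps names unique.
            key = f"{email['id']}/{attachment['name'].replace('.', '_')}.json"
            # Serialize only this attachment, not the whole response.
            body = json.dumps(attachment, indent=2).encode('utf-8')
            yield key, body


# Tiny stand-in for response.json(); real responses carry many more fields.
sample = {
    'value': [
        {'id': 'msg-1', 'subject': 'Voice Mail', 'hasAttachments': True,
         'attachments': [{'name': 'audio.mp3', 'size': 168986}]},
        {'id': 'msg-2', 'subject': 'No files', 'hasAttachments': False,
         'attachments': []},
    ]
}

for key, body in split_attachments(sample):
    print(key, len(body))
```

Each pair could then be handed to `s3.put_object(Bucket=bucket, Key=key, Body=body)`.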

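import json

import boto3

# response_json is the parsed Graph response from the question
emails = response_json['value']

s3 = boto3.client('s3')
bucket = 'demo-bucket'

for email in emails:
    email_id = email['id']
    subject = email['subject']
    if email['hasAttachments']:
        print(subject)
        # $expand=attachments puts each message's attachments inline
        attachments = email['attachments']
        for attachment in attachments:
            name = attachment['name']
            # Serialize only this attachment, not the whole response
            fileContent = json.dumps(attachment, indent=2)
            # Note: attachments that share a name will overwrite each other
            s3.put_object(Bucket=bucket, Key=name.replace('.', '_') + '.json', Body=fileContent.encode('UTF-8'))
        print('Upload Complete')
        download_email_attachments(email_id, headers)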
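
If the goal is the attachment file itself rather than its JSON description, note that for a `fileAttachment` the `contentBytes` field holds the file content as base64 text. A minimal sketch, assuming that shape; `attachment_to_file` is a hypothetical helper and the sample payload is made up for illustration:

```python
import base64


def attachment_to_file(attachment):
    """Return (name, content_type, raw_bytes) for a Graph fileAttachment dict."""
    # contentBytes is base64 text; decode it to recover the original file bytes.
    raw = base64.b64decode(attachment['contentBytes'])
    # Fall back to a generic type if the attachment does not declare one.
    content_type = attachment.get('contentType') or 'application/octet-stream'
    return attachment['name'], content_type, raw


# Made-up attachment for illustration; real ones come from email['attachments'].
sample_attachment = {
    'name': 'note.txt',
    'contentType': 'text/plain',
    'contentBytes': base64.b64encode(b'hello').decode('ascii'),
}

name, content_type, raw = attachment_to_file(sample_attachment)
print(name, content_type, raw)
```

The result could then be uploaded with `s3.put_object(Bucket=bucket, Key=name, Body=raw, ContentType=content_type)`, which stores the original file instead of a JSON wrapper around it.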

Answered By: OneCricketeer
