Create the Lambda Function

Time Estimate: 5 - 10 minutes

In this section you will create the S3 Processor Lambda function.

  1. Make sure you are in the N. Virginia (us-east-1) region. Search for the Lambda service in the AWS Management Console and click the highlighted result to open the service.

  2. Click Create function on the next screen.

  3. You should see the screen below. Select the Python 3.7 runtime and set the execution role to the IAM role that you created in the previous step. Then click Create function.

    Create Lambda
  4. Copy the code below. The code adds a record to the DynamoDB Documents table as soon as a supported document (jpg, jpeg, png, or pdf) is uploaded to our S3 bucket.

import json
import os
import uuid
import urllib.parse
import datastore
from helper import FileHelper

def processRequest(request):

    output = ""

    print("request: {}".format(request))

    bucketName = request["bucketName"]
    objectName = request["objectName"]
    documentsTable = request["documentsTable"]
    outputTable = request["outputTable"]

    print("Input Object: {}/{}".format(bucketName, objectName))

    ext = FileHelper.getFileExtenstion(objectName.lower())
    print("Extension: {}".format(ext))

    # Only register documents with a supported file extension.
    if ext and ext in ["jpg", "jpeg", "png", "pdf"]:
        documentId = str(uuid.uuid1())
        ds = datastore.DocumentStore(documentsTable, outputTable)
        ds.createDocument(documentId, bucketName, objectName)

        output = "Saved document {} for {}/{}".format(documentId, bucketName, objectName)

        print(output)

    return {
        'statusCode': 200,
        'body': json.dumps(output)
    }

def lambda_handler(event, context):

    print("event: {}".format(event))

    request = {}
    request["bucketName"] = event['Records'][0]['s3']['bucket']['name']
    request["objectName"] = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    request["documentsTable"] = os.environ['DOCUMENTS_TABLE']
    request["outputTable"] = os.environ['OUTPUT_TABLE']

    return processRequest(request)
  5. Paste the code as shown below:

    Code Lambda
  6. We need to configure some environment variables so that our Lambda function can identify the SQS queue endpoint and the DynamoDB tables. The environment variables we are going to configure are:

    SYNC_QUEUE_URL -> SQS queue URL
    DOCUMENTS_TABLE -> Documents table name
    OUTPUT_TABLE -> Outputs table name
    
  7. Configure those as shown below. Then click Save.

    Env Variables
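
To see what the handler pulls out of an S3 notification and the environment variables, the sketch below repeats the field extraction from lambda_handler on a trimmed-down sample event. It is only an illustration that can be run locally: the build_request function, the bucket name, the object key, and the table names are hypothetical placeholders, and it deliberately leaves out the datastore and helper modules, so it does not write to DynamoDB.

```python
import os
import urllib.parse

# Hypothetical placeholder values; in Lambda these come from the
# environment variables configured in the console.
os.environ.setdefault("DOCUMENTS_TABLE", "DocumentsTable")
os.environ.setdefault("OUTPUT_TABLE", "OutputTable")


def build_request(event):
    """Extract the same fields that lambda_handler reads from the event."""
    s3_record = event["Records"][0]["s3"]
    return {
        "bucketName": s3_record["bucket"]["name"],
        # Object keys arrive URL-encoded in S3 notifications,
        # so a space shows up as '+' and '(' as '%28'.
        "objectName": urllib.parse.unquote_plus(s3_record["object"]["key"]),
        "documentsTable": os.environ["DOCUMENTS_TABLE"],
        "outputTable": os.environ["OUTPUT_TABLE"],
    }


# Sample event holding only the fields the handler uses (assumed values).
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "scans/my+invoice%282%29.pdf"},
            }
        }
    ]
}

request = build_request(sample_event)
print("Input Object: {}/{}".format(request["bucketName"], request["objectName"]))
```

If either DOCUMENTS_TABLE or OUTPUT_TABLE is missing, the os.environ lookup raises a KeyError, which is a quick way to spot a misspelled variable name when the function fails right after an upload.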