PPTX file generator using AWS services and Python

In this article, I explain how to generate PPTX files using Python, AWS services and PPTX templates.

In our case we have already created the PPTX templates listed in figure 1. For each request we need two things: a way to identify the correct template, and the data to update it. To identify a template we can give each one a unique name; in this article I use the template name as the unique identifier. We also need to replace the red text with actual data (the red text is not part of the actual PPTX template; it marks the requirements). From this we can identify which PPTX operations we need to automate.

I have selected the following requirements to explain in this article:

  • Replace text placeholders according to our data.
  • Execute conditions.
  • Add Images.

Prerequisites

  • Working knowledge of Python
  • Working knowledge of the AWS serverless stack

First of all, we need a way to mark these operations and their positions in the PPTX templates. I decided to use a small syntax that can be added to the PPTX templates through text boxes and then executed by our code base.

Syntax

Replace text using data ( +++INS data path +++ )

As mentioned, this syntax replaces text according to the data set. If we have a data object like { "name": "John", "city": "Colombo" }, we can update the PPTX template as shown in figure 2.

Execute conditions using data ( +++IF (( condition data ))<< +++INS data.project_description +++ >>IF-END+++ )

We can use this syntax to execute if conditions. We specify the condition and the syntax to execute when the condition returns true (figure 3).

Add images ( +++IM data path +++ )

We can use this syntax to insert images, given as an image URL, an image path or base64-encoded data. In our case we used image paths in the data set: { "sample_image": "image url | image path | base64 data" } (figure 4).

These are only a few of the syntaxes and operations we used. Depending on your requirements, you can define any syntax and operations you need.
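To make the three markers concrete, here is a minimal sketch of how they might be recognised with regular expressions. The pattern names and the `classify` helper are hypothetical and not taken from the original code base; the patterns only assume the marker shapes shown above.

```python
import re

# Hypothetical patterns for the three markers described above.
IF_PATTERN = re.compile(
    r"\+\+\+IF\s*\(\((?P<condition>.*?)\)\)\s*<<(?P<body>.*?)>>\s*IF-END\s*\+\+\+",
    re.DOTALL,
)
INS_PATTERN = re.compile(r"\+\+\+INS\s+(?P<path>.+?)\s*\+\+\+")
IM_PATTERN = re.compile(r"\+\+\+IM\s+(?P<path>.+?)\s*\+\+\+")


def classify(text):
    """Return (kind, payload) for the first marker found in a text box, or None."""
    # IF is checked first because its body can contain a nested INS marker.
    match = IF_PATTERN.search(text)
    if match:
        return "IF", {
            "condition": match["condition"].strip(),
            "body": match["body"].strip(),
        }
    match = IM_PATTERN.search(text)
    if match:
        return "IM", {"path": match["path"]}
    match = INS_PATTERN.search(text)
    if match:
        return "INS", {"path": match["path"]}
    return None
```

The IF pattern stops at the first `))`, so a condition that itself ends in nested double parentheses would need a stricter parser than this sketch.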

Architecture

The solution architecture is shown above (figure 5). The service runs on AWS serverless, so scalability and availability are handled by AWS.

Now let’s go through the details regarding the architecture.

  1. User request.

    This request should include the PPTX template key, the data to update the PPTX template and an authentication token.

  2. Lambda function trigger

    API Gateway validates the user request using the authentication token and triggers the PPTX generation Lambda function. The function then reads the template and updates it according to the syntax we added to the PPTX template.

  3. Fetch PPTX template

    First we need to upload our PPTX template (with the syntax markers) to the S3 bucket. After that we can fetch the PPTX template using the requested template key. The PPTX generator function then updates the template according to our syntax and the requested data. We can also use this bucket to store images if we need to add images to a PPTX file.

  4. Upload generated PPTX file

    After the PPTX generation process, the generated PPTX file is stored in this S3 bucket.
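The four steps above can be sketched as a Lambda handler. This is a rough outline under stated assumptions, not the original implementation: the request field names (`template_key`, `data`), the `templates/` and `generated/` key prefixes, and the `render` callable are all hypothetical. The S3 client is passed in so the flow can be exercised without AWS; in a deployed function it would be `boto3.client("s3")`.

```python
import json


def build_handler(s3_client, bucket, render):
    """Create a Lambda handler bound to an S3 client, a bucket and a render function.

    `render(template_bytes, data) -> bytes` is a hypothetical callable that
    applies the template syntax. `s3_client` is expected to behave like
    boto3.client("s3").
    """

    def handler(event, context):
        # With the API Gateway proxy integration the body arrives as a JSON string.
        body = json.loads(event.get("body") or "{}")
        template_key = body.get("template_key")  # assumed field name
        data = body.get("data", {})  # assumed field name
        if not template_key:
            return {
                "statusCode": 400,
                "body": json.dumps({"error": "template_key is required"}),
            }

        # Step 3: fetch the template from S3 (key layout is an assumption).
        template = s3_client.get_object(
            Bucket=bucket, Key=f"templates/{template_key}.pptx"
        )
        template_bytes = template["Body"].read()

        # Apply the requested data to the template.
        output_bytes = render(template_bytes, data)

        # Step 4: store the generated file in the same bucket.
        output_key = f"generated/{context.aws_request_id}.pptx"
        s3_client.put_object(Bucket=bucket, Key=output_key, Body=output_bytes)
        return {"statusCode": 200, "body": json.dumps({"key": output_key})}

    return handler
```

Authentication is not handled here because, as described in step 2, API Gateway validates the token before the function is triggered.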

Setup AWS serverless environment

This service is written in Python because it is open source and fast (a PPTX file with 20 slides and more than 40 text replacements takes only 2 seconds).
We therefore need Python 3 installed.

After that we can install the python-pptx module.

pip install python-pptx

Now we can create a code base by adding a new folder.

.
├── .gitignore
├── handler.py
└── serverless.yml

We can add a .gitignore file, a handler.py file for our function and a serverless.yml file to define and create the AWS resources. We can also add a requirements.txt file to manage and install Python modules.

Let’s install the Serverless Framework using the following command:

npm install -g serverless

The virtual environment can be created using the following commands:

pip3 install virtualenv

virtualenv venv --python=python3

We can activate the virtual environment by running the following command:

source venv/bin/activate

Let's update the serverless.yml file by adding the S3 bucket creation, the permission to access the S3 bucket, and the creation of the PPTX generation Lambda function with its API endpoint. You can follow the sample serverless file shown in figure 6.

PPTX generation function

First we can start with a simple Python function like the one below (figure 7).

We can read the PPTX file content using the python-pptx module. After that we loop over every slide to execute the logic contained in the PPTX template.

Every slide contains shapes, so we need to loop over the shapes inside each slide and capture the text shapes, since we use text shapes to hold our syntax (figure 8).

In this case I have used if/else conditions, but we could replace them with a Python class.
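The slide and shape loop described above might look roughly like this. It is a sketch rather than the code from the figures: `process_presentation` and the `handlers` mapping are hypothetical names, while `slides`, `shapes`, `has_text_frame` and `text_frame.text` are python-pptx attributes.

```python
def iter_text_shapes(presentation):
    """Yield (slide, shape) for every shape that carries a text frame."""
    for slide in presentation.slides:
        for shape in slide.shapes:
            # has_text_frame is False for pictures, charts, connectors and so on.
            if shape.has_text_frame:
                yield slide, shape


def process_presentation(presentation, data, handlers):
    """Dispatch each text shape that contains a marker to its handler.

    `handlers` maps a marker prefix ("+++IF", "+++IM", "+++INS") to a
    hypothetical callable handler(slide, shape, data).
    """
    # list() takes a snapshot so handlers may add or remove shapes safely.
    for slide, shape in list(iter_text_shapes(presentation)):
        text = shape.text_frame.text
        # IF is tested first because an IF block contains a nested INS marker.
        if "+++IF" in text:
            handlers["+++IF"](slide, shape, data)
        elif "+++IM" in text:
            handlers["+++IM"](slide, shape, data)
        elif "+++INS" in text:
            handlers["+++INS"](slide, shape, data)


def load_presentation(path_or_stream):
    """Open a template from a file path or a file-like object."""
    # Imported lazily so the helpers above work on any object with the same attributes.
    from pptx import Presentation

    return Presentation(path_or_stream)
```

Keeping the dispatch in a dictionary of handlers is one way to move from the if/else chain toward the class-based design mentioned above: each syntax gets its own callable and new markers can be added without touching the loop.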

Now we can capture the syntaxes.

Let's move on to the logic behind each syntax.

Text replace

Here we get the actual value from the data object using the data path included in the syntax. We then replace the syntax with the actual value. If there is no value for the data path, we clear the syntax (figure 9).
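A minimal sketch of this replacement logic follows. The helper names are hypothetical, and two things are assumptions: the data path uses dots to descend into nested objects, and a leading "data." segment refers to the data object itself.

```python
import re

INS_PATTERN = re.compile(r"\+\+\+INS\s+(.+?)\s*\+\+\+")


def resolve_path(data, path):
    """Walk a dotted path such as "project.owner" through nested dicts.

    Assumption: a leading "data" segment refers to the data object itself
    and is skipped. Returns None when any segment is missing.
    """
    keys = path.strip().split(".")
    if keys and keys[0] == "data":
        keys = keys[1:]
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def replace_text(text, data):
    """Replace every INS marker with its value, or clear it when no value exists."""

    def substitute(match):
        value = resolve_path(data, match.group(1))
        return "" if value is None else str(value)

    return INS_PATTERN.sub(substitute, text)


def apply_to_shape(shape, data):
    """Rewrite the text run by run so the template's fonts and colors are kept."""
    for paragraph in shape.text_frame.paragraphs:
        for run in paragraph.runs:
            run.text = replace_text(run.text, data)
```

With the data object from the syntax section, `replace_text("Hi +++INS name +++", {"name": "John", "city": "Colombo"})` gives `"Hi John"`. One caveat: PowerPoint can split a marker across several runs when part of it is formatted differently, and this run-by-run sketch would not match such a marker.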

IF conditions

We need to use Python's built-in eval function to get the condition output (figure 10).

First we capture the condition. We then pass the condition and our data object to the eval_executor function, which returns the condition output.

After that, if the return value is True we can execute syntaxes inside the if condition. If it is False then we can clear the syntaxes.

In this example, we only consider text replacement inside the if condition but we can add any syntax as per the requirement.
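The condition handling could be sketched as below. The original `eval_executor` is not shown in the text, so this version is an assumption about how it might work: conditions are treated as Python expressions over a `data` name with dotted access, and `render_body` is a hypothetical callable that processes the syntax inside the block.

```python
import re
from types import SimpleNamespace

IF_PATTERN = re.compile(
    r"\+\+\+IF\s*\(\((.*?)\)\)\s*<<(.*?)>>\s*IF-END\s*\+\+\+", re.DOTALL
)


def to_namespace(value):
    """Convert nested dicts so conditions can use dotted access (data.budget)."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(item) for item in value]
    return value


def eval_executor(condition, data):
    """Evaluate a condition against the data object and return a bool.

    eval runs arbitrary expressions, so conditions should only come from
    trusted templates. Removing the builtins narrows what a condition can
    reach, but it is not a sandbox.
    """
    namespace = {"data": to_namespace(data)}
    try:
        return bool(eval(condition.strip(), {"__builtins__": {}}, namespace))
    except Exception:
        # A missing key or a malformed condition is treated as False.
        return False


def apply_conditions(text, data, render_body):
    """Keep the body of each IF block whose condition is true, otherwise clear it."""

    def substitute(match):
        condition, body = match.group(1), match.group(2)
        if eval_executor(condition, data):
            return render_body(body.strip(), data)
        return ""

    return IF_PATTERN.sub(substitute, text)
```

Passing the inner processing in as `render_body` keeps the IF logic independent of which syntaxes are allowed inside the block, which matches the point above that any syntax can be added there.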

Insert Images

To insert an image we need to set its left, top, width and height values; we can get that information from our data set (figure 12).
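As a rough illustration, image insertion with python-pptx might look like this. The `position` layout (a dict of inches) and the helper names are hypothetical; `slide.shapes.add_picture` is the python-pptx call, and it accepts a file-like object plus lengths as integers in English Metric Units (914400 per inch).

```python
import base64
import io
import os
import urllib.request

EMU_PER_INCH = 914400  # python-pptx measures lengths in English Metric Units


def open_image(source):
    """Return a binary file-like object for a URL, a local path or base64 data."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as response:
            return io.BytesIO(response.read())
    if os.path.exists(source):
        with open(source, "rb") as handle:
            return io.BytesIO(handle.read())
    # Anything else is assumed to be base64, optionally wrapped in a data URI.
    encoded = source.split(",", 1)[1] if source.startswith("data:") else source
    return io.BytesIO(base64.b64decode(encoded, validate=True))


def insert_image(slide, source, position):
    """Add a picture to the slide.

    `position` is a hypothetical dict with left, top, width and height in inches.
    """
    left, top, width, height = (
        int(position[key] * EMU_PER_INCH)
        for key in ("left", "top", "width", "height")
    )
    return slide.shapes.add_picture(
        open_image(source), left, top, width=width, height=height
    )
```

Setting both width and height stretches the image to that box; passing only one of them lets python-pptx keep the image's aspect ratio.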

Now we have updated the PPTX template file. We can save this file to an S3 bucket or to the file system.

This is a fast and accurate way to update PPTX files using our own custom logic.

Gayashan Galagedara

Associate Technical Lead