What it covers
In this blog post we will see how to use Google Cloud Platform’s Vision API to extract text from images in different languages. We will implement this in a container-based AWS Lambda function.
We will walk through setting up a GCP project and service account, then build, deploy and test the container Lambda. To follow along you need an AWS user with permissions to read from and write to S3, push images to ECR, create Lambda functions, and manage the IAM role and Secrets Manager secret used later in this post. On GCP you can use the free trial, but you must add billing information before you can use its services.
Why we need GCP to extract text from images
There are multiple ways to extract text from images, so what makes the GCP Vision API special? Our use case involves images whose text can be in different languages, and we need to extract the raw text to identify the language.
Some of the popular tools are:
- Pytesseract: a wrapper for Google's Tesseract-OCR Engine that can OCR text in non-English languages from images. It works for more than 100 languages, but it has two limitations: the language packs have to be downloaded from GitHub and installed manually, and the language has to be specified before extracting text from an image.
- Amazon Rekognition: it automatically detects and extracts text in images in all of its supported languages, without requiring a language parameter. This overcomes Pytesseract's limitations, but so far it only supports English, Arabic, Russian, German, French, Italian, Portuguese and Spanish, which makes it restrictive for our use case.
Google’s Cloud Vision API comes to the rescue here. It supports more than 50 actively maintained languages and does not require language hints. All the supported languages can be found here.
Setting up permissions for GCP
Set up your GCP project and authentication
- Create a project in the Google Cloud console here. Create an organization and select it for the project.
- Enable the Cloud Vision API
Search for Cloud Vision API under APIs & Services and click ENABLE. We need the Vision API to extract text from images.
- Create a service account under the new project. We will use this account to call GCP services.
- Download the private key for the service account
Once the service account is created, go to Manage Keys and select Add Key -> Create New Key -> JSON (key type).
This downloads the private key as a JSON file. We need this key to initialize the GCP client in Lambda. There are two ways to do that:
- Less secure: point the client directly at the key file path.
from google.oauth2.service_account import Credentials
credentials = Credentials.from_service_account_file("<PATH_TO_SERVICE_ACCOUNT_JSON>")
To do so, we need to add the key file to the Lambda container image, which bakes a long-lived secret into the image.
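- More secure: store the private key (the contents of the JSON file) in a vault such as AWS Secrets Manager and read the value from Lambda during initialization. Here <JSON_ACCOUNT_INFO> is the key JSON parsed into a dict, for example with json.loads:
credentials = Credentials.from_service_account_info(<JSON_ACCOUNT_INFO>)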
We will implement this second method in our example.
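A minimal sketch of the second method, assuming the secret is named service-account-key as in the create-secret command below and that boto3 is available (it ships with the AWS Lambda Python base images). The helper names are hypothetical; the Secrets Manager client is passed in so the parsing logic can be checked without AWS access.

```python
import json


def service_account_info_from_secret(secrets_client, secret_id="service-account-key"):
    """Fetch the service account key from Secrets Manager and parse it into a dict."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    info = json.loads(response["SecretString"])
    # Keys downloaded from the GCP console always carry "type": "service_account".
    if info.get("type") != "service_account":
        raise ValueError(f"secret {secret_id!r} is not a GCP service account key")
    return info


def gcp_credentials(info):
    """Build google-auth credentials from the parsed key (needs google-auth installed)."""
    from google.oauth2.service_account import Credentials

    return Credentials.from_service_account_info(info)


# Inside the Lambda (hypothetical usage):
#   import boto3
#   info = service_account_info_from_secret(boto3.client("secretsmanager"))
#   credentials = gcp_credentials(info)
#   project_id = info["project_id"]  # the project ID travels inside the key JSON
```

Because the project ID is a field of the key JSON, a single secret is enough to give the client both its credentials and its project.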
Add the private key to AWS Secrets Manager
We use the AWS CLI create-secret command to store the key in AWS Secrets Manager. Replace <PRIVATE_KEY_JSON_CONTENT> with the contents of the downloaded key JSON, passed as the --secret-string value:
aws secretsmanager create-secret --name "service-account-key" --secret-string '<PRIVATE_KEY_JSON_CONTENT>'
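As an alternative to pasting the JSON into a shell string, the downloaded key file can be loaded, checked, and stored with boto3's create_secret. This is a sketch under stated assumptions: the required-field check, the helper names and the key file name in the usage comment are our own, and the default secret name mirrors the CLI command above.

```python
import json

# Fields present in every service account key downloaded from the GCP console.
REQUIRED_FIELDS = {"type", "project_id", "private_key_id", "private_key", "client_email"}


def load_key_file(path):
    """Read the downloaded key JSON and reject anything that is not a service account key."""
    with open(path, encoding="utf-8") as fh:
        info = json.load(fh)
    missing = REQUIRED_FIELDS - set(info)
    if missing or info["type"] != "service_account":
        raise ValueError(f"{path} is not a service account key (missing: {sorted(missing)})")
    return info


def store_key(secrets_client, info, name="service-account-key"):
    """Same effect as `aws secretsmanager create-secret --name ... --secret-string ...`."""
    return secrets_client.create_secret(Name=name, SecretString=json.dumps(info))


# Usage (hypothetical key file name):
#   import boto3
#   store_key(boto3.client("secretsmanager"), load_key_file("service-account-key.json"))
```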
- Make sure billing is enabled on your project, even if you're using the free trial. You can create a new billing account here.
Create a lambda function to get text from image
We are going to create a container-based Lambda function that uses the GCP Vision package to extract text from an image stored in an AWS S3 bucket. We have set up all the permissions needed in GCP and stored the service account key in AWS Secrets Manager; that key is all we need to make the GCP Vision API call.
- Create a folder for all the files the container Lambda needs, and change into it
mkdir gcp-in-lambda && cd gcp-in-lambda
Before writing the Lambda function, we need to set some environment variables used when deploying it. There are two ways to do this:
- Set them with export in the shell. They are only visible to that shell and the processes it starts:
export AWS_REGION=<YOUR_AWS_REGION>
export AWS_ACCOUNT_ID=<YOUR_AWS_ACCOUNT_ID>
or
- Use direnv to load the environment variables automatically inside the project directory. The installation steps can be found here. For macOS, run:
- Add lambda_function.py to the folder
Now let’s look at the main piece of code. Here,
- We store the image in AWS S3 and pass the bucket and file path in the Lambda trigger event
- We need the GCP project ID and the service account key to initialize the GCP client; both come from the key stored in Secrets Manager
- The GCP Vision client runs document_text_detection on the image and returns the extracted content as a string
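A minimal sketch of what such a handler could look like, under these assumptions: the trigger event carries hypothetical bucket and key fields, the secret is named service-account-key, and the GCP_KEY_SECRET_ID environment variable is our own addition. The Vision call, ImageAnnotatorClient.document_text_detection, is the google-cloud-vision helper for the DOCUMENT_TEXT_DETECTION feature.

```python
import json
import os

# Hypothetical: lets the secret name be overridden without rebuilding the image.
SECRET_ID = os.environ.get("GCP_KEY_SECRET_ID", "service-account-key")


def parse_event(event):
    """Return (bucket, key) from a trigger event such as {"bucket": "...", "key": "..."}."""
    try:
        return event["bucket"], event["key"]
    except KeyError as exc:
        raise ValueError(f"event is missing {exc}") from exc


def detect_text(image_bytes, service_account_info):
    """Run Vision DOCUMENT_TEXT_DETECTION and return the full extracted text."""
    from google.cloud import vision
    from google.oauth2.service_account import Credentials

    credentials = Credentials.from_service_account_info(service_account_info)
    client = vision.ImageAnnotatorClient(credentials=credentials)
    response = client.document_text_detection(image=vision.Image(content=image_bytes))
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python base image

    bucket, key = parse_event(event)
    image_bytes = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    secret = boto3.client("secretsmanager").get_secret_value(SecretId=SECRET_ID)
    text = detect_text(image_bytes, json.loads(secret["SecretString"]))
    # ensure_ascii=False keeps non-Latin scripts (e.g. Chinese) readable in the response.
    return {"statusCode": 200, "body": json.dumps({"text": text}, ensure_ascii=False)}
```

For brevity the Vision client is created on every invocation; keeping it in a module-level variable would let warm invocations reuse it.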
- Create the requirements file listing the packages the Lambda needs. For this example you only need google-cloud-vision.
echo google-cloud-vision==2.7.1 > requirements.txt
- Create the Dockerfile
- Create the script to build and push the Lambda Docker image
This script creates a new AWS ECR repository named container-lambda-for-gcp-vision, builds the Docker image and pushes it to this repository.
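The steps such a script performs can be sketched in Python as a list of standard AWS CLI and Docker commands, built from the AWS_ACCOUNT_ID and AWS_REGION variables set earlier. This is a hypothetical helper, not a reproduction of the post's script; the latest tag and the pinned platform are assumptions.

```python
import os

REPO_NAME = "container-lambda-for-gcp-vision"


def ecr_registry(account_id, region):
    """Registry host for a private ECR registry."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def build_and_push_commands(account_id, region, repo=REPO_NAME, tag="latest"):
    """Return the image URI and the shell commands that create the repo, build and push."""
    registry = ecr_registry(account_id, region)
    image_uri = f"{registry}/{repo}:{tag}"
    commands = [
        f"aws ecr create-repository --repository-name {repo} --region {region}",
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Lambda defaults to x86_64, so pin the platform when building on ARM machines.
        f"docker build --platform linux/amd64 -t {repo}:{tag} .",
        f"docker tag {repo}:{tag} {image_uri}",
        f"docker push {image_uri}",
    ]
    return image_uri, commands


if __name__ == "__main__":
    account = os.environ.get("AWS_ACCOUNT_ID", "<YOUR_AWS_ACCOUNT_ID>")
    region = os.environ.get("AWS_REGION", "<YOUR_AWS_REGION>")
    uri, cmds = build_and_push_commands(account, region)
    print("\n".join(cmds))
    print(f"Image URI: {uri}")
```

The printed image URI is the value the Lambda function is created from in the next step.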
- Create the container Lambda using the AWS CLI
Now that the final Lambda container image is in AWS ECR, let’s create a container Lambda that references our image URI. First we create a basic execution role for the Lambda; afterwards we add the additional permissions it needs from the AWS Console.
After running these two commands, go to the AWS Lambda console, where you should see the newly created image-text-extraction function.
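For readers who prefer the SDK, the same two steps (create the role, then create the function from the image URI) can be sketched with boto3. The role and function names come from this post and AWSLambdaBasicExecutionRole is the AWS managed policy that lets a function write CloudWatch logs; the helper names and the trust-policy layout are assumptions of this sketch.

```python
import json

ROLE_NAME = "lambda_execution"
FUNCTION_NAME = "image-text-extraction"

# Trust policy that lets the Lambda service assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}
    ],
}


def create_role(iam_client, role_name=ROLE_NAME):
    """Create the execution role, attach basic logging permissions, return its ARN."""
    role = iam_client.create_role(RoleName=role_name, AssumeRolePolicyDocument=json.dumps(TRUST_POLICY))
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
    )
    return role["Role"]["Arn"]


def create_container_function(lambda_client, image_uri, role_arn, name=FUNCTION_NAME):
    """Create a Lambda whose code is the container image at image_uri."""
    return lambda_client.create_function(
        FunctionName=name,
        PackageType="Image",
        Code={"ImageUri": image_uri},
        Role=role_arn,
    )
```

A freshly created role can take a few seconds to propagate, so create_function may briefly fail with an error saying the role cannot be assumed by Lambda; retrying after a short wait usually resolves it.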
- To add more permissions, go to Configuration -> Permissions and click the role name lambda_execution.
Select Add permissions -> Attach policies.
Here, search for the policies AmazonS3ReadOnlyAccess, SecretsManagerReadWrite and CloudWatchFullAccess and attach them to the role. We need S3 permissions to read the image from S3, Secrets Manager permissions to access the GCP service account key and CloudWatch access to write the Lambda logs.
Let’s also raise the Lambda timeout from the default 3 seconds to 5 minutes, giving it enough time to make the text extraction API call.
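Both console steps above can also be scripted. A hedged boto3 sketch, assuming the role and function names used in this post: it attaches the three managed policies by ARN and raises the timeout to 300 seconds (5 minutes). The helper name is hypothetical.

```python
# AWS managed policies named in the console step above.
POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/SecretsManagerReadWrite",
    "arn:aws:iam::aws:policy/CloudWatchFullAccess",
]


def grant_permissions_and_timeout(iam_client, lambda_client, role_name="lambda_execution",
                                  function_name="image-text-extraction", timeout_seconds=300):
    """Attach the managed policies to the execution role and raise the function timeout."""
    for policy_arn in POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    # Lambda accepts timeouts up to 900 seconds; 300 seconds is the 5 minutes used here.
    return lambda_client.update_function_configuration(FunctionName=function_name, Timeout=timeout_seconds)


# Usage:
#   import boto3
#   grant_permissions_and_timeout(boto3.client("iam"), boto3.client("lambda"))
```

The handler only reads the secret, so a narrower policy granting secretsmanager:GetSecretValue on that one secret could stand in for SecretsManagerReadWrite.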
- OPTIONAL: if you need to update the Lambda function, run the following commands. They build and push the updated Docker image to ECR and then point the existing Lambda function at the new image.
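Once the updated image has been pushed, pointing the function at it can be sketched with boto3's update_function_code followed by the function_updated waiter, so the next invocation runs the new code. The function name is the one used in this post; the image URI is whatever the build script pushed, and the helper name is hypothetical.

```python
def redeploy(lambda_client, image_uri, function_name="image-text-extraction"):
    """Point the container Lambda at a newly pushed image and wait for the update to finish."""
    response = lambda_client.update_function_code(FunctionName=function_name, ImageUri=image_uri)
    # Blocks until the function's LastUpdateStatus is Successful (raises on failure or timeout).
    lambda_client.get_waiter("function_updated").wait(FunctionName=function_name)
    return response


# Usage:
#   import boto3
#   redeploy(boto3.client("lambda"), "<ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/container-lambda-for-gcp-vision:latest")
```

This step is needed even when the tag is unchanged: Lambda resolves the tag to an image digest when the function code is set, so pushing a new image under the same tag does not by itself change what the function runs.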
- Now that everything is ready, let’s upload our test images to AWS S3 and invoke the Lambda function.
For this example we’ll use two images, one in English and another in Simplified Chinese. Download the images (english text, chinese text) and copy them to AWS S3.
Once the files are in the S3 bucket, let’s invoke the Lambda function.
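A sketch of the upload-and-invoke step with boto3. It assumes the handler reads hypothetical bucket and key fields from the event (adjust them to match your lambda_function.py); the images/ key prefix and the helper names are also our own choices.

```python
import json
import os


def upload_images(s3_client, bucket, paths, prefix="images/"):
    """Upload local image files to S3 and return the object keys."""
    keys = []
    for path in paths:
        key = prefix + os.path.basename(path)
        s3_client.upload_file(path, bucket, key)
        keys.append(key)
    return keys


def invoke_extraction(lambda_client, bucket, key, function_name="image-text-extraction"):
    """Synchronously invoke the Lambda for one image and return its decoded response."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps({"bucket": bucket, "key": key}).encode("utf-8"),
    )
    return json.loads(response["Payload"].read())


# Usage (hypothetical bucket and file names):
#   import boto3
#   keys = upload_images(boto3.client("s3"), "<YOUR_BUCKET>", ["english.png", "chinese.png"])
#   for key in keys:
#       print(invoke_extraction(boto3.client("lambda"), "<YOUR_BUCKET>", key))
```

Since the function may run longer than botocore's default 60-second read timeout, you may want to create the Lambda client with botocore.config.Config(read_timeout=300).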
You can see the logs in AWS CloudWatch: look for the Lambda function under Log groups here.
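Lambda writes its logs to a log group named /aws/lambda/<function name>, so they can also be pulled with the CloudWatch Logs filter_log_events API. A minimal sketch; the helper name and the 15-minute lookback window are arbitrary assumptions.

```python
import time


def recent_log_messages(logs_client, function_name="image-text-extraction", minutes=15):
    """Return log messages the function wrote in the last `minutes` minutes (first page only)."""
    start_ms = int((time.time() - minutes * 60) * 1000)  # CloudWatch expects epoch milliseconds
    response = logs_client.filter_log_events(
        logGroupName=f"/aws/lambda/{function_name}",
        startTime=start_ms,
    )
    return [event["message"] for event in response["events"]]


# Usage:
#   import boto3
#   for line in recent_log_messages(boto3.client("logs")):
#       print(line, end="")
```

Only the first page of results is returned; for busier functions, boto3's get_paginator("filter_log_events") walks the remaining pages.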
Conclusions
In this blog post we demonstrated how to use the GCP Vision service in AWS Lambda. A similar approach can be used to extract text from images in any language supported by the GCP Vision API, and to call other GCP services from Lambda.
To read more about GCP services click here.
To understand how to use container based AWS Lambda check out this blog.