AWS Public Sector Blog

Scaling intelligent document processing workflows with AWS AI services

Governments and the organizations that provide technology solutions for the benefit of citizens are continuously presented with evolving laws and compliance requirements. For example, a recent directive from the US Government’s Office of Management and Budget (OMB) and National Archives and Records Administration (NARA) requires all US federal government agencies to transition to electronic records by June 30, 2024. These changes affect document-based workflows, both existing and newly developed, and require timely processing of information.

Organizations can turn to Amazon Web Services (AWS) to implement intelligent document processing (IDP) workflows. Services like Amazon Textract can extract text and Amazon Comprehend can efficiently derive valuable insights, categorize documents, and then route information to appropriate departments. IDP uses artificial intelligence (AI) and machine learning (ML) to inform natural language processing (NLP), optical character recognition (OCR), and computer vision that can read, extract, collect, label, and interpret document data so it can be put to use digitally.

As the daily volume of document submissions increases for government organizations, IDP solution architectures must absorb spikes in requests without creating delays or other impacts for users. In cases where the processing volume exceeds the limits of the resources available in an AWS Region, organizations can distribute the workload across multiple Regions to increase document processing throughput.

This post presents high-level architecture guidance built around Amazon Comprehend to create a distributed document processing workload that can overcome the challenges of unpredictable request patterns.

Solution overview: Scaling document processing workflows with AWS

AWS provides a set of quotas that limit the number of resources a service can use in any given AWS Region. These quotas protect AWS account holders from unintentional spend and help provide reliable, highly available services to all customers. When the incoming volume of documents surges in an unpredictable pattern, the calls to the document processing services increase as well. This spike in usage may exceed the allocated quota, and the service responds with a throttling error, impacting the workload’s capabilities.

As part of a solution design, account owners determine the number of resources needed to achieve the desired throughput and accommodate the maximum anticipated document volumes, then submit requests for quota adjustments up to the available limits. In situations where the requested quota increases exceed the Region limits, organizations can reach the target processing throughput by distributing the workload across multiple AWS Regions.

When processing documents, a typical architecture for extracting and classifying information with Amazon Comprehend is as follows.

Figure 1. Architecture for extracting and classifying information with Amazon Comprehend.

First, the solution receives incoming documents, which are uploaded and stored in an input Amazon Simple Storage Service (Amazon S3) bucket. The action triggers an AWS Lambda document classifier function through event notifications. The AWS Lambda function sends a request to the Amazon Comprehend Custom Classification API, which extracts the text from the document and produces a label based on the content. The document classifier returns the classification results. The document classifier function can then trigger the downstream process and route the information to the desired destination.
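To make the classification step concrete, here is a minimal sketch of how the document classifier function might pick a label from the Amazon Comprehend Custom Classification response. The helper name, the class names, and the confidence threshold are illustrative assumptions, not values from this post; the response shape follows Comprehend's ClassifyDocument output.

```python
# Illustrative helper for the document classifier function. The response
# dict mirrors the shape returned by Amazon Comprehend's ClassifyDocument
# API ({"Classes": [{"Name": ..., "Score": ...}, ...]}); the threshold and
# label names below are assumptions for the sketch.

def top_label(response, min_score=0.5):
    """Return the highest-scoring class name, or None if no class
    clears the confidence threshold."""
    classes = response.get("Classes", [])
    if not classes:
        return None
    best = max(classes, key=lambda c: c["Score"])
    return best["Name"] if best["Score"] >= min_score else None

# Example response in the ClassifyDocument shape (hypothetical labels)
sample = {
    "Classes": [
        {"Name": "TAX_FORM", "Score": 0.91},
        {"Name": "PERMIT_APPLICATION", "Score": 0.07},
    ]
}
print(top_label(sample))  # TAX_FORM
```

The downstream routing step can then branch on the returned label, or fall back to a manual review path when the function returns None.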

However, when the rate of calls from the AWS Lambda document classifier function exceeds the quota, and therefore the available processing bandwidth, of the Amazon Comprehend Custom Classification API, the function receives a throttling error.

To accommodate the desired request volumes, this solution introduces the concepts of workload distribution and rate limiting.

Rate-limiting the requests

The scale at which an application can process requests is measured with the transactions per second (TPS) metric. Services like Amazon Comprehend and Amazon Textract are bound to a maximum TPS by the quotas of their API operations. To limit the rate of requests, the following architecture introduces an intermediary queue that buffers the requests and lets users define the rate at which to process the incoming documents.

Figure 2 shows an updated architecture that incorporates the ability to rate limit requests.

Figure 2. Rate-limit implementation architecture to support intelligent document processing workflows.

In this rate-limited architecture, the solution uses an Amazon Simple Queue Service (Amazon SQS) queue to temporarily store the new document event notifications from the Amazon S3 input bucket. The AWS Lambda document classifier function is configured with the Amazon SQS queue as its event source. AWS Lambda allows you to set a concurrency limit on the function, which should be equal to or lower than the desired TPS. This concurrency limit caps the rate at which the function executes, limiting the rate of calls to the Amazon Comprehend service API.
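The relationship between a TPS target and a Lambda concurrency limit can be estimated with a back-of-the-envelope calculation based on Little's law: concurrency is roughly the arrival rate multiplied by the average invocation duration. The sketch below and its numbers are assumptions for illustration, not values prescribed by this post.

```python
import math

def reserved_concurrency(target_tps, avg_duration_sec):
    """Estimate the Lambda reserved concurrency needed to sustain a TPS
    target, using Little's law: concurrency = arrival rate x service time.
    Round up so the cap never starves the target throughput."""
    return math.ceil(target_tps * avg_duration_sec)

# Hypothetical numbers: if each classification call averages 0.5 seconds
# and the Region quota allows 80 TPS, about 40 concurrent executions
# saturate the quota without exceeding it.
print(reserved_concurrency(80, 0.5))  # 40
```

Note that when invocations average one second or less, the resulting concurrency is equal to or lower than the TPS target, which matches the guidance above.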

Workload distribution across AWS Regions and AWS accounts

The rate-limiting architecture in Figure 2 helps customers make sure they don’t trigger a throttling error, but keeping the architecture within one AWS Region still limits the number of requests the solution can process. By distributing the workload across multiple AWS Regions and accounts, the solution can achieve a higher total TPS than a single AWS Region or AWS account can offer, increasing the total document processing capacity. For example, if the required TPS for an application is 200, and a single AWS Region and AWS account can provide 80, then utilizing three AWS Regions with the same capacity results in a total available TPS of 240.

AWS Region          Maximum TPS
us-east-1           80
us-east-2           80
us-west-1           80
Total maximum TPS   240

The following architecture includes load distribution logic in the AWS Lambda document classifier function to distribute the solution across three AWS Regions.

Figure 3. Load distribution architecture to support intelligent document processing across multiple AWS Regions.

The architecture in Figure 3 adds request distribution logic to the AWS Lambda document classifier function. By implementing this logic and a set of routing rules, the function can direct the desired volume of calls to each AWS Region.

For example, to implement an even split across all AWS Regions, the document classifier function can use a randomized operation to produce a number between 1 and 100, representing a percentage of the total available TPS, then send the request to the destination corresponding to that number’s range. In Figure 3, numbers in the range 1-33 route to the us-west-1 Region, numbers in the range 34-66 route to the us-east-1 Region, and the remaining values route to the third destination, in this case us-east-2.
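The random routing described above can be sketched in a few lines. The ranges match the even split in Figure 3; the function name is an illustrative assumption.

```python
import random

# Region table matching Figure 3's even split: ranges 1-33, 34-66, 67-100.
ROUTES = [
    (33, "us-west-1"),
    (66, "us-east-1"),
    (100, "us-east-2"),
]

def pick_region(n=None):
    """Map a number in 1..100 onto a destination Region.
    If n is not supplied, draw it at random, which yields an
    approximately even split over many requests."""
    if n is None:
        n = random.randint(1, 100)
    for upper, region in ROUTES:
        if n <= upper:
            return region
    raise ValueError("n must be between 1 and 100")

print(pick_region(10))  # us-west-1
print(pick_region(50))  # us-east-1
print(pick_region(90))  # us-east-2
```

Adjusting the upper bounds in the table skews the split, for example when one Region has a larger quota than the others.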

This example uses random distribution logic. For precise allocation of requests across their destinations, you can use a persistent tracking mechanism, such as counters in Amazon DynamoDB.
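To illustrate the counter idea, the following is a simple in-memory round-robin dispatcher. It is a local stand-in, not the DynamoDB implementation: a production version would keep the counter in a DynamoDB item (for example, incremented with an atomic update) so that state is shared across concurrent function instances.

```python
from itertools import count

class RoundRobinRouter:
    """In-memory stand-in for a persistent counter: cycles through Regions
    so requests are split exactly evenly. In the architecture above, the
    counter would live in DynamoDB so all function instances share it."""

    def __init__(self, regions):
        self.regions = regions
        self._counter = count()  # monotonically increasing request counter

    def next_region(self):
        return self.regions[next(self._counter) % len(self.regions)]

router = RoundRobinRouter(["us-west-1", "us-east-1", "us-east-2"])
print([router.next_region() for _ in range(6)])
# ['us-west-1', 'us-east-1', 'us-east-2', 'us-west-1', 'us-east-1', 'us-east-2']
```

Unlike the random approach, this guarantees an exact split, at the cost of a read-and-increment round trip to the shared counter on every request.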

Conclusion

These reference architectures can help organizations scale document processing workflows to support sudden spikes in processing needs and accelerate the transition to digital document processing. Intelligent document processing can support more accurate data collection, lower capture costs, and the ability to retain and recall information electronically.

For a demonstration of how you can use intelligent document processing on AWS, see the video Automate your public sector document processing with AWS AI.

Find more solutions and guidance for building intelligent document processing solutions in the AWS Solutions Library.

Learn how more than 7,500 government agencies around the world use AWS to support their missions at the AWS for Government hub.

Subscribe to the AWS Public Sector Blog newsletter to get the latest in AWS tools, solutions, and innovations from the public sector delivered to your inbox, or contact us.

Sebastian Leks

Sebastian Leks is a product acceleration solutions architect at Amazon Web Services (AWS), where he guides customers on their artificial intelligence (AI) and machine learning (ML) product development journey, promoting best practices for building performant and scalable solutions. Prior to joining AWS, Sebastian held multiple senior technology leadership roles in education, healthcare, and energy industry organizations. In his spare time, he enjoys water sports activities and traveling with his family.

Aneel Murari

Aneel Murari is a senior serverless specialist solutions architect for Amazon Web Services (AWS), based in the Washington, D.C. area. He helps customers design scalable, highly performant, secure, and cost-effective solutions on AWS. He is passionate about using event-driven architecture and serverless solutions to solve various customer challenges. He has over 18 years of software development and architecture experience and holds a graduate degree in computer science.