Detect EventBridge target failure: Part 1 - with dead letter queue

AWS Serverless EventBridge SQS DLQ
Pubudu Jayawardana
Cloud Engineer, AWS Community Builder

Intro

When EventBridge delivers messages to its targets, delivery can fail for many reasons: permission issues, rate limits, an unavailable target, or even a glitch in AWS itself, just to name a few.

Whatever the reason, it is ideal to be notified when messages cannot be delivered, and to know why. In this blog post, I will discuss how a dead letter queue (DLQ) can be used to get notified when EventBridge fails to deliver messages to its target.

Dead letter queues

Dead letter queues are the unsung heroes of event-driven architecture 😀. They are easy to set up and manage, yet they greatly improve the resilience of a system. They are also very cost-effective.

Let’s see how we can capture the target delivery failures in EventBridge using a dead letter queue.

Please note that EventBridge supports DLQs at two “levels”: an event bus can have a DLQ of its own, or you can set a DLQ on a per-target basis. Let’s discuss the differences.

DLQ on EventBridge bus level

An EventBridge bus can have a DLQ of its own. However, it only captures errors related to KMS encryption: EventBridge sends events that aren’t successfully encrypted to this DLQ.

In the EventBridge console, this DLQ setting is visible only when a customer managed KMS key is used to encrypt events. In fact, it is part of the Encryption settings.
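For comparison with the target-level option, here is a minimal sketch of where the bus-level DLQ sits when a bus is created through the API instead of the console. It assumes Python with boto3, whose `events.create_event_bus` call accepts `KmsKeyIdentifier` and `DeadLetterConfig`; the helper name, bus name, and ARNs are hypothetical placeholders, and the function only builds the request, so it runs without AWS credentials.

```python
def bus_level_dlq_params(bus_name: str, kms_key_arn: str, dlq_arn: str) -> dict:
    """Build keyword arguments for boto3's events.create_event_bus().

    Hypothetical helper. The bus-level DLQ is configured next to the customer
    managed KMS key because it only receives events that were not successfully
    encrypted; target delivery failures never reach this queue.
    """
    return {
        "Name": bus_name,
        "KmsKeyIdentifier": kms_key_arn,  # customer managed key for the bus
        "DeadLetterConfig": {"Arn": dlq_arn},  # SQS queue for encryption failures only
    }
```

With credentials configured, the result could be passed as `boto3.client("events").create_event_bus(**params)`.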

Image: DLQ for Event bus only available when customer managed KMS is in use.

However, this DLQ will NOT capture any target-related failures, so we cannot use it for our purpose.

DLQ on EventBridge target level

At the target level, we can configure an SQS queue into which EventBridge puts any message it cannot deliver to that target.
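EventBridge also needs permission to send messages to that queue. According to the EventBridge DLQ documentation, the console adds this permission for you, while API or IaC setups need an SQS queue policy that lets the `events.amazonaws.com` service principal call `sqs:SendMessage`, scoped with `aws:SourceArn` to the rules that use the queue. The sketch below builds such a policy in Python; the helper name and all ARNs are hypothetical.

```python
import json


def dlq_queue_policy(dlq_arn: str, rule_arns: list[str]) -> str:
    """Return an SQS queue policy (as a JSON string) letting EventBridge write to the DLQ.

    Hypothetical helper. The ArnEquals condition limits the grant to the
    listed rules, so other rules cannot use this queue as their DLQ.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowEventBridgeRulesToSendToDlq",
                "Effect": "Allow",
                "Principal": {"Service": "events.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": dlq_arn,
                # Multiple values for one condition key are ORed: any listed rule matches.
                "Condition": {"ArnEquals": {"aws:SourceArn": rule_arns}},
            }
        ],
    }
    return json.dumps(policy)
```

The returned string could then be applied with `boto3.client("sqs").set_queue_attributes(QueueUrl=..., Attributes={"Policy": policy_json})`.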

Image: DLQ on target.

Since one rule can have more than one target, each target can have its own DLQ as well. You can use the same SQS queue as the DLQ for all targets, but you have to configure it on each target separately. This may sound like repetitive work, but with an infrastructure as code tool such as CDK or CloudFormation, it is not complex.
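As a sketch of that per-target configuration outside of CDK or CloudFormation, the Python helper below builds the `Targets` list for boto3's `events.put_targets`, attaching one shared DLQ to every target. The helper name, target IDs, and ARNs are hypothetical; the point it illustrates is that `DeadLetterConfig` belongs to each target entry, not to the rule.

```python
def targets_with_shared_dlq(targets: dict[str, str], dlq_arn: str) -> list[dict]:
    """Build the Targets argument for boto3's events.put_targets().

    Hypothetical helper. `targets` maps a target Id of your choosing to the
    target ARN. The same DLQ ARN is repeated on every entry because the
    dead-letter configuration is stored per target.
    """
    return [
        {
            "Id": target_id,
            "Arn": target_arn,
            "DeadLetterConfig": {"Arn": dlq_arn},
        }
        for target_id, target_arn in targets.items()
    ]
```

A call might then look like `events.put_targets(Rule="example-rule", EventBusName="example-bus", Targets=targets_with_shared_dlq(...))`, where the rule and bus names are placeholders.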

How it works

Image: High level architecture.
  1. The event bus tries to deliver a message to its target (here, an SQS queue) via an EventBridge rule.

  2. Let’s assume there is a permission issue, and the message cannot be delivered.

  3. Then, EventBridge will put the message into the DLQ configured for this specific target.

  4. In CloudWatch, an alarm is set up to trigger whenever there is a message in the DLQ.

  5. When the failed message lands in the DLQ, the alarm triggers. An SNS topic is configured as the alarm action.

  6. When the alarm action publishes a message to the SNS topic, SNS sends a notification about the failure to all subscribers.
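Steps 4 and 5 can be sketched as a single CloudWatch alarm on the DLQ's `ApproximateNumberOfMessagesVisible` metric (namespace `AWS/SQS`) with the SNS topic as its alarm action. The helper below builds keyword arguments for boto3's `cloudwatch.put_metric_alarm`; the helper and alarm names, the statistic, and the default period, evaluation periods, and threshold are assumptions for illustration, not the values from the post's SAM template.

```python
def dlq_alarm_params(
    queue_name: str,
    topic_arn: str,
    period: int = 60,
    evaluation_periods: int = 1,
    threshold: float = 0.0,
) -> dict:
    """Build kwargs for boto3's cloudwatch.put_metric_alarm().

    Hypothetical helper. With threshold 0 and GreaterThanThreshold, a
    datapoint showing at least one visible message in the DLQ breaches.
    """
    return {
        "AlarmName": f"{queue_name}-not-empty",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": period,  # seconds per datapoint
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # Periods with no data (e.g. an inactive queue) count as OK, not ALARM.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],  # SNS topic that notifies the subscribers
    }
```

Raising `evaluation_periods` or `period` trades notification speed for fewer alerts during a burst of failures.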

Try this yourself

I have created an AWS SAM template so that you can try this scenario in your own AWS account.

  1. Clone the GitHub repository: https://github.com/pubudusj/event-bridge-target-failure-detection-with-dlq

  2. Deploy the stack using the command below:

sam deploy \
  --template-file template.yaml \
  --stack-name event-bridge-target-failure-detection-with-dlq \
  --capabilities CAPABILITY_IAM \
  --no-confirm-changeset \
  --parameter-overrides NotificationEmail=[YourEmailAddress]
  3. Here, set NotificationEmail to your email address so that you receive a notification in your inbox when the target fails.

  4. Once the stack is deployed, you will receive an SNS subscription confirmation email. You need to confirm it in order to receive notifications.

  5. Then, publish a message to the created event bus with the source set to xyzcorp.

  6. The message will match the rule, and EventBridge will try to deliver it to the target.

  7. I have intentionally blocked the permission to publish to the target to simulate the failure.

  8. In a moment, you should get an email with the alarm status.

  9. If you check the messages in the DLQ, you can see the failed message. Depending on the cause, its message attributes may also show the reason for the failure.

Image: Message attributes of a failed message in DLQ.
  10. You can configure the alarm's threshold, period, and evaluation periods as needed to control how often you are notified in case of a failure. https://github.com/pubudusj/event-bridge-target-failure-detection-with-dlq/blob/main/template.yaml#L61-L63
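Publishing the test event and reading the failure reason from the DLQ can also be scripted. The sketch below assumes Python with boto3: one helper builds an entry for `events.put_events` with the source `xyzcorp` (the bus name, `DetailType`, and detail payload are hypothetical, since the post only states that the rule matches on the source), and another pulls the failure attributes out of a message returned by `sqs.receive_message(..., MessageAttributeNames=["All"])`. The attribute names `RULE_ARN`, `TARGET_ARN`, `ERROR_CODE`, and `ERROR_MESSAGE` follow the EventBridge DLQ documentation; which ones appear depends on the failure, so missing keys are skipped.

```python
import json


def build_test_event(bus_name: str, detail: dict) -> dict:
    """Build one entry for boto3's events.put_events(Entries=[...]).

    Hypothetical helper: the rule in the post matches source "xyzcorp".
    EventBridge needs Source, DetailType and Detail to accept the event;
    the DetailType value here is arbitrary.
    """
    return {
        "EventBusName": bus_name,
        "Source": "xyzcorp",
        "DetailType": "target-failure-test",  # hypothetical
        "Detail": json.dumps(detail),  # Detail must be a JSON string
    }


# Failure attributes EventBridge may attach to a message in a target DLQ.
FAILURE_ATTRIBUTES = ("RULE_ARN", "TARGET_ARN", "ERROR_CODE", "ERROR_MESSAGE")


def failure_reason(sqs_message: dict) -> dict[str, str]:
    """Extract EventBridge failure attributes from one received SQS message.

    Expects the message shape returned by boto3's sqs.receive_message when
    called with MessageAttributeNames=["All"]; absent attributes are skipped.
    """
    attributes = sqs_message.get("MessageAttributes", {})
    return {
        name: attributes[name]["StringValue"]
        for name in FAILURE_ATTRIBUTES
        if name in attributes
    }
```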

Summary

  1. An EventBridge bus can have a DLQ, but it serves a different purpose (KMS encryption failures) and cannot capture any target failures.

  2. You can use a target-level dead letter queue to capture any messages that cannot be delivered to the target. Based on the number of messages in the queue, you can get notified using a CloudWatch alarm and SNS. However, you need to configure the DLQ on each EventBridge target separately, so using an IaC tool for this is convenient.

  3. I will discuss another solution to achieve the same in part 2 of this blog post.

Resources

  1. Using dead-letter queues to process undelivered events in EventBridge https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-rule-dlq.html

👋 I regularly create content on AWS and Serverless, and if you’re interested, feel free to follow / connect with me so you don’t miss out on my latest posts!
