In this article, we’re going to take a tour of the beating heart of our serverless architecture: our central Amazon EventBridge event bus. We’ll look at how domain events flow through it, how we get those events out of our database safely, and how EventBridge lets us react to both our own custom events and native AWS service events, all without tightly coupling anything together.
If you’ve read our previous articles on securing video with CloudFront signed URLs, event-driven closed captions, or automated certificate generation, you’ve already seen the event bus in action. This article zooms out and tells the whole story for practitioners getting started with EDAs.
“The goal is simple: when something meaningful happens in our system (a domain event), we publish an event describing it, and any number of downstream processes can react to that event and consume independently, asynchronously, and without the producer ever knowing they exist.” - Lee
The Problem We’re Solving 🎯
As a platform grows, the number of things that need to happen “when X occurs” grows with it. Consider a single business moment, where a student completes a course:
- A certificate should be generated.
- An email should be sent congratulating them.
- A badge status might need recalculating.
- A Slack or Telegram message should ping the engineering team.
- Reporting tables should be updated.
- A CRM record should be touched.
If you wire all of that directly into the code that marks a course complete, you end up with a tangled monolith of side effects (where if one part fails, it all does). The completion endpoint now knows about PDFs, SES, badges, Slack, and your data warehouse. Every new requirement means editing the most critical path in your application, and every one of those side effects is a chance to fail the original request.
We need a way to:
- Decouple producers from consumers — the thing that consumes “course completed” shouldn’t care where it came from.
- React asynchronously — slow side effects (PDF generation, sending email) must never block the user experience.
- Add new reactions cheaply — a new consumer should be a new rule, not a code change to the producer.
- Emit events reliably — without falling into the classic dual-write trap (more on this below).
- Handle both our events and AWS’s events — S3 uploads and Transcribe completions are events too (not just our custom domain ones!).
The answer is an event-driven architecture (EDA) built around Amazon EventBridge, fed by Change Data Capture (CDC) from our database.
Our Example
To keep things grounded (but avoid handing out a map of our real infrastructure), we will discuss the Study From Experts platform at a high level.
We’ll follow a handful of business moments (a sign-up, a purchase, a video upload, a transcription completion) and watch how each one becomes an event on the bus that other parts of the system quietly react to.
💡 Note: the code examples are for discussion only and not fully productionised.
Architecture Overview 🏗️
At the centre of everything is EventBridge. Events arrive on it from two very different worlds: our own application (via Change Data Capture) and native AWS services, and EventBridge rules route them to the consumers that care.
         Producers                  Custom & Default Buses               Consumers
┌─────────────────────────┐                                      ┌───────────────────────┐
│ DynamoDB (source of     │                                      │ Certificate Service   │
│ truth)                  │                                      │ (SQS → Lambda)        │
│     │                   │                                      └───────────────────────┘
│     ▼                   │          ┌──────────────────┐        ┌───────────────────────┐
│ DynamoDB Stream         │─ CDC ───▶│                  │── rule ▶ Email Service         │
│     │                   │          │                  │        │ (SQS → Lambda → SES)  │
│     ▼                   │          │ Our              │        └───────────────────────┘
│ Stream Processor        │          │ EventBridge      │        ┌───────────────────────┐
│ (Lambda)                │          │ Buses (default   │── rule ▶ Slack / Telegram      │
└─────────────────────────┘          │ & custom         │        │ Notifications         │
                                     │ for domain       │        └───────────────────────┘
┌─────────────────────────┐          │ events +         │        ┌───────────────────────┐
│ Amazon S3 (uploads)     │── rule ─▶│ AWS service      │── rule ▶ Video Processing      │
└─────────────────────────┘          │ events)          │        │ Pipeline              │
┌─────────────────────────┐          │                  │        └───────────────────────┘
│ AWS Transcribe          │── rule ─▶│                  │        ┌───────────────────────┐
│ (job state change)      │          │                  │── rule ▶ Reporting / CRM       │
└─────────────────────────┘          └──────────────────┘        └───────────────────────┘
┌─────────────────────────┐                   ▲
│ AWS MediaConvert        │── rule ───────────┘
│ (job complete)          │
└─────────────────────────┘
The shape is deliberately simple: producers don’t know about consumers. They publish an event on the bus and walk away. Consumers declare, through a rule, the events they’re interested in, and EventBridge handles the routing. Adding a new consumer never requires touching a producer.
💡 This is event-driven architecture in action! Each component does one thing, communicates via events, and the pipeline flows naturally from upload to playable captioned video.
We call this choreography rather than orchestration, and it is covered in this great course by James Eastham here: https://www.studyfromexperts.com/courses/serverless-integration-patterns/
Events Are a First-Class Citizen 🥇
Before we get into mechanics, it’s worth stating the philosophy, because it shapes every decision that follows: for us, events are a first-class citizen, not an afterthought.
That means:
- Events have a defined, versioned schema — they’re a contract, not a loose JSON blob.
- Events describe business facts (“CourseCompleted”, “CoursePaymentComplete”), not database mechanics (“RowUpdated”).
- Events live in their own module in the codebase, organised by domain, with their own tests (we take a mono-repo approach).
- Events carry enough context for a consumer to act without calling back to the producer (event-carried state transfer).
- Events are validated before they’re published — a malformed event never makes it onto the bus.
- Events are validated before they're consumed - we don't want to use a malformed event for further processing.
When you treat events this way, the bus becomes a reliable, self-documenting nervous system for the whole platform. A new engineer can read the list of event types and understand the important things that happen in the business.
Note: We also use Docusaurus, so we have fantastic developer documentation which allows you to see the events easily, along with all of the information that surrounds them (versions, descriptions, schemas, etc.).
The Dual-Write Problem 🪤
Here’s the question that trips up most teams building an EDA: when something changes, how do you both save it to your database and publish an event reliably?
The naive approach is to do both in the request handler:
// ❌ The dual-write trap — DO NOT do this
await database.save(order); // Step 1: write to the database
await eventBus.publish(orderPlaced); // Step 2: publish the event
This looks innocent, but it’s a distributed systems landmine. There are two processes here (your database and your event bus), and no shared transaction between them:
- If step 1 succeeds and step 2 fails, you’ve saved the order, but nobody downstream knows: no email, no certificate, silent data drift.
- If you flip the order and publish first, a failed save means you’ve announced an order completion that doesn’t exist.
The “obvious” fix is a distributed transaction, i.e. a two-phase commit (2PC) spanning the database and the bus. In a serverless, managed-services world, that’s not possible, and even where it’s possible, 2PC is slow, brittle, and a scaling bottleneck. We want to get around two-phase commit entirely.
The way out is to stop treating the database and the bus as two writes. Instead, we make the database the single source of truth and derive events from what was actually committed to it. That’s exactly what Change Data Capture gives us.
💡 The golden rule: the record is saved first in the database, and the event is published from the committed change. We never publish an event for something that didn’t make it into the database.
CDC: The Engine Room ⚙️
Our primary datastore is DynamoDB, and we enable DynamoDB Streams with the NEW_AND_OLD_IMAGES view type. Every committed write, i.e. insert, modify, or delete, produces a stream record containing the item before and after the change. A single Lambda, the stream processor, consumes that stream.
Because the stream only ever contains changes that were already durably committed to DynamoDB, the dual-write problem evaporates:
- The application writes to DynamoDB. That’s the only write it does. ✅
- DynamoDB commits the change and puts a record on the stream. ✅
- The stream processor reads the committed change and, where appropriate, publishes a domain event. ✅
If the application write fails, there’s no stream record and no event, which is the correct outcome. If the stream processor fails to emit, the record is still safely in the database, and the change can be retried. The event is always a faithful echo of the committed state, albeit an eventually consistent one.
The single processor publishes a domain event to EventBridge when a meaningful business state change is detected.
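As a rough sketch of that loop (not our production code), the processor can be thought of as "translate each committed change, publish if it matters, and let failures surface so the batch is retried". The `StreamRecord` shape here is a simplified stand-in for a DynamoDB stream record with the images already unmarshalled, and `translate` and `publish` are hypothetical seams introduced for illustration.

```typescript
// Simplified, hypothetical view of a stream record (images already unmarshalled).
type ItemImage = Record<string, unknown>;

interface StreamRecord {
  eventName: 'INSERT' | 'MODIFY' | 'REMOVE';
  oldImage?: ItemImage;
  newImage?: ItemImage;
}

interface DomainEvent {
  type: string;
  data: Record<string, unknown>;
}

// Hypothetical seams: how a change is translated and how an event is sent.
type Translate = (record: StreamRecord) => DomainEvent | undefined;
type Publish = (event: DomainEvent) => Promise<void>;

// Processes one batch of committed changes, in order.
// If publish throws, the error propagates so the batch can be retried;
// the data itself is already safe in the table.
export async function processBatch(
  records: StreamRecord[],
  translate: Translate,
  publish: Publish,
): Promise<number> {
  let published = 0;
  for (const record of records) {
    const event = translate(record);
    if (!event) continue; // not a meaningful business change
    await publish(event);
    published += 1;
  }
  return published;
}
```

Keeping translation and publishing as injected functions is a design choice for this sketch: it makes the loop easy to unit test without touching AWS.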
Detecting Meaningful Change
Not every database write is a business event. A MODIFY fires on the stream every time any field changes, but “the user updated their avatar” is not “the user completed a course.” So the processor compares the old and new images to detect the specific transition it cares about.
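A minimal sketch of that comparison, assuming a hypothetical enrolment item with an optional `completedAt` field (the field names are illustrative, not our real table design):

```typescript
// Hypothetical shape of an enrolment item; field names are illustrative.
interface EnrolmentImage {
  userId: string;
  courseId: string;
  completedAt?: string; // ISO timestamp, set once when the course is completed
}

interface CourseCompletedData {
  userId: string;
  courseId: string;
  completedAt: string;
}

// Returns the event payload only on the transition
// "no completion date" -> "has completion date"; otherwise undefined.
export function detectCourseCompleted(
  oldImage: EnrolmentImage | undefined,
  newImage: EnrolmentImage | undefined,
): CourseCompletedData | undefined {
  const completedAt = newImage?.completedAt;
  if (!newImage || !completedAt) return undefined; // not completed, or a delete
  if (oldImage?.completedAt) return undefined; // already completed earlier
  return {
    userId: newImage.userId,
    courseId: newImage.courseId,
    completedAt,
  };
}
```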
A few principles are baked into that comparison:
✔️ Compare old vs new — we publish CourseCompleted only on the transition from “no completion date” to “has completion date” which is only ever set once. Without this, every later edit to the enrolment would re-fire the event. Quite often, we also check what the entity status of the item was previously and what it is now.
✔️ Detect facts, not CRUD — the event is CourseCompleted, a business fact, not EnrolmentRowModified. Consumers care about meaning.
✔️ Rich payload — we include everything the consumer needs so it can act without a round-trip back to us.
For more critical processes, we use the notification pattern, where the payload is made up simply of the entity IDs involved, so the consumer process can call back out and get the up-to-date information.
This is the pattern repeated across the system. A new completed payment becomes CoursePaymentComplete. A status flipping to deleted becomes AccountDeactivated. A new row becomes UserSignedUp or CourseRegistered. The processor is a translator: committed data changes in, meaningful business events out.
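To make the two payload styles mentioned above concrete (rich payload versus notification), here is a small sketch. The interfaces and the `loadEnrolment` lookup are hypothetical; the point is only that a rich payload can be used as-is, while a notification triggers a fresh read.

```typescript
// Event-carried state transfer: the payload carries what the consumer needs.
interface CourseCompletedRich {
  userId: string;
  courseId: string;
  courseTitle: string;
  studentName: string;
  completedAt: string;
}

// Notification pattern: IDs only; the consumer looks up the current state.
interface CourseCompletedNotification {
  userId: string;
  courseId: string;
}

// Hypothetical lookup owned by the consumer (for example, a read via an API).
type LoadEnrolment = (userId: string, courseId: string) => Promise<CourseCompletedRich>;

export async function resolveDetails(
  payload: CourseCompletedRich | CourseCompletedNotification,
  loadEnrolment: LoadEnrolment,
): Promise<CourseCompletedRich> {
  // A rich payload is used directly; a notification triggers a fresh read.
  if ('completedAt' in payload) return payload;
  return loadEnrolment(payload.userId, payload.courseId);
}
```

The trade-off is roughly this: the rich payload avoids a round-trip but may be slightly stale, while the notification costs a read but always sees current data.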
The Event Envelope ✉️
Every event we publish, regardless of domain, shares the same outer structure: a block of standard metadata and a domain-specific data payload.
We define it once with Zod, which gives us both runtime validation and a TypeScript type for free:
import { z } from 'zod';
// Standard metadata attached to EVERY event
export const eventMetadataSchema = z.object({
version: z.number().int().positive(), // schema version for this event type
created: z.iso.datetime(), // when the event was created (ISO 8601)
correlationId: z.uuid(), // ties related events together for tracing
domain: z.enum(['user', 'course', 'certificate', 'video', 'supportTicket']),
source: z.string().startsWith('sfe-'), // the producer (all of ours start sfe-)
type: z.string(), // e.g. 'CourseCompleted'
id: z.uuid(), // unique id for this specific event
});
// The full message: metadata + a domain-specific data payload
export const eventMessageSchema = z.object({
detail: z.object({
metadata: eventMetadataSchema,
data: z.record(z.string(), z.any()),
}),
});
export type EventMessage = z.infer<typeof eventMessageSchema>;
Each field earns its place:
| Field | Why it exists |
|---|---|
| version | Lets event schemas evolve independently. A consumer can branch on the version it understands. For us, changes are always backwards-compatible, with no breaking changes. |
| created | The event’s own timestamp, distinct from when a consumer happens to process it. |
| correlationId | The thread that ties a chain of related processes together. A sign-up and the welcome email can share one ID, making cross-service tracing and troubleshooting significantly easier. |
| domain | A coarse grouping of events by business area (for example, user or course). |
| source | Which service emitted it. We enforce a prefix so it’s always obvious the event is ours. |
| type | The business fact, for example CourseCompleted. |
| id | A unique identifier for this event, invaluable for idempotency and de-duplication (always a UUID v4). |
The split between metadata (consistent, infrastructural) and data (domain-specific, meaningful) is the key design decision. Consumers can rely on metadata always being there for tracing, routing, and idempotency, while the data payload carries exactly what that particular event needs.
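One way to picture how the envelope meets EventBridge is a small builder that stamps the metadata and maps the result onto the fields of a PutEvents entry (`EventBusName`, `Source`, `DetailType`, `Detail`). This is a sketch under assumptions: `toPutEventsEntry`, `BuildInput`, and `Deps` are hypothetical names, the ID generator and clock are injected rather than real, and the local `PutEventsEntry` interface only mirrors the subset of fields used here.

```typescript
type Domain = 'user' | 'course' | 'certificate' | 'video' | 'supportTicket';

interface EventMetadata {
  version: number;
  created: string;
  correlationId: string;
  domain: Domain;
  source: string;
  type: string;
  id: string;
}

// Mirrors the subset of an EventBridge PutEvents entry used in this sketch.
interface PutEventsEntry {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string; // JSON string
}

interface BuildInput {
  type: string;
  domain: Domain;
  version: number;
  data: Record<string, unknown>;
  correlationId?: string; // pass an upstream ID to keep one trace across events
}

// Hypothetical dependencies, injected so the builder stays deterministic in tests.
interface Deps {
  newId: () => string;
  now: () => Date;
  busName: string;
  source: string;
}

export function toPutEventsEntry(input: BuildInput, deps: Deps): PutEventsEntry {
  const metadata: EventMetadata = {
    version: input.version,
    created: deps.now().toISOString(),
    correlationId: input.correlationId ?? deps.newId(),
    domain: input.domain,
    source: deps.source,
    type: input.type,
    id: deps.newId(),
  };
  return {
    EventBusName: deps.busName,
    Source: deps.source, // what rules match as `source`
    DetailType: input.type, // what rules match as `detail-type`
    Detail: JSON.stringify({ metadata, data: input.data }),
  };
}
```

A consumer that emits a follow-up event would pass the incoming `correlationId` back into the builder, which is how a sign-up and its welcome email can end up sharing one ID.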
Versioning Events
Because events are a contract, changes need a version bump. We keep a central mapping of event type to version, so the publisher always stamps the right number:
export const EVENT_VERSIONS: Record<EventType, number> = {
[EventType.UserSignedUp]: 1,
[EventType.CourseCompleted]: 2, // bumped when the payload changed
[EventType.CoursePaymentComplete]: 1,
// ...one entry per event type
};
Using Record<EventType, number> means TypeScript forces us to assign a version to every event type: forget one and the build fails. It’s a small thing that prevents a whole class of “we shipped an event with no version” bugs.
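On the consuming side, branching on the version might look something like the sketch below. The payload difference is invented for illustration: it assumes v2 of CourseCompleted added a `completedAt` field on top of v1's `userId` and `courseId`. `toCertificateRequest` and `CertificateRequest` are hypothetical names.

```typescript
interface Envelope {
  metadata: { type: string; version: number };
  data: Record<string, unknown>;
}

interface CertificateRequest {
  userId: string;
  courseId: string;
  completedAt?: string;
}

// Assumption for this sketch: v2 added `completedAt`; v1 only had the two IDs.
export function toCertificateRequest(event: Envelope): CertificateRequest {
  const version = event.metadata.version;
  const userId = event.data['userId'];
  const courseId = event.data['courseId'];
  const completedAt = event.data['completedAt'];

  if (typeof userId !== 'string' || typeof courseId !== 'string') {
    throw new Error('CourseCompleted is missing userId or courseId');
  }

  const request: CertificateRequest = { userId, courseId };
  // Fields introduced in v2 are only read when the event says it carries them.
  if (version >= 2 && typeof completedAt === 'string') {
    request.completedAt = completedAt;
  }
  return request;
}
```

Because changes are backwards-compatible, a consumer written this way keeps working when a later version adds further fields it doesn't know about.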
Two Kinds of Events, One AWS Service 🔀
Amazon EventBridge doesn’t care whether an event came from our own code or from an AWS service. It’s all just events with a source and a detail-type, matched by declarative rules. That means our consumers can react to custom domain events and native AWS service events through exactly the same mechanism.
1. Reacting to Our Own Custom Events
When the stream processor publishes the CourseCompleted event, a rule routes it to the certificate pipeline. The rule is pure configuration:
// CDK — route our custom CourseCompleted event to the certificate queue
new events.Rule(this, 'CertificateEventsRule', {
eventBus: props.eventBus,
ruleName: `${stageName}-certificate-events`,
eventPattern: {
source: ['sfe-service'], // our application
detailType: ['CourseCompleted'], // our business fact
},
targets: [new targets.SqsQueue(certificateQueue)],
});
“When an event from sfe-service with type CourseCompleted arrives, drop it on the certificate queue.” A Lambda drains that queue and generates the PDF; the full flow is in our certificate generation article.
2. Reacting to AWS Service Events
Now the same style of target rule is used for AWS events. When an S3 upload lands, or an AWS Transcribe job finishes, those services emit their own events. We match them with rules that filter on the AWS source instead of ours:
// CDK — react to AWS Transcribe finishing a job
new events.Rule(this, 'TranscribeCompletionRule', {
ruleName: `${stageName}-transcribe-completion`,
eventPattern: {
source: ['aws.transcribe'],
detailType: ['Transcribe Job State Change'],
detail: {
TranscriptionJobStatus: ['COMPLETED', 'FAILED'],
// only react to our own jobs, identified by a naming prefix
TranscriptionJobName: [{ prefix: 'sfe-' }],
},
},
targets: [new targets.LambdaFunction(captionProcessorLambda)],
});
This is the exact pattern behind our event-driven closed captions pipeline: an S3 upload kicks off transcription, Transcribe emits a completion event, a rule routes it to a Lambda, and the captions get embedded during transcoding. We didn’t poll, we didn’t run a state machine, we just listened to events.
💡 The mental model that unlocks everything: a video upload finishing, a transcription completing, and a course being completed are all just events. Once you stop distinguishing “my events” from “AWS’s events”, your whole pipeline becomes a set of small functions that each listen for one thing and react.
Declarative Filtering Is the Superpower
Notice that in both cases the consumer expresses interest declaratively. There’s no if/else dispatcher in code deciding who gets what. EventBridge evaluates the patterns and routes accordingly. That keeps routing logic out of our functions and makes the system’s wiring visible in infrastructure-as-code, where it can be reviewed and reasoned about.
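To build intuition for what "evaluating a pattern" means, here is a deliberately tiny model of the matching semantics used in the two rules above: every field in the pattern must match, and each field lists the acceptable values (exact strings or a `prefix`). This is a toy for illustration only, not EventBridge's implementation, and it ignores the many other operators real patterns support. It uses the raw JSON field names (`detail-type`), which is what the CDK `detailType` property maps to.

```typescript
type Matcher = string | { prefix: string };

interface Pattern {
  [field: string]: Matcher[] | Pattern;
}

type EventLike = { [field: string]: unknown };

// A value matches if ANY matcher in the list accepts it.
function matchesValue(value: unknown, matchers: Matcher[]): boolean {
  if (typeof value !== 'string') return false;
  const text: string = value;
  return matchers.some((m) => (typeof m === 'string' ? m === text : text.startsWith(m.prefix)));
}

// An event matches if EVERY field in the pattern matches.
export function matchesPattern(event: EventLike, pattern: Pattern): boolean {
  return Object.entries(pattern).every(([field, expected]) => {
    const actual = event[field];
    if (Array.isArray(expected)) return matchesValue(actual, expected);
    // Nested pattern: the event needs a nested object that matches too.
    if (typeof actual !== 'object' || actual === null) return false;
    return matchesPattern(actual as EventLike, expected);
  });
}

// The Transcribe rule from earlier, written as a raw pattern.
export const transcribePattern: Pattern = {
  source: ['aws.transcribe'],
  'detail-type': ['Transcribe Job State Change'],
  detail: {
    TranscriptionJobStatus: ['COMPLETED', 'FAILED'],
    TranscriptionJobName: [{ prefix: 'sfe-' }],
  },
};
```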
One Event, Many Consumers 🌱
The real magic of EDA shows up when a single event needs to trigger several unrelated reactions. Because consumers subscribe independently, we just add more rules, and the producer is untouched.
Take CoursePaymentComplete. The moment a purchase is recorded in DynamoDB, the stream processor emits the event, and several rules light up at once:
                            ┌─────────────────────────┐
                            │  CoursePaymentComplete  │
                            │  (on the central bus)   │
                            └────────────┬────────────┘
                                         │
          ┌──────────────────────────────┼──────────────────────────────┐
          ▼                              ▼                              ▼
┌───────────────────┐          ┌───────────────────┐          ┌───────────────────┐
│ Email Service     │          │ Notifications     │          │ Reporting / CRM   │
│ "Thanks for your  │          │ Ping the team in  │          │ Update revenue    │
│ purchase!"        │          │ Slack / Telegram  │          │ dashboards        │
└───────────────────┘          └───────────────────┘          └───────────────────┘
Each consumer:
- Has its own rule filtering for the events it cares about.
- Has its own queue and Lambda, so it scales and fails independently.
- Can be added, changed, or removed without anyone else knowing.
Want to start sending purchase pings to Telegram instead of Slack? That’s a new consumer and a new rule; the payment flow doesn’t change at all. (We compared those two notification channels in our Telegram vs Slack article.)
Want to feed a new analytics system? Add a target rule. The producer has no idea, and that’s the entire point.
This is also why downstream async processes are such a natural fit. Generating a certificate takes a few seconds. Rendering and sending an email involves templates, SES, and bounce handling. None of that belongs on the request path.
By emitting an event and letting independent consumers pick it up, the user’s action returns instantly while the slow, important work happens in the background: each consumer working through its own queue at its own pace, with its own retries and dead-letter queue.
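As an infrastructure sketch of "its own queue, its own retries, its own dead-letter queue", a consumer for CoursePaymentComplete could be wired up along these lines in CDK. The construct name, props, retry count, and timeouts are all illustrative assumptions rather than our actual configuration.

```typescript
import { Duration } from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { Construct } from 'constructs';

interface PurchaseEmailConsumerProps {
  eventBus: events.IEventBus;
  handler: lambda.IFunction; // hypothetical Lambda that sends the email
}

// One self-contained consumer: rule -> queue (with DLQ) -> Lambda.
export class PurchaseEmailConsumer extends Construct {
  constructor(scope: Construct, id: string, props: PurchaseEmailConsumerProps) {
    super(scope, id);

    // Messages that keep failing end up here instead of being lost.
    const deadLetterQueue = new sqs.Queue(this, 'DeadLetterQueue', {
      retentionPeriod: Duration.days(14),
    });

    // The visibility timeout should be at least the function timeout.
    const queue = new sqs.Queue(this, 'Queue', {
      visibilityTimeout: Duration.seconds(60),
      deadLetterQueue: { queue: deadLetterQueue, maxReceiveCount: 3 },
    });

    // This consumer's own rule; the producer is not touched.
    new events.Rule(this, 'Rule', {
      eventBus: props.eventBus,
      eventPattern: {
        source: ['sfe-service'],
        detailType: ['CoursePaymentComplete'],
      },
      targets: [new targets.SqsQueue(queue)],
    });

    // The Lambda drains the queue at its own pace.
    props.handler.addEventSource(new SqsEventSource(queue, { batchSize: 10 }));
  }
}
```

Adding a second consumer for the same event is then a second instance of a construct like this, with a different handler.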
Design Principles & Gotchas ⚠️
A central bus is powerful, but distributed events come with sharp edges. Here’s what we keep front of mind.
Save First, Emit Second — Always
This is the one rule that makes everything else safe, so it bears repeating. The database write is the source of truth that the business event happened; the event is derived from the committed change via CDC. We never publish optimistically before the data is durable. This is how we sidestep two-phase commit without ever risking phantom or missing events.
Detect Transitions, Not Just Changes
On MODIFY, always compare the old and new images and publish only on the specific transition you care about. Skipping this is the most common cause of duplicate or spurious events: you’ll re-fire CourseCompleted every time anything else on the record is touched.
Design Consumers to Be Idempotent
EventBridge (and SQS) give at-least-once delivery. A consumer can occasionally see the same event twice. Because every event carries a unique id, consumers can de-duplicate, and downstream handlers should be safe to run twice, for example, “create a certificate for this enrolment if one doesn’t already exist” rather than blindly creating.
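A minimal sketch of de-duplicating on the event id is shown below. The `ProcessedStore` interface is hypothetical; in a real system it would be backed by something durable (for example, a conditional write), and the in-memory `Set` is only here to keep the example runnable.

```typescript
// Hypothetical store of processed event IDs.
interface ProcessedStore {
  // Returns true if the id was newly recorded, false if it was already present.
  markIfNew(eventId: string): Promise<boolean>;
  // Forgets an id so that a failed attempt can be retried.
  release(eventId: string): Promise<void>;
}

// In-memory stand-in, for illustration only (not durable, not shared).
export class InMemoryProcessedStore implements ProcessedStore {
  private readonly seen = new Set<string>();

  async markIfNew(eventId: string): Promise<boolean> {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }

  async release(eventId: string): Promise<void> {
    this.seen.delete(eventId);
  }
}

// Runs `work` at most once per event id; duplicates are skipped.
export async function handleOnce(
  eventId: string,
  store: ProcessedStore,
  work: () => Promise<void>,
): Promise<'processed' | 'duplicate'> {
  const isNew = await store.markIfNew(eventId);
  if (!isNew) return 'duplicate';
  try {
    await work();
  } catch (error) {
    // Release the id so a redelivery of this event is not treated as a duplicate.
    await store.release(eventId);
    throw error;
  }
  return 'processed';
}
```

De-duplication like this narrows the window for double processing but doesn't close it completely, which is why the handler itself should still be safe to run twice.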
Mind the Loops
When events can trigger writes, and writes produce stream records, it’s possible to create a loop. The same can happen when a pipeline writes to S3 while also reacting to S3 events. We avoid it by being precise about transitions and by filtering native service events tightly (for example, S3 notifications scoped to specific prefixes and suffixes, so derivative files don’t re-trigger a pipeline). We dug into loop prevention in the closed captions article.
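The prefix and suffix scoping lives in the notification configuration, but the same check can be mirrored in the handler as a second line of defence. A small sketch, assuming a hypothetical bucket layout where originals land under `uploads/` as `.mp4` files and derivatives are written elsewhere:

```typescript
interface KeyFilter {
  prefix: string;
  suffix: string;
}

// Hypothetical layout: only original uploads should start the pipeline.
export const UPLOAD_FILTER: KeyFilter = { prefix: 'uploads/', suffix: '.mp4' };

// True only for keys the pipeline is meant to react to.
export function shouldProcess(key: string, filter: KeyFilter = UPLOAD_FILTER): boolean {
  return key.startsWith(filter.prefix) && key.endsWith(filter.suffix);
}
```

With a guard like this, a derivative written to a different prefix is ignored even if a notification for it somehow reaches the handler.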
Version From Day One
The moment a consumer depends on an event, its shape is a contract. Stamp a version on every event and bump it on changes (always backwards compatible changes), so consumers can evolve without a coordinated big-bang deploy.
Wrapping Up 📝
The key takeaways:
- Make the database the source of truth and derive events from it. CDC turns the impossible dual-write into a single, safe write plus a faithful echo.
- Treat events as contracts. Version them, validate them, and give them a consistent envelope — future-you and every consumer will thank you.
- Let routing be declarative. EventBridge rules keep dispatch logic out of your code and visible in infrastructure-as-code.
- Emit business facts, not CRUD. CourseCompleted ages far better than RowUpdated.
- Everything is an event. Your code, S3, and Transcribe all speak the same language on the bus; lean into it.
When events are first-class, async side effects stop being a tax on your critical path and become a clean, extensible surface you can build on for years.
Further Reading 📚
- Event-Driven Closed Captions with Transcribe and MediaConvert — reacting to AWS service events on the bus
- Automated Certificate Generation — a custom event consumer in full
- Integrating Polar Payments with Event-Driven Serverless — events from a payment provider
- Why Telegram Might Be Better Than Slack for Serverless Notifications — swapping a consumer without touching the producer
I hope you found this article useful. If you have any questions or feedback, feel free to reach out!
