Node.js · Redis · BullMQ · Backend · Microservices
May 14, 2025 · 8 min read

Taming Background Jobs: BullMQ + Redis in a Node.js Microservice

How I used BullMQ and Redis to handle async task processing, recover from failures, and improve throughput by 25% under peak load — with practical patterns you can apply today.

The Problem with Synchronous Processing

When our e-commerce backend started handling payment webhooks, email notifications, and inventory sync all inside the HTTP request lifecycle, we hit a wall. Response times climbed, retries failed silently, and a single slow task could delay everything else in the same request.

The answer was asynchronous job queues — specifically BullMQ on top of Redis.

Why BullMQ?

BullMQ is the spiritual successor to Bull, redesigned from the ground up with TypeScript support and a cleaner API. It gives you:

  • Named queues: separate queues for payments, emails, inventory
  • Concurrency control: process up to N jobs concurrently per worker
  • Retry strategies: exponential backoff out of the box
  • Job prioritization: critical jobs jump the line
  • Observability: the separate Bull Board package gives you a real-time UI
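To make the prioritization point concrete, here is a minimal sketch of attaching a priority when adding a job. In BullMQ a lower `priority` number means the job is picked up sooner. The priority table, the `weekly-digest` job name, and the `addEmailJob` helper are hypothetical; only `order-confirmed` comes from this article.

```javascript
// Hypothetical priority table for jobs that share one queue.
// In BullMQ a lower number means higher priority.
const EMAIL_PRIORITY = {
  'order-confirmed': 1, // transactional mail, send first
  'weekly-digest': 10,  // made-up bulk job that can wait
}

// `queue` is anything with BullMQ's add(name, data, opts) shape,
// such as the emailQueue created in the setup code further down.
function addEmailJob(queue, name, data) {
  const priority = EMAIL_PRIORITY[name]
  if (priority === undefined) {
    throw new Error(`No priority configured for job "${name}"`)
  }
  return queue.add(name, data, { priority })
}
```

Keeping the table in one place means a new job type cannot be enqueued without someone deciding how urgent it is.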

Setting Up a Queue

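import { Queue, Worker } from 'bullmq'
import { redisConnection } from './redis'

const emailQueue = new Queue('email-notifications', { connection: redisConnection })

// Producer: add a job (user and order come from the surrounding request handler)
await emailQueue.add('order-confirmed', {
  to: user.email,
  orderId: order.id,
}, {
  attempts: 3,                                    // total tries, including the first
  backoff: { type: 'exponential', delay: 2000 },  // base delay in ms, grows per retry
  removeOnComplete: 100,                          // keep only the latest 100 completed jobs
})

// Consumer: process jobs
const worker = new Worker('email-notifications', async (job) => {
  // Throwing here marks the job as failed and triggers the retry policy
  await sendOrderConfirmationEmail(job.data)
}, {
  connection: redisConnection,
  concurrency: 5, // up to 5 jobs in flight per worker
})

// Fires on every failed attempt; job can be undefined, hence the optional chaining
worker.on('failed', (job, err) => {
  logger.error(`Job ${job?.id} failed: ${err.message}`)
})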
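To see what `attempts: 3` with a 2000 ms exponential backoff means in practice, this small sketch computes the wait before each retry. It assumes BullMQ's built-in exponential strategy waits `delay * 2^(n - 1)` after the n-th failed attempt, with no jitter configured. `retryDelays` is a helper written for this illustration, not part of BullMQ.

```javascript
// Assumed formula for the built-in exponential backoff:
// wait before the next try = baseDelay * 2^(failedAttempts - 1).
function retryDelays(attempts, baseDelay) {
  const delays = []
  // `attempts` includes the first try, so there are attempts - 1 retries.
  for (let failed = 1; failed < attempts; failed++) {
    delays.push(baseDelay * 2 ** (failed - 1))
  }
  return delays
}

console.log(retryDelays(3, 2000)) // [ 2000, 4000 ]
```

Under that assumption, a job that keeps failing is given up on roughly six seconds after its first attempt, plus processing time. That window is worth comparing against how long your downstream service usually takes to recover.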

The 25% Throughput Improvement

The key insight: we stopped doing slow work inside the request handler. Payment webhooks now enqueue a job and return 200 OK in milliseconds. A worker picks the job up, retries if the downstream service is slow, and updates the DB when done.

Under peak load (3× normal traffic), our p99 response time dropped from ~1200ms to under 200ms.
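The enqueue-and-acknowledge pattern described above might look roughly like this. It is a framework-agnostic sketch, not our production handler. The `payment-webhook` job name, the 503 fallback, and the `{ status, body }` return shape are assumptions for illustration, and `queue` is any object with BullMQ's `add(name, data, opts)` method.

```javascript
// Sketch of a webhook handler that only enqueues work.
// Job name, option values, and response shape are illustrative assumptions.
function makeWebhookHandler(queue) {
  return async function handleWebhook(event) {
    try {
      await queue.add('payment-webhook', event, {
        attempts: 3,
        backoff: { type: 'exponential', delay: 2000 },
      })
      // Acknowledge right away; the worker does the slow part later.
      return { status: 200, body: 'OK' }
    } catch (err) {
      // If the queue cannot accept the job, report failure so the sender can retry.
      return { status: 503, body: 'queue unavailable' }
    }
  }
}
```

The handler's latency now depends on a single Redis write rather than on the payment provider, the mail service, or the database, which is why slow downstream services no longer show up in the response time.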

Lessons Learned

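  • Dead-letter handling matters: jobs that exhaust their attempts end up in the queue's failed set, so always configure a failed-job handler and alert on it
  • Job deduplication: set a jobId derived from the webhook event so a redelivered webhook is not processed twice
  • Graceful shutdown: call worker.close() on SIGTERM so in-flight jobs can finish before the process exits
  • Redis memory: set removeOnComplete and removeOnFail limits, or finished jobs accumulate until Redis fills up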
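Three of these lessons fit in a few lines each. The sketch below assumes every webhook event carries a unique `id`. The `payment-` prefix, the retention counts, and both helper names are hypothetical. `queue` and `worker` stand in for a BullMQ `Queue` and `Worker`.

```javascript
// Enqueue a webhook event at most once per event id.
// BullMQ ignores an add() whose jobId matches a job still stored in the queue.
function enqueueOnce(queue, event) {
  return queue.add('payment-webhook', event, {
    jobId: `payment-${event.id}`,
    removeOnComplete: { count: 1000 }, // keep the last 1000 completed jobs
    removeOnFail: { count: 5000 },     // keep more failures around for debugging
  })
}

// Close the worker on SIGTERM so in-flight jobs can finish.
// `proc` defaults to the Node.js process; `worker` needs a close() method.
function closeOnSigterm(worker, proc = process) {
  proc.once('SIGTERM', async () => {
    await worker.close() // resolves once active jobs have completed
    proc.exit(0)
  })
}
```

One caveat: the deduplication window only lasts while the earlier job is still in Redis. Once `removeOnComplete` evicts it, the same `jobId` can be enqueued again, so size the retention count to cover the period in which redeliveries are likely.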

BullMQ turned our fragile synchronous pipeline into a resilient, observable async system. If your Node.js services are processing anything non-trivial inside HTTP handlers, it's time to queue it.