Super-charge Your SaaS & LLM Workflows

Part 1: Series Overview

Overview

In this series, I will show you how to leverage Apache Airflow to build robust, production-ready workflows for your SaaS, LLM-powered apps, data processes, and anything else you can dream up.

While there are excellent resources covering specific Airflow components like the official tutorials and Marc Lamberti's awesome guides, my series will focus on how to work with Airflow locally, in production, and with LLMs. I'll also share best practices and patterns I've seen work first-hand.

Skip to Part 2: Running Airflow Locally β†’

Why Use Airflow?

Working with Airflow forces you through a paradigm shift in how you think about your backend. It allows you to provide your customers with more value than you can with just a traditional request-response API. I hope to leave you wondering how you ever built without it once you're finished with this series.

Terminology

Let's go over some terminology first.

  • Workflow Orchestration - The automated coordination and management of complex tasks, processes, and workflows across different systems.
  • DAG (Directed Acyclic Graph) - A graph whose edges are directed and which contains no cycles (paths that loop back to a starting node). In Airflow, your workflow is represented as a DAG, and its inputs can be customized via the run configuration.
  • Task - An atomic unit of work within a DAG. Tasks can have their own inputs and outputs. They're the building blocks of your workflows.
  • Rate Limiting - Restrictions placed on the frequency of API calls or operations, often by service providers to manage resource usage.
Airflow terminology

From Simple SaaS to Complex Workflows

You're likely familiar with the distinction between frontend and backend. The frontend (web client) makes a synchronous request to the backend (server) when the user needs something.

Simple enough. You've launched your sleek new SaaS, you have users rolling in, things are going well.

Comparison between traditional request-response vs Airflow workflows
Traditional request-response configuration

What happens when you need to handle operations that don't fit the request-response model? What does that even look like and why do you need anything more than a simple API?

Let's go through an example scenario to illustrate the need for a workflow orchestration tool.

Super-charged TODO App

Imagine your TODO app has reached 100 monthly active users.

Now you want to add AI-powered task prioritization as a premium feature.

You write a script that lists all of a user's TODOs and has an LLM rank them by priority.

Here are three possible approaches to integrating this:

πŸ• Option 1: The Synchronous Approach

You integrate the script into your API server. You add a CTA (call to action) on the frontend that calls the API to process the TODO list. It works for small lists and for the first few users, but leads to timeout issues with large TODO lists and rate limiting from your LLM provider.

Users with many tasksβ€”your power usersβ€”start dropping off due to the poor experience.

Absent a workflow orchestration tool, your platform's ability to create value is limited by the maximum amount of time a synchronous request can take before users get frustrated and leave.

Traditional integration

βœ‹ Option 2: The Manual Script Approach

You run the script manually to process all users' TODO lists. This works to generate initial hype, but becomes unsustainable. You have better things to do, and new items aren't automatically processed. Lack of auditability will be a problem if you want to involve other developers in your project.


Option 3: The Asynchronous Workflow Approach

You write an Airflow DAG that neatly separates the workflow, is triggered by the API, provides real-time progress updates, and notifies users when results are ready. With some tuning, Airflow's built-in queueing mechanism processes the work in batches so you stay under your LLM provider's rate limits.

Airflow workflow integration
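One simple way to respect provider rate limits is to chunk the TODOs so each chunk becomes its own unit of work (for example, via Airflow's dynamic task mapping, throttled by a pool). A minimal, framework-free batching helper might look like this (the function name and batch size are my own illustration):

```python
def batch(items, size):
    """Yield successive fixed-size chunks; each chunk can become one mapped task."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Five TODOs in batches of two -> three chunks, each processed by one task.
chunks = list(batch(["t1", "t2", "t3", "t4", "t5"], 2))
```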

When integrated into your backend, your Airflow deployment will typically receive DAG invocation requests from your API server or, less often, from you via the Airflow web UI.
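For example, your API server can kick off a run through Airflow's stable REST API (Airflow 2.x exposes `POST /api/v1/dags/{dag_id}/dagRuns`). A sketch that only builds the request, with an illustrative DAG id and conf payload:

```python
import json

def build_dag_run_request(airflow_base, dag_id, user_id):
    # The "conf" object becomes the DAG's run configuration.
    url = f"{airflow_base}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": {"user_id": user_id}})
    return url, body

url, body = build_dag_run_request("http://localhost:8080", "prioritize_todos", 42)
```

In practice you'd POST `body` to `url` with your HTTP client of choice, authenticated however your deployment requires.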

In most cases, your Airflow tasks should modify platform resources via requests to your backend API, though it's sometimes justified for tasks to read from your database directly and, more rarely, to write to it.
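As a sketch of that first pattern, a task can report its results back through your backend's API instead of touching the database. The endpoint path and payload shape below are illustrative assumptions, not a real API:

```python
import json
import urllib.request

def build_priorities_update(api_base, user_id, ranked_todos):
    # Construct (but don't send) the PUT request a task would issue to your backend.
    return urllib.request.Request(
        url=f"{api_base}/users/{user_id}/todo-priorities",
        data=json.dumps({"todos": ranked_todos}).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = build_priorities_update("http://api.internal", 42, [{"id": 2, "rank": 1}])
# Inside the task you'd call urllib.request.urlopen(req) with retries/error handling.
```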

When Not to Use Airflow

While Airflow is powerful, it's not suitable for every use case. Avoid using Airflow when:

  • You need real-time responses (sub-second)
  • Your workflow duration varies significantly based on input size
  • You need continuous stream processing

Rule of thumb: if your workflow is guaranteed to complete in under 3 seconds, use the traditional request-response pattern. Otherwise, consider whether writing the workflow as an Airflow DAG is a good fit.

Ways To Run Airflow

You can and likely will run Airflow in at least two different ways if you decide to use it.

Locally

Ideal for when you're just getting started, experimenting with new Airflow concepts, building new workflows, or modifying existing workflows. Using local Airflow to write to production systems for one-off workflows may be tempting, but I discourage you from doing so, especially if you've already launched and have users.

Cloud-based Deployment

Recommended for production use cases that require integration with the rest of your backend, whether for workflows that run on a schedule or in response to events within your platform. You'll want your CI/CD pipeline managing the rollout of changes to your Airflow cloud deployment.

The subcategories of cloud-based deployment are self-managed and managed deployments.

Self-managed will be the focus of this series.

Type         | Cost   | Maintenance | Insight
Self-managed | 💸     | 🔧🔧🔧      | 🎓🎓🎓
Managed      | 💰💰💰 | 🔧          | 🎓

Self-Managed

Most cost-effective for small/mid-scale production use cases. You'll need to manage Airflow updates and DB migrations yourself if you want to use the latest and greatest version of Airflow. One of the upsides is that you'll learn more about Airflow and workflow orchestration along the way.

Managed Services

Popular options include AWS MWAA and Astronomer. You'll end up paying more out of pocket because updates and DB upgrades are managed for you. The advantage is that you'll get to focus on writing your DAGs instead of managing the infrastructure.

Series Roadmap

Enough chit chat. Let's get to the good stuff.

Part 2: Running Airflow Locally

Go to Part 2

Getting started with local Airflow development

  • Prerequisites
  • Quickstart
  • Code Review
Airflow local deployment diagram

Part 3: Working with DAGs

Go to Part 3

Running, fixing, and designing DAGs

  • Accessing the UI
  • Working with DAGs
  • Best Practices
DAG diagram

Part 4: Integrating LLMs

Coming Soon

Integrating LLMs into your workflows

  • Requirements
  • Reliability and Fallbacks
  • Sample Workflow
LLM frameworks

Part 5: Robust Production Deployment

Coming Soon

Moving from localhost to a basic production deployment

  • Prerequisites
  • Configuration
  • Limitations
  • Logging, monitoring, and alerting
  • Integrating with CI/CD
Production cloud-based Airflow deployment

Part 6: Advanced LLM Integration

Coming Soon

Building complex LLM workflows

  • Agents
  • Common Patterns
  • Autonomous Workflows
Advanced use cases

Ready to get started?

Go to Part 2 β†’

Need advice?

Ask a Question