
Hands-on DataOps with Databricks, Terraform & GitHub Actions

Chandan Kumar
Founder, beCloudReady
July 8, 2025 · 3 min read

A step-by-step guide to automating Databricks deployments using Infrastructure-as-Code — Terraform modules, Spark jobs, and GitHub Actions CI/CD.


Why DataOps + DevOps for Databricks?

As teams scale their cloud-native data platforms, automation and reproducibility become essential. Manual provisioning and notebook execution just don't cut it anymore.

That's where Infrastructure as Code (IaC) and CI/CD come in.

In this post, we walk through a real-world automation pipeline that:

  • Provisions Azure Databricks using Terraform
  • Manages ETL notebooks and jobs
  • Automates scheduling using GitHub Actions

Whether you're just getting started or already running Spark jobs in production, this guide helps you think like a platform engineer while working with data tools.


Architecture Overview

Key Components

  • Terraform modules for reusable infrastructure
  • Azure (Databricks, Resource Groups, VNets)
  • GitHub Actions for automation
  • Databricks Jobs API for orchestration
  • Fivetran (optional for ingestion)

Modular Terraform Setup for Azure Databricks

We created two major layers:

1. infra/: Core Infrastructure

Includes:

  • Resource Group
  • Virtual Network
  • Azure Databricks Workspace
  • Network Security Groups

module "databricks_workspace" {
  source                      = "../../../modules/databricks_workspace"
  workspace_name              = "${local.prefix}-workspace"
  resource_group_name         = var.resource_group_name
  region                      = var.region
  managed_resource_group_name = "${local.prefix}-managed-rg"
  vnet_id                     = module.network.vnet_id
}
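Once the module is applied, it helps to sanity-check the workspace from outside Terraform. Here's a minimal sketch that builds (but does not send) a request against the Databricks Clusters API; the workspace URL and token are placeholders, and `list_clusters_request` is an illustrative helper, not part of the article's repo:

```python
from urllib.request import Request

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
DATABRICKS_TOKEN = "dapi-placeholder"  # placeholder personal access token


def list_clusters_request(host: str, token: str) -> Request:
    """Build a GET request for the Clusters API (GET /api/2.0/clusters/list)."""
    return Request(
        url=f"{host}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = list_clusters_request(DATABRICKS_HOST, DATABRICKS_TOKEN)
```

Sending `req` with `urllib.request.urlopen` (or swapping in `requests`) against a live workspace confirms that networking and authentication are wired up before you layer jobs on top.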

2. apps/: Jobs, Notebooks, and Workflows

We created a Spark job and uploaded it as a Databricks notebook:

resource "databricks_notebook" "nightly_job_notebook" {
  path           = "/Shared/nightly_task"
  language       = "PYTHON"
  content_base64 = base64encode(file(var.notebook_file_path))
}
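The `content_base64` argument carries the notebook source as a base64 string — that's all Terraform's `base64encode(file(var.notebook_file_path))` produces, and Databricks decodes it back into the notebook body on import. A quick Python equivalent (the source string here is a stand-in for the real notebook file):

```python
import base64

# Stand-in for the contents of the file at var.notebook_file_path
source = "print('nightly task running')\n"

# Equivalent of Terraform's base64encode(file(...))
content_base64 = base64.b64encode(source.encode("utf-8")).decode("ascii")

# The encoding is lossless: decoding recovers the original notebook source
roundtrip = base64.b64decode(content_base64).decode("utf-8")
```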

Job Definition

resource "databricks_job" "nightly_serverless_job" {
  name = "Nightly Python Job - Serverless"
 
  notebook_task {
    notebook_path = databricks_notebook.nightly_job_notebook.path
  }
 
  schedule {
    quartz_cron_expression = "0 0 0 * * ?" # run daily at midnight
    timezone_id            = "UTC"
  }
 
  job_cluster {
    job_cluster_key = "serverless_cluster"
 
    new_cluster {
      spark_version  = "13.3.x-scala2.12"
      runtime_engine = "PHOTON"
      num_workers    = 1
    }
  }
}
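Note that Databricks schedules use Quartz cron syntax, which prepends a seconds field to the familiar five Unix fields; `0 0 0 * * ?` fires daily at midnight, and the `?` means "no specific value" for day-of-week. A small sketch that labels each field:

```python
# Quartz cron has six required fields (seconds first), unlike five-field Unix cron.
QUARTZ_FIELDS = ["seconds", "minutes", "hours", "day-of-month", "month", "day-of-week"]


def explain_quartz(expr: str) -> dict:
    """Map each Quartz field name to its value; '?' means 'no specific value'."""
    values = expr.split()
    if len(values) not in (6, 7):  # an optional 7th field is the year
        raise ValueError(f"expected 6 or 7 fields, got {len(values)}")
    return dict(zip(QUARTZ_FIELDS + ["year"], values))


# Daily-at-midnight schedule: seconds=0, minutes=0, hours=0, every day
schedule = explain_quartz("0 0 0 * * ?")
```

Getting this wrong is a common trap: a stray `*` in the hours field turns a nightly job into an hourly one.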

GitHub Actions CI/CD for Terraform

name: Deploy Databricks Infra
 
on:
  push:
    paths:
      - 'apps/**'
      - 'infra/**'
  workflow_dispatch:
 
jobs:
  deploy:
    runs-on: ubuntu-latest
 
    steps:
      - uses: actions/checkout@v3
 
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
 
      - name: Terraform Init
        run: terraform init
 
      - name: Terraform Apply
        run: terraform apply -auto-approve

This workflow applies Terraform automatically on any push that touches infra/ or apps/, and can also be run on demand via workflow_dispatch. Note that terraform apply still needs Azure and Databricks credentials at runtime — typically injected as GitHub Actions secrets through environment variables.


Testing with Databricks Community Edition

  • Create a free Databricks Community Edition account
  • Run jobs and notebooks without Azure billing
  • Sync code using GitHub or databricks-cli
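For ad-hoc testing, you can also trigger a deployed job directly through the Jobs API rather than waiting for the cron schedule. A minimal sketch of the request body for the Jobs API 2.1 `run-now` endpoint (the job ID and parameter names are illustrative; actually sending the request is left out so the example stays offline):

```python
import json


def run_now_payload(job_id, notebook_params=None):
    """Build the JSON body for POST /api/2.1/jobs/run-now."""
    body = {"job_id": job_id}
    if notebook_params:
        # Passed through to the notebook as widget parameters
        body["notebook_params"] = notebook_params
    return json.dumps(body)


payload = run_now_payload(42, {"run_date": "2025-07-08"})
```

POSTing this body to `<workspace-url>/api/2.1/jobs/run-now` with a bearer token starts an immediate run of the job.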

What You'll Walk Away With

  • Deploy Azure Databricks workspaces using Terraform
  • Structure infra and application layers cleanly
  • Manage Spark jobs and workflows as code
  • Automate everything using GitHub Actions

What's Next?

Repository: azure-databricks-terraform on GitHub

Upcoming Topics

  • Secure secret management (Key Vault + Databricks secrets)
  • Advanced CI/CD pipelines
  • Integrating Fivetran, dbt, and Unity Catalog
  • Multi-environment (dev/staging/prod) strategies
DevOps · Databricks · Terraform · GitHub Actions · Azure