A Practical Guide to Deploying Dagster Hybrid on GCP¶
In my previous post, "How I Built a Modern Data Orchestration Layer for Finfluencers.trade", I explained why I chose Dagster for my project — a platform that transforms financial content into structured, auditable data. The core challenge I wanted to solve was creating complete traceability and lineage for every piece of data, from raw podcast transcripts and article text to final predictions on the website.
This post dives into the how — a detailed, technical guide for setting up a Dagster Hybrid deployment on Google Cloud Platform (GCP).
Dagster offers several deployment options: Serverless (fully managed by Dagster Cloud), Self-Hosted (you manage everything), and Hybrid (Dagster manages the control plane, you manage the data plane). I'll focus on the Hybrid approach, which provides the best balance of security, control, and operational simplicity. If you're interested in why I selected Hybrid over the other options, see the first article in this series.
I use Docker with multi-stage builds, Poetry for dependency management, GCE Virtual Machines, and GitHub Actions workflows to create a robust CI/CD pipeline.
This is not just a theoretical overview; it's the exact production infrastructure running Finfluencers.trade today.
What is Dagster?¶
Before diving into deployment details, let me provide context for readers new to Dagster.
Dagster is a data orchestrator that helps you build, test, and monitor data pipelines. Think of it as a workflow management system designed for data engineering tasks.
Key Components:¶
- Assets: These represent your data (files, database tables, ML models). Unlike traditional orchestrators that focus on tasks, Dagster is "asset-centric"—it tracks what data you have, how it was created, and what depends on it.
- Jobs: Collections of assets that should be executed together. For example, a job might include downloading podcast audio, transcribing it, and extracting quotes.
- Resources: Connections to external systems like databases, APIs, or cloud storage. These are defined once and reused across your assets.
- Schedules & Sensors: Mechanisms to trigger jobs automatically—either on a time schedule or when certain conditions are met (like new files appearing in a bucket).
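If you're new to Dagster, here's a minimal, self-contained illustration of those four concepts (hypothetical asset and job names, not code from the actual project):

from dagster import (
    AssetSelection,
    Definitions,
    ScheduleDefinition,
    asset,
    define_asset_job,
)

@asset
def podcast_transcript() -> str:
    # In a real pipeline this would download and transcribe audio.
    return "transcript text"

@asset
def extracted_quotes(podcast_transcript: str) -> list[str]:
    # Declaring podcast_transcript as an input gives Dagster the lineage edge.
    return [line for line in podcast_transcript.splitlines() if line]

# A job groups assets that should be materialized together.
transcript_job = define_asset_job(
    name="transcript_job",
    selection=AssetSelection.assets(podcast_transcript, extracted_quotes),
)

# A schedule triggers the job automatically (daily at 06:00 here).
daily_refresh = ScheduleDefinition(job=transcript_job, cron_schedule="0 6 * * *")

defs = Definitions(
    assets=[podcast_transcript, extracted_quotes],
    jobs=[transcript_job],
    schedules=[daily_refresh],
)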
The Architecture Split:¶
Dagster separates into two main components:
- Control Plane: The web UI, job scheduler, metadata database, and orchestration logic
- Data Plane: Where your actual code runs and where your data lives
In the Hybrid model, Dagster Cloud manages the control plane (so you don't have to), while your code and data run in your own infrastructure.
Requirements for This Deployment¶
Here are the core requirements that shaped my architecture choice:
- Security First: Keep all credentials and data within my own GCP project, with proper service account isolation
- Reliable Deployments: Build once, deploy everywhere - the same container runs locally and in production
- Reproducible Dependency Management: Use Poetry for reproducible builds and dependency resolution
- Cost-Effectiveness: Leverage the Dagster Solo Plan ($10/month) with optimized GCE instances
- Comprehensive CI/CD: Reusable GitHub Actions workflows supporting both production and branch deployments
- Environment Isolation: Complete separation between local dev, staging, and production environments
The Architecture at a Glance¶
Here's how the production setup works:
- Dagster Cloud: Hosts the control plane (UI, scheduler, metadata). I use the Solo Plan with branch deployments enabled.
- My GCP Project:
  - Compute Engine (GCE): Dynamic VMs (e2-medium) that are created/destroyed as needed for production and branch deployments
  - Artifact Registry: Stores your Dagster code images and modified agent images (with PostgreSQL support)
  - Identity and Access Management (IAM): Dedicated service accounts with least-privilege permissions
  - Data Resources: Separate BigQuery datasets and GCS buckets for production and staging
- Docker: Multi-stage builds for optimized images, with separate Dockerfiles for code and agents
- Poetry: Python dependency management with lock files for reproducible builds
- GitHub Actions: Reusable workflows that handle VM lifecycle, image building, and deployments
- Custom Agent: Standard Dagster agent with added PostgreSQL dependencies for compatibility
Architecture Overview¶
Let me break down the architecture into digestible diagrams that show how the components work together:
1. Development to Production Flow¶
This diagram shows the complete development workflow that ensures no code reaches production without proper validation. The key insight here is that staging testing is a mandatory gate before any code can be considered for production deployment.
flowchart TD
Dev["Local Development<br/>on staging branch<br/>dagster dev"] --> Commit["Commit to<br/>Staging Branch"]
Commit --> Deploy["Auto-Deploy to<br/>Staging Environment"]
Deploy --> Test["Test on<br/>Staging Data"]
Test --> PR["Create Pull Request<br/>staging → main"]
PR --> Review["Code Review<br/>& Approval"]
Review --> Merge["Merge to<br/>Main Branch"]
Merge --> Production["Auto-Deploy to<br/>Production Environment"]
Test -->|Issues Found| Dev
Review -->|Changes Requested| Dev
Why this workflow matters:
- Local testing with dagster dev lets you catch issues early before any deployment
- Automatic staging deployment means every commit gets tested in a production-like environment
- Real staging data validates that your code works with actual data structures and volumes
- No shortcuts to production - you can't bypass staging validation even if you wanted to
- Fast feedback loops - issues found at any stage send you back to local development for quick fixes
2. GitHub Actions CI/CD Pipeline¶
This diagram illustrates the CI/CD automation that powers the deployment process. The powerhouse here is the reusable workflow pattern - one workflow handles both staging and production deployments with different parameters.
flowchart TD
Push["Code Push"] --> Check{Branch?}
Check -->|staging| BranchWorkflow["Staging Deployment<br/>Workflow"]
Check -->|main| ProdWorkflow["Production Deployment<br/>Workflow"]
BranchWorkflow --> Reusable["Reusable Workflow"]
ProdWorkflow --> Reusable
Reusable --> Build["Build Docker<br/>Images"]
Build --> Deploy["Deploy to GCP<br/>VM"]
Deploy --> Register["Register with<br/>Dagster Cloud"]
How the automation works:
- Branch detection automatically determines whether this is a staging or production deployment
- Reusable workflow eliminates code duplication - the same logic handles both environments with different inputs
- Docker image building creates containers with your code, using Poetry for dependency management and multi-stage builds for efficiency
- GCP VM deployment creates on-demand compute instances, deploys your containers, and configures the Dagster agent
- Dagster Cloud registration connects your agent to the cloud control plane, enabling the web UI and job scheduling
3. GCP Infrastructure Components¶
This diagram shows the Google Cloud Platform architecture that provides the compute, storage, and container infrastructure. The design emphasizes environment isolation and resource efficiency.
flowchart TB
subgraph "Artifact Registry"
CodeImage["Dagster Code<br/>Images"]
AgentImage["Custom Agent<br/>Images (+PostgreSQL)"]
end
subgraph "Compute Engine"
ProdVM["Production VM<br/>(e2-medium)"]
StagingVM["Staging VM<br/>(e2-medium)"]
end
subgraph "Data Storage"
ProdData["Production<br/>BigQuery + GCS"]
StagingData["Staging<br/>BigQuery + GCS"]
end
CodeImage --> ProdVM
CodeImage --> StagingVM
AgentImage --> ProdVM
AgentImage --> StagingVM
ProdVM --> ProdData
StagingVM --> StagingData
Infrastructure design principles:
- Artifact Registry stores your Docker images centrally, enabling fast deployments and caching across environments
- On-demand VMs are created only when needed, keeping costs low (~$40/month total) compared to always-on infrastructure
- Environment isolation ensures staging and production data never mix - critical for data safety and testing accuracy
- Identical VM specs (e2-medium) between staging and production eliminate "works on staging but fails in prod" issues
- Custom agent images add PostgreSQL support while maintaining compatibility with Dagster Cloud's standard agents
4. Dagster Cloud Integration¶
This diagram illustrates the hybrid architecture - where Dagster Cloud provides the control plane (UI, scheduling, metadata) while your GCP infrastructure runs the actual workloads. This gives you the best of both worlds: managed services convenience with infrastructure control.
flowchart LR
subgraph "Your GCP Project"
Agent1["Production Agent<br/>(on GCE VM)"]
Agent2["Staging Agent<br/>(on GCE VM)"]
end
subgraph "Dagster Cloud"
UI["Web UI"]
Scheduler["Job Scheduler"]
Metadata["Asset Metadata"]
end
Agent1 -.->|secure connection| UI
Agent2 -.->|secure connection| UI
UI --> Scheduler
Scheduler --> Agent1
Scheduler --> Agent2
Agent1 --> Metadata
Agent2 --> Metadata
How the hybrid model works:
- Dagster Cloud handles the complex parts: web UI, job scheduling, asset lineage, and metadata storage - no need to run databases or web servers
- Your GCP agents run in your environment with your data, security policies, and network access - you maintain full control
- Secure agent connections use API tokens, not VPNs or firewall rules - agents initiate outbound connections to Dagster Cloud
- No vendor lock-in for data - your data never leaves your GCP project; only job metadata and logs are shared with Dagster Cloud
- Environment parity - both staging and production agents connect to the same Dagster Cloud deployment but operate on isolated data resources
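For context, a Docker-based Hybrid agent is configured through a small dagster.yaml that carries the agent token and the deployment it serves. The snippet below is a generic sketch based on Dagster's documented agent configuration, not the exact file from my VMs, and the token value is a placeholder:

# dagster.yaml on the agent VM (sketch; check the current Dagster Cloud docs)
instance_class:
  module: dagster_cloud.instance
  class: DagsterCloudAgentInstance

dagster_cloud_api:
  agent_token: <your-agent-token>   # outbound-only connection to Dagster Cloud
  deployment: prod                  # the deployment this agent serves
  branch_deployments: true          # also pick up branch deployments

user_code_launcher:
  module: dagster_cloud.workspace.docker
  class: DockerUserCodeLauncher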
Step-by-Step Implementation Guide¶
1. Set Up Dagster Cloud (Solo Plan)¶
This step is crucial for the Solo plan setup, where the official documentation can be a bit confusing.
- Create One Deployment: In the Dagster Cloud UI, create only one deployment. Let's name it prod for clarity.
- Enable Branch Deployments: In your deployment's settings, go to the "Branch Deployments" tab and enable it. This allows you to deploy code from feature branches to temporary staging environments.
- Generate Two API Tokens:
- Agent Token: For the GCE agents to connect to Dagster Cloud
- User API Token: For CLI operations in GitHub Actions workflows
2. Configure Your GCP Project¶
Set up a GCP environment with proper security and isolation:
- Enable APIs: Enable Compute Engine API, Artifact Registry API, and BigQuery API.
- Create Artifact Registries:
- One for your Dagster code images
- Same registry can store your custom agent images
- Create Service Accounts:
- GitHub Actions Service Account: For CI/CD operations
- Agent Runner Service Account: For the VMs running Dagster agents
- Grant appropriate roles with least-privilege principle
- Create Service Account Keys: Generate JSON keys for GitHub Actions and agent authentication
- Create Environment-Specific Data Resources:
This is an important architectural decision: your data must be completely isolated between environments. Here's why and how:
Why Separate Data Resources?
- Safety: Prevent development code from accidentally modifying production data
- Testing: Test data transformations against realistic data without risk
- Performance: Development workloads don't impact production queries
- Compliance: Meet data governance requirements for production isolation
What to Create:
- Production BigQuery dataset: your_project_prod (your real business data)
- Staging BigQuery dataset: your_project_staging (copy/subset of production data)
- Production GCS bucket: your-project-data-prod (live data files, processed assets)
- Staging GCS bucket: your-project-data-staging (test files, development data)
- Additional data sources: If you use other services (Cloud SQL, Firestore, external APIs), create separate staging instances/endpoints for those as well
How It Works:
- Local development connects to staging resources by default
- Branch deployments use staging data for safe testing
- Production deployments only touch production data
- Your Dagster resources automatically switch based on environment variables
Local Development Data Strategy: You have two options for local development:
1. Share staging data (my approach): Local dev uses the same staging resources as branch deployments. This works well for a solo developer like me, but doesn't scale for teams where multiple developers might interfere with each other's work.
2. Separate local data: Create additional your_project_local datasets/buckets for each developer. Better for teams but requires more setup and maintenance.
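If you prefer scripting this setup over clicking through the console, the same resources can be created with a few commands. A condensed sketch; the project ID, service-account name, dataset names, and bucket names are placeholders to replace with your own:

# Enable the required APIs
gcloud services enable compute.googleapis.com artifactregistry.googleapis.com bigquery.googleapis.com
# Artifact Registry for code and agent images
gcloud artifacts repositories create gcf-artifacts --repository-format=docker --location=europe-north1
# Service account for the agent VMs (grant only the roles your pipelines need)
gcloud iam service-accounts create dagster-agent-runner
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:dagster-agent-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
# Environment-specific data resources
bq --location=EU mk --dataset YOUR_PROJECT_ID:your_project_prod
bq --location=EU mk --dataset YOUR_PROJECT_ID:your_project_staging
gcloud storage buckets create gs://your-project-data-prod --location=europe-north1
gcloud storage buckets create gs://your-project-data-staging --location=europe-north1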
3. Project Structure and Configuration Files¶
Before diving into specific files, let's understand the complete file structure and what each component does:
File Structure Overview¶
Your Dagster project will need several types of configuration files:
Core Dagster Files:
- definitions.py - Main entry point that defines your assets, jobs, and resources
- workspace.yaml - Tells Dagster where to find your code (used by dagster dev)
- Your asset files in subdirectories - Contain your actual data pipeline logic
Dependency Management:
- pyproject.toml - Defines project metadata and dependencies (Poetry format)
- poetry.lock - Locks exact dependency versions for reproducible builds
Docker Configuration:
- Dockerfile - Builds your Dagster code into a container
- agent.Dockerfile - Customizes the Dagster agent with additional dependencies
Environment Configuration:
- env.example - Template showing what environment variables you need
- .env (local only) - Your actual environment variables for local development
CI/CD Pipeline:
- .github/workflows/docker-deploy-production.yml - Deploys the main branch to production
- .github/workflows/docker-deploy-branch.yml - Deploys feature branches to staging
- .github/workflows/reusable-docker-gce-deploy.yml - The complex reusable workflow
GitHub Configuration:
- Repository Variables (in GitHub UI) - Non-sensitive configuration
- Repository Secrets (in GitHub UI) - Sensitive credentials and tokens
Project File Tree¶
Here's what your complete project structure should look like:
your-dagster-project/
├── pyproject.toml # Poetry dependencies
├── poetry.lock # Locked dependency versions
├── Dockerfile # Your Dagster code container
├── agent.Dockerfile # Custom agent container
├── workspace.yaml # Local development config
├── definitions.py # Main Dagster entry point
├── env.example # Environment variable template
├── .env # Local environment variables (gitignored)
├── .github/
│ └── workflows/
│ ├── docker-deploy-production.yml # Production deployment
│ ├── docker-deploy-branch.yml # Branch deployment
│ └── reusable-docker-gce-deploy.yml # Reusable workflow (480 lines)
└── your_project/ # Your Dagster code
├── __init__.py
├── assets/ # Your data pipeline logic
│ ├── __init__.py
│ ├── data_ingestion.py
│ └── data_processing.py
└── resources/ # Environment-aware connections
├── __init__.py
└── gcp_resources.py
Now let's walk through each file and understand what it does:
pyproject.toml¶
This file defines your project metadata and all dependencies. Poetry uses this to create reproducible builds:
[tool.poetry]
name = "your-dagster-project"
version = "0.1.0"
description = "Production Dagster project with sophisticated CI/CD"
authors = ["Your Name <you@example.com>"]
license = "Apache-2.0"
readme = "README.md"
[tool.poetry.dependencies]
python = ">=3.9,<3.13"
dagster = "1.10.19"
dagster-webserver = "1.10.19"
dagster-gcp = "^0.26.0"
dagster-cloud = "1.10.19"
google-cloud-bigquery = "^3.0"
google-cloud-storage = "^2.0"
pandas = "^2.0"
python-dotenv = "^1.0.0"
feedparser = "^6.0.11"
requests = "^2.32.3"
dagster-postgres = "^0.26.0"
psycopg2-binary = "^2.9.9"
[tool.poetry.group.dev.dependencies]
pytest = "^7.0"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
Dockerfile (Multi-stage build)¶
This creates a Docker container with your Dagster code. The multi-stage approach builds dependencies in one stage and copies only what's needed to the final image, making it smaller and more secure:
# Multi-stage build for better caching and smaller final image
FROM python:3.12-slim as builder
# Install system dependencies in one layer
RUN apt-get update && apt-get install -y \
gcc \
g++ \
&& rm -rf /var/lib/apt/lists/*
# Install poetry
RUN pip install --no-cache-dir poetry==1.8.3
# Set working directory
WORKDIR /opt/dagster/app
# Copy dependency files first (better caching)
COPY pyproject.toml poetry.lock ./
# Configure poetry and install dependencies in one layer
RUN poetry config virtualenvs.create false \
&& poetry install --only=main --no-interaction --no-ansi --no-cache
# Final stage
FROM python:3.12-slim
# Install minimal runtime dependencies (none are needed beyond the base image here;
# add packages before the `&&` if your code requires extra system libraries)
RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /opt/dagster/app
# Copy installed packages from builder
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
# Copy the entire project
COPY . .
# Set PYTHONPATH to include the project directory
ENV PYTHONPATH="${PYTHONPATH}:/opt/dagster/app"
# No CMD - let the agent specify the command
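Before wiring up CI, it can be handy to build and push the image by hand to confirm everything works. A quick sketch, assuming the europe-north1 region and the registry/image names used elsewhere in this post (adjust to your values):

# One-time: let Docker authenticate to Artifact Registry in your region
gcloud auth configure-docker europe-north1-docker.pkg.dev
# Build and push the code image (CI normally does this)
docker build -t europe-north1-docker.pkg.dev/YOUR_PROJECT_ID/gcf-artifacts/your_dagster_project:dev .
docker push europe-north1-docker.pkg.dev/YOUR_PROJECT_ID/gcf-artifacts/your_dagster_project:dev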
agent.Dockerfile (Agent with PostgreSQL support)¶
This customizes the standard Dagster Cloud agent to include additional dependencies your project needs. In this case, adding PostgreSQL support:
FROM dagster/dagster-cloud-agent:1.10.18
# Install dagster-postgres and its psycopg2 dependency
# Using psycopg2-binary to avoid needing to install build dependencies for psycopg2
RUN pip install dagster-postgres psycopg2-binary
workspace.yaml¶
This simple file tells Dagster where to find your code when running dagster dev locally:
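A minimal version, assuming the definitions.py entry point at the repository root shown in the file tree above (use python_module instead if you prefer loading a package):

load_from:
  - python_file: definitions.py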
4. Resource Configuration¶
Resources in Dagster represent connections to external systems (databases, cloud storage, APIs). The key is making them environment-aware so the same code works in local, staging, and production environments with different data sources.
Create a modular resource system that handles different environments and credential management:
your_project/resources/gcp_resources.py¶
This file defines environment-aware GCP connections. Here's what's happening:
Key Concepts:
- Resource Classes: AppBigQueryResource and AppGCSResource wrap the Google Cloud clients
- Credential Handling: Tries JSON keys from environment variables first, falls back to default authentication
- Lazy Loading: Clients are created only when needed (see the @property decorators)
- Resource Factories: The @resource decorated functions create configured instances for Dagster
Why This Matters:
- Same code works locally (uses your personal credentials) and in production (uses service account keys)
- Environment variables control which BigQuery dataset and GCS bucket to use
- No hardcoded credentials anywhere in the code
import os
import json
from dagster import resource, Field, String
from google.cloud import bigquery, storage
from google.oauth2 import service_account
class AppBigQueryResource:
def __init__(self, project_id: str, dataset_id: str, location: str = "US"):
self.project_id = project_id
self.dataset_id = dataset_id
self.location = location
self._client = None
self._credentials = None
# Try to load credentials from JSON string if provided
creds_json_str = os.getenv("GCP_SERVICE_ACCOUNT_JSON_KEY")
if creds_json_str:
try:
creds_info = json.loads(creds_json_str)
self._credentials = service_account.Credentials.from_service_account_info(creds_info)
except json.JSONDecodeError as e:
print(f"Error decoding GCP_SERVICE_ACCOUNT_JSON_KEY: {e}")
except Exception as e:
print(f"Error loading credentials from GCP_SERVICE_ACCOUNT_JSON_KEY: {e}")
@property
def client(self):
if self._client is None:
if self._credentials:
self._client = bigquery.Client(project=self.project_id, credentials=self._credentials)
else:
# Fallback to default ADC (e.g., GOOGLE_APPLICATION_CREDENTIALS file path)
self._client = bigquery.Client(project=self.project_id)
return self._client
def execute_query(self, sql: str, job_config=None):
return self.client.query(sql, job_config=job_config).result()
class AppGCSResource:
def __init__(self, project_id: str, gcs_bucket: str):
self.project_id = project_id
self.gcs_bucket = gcs_bucket
self._client = None
self._credentials = None
# Try to load credentials from JSON string if provided
creds_json_str = os.getenv("GCP_SERVICE_ACCOUNT_JSON_KEY")
if creds_json_str:
try:
creds_info = json.loads(creds_json_str)
self._credentials = service_account.Credentials.from_service_account_info(creds_info)
except json.JSONDecodeError as e:
print(f"Error decoding GCP_SERVICE_ACCOUNT_JSON_KEY: {e}")
except Exception as e:
print(f"Error loading credentials from GCP_SERVICE_ACCOUNT_JSON_KEY: {e}")
@property
def client(self):
if self._client is None:
if self._credentials:
self._client = storage.Client(project=self.project_id, credentials=self._credentials)
else:
# Fallback to default ADC
self._client = storage.Client(project=self.project_id)
return self._client
@property
def bucket_name(self) -> str:
return self.gcs_bucket
@resource(
config_schema={
"project_id": Field(String, description="GCP Project ID"),
"dataset_id": Field(String, description="BigQuery Dataset ID"),
"location": Field(String, default_value="US", description="BigQuery location")
}
)
def app_bq_resource(init_context):
"""BigQuery resource with project and dataset configuration."""
project_id = init_context.resource_config["project_id"]
dataset_id = init_context.resource_config["dataset_id"]
location = init_context.resource_config["location"]
return AppBigQueryResource(project_id, dataset_id, location)
@resource(
config_schema={
"project_id": Field(String, description="GCP Project ID"),
"gcs_bucket": Field(String, description="GCS Bucket Name")
}
)
def app_gcs_resource(init_context):
"""GCS resource with project and bucket configuration."""
project_id = init_context.resource_config["project_id"]
gcs_bucket = init_context.resource_config["gcs_bucket"]
return AppGCSResource(project_id, gcs_bucket)
Environment Isolation Strategy¶
This resource configuration enables an environment isolation strategy:
The environment handling ensures that:
- Local development uses staging resources - Safe to experiment without affecting production
- Branch deployments get isolated staging environments - Test new features against realistic data
- Production deployments use production resources - Only live deployments touch real business data
- Complete data isolation - No risk of development work corrupting production datasets
- All credential handling is secure and environment-appropriate
This data isolation strategy means you can confidently test data transformations, schema changes, and new features without any risk to your production data assets. The same Dagster code automatically connects to different data sources based on environment variables, providing both safety and operational simplicity.
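To make that concrete, here is an illustrative asset (not from the actual project) that uses the resources defined above. The content_sources table name is hypothetical; the point is that the dataset and bucket come entirely from the resource configuration, so the same asset touches staging data locally and production data in the prod deployment:

from dagster import AssetExecutionContext, asset

@asset(required_resource_keys={"bigquery_resource", "gcs_resource"})
def content_source_row_count(context: AssetExecutionContext) -> None:
    bq = context.resources.bigquery_resource
    gcs = context.resources.gcs_resource
    # dataset_id is your_project_staging locally and your_project_prod in production
    rows = bq.execute_query(
        f"SELECT COUNT(*) AS n FROM `{bq.project_id}.{bq.dataset_id}.content_sources`"
    )
    count = next(iter(rows)).n
    # Same story for the bucket name
    blob = gcs.client.bucket(gcs.bucket_name).blob("stats/content_source_count.txt")
    blob.upload_from_string(str(count))
    context.log.info(f"{count} content sources in {bq.dataset_id}")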
definitions.py (Environment-aware configuration)¶
This is the main entry point for your Dagster project. It brings together your assets, jobs, and resources, and handles the environment switching logic:
import os
from dotenv import load_dotenv
from dagster import Definitions, load_assets_from_modules, define_asset_job, AssetSelection
from dagster import fs_io_manager
from dagster_gcp.gcs import GCSResource
import logging
# Load environment variables from .env file
load_dotenv()
# Get a logger instance
logger = logging.getLogger(__name__)
# Determine the current environment: 'local', 'staging', 'production'
DAGSTER_ENVIRONMENT = os.getenv("DAGSTER_ENVIRONMENT", "local").lower()
logger.info(f"DAGSTER_ENVIRONMENT set to: {DAGSTER_ENVIRONMENT}")
# Get environment variables with validation
GCP_PROJECT_ID = os.getenv("GCP_PROJECT_ID")
BIGQUERY_DATASET_ID = os.getenv("BIGQUERY_DATASET_ID")
GCS_BUCKET_NAME = os.getenv("GCS_BUCKET_NAME")
GCP_BQ_LOCATION = os.getenv("GCP_BQ_LOCATION", "europe-north1")
# Validation for non-local environments
if DAGSTER_ENVIRONMENT != "local":
required = {
"GCP_PROJECT_ID": GCP_PROJECT_ID,
"BIGQUERY_DATASET_ID": BIGQUERY_DATASET_ID,
"GCS_BUCKET_NAME": GCS_BUCKET_NAME,
}
missing = [k for k, v in required.items() if not v]
if missing:
raise ValueError(f"Missing required env vars for {DAGSTER_ENVIRONMENT}: {', '.join(missing)}")
# Import your assets and resources
from your_project.assets import migration_phase1, admin_tasks, content_source_management
from your_project.resources import app_bq_resource, app_gcs_resource
# Load all assets
all_asset_modules = [migration_phase1, admin_tasks, content_source_management]
all_loaded_assets = load_assets_from_modules(all_asset_modules)
# Define jobs
rss_pipeline_job = define_asset_job(
name="rss_pipeline",
description="Complete RSS pipeline with asset dependencies",
selection=AssetSelection.keys(
["content_sources", "setup", "new_from_rss"],
["content_sources", "raw", "rss_metadata_gcs"],
["content_sources", "core", "details_bq"],
["content_sources", "core", "image_bq_gcs"]
)
)
# Configure IO manager
io_manager_def = fs_io_manager.configured({
"base_dir": os.path.join(os.getenv("DAGSTER_HOME", "."), "storage")
})
# Define the Dagster repository with sophisticated resource configuration
defs = Definitions(
assets=all_loaded_assets,
jobs=[rss_pipeline_job],
resources={
"bigquery_resource": app_bq_resource.configured({
"project_id": GCP_PROJECT_ID,
"dataset_id": BIGQUERY_DATASET_ID,
"location": GCP_BQ_LOCATION
}),
"gcs_resource": app_gcs_resource.configured({
"project_id": GCP_PROJECT_ID,
"gcs_bucket": GCS_BUCKET_NAME
}),
"gcs": GCSResource(project=GCP_PROJECT_ID),
"io_manager": io_manager_def
}
)
5. GitHub Actions Workflows¶
Create a CI/CD system with reusable workflows:
The GitHub Actions workflows need configuration to know where to deploy and how to connect. GitHub provides two ways to store this information:
Repository Variables - Non-sensitive configuration that workflows need:
- Where: GitHub repository → Settings → Secrets and variables → Actions → Variables tab
- Why: These control deployment behavior but aren't secret (VM sizes, region names, etc.)
- Usage: Referenced in workflows as ${{ vars.VARIABLE_NAME }}
Repository Variables (set in GitHub)¶
DAGSTER_CLOUD_URL=https://your-org.dagster.cloud # Your Dagster Cloud instance
GCP_PROJECT_ID=your-gcp-project-id # Where to create VMs and store images
REGION=europe-north1 # GCP region for all resources
REGISTRY_NAME=gcf-artifacts # Artifact Registry name
DOCKER_IMAGE_NAME_BASE=your_dagster_project # Base name for Docker images
LOCATION_NAME=your_location # Dagster code location name
BASE_DEPLOYMENT_NAME=prod # Production deployment name
ORGANIZATION_NAME=your-org # Your Dagster Cloud org
AGENT_VM_MACHINE_TYPE=e2-medium # VM size (controls cost/performance)
AGENT_VM_IMAGE_FAMILY=cos-stable # Container-Optimized OS image family
AGENT_VM_IMAGE_PROJECT=cos-cloud # Project that hosts Google's COS images
AGENT_VM_DISK_SIZE=10GB # VM disk size
AGENT_VM_SCOPES=https://www.googleapis.com/auth/cloud-platform # VM permissions
AGENT_VM_SA_EMAIL=your-agent-sa@your-project.iam.gserviceaccount.com # Service account for VMs
Repository Secrets - Sensitive credentials that must be protected:
- Where: GitHub repository → Settings → Secrets and variables → Actions → Secrets tab
- Why: These contain API tokens and credentials that must never be exposed in logs
- Usage: Referenced in workflows as ${{ secrets.SECRET_NAME }}
Repository Secrets (set in GitHub)¶
DAGSTER_CLOUD_AGENT_API_TOKEN=your-agent-token # For agents to connect to Dagster Cloud
DAGSTER_CLOUD_USER_API_TOKEN=your-user-token # For CLI operations in workflows
GCP_ACTIONS_SA_KEY=your-github-actions-service-account-key # GitHub Actions GCP authentication
GCP_SERVICE_ACCOUNT_JSON_KEY=your-agent-service-account-key # Agent VM GCP authentication
BIGQUERY_DATASET_ID_PROD=your_prod_dataset # Production BigQuery dataset
BIGQUERY_DATASET_ID_STAGING=your_staging_dataset # Staging BigQuery dataset
GCS_BUCKET_NAME_PROD=your-prod-bucket # Production GCS bucket
GCS_BUCKET_NAME_STAGING=your-staging-bucket # Staging GCS bucket
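You can set all of these by hand in the GitHub UI, or script them with the GitHub CLI; a short sketch with placeholder values:

# Repository variables (non-sensitive)
gh variable set GCP_PROJECT_ID --body "your-gcp-project-id"
gh variable set AGENT_VM_MACHINE_TYPE --body "e2-medium"
# Repository secrets (sensitive)
gh secret set DAGSTER_CLOUD_AGENT_API_TOKEN --body "your-agent-token"
gh secret set GCP_ACTIONS_SA_KEY < path/to/github-actions-sa-key.json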
Workflow Structure and Triggers¶
Before looking at the specific files, understand how the workflow system is organized:
Trigger Logic:
- Production workflow: Runs when you push to the main branch or manually trigger it
- Branch workflow: Runs when you push to any branch except main
- Reusable workflow: Contains the actual deployment logic, called by both production and branch workflows
Why This Structure?
- DRY Principle: The complex deployment logic is written once in the reusable workflow
- Different Parameters: Production and branch workflows pass different configuration to the same reusable logic
- Maintainability: Changes to deployment logic only need to be made in one place
What Each File Does:
- Production workflow: Simple caller that says "deploy main branch to production"
- Branch workflow: Simple caller that says "deploy this branch to staging"
- Reusable workflow: 480 lines of logic for VM management, Docker builds, and Dagster deployment
.github/workflows/docker-deploy-production.yml¶
This workflow is triggered by pushes to the main branch and handles production deployments. It's a thin wrapper that calls the reusable workflow with production-specific parameters:
name: Docker Production Deployment
on:
workflow_dispatch:
inputs:
force_deploy:
description: 'Force deployment even if no changes'
required: false
type: boolean
default: false
push:
branches:
- 'main'
jobs:
call_reusable_production_deployment:
uses: ./.github/workflows/reusable-docker-gce-deploy.yml
permissions:
contents: read
id-token: write
with:
run_identifier: "latest"
is_production_deployment: true
explicit_gce_vm_name: dagster-agent-${{ vars.BASE_DEPLOYMENT_NAME }}
dagster_cloud_deployment_target: ${{ vars.BASE_DEPLOYMENT_NAME }}
dagster_code_location_name: ${{ vars.LOCATION_NAME }}
dagster_code_python_file: 'definitions.py'
dagster_code_env_label: 'production'
gce_startup_metadata_branch_deployments_flag: false
dagster_base_deployment_name_for_agent_metadata: ${{ vars.BASE_DEPLOYMENT_NAME }}
gce_vm_label_key: 'dagster-agent-environment'
gce_vm_label_value: ${{ vars.BASE_DEPLOYMENT_NAME }}
docker_build_cache_flags: 'type=gha,ref=refs/heads/main'
agent_docker_build_cache_flags: 'type=gha,ref=refs/heads/main'
secrets:
DAGSTER_CLOUD_AGENT_API_TOKEN: ${{ secrets.DAGSTER_CLOUD_AGENT_API_TOKEN }}
DAGSTER_CLOUD_CLI_API_TOKEN: ${{ secrets.DAGSTER_CLOUD_USER_API_TOKEN }}
GCP_ACTIONS_SA_KEY_CONTENT: ${{ secrets.GCP_ACTIONS_SA_KEY }}
GCP_SERVICE_ACCOUNT_JSON_KEY_CONTENT: ${{ secrets.GCP_SERVICE_ACCOUNT_JSON_KEY }}
REPO_SECRET_BIGQUERY_DATASET_ID: ${{ secrets.BIGQUERY_DATASET_ID_PROD }}
REPO_SECRET_USER_MEDIA_GCS_BUCKET: ${{ secrets.GCS_BUCKET_NAME_PROD }}
.github/workflows/docker-deploy-branch.yml¶
This workflow is triggered by pushes to any branch except main and handles staging deployments. Notice how it uses the branch name to create isolated staging environments:
name: Docker Branch Deployment
on:
push:
branches-ignore:
- 'main'
jobs:
call_reusable_branch_deployment:
uses: ./.github/workflows/reusable-docker-gce-deploy.yml
permissions:
contents: read
id-token: write
with:
run_identifier: ${{ github.ref_name }}
is_production_deployment: false
dagster_cloud_deployment_target: ${{ vars.BASE_DEPLOYMENT_NAME }}_${{ github.ref_name }}
dagster_code_location_name: ${{ vars.LOCATION_NAME }}
dagster_code_python_file: 'definitions.py'
dagster_code_env_label: 'staging'
gce_startup_metadata_branch_deployments_flag: true
gce_startup_metadata_branch_deployment_name: ${{ vars.BASE_DEPLOYMENT_NAME }}_${{ github.ref_name }}
gce_vm_label_key: 'dagster-agent-environment'
gce_vm_label_value: 'branch-${{ github.ref_name }}'
docker_build_cache_flags: 'type=gha,ref=refs/heads/${{ github.ref_name }}'
secrets:
DAGSTER_CLOUD_AGENT_API_TOKEN: ${{ secrets.DAGSTER_CLOUD_AGENT_API_TOKEN }}
DAGSTER_CLOUD_CLI_API_TOKEN: ${{ secrets.DAGSTER_CLOUD_USER_API_TOKEN }}
GCP_ACTIONS_SA_KEY_CONTENT: ${{ secrets.GCP_ACTIONS_SA_KEY }}
GCP_SERVICE_ACCOUNT_JSON_KEY_CONTENT: ${{ secrets.GCP_SERVICE_ACCOUNT_JSON_KEY }}
REPO_SECRET_BIGQUERY_DATASET_ID: ${{ secrets.BIGQUERY_DATASET_ID_STAGING }}
REPO_SECRET_USER_MEDIA_GCS_BUCKET: ${{ secrets.GCS_BUCKET_NAME_STAGING }}
.github/workflows/reusable-docker-gce-deploy.yml¶
This is where the real work happens - a 480-line workflow that both production and branch deployments call. It accepts many parameters to customize its behavior (a trimmed skeleton is sketched after this list) and handles:
- Poetry dependency management and caching
- Multi-stage Docker builds for both code and agent images
- Dynamic VM creation and lifecycle management
- Proper credential injection and environment configuration
- Dagster Cloud integration with location management
- Comprehensive error handling and logging
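The full workflow is too long to reproduce here, but a heavily trimmed skeleton shows its shape: a workflow_call trigger that declares inputs and secrets, then a job that builds and pushes images and creates the GCE VM. Step names, action versions, zone suffix, and flags below are illustrative rather than the exact production workflow:

# reusable-docker-gce-deploy.yml - heavily trimmed, illustrative skeleton only
name: Reusable Docker GCE Deploy
on:
  workflow_call:
    inputs:
      run_identifier:
        required: true
        type: string
      is_production_deployment:
        required: true
        type: boolean
      dagster_cloud_deployment_target:
        required: true
        type: string
      # ...remaining inputs omitted...
    secrets:
      DAGSTER_CLOUD_AGENT_API_TOKEN:
        required: true
      GCP_ACTIONS_SA_KEY_CONTENT:
        required: true
      # ...remaining secrets omitted...
jobs:
  build_and_deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_ACTIONS_SA_KEY_CONTENT }}
      # (Artifact Registry docker auth, layer caching, the agent image build,
      #  and Dagster Cloud location registration are omitted for brevity)
      - name: Build and push the code image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ vars.REGION }}-docker.pkg.dev/${{ vars.GCP_PROJECT_ID }}/${{ vars.REGISTRY_NAME }}/${{ vars.DOCKER_IMAGE_NAME_BASE }}:${{ inputs.run_identifier }}
      - name: Create or recreate the agent VM
        run: |
          gcloud compute instances create-with-container "dagster-agent-${{ inputs.run_identifier }}" \
            --zone "${{ vars.REGION }}-a" \
            --machine-type "${{ vars.AGENT_VM_MACHINE_TYPE }}" \
            --image-family "${{ vars.AGENT_VM_IMAGE_FAMILY }}" \
            --image-project "${{ vars.AGENT_VM_IMAGE_PROJECT }}" \
            --service-account "${{ vars.AGENT_VM_SA_EMAIL }}" \
            --container-image "${{ vars.REGION }}-docker.pkg.dev/${{ vars.GCP_PROJECT_ID }}/${{ vars.REGISTRY_NAME }}/dagster-agent:${{ inputs.run_identifier }}"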
Key Differences in How They Call the Reusable Workflow¶
Production Workflow Parameters:
with:
run_identifier: "latest" # Uses "latest" tag
is_production_deployment: true # Production flag
explicit_gce_vm_name: dagster-agent-prod # Fixed VM name
dagster_cloud_deployment_target: prod # Production deployment
dagster_code_env_label: 'production' # Sets DAGSTER_ENVIRONMENT=production
Branch Workflow Parameters:
with:
run_identifier: ${{ github.ref_name }} # Uses branch name as tag
is_production_deployment: false # Staging flag
dagster_cloud_deployment_target: prod_feature-x # Branch-specific deployment
dagster_code_env_label: 'staging' # Sets DAGSTER_ENVIRONMENT=staging
These different parameters make the reusable workflow create different VMs, use different Docker tags, and connect to different data resources.
6. Local Development Workflow¶
For local development, you'll need to set up your environment variables. Create a .env file based on the following env.example template:
env.example¶
# GCP Configuration
GCP_PROJECT_ID=your-gcp-project-id
REGION=europe-north1
REGISTRY_NAME=gcf-artifacts
# Dagster Cloud
DAGSTER_CLOUD_URL=https://your-org.dagster.cloud
# Local development (points to staging data assets)
DAGSTER_ENVIRONMENT=local
BIGQUERY_DATASET_ID=your_staging_bigquery_dataset
GCS_BUCKET_NAME=your_staging_user_media_bucket
GCP_BQ_LOCATION=EU
# Docker Agent Configuration
DEPLOYMENT_NAME=prod
- Copy the template: cp env.example .env
- Fill in your actual values - Replace the placeholder values with your real GCP project details
- Set DAGSTER_ENVIRONMENT=local - This connects to staging resources by default
- Point to staging data assets - Ensures you're testing against real infrastructure without affecting production
- Use poetry install for dependency management
- Use dagster dev for the local development server
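Putting those steps together, a typical first-time local setup looks like this (assuming Poetry is already installed):

cp env.example .env        # then edit .env with your real values
poetry install             # install locked dependencies into a virtualenv
poetry run dagster dev     # loads workspace.yaml; UI defaults to http://localhost:3000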
Performance Metrics:¶
- Deployment Time: ~7 minutes 45 seconds for branch deployments, ~6 minutes for production (including VM creation, image building, and agent startup)
- Build Cache Efficiency: Poetry and Docker layer caching reduces subsequent builds to ~3-4 minutes
- VM Startup: Container-Optimized OS with pre-installed Docker starts agents in under 60 seconds
- Real-world Production Numbers: These are actual timings from Finfluencers.trade deployments, not estimates
Note: These performance metrics are based on e2-medium instances (2 vCPUs, 4GB RAM). Larger instances like e2-standard-4 or e2-highmem-4 will significantly reduce deployment times, especially for Docker builds and Poetry dependency installation, but will increase operational costs proportionally.
Cost Estimation:¶
- Monthly Cost: ~$40 total
- $10 for Dagster Solo Plan
- ~$30 for dynamic e2-medium VMs (cost scales with usage)
- Resource Efficiency: VMs are created on-demand and can be auto-stopped, reducing idle costs
- Storage: Artifact Registry costs ~$2-3/month for image storage