Attention
This library is used for a few apps in production, but it is still early in development. Like the idea of it? Please star us on GitHub and contribute via the issues board and roadmap.
Django GCP¶
django-gcp is a library of tools to help you deploy and use django on Google Cloud Platform.
Helpers are provided for Cloud Tasks, Cloud Scheduler, Pub/Sub events, Cloud Storage, Cloud Run metadata, structured Logging and Error Reporting (see the Contents below).
Aims¶
The ultimate goals are to:
Allow serverless django (for actual fully-fledged apps, not toybox tutorials).
Enable event-based integration between django and various GCP services.
Simplify the use of GCP resources in django including Storage, Logging, Error Reporting, Run, PubSub, Tasks and Scheduler.
Tip
For example, if we have both a Store and a PubSub subscription to events on that store, we can do smart things in django when files or their metadata change.
Background¶
To run a “reasonably comprehensive” django server on GCP, we have been using 4-5 libraries. Each covers a little bit of functionality, and we put in a lot of time to:
engage maintainers -> fork -> patch -> PR -> wait -> wait more -> release (maybe) -> update dependencies
Lots of the maintainers of those libraries have given up or are snowed under, which we have a lot of compassion for. Some, like django-storages, are (admirably) maintaining a uniform API across many compute providers, whereas we don’t change providers often enough to need that, so would rather have the flexibility to do platform-specific things.
We’ll be using GCP for the foreseeable future, so can accept a platform-specific API in order to use latest GCP features and best practices.
Contents¶
Getting Started¶
Tip
A complete example of a working server with django-gcp is provided in the tests folder of the source code.
Install the library¶
django-gcp is available on pypi, so installation into your python virtual environment is dead simple:
poetry add django-gcp
Not using poetry? It’s highly opinionated, but it’s your friend. Google it.
Install the django app¶
Next, you’ll need to install this as an app in your django settings:
INSTALLED_APPS = [
    # ...
    'django_gcp',
    # ...
]
Add the endpoints¶
Tip
If you’re only using storage, and not events or tasks, you can skip this step.
Include the django-gcp URLs in your your_app/urls.py:
from django.urls import include, re_path

from django_gcp import urls as django_gcp_urls

urlpatterns = [
    # ...other routes
    # Use whatever regex you want:
    re_path(r"^django-gcp/", include(django_gcp_urls)),
]
Using python manage.py show_urls you can now see the endpoints for both events and tasks appear in your app.
Authentication¶
There are two aspects to authentication with django-gcp: authenticating the server to interact with GCP, and authenticating incoming webhooks or messages from PubSub.
Authenticating the Server¶
Authenticating the server requires Service Account Credentials or Application Default Credentials.
Attention
At the time of writing, Google’s process for managing authentication in their SDKs is somewhat intractable, with difficult-to-navigate guidance and varying practices implemented and recommended across the platform. It is very easy to leak credentials as a result, so please take care.
However, there are some promising developments currently happening (like Workload Identity Federation and Service Account Impersonation) so we hope that soon it’ll be much easier to have a single workflow for this. In the meantime it’s worth following this guy.
A major issue in particular for storage is that we need the ability to sign files in GCS, which either requires a dedicated service account with the full key available (no longer recommended by google), or requires additional calls to a google-hosted API, significantly slowing any interaction requiring signed URLs.
If you’re not using media storage - only tasks/events and static (public) storage - this should not be an issue and you can use service account impersonation, federation or ADCs as appropriate.
Create a service account¶
In most cases, the default service accounts are not sufficient to read/write and sign files in GCS, so you will need to create a dedicated service account:
Create a service account. (Google Getting Started Guide)
Make sure your service account has access to the bucket and appropriate permissions. (Using IAM Permissions)
On GCP infrastructure¶
This library will attempt to read the credentials provided when running on google cloud infrastructure.
Ensure your service account is being used by the deployed GCR / GKE / GCE instance.
Warning
Default Google Compute Engine (GCE) Service accounts are unable to sign urls.
On GitHub Actions¶
You may need to use the library on infrastructure external to Google, like GitHub Actions - for example running collectstatic within a GitHub Actions release flow.
You’ll want to avoid injecting a service account json file into your github actions if possible, so you should consider Workload Identity Federation which is made pretty easy by these glorious github actions.
Locally¶
We’re working on using service account impersonation, but it’s not fully available for all the SDKs yet; there are still a lot of teething problems (like this one, solved 6 days ago at the time of writing).
So you should totally try that (please submit a PR here to show the process if you get it to work!!). In the meantime…
Create the key and download your-project-XXXXX.json file.
Danger
It’s best not to store this in your project, to prevent accidentally committing it or building it into a docker image layer. Instead, bind mount it into docker images and devcontainers from somewhere else on your local system.
If you must keep it within your project, it’s good practice to name the file gha-creds-<whatever>.json and make sure that gha-creds-* is in your .gitignore and .dockerignore files.
If you’re developing in a container (like a VSCode .devcontainer), mount the file into the container. You can make gcloud available too - check out this tutorial.
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the json file.
Authenticating Webhooks and PubSub messages¶
Warning
We are yet to add the ability to accept JWT-authenticated push subscriptions from PubSub, EventArc, Cloud Tasks or Cloud Scheduler so that authentication is handled out of the box.
In the meantime, it’s your responsibility to ensure that your handlers are protected (or otherwise wrap the urls in a decorator to manage authentication).
The best way of doing this is to generate a single-use token and supply it as an event parameter (see Generating Endpoint URLs).
We want to work on this so if you’d like to sponsor that, find us on GitHub!
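As an illustration of that single-use token approach (nothing below ships with django-gcp - the helper names are hypothetical), you could generate the token when building the event URL and verify it inside your receiver:
import secrets

from django.core.cache import cache

from django_gcp.events.utils import get_event_url


def make_protected_event_url(kind, reference):
    """Build an event URL carrying a single-use token (hypothetical helper)"""
    token = secrets.token_urlsafe(32)
    # Remember the token server-side so the receiver can verify (then invalidate) it
    cache.set(f"event-token-{reference}", token, timeout=3600)
    return get_event_url(kind, reference, event_parameters={"token": token})


def is_valid_event_token(reference, event_parameters):
    """Check (and consume) the token inside your event_received receiver (hypothetical helper)"""
    expected = cache.get(f"event-token-{reference}")
    supplied = event_parameters.get("token", "")
    if expected is not None and secrets.compare_digest(expected, supplied):
        cache.delete(f"event-token-{reference}")  # single use
        return True
    return False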
Events¶
This module provides a simple interface allowing django to absorb events, eg from Pub/Sub push subscriptions or EventArc.
Events are communicated using django’s signals framework. They can be handled by any app (not just django-gcp) simply by creating a signal receiver.
Warning
Please see Authenticating Webhooks and PubSub messages to learn about authenticating incoming messages.
Events Endpoints¶
If you have django_gcp installed correctly (see Add the endpoints), using python manage.py show_urls will show the endpoints for events.
Endpoints are POST-only and require two URL parameters, an event_kind and an event_reference. The body of the POST request forms the event_payload.
So, if you POST data to https://your-server.com/django-gcp/events/my-kind/my-reference/ then a signal will be dispatched with event_kind="my-kind" and event_reference="my-reference".
Creating A Receiver¶
This is how you attach your handler. In your-app/signals.py, do:
import logging

from django.dispatch import receiver

from django_gcp.events.signals import event_received
from django_gcp.events.utils import decode_pubsub_message

logger = logging.getLogger(__name__)


@receiver(event_received)
def receive_event(sender, event_kind, event_reference, event_payload, event_parameters, **kwargs):
    """Handle question updates received via pubsub
    :param event_kind (str): A kind/variety allowing you to determine the handler to use (eg "something-update"). Required.
    :param event_reference (str): A reference value provided by the client allowing events to be sorted/filtered. Required.
    :param event_payload (dict, array): The event payload to process, already decoded.
    :param event_parameters (dict): Extra parameters passed to the endpoint using URL query parameters
    :return: None
    """
    # There could be many different event types, from your own or other apps, and
    # django-gcp itself (when we get going with some more advanced features)
    # so make sure you only act on the specific kind(s) you want to handle
    if event_kind == "something-important":
        # Here is where you handle the event using whatever logic you want
        # CAREFUL: See the tip above about authentication (verifying the payload is not malicious)
        print("DO SOMETHING IMPORTANT WITH THE PAYLOAD:", event_payload)

        # Your payload can be from any arbitrary source, and is in the form of decoded json.
        # However, if the source is Eventarc or Pub/Sub, the payload contains a formatted message
        # with base64 encoded data; we provide a utility to further decode this into something sensible:
        message = decode_pubsub_message(event_payload)
        print("DECODED PUBSUB MESSAGE:", message)
Tip
To handle a range of events, use a uniform prefix for all their kinds, eg:
if event_kind.startswith("my-"):
    my_handler(event_kind, event_reference, event_payload)
Generating Endpoint URLs¶
A utility is provided to help generate URLs for the events endpoint.
This is similar to, but easier than, generating URLs with django’s built-in reverse()
function.
It generates absolute URLs by default, because integration with external systems is the most common use case.
import logging
from django_gcp.events.utils import get_event_url
logger = logging.getLogger(__name__)
get_event_url(
    'the-kind',
    'the-reference',
    event_parameters={"a": "parameter"},  # These get encoded as a querystring, and are decoded back to a dict by the events endpoint. Keep it short!
    url_namespace="gcp-events",  # You only need to edit this if you define your own urlpatterns with a different namespace
)
Tip
By default, get_event_url generates an absolute URL, using the configured settings.BASE_URL.
To specify a different base url, you can pass it explicitly:
relative_url = get_event_url(
    'the-kind',
    'the-reference',
    base_url='',
)

non_default_base_url = get_event_url(
    'the-kind',
    'the-reference',
    base_url='https://somewhere.else.com',
)
Generating and Consuming Pub/Sub Messages¶
When hooked up to GCP Pub/Sub or eventarc, the event payload is in the form of a Pub/Sub message.
These messages have a specific format (see https://cloud.google.com/pubsub/docs/reference/rest/v1/PubsubMessage).
To allow you to interact directly with Pub/Sub (i.e. publish messages to a topic), or for the purposes of testing your signals, django-gcp includes a make_pubsub_message utility that provides an easy and pythonic way of constructing a Pub/Sub message.
For example, to test the signal receiver above with a replica of a real pubsub message payload, you might do:
from datetime import datetime

from django.test import TestCase
from django.urls import reverse

from django_gcp.events.utils import make_pubsub_message


class YourTests(TestCase):
    def test_your_code_handles_a_payload_from_pubsub(self):
        payload = make_pubsub_message({"my": "data"}, publish_time=datetime.now())
        response = self.client.post(
            reverse("gcp-events", args=["the-event-kind", "the-event-reference"]),
            data=payload,
            content_type="application/json",
        )
        self.assertEqual(response.status_code, 201)
Exception Handling¶
Any exception that gets raised in the handlers will be hidden from the user to prevent disclosure of information that may lead to attack.
Instead, a BAD_REQUEST (400) status code is returned with a generic error message.
Note
We’ll work on adding a way of returning more useful information to the end user, which will probably be based on raising a ValidationError or similar, a bit like using DRF serialisers.
However, this is low priority right now so as always, if you need this feature, ping us on GitHub!
Cloud Run¶
Metadata¶
The container contract for Google Cloud Run specifies an internal server for metadata about the running service. This is useful for:
determining if your app is running on Cloud Run or somewhere else,
fetching values like the project_id which are required for structured logging,
generating tokens that can be used to sign blobs without a private key.
To avoid the need to write requests to the internal server yet again, django_gcp provides a wrapper class for these queries, exposing the results as properties. See django_gcp.metadata.metadata.CloudRunMetadata.
from django_gcp.metadata import CloudRunMetadata
meta = CloudRunMetadata()
# On your local machine, `meta.is_cloud_run` will be False, and accessing these
# attributes will raise a NotOnCloudRunError
if meta.is_cloud_run:
    print(meta.project_id)
    print(meta.project_number)
    print(meta.region)
    print(meta.compute_instance_id)
    print(meta.email)
    print(meta.token)
Attention
The log handlers included here work great but we suspect some improvements could be made to the structure of the logs to give fuller / more easily filterable results, especially around trace/span and the contents of the httpRequest object. Pick up the issue here: PRs are welcome!
Logging¶
Tip
Quickly set up logging out of the box, by dropping the LOGGING entry from the example test server into your settings.py.
Structured logs¶
On Google Cloud, if you use structured logging, your entries can be filtered and inspected much more powerfully than if you log in plain text.
Django has its own default logging configuration, and we need to do some tweaking to it to make sure we capture the information in a structured way. Notice particularly that the django and django.server modules have specific setups to record, for example, request-level information.
django-gcp provides django_gcp.logging.GoogleStructuredLogsHandler to add django-specific behaviour to the Google StructuredLogsHandler that is used under the hood.
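As a minimal sketch of a structured logging setup (the handler path is documented above; the rest of the layout is an assumption - prefer copying the LOGGING entry from the example test server):
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "structured": {
            # Handler provided by django-gcp (see above)
            "class": "django_gcp.logging.GoogleStructuredLogsHandler",
        },
    },
    "root": {"handlers": ["structured"], "level": "INFO"},
}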
Error Reporting¶
This isn’t the same thing as structured logging.
If you use Google Cloud Error Reporting (as opposed to sentry or similar), django-gcp provides a handler enabling you to send errors/exceptions directly from django. Then you can configure Error Reporting as you wish (eg to track unresolved errors, email teams, connect issue trackers, etc).
django-gcp provides django_gcp.logging.GoogleErrorReportingHandler to do this. You need to set the GCP_ERROR_REPORTING_SERVICE_NAME value in your settings.py.
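For example, a hedged sketch of the wiring in settings.py (the handler class and setting name are documented above; the logger names, levels and layout are assumptions):
GCP_ERROR_REPORTING_SERVICE_NAME = "your-app-name"

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "error_reporting": {
            # Handler provided by django-gcp (see above)
            "class": "django_gcp.logging.GoogleErrorReportingHandler",
            "level": "ERROR",
        },
    },
    "loggers": {
        # Which loggers to report from is your choice (assumption shown here)
        "django": {"handlers": ["error_reporting"], "level": "ERROR"},
    },
}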
Storage¶
This module provides helpers for working with Google Cloud Storage, including:
A django Storage class allowing django’s FileField to use GCS as a storage backend. This incorporates the GCS-specific parts of django-storages.
A BlobField with an associated widget to facilitate direct uploads and provide more powerful ways of working with GCS features including metadata and revisions.
[Image: The widget provides a better user experience for blankable and overwriting options.]
Installation and Authentication¶
First, follow the instructions to install, authenticate and (if necessary) set your project.
Create bucket(s)¶
This library doesn’t create buckets for you: infrastructure operations should be kept separate and dealt with using tools built for the purpose, like terraform or Deployment Manager.
If you’re setting up for the first time and don’t want to get into that kind of infrastructure-as-code stuff, then manually create two buckets in your project:
One with object-level permissions for media files.
One with uniform, public permissions for static files.
Tip
Having two buckets like this means it’s easier to configure which files are public and which aren’t. Plus, you can serve your static files much more efficiently - publicly shared files are cached in google’s cloud CDN, so they’re lightning quick for users to download, and egress costs you almost nothing.
Tip
To make it easy and consistent to set up (and remember which is which!), we always use kebab case for our bucket names in the form:
<app>-<purpose>-<environment>-<media-or-static>
For example, following this pattern, the media and static buckets for an app’s staging environment would be named something like myapp-assets-staging-media and myapp-assets-staging-static.
Setup Media and Static Storage¶
The most common types of storage are for media and static files, using the storage backend. We derived a custom storage type for each, making it easier to name them.
In your settings.py file, do:
# Set the default storage (for media files)
DEFAULT_FILE_STORAGE = "django_gcp.storage.GoogleCloudMediaStorage"
GCP_STORAGE_MEDIA = {
    "bucket_name": "app-assets-environment-media"  # Or whatever name you chose
}

# Set the static file storage
# This allows `manage.py collectstatic` to automatically upload your static files
STATICFILES_STORAGE = "django_gcp.storage.GoogleCloudStaticStorage"
GCP_STORAGE_STATIC = {
    "bucket_name": "app-assets-environment-static"  # or whatever name you chose
}

# Point the urls to the store locations
# You could customise the base URLs later with your own cdn, eg https://static.you.com
# But that's only if you feel like being ultra fancy
MEDIA_URL = f"https://storage.googleapis.com/{GCP_STORAGE_MEDIA['bucket_name']}/"
MEDIA_ROOT = "/media/"
STATIC_URL = f"https://storage.googleapis.com/{GCP_STORAGE_STATIC['bucket_name']}/"
STATIC_ROOT = "/static/"
Default and Extra stores¶
Any number of extra stores can be added, each corresponding to a different bucket in GCS.
You’ll need to give each one a “storage key” to identify it. In your settings.py, include extra stores as:
GCP_STORAGE_EXTRA_STORES = {
    "my_fun_store_key": {
        "bucket_name": "all-the-fun-datafiles"
    },
    "my_sad_store_key": {
        "bucket_name": "all-the-sad-datafiles"
    }
}
Once you’re done, default_storage will be your Google Cloud Media Storage:
>>> from django.core.files.storage import default_storage
>>> print(default_storage.__class__)
<class 'django_gcp.storage.GoogleCloudMediaStorage'>
This way, if you define a new FileField, it will use that storage bucket:
>>> from django.db import models
>>> class MyModel(models.Model):
...     my_file_field = models.FileField(upload_to='pdfs')
...     my_image_field = models.ImageField(upload_to='photos')
...
>>> obj1 = MyModel()
>>> print(obj1.my_file_field.storage)
<django_gcp.storage.GoogleCloudMediaStorage object at ...>
BlobField Storage¶
The benefit of a BlobField is that you can do direct upload of objects to the cloud.
This allows you to accept uploads of files > 32mb whilst on request-size-limited services like Cloud Run.
To enable this and other advanced features (like caching of metadata and blob version tracking), BlobFields intentionally don't maintain the FileField API. Under the hood, a BlobField is actually a JSONField, allowing properties other than just the blob name to be stored in the database.
We’ll flesh out these instructions later (or Pull requests accepted!) but in the meantime, see the example implementation here.
You’ll need to:
Add a django_gcp.storage.fields.BlobField field to a model.
Define a get_destination_path callback to generate the eventual name of the blob in the store.
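As a loosely hedged sketch of those two steps (the exact keyword arguments accepted by BlobField and the callback signature are assumptions - check the example implementation linked above for the real API):
from django.db import models

from django_gcp.storage.fields import BlobField


def get_destination_path(instance, original_name, **kwargs):
    # Decide the eventual blob name; you can use other model fields here.
    # This signature is illustrative only.
    return f"reports/{original_name}"


class Report(models.Model):
    title = models.CharField(max_length=64)
    file = BlobField(get_destination_path=get_destination_path)  # keyword argument is an assumption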
Tip
On upload, blobs are always ingressed to a temporary location then moved to their eventual destination on save of
the model. Two steps (ingress -> rename) seems unnecessary, but this allows the eventual destination to use
the other model fields. It also avoids problems where you require deterministic object names: if object versioning or retention is enabled on your bucket, an unrelated failure in the model save() process could otherwise prevent future uploads to the same pathname.
Warning
Migrating from an existing FileField to a BlobField is possible but a bit tricky.
We provide an example of how to do that migration in the example server model (see the instructions in the model, and the corresponding migration files).
FileField Storage¶
Works as a standard drop-in storage backend.
Standard file access options are available, and work as expected
>>> default_storage.exists('storage_test')
False
>>> file = default_storage.open('storage_test', 'w')
>>> file.write('storage contents')
>>> file.close()
>>> default_storage.exists('storage_test')
True
>>> file = default_storage.open('storage_test', 'r')
>>> file.read()
'storage contents'
>>> file.close()
>>> default_storage.delete('storage_test')
>>> default_storage.exists('storage_test')
False
An object without a file has limited functionality
>>> obj1 = MyModel()
>>> obj1.my_file_field
<FieldFile: None>
>>> obj1.my_file_field.size
Traceback (most recent call last):
...
ValueError: The 'my_file_field' attribute has no file associated with it.
Saving a file enables full functionality
>>> obj1.my_file_field.save('django_test.txt', ContentFile('content'))
>>> obj1.my_file_field
<FieldFile: tests/django_test.txt>
>>> obj1.my_file_field.size
7
>>> obj1.my_file_field.read()
'content'
Files can be read in a little at a time, if necessary
>>> obj1.my_file_field.open()
>>> obj1.my_file_field.read(3)
'con'
>>> obj1.my_file_field.read()
'tent'
>>> '-'.join(obj1.my_file_field.chunks(chunk_size=2))
'co-nt-en-t'
Save another file with the same name
>>> obj2 = MyModel()
>>> obj2.my_file_field.save('django_test.txt', ContentFile('more content'))
>>> obj2.my_file_field
<FieldFile: tests/django_test_.txt>
>>> obj2.my_file_field.size
12
Push the objects into the cache to make sure they pickle properly
>>> cache.set('obj1', obj1)
>>> cache.set('obj2', obj2)
>>> cache.get('obj2').my_file_field
<FieldFile: tests/django_test_.txt>
Storage Settings Options¶
Each store can be set up with different options, passed within the dict given to GCP_STORAGE_MEDIA, GCP_STORAGE_STATIC or within the dicts given to GCP_STORAGE_EXTRA_STORES.
For example, to set the media storage up so that files go to a different location than the root of the bucket, you’d use:
GCP_STORAGE_MEDIA = {
    "bucket_name": "app-assets-environment-media",
    "location": "not/the/bucket/root/",
    # ... and whatever other options you want
}
The full range of options (and their defaults, which apply to all stores) is as follows:
gzip¶
Type: boolean
Default: False
Whether or not to enable gzipping of content types specified by gzip_content_types
gzip_content_types¶
Type: tuple
Default: (text/css, text/javascript, application/javascript, application/x-javascript, image/svg+xml)
Content types which will be gzipped when gzip is True
default_acl¶
Type: string or None
Default: None
ACL used when creating a new blob, from the list of predefined ACLs. (A “JSON API” ACL is preferred but an “XML API/gsutil” ACL will be translated.)
For most cases, the blob will need to be set to the publicRead ACL in order for the file to be viewed.
If GCP_STORAGE_DEFAULT_ACL is not set, the blob will have the default permissions set by the bucket.
publicRead files will return a public, non-expiring url. All other files return a signed (expiring) url.
ACL Options are: projectPrivate, bucketOwnerRead, bucketOwnerFullControl, private, authenticatedRead, publicRead, publicReadWrite
Note
GCP_STORAGE_DEFAULT_ACL must be set to ‘publicRead’ to return a public url, even if you set the bucket to public or set the file permissions directly in GCS to public.
Note
When using this setting, make sure you have fine-grained access control enabled on your bucket, as opposed to Uniform access control, or else file uploads will return with HTTP 400. If you already have a bucket with Uniform access control set to public read, please keep GCP_STORAGE_DEFAULT_ACL as None and set GCP_STORAGE_QUERYSTRING_AUTH to False.
querystring_auth¶
Type: boolean
Default: True
If set to False it forces the url not to be signed. This setting is useful if you need to have a bucket configured with Uniform access control and public read. In that case you should force the flag GCP_STORAGE_QUERYSTRING_AUTH = False and GCP_STORAGE_DEFAULT_ACL = None.
file_overwrite¶
Type: boolean
Default: True
By default files with the same name will overwrite each other. Set this to False to have extra characters appended.
max_memory_size¶
Type: integer
Default: 0 (do not roll over)
The maximum amount of memory a returned file can take up (in bytes) before being rolled over into a temporary file on disk.
blob_chunk_size¶
Type: integer or None
Default: None
The size of blob chunks that are sent via resumable upload. If this is not set then the generated request must fit in memory. Recommended if you are going to be uploading large files.
Note
This must be a multiple of 256K (1024 * 256)
object_parameters¶
Type: dict
Default: {}
Dictionary of key-value pairs mapping from blob property name to value.
Use this to set parameters on all objects. To set these on a per-object basis, subclass the backend and override GoogleCloudStorage.get_object_parameters.
The valid property names are
acl
cache_control
content_disposition
content_encoding
content_language
content_type
metadata
storage_class
If not set, the content_type property will be guessed.
If set, acl overrides GCP_STORAGE_DEFAULT_ACL.
Warning
Do not set name. This is set automatically based on the filename.
custom_endpoint¶
Type: string or None
Default: None
Sets a custom endpoint that will be used instead of https://storage.googleapis.com when generating URLs for files.
location¶
Type: string
Default: ""
Subdirectory in which the files will be stored. Defaults to the root of the bucket.
expiration¶
Type: datetime.timedelta, datetime.datetime, or integer (seconds since epoch)
Default: timedelta(seconds=86400)
The time that a generated URL is valid before expiration. The default is 1 day. Public files will return a url that does not expire. Files will be signed by the credentials provided during authentication.
The GCP_STORAGE_EXPIRATION value is handled by the underlying Google library. It supports timedelta, datetime, or integer seconds since epoch time.
Tasks¶
In django, tasks are used to handle processing work that happens outside of the main request-response cycle.
django-gcp allows tasks to be processed in a serverless environment like cloud run, triggered by managed services like Cloud Tasks, Cloud Scheduler or Pub/Sub topics.
About tasks in django¶
Tasks in django include, for example, dispatching jobs whose execution is too long to occur within a request (anything more than a few hundred milliseconds should probably be offloaded), running scheduled maintenance tasks (like refreshing a cache), or processing data that doesn’t need to be done within a request loop.
The classic example is sending email to a user responding to a registration request: a task requiring interaction with a third party API, making the request slow.
Existing solutions¶
Historically, to manage the queue of tasks django has required the use of libraries like celery (which is very tricky to set up correctly) or django-dramatiq (a much cleaner API than Celery, still a great option today) with an external message handler/store like REDIS.
However, managing these queues requires the dev team to think about exactly-once delivery, retries and throttling. A redis or rabbitmq instance must be created and managed. To invoke tasks periodically, a cron job is required (meaning yet another working part somewhere in the devops maze). Finally, these systems operate on a pull-based model, meaning that you constantly have to have workers alive, listening to the queue.
All that makes it difficult to run django in a serverless environment like Cloud Run. Plus, where tasks are only intermittent, it wastes a lot of money having workers up all the time.
Why django-gcp for tasks?¶
django-gcp uses a push-based model, meaning that workers can be serverless: autoscaled from zero in response to task requests.
It uses Google’s managed services, Cloud Tasks and Cloud Scheduler, enabling very quick and easy configuration of robust task queues and periodic triggers.
Creating and using tasks¶
TODO: I’ve written SO MUCH already and need to get this into production. This week.
I’ll come back and explain this, I promise.
~~ Tom ~~
IN THE MEANTIME:
Look at the management commands available (both in django-gcp and the example app), and look at the full example implementation here to pick up how to define and use tasks :)
If you need to use this library and can’t figure it out, get in touch by raising an issue on GitHub and we’ll help you configure your app (and write the docs at the same time).
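Until then, here’s a heavily hedged sketch of the general shape (the import path, base class and method names are assumptions drawn from the example app - verify them there):
from django_gcp.tasks import OnDemandTask  # import path is an assumption


class SendWelcomeEmailTask(OnDemandTask):
    """Runs on a worker, outside the request-response cycle"""

    def run(self, user_id):
        # Long-running work goes here (eg talk to a third-party email API)
        ...


# Somewhere in a view or signal handler, push the task onto the queue:
# SendWelcomeEmailTask().enqueue(user_id=42)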
Deduplicating tasks¶
OnDemandTask classes with the attribute deduplicate = True have the special property that the task cannot be repeated.
Deduplication is done using both the task name AND a short_sha of the payload data. That is:
You can enqueue the same task twice in succession with different payloads.
If you enqueue the same task with the same payload twice in quick succession, you will get a DuplicateTaskError.
A duplicate task will fail for ~1 hour after it is either executed or deleted from the queue.
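A hedged sketch, using the same assumed names as in the sketch above:
from django_gcp.tasks import OnDemandTask  # import path is an assumption


class RefreshCacheTask(OnDemandTask):
    deduplicate = True  # task ID derives from the task name plus a short_sha of the payload

    def run(self):
        ...


# RefreshCacheTask().enqueue()
# RefreshCacheTask().enqueue()  # same payload again soon after -> DuplicateTaskError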
Tip
Deduplicating tasks introduces significant additional latency into the task queue. So don’t do it unless you have to!
Note
GCP requires a task ID to deduplicate tasks, whose string ordering should be optimally binomially distributed. django-gcp always prefixes the short_sha of the payload to ensure that the created task IDs are approximately binomially distributed (as opposed to using the task name as a prefix, which would give a highly non-optimal distribution in N clusters, where N is the number of differently named tasks).
Tasks Settings¶
There are a number of settings required to enable On-Demand and Scheduled tasks to work; we recommend you go through the following one-by-one, in order of importance!
GCP_TASKS_DEFAULT_QUEUE_NAME¶
Type: string (required)
The name of the task queue on GCP used for on-demand tasks. This will be created (if not already present) when you enqueue your first task.
GCP_TASKS_DOMAIN¶
Type: string (required)
The base url of the server to which tasks will be pushed. In production, this’ll need to be set to the URL of your worker service (see Deploying Workers).
Tip
In local development, set up localtunnel and use its -s option to set yourself an amusing subdomain. You can then set GCP_TASKS_DOMAIN = "https://king-julian-in-da-house.loca.lt" in your local environment, and receive https:// traffic.
That’s awesome, because (assuming you’ve installed local credentials per Locally) it allows you to spin up actual real queues and schedules on GCP to get a feel for how this all works.
GCP_TASKS_RESOURCE_AFFIX¶
Type: string
Default: None
This is a label which is affixed to the names of all resources created by django_gcp. It’s HIGHLY RECOMMENDED that you set this to avoid confusion about what resources belong to what applications, and to enable cleanup of old resources.
If left unset, there’ll be no affix applied. This might be exactly what you want: for example, if you manage all your task queues and scheduler jobs on existing infrastructure or using terraform, your own naming convention may already apply.
Warning
Without setting GCP_TASKS_RESOURCE_AFFIX, django-gcp won’t be able to clean up after itself so you’ll have to remove any resources manually yourself.
Make sure you don’t have multiple independent django apps with the same affix, or one app may delete resources for another.
Note
SubscriberTask subclasses that listen to a PubSub topic don’t automatically add a prefix to the topic name they listen to. This allows you to subscribe to any topic on GCP for triggering tasks; if you want to use the prefix you can do so when defining the topic_name override.
GCP_TASKS_REGION¶
Type: string
Default: "europe-west1"
The region in which resources (Task Queues, Scheduler Jobs, and PubSub topics) are accessed and/or created.
GCP_TASKS_DELIMITER¶
Type: string
Default: "--"
The delimiter used when creating resource names with an affix or other identifier.
GCP_TASKS_EAGER_EXECUTE¶
Type: boolean
Default: False
If set to True, tasks will synchronously execute when their enqueue() method is called (eg within a request).
Whilst not generally useful in production, this can be quite helpful for straightforward debugging of tasks in local environments.
GCP_TASKS_DISABLE_EXECUTE¶
Type: boolean
Default: False
If set to True, tasks will not be enqueued for processing when their enqueue() or enqueue_later() methods are called. Instead, the method will simply return None without enqueuing the task, allowing for the disabling of task execution. This can be useful in scenarios where tasks need to be temporarily disabled or when testing/debugging task code.
It is important to note that this setting only affects tasks when their enqueue() or enqueue_later() methods are called, and that tasks can still be executed manually even if this setting is set to True.
Task Workers¶
A worker is a server instance, running the django application, whose sole job it is to do the tasks that get placed onto the queue (or pushed via scheduler or PubSub).
Deploying Workers¶
In the most straightforward usage, you don’t even need a separate worker. To get up and running minimally, you can point GCP_TASKS_DOMAIN straight back to the app itself! Check the docs of that setting for more tips on local development.
However, in most cases you’ll want the server to scale independently of the worker service and that’s not hard to achieve.
Begin using exactly the same configuration and deployment process that you use for the main server (eg deploy to Cloud Run, but use a -worker suffix in the app name).
Get the release-specific URL to that deployment.
Set that URL as the GCP_TASKS_DOMAIN value on the server.
Tip
Using Cloud Run, you can provide a tag to create a revision-specific URL as part of the worker deployment process. If you deploy worker and server at the same time, and configure the server with the revision-specific URL, the server will always send tasks to the same version of code that it’s running on itself. This is great for maintaining continuous uptime without worrying about breaking changes in the data required by your tasks.
On GitHub Actions, that looks something like:
# ... build an image, then ...
- name: Deploy to Cloud Run Worker
  id: deploy_worker
  uses: google-github-actions/deploy-cloudrun@v0
  with:
    service: yourapp-worker-${{ needs.build.outputs.environment }}
    image: ${{ needs.build.outputs.image_version_artefact }}
    region: europe-west1
    tag: ${{ needs.build.outputs.short_sha }}
- name: Deploy to Cloud Run Server
  id: deploy_server
  uses: google-github-actions/deploy-cloudrun@v0
  with:
    env_vars: |
      GCP_TASKS_DOMAIN=${{ steps.deploy_worker.outputs.url }}
    image: ${{ needs.build.outputs.image_version_artefact }}
    region: europe-west1
    service: yourapp-server-${{ needs.build.outputs.environment }}
    tag: ${{ needs.build.outputs.short_sha }}
Microservices as Workers¶
There’s nothing special or django-gcp specific about the data passed to tasks.
So, there’s absolutely no reason why you shouldn’t use entirely separate microservices to receive and process tasks created by django-gcp!
Enjoy yourself, and let us know what you build :)
Projects¶
In most cases, the id of the GCP project you’re working on will be inferred from your Application Default Credentials or Service Account (see Authentication).
If that’s not correct (eg your service account has privileges across projects), you may need to set it explicitly.
Settings¶
GCP_PROJECT_ID (optional)
Your Google Cloud project ID. If unset, falls back to the default inferred from the environment.
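For example, in your settings.py:
# Only needed if the project can't be inferred from your credentials
GCP_PROJECT_ID = "your-project-id"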
License¶
The Boring Bit¶
Third Party Libraries¶
django-gcp includes or is linked against code from third party libraries, see our attributions page.
Version History¶
We used to recommend people create version histories. But we now do it automatically using our conventional commits tools for completely automating code versions, release numbering and release history.
So for a full version history, check our releases page.
Thanks¶
This project is heavily based on a couple of really great libraries, particularly django-storages and django-cloud-tasks.
See our attributions page.
Thank you so much to the (many) authors of these libraries :)
Also, this library boilerplate is from the django-rabid-armadillo project…