Storage

Configure storage backends for content, media, and artifacts.

What Are Storage Systems?

Storage systems are the configurable backends where the platform stores files, media, and content. Every piece of uploaded content — videos, images, documents, audio files, supplementary attachments — lives in a storage system. The platform supports multiple storage backends and can be configured to use different backends for different purposes, giving administrators flexibility to optimize for cost, performance, and geographic requirements.

Storage configuration is a foundational administrative task. The choices you make here affect how content is stored, accessed, and served across the entire platform.

Supported Backends

The platform supports three categories of storage backends, each suited to different deployment scenarios:

Backend	Best For	Details
S3-Compatible Object Storage	Production deployments, scalable storage	Works with Amazon S3, MinIO, DigitalOcean Spaces, Backblaze B2, and any S3-compatible service. Provides durable, scalable storage with built-in redundancy.
Google Cloud Storage	Google Cloud deployments	Native integration with Google Cloud Storage buckets. Supports storage classes, lifecycle management, and regional or multi-regional storage.
Local Filesystem	Development, testing, small deployments	Stores files on the server's local disk. Simple to set up but not suitable for multi-server deployments or production workloads that require redundancy.

S3-compatible storage is the most versatile option. It works with dozens of cloud and self-hosted providers, making it the recommended choice for production deployments regardless of your cloud provider.

Storage System Configuration

Each storage backend is configured as a named storage system with its own connection settings. A storage system configuration includes:

Name and identifier. A unique name that identifies the storage system within the platform. Other parts of the system reference storage by this identifier.
Backend type. Which storage technology to use — S3-compatible, Google Cloud Storage, or local filesystem.
Connection details. Endpoint URLs, bucket names, access credentials, and region settings specific to the chosen backend type.
Path configuration. How file paths are organized within the storage backend. The platform uses content-addressable paths based on content hashes to ensure efficient deduplication.
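The following is a minimal sketch of how the four configuration parts above might be modeled and validated. Every field name here (identifier, bucket, endpoint, root_path, credentials_ref) is a hypothetical illustration, not the platform's actual schema. The blob_path helper assumes a SHA-256 hex digest fanned out into two-level directories. That is one common content-addressable layout, not necessarily the one the platform uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class BackendType(Enum):
    # The three backend categories described above.
    S3 = "s3"
    GCS = "gcs"
    LOCAL = "local"


@dataclass(frozen=True)
class StorageSystemConfig:
    # Hypothetical field names; the platform's real schema may differ.
    identifier: str                        # unique name other components reference
    backend: BackendType
    bucket: Optional[str] = None           # S3 or GCS bucket name
    endpoint: Optional[str] = None         # e.g. a MinIO URL; None means provider default
    region: Optional[str] = None
    root_path: Optional[str] = None        # local filesystem backends only
    credentials_ref: Optional[str] = None  # reference to an encrypted value, never the raw secret

    def validate(self) -> List[str]:
        """Return a list of configuration problems (empty if none)."""
        errors = []
        if not self.identifier:
            errors.append("identifier is required")
        if self.backend in (BackendType.S3, BackendType.GCS) and not self.bucket:
            errors.append(f"{self.backend.value} backend requires a bucket")
        if self.backend is BackendType.LOCAL and not self.root_path:
            errors.append("local backend requires root_path")
        return errors


def blob_path(digest_hex: str, prefix: str = "blobs") -> str:
    # Assumed layout: fan out on the first two byte pairs of the digest so
    # that no single directory or key prefix accumulates too many entries.
    return f"{prefix}/{digest_hex[:2]}/{digest_hex[2:4]}/{digest_hex}"
```

For example, a local backend with no root path is rejected by validate(), and a digest beginning "abcd" lands under blobs/ab/cd/.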

Connection credentials (access keys, service account keys) are stored as encrypted configuration values, keeping them secure even if the configuration database is accessed directly. See Configuration for details on encrypted values.

Content-Addressable Blob Storage

The platform uses content-addressable storage for file deduplication. When a file is uploaded, the platform computes a hash of the file's contents and uses that hash as the storage key. This approach provides several benefits:

Automatic deduplication. If the same file is uploaded multiple times — for example, the same logo used across many content items — only one copy is stored. All references point to the same blob.
Integrity verification. The content hash serves as a checksum. When a file is retrieved, the platform can verify that the stored data has not been corrupted or tampered with.
Reference counting. The platform tracks how many content items reference each blob. This count is used to determine when a blob is no longer needed and can be safely removed during garbage collection.
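The three benefits above can be illustrated with a small in-memory sketch. It assumes SHA-256 as the hash function; the source does not say which hash the platform uses, and the BlobStore class and its method names are hypothetical. Uploading the same bytes twice yields the same key and a single stored copy, reads re-hash the data to verify integrity, and each upload increments a reference count.

```python
import hashlib
from typing import Dict


class BlobStore:
    """In-memory sketch of content-addressable storage with reference counts."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}    # digest -> file contents
        self._refcounts: Dict[str, int] = {}  # digest -> number of referencing items

    def put(self, data: bytes) -> str:
        # The storage key is derived from the contents alone.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            # First upload of these bytes: store exactly one copy.
            self._blobs[digest] = data
            self._refcounts[digest] = 0
        self._refcounts[digest] += 1
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # The key doubles as a checksum: re-hash and compare on every read.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"blob {digest} failed integrity check")
        return data

    def release(self, digest: str) -> int:
        # Called when a content item stops referencing the blob.
        self._refcounts[digest] -= 1
        return self._refcounts[digest]
```

Uploading the same logo for two content items stores one blob with a reference count of two; releasing one reference leaves the blob in place for the other item.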

Content-addressable storage means that deleting a content item does not necessarily delete the underlying file. The file is only removed when its reference count drops to zero — meaning no other content item is using it.

Multiple Storage Systems

The platform supports configuring multiple storage systems simultaneously. This allows administrators to route different types of content to different backends based on requirements:

Media on high-performance storage. Video and audio files that need fast streaming can be stored on a high-throughput storage backend.
Archives on cost-effective storage. Older content and backups can be stored on less expensive storage tiers.
Regional storage. Content can be stored in specific geographic regions to comply with data residency requirements or reduce latency for regional audiences.
Separation of concerns. Keep application artifacts, user-uploaded content, and system data in separate storage systems for easier management and backup.
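Routing rules like the ones above can be expressed as a simple decision function. This is a minimal sketch under stated assumptions: the storage system identifiers (media-fast, archive-cold, content-eu, and so on), the Upload type, and the precedence order (data residency first, then archive tier, then media type) are all hypothetical and would depend on how a given deployment is configured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Upload:
    media_type: str          # MIME type, e.g. "video/mp4"
    region: Optional[str]    # data-residency requirement, if any
    is_archive: bool = False


# Hypothetical storage system identifiers; substitute the names you configured.
REGIONAL_SYSTEMS = {"eu": "content-eu", "us": "content-us"}


def choose_storage_system(upload: Upload) -> str:
    # Residency is a compliance requirement, so it takes precedence
    # over performance or cost considerations.
    if upload.region is not None:
        return REGIONAL_SYSTEMS[upload.region]
    # Older content and backups go to the cheaper tier.
    if upload.is_archive:
        return "archive-cold"
    # Streamed media benefits from high-throughput storage.
    if upload.media_type.startswith(("video/", "audio/")):
        return "media-fast"
    return "content-default"
```

Checking residency first means a regional video never ends up on a non-compliant high-throughput backend; a deployment that needs both would configure a regional high-throughput system instead.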

Artifact Registry Support

Beyond content files, storage systems can also serve as an artifact registry, storing and serving the software artifacts used in development and deployment workflows:

Artifact Type	Use Case
Docker Images	Store and serve container images for platform services and custom deployments. Works with standard Docker pull and push commands.
Maven Packages	Host Java and Kotlin libraries and plugins used by platform scripts and extensions.
npm Packages	Serve JavaScript and TypeScript packages for front-end applications and custom integrations.

The artifact registry uses the same underlying storage systems as content storage, so the same configuration, redundancy, and access controls apply to software artifacts.

Storage Initialization

When setting up a new storage backend, the platform needs to initialize the necessary metadata structures. Initialization creates the required indexes, tracking tables, and configuration entries that the platform uses to manage files within the storage system.

Initialization is typically performed once when a new storage system is first configured. The process verifies that the backend is accessible with the provided credentials, creates any required directory structures or bucket configurations, and registers the storage system with the platform's metadata layer.

Always test storage system connectivity before initializing a new backend. Misconfigured credentials or inaccessible endpoints will cause initialization to fail and may leave the storage system in a partially configured state.
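The ordering recommended above (verify access, then create structures, then register) can be sketched as follows. The StorageBackend protocol, the probe key, the meta/initialized marker, and the dict standing in for the platform's metadata layer are all hypothetical. The point is that the connectivity check round-trips a probe object before anything else is written, so a failed check leaves no partial state behind.

```python
from typing import Dict, Protocol


class StorageBackend(Protocol):
    # Hypothetical minimal interface; real backends expose richer APIs.
    def write(self, key: str, data: bytes) -> None: ...
    def read(self, key: str) -> bytes: ...
    def delete(self, key: str) -> None: ...


class InMemoryBackend:
    """Stand-in backend for illustration and testing."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def read(self, key: str) -> bytes:
        return self.objects[key]

    def delete(self, key: str) -> None:
        del self.objects[key]


def check_connectivity(backend: StorageBackend) -> bool:
    """Round-trip a probe object to confirm the credentials allow write, read, and delete."""
    probe_key, payload = ".connectivity-probe", b"ok"
    try:
        backend.write(probe_key, payload)
        ok = backend.read(probe_key) == payload
        backend.delete(probe_key)
        return ok
    except Exception:
        return False


def initialize(identifier: str, backend: StorageBackend, registry: Dict[str, StorageBackend]) -> None:
    if not check_connectivity(backend):
        # Fail before creating anything, so nothing is half-configured.
        raise RuntimeError(f"storage system {identifier!r} is not reachable")
    backend.write("meta/initialized", b"1")  # hypothetical marker layout
    registry[identifier] = backend           # stands in for the metadata layer
```

A backend whose credentials lack write permission fails the probe, and initialize() raises without registering it.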

Garbage Collection

Over time, content items are deleted, updated, and replaced. The files that backed the old content may no longer be needed. The platform's garbage collection process safely removes unreferenced blobs to reclaim storage space.

Garbage collection works through reference counting:

Track references. Every time a content item is created or updated with a file, the reference count for that file's blob is incremented. Every time a content item is deleted or its file is replaced, the reference count is decremented.
Identify orphans. Blobs with a reference count of zero are candidates for removal. They are no longer associated with any content item.
Safe removal. Orphaned blobs are removed from storage after a safety delay. This delay protects against race conditions where a blob's reference count temporarily drops to zero during a content update operation.
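The three steps above can be sketched as a reference counter that records when each blob's count reached zero and only deletes blobs that have stayed at zero for the full safety delay. The class name, the delete_blob callback, and the use of a monotonic clock are illustrative assumptions; the source does not specify the platform's delay value or bookkeeping.

```python
import time
from typing import Callable, Dict, List


class ReferenceCounter:
    """Sketch of reference-counted garbage collection with a safety delay."""

    def __init__(self, safety_delay: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.safety_delay = safety_delay
        self.clock = clock
        self.counts: Dict[str, int] = {}
        self.zero_since: Dict[str, float] = {}  # when each orphan's count last hit zero

    def increment(self, digest: str) -> None:
        self.counts[digest] = self.counts.get(digest, 0) + 1
        # A new reference rescues the blob from pending removal.
        self.zero_since.pop(digest, None)

    def decrement(self, digest: str) -> None:
        self.counts[digest] -= 1
        if self.counts[digest] == 0:
            self.zero_since[digest] = self.clock()

    def collect(self, delete_blob: Callable[[str], None]) -> List[str]:
        """Delete orphans whose safety delay has expired; return their digests."""
        now = self.clock()
        removed = []
        for digest, since in list(self.zero_since.items()):
            if now - since >= self.safety_delay:
                delete_blob(digest)
                del self.counts[digest]
                del self.zero_since[digest]
                removed.append(digest)
        return removed
```

If a content update decrements the old file's count to zero and then increments it again (because the "new" file has the same contents), increment() clears the pending removal, so the blob survives even if a collection pass runs between the two steps.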

Garbage collection runs as a background job and can be monitored through the Jobs interface. Administrators can also trigger garbage collection manually when they need to reclaim space immediately.

Monitor storage usage trends alongside garbage collection activity. If storage usage grows faster than expected, garbage collection may not be running often enough, or deduplication may be less effective than expected for your content patterns.
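Two simple metrics can help tell these causes apart, sketched here as a hypothetical helper rather than a built-in platform report. The deduplication ratio divides logical bytes (each blob's size times its reference count) by physical bytes stored; a ratio near 1.0 means little deduplication is happening. Orphaned bytes measure space held by zero-reference blobs; if that figure keeps climbing, garbage collection is falling behind.

```python
from typing import Dict


def dedup_ratio(blob_sizes: Dict[str, int], refcounts: Dict[str, int]) -> float:
    """Logical bytes (as referenced by content items) divided by physical bytes stored."""
    physical = sum(blob_sizes.values())
    logical = sum(size * refcounts.get(digest, 0) for digest, size in blob_sizes.items())
    # An empty store has nothing to deduplicate; report a neutral ratio.
    return logical / physical if physical else 1.0


def orphaned_bytes(blob_sizes: Dict[str, int], refcounts: Dict[str, int]) -> int:
    """Bytes held by blobs with no references: space garbage collection could reclaim."""
    return sum(size for digest, size in blob_sizes.items() if refcounts.get(digest, 0) == 0)
```

For example, a 100-byte logo referenced by ten items, a 1,000-byte video referenced once, and a 500-byte orphan give 2,000 logical bytes over 1,600 physical bytes (a ratio of 1.25) and 500 reclaimable bytes.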