
Alibaba Cloud Full Stack (4): OSS — Object Storage Done Right
Master Alibaba Cloud OSS: bucket types, storage classes, access control (ACL, RAM, STS, signed URLs), lifecycle rules, cross-region replication, CDN integration, and custom domains. Build a complete media storage backend.
I used to store user uploads on the ECS disk. Profile pictures, PDF invoices, CSV exports — all dumped into /var/data/uploads/ on a single ecs.g7.large running my Flask app. I had a cron job that rsynced the directory to a second ECS instance every six hours as a “backup.” Then one Friday at 3am, the system disk hit 100% because a batch job generated 40GB of reports nobody ever downloaded, the instance went read-only, the app crashed, and the rsync hadn’t run since the previous evening. I lost six hours of user uploads and spent the weekend apologizing to customers. That was the week I learned that object storage is not a nice-to-have — it is the foundation of everything you build in the cloud. Your application server is ephemeral. Your data is not.
This article covers Alibaba Cloud’s Object Storage Service from first principles through production deployment. By the end, you will have a working media storage backend with lifecycle management, CDN acceleration, and presigned uploads from a Python API. We set up the VPC and ECS foundation in Part 2 and Part 3 — now we add the storage layer that survives instance failures, scales to petabytes, and costs a fraction of block storage.
What Is OSS?#
Object Storage Service is Alibaba Cloud’s equivalent of AWS S3. You store files — called “objects” — in containers called “buckets.” Each object has a unique key (its path), the data itself, and metadata. That is the entire data model. There are no directories, no file hierarchies, no POSIX semantics. When you see images/2026/05/avatar.png in OSS, the slashes are part of the key string, not a directory structure. The console renders them as folders for convenience, but the storage layer is flat.
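To make the "flat keys, rendered as folders" idea concrete, here is a small pure-Python sketch. It does not call OSS; `list_level` is a hypothetical helper that mimics what a prefix-and-delimiter listing does over a plain list of key strings.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Split flat keys into direct 'files' and derived 'folders' under a prefix."""
    objects, folders = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is displayed as a folder.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(folders)


keys = [
    "images/2026/05/avatar.png",
    "images/2026/05/banner.png",
    "images/logo.svg",
    "readme.txt",
]

print(list_level(keys))                    # (['readme.txt'], ['images/'])
print(list_level(keys, prefix="images/"))  # (['images/logo.svg'], ['images/2026/'])
```

The "folders" exist only in the listing output. Nothing is stored for `images/` or `images/2026/` themselves.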
This simplicity is the point. Because OSS does not need to maintain a filesystem tree, it can distribute objects across thousands of storage nodes transparently. You never think about capacity planning, disk IOPS, or RAID configurations. You PUT an object, and OSS figures out where to store it, how to replicate it across zones for durability, and how to serve it back when you GET it. The durability guarantee is 99.9999999999% (twelve nines) for Standard storage. Read as an annual loss rate, that works out to an expected loss of roughly one object per year for every trillion you store.
Three types of cloud storage#
Alibaba Cloud offers three fundamentally different storage products, and using the wrong one is a common mistake:
| Storage type | Product | Access pattern | Analogy |
|---|---|---|---|
| Block storage | EBS (Cloud Disks) | Attach to one ECS, random read/write | A hard drive plugged into your computer |
| File storage | NAS / CPFS | Shared across multiple ECS via NFS/SMB | A network file share in your office |
| Object storage | OSS | HTTP API, no mount, unlimited capacity | Dropbox with an API |
Block storage (cloud disks attached to ECS) gives you a raw block device that the OS formats with ext4 or xfs. It is fast, low-latency, and supports random I/O — perfect for databases, OS boot volumes, and anything that needs POSIX filesystem semantics. But it can only be attached to one instance at a time, and you pay for provisioned capacity whether you use it or not.
File storage (NAS) provides a shared filesystem that multiple ECS instances can mount simultaneously via NFS v3/v4 or SMB. Great for legacy applications that need a shared /data directory, CMS systems, or development environments. But it is expensive per GB and performance depends on the capacity tier you purchase.
Object storage (OSS) is for everything else — and “everything else” is usually 90% of your data. Static assets, user uploads, backups, logs, data lake files, ML training datasets, video, audio, documents. If you access it via HTTP and do not need to edit bytes in the middle of the file, OSS is the right answer.
OSS vs AWS S3#
If you are coming from AWS, the mapping is straightforward:
| AWS S3 concept | OSS equivalent | Notes |
|---|---|---|
| Bucket | Bucket | Same 3-63 character naming rules |
| Object | Object | Same key/value/metadata model |
| Region | Region | Same region-scoped bucket concept |
| S3 Standard | Standard | Hot data, frequent access |
| S3 Standard-IA | Infrequent Access (IA) | 30-day minimum storage |
| S3 Glacier | Archive | 60-day minimum (S3 Glacier: 90), about 1-minute restore |
| S3 Glacier Deep Archive | Deep Cold Archive | 180-day minimum, hours to restore |
| Presigned URL | Signed URL | Same concept, different SDK method names |
| Bucket Policy | Bucket Policy | JSON-based, similar syntax |
| S3 Lifecycle | Lifecycle Rules | Same transition/expiration model |
| Cross-Region Replication | Cross-Region Replication | Same async replication model |
| CloudFront + S3 | CDN + OSS | Native integration, same back-to-origin pattern |
The main differences: OSS signs requests with its own signature scheme rather than AWS Signature V4, using your AccessKey ID and Secret (the SDK handles this). OSS endpoints follow the pattern oss-{region}.aliyuncs.com rather than s3.{region}.amazonaws.com. And OSS has a separate “internal endpoint” for each region (e.g., oss-cn-beijing-internal.aliyuncs.com) that provides free data transfer when accessed from ECS instances in the same region. You have to choose it explicitly: traffic that goes through the public endpoint is billed as internet traffic, even when it comes from ECS.
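The endpoint pattern is mechanical enough to wrap in a helper. A minimal sketch follows; `oss_endpoint` and `object_url` are hypothetical names, and the bucket name is just the example used elsewhere in this article.

```python
from urllib.parse import quote


def oss_endpoint(region, internal=False):
    """Build the public or internal endpoint host for a region."""
    suffix = "-internal" if internal else ""
    return f"oss-{region}{suffix}.aliyuncs.com"


def object_url(bucket, key, region, internal=False):
    """Virtual-hosted-style URL: the bucket name is a subdomain of the endpoint."""
    return f"https://{bucket}.{oss_endpoint(region, internal)}/{quote(key)}"


print(oss_endpoint("cn-beijing"))                 # oss-cn-beijing.aliyuncs.com
print(oss_endpoint("cn-beijing", internal=True))  # oss-cn-beijing-internal.aliyuncs.com
print(object_url("myapp-prod-media-cn", "images/avatar.png", "cn-beijing"))
```

In an application I would read the `internal` flag from configuration, so the same code uses the internal endpoint on ECS and the public one on a laptop.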
Key concepts#
Four things you need to understand before writing any code:
Bucket — A globally unique container for objects. Bucket names must be 3-63 characters, lowercase letters, numbers, and hyphens only. They are region-scoped — a bucket in cn-beijing stores data in Beijing. You cannot rename or move a bucket after creation.
Object — A file stored in a bucket, identified by a key (path string). Maximum object size is 48.8 TB. Objects are immutable — you replace the entire object on update, you cannot modify bytes in place.
Region and Endpoint — Each bucket lives in one region. Access it via the public endpoint (oss-cn-beijing.aliyuncs.com), the internal endpoint (free from ECS in the same region), or a custom domain you bind.
AccessKey — Your credentials for API access. In production, never use your root account AccessKey. Use RAM users or STS temporary credentials, which we cover in the Access Control section below.
Storage Classes#
OSS has five storage classes, and choosing the right one can cut your bill by 80% or inflate it by 10x. The mental model: the cheaper the storage, the more expensive and slower the retrieval.

| Storage class | $/GB/month | Minimum duration | Retrieval cost | Restore time | Best for |
|---|---|---|---|---|---|
| Standard | ~0.020 | None | Free | Instant | Hot data, frequently accessed files |
| Infrequent Access (IA) | ~0.012 | 30 days | ~0.010/GB | Instant | Data accessed < 1-2x per month |
| Archive | ~0.005 | 60 days | ~0.020/GB | 1 minute (Expedited) | Quarterly reports, old backups |
| Cold Archive | ~0.002 | 180 days | ~0.030/GB | 1-10 hours | Compliance archives, legal hold |
| Deep Cold Archive | ~0.001 | 180 days | ~0.050/GB | 12-48 hours | Data you never want to read again |
Prices are approximate for cn-beijing. Check the OSS Pricing Page for current rates and regional variations.
A few things that trip people up:
Minimum storage duration is billed, not stored. If you upload a file to Archive storage and delete it after 10 days, you are still charged for the full minimum duration (60 days for Archive). This is true for all classes except Standard.
Retrieval costs are per-GB. Restoring 1TB from Cold Archive costs about $30 just for the retrieval, on top of the transfer costs. Think before you archive.
IA has a minimum object size. Objects smaller than 64KB are charged as 64KB. If you are storing millions of tiny JSON files, IA will cost more than Standard.
Archive, Cold Archive, and Deep Cold Archive require a restore step. You cannot read the object directly. You issue a restore request, wait for the restore to complete, then the object is readable for a configurable period (1-7 days). After that, it goes back to archived state.
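The first and third gotchas are both "bill the larger of actual and minimum". A rough sketch of that arithmetic, using the IA figures from the table (30-day minimum, 64 KB minimum billable size, approximate prices); the function names are made up for illustration:

```python
KB = 1024
GB = 1024 ** 3


def billable_days(stored_days, min_days):
    """Days you pay for if the object is deleted after stored_days."""
    return max(stored_days, min_days)


def billable_bytes(size_bytes, min_bytes=64 * KB):
    """Size you pay for; small objects are rounded up to the minimum."""
    return max(size_bytes, min_bytes)


# An IA object deleted after 10 days is still billed for the 30-day minimum.
print(billable_days(10, min_days=30))

# One million 2 KB JSON files: IA bills each as 64 KB, Standard bills actual size.
count = 1_000_000
actual_gb = count * 2 * KB / GB
billed_gb = count * billable_bytes(2 * KB) / GB
ia_monthly = billed_gb * 0.012        # approximate IA price per GB
standard_monthly = actual_gb * 0.020  # approximate Standard price per GB
print(f"IA: ${ia_monthly:.2f}/month vs Standard: ${standard_monthly:.2f}/month")
```

With these assumed prices the IA bill comes out well above the Standard one, which is the point of the tiny-files warning.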
The golden rule: start everything in Standard, measure your access patterns for 30 days using OSS access logging, then set lifecycle rules to auto-transition cold data. Do not guess.
Creating and Managing Buckets#
Console walkthrough#

The fastest way to create your first bucket:
- Open the OSS Console
- Click Create Bucket
- Set the bucket name (globally unique, e.g., myapp-prod-media-cn)
- Select region (e.g., cn-beijing)
- Storage class: Standard (change later via lifecycle rules)
- Access Control: Private (always start private)
- Versioning: Enable (you can always suspend it later, but enabling retroactively does not version existing objects)
- Server-Side Encryption: AES-256 or KMS (I recommend AES-256 for most workloads — it is free and transparent)
- Click OK
CLI with ossutil#
ossutil is the OSS command-line tool. Install it first:
Now create a bucket and start working with objects:
Bucket naming rules#
- 3-63 characters
- Lowercase letters, numbers, hyphens only
- Must start and end with a letter or number
- Globally unique across all of Alibaba Cloud (not just your account)
- Cannot be renamed after creation
I use the pattern {app}-{env}-{purpose}-{region-short} — e.g., myapp-prod-media-cn, myapp-staging-logs-cn. This prevents naming collisions and makes it obvious what each bucket is for when you are staring at a list of 30 buckets at 2am.
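The naming rules fit in one regular expression, which is handy in a provisioning script. A minimal sketch; `is_valid_bucket_name` and `bucket_name` are hypothetical helpers, and global uniqueness can only be checked by actually trying to create the bucket.

```python
import re

# 3-63 chars, lowercase letters/digits/hyphens, starts and ends with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    return _BUCKET_RE.fullmatch(name) is not None


def bucket_name(app, env, purpose, region_short):
    """Compose a name with the {app}-{env}-{purpose}-{region-short} pattern."""
    name = f"{app}-{env}-{purpose}-{region_short}"
    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    return name


print(bucket_name("myapp", "prod", "media", "cn"))  # myapp-prod-media-cn
print(is_valid_bucket_name("MyApp-Prod"))           # False
```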
Versioning#
Versioning keeps every version of every object. When you overwrite report.pdf, the old version is not deleted — it becomes a non-current version. When you delete report.pdf, it gets a delete marker but the data remains.
Versioning is essential for any bucket containing user data. The storage cost doubles (because you keep old versions), but the alternative — losing data permanently on accidental overwrite — is worse. Combine versioning with lifecycle rules to auto-delete non-current versions after 30 days, which keeps costs controlled.
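If the overwrite and delete-marker behavior feels abstract, a toy in-memory model helps. This is not the OSS API, just a sketch of the semantics described above; the class and the version ID format are invented for illustration.

```python
import itertools


class VersionedBucket:
    """Toy model: every put appends a version, delete appends a marker."""

    def __init__(self):
        self._versions = {}             # key -> [(version_id, data or None)], newest last
        self._ids = itertools.count(1)

    def put(self, key, data):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key):
        # A delete marker is a version with no data; older versions stay.
        return self.put(key, None)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is None:
                raise KeyError(key)     # missing, or hidden behind a delete marker
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))


bucket = VersionedBucket()
first = bucket.put("report.pdf", b"old contents")
bucket.put("report.pdf", b"new contents")
bucket.delete("report.pdf")
print(bucket.get("report.pdf", version_id=first))  # b'old contents'
```

After the delete, a plain `get` fails, but the first version is still retrievable by ID. That retained data is what lifecycle rules for non-current versions eventually clean up.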
Access Control Deep Dive#
OSS access control has four layers, and understanding how they interact is the difference between a secure system and a public data breach. They are evaluated together rather than as a strict override chain: an explicit Deny in a RAM/STS policy or a bucket policy always wins, and without one a request succeeds if any layer (RAM/STS policy, bucket policy, or ACL) allows it.

Layer 1: Bucket ACL#
The simplest and coarsest control. Three options:
| ACL | Anonymous read | Anonymous write | Use case |
|---|---|---|---|
| private | No | No | Default. Almost everything. |
| public-read | Yes | No | Static websites, public CDN origin |
| public-read-write | Yes | Yes | Never use this. |
I am not exaggerating about public-read-write. Setting a bucket to public-read-write means anyone on the internet can upload arbitrary files to your bucket, run up your storage bill, and use your bucket as a malware distribution point. I have seen this in production. Do not do it.
public-read is appropriate only for static assets served directly from OSS (without CDN) where you want the simplest possible setup. Even then, I prefer a private bucket plus CDN with private-bucket back-to-origin access (the OSS counterpart of an origin access identity), but we will get to that.
Layer 2: Bucket Policy#
Bucket policies are JSON documents attached to the bucket that define who can do what. They are resource-based policies, similar to S3 bucket policies. This is the recommended way to grant cross-account access or fine-grained permissions without touching RAM.
This policy says: “Allow Alibaba Cloud account 203917385849**** to read objects under the shared/ prefix, but only from the IP range 203.0.113.0/24.” You can restrict by IP, by VPC, by time of day, by referer header, or by whether the request uses HTTPS.
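A policy matching that description can be built as a plain dictionary and serialized to JSON. Treat this as a sketch of the shape rather than a verified policy: the bucket name is a placeholder, the account ID is kept masked as in the text, and I am assuming the `*` in the account field of the resource string is accepted (the official examples often spell out the bucket owner's UID there).

```python
import json


def read_only_prefix_policy(bucket, prefix, account_uid, cidr):
    """Allow one account to GET objects under a prefix, from one IP range only."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["oss:GetObject"],
                "Principal": [account_uid],
                # Assumption: '*' for the owner account field; replace with the owner UID if required.
                "Resource": [f"acs:oss:*:*:{bucket}/{prefix}*"],
                "Condition": {"IpAddress": {"acs:SourceIp": [cidr]}},
            }
        ],
    }


policy = read_only_prefix_policy(
    bucket="myapp-prod-media-cn",    # placeholder bucket name
    prefix="shared/",
    account_uid="203917385849****",  # masked in the article; use the real UID
    cidr="203.0.113.0/24",
)
print(json.dumps(policy, indent=2))
```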
Apply a bucket policy via the CLI:
Layer 3: RAM Policy#
RAM (Resource Access Management) policies are identity-based — attached to RAM users, groups, or roles. This is what your application server uses.
Create a RAM user for your application with minimum necessary permissions:
Two resources are needed: the bucket itself (for ListObjects) and bucket/* (for object operations). Missing the first one is a common cause of “Access Denied” on ListObjects.
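The two-resource shape looks like this when written out. A minimal sketch that only builds the policy document; the action list is my guess at "minimum necessary" for a media backend, and the bucket name is a placeholder.

```python
import json


def app_ram_policy(bucket):
    """Least-privilege policy: list the bucket, read/write/delete its objects."""
    return {
        "Version": "1",
        "Statement": [
            {
                # Bucket-level action: the resource is the bucket itself.
                "Effect": "Allow",
                "Action": ["oss:ListObjects"],
                "Resource": [f"acs:oss:*:*:{bucket}"],
            },
            {
                # Object-level actions: the resource is bucket/*.
                "Effect": "Allow",
                "Action": ["oss:GetObject", "oss:PutObject", "oss:DeleteObject"],
                "Resource": [f"acs:oss:*:*:{bucket}/*"],
            },
        ],
    }


print(json.dumps(app_ram_policy("myapp-prod-media-cn"), indent=2))
```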
Layer 4: STS Temporary Credentials#
Security Token Service issues temporary credentials that expire after a configurable period (from 15 minutes up to the role’s maximum session duration, which defaults to 1 hour). This is what you use for browser-based uploads and mobile apps. Never embed long-lived AccessKeys in client code.
The flow:
- Client requests an upload token from your backend
- Your backend calls STS AssumeRole with a scoped-down policy
- STS returns temporary AccessKeyId, AccessKeySecret, and SecurityToken
- Client uses those credentials to upload directly to OSS
- Credentials expire automatically
The critical detail: the policy parameter in AssumeRole further restricts the role’s permissions. Even if the role has full OSS access, the temporary credentials only get PutObject on one specific path. This is defense in depth.
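Here is a sketch of the scoping step on its own, without the network call. It builds the session policy and the AssumeRole parameters (`RoleArn`, `RoleSessionName`, `Policy`, `DurationSeconds`) that your backend would pass to the STS SDK. The `uploads/{user_id}/` key layout and the helper names are assumptions, and the role ARN is a placeholder.

```python
import json


def upload_session_policy(bucket, user_id, filename):
    """Session policy that allows PutObject on exactly one object key."""
    key = f"uploads/{user_id}/{filename}"  # hypothetical key layout
    policy = {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["oss:PutObject"],
                "Resource": [f"acs:oss:*:*:{bucket}/{key}"],
            }
        ],
    }
    return key, json.dumps(policy, separators=(",", ":"))


def assume_role_params(role_arn, session_name, policy_json, duration_seconds=900):
    """Parameters for STS AssumeRole; 900 seconds is the shortest allowed lifetime."""
    if duration_seconds < 900:
        raise ValueError("DurationSeconds must be at least 900")
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "Policy": policy_json,
        "DurationSeconds": duration_seconds,
    }


key, policy_json = upload_session_policy("myapp-prod-media-cn", "u42", "avatar.png")
params = assume_role_params(
    "acs:ram::<account-id>:role/<role-name>",  # placeholder ARN
    "upload-u42",
    policy_json,
)
print(key)
print(params["Policy"])
```

Generating the object key on the server, not the client, is what makes the one-path restriction meaningful.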
Signed URLs#
For one-off sharing or time-limited downloads, generate a signed URL that expires:
This outputs a URL with the signature embedded as query parameters. Anyone with the URL can download the file until it expires. No authentication needed on the client side.
In Python:
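With the oss2 SDK this is normally a single call (I believe `bucket.sign_url('GET', key, expires)`), and that is what I would use in production. To show what ends up in the query string, here is a standard-library sketch of the older V1 URL signature as I understand it: an HMAC-SHA1 over the verb, expiry timestamp, and `/bucket/key`. Newer SDK versions may default to a V4 signature, so treat this as illustration only; the credentials below are dummies.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote


def sign_url_v1(access_key_id, access_key_secret, bucket, key, endpoint,
                expires_in=3600, method="GET", now=None):
    """Build a V1-style signed URL with the signature in the query string."""
    expires = int(time.time() if now is None else now) + expires_in
    # verb, Content-MD5 (empty), Content-Type (empty), expiry, canonical resource
    string_to_sign = f"{method}\n\n\n{expires}\n/{bucket}/{key}"
    digest = hmac.new(access_key_secret.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return (
        f"https://{bucket}.{endpoint}/{quote(key)}"
        f"?OSSAccessKeyId={quote(access_key_id, safe='')}"
        f"&Expires={expires}"
        f"&Signature={quote(signature, safe='')}"
    )


print(sign_url_v1("dummy-id", "dummy-secret", "myapp-prod-media-cn",
                  "reports/q1.pdf", "oss-cn-beijing.aliyuncs.com"))
```

The secret never appears in the URL. OSS recomputes the same HMAC on its side and rejects the request if the signature does not match or the `Expires` timestamp has passed.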
Uploading and Downloading#
Simple upload#
For files under 5 GB, a simple PUT request does the job:
Multipart upload#
For files larger than 100 MB, use multipart upload. The file is split into parts (minimum 100 KB each, except the last), uploaded in parallel, then assembled server-side. If a part fails, you retry just that part — not the entire file.
Under the hood, resumable_upload does:
- Calls InitiateMultipartUpload to get an upload ID
- Splits the file into parts
- Uploads each part with UploadPart (parallelized)
- Calls CompleteMultipartUpload to assemble the object
- Saves a checkpoint file locally so it can resume if interrupted
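The "splits the file into parts" step is plain arithmetic. A sketch of how a part plan could be computed, assuming the 100 KB minimum part size mentioned above and a limit of 10,000 parts per upload; `plan_parts` is a hypothetical helper, not an SDK function.

```python
import math

KB = 1024
MB = 1024 * KB
MIN_PART_SIZE = 100 * KB
MAX_PARTS = 10_000  # assumed per-upload part limit


def plan_parts(file_size, part_size=10 * MB):
    """Return (part_number, offset, length) tuples covering the whole file."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size below the 100 KB minimum")
    # Grow the part size if the file would otherwise need too many parts.
    part_size = max(part_size, math.ceil(file_size / MAX_PARTS))
    parts, offset, number = [], 0, 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


for part in plan_parts(25 * MB):
    print(part)
# (1, 0, 10485760)
# (2, 10485760, 10485760)
# (3, 20971520, 5242880)
```

Each tuple maps to one UploadPart call, and a failed part is retried by re-reading only its byte range.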
Resumable download#
For large downloads on unreliable networks:
Using ossutil for bulk operations#
Presigned URL upload from the browser#
The most common pattern for user-facing applications: generate a presigned PUT URL on the server, send it to the browser, let the browser upload directly to OSS. Your server never touches the file bytes.
This saves you from proxying file uploads through your application server, which would consume bandwidth and memory proportional to file size. With presigned URLs, the browser talks directly to OSS, and your server just coordinates.
Lifecycle Rules#
Lifecycle rules automate storage class transitions and object expiration. This is where the real cost savings happen. Set them up once and forget about them.

Common patterns#
Pattern 1: Progressive archival
This rule, applied to the logs/ prefix, says:
- After 30 days, move to Infrequent Access (saves ~40%)
- After 90 days, move to Archive (saves ~75%)
- After 365 days, move to Cold Archive (saves ~90%)
- After 730 days (2 years), delete entirely
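Before applying a rule like this, I find it useful to model it and check what happens to an object at a given age. A small sketch with the thresholds from the list above; it treats "after N days" as "age of N days or more", which is a simplification of how OSS schedules lifecycle runs.

```python
# (age threshold in days, target storage class), in ascending order
TRANSITIONS = [(30, "IA"), (90, "Archive"), (365, "Cold Archive")]
EXPIRE_AFTER_DAYS = 730


def storage_class_at(age_days):
    """Storage class of a logs/ object at a given age, or None once expired."""
    if age_days >= EXPIRE_AFTER_DAYS:
        return None
    current = "Standard"
    for threshold, target in TRANSITIONS:
        if age_days >= threshold:
            current = target
    return current


for age in (0, 45, 120, 400, 800):
    print(age, storage_class_at(age))
# 0 Standard
# 45 IA
# 120 Archive
# 400 Cold Archive
# 800 None
```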
Pattern 2: Clean up incomplete multipart uploads
Incomplete multipart uploads consume storage but are invisible to ls. They accumulate silently. Always add this rule:
Pattern 3: Delete old versions
If versioning is enabled, non-current versions pile up. Prune them:
Apply lifecycle rules via CLI#
Cost impact#
Here is a real example from a production bucket I manage. 2 TB of log data, growing ~50 GB/month:
| Strategy | Monthly cost | Annual cost |
|---|---|---|
| All Standard, no lifecycle | ~$40 | ~$480 |
| Lifecycle: IA at 30d, Archive at 90d | ~$18 | ~$216 |
| Lifecycle: IA at 30d, Archive at 90d, delete at 365d | ~$14 | ~$168 |
That is a 65% reduction by adding a single JSON file. Multiply by 20 buckets across an organization and you are saving thousands of dollars a year for ten minutes of work.
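The arithmetic behind those figures is easy to re-check. A quick sketch using the approximate monthly costs from the table and the ~$0.020/GB Standard price from the storage class section; the dictionary keys are just labels I made up.

```python
def reduction(before, after):
    """Fractional saving when a monthly bill drops from `before` to `after`."""
    return (before - after) / before


# 2 TB at roughly $0.020/GB is where the ~$40 baseline comes from.
standard_monthly = 2 * 1024 * 0.020

monthly = {"all_standard": 40, "ia_archive": 18, "ia_archive_delete": 14}
for name, cost in monthly.items():
    saved = reduction(monthly["all_standard"], cost)
    print(f"{name}: ${cost}/month, ${cost * 12}/year, {saved:.0%} saved")
```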
Cross-Region Replication (CRR)#
Cross-Region Replication asynchronously copies objects from a source bucket to a destination bucket in a different region. Two use cases:

- Disaster recovery — If cn-beijing has a regional outage, your data exists in cn-shanghai
- Compliance — Regulatory requirements to store copies in specific geographic locations
Setting up CRR#
Via the SDK:
CRR details#
| Aspect | Details |
|---|---|
| Replication lag | Usually < 10 minutes for most objects, can be longer for large objects |
| What is replicated | Object data, metadata, ACL (optionally) |
| What is NOT replicated | Lifecycle transitions, bucket policies, server-side encryption settings |
| Cost | You pay for storage in the destination + data transfer between regions |
| Direction | One-way by default. For bidirectional, set up two rules. |
| Delete replication | Optional. You can choose whether deletes propagate. |
A warning: CRR is eventually consistent, with no SLA on replication time by default. Do not use it as a real-time sync mechanism. If you need synchronous cross-region access, look at CEN + multi-region deployment instead.
CDN Integration#
Alibaba Cloud CDN + OSS is one of the most common production patterns. CDN edge nodes cache your OSS objects close to users, reducing latency from hundreds of milliseconds to single digits. The origin (your OSS bucket) only gets hit on cache misses.

Why CDN + OSS instead of just OSS?#
| Factor | OSS direct | CDN + OSS |
|---|---|---|
| Latency | 50-200ms (varies by user location) | 5-30ms (from nearest edge) |
| Cost per GB transfer | ~$0.12/GB (internet) | ~$0.04/GB (CDN is cheaper for high volume) |
| DDoS protection | Basic | Built-in at the CDN edge |
| HTTPS | Supported | Free certificate via CDN |
| Cache control | None | Configurable TTL, cache purge API |
| Custom domain | Supported but no free HTTPS | Full custom domain + free HTTPS |
For any bucket serving user-facing content (images, CSS, JS, videos, downloads), CDN is strictly better. The only case where you would not use CDN is for private, API-only access (e.g., backend services reading files programmatically).
Complete CDN + OSS setup#
Step 1: Add a CDN domain#
Step 2: Configure CNAME DNS#
After adding the CDN domain, Alibaba Cloud gives you a CNAME value like cdn.example.com.w.kunlunsl.com. Add a CNAME record in your DNS:
Step 3: Enable HTTPS with a free certificate#
Alibaba Cloud CDN provides free DV (Domain Validated) certificates. They auto-renew. For production, you can upload your own certificate or use Certificate Management Service.
Step 4: Set cache rules#
Step 5: Configure back-to-origin#
OSS as CDN origin works automatically, but configure these optimizations:
This lets CDN access a private bucket without making the bucket public. CDN authenticates to OSS using an internal authorization mechanism. Your bucket stays private, but CDN can fetch objects on cache misses.
Step 6: Verify the setup#
Cache purge#
When you update a file in OSS but CDN still serves the old version:
Image Processing (IMG)#
OSS has built-in image processing that transforms images on the fly via URL parameters. No separate service, no pre-processing pipeline — just append query parameters to the object URL.

Basic transformations#
Watermarking#
Image info#
Using image processing with CDN#
When you access https://cdn.example.com/images/photo.jpg?x-oss-process=image/resize,w_800/format,webp, CDN caches the processed version. Subsequent requests for the same transformation hit the CDN cache, not OSS. This means you get on-the-fly processing with CDN-speed delivery.
The processed images are cached separately from the originals — the full URL including query parameters is the cache key. So photo.jpg, photo.jpg?x-oss-process=image/resize,w_800, and photo.jpg?x-oss-process=image/resize,w_400 are three separate cache entries.
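Since the transformation lives entirely in the query string, building these URLs is string formatting. A minimal sketch covering only the resize and format actions shown above; `process_url` is a hypothetical helper and cdn.example.com is the placeholder domain from this article.

```python
from urllib.parse import quote


def process_url(base_url, key, *, width=None, fmt=None):
    """Append an x-oss-process image pipeline to an object URL."""
    actions = []
    if width is not None:
        actions.append(f"resize,w_{width}")
    if fmt is not None:
        actions.append(f"format,{fmt}")
    url = f"{base_url.rstrip('/')}/{quote(key)}"
    if actions:
        url += "?x-oss-process=image/" + "/".join(actions)
    return url


base = "https://cdn.example.com"
variants = {
    process_url(base, "images/photo.jpg"),
    process_url(base, "images/photo.jpg", width=800),
    process_url(base, "images/photo.jpg", width=400),
}
print(len(variants))  # 3 distinct URLs, so 3 separate cache entries
print(process_url(base, "images/photo.jpg", width=800, fmt="webp"))
```

Keeping the set of widths small and fixed (for example 400 and 800) keeps the number of cache entries per image bounded.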
Solution: Media Storage Backend#
Let us put everything together. We will build a complete media storage backend: OSS bucket with lifecycle rules, CDN with a custom domain, and a Python Flask API that generates presigned upload URLs and serves images through CDN with processing.
Step 1: Create and configure the bucket#
Step 2: Apply lifecycle rules#
Step 3: Set up CDN#
Step 4: The Flask API#
Step 5: Test the complete flow#
Architecture summary#
The beauty of this architecture: your application server handles zero file I/O. Upload bytes flow directly from the browser to OSS. Download bytes flow from CDN edge nodes. Your Flask API is just a coordinator that generates signed URLs and constructs CDN paths. It stays lightweight, easy to scale, and cheap to run.
Summary#
OSS is not a filesystem. It is a flat key-value store accessed over HTTP. Do not try to use it like a mounted disk. Do not store millions of tiny files where NAS or a database would be better. Use it for what it excels at: storing blobs of any size with extreme durability, served over HTTP.
Start private, loosen carefully. Every bucket should be private by default. Use signed URLs for temporary access, STS tokens for client uploads, and CDN with origin access for public content. The public-read-write ACL should never appear in your infrastructure.
Lifecycle rules are free money. Set them on every bucket. Even a simple “transition to IA after 30 days” rule saves 40% on data you are not actively reading. The rule costs nothing to configure and runs automatically.
Use the internal endpoint. When your ECS instances and OSS bucket are in the same region, use oss-{region}-internal.aliyuncs.com. Data transfer over the internal network is free. Over the public endpoint, you pay ~$0.12/GB. This adds up fast.
CDN is not optional for user-facing content. The combination of lower latency, lower cost, and built-in DDoS protection makes CDN + OSS strictly better than OSS alone for any public content. The setup takes 15 minutes.
Presigned URLs keep your server thin. Never proxy file uploads or downloads through your application server. Generate presigned URLs and let the client talk directly to OSS (or CDN). Your server handles metadata and authorization, not bytes.
For using OSS with infrastructure-as-code, see Terraform Part 5: Storage. We will use OSS as the backing store for our ML models in Part 11: PAI.
What’s Next#
Storage is where your data lives. With OSS configured — buckets, lifecycle rules, access control, CDN, and image processing in place — we have the persistence layer sorted. In the next article, we move to managed databases: RDS for relational data, Redis for caching, and the replication, backup, and failover strategies that keep your data alive when hardware inevitably fails.