Zip File Streaming Microservice - stream zip files on the fly

scosman/zipstreamer

ZipStreamer is a golang microservice for streaming zip files from a series of web links, on the fly. For example, if you have 200 files on S3, and you want to download a zip file of them, you can do so in 1 request to this server.

Highlights include:

  • Low memory: the files are streamed out to the client immediately
  • Low CPU: the default server doesn't compress files, only packages them into a zip, so there's minimal CPU load (configurable)
  • High concurrency: the two properties above allow a single small server to stream hundreds of large zips simultaneously
  • Easy to host: several deployment options, including Docker images and two one-click deployers
  • Usable as a library: it includes an HTTP server, but can also be used as a library (see zip_streamer.go)

Content

JSON Zip File Descriptor

Each HTTP endpoint requires a JSON description of the desired zip file. It includes a root object with the following structure:

  • suggestedFilename [Optional, string]: The filename to suggest in the "Save As" UI in browsers. Defaults to archive.zip if not provided or invalid. Limited to US-ASCII.
  • files [Required, array]: an array describing the files to include in the zip file. Each array entry requires 2 properties:
    • url [Required, string]: the public URL of the file to include in the zip. Zipstreamer will fetch this via a GET request. The file must be publicly accessible via this URL; if your files are private, most file hosts provide query string authentication options which work well with Zipstreamer (example AWS S3 Docs).
    • zipPath [Required, string]: the path and filename where this entry should appear in the resulting zip file. This is a relative path to the root of the zip file.

Example JSON description with 2 files:

{
  "suggestedFilename": "tps_reports.zip",
  "files": [
    {
      "url":"https://server.com/image1.jpg",
      "zipPath":"image1.jpg"
    },
    {
      "url":"https://server.com/image2.jpg",
      "zipPath":"in-a-sub-folder/image2.jpg"
    }
  ]
}

HTTP Endpoints

POST /download

This endpoint takes an HTTP POST body containing the JSON zip file descriptor, and returns a zip file.

Example usage with curl

Example curl usage of POST /download endpoint

# download a sample json descriptor
curl https://gist.githubusercontent.com/scosman/f57a3561fed98caab2d0ae285a0d7251/raw/4a9630951373e50f467f41d8c7b9d440c13a14d2/zipJsonDescriptor.json > zipJsonDescriptor.json
# call POST /download endpoint, passing json descriptor in body
curl --data-binary "@./zipJsonDescriptor.json" http://localhost:4008/download > archive.zip

GET /download

This endpoint fetches a JSON zip file descriptor hosted on another server, and returns a zip file. This is useful over the POST /download endpoint for a few use cases:

  • You want to hide from the client where the original files are hosted (see zsid parameter)
  • Use cases where POST requests aren't easy to adopt (traditional static webpages)
  • You want to trigger a browser's "Save File" UI, which isn't shown for POST requests. See POST /create_download_link for a client-side alternative to achieve this.

This endpoint requires one of two query parameters describing where to find the JSON zip file descriptor:

  • zsurl: the full URL to the JSON file describing the zip. Example: /download?zsurl=https://yourserver.com/path_to_descriptors/82a1b54cd20ab44a916bd76a5
  • zsid: must be used with the ZS_LISTFILE_URL_PREFIX environment variable. The JSON file will be fetched from the URL formed by appending zsid to ZS_LISTFILE_URL_PREFIX. This allows you to hide the full URL path from clients, revealing only the end of the URL. Example: ZS_LISTFILE_URL_PREFIX = "https://yourserver.com/path_to_descriptors/" and /download?zsid=82a1b54cd20ab44a916bd76a5

Example usage with curl

Example curl usage of GET /download endpoint with zsurl parameter

curl -X GET "http://localhost:4008/download?zsurl=https://gist.githubusercontent.com/scosman/f57a3561fed98caab2d0ae285a0d7251/raw/4a9630951373e50f467f41d8c7b9d440c13a14d2/zipJsonDescriptor.json" > archive.zip

Example curl usage of GET /download endpoint with zsid parameter

# start server with ZS_LISTFILE_URL_PREFIX
ZS_LISTFILE_URL_PREFIX="https://gist.githubusercontent.com/scosman/" ./zipstreamer
# call `GET /download` endpoint with zsid
curl -X GET "http://localhost:4008/download?zsid=f57a3561fed98caab2d0ae285a0d7251/raw/4a9630951373e50f467f41d8c7b9d440c13a14d2/zipJsonDescriptor.json" > archive.zip

POST /create_download_link

This endpoint takes an HTTP POST body containing the JSON zip file descriptor, stores it in a local cache, and returns a link ID which allows the caller to fetch the zip file via an additional call to GET /download_link/{link_id}.

This is useful if you want to trigger a browser's "Save File" UI, which isn't shown for POST requests. See GET /download for a server-side alternative to achieve this.

Important:

  • These links only live for 60 seconds. They are expected to be used immediately.
  • This stores the link in an in-memory cache, so it's not suitable for deploying to a multi-server cluster without extra configuration. If you are hosting on a multi-server cluster, see the deployment section for configuration advice.

Here is an example response body containing the link ID. See docs for GET /download_link/{link_id} below for how to fetch this zip file:

{
  "status":"ok",
  "link_id":"b4ecfdb7-e0fa-4aca-ad87-cb2e4245c8dd"
}

Example usage: see GET /download_link/{link_id} documentation below.

GET /download_link/{link_id}

Call this endpoint with a link_id generated with /create_download_link to download that zip file.

Example usage with curl

Example curl usage of POST /create_download_link and GET /download_link/{link_id} endpoints working together

# download a sample json descriptor
curl https://gist.githubusercontent.com/scosman/f57a3561fed98caab2d0ae285a0d7251/raw/4a9630951373e50f467f41d8c7b9d440c13a14d2/zipJsonDescriptor.json > zipJsonDescriptor.json
# call POST endpoint to create link
curl --data-binary "@./zipJsonDescriptor.json" http://localhost:4008/create_download_link
# Call GET endpoint to download zip. Note: must copy UUID from output of above POST command into this URL
curl -X GET "http://localhost:4008/download_link/UUID_FROM_ABOVE" > archive.zip
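
The copy-the-UUID step can be automated by parsing the JSON response. The following Go sketch assumes the response shape shown earlier (`status` and `link_id`) and a server on localhost:4008; `parseLinkID` and `createDownloadLink` are hypothetical helper names. Because links expire after 60 seconds, the returned URL should be used right away.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

// linkResponse matches the example response body for /create_download_link.
type linkResponse struct {
	Status string `json:"status"`
	LinkID string `json:"link_id"`
}

// parseLinkID extracts link_id from a response body and rejects
// anything that is not a successful response.
func parseLinkID(body []byte) (string, error) {
	var r linkResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.Status != "ok" || r.LinkID == "" {
		return "", fmt.Errorf("unexpected response: %s", body)
	}
	return r.LinkID, nil
}

// createDownloadLink posts the descriptor and returns the full
// GET /download_link/{link_id} URL for the resulting zip.
func createDownloadLink(baseURL string, descriptor []byte) (string, error) {
	resp, err := http.Post(baseURL+"/create_download_link", "application/json", bytes.NewReader(descriptor))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	id, err := parseLinkID(body)
	if err != nil {
		return "", err
	}
	return baseURL + "/download_link/" + id, nil
}

func main() {
	descriptor, err := os.ReadFile("zipJsonDescriptor.json")
	if err != nil {
		log.Fatal(err)
	}
	link, err := createDownloadLink("http://localhost:4008", descriptor)
	if err != nil {
		log.Fatal(err)
	}
	// Hand this URL to the browser (or curl) within 60 seconds.
	fmt.Println(link)
}
```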

Deploy

Heroku - One Click Deploy

Be sure to enable session affinity if you're using multiple servers and using /create_download_link.

Google Cloud Run - One Click Deploy, Serverless

Cloud Run is an ideal serverless environment for ZipStreamer, as it routes many requests to a single container instance. ZipStreamer is designed to handle many concurrent requests, and will be cheaper to run on this serverless architecture than on an instance-per-request architecture like AWS Lambda or Google Cloud Functions.

Important

  • The one-click deploy button has a bug and may force you to set the optional environment variables. If the server isn't working, check that ZS_URL_PREFIX is blank in the Cloud Run console.
  • Be sure to enable session affinity if you're using /create_download_link. Cloud Run may scale up to multiple containers automatically.

Docker

This repo contains a Dockerfile, and an image is published on GitHub Packages.

Build Your Own Image

To build your own image, clone the repo and run:

docker build --tag docker-zipstreamer .
# Start on port 8080
docker run --env PORT=8080 -p 8080:8080 docker-zipstreamer

Run Official Package from GitHub Packages

Official packages are published on GitHub Packages. To pull the latest stable release:

docker pull ghcr.io/scosman/packages/zipstreamer:stable
# Start on port 8080
docker run --env PORT=8080 -p 8080:8080 ghcr.io/scosman/packages/zipstreamer:stable

Note: the stable tag pulls the latest GitHub release. Use ghcr.io/scosman/packages/zipstreamer:latest for top of tree.

Configuration Options

These environment variables can be used to configure the server:

  • PORT - Defaults to 4008. Sets which port the HTTP server binds to.
  • ZS_URL_PREFIX - If set, the server will verify that the url property of each file in the JSON zip file descriptor starts with this prefix. Useful for preventing others from using your server to serve their files.
  • ZS_COMPRESSION - Defaults to no compression. It's not universally known, but zip files can be uncompressed, and used only to combine many files into one file. Set to DEFLATE to use zip deflate compression. WARNING - enabling compression uses CPU, and will reduce the throughput of the server. Note: for files with internal compression (JPEGs, MP4s, etc), zip DEFLATE compression will often increase the total zip file size.
  • ZS_LISTFILE_URL_PREFIX - See documentation for GET /download

Why

I was mentoring at a "Teens Learning Code" class, but we had too many mentors, so I had some downtime.

Logo

Zipper portion of logo by Kokota from Noun Project (Creative Commons CCBY)