Cgar — guard file downloads with reCaptcha

bettercallshao

Cgar (reCaptcha Guarded Archive Retriever) is a simple service that retrieves files from blob stores (e.g. S3) and serves them to browsers once a reCaptcha is passed. An initial GET request for a file returns an HTML page with a reCaptcha. Once the reCaptcha is passed, the Go button sends a POST request and the file is served as a download attachment. Find it on GitHub at https://github.com/bettercallshao/cgar.
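
To make the POST step concrete, here is a minimal sketch of how a server can check a reCaptcha V2 token against Google's siteverify endpoint. It is not cgar's actual code; the function names are hypothetical, and it assumes the token arrives in the form field that the reCaptcha widget submits (g-recaptcha-response).

```python
import json
import urllib.parse
import urllib.request

# Google's verification endpoint for reCaptcha tokens.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret_key, token):
    """Build the form-encoded POST that asks Google to validate a token."""
    data = urllib.parse.urlencode({"secret": secret_key, "response": token})
    return urllib.request.Request(VERIFY_URL, data=data.encode(), method="POST")


def is_verified(response_body):
    """Return True only if the JSON reply explicitly reports success."""
    try:
        payload = json.loads(response_body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("success") is True


def verify(secret_key, token, timeout=5.0):
    """Send the token to Google and report whether it was accepted."""
    request = build_verify_request(secret_key, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return is_verified(resp.read())
```

In this sketch a server would call verify with its RECAPTCHA_SECRET_KEY and only fetch the file from the blob store when the result is True.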

The motivation came from my job hunt. I want my resume to be publicly accessible by recruiters, but since it has my phone number on it, I want to protect it from web crawlers. I made the first version on a VM on Digital Ocean as part of a larger deployment with the file stored on the file system. When I migrated to Kubernetes, I also modernized it to be more cloud native.

Configuration

A single YAML file is used to configure the behavior of cgar.

RECAPTCHA_SITE_KEY: "..."
RECAPTCHA_SECRET_KEY: "..."
downloads:
  - id: "example"
    bucket: "..."
    key: "..."
    accessKeyId: "..."
    secretAccessKey: "..."
    endpoint: "..."

Sign up for reCaptcha at http://www.google.com/recaptcha/admin. Cgar only supports V2 at the moment. After signing up, the values for RECAPTCHA_SITE_KEY and RECAPTCHA_SECRET_KEY should be available.

Upload the target file to an S3 compatible blob store. Only the S3 protocol is supported at the moment, but most cloud providers offer S3 compatibility. Create a new item in the downloads section of the config file, and fill in bucket, key (i.e. path in bucket), accessKeyId, secretAccessKey, and endpoint with the target file information. Then name the target by assigning a unique value to the id field. Let's assume we name it example.
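
Since a misspelled field name is easy to miss, a small pre-flight check can help before deploying. The sketch below is a hypothetical helper, not part of cgar. It assumes the YAML file has already been loaded into a dict (for example with PyYAML's yaml.safe_load, which is an extra dependency) and that every field shown in the example config is required.

```python
# Field names taken from the example config; treating all of them as
# required is an assumption of this sketch.
REQUIRED_TOP = ("RECAPTCHA_SITE_KEY", "RECAPTCHA_SECRET_KEY")
REQUIRED_ITEM = ("id", "bucket", "key", "accessKeyId", "secretAccessKey", "endpoint")


def check_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name in REQUIRED_TOP:
        if not config.get(name):
            problems.append(f"missing {name}")
    seen = set()
    for index, item in enumerate(config.get("downloads") or []):
        for field in REQUIRED_ITEM:
            if not item.get(field):
                problems.append(f"downloads[{index}]: missing {field}")
        ident = item.get("id")
        if ident in seen:
            # Each id becomes part of a download URL, so it must be unique.
            problems.append(f"downloads[{index}]: duplicate id {ident!r}")
        elif ident:
            seen.add(ident)
    return problems
```

Calling check_config on a config whose item spells the field assessKeyId reports "downloads[0]: missing accessKeyId", which points straight at the typo.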

Deploy locally with docker-compose

Clone the repo and create the config file as ./secret/config.yaml. Run docker-compose up, which builds the docker image and runs the service. A restart is required after any configuration change. Two endpoints should be available:

  1. localhost:8080/ready: The readiness check, which is always available.
  2. localhost:8080/download/example: The download page for example, which should produce a download attachment after the reCaptcha is passed, given the configuration is correct.
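
A quick way to confirm both endpoints respond is to request them and look at the status codes. The sketch below is a hypothetical smoke check using only the standard library. The base URL and the id example come from the steps above, and the exact status codes cgar returns are not assumed here.

```python
import urllib.error
import urllib.request


def status_of(url, timeout=5.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses (4xx/5xx) still carry a status code worth reporting.
        return err.code
    except urllib.error.URLError:
        # Connection refused, DNS failure and similar: the service is not up.
        return None


def smoke_check(base="http://localhost:8080", download_id="example"):
    """Map each endpoint path to the status code it returned."""
    paths = ("/ready", f"/download/{download_id}")
    return {path: status_of(base + path) for path in paths}
```

Running smoke_check() after docker-compose up returns a dict of path to status code; a None value means the service could not be reached at all.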

Deploy to Kubernetes

Optionally create a new namespace.

kubectl create ns cgar

Create a config map called cgar-config with config.yaml as the key and our config file as the value.

kubectl -n cgar create cm cgar-config --from-file=config.yaml=./secret/config.yaml

Create a specification file, cgar.yaml, for the cgar deployment and service.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cgar
spec:
  selector:
    matchLabels:
      app: cgar
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: cgar
    spec:
      containers:
      - name: cgar
        image: bettercallshao/cgar:latest
        volumeMounts:
        - name: secret
          mountPath: "/secret"
        env:
        - name: CONFIG_PATH
          value: "/secret/config.yaml"
        - name: PORT
          value: "80"
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /ready
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /ready
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 60
      volumes:
      - name: secret
        configMap:
          name: cgar-config
---
apiVersion: v1
kind: Service
metadata:
  name: cgar-service
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  selector:
    app: cgar

There are a few tricks here.

  1. Prebuilt images are available on Docker Hub at bettercallshao/cgar:latest.
  2. The config map cgar-config is treated as a volume and mounted in the container as /secret. The config could equally be stored in a k8s secret instead of a config map.
  3. The CONFIG_PATH environment variable is set to /secret/config.yaml to point to the config.yaml we mounted from the config map.
  4. The PORT environment variable is set to 80; setting it is optional.
  5. Probes are set up to monitor the /ready endpoint for health. The service will crash if the blob config is incorrect, and it relies on k8s to restart it.

Create the specified resources.

kubectl -n cgar create -f cgar.yaml

In practice, for bettercallshao.com I use minio in conjunction with cgar. To see how I deployed minio and other things, refer to the (less well documented) repo https://github.com/bettercallshao/bcs-lke.