Running Your Own IPFS Gateway

ipfs kubo nginx caddy self-hosting homelab

In Hello, IPFS I mentioned, almost in passing, that this site is “pinned on my own Kubo gateway node.” That post was the what: flat files, content-addressed, resolved through ENS. This one is the how — the node that actually serves and pins the bytes.

A personal site that doesn’t depend on a single host still needs a host to pin and serve it. Mine runs on my own hardware behind a reverse proxy, and the config below is the real shape of it (with my internal names and addresses swapped out). The lessons are the interesting part anyway, and the best one cost me an afternoon — and then, later, a second look that changed the whole answer.

The chain: Caddy → Kubo

Two pieces, each doing one job:

internet/LAN ──> firewall ──> Caddy ──> Kubo
                 (LAN only)   (TLS +     (IPFS host)
                              full-duplex  API :5001
                               proxy)      gateway :8080
  • A firewall sits at the perimeter. The gateway is a LAN-only service, so inbound traffic to it from the internet is dropped at the edge — none of what follows is reachable from outside the network in the first place.
  • Internal DNS is what makes the names work. The gateway hostname (and the per-CID origins below) resolve to the proxy only on the LAN — queries never leave the network, and there are no public DNS records pointing at any of this. It’s also where .eth resolution hooks in, which I’ll get to.
  • Caddy sits at the edge and does everything in the middle: it terminates TLS (a wildcard certificate from Let’s Encrypt via a DNS-01 challenge, so every subdomain, including the gateway and the per-CID origins below, is covered by one cert), restricts access to the LAN, and reverse-proxies straight to Kubo.
  • Kubo — the reference IPFS implementation — runs on a separate host. This is where the DAG actually lives.

It wasn’t always two pieces. I started with a second proxy — nginx — wedged between Caddy and Kubo, doing routing and access control. Ripping it back out is the whole point of the war story below; it turned out to be the cause of the bug, not a layer of safety. So the diagram above is the after. Let me earn it.

Running Kubo

The node itself is a dozen lines of Docker Compose:

services:
  ipfs:
    image: ipfs/kubo:latest
    container_name: ipfs
    restart: unless-stopped
    environment:
      IPFS_PROFILE: server,pebbleds
    ports:
      - 4001:4001/tcp       # swarm (TCP)
      - 4001:4001/udp       # swarm (UDP/QUIC)
      - 5001:5001           # API
      - 8080:8080           # gateway
    volumes:
      - ipfs_data:/data/ipfs

volumes:
  ipfs_data: {}

A few things worth pointing at:

  • IPFS_PROFILE: server,pebbleds. The server profile disables local network discovery (mDNS) and NAT port-mapping, which is correct for a box in a rack that isn’t trying to find peers on the LAN. pebbleds switches the datastore to PebbleDB.
  • 4001 is the swarm port (how the node talks to other IPFS peers), exposed for both TCP and QUIC. 5001 is the admin API and 8080 the gateway — both reachable only from the LAN, never the internet (see below).

Compose gets the daemon running; the configuration is applied separately and declaratively, so a rebuild always lands in the same state. The interesting bits:

# Open CORS on the API and gateway (the API is locked down at the network
# layer instead — see the security section).
ipfs config --json API.HTTPHeaders.Access-Control-Allow-Origin '["*"]'
ipfs config --json Gateway.HTTPHeaders.Access-Control-Allow-Origin '["*"]'

# Subdomain gateway for the public hostname.
ipfs config --json Gateway.PublicGateways '{
  "gateway.example.com": {
    "UseSubdomains": true,
    "Paths": ["/ipfs", "/ipns"],
    "NoDNSLink": false
  }
}'

# Faster provider lookups.
ipfs config --json Routing.AcceleratedDHTClient true

# Let the node resolve .eth names itself, via a DoH endpoint.
ipfs config --json DNS.Resolvers '{"eth.": "https://your-ens-resolver.example/dns-query"}'

That last one is my favorite, and it deserves more than a passing mention.

ENS names live on Ethereum, not in DNS. A .eth name has no authoritative nameserver, so a stock resolver has no idea what to do with mysticryuujin.eth — it’ll just NXDOMAIN. To bridge that gap I run my own DNS-over-HTTPS resolver for the eth. zone, backed by an Ethereum node. When a query for a .eth name arrives, it reads that name straight out of the ENS registry on-chain — the name’s resolver contract, its contenthash, any DNS records it publishes — and answers as if it were serving an ordinary zone. (This is the same bridge the public eth.limo service provides; I just run my own so the lookups never leave the network.)

Two places consume it. Kubo’s DNS.Resolvers points the eth. zone at that endpoint, so the node resolves ENS itself — ipfs name resolve /ipns/mysticryuujin.eth works directly, no public gateway in the loop. And the internal DNS resolver forwards the eth. zone to the same place, so every machine on the LAN can browse .eth names natively. The whole stack speaks Ethereum naming without anything special in the application layer.
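A quick way to confirm the first of those consumers is wired up is to ask the node to resolve a name and check that what comes back is a content path. This is a minimal sketch, not part of the setup described here: resolve_eth is a hypothetical helper name, and it assumes the ipfs CLI on your PATH is already pointed at the node.

```shell
# resolve_eth NAME
# Hypothetical helper (not a Kubo command). Asks the node to resolve
# /ipns/NAME and succeeds only if the answer looks like a content path.
resolve_eth() (
    name="$1"
    path="$(ipfs name resolve "/ipns/$name")" || exit 1
    case "$path" in
        /ipfs/*|/ipns/*)
            printf '%s\n' "$path"
            ;;
        *)
            echo "unexpected answer for $name: $path" >&2
            exit 1
            ;;
    esac
)
```

Calling resolve_eth mysticryuujin.eth should print a /ipfs/ path if the DoH endpoint and DNS.Resolvers entry are doing their job; a non-zero exit means the node could not resolve the name or returned something that is not a path.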

Routing: one host, three jobs

The proxy has to keep three things straight, and the split matters for both correctness and security:

# Apex: the admin API on /api, the path-style gateway on everything else.
gateway.example.com {
    handle /api/* {
        reverse_proxy ipfs-host:5001 {
            flush_interval -1      # stream full-duplex — see the war story
        }
    }
    handle {
        reverse_proxy ipfs-host:8080
    }
}

# Per-CID / per-IPNS subdomain origins — origin isolation.
*.ipfs.gateway.example.com,
*.ipns.gateway.example.com {
    reverse_proxy ipfs-host:8080
}

The apex host exposes two things: the admin API under /api (proxied to Kubo on :5001), and the path-style gateway on everything else (proxied to Kubo on :8080).

The wildcard hosts match <anything>.ipfs.gateway.example.com (and the .ipns variant) and forward everything to the gateway. This is origin isolation: each CID gets served from its own subdomain origin, so the browser’s same-origin policy keeps one piece of content from poking at another’s localStorage, cookies, or service workers. It’s the whole reason subdomain gateways exist.

Why is /api only on the apex? Because if a wildcard origin also exposed it, then https://<cid>.ipfs.gateway.example.com/api would hit the Kubo admin API instead of serving that CID’s /api path. You’d be one URL away from letting arbitrary content reach your node’s control plane. Keep the admin API on exactly one name.
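That claim is easy to spot-check from a machine on the LAN. The sketch below is an assumption-laden probe rather than anything from my config: api_exposed_on is a hypothetical helper, and it relies on the Kubo RPC API answering POST /api/v0/version with a JSON object that contains a "Version" field.

```shell
# api_exposed_on ORIGIN
# Hypothetical probe: succeeds if ORIGIN answers /api/v0/version the way
# the Kubo RPC API does (the RPC API only accepts POST).
api_exposed_on() {
    curl -sS -m 10 -X POST "$1/api/v0/version" 2>/dev/null \
        | grep -q '"Version"'
}
```

On a correctly split setup, api_exposed_on https://gateway.example.com should succeed from the LAN, while the same call against any <cid>.ipfs.gateway.example.com origin should fail, because that request is served as a path inside the CID instead.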

End to end, a path request flows like this: you hit gateway.example.com/ipfs/<cid>, Kubo 301-redirects you to <cid>.ipfs.gateway.example.com, the wildcard cert already covers that name, the proxy forwards it with the original Host header intact (Caddy preserves it by default), and Kubo reads the CID straight out of the hostname.
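The redirect step in that flow can be checked by hand. A minimal sketch, with hypothetical helper names (subdomain_url, redirect_target) and one assumption worth stating loudly: the identifier you pass in is already in the lowercase form the gateway uses in hostnames, such as a base32 CIDv1.

```shell
# subdomain_url KIND ID HOST
# Build the per-origin URL a path request is expected to land on.
# KIND is "ipfs" or "ipns". Assumes ID is already hostname-safe
# (for example a base32 CIDv1); no conversion is attempted here.
subdomain_url() (
    kind="$1" id="$2" host="$3"
    case "$kind" in
        ipfs|ipns) ;;
        *) echo "kind must be ipfs or ipns" >&2; exit 2 ;;
    esac
    printf 'https://%s.%s.%s/\n' "$id" "$kind" "$host"
)

# redirect_target URL
# Print where URL redirects to, without following the redirect.
redirect_target() {
    curl -sS -o /dev/null -w '%{redirect_url}' "$1"
}
```

Comparing redirect_target "https://gateway.example.com/ipfs/$CID" with subdomain_url ipfs "$CID" gateway.example.com is a reasonable smoke test; if they differ, the Host header or the PublicGateways entry is the first place to look.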

That flush_interval -1 on the /api route looks like a throwaway line. It’s the whole story.

The war story: why bulk uploads deadlocked

Now the part that cost me real time — twice.

When I first stood this up, deploys hung. Adding a single file through the HTTPS endpoint worked fine. Running ipfs add -r against the whole built dist/ directory — a few hundred files — would just sit there forever.

The first fix was a band-aid: route the bulk upload around the proxy entirely and hit the Kubo API directly on the LAN. The deploy script still carries the escape hatch:

# IPFS_ADD_API optionally overrides the endpoint for the bulk upload (the
# Kubo add API is bidirectionally streaming and needs a full-duplex proxy).
IPFS_ADD_API="${IPFS_ADD_API:-$IPFS_API}"
CID="$(command ipfs --api "$IPFS_ADD_API" add -r -Q --cid-version 1 --pin dist)"

Point IPFS_ADD_API straight at the box on the LAN and the add goes through — no proxy, no problem. That got deploys working, but it bugged me: why did HTTPS break, and only for bulk adds?

The answer I thought I had

Back then I had nginx sitting between Caddy and Kubo, and nginx’s defaults look exactly wrong for a streaming API. Two of them:

client_max_body_size defaults to 1m — upload more than a megabyte and nginx rejects it. And response buffering is on by default. The Kubo add API is bidirectionally streaming: as the client uploads the multipart body, the server streams back per-file progress events. So I set the obvious knobs in the /api location:

location /api {
  proxy_http_version 1.1;
  proxy_buffering off;          # don't buffer the streamed response
  proxy_request_buffering off;  # don't buffer the streamed upload
  client_max_body_size 0;       # no upload size cap
  proxy_read_timeout 600;
  proxy_pass http://ipfs-host:5001;
}

Bulk adds went through. I wrote it up as solved, moved on, and felt clever.

The answer that was actually true

Months later, on a workstation with a different dist/ — a bigger one — the deploys started hanging again. Same symptom, “fixed” config. That’s the moment a band-aid you’ve forgotten about turns into a mystery.

The buffering directives were never the real fix. The real problem is that nginx is half-duplex by design. Once the upstream sends its first byte of response, nginx stops reading the request body — it treats the response starting as license to quit draining the upload (RFC 7230 §6.5, applied to all responses; it’s nginx trac #1293, filed and closed wontfix). No proxy_* directive changes this. Buffering off or on, the request and response simply cannot flow at the same time through nginx.

That is precisely the one thing the Kubo add API requires. It emits a progress event the instant it starts ingesting — which is nginx’s cue to stop reading my upload. The kernel socket buffers absorb the next few megabytes, so for a while both sides look alive; then the buffers fill, the upload blocks on a write that will never drain, the server blocks waiting for more body, and the whole exchange sits in a mutual stare until a timeout.

Which finally explained everything, including why I’d been fooled:

  • Single files worked — tiny upload, finishes before the buffers matter.
  • proxy_buffering off “fixed” it once — not because it fixed anything, but because that day’s dist/ happened to fit inside the socket buffers. It was never the directive; it was the size.
  • A bigger dist/ brought it right back — the build crossed the buffer ceiling and the half-duplex deadlock surfaced again.
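To reproduce the hang on purpose rather than wait for a build to grow into it, a throwaway fixture helps. This is a minimal sketch, not the deploy tooling from this post: make_fixture and try_add are hypothetical helper names, the file count and size are guesses you’ll need to tune (the ceiling depends on your kernel’s socket buffer settings), and timeout assumes GNU coreutils.

```shell
# make_fixture DIR COUNT KIB
# Create COUNT files of KIB kibibytes each, filled with random bytes so
# nothing deduplicates away.
make_fixture() (
    dir="$1" count="$2" kib="$3"
    mkdir -p "$dir" || exit 1
    i=1
    while [ "$i" -le "$count" ]; do
        head -c "$((kib * 1024))" /dev/urandom > "$dir/file-$i.bin" || exit 1
        i=$((i + 1))
    done
)

# try_add API DIR SECONDS
# Run a bulk add against API with a deadline. GNU timeout exits with 124
# when the deadline passes, which is how a hang shows up here.
# --pin=false keeps the junk fixture from being pinned on the node.
try_add() {
    timeout "$3" ipfs --api "$1" add -r -Q --cid-version 1 --pin=false "$2"
}
```

Something like make_fixture /tmp/fx 500 64 followed by try_add "$IPFS_API" /tmp/fx 120 gives a yes-or-no answer: a CID on stdout means the proxy streamed the add through, and exit status 124 means it stalled.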

You can’t configure your way out of a protocol-level limitation. nginx is a phenomenal proxy for request→response traffic; the Kubo API is one of the rare cases that genuinely needs to read and write a single connection concurrently, and nginx won’t.

The real fix: delete the hop

So I deleted nginx. Caddy — already terminating TLS at the edge — does full bidirectional streaming, and it can talk to Kubo directly. Everything nginx was doing (Host preservation, the private-IP guard) Caddy already did natively, so the second proxy was pure liability: an extra hop whose only distinguishing behavior was the bug.

The single line that makes it work is back in the routing section above:

handle /api/* {
    reverse_proxy ipfs-host:5001 {
        flush_interval -1      # flush every write immediately; never buffer
    }
}

flush_interval -1 tells Caddy to flush each response write to the client immediately instead of batching it, so Kubo’s progress events stream back while the upload is still in flight. That is the full-duplex behavior the add API wants. Bulk adds of any size now go straight through HTTPS, no escape hatch, no extra process.

If you take one thing from this post: the Kubo API needs a full-duplex proxy, and not every reverse proxy is one. The failure mode — works for small things, hangs for big ones, and “fixes” that hold only until your data grows — is built to waste your afternoon, and then a second afternoon when you’ve forgotten the first.

Don’t expose the API

The Kubo API at :5001 is full node control. Pin anything, unpin anything, read the config, rewrite the config, shut it down. And as you saw above, I run it with wide-open CORS. That’s fine — but only because the API is never reachable from the internet.

Lockdown is defense-in-depth. The firewall drops inbound internet traffic to the gateway before it reaches anything. And then the proxy refuses any request whose real client IP isn’t on the LAN:

(private_only) {
    @not_private not remote_ip 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 127.0.0.0/8
    handle @not_private {
        abort
    }
}

abort hard-closes the connection — no banner, no error page, nothing to probe. remote_ip is the real L3 source address, so unlike an X-Forwarded-For allowlist it can’t be spoofed with a header. Import that snippet into every gateway site block and the open CORS header is harmless: nothing off-LAN can complete a request in the first place.
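The four CIDR blocks in that matcher are easy to misread, 172.16.0.0/12 in particular, which covers 172.16 through 172.31 and nothing else. As a rough illustration of what the allowlist admits, here is the same membership test spelled out in shell. It is a sketch for reasoning about addresses, not something Caddy runs, and is_private_ipv4 is a hypothetical name.

```shell
# is_private_ipv4 ADDR
# Succeeds if ADDR falls inside one of the four ranges from the matcher:
# 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8.
is_private_ipv4() (
    ip="$1"
    # Reject anything that is not digits and single dots.
    case "$ip" in
        *[!0-9.]*|*..*|.*|*.) exit 1 ;;
    esac
    # Split on dots into the positional parameters.
    old_ifs="$IFS"; IFS=.
    set -- $ip
    IFS="$old_ifs"
    [ "$#" -eq 4 ] || exit 1
    for octet in "$@"; do
        [ "$octet" -le 255 ] || exit 1
    done
    case "$1" in
        10|127) exit 0 ;;
        172) [ "$2" -ge 16 ] && [ "$2" -le 31 ] ;;
        192) [ "$2" -eq 168 ] ;;
        *) exit 1 ;;
    esac
)
```

So is_private_ipv4 172.31.255.255 succeeds while is_private_ipv4 172.32.0.1 fails, which is the boundary the /12 draws. IPv6 is out of scope here, just as it is in the matcher as written.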

(Collapsing nginx into Caddy helped here, by the way. Two proxies meant two independent access-control configs that had to agree; one proxy is one place to get it right, and one fewer process exposed.)

If you take a second thing from this post: never put a Kubo :5001 API anywhere the public internet can reach it.

A name that doesn’t change: IPNS

Content addressing has a catch. Every time the site changes — a new post, a fixed typo — the CID changes too. That’s by design: the address is a hash of the bytes. But it means the “address of my site” is a moving target, and ENS records live on-chain. If my CID changed on every deploy and ENS pointed straight at it, every deploy would be an Ethereum transaction with real gas. That’s absurd for fixing a typo.

IPNS is the fix. An IPNS name is derived from a public key, and you publish a signed record that points that name at a CID. Republish whenever you like to point it somewhere new. The name is permanent; what it resolves to is mutable, which is exactly the indirection a mutable site on an immutable filesystem needs.

The key is generated once, on the node:

ipfs key gen --type=ed25519 mysticryuujin
# k51qzi5uqu5d...   <- the IPNS name (a libp2p public-key CID)

That k51... string is the name I publish under, forever. It lives on the node and is worth backing up: lose the private key and you lose the ability to ever update that name again.

Then every deploy ends by signing a fresh record under that same key:

ipfs name publish --key=mysticryuujin --lifetime=72h --ttl=1m "/ipfs/$CID"
  • --key selects the keypair from above, so every deploy republishes the same name — only the CID it targets changes.
  • --lifetime=72h is how long the signed record stays valid before it expires out of the DHT. As long as I republish well inside that window (every deploy does), the name never goes dark.
  • --ttl=1m is a caching hint — how long resolvers may cache the answer. Short, so a fresh deploy shows up quickly instead of being pinned to a stale CID by someone’s cache.
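After a publish it is worth confirming the name really points where you think it does. A minimal sketch under a couple of assumptions: ipns_points_at is a hypothetical helper, you pass it the k51... name yourself, and --nocache is there so a stale cached answer on the node doesn’t mask a failed publish.

```shell
# ipns_points_at NAME CID
# Hypothetical helper: resolve /ipns/NAME without the local cache and
# succeed only if it lands on /ipfs/CID.
ipns_points_at() (
    name="$1" cid="$2"
    got="$(ipfs name resolve --nocache "/ipns/$name")" || exit 1
    if [ "$got" = "/ipfs/$cid" ]; then
        exit 0
    fi
    echo "expected /ipfs/$cid, got $got" >&2
    exit 1
)
```

Run as ipns_points_at "$IPNS_NAME" "$CID" right after the publish step (IPNS_NAME being whatever variable holds your k51... string); a non-zero exit is a cheap signal to stop the deploy before anyone resolves the old CID.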

And here’s the payoff: ENS only has to be told about this once. The site’s ENS contenthash is set to ipns://k51... — the name, not a CID. That single on-chain transaction is the only gas I ever pay for content. After it, the deploy pipeline is just add + name publish, no chain involved, and the resolution chain reads end to end:

mysticryuujin.eth ──> ENS contenthash ──> ipns://k51… ──> latest CID ──> bytes
   (on-chain, set once)        (republished every deploy)   (on my node)

Closing the loop

This is the node my deploy script talks to. Everything in Hello, IPFS (astro build → ipfs add --cid-version 1 --pin → ipfs name publish) runs against the gateway’s HTTPS API, the very endpoint described above. The last line of every deploy is a sanity check that the gateway already serves the fresh CID:

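curl -fsSL -o /dev/null -w '%{http_code}' "$IPFS_GATEWAY/ipfs/$CID/"

If that prints 200, the new build is live and pinned on my own hardware before it ever touches an external pinning service. (The -L matters when $IPFS_GATEWAY is the subdomain-enabled hostname: Kubo answers the path request with a 301 first, and without -L that 301 is the code curl reports.)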

Everything above is my setup, with the internal names filed off. If you want to actually stand up a stack like this, I packaged the patterns into an open-source, educational reference: spirens (“Sovereign Portal for IPFS Resolution via Ethereum Naming Services”). Clone it, add a domain, fill in a .env, and spirens up brings up a reverse proxy, a Kubo gateway, a local-first Ethereum RPC, and — handling the .eth resolution I described above — dweb-proxy, the same ENS→IPFS bridge that powers eth.limo. It’s a learning on-ramp, not a turnkey production deploy: read every config before you run it.

The permanent web, served from a box in my house. 🌐
