The symptom: Ollama loves Hugging Face until the download does not
Local large language models are having a long moment, and Ollama remains one of the friendliest on-ramps: a single binary, a predictable CLI, and a growing library of models you can run on modest hardware. The rough edge shows up when the friendly CLI suddenly prints transport errors, context deadlines, or endless retry loops while pulling weights from the Hugging Face Hub. Sometimes the failure is upstream; often it is not. The user story we hear most often sounds like this: chatty web browsing works, Git operations feel fine, but ollama pull against a specific tag crawls to zero bytes per second and eventually gives up. That pattern is the fingerprint of a split routing problem, not a mystery bug inside Ollama itself.
This article is intentionally narrow. It is not a review of model quality, quantization trade-offs, or GPU drivers. It is a networking field guide for people who already decided to self-host and now need their proxy configuration to respect how modern artifact distribution actually works: short JSON and metadata requests on one set of hostnames, very long HTTPS sessions and redirects on another, and occasional jumps to container registries such as ghcr.io when vendors ship models as OCI layers. If you have never touched YAML rules before, skim the YAML and Fake-IP overview first, because everything here assumes you remember that the first matching Clash rule wins.
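Because so much of what follows depends on "first match wins," here is a minimal Python model of that evaluation order. It is a sketch under stated assumptions, not Clash's implementation:

- Rules are plain `(type, value, policy)` tuples.
- GeoIP answers come from a hypothetical dict instead of a real database.
- Only DOMAIN-SUFFIX, GEOIP, and MATCH are modeled.

The group names HF-API and PROXY are placeholders.

```python
"""Minimal first-match rule evaluation, in the spirit of Clash's rules: list.

Assumptions: rules are (type, value, policy) tuples, GEOIP answers come
from a caller-supplied dict instead of a GeoIP database, and only
DOMAIN-SUFFIX, GEOIP, and MATCH are modeled. Not Clash's real code.
"""
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, str, str]  # (type, value, policy); value is ignored for MATCH


def suffix_matches(host: str, suffix: str) -> bool:
    # DOMAIN-SUFFIX covers the suffix itself and every subdomain of it.
    return host == suffix or host.endswith("." + suffix)


def first_match(host: str, ip: Optional[str], rules: List[Rule],
                geo: Dict[str, str]) -> str:
    """Return the policy of the FIRST matching rule; later rows never run."""
    for rtype, value, policy in rules:
        if rtype == "DOMAIN-SUFFIX" and suffix_matches(host, value):
            return policy
        if rtype == "GEOIP" and ip is not None and geo.get(ip) == value:
            return policy
        if rtype == "MATCH":
            return policy
    return "DIRECT"  # simplification: what happens with no MATCH row is assumed


# Hypothetical profile and GeoIP data (documentation-range addresses).
RULES: List[Rule] = [
    ("DOMAIN-SUFFIX", "huggingface.co", "HF-API"),
    ("GEOIP", "CN", "DIRECT"),
    ("MATCH", "", "PROXY"),
]
GEO = {"203.0.113.7": "CN"}

if __name__ == "__main__":
    print(first_match("huggingface.co", "203.0.113.7", RULES, GEO))         # HF-API
    print(first_match("blobs.example-cdn.net", "203.0.113.7", RULES, GEO))  # DIRECT
```

The second call shows the failure mode this guide keeps returning to. A host that no explicit row names falls through to whatever broad shortcut comes next.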
Why “the internet is fine” is the wrong mental model
Throughput to a nearby speed-test server does not predict success for a multi-gigabyte file hosted behind a chain of DNS names, HTTP redirects, and CDN edges. Ollama and similar clients are closer to docker pull than to loading a news article: they open parallel connections, resume partial content, and remain sensitive to mid-stream path changes. A Clash profile that feels perfect for a browser can still starve a CLI daemon. This happens when one leg of the chain resolves to an IP that hits a GEOIP shortcut, sending that blob connection DIRECT through a congested default route, while the manifest hop used a different outbound entirely.
Another common trap is comparing apples to oranges across processes. Your browser may honor the operating system proxy because the vendor wired it that way. Ollama, language runtimes, and background daemons often do not. You can spend an hour tuning groups in a beautiful dashboard while the process you care about never touches the TUN interface. Part of this guide is therefore about ownership: who captures the socket, which resolver answered the name, and which rule row matched the Server Name Indication on the wire.
Quick orientation. If you already solved registry pulls with Clash, read the Docker registry split guide in parallel. The philosophy matches: separate policy buckets, explicit domains, ordered rules, verify in logs. Ollama simply swaps the client story from containerd to a local inference runtime.
What a Hugging Face model pull really opens
From the outside, a model name looks like a tidy string. Under the hood, the client resolves hostnames under the huggingface.co and hf.co families, negotiates TLS, and may follow redirects. It then spends most of the wall-clock time talking to whatever infrastructure actually serves the large files. Those heavy flows do not always use the same hostname as the pretty URL in your terminal history. Some regions see different edge domains over time. Vendors also experiment with alternate delivery paths, mirrors, and partnership CDNs.
You do not need to memorize every hostname forever; you need a process. When a pull fails, capture the live destinations from your Clash connection table or packet trace, compare them to your rule list, and promote any repeat offender to an explicit row above broad geography heuristics. Static tutorials go stale; dynamic observation stays accurate. The goal is not a mythical perfect ruleset on day one but a configuration you can extend without restructuring the entire profile every time a CDN label rotates.
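The observe-compare-promote loop above can be sketched in a few lines of Python. It assumes you have already copied destination hostnames from the connection table into a list; that export step and the `min_hits` threshold are assumptions, not features of any particular client.

```python
"""Find repeat hostnames that no explicit DOMAIN-SUFFIX row covers yet.

Assumptions: `observed` is a plain list of destination hostnames you
copied from the Clash connection table or a packet trace during failing
pulls; the export step and the min_hits threshold are up to you.
"""
from collections import Counter
from typing import Iterable, List


def covered(host: str, suffixes: Iterable[str]) -> bool:
    # Same semantics as DOMAIN-SUFFIX: the suffix itself or any subdomain.
    return any(host == s or host.endswith("." + s) for s in suffixes)


def promotion_candidates(observed: Iterable[str], suffixes: List[str],
                         min_hits: int = 2) -> List[str]:
    """Hosts seen at least `min_hits` times that no suffix rule covers yet."""
    # Normalize case and any trailing dot before counting.
    counts = Counter(h.lower().rstrip(".") for h in observed)
    return sorted(h for h, n in counts.items()
                  if n >= min_hits and not covered(h, suffixes))


if __name__ == "__main__":
    seen = ["huggingface.co", "HF.co.", "edge1.example-cdn.net",
            "edge1.example-cdn.net", "once.example.org"]
    print(promotion_candidates(seen, ["huggingface.co", "hf.co"]))
    # ['edge1.example-cdn.net']
```

The hostnames in the example are hypothetical placeholders. Anything the function returns is a candidate for an explicit row, to be confirmed by hand before it goes into the profile.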
When the artifact is an OCI image on ghcr.io
Some projects distribute weights as container layers on GitHub Container Registry. The client story then overlaps with general registry pulls: authentication endpoints, manifest fetches, and blob downloads may fan out across more than one DNS suffix. If your Ollama flow or companion tooling touches ghcr.io, reuse the same discipline we describe for Hub traffic rather than treating it as “just GitHub.” For a deeper registry-oriented walkthrough, see the registry split article; for raw rule-set fetch issues that sometimes accompany automation, see the GitHub raw fetch guide.
The GEOIP trap that looks innocent
Community profiles love aggressive lines such as GEOIP,CN,DIRECT or similar shortcuts because they improve average browsing latency. They are also exactly the sort of rule that swallows a CDN edge IP if the resolved address lands in a geography bucket you did not intend. The manifest request might have matched a careful DOMAIN-SUFFIX,huggingface.co row. A subsequent blob connection, however, may target a different hostname that no funnel row covers, so its resolved IP reaches GEOIP first. The download then appears to “hang after it started,” which is maddening to debug if you only watch the first hop.
The fix is not moral panic about GEOIP. The fix is precedence. Treat Hugging Face–related suffixes and any companion download hosts you observe as part of a single logical application funnel, the same way streaming guides pin video CDNs ahead of geography buckets. Put those rows above the blunt shortcuts, document why they exist, and revalidate after major profile merges. If you run mihomo or Clash Meta with remote RULE-SET providers, remember that stale or failed provider updates silently revert you to yesterday’s internet; occasional log checks save weekends.
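Checking precedence by eye gets harder after every profile merge, so a tiny linter can help. The sketch below assumes you feed it the raw strings from your `rules:` list. Treating only GEOIP and MATCH as "broad" is a deliberate simplification.

```python
"""Flag broad rows that can claim Hugging Face traffic before the funnel does.

Assumptions: `rules` are the raw strings from your profile's rules: list
(e.g. "GEOIP,CN,DIRECT"), and `funnel` holds the suffixes you treat as
one application funnel. Only GEOIP and MATCH count as "broad" here.
"""
from typing import List, Tuple

BROAD_TYPES = {"GEOIP", "MATCH"}


def shadowing_rows(rules: List[str], funnel: List[str]) -> List[Tuple[int, str]]:
    """Return (index, rule) for each broad row placed above a funnel row."""
    parsed = [[part.strip() for part in r.split(",")] for r in rules]
    funnel_rows = [i for i, p in enumerate(parsed)
                   if len(p) >= 3 and p[0] == "DOMAIN-SUFFIX" and p[1] in funnel]
    if not funnel_rows:
        return []
    last_funnel = max(funnel_rows)
    # Any broad row above the last funnel row is evaluated first, so it can
    # claim connections that were meant for at least part of the funnel.
    return [(i, rules[i]) for i, p in enumerate(parsed[:last_funnel])
            if p[0] in BROAD_TYPES]


if __name__ == "__main__":
    merged = ["DOMAIN-SUFFIX,huggingface.co,HF-API", "GEOIP,CN,DIRECT",
              "DOMAIN-SUFFIX,hf.co,HF-BLOB", "MATCH,PROXY"]
    print(shadowing_rows(merged, ["huggingface.co", "hf.co"]))
    # [(1, 'GEOIP,CN,DIRECT')]
```

This catches only rows placed above your funnel rows. It cannot catch a blob host your funnel never names. That gap is why the observe-and-promote habit still matters.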
CDNs change. Avoid copy-pasting enormous host lists from forum posts without attribution and testing. Prefer suffix rules for organizational boundaries you trust, augment with observations from your own connection logs, and treat one-off IP rules as temporary scaffolding.
Design policy groups you can reason about at 2 a.m.
Name at least two outbound buckets for this workload, even if you sometimes point both to the same node day to day:
- HF metadata / API: a group for interactive, smaller requests where handshake latency matters.
- HF blobs / CDN: a group for long downloads where throughput and stability beat millisecond ICMP games.
Optional third bucket: OCI registry traffic to ghcr.io or other vendor registries if your stack mixes formats. Keeping the buckets separate lets you send blobs through a datacenter-class relay while leaving API calls on a node that optimizes TLS setup, or vice versa, without duplicating entire profiles.
Populate groups with select for manual control or url-test for automatic selection, but be cautious with aggressive health checks on huge downloads. Flapping exits mid-stream creates the same partial transfer pain you were trying to avoid. For local experimentation, many readers prefer a stable, slightly slower node over a “fastest” label that changes every few minutes.
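The flapping concern is about hysteresis: do not switch exits unless the improvement is large. Some Clash-family cores expose a tolerance-style option on url-test groups for this reason; check your core's documentation for the exact field. The Python sketch below only illustrates the idea and is not that implementation. Node names and the 150 ms margin are hypothetical.

```python
"""Sticky, tolerance-based node selection (hysteresis), as an illustration.

Not Clash's url-test code. Latencies are in milliseconds; node names and
the 150 ms default tolerance are hypothetical. A failed probe can be
represented as float("inf").
"""
from typing import Dict, Optional


def pick_node(current: Optional[str], latencies: Dict[str, float],
              tolerance_ms: float = 150.0) -> str:
    """Keep `current` unless another node beats it by more than tolerance_ms."""
    best = min(latencies, key=lambda name: latencies[name])
    if current is None or current not in latencies:
        return best
    if latencies[best] + tolerance_ms < latencies[current]:
        return best
    return current  # small wins are not worth breaking a long download


if __name__ == "__main__":
    print(pick_node("relay-a", {"relay-a": 220, "relay-b": 180}))  # relay-a
    print(pick_node("relay-a", {"relay-a": 600, "relay-b": 180}))  # relay-b
```

A wider margin trades a little average latency for fewer mid-transfer exit changes. For multi-gigabyte pulls, that is usually the right side of the trade.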
Build a rule funnel, not a junk drawer
Your rules: section should read top-down like a decision tree. Place DOMAIN-SUFFIX entries for huggingface.co and hf.co in the section of the funnel dedicated to this application, then add any additional hostnames you actually see during failing pulls. If you intentionally use a domestic mirror or cache, give that mirror its own explicit DIRECT path rather than hoping GEOIP agrees with your intent. If you need everything Hub-related to exit through a particular international relay, say so with a policy reference, not with a vague global mode.
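One way to keep the funnel readable is to generate its rows from a few short lists and paste the result into `rules:`. The sketch below assumes the following:

- The group names (HF-API, HF-BLOB, OCI, PROXY) are placeholders.
- The mirror and blob hostnames are hypothetical.
- The output uses the TYPE,VALUE,POLICY shape of Clash rule lines.

Exact DOMAIN rows for observed blob hosts are placed before the broader suffix rows. That way a more specific host can use a different group even if it shares a suffix with the API endpoints.

```python
"""Emit the HF funnel as ordered rules: lines (sketch).

Assumptions: group names (HF-API, HF-BLOB, OCI, PROXY), the mirror host,
and the observed blob hosts are placeholders for your own values.
"""
from typing import List, Optional


def build_funnel(api_suffixes: List[str], blob_hosts: List[str],
                 mirror: Optional[str] = None,
                 oci_suffixes: Optional[List[str]] = None) -> List[str]:
    rules: List[str] = []
    if mirror:
        # An intentional mirror gets its own explicit DIRECT row, first.
        rules.append(f"DOMAIN-SUFFIX,{mirror},DIRECT")
    # Exact blob hosts before the suffix rows, so first-match sends them to
    # the blob group even when they share a suffix with the API endpoints.
    rules += [f"DOMAIN,{h},HF-BLOB" for h in blob_hosts]
    rules += [f"DOMAIN-SUFFIX,{s},HF-API" for s in api_suffixes]
    rules += [f"DOMAIN-SUFFIX,{s},OCI" for s in (oci_suffixes or [])]
    # Geography shortcuts and the catch-all only after the whole funnel.
    rules += ["GEOIP,CN,DIRECT", "MATCH,PROXY"]
    return rules


if __name__ == "__main__":
    for line in build_funnel(["huggingface.co", "hf.co"],
                             ["blobs.example-cdn.net"],
                             mirror="hf-mirror.example",
                             oci_suffixes=["ghcr.io"]):
        print(f"  - {line}")
```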
Remember RFC1918 and corporate VPN paths: if a redirect ever points to an internal artifact host, you want that name to hit DIRECT or a corporate outbound before a catch-all tunnel swallows it. The documentation hub links broader references, but the operational mantra here remains simple: match names the client uses, not names you wish it used.
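If a redirect does land on an internal host, a quick check of the resolved address tells you whether it belongs on a DIRECT or corporate path. The sketch assumes you already know the address the client saw. It deliberately checks only the three RFC1918 blocks, ignoring loopback, link-local, and CGNAT space. The IP-CIDR lines it prints use Clash's rule shape; where they go in your funnel is your call.

```python
"""Decide whether a redirect target sits in RFC1918 space (sketch).

Assumption: you already know the address the redirect host resolved to,
e.g. from your corporate resolver. Only the three RFC1918 blocks are
checked; loopback, link-local, and CGNAT ranges are deliberately ignored.
"""
import ipaddress
from typing import List

RFC1918 = [ipaddress.ip_network(n) for n in
           ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and any(ip in net for net in RFC1918)


def ip_cidr_rules(policy: str = "DIRECT") -> List[str]:
    # One IP-CIDR row per private block, pointing at DIRECT or a corporate group.
    return [f"IP-CIDR,{net},{policy}" for net in RFC1918]


if __name__ == "__main__":
    for addr in ("10.20.30.40", "172.32.0.1", "203.0.113.5"):
        print(addr, is_rfc1918(addr))
    print(ip_cidr_rules("CORP"))
```

Matching on the name the client actually uses is still the primary fix. The address check is only a way to confirm why a redirect stalled.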
How Ollama actually reaches the proxy
Ollama runs as a background service on many platforms. Whether it respects HTTP_PROXY and HTTPS_PROXY depends on how you launched it and how your operating system injects environment variables into services. On some installs, setting proxies in your interactive shell affects only that shell. On others, TUN mode transparently captures all TCP once routing is correct, which is attractive until DNS leaks or bypass lists send lookups around the tunnel.
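To see what a running daemon actually inherited, rather than what your shell has, you can inspect its environment directly. The sketch assumes the following:

- It runs on Linux, where `/proc/<pid>/environ` holds the NUL-separated environment a process started with. Reading it usually requires the same user or root.
- You find the PID yourself.
- Nothing here is Ollama-specific.
- The 127.0.0.1:7890 value in the example is a placeholder mixed-port address.

```python
"""See which proxy variables a process actually inherited (Linux sketch).

Assumptions: /proc/<pid>/environ holds the NUL-separated environment the
process started with (later changes are not reflected); finding the PID
is up to you. Nothing here is specific to Ollama.
"""
from typing import Dict

PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY")


def parse_environ(raw: bytes) -> Dict[str, str]:
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def proxy_settings(env: Dict[str, str]) -> Dict[str, str]:
    """Proxy-related variables, checking upper- and lower-case spellings."""
    found: Dict[str, str] = {}
    for name in PROXY_VARS:
        for variant in (name, name.lower()):
            if variant in env:
                found[variant] = env[variant]
    return found


def read_process_env(pid: int) -> Dict[str, str]:
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return parse_environ(fh.read())


if __name__ == "__main__":
    sample = b"PATH=/usr/bin\0HTTPS_PROXY=http://127.0.0.1:7890\0no_proxy=localhost\0"
    print(proxy_settings(parse_environ(sample)))
```

An empty result for the service PID, while your shell shows proxy variables, is a strong hint that the daemon relies on TUN capture or bypasses Clash entirely.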
Work through this concrete sequence when things disagree between browser and CLI:
- Confirm in the Clash connection panel that flows from the Ollama process appear at all.
- If they do not, fix route ownership first: mixed port versus TUN, service user versus login user, competing VPN adapters.
- If they do, verify which policy matched each destination and whether two related hostnames split across different outbounds.
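The third step lends itself to a small check. The sketch assumes you have the following:

- `(hostname, policy)` pairs copied from the connection panel.
- A hand-written map from each funnel suffix to the groups you intended, for example both HF buckets.

Hostnames and group names are placeholders. Any funnel host that left through an unlisted policy, typically DIRECT, is exactly the split this step is looking for.

```python
"""Flag funnel hosts that matched a policy you did not intend (sketch).

Assumptions: `records` are (hostname, policy) pairs read off the Clash
connection panel, and `allowed` maps each funnel suffix to the outbound
groups you deliberately route it through. Names are placeholders.
"""
from typing import Dict, Iterable, List, Optional, Set, Tuple


def funnel_for(host: str, allowed: Dict[str, Set[str]]) -> Optional[str]:
    # Pick the longest matching suffix so nested entries behave predictably.
    matches = [s for s in allowed if host == s or host.endswith("." + s)]
    return max(matches, key=len) if matches else None


def unexpected_routes(records: Iterable[Tuple[str, str]],
                      allowed: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (host, policy) pairs where a funnel host used an unlisted policy."""
    out: List[Tuple[str, str]] = []
    for host, policy in records:
        suffix = funnel_for(host, allowed)
        if suffix is not None and policy not in allowed[suffix]:
            out.append((host, policy))
    return out


if __name__ == "__main__":
    hf = {"HF-API", "HF-BLOB"}
    allowed = {"huggingface.co": hf, "hf.co": hf}
    log = [("huggingface.co", "HF-API"), ("blobs.hf.co", "DIRECT"),
           ("example.org", "DIRECT")]
    print(unexpected_routes(log, allowed))  # [('blobs.hf.co', 'DIRECT')]
```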
On Windows in particular, overlapping virtual adapters cause exactly the sort of “only this app is broken” frustration this series tries to prevent. When TUN misbehaves, our Windows TUN troubleshooting article walks through firewall and routing order without hand-waving.
DNS, Fake-IP, and why names must stay coherent
Clash deployments that enable Fake-IP aim to simplify rule matching by ensuring the client sees controlled addresses while the proxy performs real resolution. That elegance breaks if a secondary resolver bypasses the core or if a critical hostname lands in fake-ip-filter accidentally. For long downloads, incoherence shows up as TLS errors, sudden drops, or policy flips halfway through a file.
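One coherence check follows from how Fake-IP works. If the client's own lookup went through Clash DNS, a funnel hostname should resolve to an address inside the Fake-IP pool. A real public address suggests either of two things:

- The lookup bypassed the core.
- The name is listed in fake-ip-filter.

The sketch assumes the pool is 198.18.0.0/16, the range shown in many example configs; substitute your own `fake-ip-range`. It also assumes you collected the client-side answers yourself, for example from a resolver log. Hostnames are placeholders.

```python
"""Check that funnel hostnames resolved into the Fake-IP pool (sketch).

Assumptions: the pool defaults to 198.18.0.0/16 (replace with your own
fake-ip-range); `answers` maps each hostname to the address the *client*
saw. A real address for a funnel host hints that the lookup bypassed
Clash DNS or that the name sits in fake-ip-filter.
"""
import ipaddress
from typing import Dict, List


def bypassed_fake_ip(answers: Dict[str, str], funnel_suffixes: List[str],
                     pool: str = "198.18.0.0/16") -> List[str]:
    # strict=False accepts config-style values such as "198.18.0.1/16".
    net = ipaddress.ip_network(pool, strict=False)
    flagged = []
    for host, addr in answers.items():
        in_funnel = any(host == s or host.endswith("." + s) for s in funnel_suffixes)
        if in_funnel and ipaddress.ip_address(addr) not in net:
            flagged.append(host)
    return sorted(flagged)


if __name__ == "__main__":
    seen = {"huggingface.co": "198.18.0.12", "blobs.hf.co": "203.0.113.50",
            "example.org": "93.184.216.34"}
    print(bypassed_fake_ip(seen, ["huggingface.co", "hf.co"]))  # ['blobs.hf.co']
```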
When debugging, temporarily reduce the moving parts: one upstream resolver you trust, one clear profile, and one small model that reproduces the failure. Capture the hostname list, then reintroduce complexity with comments that justify each exception. Afterward, revisit the Fake-IP guide to ensure your refinements still match how your core version handles DNS over TCP, DoH, or fallback chains.
Timeouts, retries, and the psychology of partial progress
Large model files reward patience but punish ambiguous failure modes. Some clients resume well; others restart from scratch after specific errors. From a networking perspective, repeated timeouts often mean intermittent packet loss on a chosen path, an exit that rate-limits parallel connections, or middleboxes that dislike long idle periods on tunneled TCP. Splitting metadata and blobs across outbounds does not magically fix packet loss, but it does remove an entire class of self-inflicted failures where one leg is pinned DIRECT through a bad route while you believed everything was proxied.
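The difference between resuming and restarting is easy to see in a toy download loop. This sketch does not describe Ollama's retry logic, and it assumes the following:

- `fetch(offset)` is a hypothetical function that requests the file from byte `offset` onward. In HTTP terms that is a `Range: bytes=<offset>-` header.
- `fetch(offset)` yields chunks until it finishes or raises.
- Retries use a plain exponential backoff.

```python
"""Resume a large download after transient failures (illustrative sketch).

Assumptions: `fetch(offset)` is a caller-supplied, hypothetical function
that requests the file from byte `offset` onward and yields chunks until
it finishes or raises. Ollama's own retry behavior is not modeled here.
"""
import time
from typing import Callable, Dict, Iterator


def range_header(offset: int) -> Dict[str, str]:
    # What a resuming HTTP client would send; an empty dict means "from the start".
    return {"Range": f"bytes={offset}-"} if offset else {}


def download_with_resume(fetch: Callable[[int], Iterator[bytes]], total: int,
                         max_attempts: int = 5, base_delay: float = 0.5) -> bytes:
    buf = bytearray()
    for attempt in range(max_attempts):
        try:
            for chunk in fetch(len(buf)):  # resume from the bytes already held
                buf.extend(chunk)
            if len(buf) >= total:
                return bytes(buf)
        except (ConnectionError, TimeoutError):
            pass  # keep partial progress instead of discarding it
        time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"gave up at {len(buf)}/{total} bytes after {max_attempts} attempts")
```

A client that restarts from zero after each timeout turns an intermittently lossy path into an endless loop. A resuming client converges, which is why partial progress that keeps resetting is itself a useful symptom.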
If you suspect the exit rather than your rules, temporarily force both HF groups through a different node and reproduce. If behavior changes instantly, you were staring at congestion or peering drama, not DNS. If behavior is identical, return to precedence and daemon capture because the odds are high that part of the chain still bypasses Clash.
Verification beats bandwidth bragging rights
After you edit YAML, resist the temptation to declare victory based on ping alone. Instead, run a real ollama pull for a modest test model while watching live connections. You are looking for three signals:
- Every distinct hostname involved maps to an outbound you can explain.
- No surprise DIRECT hops appear for hosts that belong in your HF funnel.
- Logs stay stable across the entire transfer, not only the first megabyte.
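The "stable across the entire transfer" signal can be checked mechanically if you jot down timestamped observations while the pull runs. The sketch assumes `(seconds_since_start, host, policy)` tuples; that format is hypothetical and not a Clash log export. It reports every point where a host's outbound changed mid-transfer.

```python
"""Report hosts whose outbound changed during a transfer (sketch).

Assumption: `events` are (seconds_since_start, host, policy) tuples you
noted from the live connection view while `ollama pull` ran; the format
is hypothetical, not a Clash log export.
"""
from typing import Dict, Iterable, List, Tuple


def policy_flips(events: Iterable[Tuple[float, str, str]]
                 ) -> Dict[str, List[Tuple[float, str]]]:
    """Return, per host, the (time, new_policy) points where the policy changed."""
    last: Dict[str, str] = {}
    flips: Dict[str, List[Tuple[float, str]]] = {}
    for t, host, policy in sorted(events):
        if host in last and last[host] != policy:
            flips.setdefault(host, []).append((t, policy))
        last[host] = policy
    return flips


if __name__ == "__main__":
    seen = [(0.0, "huggingface.co", "HF-API"), (1.0, "blobs.hf.co", "HF-BLOB"),
            (90.0, "blobs.hf.co", "DIRECT"), (120.0, "blobs.hf.co", "DIRECT")]
    print(policy_flips(seen))  # {'blobs.hf.co': [(90.0, 'DIRECT')]}
```

An empty result over a full pull is a good sign. A flip late in the transfer points back at url-test flapping, a provider update, or a hostname change you have not promoted yet.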
Keep a scratch note of the hostnames you observe. Over a few weeks that list becomes your personal, battle-tested appendix—far more valuable than a static table in a blog post that slowly drifts out of date.
Symptom map: what you see versus what to inspect
| What you see | Inspect first |
|---|---|
| Metadata succeeds; huge file never finishes | Whether blob hostnames share the same outbound; stray GEOIP rows interleaved in the funnel |
| Works in browser for hf.co links; CLI pull fails | Whether the daemon inherits proxy env vars or is captured by TUN; service user differences |
| Flaky only on office Wi-Fi | Captive portals, HTTP proxy auto-config, DNS hijacks that bypass Clash |
| Breaks right after merging a community profile | Reordered rules, renamed groups, stale RULE-SET fetch errors |
Frequently asked questions
Should I disable GEOIP entirely for Ollama? Rarely. Most teams only need to move specific application funnels above GEOIP shortcuts. Deleting geography heuristics often makes unrelated traffic worse.
Is a mirror always faster than an offshore exit? Not automatically. Mirrors can be brilliant when well-provisioned and close to you. They can also be incomplete or overloaded. Clash shines when you can switch policies without rewriting your entire stack.
Does this advice apply to other local LLM runtimes? The transport story generalizes: any tool that downloads large weights over HTTPS benefits from explicit domain routing, ordered rules, and daemon-level proxy verification. The CLI details differ; the precedence lessons do not.
A brief note on responsibility
Proxy software is neutral; provider terms, export controls, regional regulations, and corporate acceptable-use policies are not. This article explains routing mechanics for engineers who legitimately manage cross-border development environments. It does not encourage bypassing authentication, evading billing, or misrepresenting residency where contracts forbid it.
Closing: make model downloads as intentional as model choice
Choosing to run a local LLM is already a statement about control: you accept hardware limits in exchange for ownership of weights and prompts. Your network layer should honor that same intention. Splitting Hugging Face hub traffic from CDN-backed model download legs, aligning DNS with Clash rules, and verifying daemon capture closes the gap between “the model exists upstream” and “the model actually landed on disk.”
Compared with opaque tools that hide routing decisions behind a single glowing toggle, Clash-family cores reward you with connection logs that still make sense when the next CDN migration lands. Many all-in-one utilities either hard-code assumptions about regions or bury split logic so deep that a single timeout turns into mystical advice on forum threads. Clash stays explicit: names, order, outbound, proof. When you want a maintained client with a readable connection table layered on those ideas, start from our download page so binaries and documentation stay aligned. If you are tired of guessing which leg of an Ollama pull escaped your rules, download Clash for free and pair it with the verification habits above—you will spend less time staring at stalled progress bars and more time actually running models.