Run Qwen 3.6 in Claude Code for $1.50/Hour — RunPod + VS Code Remote Setup (2026)

The Problem with Running Qwen 3.6 Locally
Qwen 3.6-35B-A3B is the best open-source coding model available right now. On SWE-bench Verified it scores 73.4% — beating Qwen 3.5, beating Gemma 4, and rivaling Claude Sonnet 4.5 on agentic coding tasks. Apache 2.0 license. 262K token context. Natively multimodal.
The catch: to run it locally, you need a GPU with 40GB+ of VRAM. That's a $2000+ investment. The "just run a 4-bit quantized version on your laptop" answer doesn't work either — the quality drops enough that you're better off with free Claude.
So you can buy a GPU, quantize locally, or pay for a subscription — but there's a fourth option almost nobody talks about: rent a dedicated GPU by the hour. That's what this guide covers. We'll use RunPod for the GPU, VS Code Remote-SSH to connect from any laptop, and Claude Code to drive Qwen 3.6. Total cost: roughly $1.50 per hour, and you control exactly when the GPU is on.
Why Qwen 3.6 Is Worth This Setup
- SWE-bench Verified: 73.4% — best open-source coding model available
- Mixture of Experts — 35B total parameters, 3B active per token (fast)
- Apache 2.0 license — fully open weights, commercial use allowed
- 262K token context — works with large codebases
- Natively multimodal — reads images, documents, video
Ordinarily this level of coding capability costs $20/month through Anthropic. Here it's under $2/hour, and only when you turn it on.
Why RunPod + VS Code Remote-SSH
Most "rent a GPU" tutorials have you SSH in through a plain terminal and work from there. That's fine for quick tests but painful for real work.
VS Code's Remote-SSH extension gives you a single window containing everything: integrated terminal, file explorer, and live browser preview — all running against the remote pod. Live Server auto-forwards preview ports over the SSH tunnel, so localhost:xxxx in your Mac browser opens a page served from the GPU server. Zero tunnel configuration.
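If the automatic forwarding ever fails to pick up a port, a manual tunnel does the same job. A sketch with placeholder values (substitute the IP and port RunPod shows for your pod; 8765 matches the demo server used later in this guide):

```shell
# Manual fallback: forward the pod's preview port to your laptop.
# <IP> and <PORT> are placeholders from RunPod's "SSH over exposed TCP" line.
ssh -N -L 8765:localhost:8765 root@<IP> -p <PORT> -i ~/.ssh/id_rsa
```

Leave it running in a spare terminal; http://localhost:8765 in your Mac browser then reaches the pod.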
Step 1: Deploy a RunPod Pod
Head to runpod.io, sign in, and click Deploy. For Qwen 3.6 you need a GPU with at least 40GB of VRAM — the model is ~24GB and you need headroom for context.
Tested configurations:
- A100 SXM 80GB — ~$1.49/hr on-demand, ~$1.22/hr community. Confirmed fast (<30s per prompt).
- L40S 48GB — ~$0.79/hr. Budget option. Verify availability in your region.
- RTX 4090 24GB — do NOT use. Model loads but inference falls back to CPU, 5-6 minutes per prompt.
Before hitting Deploy, click Edit and increase the container disk to 50GB — the model is ~24GB and you need room for dependencies. Then click Set Overrides and Deploy On-Demand.
Step 2: Connect from VS Code via Remote-SSH
Install the Remote - SSH extension in VS Code if you don't already have it.
In RunPod's dashboard, click your running pod and copy the "SSH over exposed TCP" command. It looks like this:
ssh root@<IP> -p <PORT> -i ~/.ssh/id_rsa
In VS Code:
- Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows) to open the Command Palette
- Run Remote-SSH: Add New SSH Host and paste the command from RunPod
- VS Code adds it to ~/.ssh/config
- Run Remote-SSH: Connect to Host and pick your pod
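For reference, the entry VS Code writes to ~/.ssh/config looks something like this (the host alias runpod-qwen and the placeholder values are illustrative — yours will carry your pod's IP and port):

```
Host runpod-qwen
    HostName <IP>
    Port <PORT>
    User root
    IdentityFile ~/.ssh/id_rsa
```

You can also edit this entry by hand if your pod's IP changes after a restart.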
A new VS Code window opens, connected to the pod. Open the integrated terminal (Ctrl+`) — it's a remote terminal running on the GPU server.
Step 3: Install Ollama and Claude Code on the Pod
Paste this block into the integrated terminal:
apt update && apt install -y zstd
curl -fsSL https://ollama.com/install.sh | sh
curl -fsSL https://claude.ai/install.sh | bash
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc && source ~/.bashrc
The apt update && apt install -y zstd step is required — without it the Ollama install fails on a fresh RunPod image.
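One nit with the PATH line above: re-running the setup block appends a duplicate entry to ~/.bashrc each time. A small guard keeps it idempotent — path_has below is a hypothetical helper, not part of either installer:

```shell
# path_has: succeed if the given directory is already on $PATH.
# Hypothetical helper; not part of the official install scripts.
path_has() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Only prepend ~/.local/bin (where the Claude Code installer puts its
# binary) if it is not on PATH yet.
path_has "$HOME/.local/bin" || export PATH="$HOME/.local/bin:$PATH"
```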
Step 4: Start Ollama and Pull Qwen 3.6
ollama serve > ollama.log 2>&1 &
sleep 5
ollama pull qwen3.6:35b
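The fixed sleep 5 can race a slow server start. A small polling loop is more reliable — this sketch assumes Ollama's default API port 11434 and its /api/version endpoint:

```shell
# wait_for_ollama: poll the Ollama API until it answers or attempts run out.
# Defaults assume a local server on Ollama's standard port 11434.
wait_for_ollama() {
  url="${1:-http://localhost:11434/api/version}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      return 0   # server is up
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1       # gave up
}

# usage on the pod:
# ollama serve > ollama.log 2>&1 &
# wait_for_ollama && ollama pull qwen3.6:35b
```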
The model is ~24GB but RunPod's network typically pulls it in well under a minute.
Step 5: Verify GPU Usage (Do Not Skip)
ollama ps
Look at the PROCESSOR column. It must say 100% GPU. If it shows any CPU split, your VRAM is too small — stop the pod and deploy a bigger one. This check saves you from a frustrating recording where every prompt takes 5 minutes.
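The eyeball check can also be scripted. The sketch below greps the ollama ps output for a full-GPU placement; the exact column text ("100% GPU") follows current Ollama releases and may change:

```shell
# fully_on_gpu: succeed only if the `ollama ps` text on stdin reports
# "100% GPU" in the PROCESSOR column (format as of recent Ollama releases).
fully_on_gpu() {
  grep -q '100% GPU'
}

# usage on the pod:
# ollama ps | fully_on_gpu || { echo "CPU offload detected, stop the pod"; exit 1; }
```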
Step 6: Launch Claude Code with Qwen 3.6
ollama launch claude --model qwen3.6:35b
Claude Code starts in the integrated terminal with Qwen 3.6 attached as the active model. You now have the best open-source coding agent running on a rented GPU — controlled from your laptop.
Demo 1: Build a Landing Page with Live Browser Preview
Inside Claude Code, paste this prompt:
Create a new folder called landing/ and inside it build a marketing landing page for a fictional AI tool called CodeBolt — an open-source coding agent. Use semantic HTML (landing/index.html) and a separate stylesheet (landing/style.css). Layout: dark navbar with logo + nav links, hero section with a gradient background and a free badge + headline + subtitle + CTA button, a 3-card features section with icons, a testimonials carousel, and a footer with social icons. Modern fonts. Smooth hover effects. Fully responsive below 900px. After building, start a Python HTTP server on port 8765 in the landing folder so I can preview it.
Qwen 3.6 plans the work, writes the files (you can watch them appear in VS Code's file explorer in real time), and starts the HTTP server.
Open the Preview in Your Mac Browser
Install the Live Server extension inside the remote VS Code window. Then right-click landing/index.html in the file explorer and pick Open with Live Server.
VS Code auto-forwards the port over your SSH connection. Your Mac's Chrome opens the page — it's actually being served from the GPU server thousands of miles away, but it loads like localhost. Full design, fonts, gradients, hover effects — all rendered from code Qwen 3.6 just wrote. Total time: about 90 seconds. Total cost: pennies.
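What the preview step boils down to can be reproduced standalone: Python's built-in HTTP server (the same one the demo prompt asks Qwen to start) serving a folder, fetched with curl. The file content and port 8765 are just the demo's values:

```shell
# Minimal stand-in for the preview step: serve a folder and fetch a page.
dir="$(mktemp -d)"
printf '<h1>CodeBolt</h1>\n' > "$dir/index.html"
python3 -m http.server 8765 --directory "$dir" >/dev/null 2>&1 &
srv=$!
sleep 1
page="$(curl -fsS http://localhost:8765/index.html)"
kill "$srv"
echo "$page"   # prints <h1>CodeBolt</h1>
```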
Demo 2: Recreate Any UI from a Screenshot (The Multimodal Bridge)
Qwen 3.6 is natively multimodal — it can read images. But there's a real catch worth knowing: Claude Code's Ollama integration has a documented bug where vision messages get silently stripped for non-whitelisted models. Drop an image into Claude Code with Qwen 3.6 and the model just hallucinates — it never sees the image.
The workaround is clean: use Ollama directly to describe the image, then hand that description to Claude Code to write the code.
Step 1: Upload a Mockup to the Pod
Create a folder /root/ui-clone/ in VS Code's file explorer. Drag any UI mockup (a Linear, Stripe, or Vercel homepage screenshot works great) from your Mac's Finder into that folder. VS Code's Remote-SSH uploads it automatically.
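If drag-and-drop misbehaves, or you want to script the upload, scp over the same connection works. Placeholders as in Step 2; the local path is only an example:

```shell
# Upload from your Mac's terminal instead of dragging into VS Code.
# <IP>/<PORT> come from RunPod's SSH command.
scp -P <PORT> -i ~/.ssh/id_rsa ~/Desktop/mockup.png root@<IP>:/root/ui-clone/
```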
Step 2: Get a Detailed Description via Ollama Directly
Open a second integrated terminal in VS Code and run:
ollama run qwen3.6:35b "Describe this image in extreme detail — every section, every element, layout structure, colors with hex codes if you can infer, font weights, spacing, hierarchy. Be exhaustive so I can recreate it in code from your description alone." /root/ui-clone/mockup.png
Qwen 3.6 reads the image directly (no Claude Code bridge) and outputs a detailed, pixel-aware description — fonts, colors, spacing, hierarchy.
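If you'd rather script this step than run it interactively, Ollama's REST API accepts base64 images in an images array on /api/generate. The build_vision_payload helper below is a hypothetical convenience, but the field names follow Ollama's documented API:

```shell
# build_vision_payload: JSON body for Ollama's /api/generate with one image.
# Hypothetical helper; it does no JSON escaping, so keep the prompt free of
# quotes and backslashes.
build_vision_payload() {
  model="$1"; prompt="$2"; img_b64="$3"
  printf '{"model":"%s","prompt":"%s","images":["%s"],"stream":false}' \
    "$model" "$prompt" "$img_b64"
}

# usage on the pod (base64 -w0 keeps the encoding on one line):
# build_vision_payload qwen3.6:35b "Describe this UI in extreme detail." \
#   "$(base64 -w0 /root/ui-clone/mockup.png)" \
#   | curl -s http://localhost:11434/api/generate -d @-
```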
Step 3: Hand the Description to Claude Code
Copy the entire description output. Switch back to the first terminal (where Claude Code is running) and paste:
Here's a detailed description of a UI I want you to recreate. Build a faithful HTML + CSS version in this folder — /root/ui-clone/. Use semantic HTML, modern CSS, match the colors and layout as closely as possible. Start a Python HTTP server on port 8766 when done.
[paste the full description from the ollama run output]
Files appear in VS Code's file explorer. Claude Code starts the preview server. Open it with Live Server and put it side by side with the original mockup — every structural element, every color, every layout decision is there. Not pixel perfect, but genuinely close.
This is the workflow that works today. When Anthropic patches the Claude Code vision whitelist to include Ollama-served models, step 2 disappears and you feed the image directly to Claude Code. Until then: Ollama describes, Claude Code builds.
The Real Cost (And Remember to Stop the Pod)
The full recording — model pull, both demos, every prompt — costs well under $2. That's less than a single cup of coffee for two complete projects built with a state-of-the-art coding model.
One thing you must remember: RunPod bills per second. When you're done, click Stop on the pod. Otherwise it keeps billing overnight.
Stopping preserves your data and configuration — restart the same pod tomorrow and pick up where you left off. You're only paying when the GPU is on.
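Per-second billing makes the math easy to sanity-check. A throwaway awk helper (hypothetical; the rates come from the Step 1 table):

```shell
# session_cost: dollars for a session, given an hourly rate and seconds used.
session_cost() {
  awk -v rate="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", rate * secs / 3600 }'
}

session_cost 1.49 3600   # one hour on an A100: prints 1.49
session_cost 0.79 7200   # two hours on an L40S: prints 1.58
```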
RunPod vs Ollama Cloud vs Local — When to Use Each
| Setup | Best For |
|---|---|
| Local | Only if you already own a 40GB+ GPU or 64GB+ Apple Silicon Mac. Fastest response times. Fully private. |
| Ollama Cloud | Models that have a cloud variant (Gemma 4 does, Qwen 3.6 doesn't yet). Zero setup, generous free tier. |
| RunPod | Any Ollama-compatible model. Pay by the hour, control when the GPU is on. Best for power users and big-context sessions. |
Known Issues (and Quick Fixes)
| Issue | Fix |
|---|---|
| zstd error on Ollama install | Run apt update && apt install -y zstd first |
| claude: command not found | source ~/.bashrc or restart the shell |
| Ollama server not responding | Relaunch with ollama serve > ollama.log 2>&1 & |
| Model pull fails with "no space left" | Container disk too small — redeploy with 50GB |
| ollama ps shows CPU split | VRAM too small — stop pod, deploy A100 40GB+ or L40S |
| Image passed via path gets stripped (vision hallucinates) | Use ollama run directly for image description, feed description to Claude Code (Demo 2 workflow) |
Wrap-Up
Qwen 3.6 on RunPod is the cheapest serious way to run a state-of-the-art coding model right now. No $2000 GPU. No subscription. Any laptop works.
- RunPod: runpod.io
- VS Code Remote-SSH: Marketplace
- Live Server extension: Marketplace
- Qwen 3.6 on Ollama: ollama.com/library/qwen3.6
- Previous Gemma 4 Cloud video (no RunPod needed): Watch on YouTube