Online · Local Inference

Adam Soong

Tinkering with local LLMs on a Mac Mini M4 Pro, two-handing a Nightrider Glaive through the Lands Between, pulling roofs at BoulderWorld, and trackside for #12.

Tech Stack & Local Inference

Running open-weight models on-device

Machine

Mac Mini M4 Pro

24 GB unified memory / 512 GB SSD

Stack

Ollama + Docker

Open WebUI frontend

Primary Model

Qwen 2.5 Coder

7B & 14B quantized

Secondary

DeepSeek Coder

V2 Lite 16B (Q4_K_M)
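A rough footprint estimate shows why these sizes fit in 24 GB of unified memory: weight bytes ≈ parameters × bits per weight ÷ 8. The sketch below assumes every model uses Q4_K_M at about 4.85 bits per weight. That figure is an approximation, and only the DeepSeek model is stated as Q4_K_M above. The sketch also ignores the KV cache and runtime overhead. `weight_gb` and `MODELS` are hypothetical names, and the parameter counts are the nominal sizes listed on this page.

```python
"""Rough weight-memory estimate for quantized models (hypothetical helper).

Assumptions: weights dominate memory, and Q4_K_M averages about 4.85 bits
per weight. KV cache, activations, and runtime overhead are ignored.
"""

Q4_K_M_BITS_PER_WEIGHT = 4.85  # assumption: approximate Q4_K_M average
UNIFIED_MEMORY_GB = 24  # from the machine spec above


def weight_gb(params_billion: float,
              bits_per_weight: float = Q4_K_M_BITS_PER_WEIGHT) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    # billions of params * bits each / 8 bits per byte -> billions of bytes
    return params_billion * bits_per_weight / 8


# Nominal parameter counts as listed on this page, not exact model sizes.
MODELS = {
    "Qwen 2.5 Coder 7B": 7,
    "Qwen 2.5 Coder 14B": 14,
    "DeepSeek Coder V2 Lite 16B": 16,
}

if __name__ == "__main__":
    for name, params in MODELS.items():
        size = weight_gb(params)
        print(f"{name}: ~{size:.1f} GB of weights, "
              f"~{UNIFIED_MEMORY_GB - size:.1f} GB left for everything else")
```

DeepSeek Coder V2 Lite is a mixture-of-experts model. Only some experts run per token, but all of its weights still have to be resident, so it is sized here by its total parameter count.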

local-inference · active
$ ollama run qwen2.5-coder:7b
>>> Pipeline ready - 48 tok/s on M4 Pro GPU (Metal)
$ docker ps --format "table {{.Names}}\t{{.Status}}"
NAMES        STATUS
open-webui   Up 2 weeks
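Ollama's REST API returns generation stats with each response: `eval_count` (tokens generated) and `eval_duration` (nanoseconds). Its documentation computes tokens per second as `eval_count / eval_duration * 10^9`. The sketch below reproduces a tok/s figure like the one above. It assumes the server is on its default port 11434 and the model is already pulled. The helper names are hypothetical and not part of Ollama.

```python
"""Measure decode throughput from Ollama's /api/generate response.

Assumes a local Ollama server on the default port 11434 with the model
already pulled. A sketch, not the page's own benchmarking script.
"""
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Tokens generated (eval_count) per second of eval_duration (ns)."""
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / duration_ns * 1e9


def generate(model: str, prompt: str) -> dict:
    """Send one non-streaming generate request and return the parsed JSON."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    # Passing data= makes urllib issue a POST request.
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, call `tokens_per_second(generate("qwen2.5-coder:7b", "Reverse a string in Python."))`. `eval_duration` covers only token generation, so prompt processing (reported separately as `prompt_eval_duration`) does not count toward the figure.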

Gaming Hub

Elden Ring - Nightrider Glaive + Bloodflame Blade build

Nightrider Glaive +25

Quality build · Keen affinity

RL: 150

AR: 782 (two-handed)

Poise damage: 38 per swing

Bloodflame: 55 bleed buildup

Two-handed jumping R2s with Bloodflame Blade active. The glaive’s reach and bleed pressure win trades. Physick: Stonebarb Cracked Tear + Thorny Cracked Tear.

Trackside & Climbs

BoulderWorld Belfast · Mercedes F1 (#12 Kimi Antonelli)

Bouldering

Regular at BoulderWorld Belfast. Projecting V5-V6 and working on dynamic coordination moves and roof pulls. The social atmosphere and the problem-solving keep every session fresh.

Grade range: V3 - V7 · 3x / week
Formula 1

Mercedes-AMG Petronas supporter. Following Kimi Antonelli (#12) closely this season: his Silverstone debut and Monza top-10 were electric. Silver Arrows through the field.

#12 · Mercedes

Memory Bandwidth Token Calculator

Estimated inference throughput at a given memory bandwidth.

Bandwidth: 60 GB/s (adjustable from 8 to 120 GB/s)

Estimated Tokens / Second

Qwen 2.5 Coder 7B: 54.0 tok/s (peak 108 tok/s)
DeepSeek Coder 14B: 27.0 tok/s (peak 54 tok/s)
DeepSeek Coder 32B: 12.0 tok/s (peak 24 tok/s)
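A first-principles cross-check on these estimates: when decoding is memory-bound, each generated token streams roughly the full weight set from memory once, so tokens per second is capped near bandwidth ÷ weight bytes. The sketch assumes dense models and Q4_K_M at about 4.85 bits per weight. It includes an optional `efficiency` factor (hypothetical) for real-world losses. It ignores KV-cache reads and compute limits, so it gives an upper bound rather than a prediction. For reference, Apple quotes 273 GB/s of memory bandwidth for the M4 Pro.

```python
"""Bandwidth-bound decode ceiling (a rough sketch, not a benchmark).

Assumptions: dense model, every weight read once per generated token,
KV-cache traffic and compute limits ignored.
"""

BITS_PER_WEIGHT = 4.85  # assumption: approximate Q4_K_M average


def decode_ceiling_tok_s(bandwidth_gb_s: float, params_billion: float,
                         bits_per_weight: float = BITS_PER_WEIGHT,
                         efficiency: float = 1.0) -> float:
    """Upper-bound tokens/second = usable bandwidth / GB read per token."""
    if params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("model size must be positive")
    weight_gb = params_billion * bits_per_weight / 8  # GB streamed per token
    return bandwidth_gb_s * efficiency / weight_gb


if __name__ == "__main__":
    # Calculator slider values, plus Apple's quoted M4 Pro bandwidth.
    for bw in (8, 60, 120, 273):
        for name, params in (("Qwen 2.5 Coder 7B", 7), ("Qwen 2.5 Coder 14B", 14)):
            print(f"{bw:>3} GB/s  {name}: <= {decode_ceiling_tok_s(bw, params):.1f} tok/s")
```

A mixture-of-experts model such as DeepSeek Coder V2 Lite reads only its active experts per token, so it can decode faster than its total parameter count suggests. To adapt this sketch for such a model, pass the active parameter count instead of the total.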