Gemma 4 E4B as a primary local LLM (replaced Qwen)

Florian Brand, Prime Intellect research engineer, adopts 6-bit quantized Gemma 4 E4B as his primary local LLM on Mac · Digg /AI · 7h ago


- The new setup replaces his nine-month daily Qwen deployment.


Original post

Florian Brand @xeophon #1117 in /AI

Gemma 4 E4B 6bit is now the local model of my choice and loaded 24/7 on my Mac (using @lmstudio), replacing Qwen3, 3.5 4B after ~9 months of usage. What an insane model, congrats @GoogleDeepMind 🤠

4:19 AM · Jun 7, 2026 · 28.7K Views

Sentiment

Many users praise Gemma 4 as their preferred local AI model on Mac for its speed and quality, citing figures such as 50 tokens per second on 16 GB hardware and GPT-4o-like results.

Pos 92.9%

Neg 7.1%

15 comments with sentiment.

Cluster Engagement

31.3K Views 26 Comments 11 Reposts 204 Bookmarks


Posts from X

👩‍💻 Paige Bailey @DynamicWebPaige

💎 @googlegemma

Florian Brand @xeophon

4h|Views 3.5K Likes 26 Bookmarks 6

Florian Brand @xeophon

7h|Views 28.7K Likes 384 Bookmarks 208 Retweets 2

Lotto @LottoLabs

@xeophon @yacineMTB @lmstudio @GoogleDeepMind Wouldn’t qwen 9b be nicer?

3h|Views 487 Likes 8 Replies 2

Igor Kotenkov @stalkermustang

@xeophon @lmstudio @GoogleDeepMind what are ur usecases? "rewrite", "summarize", "translate," or something bigger in scope and harder by nature?

6h|Views 274 Likes 2

🧟 @RaghavKoch19380

@xeophon @lmstudio @GoogleDeepMind Wouldn't the 4Bit QAT be better than a 6Bit PTQ

5h|Views 889

Florian Brand @xeophon

@RaghavKoch19380 @lmstudio @GoogleDeepMind The QAT are GGUF only afaik

5h|Views 821

Florian Brand @xeophon

@ignis_code @lmstudio @GoogleDeepMind M4 Max + 64 GB, model uses 7 GB

1h|Views 63 Likes 2

🧟 @RaghavKoch19380

@xeophon @lmstudio @GoogleDeepMind There are compressed tensor versions or something available for vLLM etc i think. check their huggingface QAT folder.

4h|Views 200 Likes 1

Clemens Schartmüller @ClemensScharti

@xeophon @lmstudio @GoogleDeepMind what are you using it for?

4h|Views 624

Florian Brand @xeophon

@0xgeorge @yacineMTB @lmstudio @GoogleDeepMind License

1h|Views 170 Likes 1

Vu @vu_zip

@xeophon @wambosec @lmstudio @GoogleDeepMind 64 gb and you use Gemma4 e4b ??? Bro at least use gemma4 12b

3h|Views 276

IGNIS @ignis_code

@xeophon @lmstudio @GoogleDeepMind How much VRAM do you use?

1h|Views 68 Likes 1

Aaryan Kakad @aaryan_kakad

@xeophon @lmstudio @GoogleDeepMind yes, even i have one model always loaded on my system for assistance while building stuff or solving any problems. i think people who can use small 4-9B models to build stuff can actually be called coders.

6h|Views 57 Likes 1

George I @0xgeorge

@xeophon @yacineMTB @lmstudio @GoogleDeepMind Why not LFM 2.5 at 8bit for just an extra gb?

1h|Views 181

wambo. @wambosec

@xeophon @lmstudio @GoogleDeepMind mac specs?

7h|Views 167

Jeremy Nguyen ✍🏼 🚢 @JeremyNguyenPhD

@xeophon @lmstudio @GoogleDeepMind Are you using it for the privacy considerations, Xeo?

6h|Views 122

Dan Greller @dgreller

@xeophon @lmstudio @GoogleDeepMind What context window are you using?

6h|Views 116

Lazarz @Laz4rz

@xeophon @lmstudio @GoogleDeepMind Why?

5h|Views 88

Florian Brand @xeophon

@wambosec @lmstudio @GoogleDeepMind M4 Max + 64 GB RAM

6h|Views 155 Likes 3

Florian Brand @xeophon

@ClemensScharti @lmstudio @GoogleDeepMind https://florianbrand.com/posts/local-llms

3h|Views 756 Likes 1
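
As a rough cross-check on the memory figures quoted in the thread (6-bit weights, about 7 GB in use), the sketch below inverts the usual weights-only estimate, bytes = parameters × bits / 8. It assumes the reported 7 GB is entirely weights, which overstates the parameter count because runtime buffers and the KV cache also take memory. The function names are illustrative and are not part of LM Studio or any Gemma tooling.

```python
GB = 1e9  # decimal gigabytes, an assumption; the thread does not say GB vs GiB


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Weights-only memory estimate: one value per parameter at the given bit width."""
    return n_params * bits_per_weight / 8


def implied_params(total_bytes: float, bits_per_weight: float) -> float:
    """Upper bound on parameter count if total_bytes were weights alone."""
    return total_bytes * 8 / bits_per_weight


if __name__ == "__main__":
    reported = 7 * GB                     # "model uses 7 GB" from the thread
    upper = implied_params(reported, 6)   # 6-bit quantization from the thread
    print(f"implied parameters (upper bound): {upper / 1e9:.2f}B")
    for bits in (4, 6, 8):
        size = weight_bytes(upper, bits) / GB
        print(f"{bits}-bit weights: about {size:.2f} GB")
```

Under that assumption, a 4-bit build of the same weights would need roughly 4.67 GB and an 8-bit build roughly 9.33 GB. The estimate covers size only and says nothing about the quality difference between QAT and PTQ quantization raised in the replies.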
