I’m running the fork on my MacBook Pro with an M1 chip. I’m using the Ollama framework with the qwen2.5-coder model. It’s taking 30 minutes or so to respond back with one sentence. Am I doing something wrong, or is it just my laptop?
That is pretty slow!
I would try running Ollama directly in the terminal and seeing what kind of speeds you get there.
The command for that would be:
ollama run qwen2.5-coder:7b
If it’s still slow there, it’s probably because of your computer, unfortunately!
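If you want hard numbers to compare against what bolt.new is getting, Ollama can also print its own timing stats (if I remember right, the --verbose flag reports prompt eval and response rates in tokens per second after each reply):

ollama run qwen2.5-coder:7b --verbose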
Thanks for responding. I downgraded to llama2 but got the same result. 8GB of RAM might be my issue, but I’ve seen others make it work (not with bolt.new) in the terminal. I can’t seem to figure this one out.
llama3.2 seems to work well in the terminal but not with bolt.new-any-llm
What size of Llama 3.2 are you using? The smaller ones like 3B and 1B are usually not big enough to handle the Bolt.new prompt, unfortunately.
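If you’re not sure which tag you pulled, listing your local models should show it, and you can pull a specific size explicitly (the tag below is just an example of how Ollama names them):

ollama list
ollama pull llama3.2:3b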
This is my experience as well: once your model won’t fit fully into GPU RAM alongside your machine’s usual memory needs, it falls back to the CPU and response times nosedive. For reference, I’m on an M1 Max with 32GB, and the 7B-sized qwen models are the realistic limit of what it can support.
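One way to confirm that fallback (if I recall the output correctly) is to check what Ollama reports for the loaded model; the PROCESSOR column shows whether it’s sitting fully on the GPU, on the CPU, or split between them:

ollama ps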