Hey all, I found this video on deploying Qwen 32B to the cloud… I’d like to use this same setup for ottodev?
For $0.39 an hour and no API limits, it seems like a no-brainer if the outputs are comparable…
Does anyone know of a way I can set up my RunPod server to use ottodev?
There are limits in terms of speed, though.
Not sure what the speed on your GPU Pod would be.
I found a number for Qwen2.5-7B:
It’s approximately 40 tokens per second on a single GPU.
That means that for your $0.39 per hour you get about:
40 × 3600 = 144,000 tokens.
The number of input tokens also slows down generation, but let’s leave that aside for now.
So that’s 144,000 tokens for 39 cents, or roughly 3,692 tokens per cent.
Here are the prices for that model on OpenRouter:
$0.27/M output tokens
27 cents per million tokens
That is about 37,037 tokens per cent.
So on OpenRouter, for that model, you get:
37,037 / 3,692 ≈ 10× more value per cent.
That is a very rough calculation.
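If it helps, here’s that back-of-the-envelope math as a quick Python sketch. The 40 tok/s throughput, $0.39/hour pod price, and $0.27/M OpenRouter price are just the assumed numbers from above, so swap in your own:

```python
# Rough value comparison: self-hosted GPU pod vs. OpenRouter API.
# Assumed inputs (from the discussion above): ~40 tokens/s on a single GPU,
# $0.39/hour pod cost, and $0.27 per million output tokens on OpenRouter.

GPU_TOKENS_PER_SEC = 40          # rough throughput figure for Qwen2.5-7B
GPU_COST_PER_HOUR_USD = 0.39     # GPU pod hourly price
API_COST_PER_MTOK_USD = 0.27     # OpenRouter output-token price

# Tokens you can generate in one hour on the pod.
gpu_tokens_per_hour = GPU_TOKENS_PER_SEC * 3600                             # 144,000

# Tokens per cent for each option.
gpu_tokens_per_cent = gpu_tokens_per_hour / (GPU_COST_PER_HOUR_USD * 100)   # ~3,692
api_tokens_per_cent = 1_000_000 / (API_COST_PER_MTOK_USD * 100)             # ~37,037

print(f"GPU pod:    {gpu_tokens_per_cent:,.0f} tokens per cent")
print(f"OpenRouter: {api_tokens_per_cent:,.0f} tokens per cent")
print(f"API advantage: ~{api_tokens_per_cent / gpu_tokens_per_cent:.1f}x")
```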
But I am a proponent of using cloud APIs as the best option for value.
You get better speed, better cost, better models.
Sadly, we do not have good evals for “price per job done”.
In the end it’s not about tokens but about the cost of getting certain types of jobs done successfully and consistently.