Unsloth Colab For R1

dustinwloring1988 · February 7, 2025, 2:03pm

I found this Colab and just wanted to share it. You can create and experience it for yourself.

ColeMedin · February 7, 2025, 6:20pm

Thanks for sharing this Dustin, this looks phenomenal!

dustinwloring1988 · February 10, 2025, 8:34pm

I'm building a new dataset that I will be putting on Hugging Face. Right now I am using Ollama running qwen2.5-coder to grade it, with only 123 hrs of inference left to run:

dustinwloring1988 · February 10, 2025, 8:37pm

This was after 275 steps using the old dataset. I'm hoping the new one will give slightly better results:

leex279 · February 10, 2025, 8:59pm

Nice, but what are you training this on? First World War?

dustinwloring1988 · February 11, 2025, 9:47am

LOL, I made a couple of new reward functions for training an R1 clone and was trying out multiple different styles of questions instead of just math, to see how it reasons differently. Only the highest-scoring questions will be used: out of almost 100K Q&A pairs, only 5 to 10 percent will be kept.
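
For anyone wanting to try the same kind of filtering, here is a minimal sketch of keeping only the top-scoring slice of a graded dataset. It is not the actual pipeline used above: the record layout (`question`, `answer`, `score`), the function name `select_top_fraction`, and the sample data are all assumptions made up for illustration.

```python
def select_top_fraction(records, fraction=0.10, score_key="score"):
    """Return the highest-scoring `fraction` of records, best first.

    `records` is a list of dicts; `score_key` names the (hypothetical)
    field holding the grade assigned to each Q&A pair.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    # Keep at least one record so tiny datasets do not filter to nothing.
    keep = max(1, round(len(records) * fraction))
    ranked = sorted(records, key=lambda r: r[score_key], reverse=True)
    return ranked[:keep]


# Made-up graded Q&A pairs: scores cycle through 0..9.
graded = [
    {"question": f"q{i}", "answer": f"a{i}", "score": i % 10}
    for i in range(100)
]

top = select_top_fraction(graded, fraction=0.10)
print(len(top), "records kept; lowest kept score:", min(r["score"] for r in top))
```

Sorting the whole list is fine at roughly 100K rows. A fixed score threshold would be an alternative if you care more about absolute quality than about ending up with a set percentage.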

dustinwloring1988 · February 14, 2025, 10:01am

Here are 3 models I trained and some custom datasets. Later today (2/14/25) at 1300 hrs EST my Phi4 reasoning model will finish training, and I will upload it as well: