I found this Colab and just wanted to share it. You can run it and try it out for yourself.
Thanks for sharing this, Dustin. This looks phenomenal!
I'm building a new dataset that I will be putting on Hugging Face. Right now I am using Ollama running qwen2.5-coder to grade it, with only 123 hrs of inference left to run:
This was after 275 steps using the old dataset. I'm hoping the new one will give slightly better results:
LOL, I made a couple of new reward functions for training an R1 clone and was trying out multiple different styles of questions instead of just math, to see how it reasons differently. Only the highest-scoring questions will be used: out of the almost 100K Q&A pairs, only 5 to 10 percent will be kept.
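For anyone curious, the "keep only the top 5 to 10 percent" step could look roughly like this. It is a minimal sketch, not the actual pipeline: the `keep_top_fraction` helper and the `question`, `answer`, and `score` field names are hypothetical, and it assumes each graded Q&A pair already carries a numeric score from the grader.

```python
def keep_top_fraction(records, fraction=0.10, score_key="score"):
    """Return the highest-scoring `fraction` of records, best first.

    `records` is a list of dicts; `score_key` names the numeric grade field
    (a hypothetical name, adjust to whatever the grader actually writes).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    # Sort by score, highest first. sorted() is stable, so ties keep input order.
    ranked = sorted(records, key=lambda r: r[score_key], reverse=True)
    # Always keep at least one record so a tiny dataset does not end up empty.
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Toy data standing in for the graded Q&A pairs.
graded = [
    {"question": f"q{i}", "answer": f"a{i}", "score": i % 10}
    for i in range(100)
]
best = keep_top_fraction(graded, fraction=0.10)
print(len(best), "records kept")
```

Sorting and slicing is fine at roughly 100K rows. A fixed score threshold would be an alternative, but a fraction keeps the output size predictable when the grader's scale shifts between runs.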
Here are 3 models I trained and some custom datasets. Later today (2/14/25) at 1300 hrs EST my Phi4 reasoning model will be done training, and I will upload it as well: