For some reason my actor reward starts at zero... Did you do anything to make sure rewards are tracked correctly?
Hi! I recently tried GRPO with the latest version of VERL and ran into the same issue, but I didn’t have time to debug it. I expect it will be fixed soon; in the meantime, I switched to an earlier commit (d12d6b3ee86a4061ca51be0098acab4e773706d8) for my experiment. You could try checking out that same commit. Let me know if that solves your problem!