PXAI
02/04 20:29
takara.ai
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
reinforcement learning
large language models
policy optimization
self‑distillation
group‑relative
sample routing