PXAI
07/04 17:56
takara.ai
QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization
program repair
over‑editing
LLMs
edit‑aware reward
self‑breaking
self‑repairing
06/04 05:10
dev.to
Connecting Generative Adversarial Networks and Actor-Critic Methods
GAN
Actor‑Critic
Reinforcement Learning
Adversarial Training
Policy Optimization
Generative Models
02/04 20:29
takara.ai
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
reinforcement learning
large language models
policy optimization
self‑distillation
group‑relative
sample routing
31/03 07:00
arxiv.org
Bitboard version of Tetris AI
Tetris
bitboard
reinforcement learning
simulation speed
policy optimization
game engine
30/03 07:00
arxiv.org
Stabilizing Rubric Integration Training via Decoupled Advantage Normalization
Process‑Aware Policy Optimization
Group Relative Policy Optimization
decoupled advantage normalization
outcome reward models
process reward models
reward hacking
26/03 06:00
arxiv.org
Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction
implicit reward
turn‑wise policy optimization
multi‑turn interaction
reinforcement learning
human‑AI collaboration
sparse rewards