John Schulman

@johnschulman2

Co-founder of OpenAI, where he created the PPO algorithm; joined Anthropic in 2024. A pioneer in reinforcement learning and AI safety, Schulman weighs RLHF’s alignment potential against its anthropocentric biases, navigating the tension between human feedback and emergent AI goals. His work highlights the fragility of value alignment in systems operating beyond human cognitive scale.