Gradient-based Planning for World Models at Longer Horizons
GRASP treats the world model's dynamics as a soft constraint and optimizes actions and latent states via collocation, which lets computation run in parallel across time steps. The approach targets two difficulties that arise at long horizons: ill-conditioned gradients and the need for non-greedy planning. The method also considers the exploration benefit of manipulating intermediate states directly, while noting that state-input gradients are brittle in deep-learning-based models.
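
To make the collocation idea concrete, here is a minimal sketch under strong simplifying assumptions; it is not GRASP's actual algorithm. The function names (`dynamics`, `plan`, `rollout`), the 1-D integrator standing in for a learned world model, the quadratic penalty, and the plain gradient-descent update are all hypothetical choices made for illustration. Both the actions and the intermediate states are free variables, and the dynamics enter only through a penalty on the per-step residuals.

```python
def dynamics(state, action):
    """Hypothetical stand-in for a learned world model: a 1-D integrator."""
    return state + action


def residuals(s0, states, actions):
    """Per-step dynamics violations r_t = s_{t+1} - f(s_t, a_t)."""
    traj = [s0] + states
    # Each residual touches only (s_t, a_t, s_{t+1}), so in principle all of
    # them can be evaluated in parallel across time, with no sequential rollout.
    return [traj[t + 1] - dynamics(traj[t], actions[t])
            for t in range(len(actions))]


def plan(s0, goal, horizon=10, weight=1.0, lr=0.05, steps=2000):
    """Collocation with a soft dynamics penalty, solved by gradient descent."""
    actions = [0.0] * horizon   # a_0 .. a_{T-1}
    states = [0.0] * horizon    # states[t] holds s_{t+1}; s_0 stays fixed

    for _ in range(steps):
        res = residuals(s0, states, actions)

        # Hand-derived gradients, valid only for this toy model where
        # df/ds = df/da = 1. A real world model would need autodiff.
        g_actions = [-2.0 * weight * r for r in res]
        g_states = [2.0 * weight * r for r in res]       # via s_{t+1} in r_t
        for t in range(1, horizon):
            g_states[t - 1] -= 2.0 * weight * res[t]     # via s_t in r_t
        g_states[-1] += 2.0 * (states[-1] - goal)        # terminal goal cost

        actions = [a - lr * g for a, g in zip(actions, g_actions)]
        states = [s - lr * g for s, g in zip(states, g_states)]

    res = residuals(s0, states, actions)
    loss = weight * sum(r * r for r in res) + (states[-1] - goal) ** 2
    return actions, states, loss


def rollout(s0, actions):
    """Run the planned actions through the model sequentially as a check."""
    s = s0
    for a in actions:
        s = dynamics(s, a)
    return s


if __name__ == "__main__":
    actions, states, loss = plan(s0=0.0, goal=5.0)
    print("final loss:", loss)
    print("rollout end state:", rollout(0.0, actions))
```

Because the constraint is soft, the optimized states only approximately satisfy the dynamics during optimization, so checking the plan with a sequential rollout, as `rollout` does here, is a reasonable sanity test. In this toy problem the gradients are exact; the brittleness of state-input gradients mentioned above concerns learned deep models and is not reproduced by this example.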