Damek Davis on X: "Gradient descent doesn't avoid saddle points of C^1 smooth functions. Random initialization doesn't help. If you start in the crevice, you stay in the crevice. You need C^2
![The journey of Gradient Descent — From Local to Global | by Pradyumna Yadav | Analytics Vidhya | Medium](https://miro.medium.com/v2/resize:fit:1400/1*ZC9qItK9wI0F6BwSVYMQGg.png)
The journey of Gradient Descent — From Local to Global | by Pradyumna Yadav | Analytics Vidhya | Medium
![optimization - Oscillating around the saddle point in gradient descent? - Artificial Intelligence Stack Exchange](https://i.stack.imgur.com/NyXTy.png)
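
The "start in the crevice, stay in the crevice" behaviour near a saddle point can be seen in a minimal sketch. It assumes the textbook saddle f(x, y) = x^2 - y^2, which is not taken from any of the sources above; the function `gradient_descent`, the step size, and the starting points are illustrative choices. This function is smooth (C^2 and better), so the sketch does not reproduce the C^1 counterexample mentioned in the post. It only shows that a start exactly on the line y = 0 converges to the saddle, while a tiny perturbation off that line escapes.

```python
def gradient_descent(x, y, step=0.1, iters=200):
    """Plain gradient descent on f(x, y) = x**2 - y**2 (saddle at the origin).

    The function, step size, and iteration count are illustrative assumptions.
    """
    for _ in range(iters):
        gx, gy = 2.0 * x, -2.0 * y          # gradient of f at (x, y)
        x, y = x - step * gx, y - step * gy  # one descent step
    return x, y


# Start exactly on the line y = 0: the y-gradient is zero there,
# so the iterates slide along the line into the saddle at (0, 0).
on_axis = gradient_descent(1.0, 0.0)

# Start a tiny distance off that line: the y-coordinate grows by a
# factor of 1.2 per step, so the iterates leave the saddle.
off_axis = gradient_descent(1.0, 1e-8)

print("on the line y = 0:", on_axis)
print("slightly off it:  ", off_axis)
```

For this particular function the starting points that end at the saddle form a single line, so a random start lands on it with probability zero. The post's point, as far as the truncated title shows, is that this kind of guarantee can fail when the function is only C^1.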