Q-learning is a well-known reinforcement learning algorithm. For Markov decision processes with finite state and action spaces, it has been proven that Q-learning converges to the optimal action values, and therefore eventually finds the best possible action in every state, provided that every state-action pair keeps being tried and the learning rate decays appropriately. However, I will demonstrate that in practice Q-learning can reach very poor solutions, even on some very simple problems that fall into this category.

In my talk, I will attempt to give insights into, and answers to, the following questions:

- Why does a Q-learning agent believe that betting $1 in a game of roulette will yield $30 on average, even after each bet has been tried thousands of times?
- In what class of settings does this type of problem appear, and why?
- How does this behavior of Q-learning relate and generalize to other issues, such as:
  * In human decision-making, decision makers can seem overly optimistic about the choices they make.
  * In economics and finance, the more you listen to your best analyst, the worse his or her recommendations can seem to be.
  * In a financial system, overestimation of values and underestimation of risks can be a result of the system and not (only) a human property. That is, even in an artificial economy with only rational agents, crises can and will occur if this overestimation is not taken into account.
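
To make the roulette question concrete, here is a minimal sketch of tabular Q-learning on a one-state roulette game. Everything in it is an assumption made for illustration and not necessarily the setup used in the talk: an American wheel with 38 pockets, only straight-up $1 bets paying 35 to 1, one extra "walk away" action that ends the game with reward 0, uniformly random exploration, a constant learning rate of 0.1, and a discount factor of 0.95. The names `run_q_learning`, `N_POCKETS`, `PAYOUT`, and `TRUE_BET_VALUE` are hypothetical.

```python
import random

N_POCKETS = 38   # assumed American wheel: 1-36 plus 0 and 00
PAYOUT = 35.0    # assumed: a winning $1 straight-up bet pays $35
# True expected return of any single $1 bet: (35 * 1 - 1 * 37) / 38 = -2/38.
TRUE_BET_VALUE = (PAYOUT - (N_POCKETS - 1)) / N_POCKETS


def run_q_learning(steps=50_000, alpha=0.1, gamma=0.95, seed=0):
    """Tabular Q-learning in a single-state roulette game.

    Actions 0..37 bet $1 on one pocket and return to the table.
    Action 38 walks away: reward 0 and the episode ends.
    Returns the list of learned Q-values, one per action.
    """
    rng = random.Random(seed)
    walk_away = N_POCKETS
    q = [0.0] * (N_POCKETS + 1)

    for _ in range(steps):
        action = rng.randrange(N_POCKETS + 1)  # uniform random exploration
        if action == walk_away:
            target = 0.0                       # terminal: no bootstrap term
        else:
            won = rng.randrange(N_POCKETS) == action
            reward = PAYOUT if won else -1.0
            # Standard Q-learning target: the max over noisy estimates
            # is where the optimistic bias enters.
            target = reward + gamma * max(q)
        q[action] += alpha * (target - q[action])
    return q


if __name__ == "__main__":
    q = run_q_learning()
    print("true value of a $1 bet:   ", round(TRUE_BET_VALUE, 4))
    print("largest learned bet value:", round(max(q[:N_POCKETS]), 2))
```

Under these assumptions every bet is truly worth about -$0.05 and walking away is optimal, yet the learned bet values tend to come out positive, because the update bootstraps from the maximum of many noisy estimates. The constant learning rate is a deliberate simplification: it keeps the sketch short but does not satisfy the decaying-rate condition of the convergence proof.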