Yes, backpropagation isn't the chain rule itself, just an efficient way to apply it. (In this respect it has some connections to dynamic programming, where you find the most efficient order of recursive computations, reusing intermediate results, to arrive at the solution.)
I think of it as: evaluating the chain rule in an order such that we never need to form a Jacobian explicitly, only vector-Jacobian products, because the gradient of the scalar loss gets pulled back through each layer as a single vector.
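For example, for a stack of linear layers you could in principle multiply the layers' full Jacobians together and then apply the loss gradient, but pulling the loss gradient back first keeps everything as vector-matrix products. A rough NumPy sketch (the layer widths and the sum-loss are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    sizes = [64, 128, 256, 32]          # made-up layer widths
    Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]

    x = rng.normal(size=sizes[0])
    y = x
    for W in Ws:                        # forward pass: y = W3 @ W2 @ W1 @ x
        y = W @ y
    # scalar loss = y.sum(), so dloss/dy is a vector of ones

    # Naive chain rule: build the full Jacobian dy/dx with matrix-matrix products
    J = Ws[-1]
    for W in reversed(Ws[:-1]):
        J = J @ W                       # e.g. (32x256) @ (256x128), then @ (128x64)
    grad_naive = np.ones(sizes[-1]) @ J

    # Backprop order: pull the loss gradient back as a vector (vector-Jacobian products)
    g = np.ones(sizes[-1])
    for W in reversed(Ws):
        g = g @ W                       # always a vector-matrix product
    assert np.allclose(g, grad_naive)   # same answer, never formed a full Jacobian per step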
I also didn't totally grasp its significance until I implemented a neural network from scratch with matrix/array operations in NumPy. I hope all deep learning courses include this exercise.
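Something like the following is the kind of exercise I mean: a tiny two-layer network where the forward and backward passes are written out by hand. This is only a rough sketch with made-up sizes, a tanh hidden layer, and a squared-error loss:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(16, 4))                 # toy batch: 16 examples, 4 features
    y = rng.normal(size=(16, 1))                 # toy regression targets
    W1 = rng.normal(size=(4, 8)) * 0.1
    W2 = rng.normal(size=(8, 1)) * 0.1

    # Forward pass
    z1 = X @ W1
    h1 = np.tanh(z1)
    y_hat = h1 @ W2
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: apply the chain rule layer by layer, back to front
    d_yhat = 2 * (y_hat - y) / len(X)            # dloss/dy_hat
    dW2 = h1.T @ d_yhat                          # dloss/dW2
    d_h1 = d_yhat @ W2.T                         # pull the gradient back through W2
    d_z1 = d_h1 * (1 - np.tanh(z1) ** 2)         # through the tanh nonlinearity
    dW1 = X.T @ d_z1                             # dloss/dW1

Writing out dW1 and dW2 by hand like this made it click for me why the intermediate activations have to be kept around from the forward pass.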
Yes, they are not the same. The chain rule is what solves the one non-trivial problem in backpropagation: figuring out how weights deep in the network affect the error. Besides that, it's just the fairly obvious idea of adjusting each weight in proportion to how much it contributes to the error.
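That is, once backprop has produced dE/dW for a weight matrix W, the update itself is just a step proportional to that gradient (a sketch with made-up shapes and learning rate):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 4))      # some weight matrix
    dE_dW = rng.normal(size=(8, 4))  # its gradient dE/dW from backprop
    lr = 0.01                        # made-up learning rate

    W -= lr * dE_dW                  # each weight moves in proportion to its effect on the error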