Yeah, backprop gives you efficient gradient calculations for training the weights of a given network architecture. To find the best architecture, though, people use other methods, e.g. random search or Gaussian processes (Bayesian optimization), which don't evaluate derivatives with respect to the architecture parameters — those are typically discrete choices (number of layers, layer widths, etc.), so there's nothing to differentiate through.
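For concreteness, here's a minimal sketch of gradient-free random search over architectures. The search space and the `train_and_evaluate` stub are made up for illustration — in practice that stub would train the net with backprop and return a validation metric:

```python
import random

# Hypothetical architecture search space: discrete choices, so no
# gradients exist with respect to these "architecture parameters".
SEARCH_SPACE = {
    "num_layers": [2, 3, 4, 5],
    "hidden_units": [64, 128, 256, 512],
    "activation": ["relu", "tanh"],
}

def sample_architecture():
    """Draw one random architecture from the search space."""
    return {name: random.choice(options) for name, options in SEARCH_SPACE.items()}

def train_and_evaluate(arch):
    """Stand-in for the real inner loop: train this architecture's
    weights with backprop, return validation accuracy. Stubbed with
    a random score here just so the sketch runs end to end."""
    return random.random()

def random_search(num_trials=20):
    """Outer loop: purely gradient-free search over architectures."""
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture()
        score = train_and_evaluate(arch)  # inner loop uses gradients; this loop doesn't
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

if __name__ == "__main__":
    arch, score = random_search()
    print(f"best architecture: {arch} (score {score:.3f})")
```

A GP/Bayesian-optimization approach would replace the uniform sampling with a model that picks the next architecture to try based on the scores seen so far, but it's equally derivative-free with respect to the architecture.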