If I understand correctly, it's because the frequencies it is listening for are much higher than those in normal music. Also possibly because the algorithm is meant to deal with noise.
Actually it is because it is all fake, so he can set whatever rules he wants.
If you are listening for sounds higher than "normal music", you can't capture them with a standard mic, let alone the mic on your phone, since it is optimized for the vocal range.