AM radio works by modulating the amplitude of a carrier wave (constant, high frequency) with the voice signal (low, varying frequency). You want to recover the voice, but if you simply connected the signal from the antenna to the speaker, it would be vibrating far too fast for you to hear the voice.
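To make this concrete, here is a minimal sketch of an AM signal in Python. The function name `am_sample` and all the numbers (100 kHz carrier, 1 kHz tone, 50% modulation depth, 2 MHz sample rate) are illustrative assumptions, not values from any particular radio. It shows that the raw signal averages out to about zero over one audio cycle, which is why a speaker fed directly from the antenna gives you nothing useful.

```python
import math

def am_sample(t, carrier_hz=100_000.0, audio_hz=1_000.0, depth=0.5):
    """One sample of an AM signal: a carrier whose amplitude follows the audio."""
    envelope = 1.0 + depth * math.cos(2 * math.pi * audio_hz * t)
    return envelope * math.cos(2 * math.pi * carrier_hz * t)

sample_rate = 2_000_000.0        # samples per second (illustrative)
n = int(sample_rate * 0.001)     # 1 ms, i.e. one full cycle of the 1 kHz tone
samples = [am_sample(i / sample_rate) for i in range(n)]

mean_raw = sum(samples) / n      # close to 0: positive and negative swings cancel
peak = max(samples)              # 1 + depth at the loudest point of the tone
```

The voice lives in how the peaks rise and fall (the envelope), not in the average of the signal, so something has to break the symmetry between the positive and negative swings.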
So you need to filter out the carrier part of the signal. The diode only allows current to flow in one direction, and the rest of the circuit lets the charge drain away slowly (it has some capacitance).
When the carrier wave is unmodulated (silence on the radio), the current flowing through the diode balances the discharge current, so the voltage across the speaker is more or less constant (probably not at zero, but that doesn't matter).
When the carrier wave is modulated down to half its amplitude, the speaker gets correspondingly less voltage.
So the speaker vibrates at the frequency of the voice wave, not the carrier wave, and you can hear the voice.
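The behaviour described above can be sketched as a small simulation. This is a simplified model under stated assumptions: the diode is ideal (no forward voltage drop), the "rest of the circuit" is reduced to a single RC time constant, and the names `envelope_detector` and `carrier` as well as all component values are made up for illustration.

```python
import math

def envelope_detector(samples, sample_rate, rc_seconds):
    """Ideal diode charging a capacitor that slowly discharges through a load."""
    decay = math.exp(-1.0 / (sample_rate * rc_seconds))  # per-sample discharge factor
    v = 0.0
    out = []
    for s in samples:
        v *= decay        # the capacitor leaks a little on every step
        if s > v:         # the diode conducts only when the input exceeds the capacitor voltage
            v = s
        out.append(v)
    return out

def carrier(amplitude, carrier_hz, sample_rate, n):
    """Unmodulated carrier of the given amplitude."""
    return [amplitude * math.cos(2 * math.pi * carrier_hz * i / sample_rate)
            for i in range(n)]

sample_rate = 2_000_000.0
carrier_hz = 100_000.0
rc = 100e-6  # 100 microseconds: long next to the 10 microsecond carrier period

full = envelope_detector(carrier(1.0, carrier_hz, sample_rate, 4000), sample_rate, rc)
half = envelope_detector(carrier(0.5, carrier_hz, sample_rate, 4000), sample_rate, rc)

# Average output once the start-up transient has passed
avg_full = sum(full[2000:]) / 2000
avg_half = sum(half[2000:]) / 2000
```

With these assumed values the output for the unmodulated carrier sits a little below the carrier peak with only a small ripple, and halving the carrier amplitude halves the output. If the time constant is too short the output follows the carrier itself, and if it is too long it cannot follow the voice, so it has to sit between the two periods.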
Imagine you want to measure the daily sea level, and it's very windy (big waves). You can't just measure it once a day, because depending on the moment you'll catch either the crest or the trough of a wave, and what you want is the average level. So you build a container that the waves fill through a one-way pipe, and you drill a small hole in the bottom so that it leaks at a slow, steady rate. If you tune the size of the hole and the size of the one-way pipe correctly, you can measure the water level in the container, and it will be proportional to the average level of the sea.
Roughly, when you're receiving an AM signal, you use the rectifying effect of a diode coupled to a low-pass filter. The idea is that the envelope is all you really need to pass to the speaker, if the signal itself is powerful enough.
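As a rough illustration of the "diode plus low-pass filter" view, the sketch below half-wave rectifies a simulated AM signal and smooths it with a one-pole low-pass filter. The function name `rectify_and_lowpass`, the 5 kHz cutoff, and the signal parameters are assumptions chosen for the example, and the diode is again treated as ideal.

```python
import math

def rectify_and_lowpass(samples, sample_rate, cutoff_hz):
    """Half-wave rectify, then smooth with a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    y = 0.0
    out = []
    for s in samples:
        rectified = s if s > 0.0 else 0.0  # the diode blocks the negative half-cycles
        y += alpha * (rectified - y)       # the filter averages away the carrier
        out.append(y)
    return out

sample_rate = 2_000_000.0
carrier_hz, audio_hz, depth = 100_000.0, 1_000.0, 0.5
n = 8000  # 4 ms of signal
am = [(1.0 + depth * math.cos(2 * math.pi * audio_hz * i / sample_rate))
      * math.cos(2 * math.pi * carrier_hz * i / sample_rate) for i in range(n)]

audio = rectify_and_lowpass(am, sample_rate, cutoff_hz=5_000.0)
steady = audio[4000:]              # skip the filter's start-up
swing = max(steady) - min(steady)  # dominated by the 1 kHz tone, with a little carrier ripple
```

The filtered output rides on a constant offset and rises and falls with the 1 kHz tone, which is the envelope the speaker needs. The cutoff just has to sit above the audio frequencies and well below the carrier.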