FSK is a modulation where the instantaneous frequency corresponds to a binary baseband signal. Given:
$x_m(t)$ is the baseband signal, perhaps a binary square wave taking values of -1 and 1,
$f_c$ is the carrier frequency,
$f_\Delta$ is the deviation, roughly speaking the spacing between tones,
then the modulated signal $y(t)$ as a function of time is:
$$y(t) = \cos\left(2\pi f_c t + 2\pi f_\Delta \int_0^t x_m(\tau)\,d\tau\right)$$
Not by accident, this is exactly the same definition as for analog FM. The only difference is that the baseband signal $x_m(t)$ is analog instead of digital. That is, FM and FSK are the same modulation, just with different baseband signals.
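To make the equation concrete, here is a minimal discrete-time sketch of it in plain Python. The function names, the bit mapping (1 to +1, 0 to -1), and the sample-rate parameters are my own assumptions for illustration; the integral is approximated by a running sum.

```python
import math

def fsk_phase(bits, fc, f_delta, samples_per_bit, fs):
    """Return the argument of the cosine in y(t), one value per sample."""
    phase = 0.0
    phases = []
    for bit in bits:
        xm = 1.0 if bit else -1.0  # assumed mapping: 1 -> +1, 0 -> -1
        for _ in range(samples_per_bit):
            phases.append(phase)
            # One rectangular-rule step of 2*pi*fc*t + 2*pi*f_delta*integral(xm).
            # The instantaneous frequency is therefore fc + f_delta * xm.
            phase += 2.0 * math.pi * (fc + f_delta * xm) / fs
    return phases

def fsk_modulate(bits, fc, f_delta, samples_per_bit, fs):
    """Return samples of y(t) for the given bits."""
    return [math.cos(p) for p in fsk_phase(bits, fc, f_delta, samples_per_bit, fs)]

def instantaneous_frequency(phases, fs):
    """Estimate frequency in Hz from the phase step between adjacent samples."""
    return [(b - a) * fs / (2.0 * math.pi) for a, b in zip(phases, phases[1:])]
```

With a baseband of exactly -1 and 1, this puts the two tones at fc - f_delta and fc + f_delta. Feeding analog sample values in place of the ±1 values turns the same loop into an FM modulator, which is the point made above.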
For AFSK, the same equation applies, only with a carrier frequency somewhere in the audio range. The resulting audio signal is then fed into an FM modulator, which applies the modulation a second time. Even without doing the math, you can imagine that applying this modulation function a second time results in a very different transmission.
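A rough way to see the "modulated twice" effect is to run the same modulation function two times and look at the instantaneous frequency of each result. This is only a sketch: the helper names and all the numbers (1700 Hz audio carrier, 500 Hz shift, a 10 kHz stand-in for the RF carrier, 3 kHz FM deviation) are assumptions picked for illustration.

```python
import math

def fm_phase(baseband, fc, f_delta, fs):
    """Cosine argument for carrier fc modulated by arbitrary baseband samples."""
    phase = 0.0
    phases = []
    for x in baseband:
        phases.append(phase)
        phase += 2.0 * math.pi * (fc + f_delta * x) / fs
    return phases

def distinct_frequencies(phases, fs):
    """Sorted set of instantaneous frequencies (Hz, rounded to 0.1 Hz)."""
    return sorted({round((b - a) * fs / (2.0 * math.pi), 1)
                   for a, b in zip(phases, phases[1:])})

fs = 48000.0
xm = [1.0] * 40 + [-1.0] * 40  # two bits, 40 samples each

# First pass: binary baseband -> AFSK audio.
audio_phase = fm_phase(xm, 1700.0, 500.0, fs)
audio = [math.cos(p) for p in audio_phase]

# Second pass: the AFSK audio is now the baseband of an FM transmitter.
rf_phase = fm_phase(audio, 10000.0, 3000.0, fs)

audio_tones = distinct_frequencies(audio_phase, fs)  # only two tones
rf_freqs = distinct_frequencies(rf_phase, fs)        # sweeps over many values
```

After the first pass the signal sits on one of two tones. After the second pass the instantaneous frequency follows the audio waveform and sweeps continuously around the carrier, so the transmitted signal is no longer a two-tone FSK signal.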
On the other hand, a USB (upper sideband) transmitter does nothing to the modulation besides shift it up in frequency. In terms of the equation above, it effectively changes $f_c$. Thus, AFSK over USB is actually just FSK. This is what makes USB transceivers useful for implementing novel modulations like PSK31, JT65, etc.
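The claim that upper sideband only changes the carrier frequency can be checked numerically. The sketch below models an idealized USB transmitter as "take the analytic signal of the audio and multiply by a complex carrier", and it assumes the analytic signal of cos(phase) is exp(j·phase), which is reasonable when the audio carrier is well above the deviation. The function names and numbers are hypothetical, and 10 kHz stands in for a real RF frequency.

```python
import cmath
import math

def fsk_phase(bits, fc, f_delta, samples_per_bit, fs):
    """Cosine argument of the FSK equation, one value per sample."""
    phase = 0.0
    phases = []
    for bit in bits:
        xm = 1.0 if bit else -1.0  # assumed mapping: 1 -> +1, 0 -> -1
        for _ in range(samples_per_bit):
            phases.append(phase)
            phase += 2.0 * math.pi * (fc + f_delta * xm) / fs
    return phases

def usb_shift(audio_phase, f_rf, fs):
    """Idealized USB transmitter: shift the audio up by f_rf, nothing else."""
    out = []
    for n, p in enumerate(audio_phase):
        analytic = cmath.exp(1j * p)  # idealized analytic signal of cos(p)
        carrier = cmath.exp(2j * math.pi * f_rf * n / fs)
        out.append((analytic * carrier).real)
    return out

fs = 48000.0
bits = [1, 0, 1, 1]

# AFSK audio at 1700 Hz, shifted up by 10 kHz through the USB model.
shifted = usb_shift(fsk_phase(bits, 1700.0, 500.0, 40, fs), 10000.0, fs)

# Plain FSK generated directly at fc = 10000 + 1700 Hz.
direct = [math.cos(p) for p in fsk_phase(bits, 11700.0, 500.0, 40, fs)]

max_error = max(abs(a - b) for a, b in zip(shifted, direct))
```

Under these assumptions the two sample streams agree to within floating-point rounding, which is the sense in which AFSK over USB is just FSK with a different carrier.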
It's also possible to generate FSK by feeding the binary baseband signal into the audio input of an FM transmitter. This requires calibrating the audio gain, because "too loud" would mean too much deviation, that is, too wide a tone spacing. As an example, some of the MMDVM setups use this to modulate and demodulate DMR with an analog FM transceiver.
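As a back-of-the-envelope illustration of the gain calibration, assume the FM modulator is linear, with some sensitivity in Hz of deviation per unit of audio amplitude. That sensitivity figure and the function names below are hypothetical; a real radio's value would have to be measured.

```python
def deviation_hz(audio_amplitude, sensitivity_hz_per_unit):
    """Deviation produced by a given baseband amplitude (linear model)."""
    return audio_amplitude * sensitivity_hz_per_unit

def gain_for_deviation(target_deviation_hz, sensitivity_hz_per_unit):
    """Baseband amplitude needed to hit the target deviation."""
    return target_deviation_hz / sensitivity_hz_per_unit

# Hypothetical modulator: 2500 Hz of deviation per unit of audio amplitude.
needed = gain_for_deviation(500.0, 2500.0)  # amplitude for 500 Hz deviation
too_loud = deviation_hz(2.0 * needed, 2500.0)  # doubling the level doubles it
```

In this model, doubling the audio level doubles the deviation and therefore the tone spacing, which is why the level has to be set rather than simply turned up.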