Problem with auto-regressive models
Gradually corrupt the image data $x_0$ (a sequence of discrete tokens) via a fixed Markov chain $q(x_t|x_{t-1})$ that randomly replaces some tokens of $x_{t-1}$.
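Each token keeps its previous value with probability $\alpha_t$ and, with probability $K\beta_t$, is resampled uniformly over all $K$ categories ($\alpha_t + K\beta_t = 1$). Since the uniform resample can land on the same category, a token ends up unchanged with total probability $\alpha_t+\beta_t$ and moves to each of the other $K-1$ categories with probability $\beta_t$.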
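A minimal sketch of this uniform kernel in plain Python, assuming the constraint $\alpha_t + K\beta_t = 1$ so that $\alpha_t$ is derived from $\beta_t$. The transition matrix $[Q_t]_{mn} = q(x_t = m \mid x_{t-1} = n)$ has $\alpha_t+\beta_t$ on the diagonal and $\beta_t$ everywhere else; the function names `uniform_transition_matrix` and `uniform_step` are hypothetical, chosen for illustration.

```python
import random

def uniform_transition_matrix(K, beta_t):
    """Q[m][n] = q(x_t = m | x_{t-1} = n) for the uniform kernel."""
    alpha_t = 1.0 - K * beta_t  # assumes alpha_t + K * beta_t = 1
    if alpha_t < 0:
        raise ValueError("need K * beta_t <= 1")
    # diagonal: keep (alpha_t) plus resample onto the same value (beta_t)
    # off-diagonal: resample onto one specific other category (beta_t)
    return [[alpha_t + beta_t if m == n else beta_t for n in range(K)]
            for m in range(K)]

def uniform_step(tokens, K, beta_t, rng=random):
    """Sample x_t ~ q(x_t | x_{t-1}) independently for each token."""
    out = []
    for tok in tokens:
        if rng.random() < K * beta_t:
            # resample uniformly over all K categories (may pick tok again)
            out.append(rng.randrange(K))
        else:
            # keep the previous value (probability alpha_t)
            out.append(tok)
    return out

# Usage: every column of Q sums to 1 (up to floating-point rounding).
Q = uniform_transition_matrix(K=4, beta_t=0.05)
x_t = uniform_step([0, 1, 2, 3, 0, 1], K=4, beta_t=0.05, rng=random.Random(0))
```

Note that a resampled token is still an ordinary category in $\{0, \dots, K-1\}$, so nothing in $x_t$ marks which positions were changed.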
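Uniform diffusion is an aggressive process that may pose a challenge for reverse estimation: a token can be replaced by an unrelated category, and because the replacement still looks like a valid token, the reverse network must first work out which tokens were replaced before it can fix them.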
Alternative: corrupt the tokens by stochastically masking some of them (replacing them with a special [MASK] token), so that the corrupted locations are explicitly known to the reverse network.
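A minimal sketch of the masking corruption, assuming pure masking with a per-step masking rate `gamma_t` (a hypothetical name here) and that [MASK] is appended as an extra category with index $K$. Once masked, a token stays masked, so the corrupted positions can be read directly off $x_t$; this sketch covers only the masking described above.

```python
import random

def mask_step(tokens, K, gamma_t, rng=random):
    """One masking step: each unmasked token becomes [MASK] with prob gamma_t."""
    mask_id = K  # assumption: [MASK] is the extra category with index K
    out = []
    for tok in tokens:
        if tok == mask_id:
            out.append(mask_id)  # masked tokens stay masked (absorbing state)
        elif rng.random() < gamma_t:
            out.append(mask_id)  # corrupt this position by masking it
        else:
            out.append(tok)      # keep the original token
    return out

def corrupted_positions(x_t, K):
    """Positions the reverse network can identify as corrupted: exactly the masks."""
    return [i for i, tok in enumerate(x_t) if tok == K]

# Usage: unlike uniform replacement, the corrupted set is explicit.
x0 = [0, 1, 2, 3] * 5
x1 = mask_step(x0, K=4, gamma_t=0.3, rng=random.Random(0))
known = corrupted_positions(x1, K=4)
```

Every position outside `known` still holds its original token, so the reverse network only has to predict values at the masked locations rather than first detect which tokens were changed.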