If input images were already smaller than the max width/height, they skipped the resize step and therefore never got normalized to a multiple of 64. This caused an exception like the following:
```
Sizes of tensors must match except in dimension 1. Expected size 4 but got size 3 for tensor number 1 in the list.
```
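For context, here's a minimal sketch of the kind of dimension normalization that was missing. The helper name and the use of Pillow are illustrative, not the exact code in this change:

```python
from PIL import Image


def normalize_to_multiple_of_64(img: Image.Image) -> Image.Image:
    """Round each dimension down to the nearest multiple of 64 (never below 64).

    The model's downsampling stages require dimensions divisible by 64;
    skipping this for already-small images triggered the size mismatch above.
    """
    w = max(64, (img.width // 64) * 64)
    h = max(64, (img.height // 64) * 64)
    if (w, h) != (img.width, img.height):
        img = img.resize((w, h), resample=Image.LANCZOS)
    return img
```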
Seems to be caused by incompatible dtypes in group_norm when we use autocast. The fix patches group_norm to cast the weights to the same dtype as the inputs (a sketch of the idea is below).
From what I can tell, all the other repos just switch to full precision instead of addressing this directly. I'd expect that to make things slower and use more memory, though I haven't benchmarked it, so maybe the patching solution here is better?
https://github.com/pytorch/pytorch/pull/81852
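For reference, a minimal sketch of the patching idea, assuming the patch targets `torch.nn.functional.group_norm`; the actual implementation in this commit may differ:

```python
import functools

import torch.nn.functional as F

_orig_group_norm = F.group_norm


@functools.wraps(_orig_group_norm)
def _patched_group_norm(input, num_groups, weight=None, bias=None, eps=1e-5):
    # Under autocast the input can be float16 while the affine
    # parameters are still float32; cast them to match the input.
    if weight is not None and weight.dtype != input.dtype:
        weight = weight.to(input.dtype)
    if bias is not None and bias.dtype != input.dtype:
        bias = bias.to(input.dtype)
    return _orig_group_norm(input, num_groups, weight, bias, eps)


F.group_norm = _patched_group_norm
```

The appeal of this approach is that the rest of the pipeline stays in half precision instead of falling back to float32 globally.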
- improved image logging functionality: you can now stick `log_latent` wherever you want (see the usage sketch after this list)
- improved some variable naming
- moved all the samplers together
- vendored k-diffusion library
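A hypothetical usage sketch for the `log_latent` note above; the import path and the `(tensor, description)` signature are assumptions, not the documented API:

```python
import torch

# Import path is a guess -- log_latent lives wherever the logging helpers do.
from imaginairy.log_utils import log_latent


def denoise_step(latent: torch.Tensor) -> torch.Tensor:
    # Assumed (tensor, description) signature; check the real one.
    log_latent(latent, "before denoise step")
    denoised = latent * 0.5  # placeholder for the actual denoising math
    log_latent(denoised, "after denoise step")
    return denoised
```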