I didn’t read the self-improvement books that I intended to on Christmas Eve and Christmas Day… instead, I installed Stable Diffusion on my powerful new computer and trained it using two methods: DreamBooth and Textual Inversion.
I used this Google Colab notebook to run the code for the Textual Inversion method. For the DreamBooth method, I used this Google Colab notebook. DreamBooth is somewhat easier!
Below are renders from each method. AI is good at generating faces of pretty girls, but not so good at specific species of birds… and I wanted a cockatoo! With DreamBooth I was able to train on a larger image set (265 images), though these were more random shots from my phone. With the Textual Inversion method, I used just 5 images that I chose more selectively.
Lots of firsts for me, besides training Stable Diffusion to create a more realistic Moluccan cockatoo (I will post the “Moluccan cockatoo” renders with the same prompts in the first comment):
- I figured out how to run both training scripts on Google Colab (and upgraded to a paid account). The tutorials I found for this weren’t very good; AI technology is changing so fast, and the tech people who take the time to do this are not necessarily skilled tutorial writers.
- I made an account on GitHub and navigated through conflicting information to download code and troubleshoot… I installed Stable Diffusion UI no fewer than five times, because the first three times I was using the wrong fork!
- I installed Stable Diffusion UI and then installed the v2 data and extensions for inpainting and upscaling, which involved modifying the webui-user.bat file.
- I got an extension to work and troubleshot the ffmpeg installation to extract frames from videos.
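For anyone who wants to script the frame extraction, here is a minimal sketch of how an ffmpeg command for it could be built in Python. The file name `misha.mp4`, the `frames` folder, and the rate of 2 frames per second are all hypothetical placeholders, not the values I actually used; the sketch only builds the command and does not run it.

```python
from pathlib import Path


def build_ffmpeg_frame_command(video_path, output_dir, fps=1):
    """Return an ffmpeg argument list that saves `fps` frames per second as PNGs."""
    # frame_%04d.png makes ffmpeg number the files frame_0001.png, frame_0002.png, ...
    output_pattern = str(Path(output_dir) / "frame_%04d.png")
    return [
        "ffmpeg",
        "-i", str(video_path),   # input video
        "-vf", f"fps={fps}",     # video filter: keep this many frames per second
        output_pattern,
    ]


# Hypothetical input and output names, for illustration only.
command = build_ffmpeg_frame_command("misha.mp4", "frames", fps=2)
print(" ".join(command))
```

Once ffmpeg is installed and on the PATH, and the output folder exists, the list can be passed to `subprocess.run(command, check=True)`.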
As you can see, DreamBooth produced pictures that most resemble Misha.
Next, I think I’m going to try training on better datasets of my birds, make a dataset of myself, and definitely learn more about AI image and text generation techniques.
Prompt: Photo of Mishabird3 / Mishabird5 / [controls] cockatoo / “moluccan cockatoo” / Mishabird in a tree, white bird
Negative prompt: ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft
Sampling method: DDIM
CFG scale: 25
Checkpoint: sd-v1-4-full-ema.ckpt (except Mishabird3, which used a checkpoint trained with DreamBooth)
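To make the comparison explicit, here is a rough sketch of how the five prompt variants and shared settings could be laid out in Python. I ran these through the UI rather than a script, so the function names and dictionary keys below are my own illustrative choices and not part of any Stable Diffusion API. The DreamBooth checkpoint file name is a placeholder, and I am assuming the quotation marks around “moluccan cockatoo” are left out of the prompt text.

```python
NEGATIVE_PROMPT = (
    "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, "
    "out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, "
    "bad anatomy, blurred, watermark, grainy, signature, cut off, draft"
)

# Two trained instance prompts followed by the three control subjects.
SUBJECTS = ["Mishabird3", "Mishabird5", "cockatoo", "moluccan cockatoo", "Mishabird"]

BASE_CHECKPOINT = "sd-v1-4-full-ema.ckpt"
DREAMBOOTH_CHECKPOINT = "<DreamBooth checkpoint>"  # placeholder, actual file name not given


def checkpoint_for(subject):
    """Mishabird3 used the DreamBooth checkpoint; everything else used the base model."""
    return DREAMBOOTH_CHECKPOINT if subject == "Mishabird3" else BASE_CHECKPOINT


def build_jobs(subjects=SUBJECTS):
    """Return one settings dictionary per subject, identical except for prompt and checkpoint."""
    return [
        {
            "prompt": f"Photo of {subject} in a tree, white bird",
            "negative_prompt": NEGATIVE_PROMPT,
            "sampler": "DDIM",
            "cfg_scale": 25,
            "checkpoint": checkpoint_for(subject),
        }
        for subject in subjects
    ]


for job in build_jobs():
    print(job["checkpoint"], "|", job["prompt"])
```

Keeping everything but the subject word fixed is what makes the renders comparable across the two training methods and the controls.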
For reference, a picture of my pet cockatoo that I used in my AI training sets:
Textual Inversion AI training method results:
DreamBooth AI training method results:
Pictures that did not use my specific instance prompts “Mishabird3” or “Mishabird5” and instead used cockatoo, Mishabird, or “moluccan cockatoo”.