ControlNet is a neural network architecture that works alongside Stable Diffusion and gives you more control over image generation by adding extra conditions. One of its models is LineArt, which has been specifically conditioned to work with lineart images. It is used together with a Stable Diffusion checkpoint, such as runwayml’s stable-diffusion-v1-5.
What sets the LineArt model apart is its knack for creating images that follow the lines of a source drawing, making it a useful tool wherever the outline has to be preserved, from illustration to interior design visualizations and architectural projects. ControlNet works by attaching trainable network modules to the U-Net (noise predictor) of the Stable Diffusion model. With this arrangement, the Stable Diffusion model’s weights stay frozen during training, and only the attached modules are updated.
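To make the frozen-weights idea concrete, here is a toy sketch in plain Python. It is not ControlNet's actual code: a single number `w_base` stands in for the frozen Stable Diffusion weights and `w_ctrl` for the attached trainable module. The names, the 1-D model, the learning rate, and the sample data are all assumptions made for illustration. Starting the control weight at zero means the combined model initially behaves exactly like the base model.

```python
def forward(x, hint, w_base, w_ctrl):
    """Base output plus the contribution of the attached control branch."""
    return w_base * x + w_ctrl * hint


def train_control_branch(samples, w_base, lr=0.05, epochs=200):
    """Fit only w_ctrl with plain gradient descent; w_base is never updated."""
    w_ctrl = 0.0  # assumption: start at zero so the base behaviour is unchanged
    for _ in range(epochs):
        for x, hint, target in samples:
            error = forward(x, hint, w_base, w_ctrl) - target
            # Gradient of 0.5 * error**2 with respect to w_ctrl is error * hint.
            w_ctrl -= lr * error * hint
    return w_base, w_ctrl


# Made-up data: the target depends on the input x and on an extra condition (hint).
W_BASE = 2.0
samples = [(x, hint, 2.0 * x + 3.0 * hint)
           for x, hint in [(1.0, 1.0), (0.5, -1.0), (-2.0, 0.5)]]

w_base_after, w_ctrl_after = train_control_branch(samples, W_BASE)
print(w_base_after, round(w_ctrl_after, 4))
```

The base weight comes back unchanged, while the control weight has learned how much the extra condition should influence the output.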
ControlNet’s adaptability doesn’t stop there. This model can be trained on a variety of input types, including but not limited to edge detection, human pose detection, and semantic segmentation. ControlNet stands as a testament to the evolution of AI in art generation, offering precision and versatility that artists and creators can harness in their work.
Line art anime
This preprocessor is an essential part of ControlNet’s LineArt model. It focuses on rendering the outlines of images in an anime style, often associated with clean and simplistic lines. This style can capture the distinctive features of subjects and is particularly suited for artistic renderings or illustrations with an anime aesthetic.
Line art anime denoise
The ‘anime denoise’ preprocessor is a variant of the anime preprocessor. While it also provides anime-style lines, it’s designed to reduce details, producing a more simplistic and abstract image representation. This approach can emphasize the primary elements of the image and remove unnecessary or distracting details, making it suitable for minimalistic anime-style artwork.
Line art realistic
The ‘realistic’ preprocessor aims to create lines that mimic the complexity and intricacy found in real-world scenes. It tries to capture more detail and realism compared to the anime preprocessors. This can be especially useful for creating detailed sketches or line art drawings closely resembling the original image.
Line art coarse
The ‘coarse’ preprocessor produces realistic-style lines but with heavier weight. This means it creates thicker or bolder lines, which can give the image a more dramatic or impactful look. This style can effectively emphasize certain features or elements within the image and lends itself well to striking, high-contrast artwork.
The setup for generating the cat with the lineart model
The lineart preprocessor lets you control the base composition so that you can apply a style to the image through your prompt. ControlNet takes the source image and preprocesses it into a black-and-white outline. It then uses that outline with the ControlNet model to guide the composition of the output image.
Here is one such workflow. I took a simple lineart illustration of a cat, and in ControlNet I selected the preprocessor Invert (from white bg & black line). With the image added to the ControlNet tab, select the Enable, Pixel Perfect (produces a more accurate outline), and Allow Preview checkboxes. If you click the 💥 icon button, ControlNet runs the preprocessor on the image and adds the result to the preview panel. With the preview generated, you can drag the new lineart image onto the source image to replace it. Then select none in the preprocessor dropdown menu and set the ControlNet model to lineart_anime.
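As its label suggests, the Invert preprocessor turns a drawing with black lines on a white background into white lines on a black background, which is the form the lineart models take as a control image. Here is a minimal sketch of that idea in plain Python. It is not the extension's own code, and the function name and the tiny sample image are made up for illustration.

```python
def invert_lineart(pixels):
    """Invert an 8-bit grayscale image given as rows of 0-255 integers."""
    return [[255 - value for value in row] for row in pixels]


# Tiny made-up 3x3 "drawing": a black vertical line (0) on a white background (255).
drawing = [
    [255, 0, 255],
    [255, 0, 255],
    [255, 0, 255],
]

control_map = invert_lineart(drawing)
for row in control_map:
    print(row)
```

Inverting twice gives back the original drawing, so the step loses no line information.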
Then, I used the new simple inverted image with a Lineart model and generated several images until I got one I wanted to work with. I took that new reference image and ran the lineart realistic preprocessor on it.
Next, I moved the processed image (lineart realistic) from the preview to the source image. I deselected the preprocessor, since the image had already been processed and I only needed it for image guidance. That left the preprocessor set to none and the model (ControlNet lineart) selected.
With the control values set, I added the settings for generating the image of a cat.
Prompt: a watercolor painting of a orange cat sitting on a table, painting of a cat, a painting of a cat, cat portrait painting, watercolor-wash, sunny kitchen window in the background, vibrant colors, rich orange and whites
Negative prompt: (worst quality:0.8), verybadimagenegative_v1.3, easynegative, (surreal:0.8), (modernism:0.8), (art deco:0.8), (art nouveau:0.8), extra paws, extra toes
Steps: 35, Sampler: DPM++ 2M Karras_dynthres15, CFG scale: 6.5, Seed: 2582070524, Size: 512×768, Model hash: 073447953e, Model: rundiffusionFX25D_v10, VAE: vae-ft-mse-840000-ema-pruned, Denoising strength: 0.45, Clip skip: 1, Version: 0d101b9, Parser: Full parser
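If you want to reuse settings like these in a script, a comma-separated line of this shape can be split into key/value pairs. The sketch below is a small helper of my own (the name `parse_settings` is hypothetical), and it assumes that no value contains a comma. It will not handle the pipe-separated lines with a quoted ControlNet block shown further down.

```python
def parse_settings(line):
    """Split 'Key: value, Key: value' text into a dict of strings."""
    settings = {}
    for part in line.split(", "):
        key, sep, value = part.partition(": ")
        if sep:  # skip fragments that do not have a 'Key: value' shape
            settings[key.strip()] = value.strip()
    return settings


# A shortened copy of the settings line above.
line = ("Steps: 35, Sampler: DPM++ 2M Karras_dynthres15, CFG scale: 6.5, "
        "Seed: 2582070524, Size: 512×768, Denoising strength: 0.45, Clip skip: 1")

settings = parse_settings(line)
width, height = (int(n) for n in settings["Size"].split("×"))
print(settings["Sampler"], width, height)
```

All values stay as strings, so convert the ones you need (steps, seed, CFG scale) before passing them on.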
Various backgrounds with the same cat in the foreground
Images were generated with the same cat but with different backgrounds.
I first used the illustration of a cat in a guidance test for image control in MidJourney. See the prompt post: Generating an image of a watercolor painting from a simple sketch and a text prompt.
Leaning Tower of Pisa – Generated with Lineart
Prompt: a painting of the leaning tower of Pisa, Piazza del Duomo, by Dirck van der Lisse, inspired by Johann Balthasar Bullinger, golden hour, blue sky above
Negative prompt: (worst quality:0.8), verybadimagenegative_v1.3, easynegative, (surreal:0.8), (modernism:0.8), (art deco:0.8), (art nouveau:0.8), style-empire-neg
Steps: 30 | Sampler: DPM++ 2S a Karras | CFG scale: 7 | Seed: 1224046228 | Size: 408×608 | Model hash: 073447953e | Model: rundiffusionFX25D_v10 | VAE: vae-ft-mse-840000-ema-pruned | Denoising strength: 0.7 | Clip skip: 1 | Version: b48a0f1 | Token merging ratio: 0.5 | Token merging ratio hr: 0.5 | Parser: Full parser | ControlNet 0: “preprocessor: none | model: control_v11p_sd15s2_lineart_anime [3825e83e] | weight: 1 | starting/ending: (0 | 1) | resize mode: Crop and Resize | pixel perfect: True | control mode: ControlNet is more important | preprocessor params: (512 | 64 | 64)” | Hires upscale: 2 | Hires upscaler: R-ESRGAN 4x+ Anime6B
Note: The lineart image was modified using Affinity Photo, and the lines for the background were removed.
Superman Art Generated with ControlNet Lineart
Prompt: a close up of a superman, moody iconic scene, pleading face, standing in a barren field, dramatic sky in the background, dawn morning, sublime-cool-hot-hyperadvanced, pained expression, by Jim Lee, perfect symmetric coherent face, a broad shouldered, solemn, short hair cut,
Negative prompt: (worst quality:0.8), verybadimagenegative_v1.3, easynegative, (surreal:0.8), (modernism:0.8), (art deco:0.8), (art nouveau:0.8), hair over face, hair cover face
Steps: 30 | Sampler: DPM++ 2M Karras | CFG scale: 7 | Seed: 3684907572 | Size: 664×664 | Model hash: 073447953e | Model: rundiffusionFX25D_v10 | VAE: vae-ft-mse-840000-ema-pruned | Denoising strength: 0.4 | Clip skip: 2 | Version: b48a0f1 | Token merging ratio: 0.5 | Token merging ratio hr: 0.5 | Parser: Full parser | ControlNet 0: “preprocessor: none | model: control_v11p_sd15s2_lineart_anime [3825e83e] | weight: 1 | starting/ending: (0 | 1) | resize mode: Crop and Resize | pixel perfect: True | control mode: Balanced | preprocessor params: (512 | 64 | 64)” | Hires upscale: 1.5 | Hires upscaler: R-ESRGAN 4x+ Anime6B
Note: The lineart image was modified using Affinity Photo, and the lines for the background were removed. The hairline was corrected, and some added lines were removed from the face.