Stable Diffusion ControlNet Guide

What is ControlNet?

ControlNet is a neural network model designed to control Stable Diffusion models. It allows users to copy compositions or human poses from a reference image, providing a way to control the subjects’ placement and appearance with precision. This is a significant improvement over the randomness of image generation in Stable Diffusion models, where users typically generate many images and pick one they like.

ControlNet adds extra conditioning on top of the text prompt used in Stable Diffusion models. This conditioning can take many forms, such as edge detection or human pose detection. For example, ControlNet can take an additional input image, detect its outlines using the Canny edge detector, and save an image containing the detected edges as a control map. This control map is then fed into the ControlNet model as extra conditioning alongside the text prompt.
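
To make the idea of a control map concrete, here is a minimal, dependency-free sketch. It is not the Canny detector, which also applies Gaussian smoothing, non-maximum suppression, and hysteresis thresholding; it only thresholds a Sobel gradient magnitude, the kind of gradient step Canny builds on. The function name `edge_control_map` and the threshold value are illustrative assumptions, not part of ControlNet.

```python
from typing import List

Image = List[List[int]]  # grayscale image, values 0-255, row-major


def edge_control_map(image: Image, threshold: int = 128) -> Image:
    """Return a black/white edge map: 255 where the Sobel gradient is strong."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    # Skip the one-pixel border so every 3x3 window stays inside the image.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal (gx) and vertical (gy) Sobel responses.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            magnitude = (gx * gx + gy * gy) ** 0.5
            if magnitude >= threshold:
                edges[y][x] = 255
    return edges


# A 6x6 test image: dark left half, bright right half.
sample = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
control_map = edge_control_map(sample)
for row in control_map:
    print("".join("#" if value else "." for value in row))
```

The printed grid marks the vertical boundary between the dark and bright halves. In the real workflow, an edge image of this kind, computed from the reference picture, is what gets passed along with the prompt.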

Another example is human pose detection. OpenPose, a fast human keypoint detection model, can extract human poses such as the positions of hands, legs, and head. These keypoints are extracted from the input image and saved as a control map, which is fed to Stable Diffusion as extra conditioning along with the text prompt.
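
As a rough illustration of how keypoints become a control map, the sketch below draws a stick figure on a black canvas. The keypoint names, the `LIMBS` list, and the function names are hypothetical simplifications; OpenPose itself detects more keypoints and ControlNet renders them differently. The keypoints are assumed to lie inside the canvas.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates

# Hypothetical, simplified skeleton: which keypoints are joined by a limb.
LIMBS = [("head", "neck"), ("neck", "left_hand"), ("neck", "right_hand"),
         ("neck", "hip"), ("hip", "left_foot"), ("hip", "right_foot")]


def draw_line(canvas: List[List[int]], start: Point, end: Point) -> None:
    """Mark the pixels between two points using Bresenham's line algorithm."""
    (x0, y0), (x1, y1) = start, end
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    error = dx + dy
    while True:
        canvas[y0][x0] = 255
        if (x0, y0) == (x1, y1):
            break
        doubled = 2 * error
        if doubled >= dy:
            error += dy
            x0 += step_x
        if doubled <= dx:
            error += dx
            y0 += step_y


def pose_control_map(keypoints: Dict[str, Point], width: int, height: int) -> List[List[int]]:
    """Render detected keypoints as a stick figure on a black canvas."""
    canvas = [[0] * width for _ in range(height)]
    for joint_a, joint_b in LIMBS:
        # Skip limbs whose endpoints the detector did not find.
        if joint_a in keypoints and joint_b in keypoints:
            draw_line(canvas, keypoints[joint_a], keypoints[joint_b])
    return canvas


# Made-up keypoints for a 9x9 canvas, standing in for a detector's output.
pose = {"head": (4, 0), "neck": (4, 2), "hip": (4, 5),
        "left_hand": (1, 4), "right_hand": (7, 4),
        "left_foot": (2, 8), "right_foot": (6, 8)}
for row in pose_control_map(pose, width=9, height=9):
    print("".join("#" if value else "." for value in row))
```

Only the pose survives in this image: clothing, background, and colors of the reference picture are gone, which is why the text prompt remains free to decide them.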

ControlNet can be installed on Windows, Mac, and Google Colab, and it comes with various settings and use cases. It can be used for tasks like copying human poses, stylizing images, controlling poses with Magic Pose, and generating interior design ideas. This guide also explains how ControlNet works and how it differs from the Stable Diffusion depth model.


Here are the steps to install ControlNet on Windows, Mac, and Google Colab:

  1. Open the “Extensions” tab in your Stable Diffusion WebUI.
  2. Open the “Install from URL” tab.
  3. Enter the ControlNet extension’s repository URL into the “URL for extension’s git repository” field.
  4. Press the “Install” button.
  5. Wait a few seconds, and you will see the message “Installed into stable-diffusion-webui\extensions\sd-webui-controlnet. Use Installed tab to restart”.
  6. Go to the “Installed” tab, click “Check for updates,” and then click “Apply and restart UI.” (You can also use these buttons to update ControlNet.)
  7. Completely restart the Stable Diffusion WebUI, including your terminal. (If you’re unsure what a “terminal” is, you can reboot your computer to achieve the same effect.)
  8. Download the ControlNet models. You need the model files ending with “.pth”.
  9. Put the models in your “stable-diffusion-webui\extensions\sd-webui-controlnet\models” folder. The repository already includes all “yaml” files, so you only need to download the “pth” files.

If you download models from elsewhere, please ensure the yaml and model file names are identical. You may need to manually rename all yaml files if you download from other sources.
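
The name check described above can be scripted. The following is a minimal sketch, not part of the extension: the function name `models_missing_yaml` is made up for illustration, and the folder path assumes the script is run from the directory that contains your WebUI installation.

```python
from pathlib import Path
from typing import List


def models_missing_yaml(models_dir: Path) -> List[str]:
    """List .pth model files that have no identically named .yaml file."""
    yaml_stems = {path.stem for path in models_dir.glob("*.yaml")}
    return sorted(
        path.name for path in models_dir.glob("*.pth")
        if path.stem not in yaml_stems
    )


if __name__ == "__main__":
    # Folder from step 9; adjust to where your WebUI is installed.
    folder = Path("stable-diffusion-webui") / "extensions" / "sd-webui-controlnet" / "models"
    if folder.is_dir():
        for name in models_missing_yaml(folder):
            print(f"No matching yaml for {name}")
    else:
        print(f"Folder not found: {folder}")
```

Any file it reports needs a yaml file with the same base name, for example by renaming the yaml that came with the download.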

After you put the models in the correct folder, you may need to refresh to see the models. The refresh button is right next to your “Model” dropdown.

Please note that these instructions are based on the information available on the GitHub repository for ControlNet as of the time of writing. For the most up-to-date instructions, please refer to the repository directly.

An example of using ControlNet

In the WebUI, select the first tab, txt2img. Then fill in the prompt, negative prompt, sampling method, and sampling steps, and check Restore faces.

You will need a Stable Diffusion checkpoint compatible with v1.5; this example uses Realistic Vision V2.0.

With ControlNet installed, add your image to the image container. Select Enable and Allow Preview. Select the openpose preprocessor and click the Run Preprocessor icon (the little red explosion). Then select the model control_sd15_openpose.

Prompt: a woman in a blue dress jumping in the air, tonal topstitching, streets of new york, pointe pose, ernie chan, vibrant realistic, centered full-body shot, centered subject, desaturated colors, shot on anamorphic lenses, lei jin, tutu, blue sky, out of focus background
Negative prompt: EasyNegative, bad-hands-5, (deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime:1.4), text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck

Settings: Steps: 30, Sampler: Euler a, CFG scale: 7, Seed: 4191006204, Face restoration: CodeFormer, Size: 512×768, Model hash: e6415c4892, Model: realisticVisionV20_v20, ENSD: 31337, Version: v1.2.1, ControlNet 0: “preprocessor: openpose, model: control_sd15_openpose [fef5e48e], weight: 1, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced, preprocessor params: (512, 64, 64)”, Discard penultimate sigma: True
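
If you want to reuse or compare settings lines like the one above, they can be split into key/value pairs. This is a minimal sketch under the assumption that fields are separated by commas and that a quoted value (straight or curly quotes) may itself contain commas; `parse_settings` is a hypothetical helper, not the WebUI's own parser.

```python
from typing import Dict


def parse_settings(line: str) -> Dict[str, str]:
    """Split 'Key: value, Key: value' text into a dict, keeping quoted values whole."""
    fields, current, in_quotes = [], [], False
    for char in line:
        if char in '"“”':
            # An opening curly quote starts a quoted run, a closing one ends it,
            # and a straight quote toggles the state.
            in_quotes = char == "“" or (char == '"' and not in_quotes)
        if char == "," and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(char)
    fields.append("".join(current))

    settings = {}
    for field in fields:
        key, separator, value = field.partition(":")
        if separator:
            settings[key.strip()] = value.strip().strip('"“”')
    return settings


# A shortened copy of the settings line from this example.
example = ('Steps: 30, Sampler: Euler a, CFG scale: 7, Seed: 4191006204, '
           'Size: 512×768, ControlNet 0: “preprocessor: openpose, '
           'model: control_sd15_openpose [fef5e48e], weight: 1”')
parsed = parse_settings(example)
print(parsed["Sampler"])
print(parsed["ControlNet 0"])
```

The ControlNet entry comes back as a single string because it is quoted; calling `parse_settings` on that string again would split it into its own fields.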

