AI image generation has evolved significantly, enabling artists, designers, and enthusiasts to create stunning visuals.
One of the leading tools in generative AI, Flux AI, combined with the ComfyUI visual user interface, provides a seamless image creation experience.
This summary of my Webmemo blog post will show you how to navigate ComfyUI, set up your own environment, and maximize your use of these tools, while also exploring advanced features and real-world applications.
Why upgrade from Midjourney or DALL-E to ComfyUI?
Midjourney and OpenAI’s DALL-E are great AI image generators, all the more so since Midjourney became accessible via a website instead of a cumbersome Discord bot. Both are well suited for beginners and advanced users alike.
However, ComfyUI offers several advantages over Midjourney and DALL-E for users looking for more control and customization in their AI image generation process:
Local processing: Unlike Midjourney, which runs on remote servers, ComfyUI can be run locally on your own hardware, giving you more privacy and control over your creations. Inference tokens are on the house!
Customizable workflows: ComfyUI’s node-based interface allows for intricate customization of the generation process, enabling fine-tuned control over various aspects of your image creation.
Model flexibility: ComfyUI supports multiple AI models, including different versions of Stable Diffusion and Flux, allowing you to switch between models or even use custom ones.
Cost-effective: After the initial setup, using ComfyUI is free, whereas Midjourney requires a subscription for continued use.
Transparency: The open-source nature of ComfyUI allows users to understand and modify the underlying processes. For example, you can develop and add your own custom nodes to build a unique workflow.
How to get started with ComfyUI
A word of warning: while ComfyUI is the most powerful and modular Stable Diffusion GUI and backend, it is not easy to set up, and it requires a powerful graphics card, preferably from Nvidia, to run inference smoothly.
Getting started with ComfyUI involves a few key steps:
System requirements: Ensure your computer meets the minimum requirements, including a compatible GPU with sufficient VRAM (8 GB or more; the more, the better).
Installation:
- Download and install Python
- Clone or download the ComfyUI repository from GitHub
- Install the required dependencies using pip
Download models:
- Obtain Stable Diffusion or Flux model checkpoints from Hugging Face (e.g., SD 1.5, SD 2.1, Flux.1-dev, or Flux.1-schnell). Here’s a corresponding instruction I compiled on Perplexity.
- Place the models in the appropriate folders within the ComfyUI directory. The README files explain where to put the different files.
Launch ComfyUI:
- From a command shell in the ComfyUI directory, run the start script provided in the package (preferably the one for Nvidia GPUs, i.e. run_nvidia_gpu.bat).
- Access the interface through your web browser. The batch file should automatically open it at http://127.0.0.1:8188/
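For readers installing from source rather than the portable Windows build, the setup boils down to a few shell commands. This is a sketch, assuming git and a recent Python are already installed; on Windows you would run the provided run_nvidia_gpu.bat instead of the last line.

```shell
# Fetch ComfyUI and install its Python dependencies.
# Assumes git and Python 3.10+ are already on your PATH.
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
python main.py   # serves the UI at http://127.0.0.1:8188/
```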
Familiarize yourself with the interface:
- Explore the available nodes
- Learn how to connect nodes to create basic workflows
Start with a simple text-to-image workflow to generate your first image.
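Under the hood, a ComfyUI workflow is a JSON graph of nodes, and the web UI’s port also serves an HTTP API. The following Python sketch builds a minimal text-to-image graph in ComfyUI’s API format and shows how it would be queued; the checkpoint filename is a placeholder you would replace with a model you actually downloaded.

```python
import json
import urllib.request

def build_workflow(prompt_text, checkpoint="v1-5-pruned-emaonly.safetensors", seed=42):
    """Build a minimal text-to-image graph in ComfyUI's API JSON format.

    Each node is keyed by an id and declares a class_type plus inputs;
    links are [source_node_id, output_index] pairs.
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": checkpoint}},
        "2": {"class_type": "CLIPTextEncode",          # positive prompt
              "inputs": {"text": prompt_text, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",          # negative prompt
              "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 20, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "first_image"}},
    }

def queue_prompt(workflow, host="127.0.0.1:8188"):
    """POST the workflow to a running ComfyUI instance."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"http://{host}/prompt", data=data)
    return urllib.request.urlopen(req).read()
```

Building the same graph in the visual editor is usually easier; the JSON view becomes useful once you want to script or batch your generations.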
ComfyUI image-to-image workflow
Like Midjourney, ComfyUI also supports image-to-image workflows, where an existing picture guides the generation instead of a text prompt alone.
How to add your photos to the AI model: Train a Low-Rank Adaptation (LoRA)
You can easily fine-tune a visual diffusion model with LoRA (Low-Rank Adaptation) to incorporate your own photos into the AI model.
Think of a LoRA as fine-tuning for visual diffusion models like Stable Diffusion or Flux.1.
By using LoRA, you can train the model on your own images, allowing it to generate pictures in your specific style or of particular subjects like your portrait photos. Here’s how to do it:
Prepare your dataset:
- Collect 10 – 20 high-quality images that represent the style or subject you want to train.
- Ensure images are diverse but consistent in style or subject.
- If you use Replicate or Fal.ai, you can forget about resizing and labeling each image, as many tutorials will tell you: these platforms do it for you automatically.
Set up the training environment locally or, better yet, use Replicate or Fal.ai:
- You will probably train your portrait photos only once, so it’s not worth the hassle to set up and train on your own machine.
- Training a Flux.1 Dev model on Replicate with 10 portrait photos takes approximately 20 minutes on their high-end Nvidia H100 GPUs and costs as little as $2.50. On your local machine the fine-tuning may easily take a few hours.
- Important: set a trigger word before you start the training. You will need it later when you refer to your LoRA in your image prompt.
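To make the Replicate route concrete, here is a hedged Python sketch using Replicate’s Python client. The input field names (`input_images`, `trigger_word`, `steps`) follow the Flux LoRA trainers commonly published on Replicate, and the trainer version string is a placeholder; check the trainer’s page for its current schema before running anything.

```python
def build_training_input(images_zip_url, trigger_word, steps=1000):
    """Assemble the training payload for a Flux LoRA trainer on Replicate.

    Field names are assumptions based on common Flux LoRA trainers
    (e.g. ostris/flux-dev-lora-trainer); verify them against the
    trainer's input schema on Replicate.
    """
    return {
        "input_images": images_zip_url,  # zip archive of 10-20 photos
        "trigger_word": trigger_word,    # used later in your prompts
        "steps": steps,
    }

def start_training(payload, destination):
    """Kick off the training run (requires REPLICATE_API_TOKEN to be set)."""
    import replicate  # pip install replicate
    return replicate.trainings.create(
        # Placeholder version id; copy the real one from the trainer's page.
        version="ostris/flux-dev-lora-trainer:<version-id>",
        input=payload,
        destination=destination,  # e.g. "yourname/my-portrait-lora"
    )
```

The destination model on Replicate is where the finished LoRA weights land; from there you download the weights file for the next step.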
After the training process: download the LoRA weights file from Replicate or Fal.ai and place it in your ./models/loras/ folder.
Using your LoRA in ComfyUI:
- In your ComfyUI workflow, add a «LoRA Loader» node and select your LoRA.
- Connect the LoRA Loader to your model checkpoint in the workflow.
- Adjust the LoRA strength to control how much influence it has on the generation.
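In API-format JSON, the «LoRA Loader» step corresponds to a `LoraLoader` node that sits between the checkpoint loader and whatever consumes the model and CLIP outputs. A minimal sketch, where the node ids and the LoRA filename are placeholders:

```python
def add_lora(workflow, lora_name, strength=0.8,
             model_node="1", lora_id="10"):
    """Insert a LoraLoader node after the checkpoint loader.

    Downstream nodes should then reference [lora_id, 0] for the
    model and [lora_id, 1] for CLIP instead of the checkpoint's
    own outputs.
    """
    workflow[lora_id] = {
        "class_type": "LoraLoader",
        "inputs": {
            "model": [model_node, 0],
            "clip": [model_node, 1],
            "lora_name": lora_name,      # file in ./models/loras/
            "strength_model": strength,  # influence on the diffusion model
            "strength_clip": strength,   # influence on the text encoding
        },
    }
    return workflow

# Placeholder workflow fragment: just the checkpoint loader.
wf = {"1": {"class_type": "CheckpointLoaderSimple",
            "inputs": {"ckpt_name": "flux1-dev.safetensors"}}}
wf = add_lora(wf, "my_portrait_lora.safetensors", strength=0.8)
```

Setting `strength_model` and `strength_clip` to the same value is a common starting point; you can split them later while experimenting.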
Generate images:
- Use your trigger word in prompts related to your trained subject or style.
- Experiment with different LoRA strengths to find the right balance.
Ethical considerations:
- Ensure you have the right to use the images in your training set.
- Be mindful of potential biases in your training data.
- Consider the implications of generating images that closely mimic real individuals.
Remember, while LoRA allows you to customize the AI model with your own images, it’s still building upon the base model’s capabilities.
The quality of your results will depend on both the base model and the quality of your training data. Flux Dev or Flux Schnell are among the best at the time of writing.
By incorporating LoRA into your ComfyUI workflow, you can create unique, personalized images that blend the power of large AI models with your specific visual style or subjects of interest.
If I didn’t know better, I would think the following picture was actually a photo of yours truly. But it’s not: it was generated entirely by AI.
Outlook: What to add from here
To further enhance your ComfyUI experience, consider exploring:
Advanced workflows: Learn to use more complex nodes such as ControlNet for precise control over a subject’s pose.
Custom nodes: Develop your own or incorporate community-made custom nodes to extend ComfyUI’s functionality. For example, I want to add a «Created with AI» label that is exported to its own dedicated Photoshop layer.
Model merging: Experiment with merging different AI models to create unique styles and capabilities.
Batch processing: Set up workflows for generating multiple images with variations.
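As a sketch of the batch-processing idea: once a workflow exists as API-format JSON, generating variations mostly means re-queuing the same graph with one parameter changed, typically the sampler seed. The workflow fragment below is a placeholder containing only the sampler node.

```python
import copy

# Minimal stand-in for a full API-format workflow; only the
# sampler node matters for this sketch.
base_workflow = {
    "5": {"class_type": "KSampler",
          "inputs": {"seed": 0, "steps": 20, "cfg": 7.0}},
}

def seed_variations(workflow, seeds, sampler_id="5"):
    """Return one independent copy of the workflow per seed."""
    variants = []
    for seed in seeds:
        wf = copy.deepcopy(workflow)  # avoid mutating the shared base
        wf[sampler_id]["inputs"]["seed"] = seed
        variants.append(wf)
    return variants

batch = seed_variations(base_workflow, seeds=[11, 22, 33, 44])
```

Each variant would then be queued against the running ComfyUI instance, giving you four renderings of the same prompt with different randomness.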
Animation workflows: Explore techniques for creating animated sequences using ComfyUI.
Integration with other tools: Learn how to use ComfyUI in conjunction with photo editing software, e.g. exporting layers and masks to Photoshop for post-processing.
Community engagement: Join ComfyUI forums and Discord channels to share knowledge and stay updated on new developments.
Contributing to the project: As an open-source tool, you can contribute to ComfyUI’s development or documentation.
Use cases I plan to work on next:
Creating ad variations of a given photo for different platforms.
Adding layers of company logos or labels like «Created with AI» that are automatically exported to Photoshop for final touches.
Alternative, simpler services
There are also other, easier-to-use online services such as Remini or Pixelup (Android | iOS).
The entry barrier here is very low, but you don’t have as much control over image generation.
Remini’s AI-powered service, for instance, focuses on one-tap photo enhancement rather than configurable generation workflows.