Currently, the Server Config settings menu exists only in the Desktop version; it is not available in other versions.

Network

Host: The IP address to listen on

Function: Sets the IP address the server binds to. The default, 127.0.0.1, allows only local access; if you need LAN access, set it to 0.0.0.0

Although the Desktop version offers LAN listening settings, a desktop application is not well suited to running as a server. If you need to provide ComfyUI as a shared service within your LAN, please follow the manual deployment tutorial to deploy a dedicated ComfyUI service instead.

Port: The port to listen on

Function: The port number the server listens on. The Desktop version defaults to port 8000, while the Web version typically uses port 8188
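
ComfyUI's server is built on aiohttp, so these two settings correspond to the host and port the underlying web application binds to. A minimal sketch of that binding behavior (illustrative only, not ComfyUI's actual startup code):

```python
from aiohttp import web

async def health(request):
    return web.json_response({"status": "ok"})

app = web.Application()
app.router.add_get("/", health)

# host="127.0.0.1" accepts connections from this machine only;
# host="0.0.0.0" also accepts connections from the LAN.
web.run_app(app, host="127.0.0.1", port=8188)
```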

TLS Key File: Path to TLS key file for HTTPS

Function: The private key file path required for HTTPS encryption, used to establish secure connections

TLS Certificate File: Path to TLS certificate file for HTTPS

Function: The certificate file path required for HTTPS encryption, used in conjunction with the private key
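
Together, these two files enable HTTPS. As a rough illustration using Python's standard ssl module (the paths server.crt and server.key are placeholders), this is how a certificate/key pair is loaded into an aiohttp server:

```python
import ssl
from aiohttp import web

app = web.Application()

# The two settings above supply these paths; the certificate is sent
# to clients, while the private key never leaves the server.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(certfile="server.crt", keyfile="server.key")

# With an SSL context, the server is reachable via https:// instead of http://
web.run_app(app, host="127.0.0.1", port=8188, ssl_context=ctx)
```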

Enable CORS header: Use "*" for all origins or specify a domain

Function: Cross-Origin Resource Sharing settings, allowing web browsers to access the server from different domains
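
In practice this setting controls the Access-Control-Allow-Origin response header. A hypothetical aiohttp middleware showing the effect (not ComfyUI's actual implementation):

```python
from aiohttp import web

@web.middleware
async def cors_middleware(request, handler):
    response = await handler(request)
    # "*" allows any origin; use a specific value such as
    # "https://example.com" to restrict cross-origin access.
    response.headers["Access-Control-Allow-Origin"] = "*"
    return response

app = web.Application(middlewares=[cors_middleware])
```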

Maximum upload size (MB)

Function: Limits the maximum size of a single file upload, in MB (default 100). This affects uploads of images, models, and other files
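
A client can avoid a rejected request by checking a file's size against this limit before uploading. A small sketch (the filename is a placeholder):

```python
import os

MAX_UPLOAD_MB = 100  # mirror the server's configured limit

def within_upload_limit(path: str) -> bool:
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return size_mb <= MAX_UPLOAD_MB

print(within_upload_limit("input.png"))
```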

CUDA

CUDA device index to use

Function: Specifies which NVIDIA graphics card to use. 0 represents the first graphics card, 1 represents the second, and so on. Important for multi-GPU systems
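
To see which index corresponds to which card, you can enumerate the GPUs PyTorch sees (requires a CUDA build of PyTorch):

```python
import torch

# The index printed here is the same index this setting expects
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```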

Use CUDA malloc for memory allocation

Function: Controls whether CUDA's own memory allocator is used instead of PyTorch's default caching allocator. Can improve memory management efficiency in certain situations
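
Assuming a recent PyTorch, this kind of switch maps to PyTorch's allocator backend selection, exposed through the PYTORCH_CUDA_ALLOC_CONF environment variable:

```python
import os

# Must be set before torch is imported; selects CUDA's asynchronous
# allocator (cudaMallocAsync) instead of PyTorch's caching allocator.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:cudaMallocAsync"

import torch  # noqa: E402

print(torch.cuda.is_available())
```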

Inference

Global floating point precision

Function: Sets the numerical precision for model calculations. FP16 saves VRAM but may affect quality; FP32 is more precise but uses more VRAM
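
The VRAM saving follows directly from the per-element storage cost: halving the precision halves a tensor's memory footprint, as this small PyTorch check shows:

```python
import torch

x32 = torch.zeros(1024, 1024, dtype=torch.float32)
x16 = x32.to(torch.float16)

# element_size() is bytes per value: 4 for fp32, 2 for fp16
print(x32.element_size() * x32.nelement())  # 4194304 bytes (4 MB)
print(x16.element_size() * x16.nelement())  # 2097152 bytes (2 MB)
```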

UNET precision

Options:

  • auto: Automatically selects the most suitable precision
  • fp64: 64-bit floating point precision, highest precision but largest VRAM usage
  • fp32: 32-bit floating point precision, standard precision
  • fp16: 16-bit floating point precision, can save VRAM
  • bf16: 16-bit brain floating point precision, between fp16 and fp32
  • fp8_e4m3fn: 8-bit floating point precision (e4m3), minimal VRAM usage
  • fp8_e5m2: 8-bit floating point precision (e5m2), minimal VRAM usage

Function: Specifically controls the computational precision of the UNET core component of diffusion models. Higher precision can provide better image generation quality but uses more VRAM. Lower precision can significantly save VRAM but may affect the quality of generated results.
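
The trade-off between these options comes down to numeric range and step size. With a recent PyTorch (2.1+ for the fp8 types) you can compare the dtypes directly:

```python
import torch

# Larger eps means coarser steps between representable values,
# which is where low-precision quality loss comes from.
for dt in (torch.float64, torch.float32, torch.bfloat16, torch.float16,
           torch.float8_e4m3fn, torch.float8_e5m2):
    info = torch.finfo(dt)
    print(f"{str(dt):24} max={info.max:<12g} eps={info.eps:g}")
```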

VAE precision

Options and Recommendations:

  • auto: Automatically selects the most suitable precision, recommended for users with 8-12GB VRAM
  • fp16: 16-bit floating point precision, recommended for users with 6GB or less VRAM, can save VRAM but may affect quality
  • fp32: 32-bit floating point precision, recommended for users with 16GB or more VRAM who pursue the best quality
  • bf16: 16-bit brain floating point precision, recommended for newer graphics cards that support this format, can achieve better performance balance

Function: Controls the computational precision of the Variational Autoencoder (VAE), affecting the quality and speed of image encoding/decoding. Higher precision can provide better image reconstruction quality but uses more VRAM. Lower precision can save VRAM but may affect image detail restoration.

Run VAE on CPU

Function: Forces the VAE to run on the CPU, which saves VRAM but reduces processing speed

Text Encoder precision

Options:

  • auto: Automatically selects the most suitable precision
  • fp8_e4m3fn: 8-bit floating point precision (e4m3), minimal VRAM usage
  • fp8_e5m2: 8-bit floating point precision (e5m2), minimal VRAM usage
  • fp16: 16-bit floating point precision, can save VRAM
  • fp32: 32-bit floating point precision, standard precision

Function: Controls the computational precision of the text prompt encoder, affecting the accuracy of text understanding and VRAM usage. Higher precision can provide more accurate text understanding but uses more VRAM. Lower precision can save VRAM but may affect prompt parsing effectiveness.

Memory

Force channels-last memory format

Function: Changes how tensor data is laid out in memory (NHWC instead of NCHW), which may improve performance on certain hardware
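
In PyTorch terms, this is the torch.channels_last memory format: the tensor shape stays NCHW, but values are stored in NHWC order in memory:

```python
import torch

x = torch.randn(1, 3, 256, 256)              # default (NCHW/contiguous) layout
y = x.to(memory_format=torch.channels_last)  # same shape, NHWC storage order

print(y.shape)                                             # torch.Size([1, 3, 256, 256])
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```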

DirectML device index

Function: Specifies the device when using DirectML acceleration on Windows, mainly for AMD graphics cards

Disable IPEX optimization

Function: Disables IPEX (Intel Extension for PyTorch) optimizations; mainly affects performance on Intel hardware

VRAM management mode

Options:

  • auto: Automatically manages VRAM, allocating it based on model size and requirements
  • lowvram: Low VRAM mode; splits models into parts to minimize VRAM usage, at the cost of speed
  • normalvram: Standard VRAM mode, balancing VRAM usage and performance
  • highvram: High VRAM mode; keeps models in VRAM for better performance
  • novram: Uses as little VRAM as possible, keeping data in system memory instead
  • cpu: CPU-only mode; doesn’t use the graphics card

Function: Controls VRAM usage strategy, such as automatic management, low VRAM mode, etc.

Reserved VRAM (GB)

Function: Amount of VRAM reserved for the operating system and other programs; helps prevent system freezes
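
Conceptually, the reserved amount is subtracted from the free VRAM that model loading is allowed to use. A sketch of that arithmetic using PyTorch (requires a CUDA device):

```python
import torch

free, total = torch.cuda.mem_get_info()  # bytes on the current device
reserve_gb = 1.0                         # the value of this setting

usable = free - reserve_gb * 1024**3     # VRAM left for models
print(f"usable for models: {usable / 1024**3:.2f} GB of {total / 1024**3:.2f} GB")
```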

Disable smart memory management

Function: Disables automatic memory optimization; models are aggressively offloaded to system memory to free VRAM instead of being kept in VRAM when possible

Preview

Method used for latent previews

Options:

  • none: No preview images displayed, only shows progress bar during generation
  • auto: Automatically selects the most suitable preview method, dynamically adjusts based on system performance and VRAM
  • latent2rgb: Directly converts latent space data to RGB images for preview, faster but average quality
  • taesd: Uses lightweight TAESD model for preview, balances speed and quality

Function: Controls how to preview intermediate results during generation. Different preview methods affect preview quality and performance consumption. Choosing the right preview method can find a balance between preview effects and system resource usage.
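
As an illustration of why latent2rgb is cheap: it is essentially one linear projection from latent channels to RGB, with no decoder model involved. The weights below are made-up placeholders; the real per-model tables live in ComfyUI's preview code:

```python
import torch

# Placeholder 4->3 channel projection (illustrative values only)
LATENT_RGB = torch.tensor([
    [ 0.35,  0.23,  0.32],
    [ 0.33,  0.50,  0.24],
    [-0.28,  0.18,  0.27],
    [-0.21, -0.26, -0.72],
])

def latent2rgb(latent: torch.Tensor) -> torch.Tensor:
    # latent: (4, H, W) -> preview: (H, W, 3); a single matrix multiply,
    # far cheaper than running the full VAE decoder
    rgb = torch.einsum("chw,cr->hwr", latent, LATENT_RGB)
    return ((rgb + 1) / 2).clamp(0, 1)

print(latent2rgb(torch.randn(4, 64, 64)).shape)  # torch.Size([64, 64, 3])
```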

Size of preview images

Function: Sets the resolution of preview images, which affects preview clarity and performance. Larger sizes provide higher preview quality but consume more VRAM

Cache

Use classic cache system

Function: Uses the traditional caching strategy, which is more conservative but stable

Use LRU caching with a maximum of N node results cached

Function: Uses a Least Recently Used (LRU) cache that stores up to the specified number of node computation results (see the sketch after the usage recommendations below)

Description:

  • Set a specific number to control maximum cache count, such as 10, 50, 100, etc.
  • Caching can avoid repeated computation of the same node operations, improving workflow execution speed
  • When cache reaches the limit, automatically clears the least recently used results
  • Cached results occupy system memory (RAM/VRAM), larger values use more memory

Usage Recommendations:

  • Default value is null, meaning LRU caching is not enabled
  • Set appropriate cache count based on system memory capacity and usage requirements
  • Recommended for workflows that frequently reuse the same node configurations
  • If system memory is sufficient, larger values can be set for better performance improvement
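
The eviction policy here is the same one behind Python's functools.lru_cache, which makes for a compact analogy (the run_node function is a hypothetical stand-in for node execution):

```python
from functools import lru_cache

@lru_cache(maxsize=50)  # keep at most 50 results; evict the least recently used
def run_node(config: str) -> str:
    print("computing:", config)
    return config.upper()

run_node("a")  # computed
run_node("a")  # served from cache, no recomputation
```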

Attention

Cross attention method

Options:

  • auto: Automatically selects the most suitable attention computation method
  • split: Computes attention in blocks, saving VRAM at the cost of speed
  • quad: Uses the sub-quadratic attention algorithm, balancing speed and VRAM usage
  • pytorch: Uses PyTorch’s native attention, faster but with higher VRAM usage

Function: Controls the specific algorithm used when the model computes attention. Different algorithms make different trade-offs between generation quality, speed, and VRAM usage. Usually recommended to use auto for automatic selection.
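
The pytorch option corresponds to PyTorch's fused scaled_dot_product_attention kernel (PyTorch 2.0+); split and quad trade speed for lower peak memory by chunking the same computation:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64)  # (batch, heads, tokens, head_dim)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# One fused call computes softmax(q @ k^T / sqrt(d)) @ v
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```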

Force attention upcast

Function: Forces high-precision attention computation; improves quality but increases VRAM usage

Prevent attention upcast

Function: Disables high-precision attention computation to save VRAM

General

Disable xFormers optimization

Function: Disables the optimization features of the xFormers library. xFormers is a library specifically designed to optimize the attention mechanisms of Transformer models, typically improving computational efficiency, reducing memory usage, and accelerating inference speed. Disabling this optimization will:

  • Fall back to standard attention computation methods
  • May increase memory usage and computation time
  • Provide a more stable runtime environment in certain situations

Use Cases:

  • When encountering compatibility issues related to xFormers
  • When more precise computation results are needed (some optimizations may affect numerical precision)
  • When debugging or troubleshooting requires using standard implementations

Default hashing function for model files

Options:

  • sha256: Uses SHA-256 algorithm for hash verification, high security but slower computation
  • sha1: Uses SHA-1 algorithm, faster but slightly lower security
  • sha512: Uses SHA-512 algorithm, provides highest security but slowest computation
  • md5: Uses MD5 algorithm, fastest but lowest security

Function: Sets the hash algorithm for model file verification, used to verify file integrity. Different hash algorithms have different trade-offs between computation speed and security. Usually recommended to use sha256 as the default option, which achieves a good balance between security and performance.
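
All four options are available through Python's hashlib, which is a convenient way to reproduce the verification yourself (the file path is a placeholder):

```python
import hashlib

def file_hash(path: str, algo: str = "sha256") -> str:
    h = hashlib.new(algo)  # accepts "md5", "sha1", "sha256", "sha512"
    with open(path, "rb") as f:
        # Read in 1 MB chunks so multi-GB model files aren't loaded into RAM at once
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(file_hash("model.safetensors"))
```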

Make pytorch use slower deterministic algorithms when it can

Function: Forces PyTorch to use deterministic algorithms when possible to improve result reproducibility (see the sketch after the use cases below).

Description:

  • When enabled, PyTorch will prioritize deterministic algorithms over faster non-deterministic algorithms
  • Same inputs will produce same outputs, helpful for debugging and result verification
  • Deterministic algorithms typically run slower than non-deterministic algorithms
  • Even with this setting enabled, completely identical image results cannot be guaranteed in all situations

Use Cases:

  • Scientific research requiring strict result reproducibility
  • Debugging processes requiring stable output results
  • Production environments requiring result consistency
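
A minimal sketch of the PyTorch switches behind this kind of setting (note that some CUDA ops additionally require the CUBLAS_WORKSPACE_CONFIG environment variable to be set):

```python
import torch

torch.manual_seed(0)                      # fix random number generation
torch.use_deterministic_algorithms(True)  # prefer reproducible kernels

# Ops that have no deterministic implementation will now raise an error
# instead of silently varying from run to run.
```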

Enable some untested and potentially quality deteriorating optimizations

Function: Enables experimental optimizations that may improve speed but could potentially affect generation quality

Don’t print server output to console

Function: Prevents displaying server runtime information in the console, keeping the interface clean.

Description:

  • When enabled, ComfyUI server logs and runtime information will not be displayed
  • Can reduce console information interference, making the interface cleaner
  • May slightly improve system performance when there’s heavy log output
  • Default is disabled (false), meaning server output is displayed by default

Use Cases:

  • Production environments where debugging information is not needed
  • When wanting to keep the console interface clean
  • When the system runs stably and log monitoring is not required

Note: It’s recommended to keep this option disabled during development and debugging to promptly view server runtime status and error information.

Disable saving prompt metadata in files

Function: Stops embedding workflow metadata in generated images. This reduces file size, but the workflow information is lost, so you can no longer load an output file to reproduce the generation that created it

Disable loading all custom nodes

Function: Prevents loading of all third-party extension nodes; typically used during troubleshooting to determine whether an error is caused by a third-party extension node

Logging verbosity level

Function: Controls the verbosity level of log output, used for debugging and monitoring system runtime status.

Options:

  • CRITICAL: Only outputs critical error information that may cause the program to stop running
  • ERROR: Outputs error information indicating some functions cannot work properly
  • WARNING: Outputs warning information indicating possible issues that don’t affect main functionality
  • INFO: Outputs general information including system runtime status and important operation records
  • DEBUG: Outputs the most detailed debugging information including system internal runtime details

Description:

  • Log levels increase in verbosity from top to bottom
  • Each level also includes all messages from the more severe levels above it (e.g., INFO includes WARNING and ERROR output)
  • Recommended to set to INFO level for normal use
  • Can be set to DEBUG level when troubleshooting for more information
  • Can be set to WARNING or ERROR level in production environments to reduce log volume

Directories

Input directory

Function: Sets the default storage path for input files (such as images, models)

Output directory

Function: Sets the save path for generation results