This node loads CLIP text encoder models on their own, independently of a full checkpoint. Model files are detected in the following paths:

  • “ComfyUI/models/text_encoders/”
  • “ComfyUI/models/clip/”

If you add a model file after ComfyUI has started, refresh the ComfyUI frontend to get the latest model file list.

Supported model formats:

  • .ckpt
  • .pt
  • .pt2
  • .bin
  • .pth
  • .safetensors
  • .pkl
  • .sft

For more details on how model files are discovered and loaded, please refer to folder_paths.
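As a quick illustration, the snippet below lists the text encoder files ComfyUI has indexed. It is a minimal sketch that assumes it runs from a ComfyUI checkout so the folder_paths module is importable; "text_encoders" is the folder key used by recent ComfyUI versions, with the older "clip" folder treated as an alias.

```python
# Minimal sketch: list the CLIP / text encoder files ComfyUI has indexed.
# Assumes this runs from a ComfyUI checkout so folder_paths is importable.
import folder_paths

# "text_encoders" is the current folder key; older setups used "clip".
for name in folder_paths.get_filename_list("text_encoders"):
    print(name, "->", folder_paths.get_full_path("text_encoders", name))
```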

Inputs

  • clip_name (COMBO[STRING]): The name of the CLIP model to load. This name is used to locate the model file within the model directories listed above.
  • type (COMBO[STRING]): The type of CLIP model to load. As ComfyUI supports more models, new types are added here; see the CLIPLoader class definition in nodes.py for details (a rough sketch follows below).
  • device (COMBO[STRING]): The device to load the CLIP model on. “default” runs the model on the GPU, while selecting “cpu” forces loading on the CPU.
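For orientation, here is a rough, non-authoritative sketch of how a loader node declares these inputs. The class name CLIPLoaderSketch is made up for illustration, the type list only mirrors the “Supported Combinations” table below, and the real CLIPLoader in ComfyUI’s nodes.py is authoritative and may differ in detail.

```python
# Illustrative sketch of a CLIP loader node's input declaration.
# The real CLIPLoader in ComfyUI's nodes.py is authoritative.
import folder_paths  # ComfyUI module that indexes the model directories

class CLIPLoaderSketch:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # Every file found under models/text_encoders/ (and models/clip/).
                "clip_name": (folder_paths.get_filename_list("text_encoders"),),
                # Model families; mirrors the "Supported Combinations" table below,
                # the real list in nodes.py may be longer.
                "type": (["stable_diffusion", "stable_cascade", "sd3", "stable_audio",
                          "mochi", "cosmos", "lumina2", "wan"],),
            },
            "optional": {
                # "default" lets ComfyUI place the encoder (normally GPU),
                # "cpu" forces it onto the CPU.
                "device": (["default", "cpu"],),
            },
        }

    RETURN_TYPES = ("CLIP",)
    FUNCTION = "load_clip"
```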

Device Options Explained

When to choose “default”:

  • Have sufficient GPU memory
  • Want the best performance
  • Let the system optimize memory usage automatically

When to choose “cpu”:

  • Insufficient GPU memory
  • Need to reserve GPU memory for other models (like UNet)
  • Running in a low VRAM environment
  • Debugging or special purpose needs

Performance Impact

Running the text encoder on the CPU is much slower than on the GPU, but it frees valuable GPU memory for other, more important model components. In memory-constrained environments, putting the CLIP model on the CPU is a common optimization strategy.
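If you want to automate this decision, a small heuristic like the one below can help. It is only a sketch: the helper name pick_clip_device and the 6 GB threshold are made up for illustration, and the returned string is simply the value you would pick in the node’s device dropdown.

```python
import torch

def pick_clip_device(min_free_gb: float = 6.0) -> str:
    """Return "default" or "cpu" for the CLIPLoader device option.

    Hypothetical helper: the 6 GB threshold is an arbitrary example value.
    """
    if not torch.cuda.is_available():
        return "cpu"
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    # Keep the text encoder on the GPU only if enough VRAM is free for it
    # plus the diffusion model that still has to be loaded.
    return "default" if free_bytes / 1024**3 >= min_free_gb else "cpu"

print(pick_clip_device())
```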

Supported Combinations

  Model Type         Corresponding Encoder
  stable_diffusion   clip-l
  stable_cascade     clip-g
  sd3                t5 xxl / clip-g / clip-l
  stable_audio       t5 base
  mochi              t5 xxl
  cosmos             old t5 xxl
  lumina2            gemma 2 2B
  wan                umt5 xxl

As ComfyUI updates, these combinations may expand. For details, please refer to the CLIPLoader class definition in nodes.py.
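As a usage sketch, the call below drives the node class directly from Python, assuming ComfyUI’s nodes.py is importable. The file name is a placeholder for whatever actually exists under ComfyUI/models/text_encoders/, and the exact load_clip signature may differ between ComfyUI versions.

```python
# Hedged sketch: drive the CLIPLoader node from Python instead of the graph UI.
# Assumes this runs inside a ComfyUI environment where nodes.py is importable.
from nodes import CLIPLoader

loader = CLIPLoader()
# "umt5_xxl.safetensors" is a placeholder; use a file that exists under
# ComfyUI/models/text_encoders/. type="wan" matches the table above.
(clip,) = loader.load_clip(clip_name="umt5_xxl.safetensors", type="wan", device="default")
```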

Outputs

  • clip (CLIP): The loaded CLIP model, ready for use in downstream tasks or further processing.
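Continuing the Python sketch above, the loaded CLIP output is typically wired into a text encode node; the prompt string here is just an example.

```python
# Hedged sketch: feed the loaded CLIP model into ComfyUI's CLIPTextEncode node
# to turn a prompt into conditioning for a sampler.
from nodes import CLIPTextEncode

encoder = CLIPTextEncode()
(conditioning,) = encoder.encode(clip=clip, text="a watercolor painting of a fox")
```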

Additional Notes

CLIP models play a core role as text encoders in ComfyUI: they convert text prompts into numerical representations that diffusion models can understand. You can think of them as translators, turning your text into a language the diffusion model understands. Different model families have their own “dialects,” which is why different architectures need different CLIP/text encoders to complete the text encoding process.