Example images generated using this scenario:
If you haven't already, you may wish to review the README
included in the stable-diffusion-2 directory, as it gives a more extensive look at configuration & setup. The rest of this README assumes
prior familiarity and builds on the ideas found there.
In moving from Stable Diffusion 2
to Stable Diffusion XL [white paper],
we can expect larger model files, higher resource requirements, and longer load times. To accommodate this, we need to adjust a few KServe wrapper
parameters to be more patient with our model load & processing times. To do so, you'll need permission to edit the config-defaults
ConfigMap within the knative-serving namespace.
Prior to deploying your model, edit the ConfigMap's YAML data section so that the section resembles this (values added above _example):
```yaml
data:
  max-revision-timeout-seconds: '1800'
  revision-response-start-timeout-seconds: '1600'
  revision-timeout-seconds: '1800'
  _example: |
    [rest of the Example Configuration _example block, as-is]
```

Similar to how we bundled the base model into our .mar archive for the previous Stable Diffusion 2 example, we'll bundle the SDXL base
model for this scenario. This time, however, we'll also include the refiner model,
which uses a StableDiffusionXLImg2ImgPipeline
to further refine results.
The included notebook can be executed in full to prepare the stable-diffusion.mar file (~12GB) for download and transfer to your bucket.
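The notebook handles the packaging, but roughly speaking, each saved pipeline directory gets zipped before being added to the .mar archive. A minimal sketch of that step (the directory name and contents here are stand-ins, not the real model files):

```python
import shutil
import tempfile
from pathlib import Path

def zip_model_dir(src_dir: str, zip_basename: str) -> str:
    """Zip a saved pipeline directory (e.g. from save_pretrained) for .mar packaging.

    make_archive appends ".zip" to the base name and returns the final path.
    """
    return shutil.make_archive(zip_basename, "zip", root_dir=src_dir)

# Self-contained demo with a stand-in "model" directory:
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "model"
    model_dir.mkdir()
    (model_dir / "model_index.json").write_text("{}")
    archive = zip_model_dir(str(model_dir), str(Path(tmp) / "model"))
    print(Path(archive).name)  # model.zip
```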
A few changes to this config are needed to extend the timeout allowances:
```properties
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_envvars_config=true
install_py_dep_per_model=true
enable_metrics_api=true
metrics_mode=prometheus
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
max_response_size=30000000
default_response_timeout=2400
default_startup_timeout=2400
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"stable-diffusion":{"1.0":{"defaultVersion":true,"marName":"stable-diffusion.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"startupTimeout":2400,"responseTimeout":2400}}}}
```
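The model_snapshot value is a single line of JSON, which is easy to break when editing by hand. A quick sanity check (the snapshot string below is copied verbatim from the config above):

```python
import json

snapshot = '{"name":"startup.cfg","modelCount":1,"models":{"stable-diffusion":{"1.0":{"defaultVersion":true,"marName":"stable-diffusion.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"startupTimeout":2400,"responseTimeout":2400}}}}'

cfg = json.loads(snapshot)  # raises ValueError if the JSON is malformed
model = cfg["models"]["stable-diffusion"]["1.0"]
# The per-model timeouts should match the default_* timeouts above
print(model["startupTimeout"], model["responseTimeout"])  # 2400 2400
```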
Pre/post-processing remains the same as in the previous example handler, but our initialization & inference need some updates.
For starters, we need to deal with the two zip archives that have been included in our .mar file. Since these are both sizeable, let's extract them on concurrent threads to
make the wait a bit more tolerable:
```python
import concurrent.futures
import queue
import zipfile

    def unzip_file(self, zip_path, extract_to):
        try:
            with zipfile.ZipFile(zip_path, 'r') as zip_ref:
                zip_ref.extractall(extract_to)
            self.output_queue.put(f'Unzipped {zip_path} to {extract_to}')
        except Exception as e:
            self.output_queue.put(f'Error unzipping {zip_path}: {str(e)}')

    def unzip_files_concurrently(self, zip_files, extract_paths):
        with concurrent.futures.ThreadPoolExecutor() as executor:
            futures = [
                executor.submit(self.unzip_file, zip_file, extract_to)
                for zip_file, extract_to in zip(zip_files, extract_paths)
            ]
            concurrent.futures.wait(futures)
    ...
```
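The same pattern can be exercised standalone, outside the handler class, to see the concurrent extraction at work. A minimal sketch using two tiny stand-in archives (the file names here are illustrative, not the real model zips):

```python
import concurrent.futures
import queue
import tempfile
import zipfile
from pathlib import Path

output_queue = queue.Ueue() if False else queue.Queue()  # results/errors from worker threads

def unzip_file(zip_path, extract_to):
    try:
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(extract_to)
        output_queue.put(f'Unzipped {zip_path} to {extract_to}')
    except Exception as e:
        output_queue.put(f'Error unzipping {zip_path}: {e}')

def unzip_files_concurrently(zip_files, extract_paths):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(unzip_file, z, p)
            for z, p in zip(zip_files, extract_paths)
        ]
        concurrent.futures.wait(futures)

with tempfile.TemporaryDirectory() as tmp:
    # Build two tiny stand-in archives, then extract them concurrently
    zips, outs = [], []
    for name in ("model", "refiner"):
        z = Path(tmp) / f"{name}.zip"
        with zipfile.ZipFile(z, "w") as zf:
            zf.writestr("weights.bin", name)
        zips.append(str(z))
        out = Path(tmp) / name
        out.mkdir()
        outs.append(str(out))
    unzip_files_concurrently(zips, outs)
    print((Path(outs[0]) / "weights.bin").read_text())  # model
```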
```python
    def initialize(self, ctx):
        ...
        logger.info("starting new zip threads")
        zip_files = [model_dir + "/model.zip", model_dir + "/refiner.zip"]
        extract_paths = [model_dir + "/model", model_dir + "/refiner"]
        self.output_queue = queue.Queue()
        for path in extract_paths:
            os.makedirs(path, exist_ok=True)
        self.unzip_files_concurrently(zip_files, extract_paths)
        self.show_threads_queue()
        ...
        self.pipe = StableDiffusionXLPipeline.from_pretrained(model_dir + "/model")
        self.pipe.to(self.device)
        logger.info("Diffusion model from path %s loaded successfully", model_dir)
        self.refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(model_dir + "/refiner")
        # self.refiner.to(self.device)  # if we try to send both to CUDA, 24GB isn't enough for the GPU to handle it
        logger.info("Refiner model from path %s loaded successfully", model_dir)
```

In order to take advantage of our refiner model, we need to chain the output of our base model pipeline into the refiner via StableDiffusionXLImg2ImgPipeline. Once the refiner is done, we can return the image just as if it came from the base model:
```python
        logger.info("start model")
        image = self.pipe(
            inputs, guidance_scale=7.5, num_inference_steps=50, height=768, width=768
        ).images
        logger.info("done model")
        logger.info("start refiner")
        image = self.refiner(
            prompt=inputs,
            negative_prompt=self.negative_prompt,
            num_inference_steps=self.n_steps,
            denoising_start=self.high_noise_frac,
            image=image
        ).images
        logger.info("done refiner")
        return image
```

The notebook code for hitting the inference endpoint remains the same as before. Given that we're using an XL model (higher resolution than SD2) and also chaining into a refiner, expect the query to take some time to respond.
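As a quick reminder of the client side, a request body can be built the same way as in the previous example; the payload shape and timeout value below are assumptions carried over from the SD2 notebook, not new requirements:

```python
import json

def build_payload(prompt: str) -> str:
    """KServe v1-style request body, matching the shape used in the earlier notebook."""
    return json.dumps({"instances": [{"data": prompt}]})

payload = build_payload("a photo of an astronaut riding a horse")
# With base + refiner chained, allow a generous client-side timeout, e.g.:
#   requests.post(infer_url, data=payload, headers=headers, timeout=600)
print(json.loads(payload)["instances"][0]["data"][:7])  # a photo
```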
Similar to before, querying the inference endpoint will be reflected in the model logs. This time, however, notice the transition into the refiner model once the base model's 50 steps have completed:
```
2024-10-15T20:28:21,063 [WARN ] W-9000-stable-diffusion_1.0-stderr MODEL_LOG - 100%|██████████| 50/50 [00:24<00:00, 2.23it/s]
2024-10-15T20:28:21,064 [WARN ] W-9000-stable-diffusion_1.0-stderr MODEL_LOG - 100%|██████████| 50/50 [00:24<00:00, 2.02it/s]
2024-10-15T20:28:21,430 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - done model
2024-10-15T20:28:21,430 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - start refiner
2024-10-15T20:28:27,920 [WARN ] W-9000-stable-diffusion_1.0-stderr MODEL_LOG -
2024-10-15T20:28:42,499 [WARN ] W-9000-stable-diffusion_1.0-stderr MODEL_LOG - 0%| | 0/8 [00:00<?, ?it/s]
2024-10-15T20:28:57,092 [WARN ] W-9000-stable-diffusion_1.0-stderr MODEL_LOG - 12%|█▎ | 1/8 [00:14<01:42, 14.58s/it]
```

