Merge branch "main" into textual_inversion

Fannovel16 · Jan 26, 2023 · 3fb12e4 · 3fb12e4
2 parents 186a266 + 591e3c1
commit 3fb12e4
Show file tree

Hide file tree

Showing 14 changed files with 360 additions and 100 deletions.
diff --git a/README-ja.md b/README-ja.md
@@ -1,7 +1,7 @@
 ## リポジトリについて
 Stable Diffusionの学習、画像生成、その他のスクリプトを入れたリポジトリです。
 
-[README in English](./README.md)
+[README in English](./README.md) ←更新情報はこちらにあります
 
 GUIやPowerShellスクリプトなど、より使いやすくする機能が[bmaltais氏のリポジトリ](https://github.com/bmaltais/kohya_ss)で提供されています（英語です）のであわせてご覧ください。bmaltais氏に感謝します。
 
@@ -16,9 +16,10 @@ GUIやPowerShellスクリプトなど、より使いやすくする機能が[bma
 
 当リポジトリ内およびnote.comに記事がありますのでそちらをご覧ください（将来的にはすべてこちらへ移すかもしれません）。
 
-* note.com [環境整備とDreamBooth学習スクリプトについて](https://note.com/kohya_ss/n/nba4eceaa4863)
+* [DreamBoothの学習について](./train_db_README-ja.md)
 * [fine-tuningのガイド](./fine_tune_README_ja.md):
 BLIPによるキャプショニングと、DeepDanbooruまたはWD14 taggerによるタグ付けを含みます
+* [LoRAの学習について](./train_network_README-ja.md)
 * note.com [画像生成スクリプト](https://note.com/kohya_ss/n/n2693183a798e)
 * note.com [モデル変換スクリプト](https://note.com/kohya_ss/n/n374f316fe4ad)
 
@@ -44,12 +45,11 @@ PowerShellを使う場合、venvを使えるようにするためには以下の
 
 通常の（管理者ではない）PowerShellを開き以下を順に実行します。
 
-
 ```powershell
 git clone https://github.com/kohya-ss/sd-scripts.git
 cd sd-scripts
 
-python -m venv --system-site-packages venv
+python -m venv venv
 .\venv\Scripts\activate
 
 pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
@@ -70,7 +70,7 @@ accelerate config
 git clone https://github.com/kohya-ss/sd-scripts.git
 cd sd-scripts
 
-python -m venv --system-site-packages venv
+python -m venv venv
 .\venv\Scripts\activate
 
 pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
@@ -84,6 +84,8 @@ copy /y .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cud
 accelerate config
 ```
 
+（注:``python -m venv venv`` のほうが ``python -m venv --system-site-packages venv`` より安全そうなため書き換えました。globalなpythonにパッケージがインストールしてあると、後者だといろいろと問題が起きます。）
+
 accelerate configの質問には以下のように答えてください。（bf16で学習する場合、最後の質問にはbf16と答えてください。）
 
 ※0.15.0から日本語環境では選択のためにカーソルキーを押すと落ちます（……）。数字キーの0、1、2……で選択できますので、そちらを使ってください。
@@ -99,7 +101,11 @@ accelerate configの質問には以下のように答えてください。（bf1
 ```
 
 ※場合によって ``ValueError: fp16 mixed precision requires a GPU`` というエラーが出ることがあるようです。この場合、6番目の質問（
-``What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:``）に「0」と答えてください。（id `0`のGPUが使われます。）
+``What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:``）に「0」と答えてください。（id `0`のGPUが使われます。）
+
+### PyTorchとxformersのバージョンについて
+
+他のバージョンでは学習がうまくいかない場合があるようです。特に他の理由がなければ指定のバージョンをお使いください。
 
 ## アップグレード
 

diff --git a/README.md b/README.md
@@ -2,16 +2,30 @@ This repository contains training, generation and utility scripts for Stable Dif
 
 ## Updates
 
-- 15 Jan. 2023, 2023/1/15
-  - Added ``--max_train_epochs`` and ``--max_data_loader_n_workers`` option for each training script. 
-  - If you specify the number of training epochs with ``--max_train_epochs``, the number of steps is calculated from the number of epochs automatically.
-  - You can set the number of workers for DataLoader with ``--max_data_loader_n_workers``, default is 8. The lower number may reduce the main memory usage and the time between epochs, but may cause slower dataloading (training).
-  - ``--max_train_epochs`` と ``--max_data_loader_n_workers`` のオプションが学習スクリプトに追加されました。
-  - ``--max_train_epochs`` で学習したいエポック数を指定すると、必要なステップ数が自動的に計算され設定されます。
-  - ``--max_data_loader_n_workers`` で DataLoader の worker 数が指定できます（デフォルトは8）。値を小さくするとメインメモリの使用量が減り、エポック間の待ち時間も短くなるようです。ただしデータ読み込み（学習時間）は長くなる可能性があります。
+__Stable Diffusion web UI now seems to support LoRA trained by ``sd-scripts``.__ Thank you for great work!!! 
 
-Please read [release version 0.3.0](https://github.com/kohya-ss/sd-scripts/releases/tag/v0.3.0) for recent updates.
-最近の更新情報は [release version 0.3.0](https://github.com/kohya-ss/sd-scripts/releases/tag/v0.3.0) をご覧ください。
+Note: The LoRA models for SD 2.x is not supported too in Web UI.
+
+- 24 Jan. 2023, 2023/1/24
+  - Change the default save format to ``.safetensors`` for ``train_network.py``.
+  - Add ``--save_n_epoch_ratio`` option to specify how often to save. Thanks to forestsource! 
+    - For example, if 5 is specified, 5 (or 6) files will be saved in training.
+  - Add feature to pre-caclulate hash to reduce loading time in the extension. Thanks to space-nuko!
+  - Add bucketing matadata. Thanks to space-nuko!
+  - Fix an error with bf16 model in ``gen_img_diffusers.py``.
+  - ``train_network.py`` のモデル保存形式のデフォルトを ``.safetensors`` に変更しました。
+  - モデルを保存する頻度を指定する ``--save_n_epoch_ratio`` オプションが追加されました。forestsource氏に感謝します。
+    - たとえば 5 を指定すると、学習終了までに合計で5個（または6個）のファイルが保存されます。
+  - 拡張でモデル読み込み時間を短縮するためのハッシュ事前計算の機能を追加しました。space-nuko氏に感謝します。
+  - メタデータにbucket情報が追加されました。space-nuko氏に感謝します。
+  - ``gen_img_diffusers.py`` でbf16形式のモデルを読み込んだときのエラーを修正しました。
+
+Stable Diffusion web UI本体で当リポジトリで学習したLoRAモデルによる画像生成がサポートされたようです。
+
+注：SD2.x用のLoRAモデルはサポートされないようです。
+
+Please read [Releases](https://github.com/kohya-ss/sd-scripts/releases) for recent updates.
+最近の更新情報は [Release](https://github.com/kohya-ss/sd-scripts/releases) をご覧ください。
 
 ##
 
@@ -65,7 +79,7 @@ Open a regular Powershell terminal and type the following inside:
 git clone https://github.com/kohya-ss/sd-scripts.git
 cd sd-scripts
 
-python -m venv --system-site-packages venv
+python -m venv venv
 .\venv\Scripts\activate
 
 pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
@@ -77,9 +91,10 @@ cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\ce
 cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py
 
 accelerate config
-
 ```
 
+update: ``python -m venv venv`` is seemed to be safer than ``python -m venv --system-site-packages venv`` (some user have packages in global python).
+
 Answers to accelerate config:
 
 ```txt
@@ -92,11 +107,16 @@ Answers to accelerate config:
 - fp16
 ```
 
-note: Some user reports ``ValueError: fp16 mixed precision requires a GPU`` is occured in training. In this case, answer `0` for the 6th question: 
-``What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:`` 
+note: Some user reports ``ValueError: fp16 mixed precision requires a GPU`` is occurred in training. In this case, answer `0` for the 6th question: 
+``What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:`` 
 
 (Single GPU with id `0` will be used.)
 
+### about PyTorch and xformers
+
+Other versions of PyTorch and xformers seem to have problems with training.
+If there is no other reason, please install the specified version.
+
 ## Upgrade
 
 When a new release comes out you can upgrade your repo with the following command:

diff --git a/fine_tune.py b/fine_tune.py
@@ -200,6 +200,8 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):
   # epoch数を計算する
   num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
   num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
+  if (args.save_n_epoch_ratio is not None) and (args.save_n_epoch_ratio > 0):
+    args.save_every_n_epochs = math.floor(num_train_epochs / args.save_n_epoch_ratio) or 1
 
   # 学習する
   total_batch_size = args.train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps

diff --git a/fine_tune_README_ja.md b/fine_tune_README_ja.md
@@ -324,7 +324,7 @@ __※引数を都度書き換えて、別のメタデータファイルに書き
 ## 学習の実行
 たとえば以下のように実行します。以下は省メモリ化のための設定です。
 ```
-accelerate launch --num_cpu_threads_per_process 8 fine_tune.py 
+accelerate launch --num_cpu_threads_per_process 1 fine_tune.py 
     --pretrained_model_name_or_path=model.ckpt 
     --in_json meta_lat.json 
     --train_data_dir=train_data 
@@ -336,7 +336,7 @@ accelerate launch --num_cpu_threads_per_process 8 fine_tune.py
     --save_every_n_epochs=4
 ```
 
-accelerateのnum_cpu_threads_per_processにはCPUのコア数を指定するとよいようです。
+accelerateのnum_cpu_threads_per_processには通常は1を指定するとよいようです。
 
 pretrained_model_name_or_pathに学習対象のモデルを指定します（Stable DiffusionのcheckpointかDiffusersのモデル）。Stable Diffusionのcheckpointは.ckptと.safetensorsに対応しています（拡張子で自動判定）。
 

diff --git a/gen_img_diffusers.py b/gen_img_diffusers.py
@@ -1981,7 +1981,6 @@ def __getattr__(self, item):
       imported_module = importlib.import_module(network_module)
 
       network_mul = 1.0 if args.network_mul is None or len(args.network_mul) <= i else args.network_mul[i]
-      network_dim = None if args.network_dim is None or len(args.network_dim) <= i else args.network_dim[i]
 
       net_kwargs = {}
       if args.network_args and i < len(args.network_args):
@@ -1992,22 +1991,22 @@ def __getattr__(self, item):
           key, value = net_arg.split("=")
           net_kwargs[key] = value
 
-      network = imported_module.create_network(network_mul, network_dim, vae, text_encoder, unet, **net_kwargs)
-      if network is None:
-        return
-
       if args.network_weights and i < len(args.network_weights):
         network_weight = args.network_weights[i]
         print("load network weights from:", network_weight)
 
-        if os.path.splitext(network_weight)[1] == ".safetensors":
+        if model_util.is_safetensors(network_weight):
           from safetensors.torch import safe_open
           with safe_open(network_weight, framework="pt") as f:
             metadata = f.metadata()
           if metadata is not None:
             print(f"metadata for: {network_weight}: {metadata}")
 
-        network.load_weights(network_weight)
+        network = imported_module.create_network_from_weights(network_mul, network_weight, vae, text_encoder, unet, **net_kwargs)
+      else:
+        raise ValueError("No weight. Weight is required.")
+      if network is None:
+        return
 
       network.apply_to(text_encoder, unet)
 
@@ -2518,16 +2517,14 @@ def process_batch(batch, highres_fix, highres_1st=False):
   parser.add_argument("--bf16", action='store_true', help='use bfloat16 / bfloat16を指定し省メモリ化する')
   parser.add_argument("--xformers", action='store_true', help='use xformers / xformersを使用し高速化する')
   parser.add_argument("--diffusers_xformers", action='store_true',
-                      help="use xformers by diffusers (Hypernetworks doen\"t work) / Diffusersでxformersを使用する（Hypernetwork利用不可）")
+                      help="use xformers by diffusers (Hypernetworks doesn\"t work) / Diffusersでxformersを使用する（Hypernetwork利用不可）")
   parser.add_argument("--opt_channels_last", action='store_true',
-                      help="set channels last option to model / モデルにchannles lastを指定し最適化する")
+                      help="set channels last option to model / モデルにchannels lastを指定し最適化する")
   parser.add_argument("--network_module", type=str, default=None, nargs='*',
                       help='Hypernetwork module to use / Hypernetworkを使う時そのモジュール名')
   parser.add_argument("--network_weights", type=str, default=None, nargs='*',
                       help='Hypernetwork weights to load / Hypernetworkの重み')
   parser.add_argument("--network_mul", type=float, default=None, nargs='*', help='Hypernetwork multiplier / Hypernetworkの効果の倍率')
-  parser.add_argument("--network_dim", type=int, default=None, nargs='*',
-                      help='network dimensions (depends on each network) / モジュールの次元数（ネットワークにより定義は異なります）')
   parser.add_argument("--network_args", type=str, default=None, nargs='*',
                       help='additional argmuments for network (key=value) / ネットワークへの追加の引数')
   parser.add_argument("--clip_skip", type=int, default=None, help='layer number from bottom to use in CLIP / CLIPの後ろからn層目の出力を使う')