Add LoRA multimethod export to CoreML static LLM export #18347
lucylq wants to merge 15 commits into gh/lucylq/144/head from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18347
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 2 Unrelated Failures as of commit 7ef0005 with merge base dd7464a.
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Add an --adapter CLI flag for exporting LoRA adapters as separate methods in a CoreML PTE. CoreML POSITIONAL weight sharing deduplicates the base weights across methods. Can be combined with --multifunction to produce decode/prefill variants per adapter. Authored with Claude. ghstack-source-id: 16ab852 ghstack-comment-id: 4093498502 Pull-Request: #18347
    "forward": _export_model(model, example_inputs, "base"),
}
for name, lora_model in lora_models.items():
    methods[name] = _export_model(lora_model, example_inputs, name)
add methods for each lora
methods[f"{name}_forward"] = _export_model(
    lora_model, decode_inputs, f"{name} decode"
)
methods[f"{name}_prefill"] = _export_model(
Add methods for each LoRA adapter with separate prefill and decode variants.
Not sure this is how we want to do it, though.
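A sketch of the naming scheme discussed above, when --adapter is combined with --multifunction: each adapter gets a `{name}_forward` (decode) and a `{name}_prefill` method. The helper name and its parameters are assumptions for illustration, not the PR's API.

```python
def build_multifunction_methods(lora_models, decode_inputs, prefill_inputs, export_fn):
    """Build per-adapter decode/prefill method entries (hypothetical helper)."""
    methods = {}
    for name, lora_model in lora_models.items():
        # Decode variant keeps the plain "_forward" suffix.
        methods[f"{name}_forward"] = export_fn(lora_model, decode_inputs)
        # Prefill variant gets an explicit "_prefill" suffix.
        methods[f"{name}_prefill"] = export_fn(lora_model, prefill_inputs)
    return methods
```

One design question this raises (echoed by the review comment) is whether runtime callers should discover these names by convention or via recorded metadata.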
constant_methods=constant_methods,
compile_config=edge_compile_config,
if has_adapters:
    constant_methods["has_lora"] = True
Not sure if we should add a constant method like this here. If we do add one, it should probably be more granular, e.g. a list of the LoRA method names.
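A sketch of the reviewer's more granular alternative: instead of a boolean `has_lora` constant method, record the list of LoRA method names. The key name `lora_methods` is an assumption; `constant_methods` and `lora_models` are names from the diff.

```python
lora_models = {"adapter_a": object(), "adapter_b": object()}  # stand-in adapters
constant_methods = {}

if lora_models:
    # Record the actual adapter method names rather than a bare flag,
    # so the runtime can enumerate available LoRA methods directly.
    constant_methods["lora_methods"] = sorted(lora_models.keys())
```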
        return False
    return isinstance(m, nn.Linear)

linear_filter = _exclude_lora if has_lora_modules else None
Exclude LoRA from quantization for now; this should probably be a config option.
Actually, I think we can keep it in and exclude later if necessary?