This unifies provider prompting, persists sessions and execution timelines, and adds local verification plus platform adapters so the desktop agent behaves more safely and consistently across runs.
This refreshes the project overview to match the current single-step runtime, shared provider prompt semantics, session-backed request flow, and local verification behavior.
These research artifacts no longer reflect the current repository documentation set, so this drops them to keep the repo focused on the maintained request-flow reference and README.
Pull request overview
This PR refactors Open Interface into a structured single-step visual agent loop with an explicit prompt system (v1), expanded provider routing (OpenAI/GPT-5, Claude, Qwen, OpenAI-compatible Gemini), and new platform support utilities (Windows DPI, hotkeys, clipboard), plus a large set of verification/diagnostic scripts.
Changes:
- Introduces `app/prompting/` Prompt System v1 (tool registry/schema, task/timeline/visual context, output contract, prompt dumps).
- Adds/updates model adapters and routing (`models/catalog.py`, `ModelFactory`, Claude/Qwen adapters, Gemini routed through OpenAI-compatible path).
- Adds local verification and platform utilities (`StepVerifier`, `platform_support/*`, Windows DPI awareness), plus multiple verification scripts and diagnostics.
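
To make the "joins prompt sections" idea concrete, here is a minimal sketch of how named sections might be composed into one user-context string with a per-section size cap. `PromptSection`, `compose_user_context`, and the 2000-character cap are hypothetical names and values chosen for illustration; they are not taken from the PR's `app/prompting/` code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PromptSection:
    """One titled block of prompt text (hypothetical type, not from the PR)."""
    title: str
    body: str


def compose_user_context(sections: Iterable[PromptSection],
                         max_chars_per_section: int = 2000) -> str:
    """Join non-empty sections into one text, truncating oversized bodies."""
    blocks = []
    for section in sections:
        body = section.body.strip()
        if not body:
            continue  # skip sections that have nothing to say
        if len(body) > max_chars_per_section:
            # Bound each section so one long timeline cannot crowd out the rest.
            body = body[:max_chars_per_section].rstrip() + " ...[truncated]"
        blocks.append(f"## {section.title}\n{body}")
    return "\n\n".join(blocks)
```

Capping each section separately, rather than the joined text, keeps later sections such as the output contract from being cut off by an unusually long earlier one.
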
Reviewed changes
Copilot reviewed 74 out of 77 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/verify_windows_openai_computer_use_keys.py | Verifies Windows key normalization for computer-use keypress actions. |
| tests/verify_windows_local_info.py | Verifies Windows installed-app scanning behavior via env overrides. |
| tests/verify_windows_hotkey_mapper.py | Verifies Windows hotkey mappings and semantic shortcuts. |
| tests/verify_windows_dpi_coordinate_mapping.py | Verifies DPI-scaled coordinate mapping in verification flow. |
| tests/verify_visual_agent_mvp.py | Smoke-checks agent memory and visual verification pass/fail paths. |
| tests/verify_text_input_strategy.py | Verifies non-ASCII write uses clipboard paste strategy. |
| tests/verify_settings_refactor.py | Validates nested settings + provider routing for factory. |
| tests/verify_screen_semantic_ocr.py | Ensures legacy OCR/semantic fields are retired from screen payload and OCR backend constructor errors. |
| tests/verify_request_timeout_diagnosis.py | Verifies request timeout persistence and runtime application. |
| tests/verify_prompt_image_archive.py | Verifies archived prompt image writing + naming/resize. |
| tests/verify_macos_doubleclick.py | Verifies macOS Quartz multi-click emission and verifier classification. |
| tests/verify_gpt5_reasoning.py | Verifies GPT-5 reasoning request options and settings validation. |
| tests/verify_disable_local_step_verification.py | Verifies skip-verification mode behavior and sleep cadence. |
| tests/test_config_center_red.py | Pytest red-check coverage for settings migration/validation and routing expectations. |
| tests/simple_test.py | Clarifies manual-only GUI smoke helper docstring/comment. |
| tests/session_store_red_check.py | Red-check script for SessionStore schema + Core init ordering. |
| tests/session_context_red_check.py | Red-check script for session history injection + request boundary reset. |
| tests/qwen_diagnostic.py | CLI diagnostic helper for Qwen/DashScope OpenAI-compatible probing. |
| tests/claude_diagnostic.py | CLI diagnostic helper for Claude adapter payload shape and (optional) live request. |
| tests/chat_ui_red_check.py | Red-check script for UI hydration, window sizing, and i18n copy registration. |
| requirements.txt | Removes PyAudio and rubicon-objc dependencies. |
| build.py | Improves PyInstaller path discovery and cross-platform --add-data handling; errors on unsupported zip platform. |
| app/verifier.py | Adds StepVerifier for local before/after image change classification. |
| app/utils/ocr.py | Introduces optional Vision-based OCR backend with safe fallback when unavailable. |
| app/utils/local_info.py | Switches installed-app enumeration to platform-aware helper with safer failure handling. |
| app/resources/old-context.txt | Removes legacy context file. |
| app/resources/context.txt | Replaces legacy guidance with single-step agent/tool/coordinate/output contract rules. |
| app/prompting/visual_context.py | Builds visual context section from frame metadata and grid usage rules. |
| app/prompting/tool_schema.py | Defines tool registry + schema text generation for prompt allowlist. |
| app/prompting/task_context.py | Builds structured task/session/constraints context from request_context + machine profile. |
| app/prompting/system_context.py | Composes system context including schema + custom instructions. |
| app/prompting/recent_details.py | Adds recent-step detailed breakdown section. |
| app/prompting/output_contract.py | Defines strict JSON output contract and example for models. |
| app/prompting/execution_timeline.py | Adds full step timeline rendering for prompt. |
| app/prompting/debug.py | Adds prompt package dump writer to promptdump/. |
| app/prompting/constants.py | Centralizes prompt schema version and sizing caps. |
| app/prompting/composer.py | Joins prompt sections into final user context text. |
| app/prompting/common.py | Utilities for block formatting and bounded parameter/text summarization. |
| app/prompting/builder.py | Builds PromptPackage (system + user contexts + debug text). |
| app/prompting/__init__.py | Exposes prompting public API surface. |
| app/platform_support/screen_adapter.py | Adds Windows DPI awareness and unified screen capture metrics. |
| app/platform_support/local_apps.py | Adds cross-platform installed-app sample enumeration. |
| app/platform_support/input_adapter.py | Adds hotkey normalization and macOS Quartz multi-click handling. |
| app/platform_support/hotkey_mapper.py | Adds platform-aware key normalization (cmd/option mapping, etc.). |
| app/platform_support/detector.py | Adds platform name detection helpers. |
| app/platform_support/clipboard_adapter.py | Adds clipboard read/write backend with platform fallbacks. |
| app/platform_support/__init__.py | Re-exports platform support adapters and helpers. |
| app/models/qwen.py | Adds Qwen adapter with VL enforcement and reasoning options via extra_body. |
| app/models/openai_computer_use.py | Updates computer-use-preview adapter to use visual prompt payload + percent coordinates + hotkey mapper. |
| app/models/gpt5.py | Updates GPT-5 adapter to prompt-package-based formatting and parsing. |
| app/models/gpt4v.py | Updates GPT-4V adapter to prompt-package-based formatting and parsing. |
| app/models/gpt4o.py | Updates GPT-4o assistants adapter to use prompt packages, visual prompt upload, and enriched frame context. |
| app/models/gemini.py | Replaces Gemini adapter with GPT4v alias (OpenAI-compatible routing). |
| app/models/factory.py | Adds provider-aware factory routing and argument normalization. |
| app/models/deprecated/__init__.py | Adds placeholder deprecated package init. |
| app/models/claude.py | Adds Claude adapter via Anthropic-compatible HTTP API with thinking support. |
| app/models/catalog.py | Adds provider/model catalog, defaults, and model capability helpers. |
| app/llm.py | Refactors LLM wrapper to provider-aware settings mapping and prompt runtime sync. |
| app/app.py | Adds initial session hydration and routes structured core events to UI. |
| app/agent_memory.py | Adds compact agent memory structure and payload builder. |
| README.md | Updates product description and documents new architecture/prompt system. |
| .gitignore | Adds new ignored paths (.venv, etc.). |
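
As an illustration of the "local before/after image change classification" that the table attributes to `app/verifier.py`, the sketch below compares two frames and labels the step as changed or unchanged. It assumes frames are flat sequences of 0-255 grayscale values; the function names and both thresholds are hypothetical, and the real `StepVerifier` may work quite differently.

```python
from typing import Sequence


def changed_fraction(before: Sequence[int], after: Sequence[int],
                     pixel_threshold: int = 16) -> float:
    """Return the share of pixels whose grayscale value moved by more than the threshold."""
    if len(before) != len(after):
        raise ValueError("frames must have the same number of pixels")
    if not before:
        return 0.0
    changed = sum(1 for b, a in zip(before, after) if abs(b - a) > pixel_threshold)
    return changed / len(before)


def classify_step(before: Sequence[int], after: Sequence[int],
                  min_changed_fraction: float = 0.01) -> str:
    """Label a step 'changed' when enough of the screen differs, else 'unchanged'."""
    if changed_fraction(before, after) >= min_changed_fraction:
        return "changed"
    return "unchanged"
```

The per-pixel threshold is there to ignore small compression or anti-aliasing noise, while the fraction threshold decides how much of the screen must differ before an action counts as having had a visible effect.
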
Comment on lines +105 to +110

    message = routed_model.format_user_request_for_llm(
        SAMPLE_REQUEST,
        0,
        build_visual_payload(),
        None,
    )
Comment on lines 256 to 273

     return [
         {
             'function': 'moveTo',
    -        'parameters': {'x': start_x, 'y': start_y}
    +        'parameters': {
    +            'x_percent': start_coords['x_percent'],
    +            'y_percent': start_coords['y_percent'],
    +        },
         },
         {
             'function': 'dragTo',
    -        'parameters': {'x': end_x, 'y': end_y, 'duration': 0.2, 'button': 'left'}
    -    }
    +        'parameters': {
    +            'x_percent': end_coords['x_percent'],
    +            'y_percent': end_coords['y_percent'],
    +            'duration': 0.2,
    +            'button': 'left',
    +        },
    +    },
     ]
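
The diff above replaces absolute `x`/`y` pixels with `x_percent`/`y_percent`. A minimal sketch of the conversion an executor would then need is shown below. It assumes percentages run from 0 to 100 (the PR excerpt does not say whether the scale is 0-100 or 0-1), and `percent_to_pixels` is a hypothetical helper, not a function from the PR.

```python
def percent_to_pixels(x_percent: float, y_percent: float,
                      screen_width: int, screen_height: int) -> tuple[int, int]:
    """Map percent coordinates (assumed 0-100) to pixel coordinates inside the screen."""

    def scale(percent: float, size: int) -> int:
        # Clamp first so out-of-range model output cannot leave the screen.
        percent = min(max(percent, 0.0), 100.0)
        # 100% would land one pixel past the last valid index, so cap at size - 1.
        return min(int(round(percent / 100.0 * size)), size - 1)

    return scale(x_percent, screen_width), scale(y_percent, screen_height)
```

Resolution-independent coordinates let the same model output apply across displays with different sizes or DPI scaling, as long as the width and height passed in match the coordinate space the input backend expects.
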
Comment on lines 222 to 233

    @@ -182,14 +228,14 @@ def convert_action_to_steps(self, action: Any) -> list[dict[str, Any]]:
             return [{
                 'function': 'press',
                 'parameters': {
    -                'key': normalized_keys[0]
    -            }
    +                'key': normalized_keys[0],
    +            },
             }]
         return [{
             'function': 'write',
             'parameters': {
                 'string': self.read_obj(action, 'text') or '',
Comment on lines +87 to +93

    routed_model = ModelFactory.create_model(
        args.model,
        args.api_key,
        args.base_url,
        '请只返回 JSON。',
        provider_type='anthropic_compatible',
    )
Owner

woah thank you @skyDuanXianBing, really extensive PR! Let me review.