add agent to one step agent by skyDuanXianBing · Pull Request #66 · AmberSahdev/Open-Interface

skyDuanXianBing · 2026-03-15T11:40:54Z

No description provided.

Copilot

Pull request overview

This PR refactors Open Interface into a structured single-step visual agent loop with an explicit prompt system (v1), expanded provider routing (OpenAI/GPT-5, Claude, Qwen, OpenAI-compatible Gemini), and new platform support utilities (Windows DPI, hotkeys, clipboard), plus a large set of verification/diagnostic scripts.

Changes:

Introduces app/prompting/ Prompt System v1 (tool registry/schema, task/timeline/visual context, output contract, prompt dumps).
Adds/updates model adapters and routing (models/catalog.py, ModelFactory, Claude/Qwen adapters, Gemini routed through OpenAI-compatible path).
Adds local verification and platform utilities (StepVerifier, platform_support/*, Windows DPI awareness), plus multiple verification scripts and diagnostics.
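
The provider routing summarized above can be pictured with a small lookup. This is a minimal sketch under stated assumptions, not the PR's implementation: `route_adapter`, the `ADAPTERS_BY_PROVIDER` table, the adapter keys, and every provider-type string except `anthropic_compatible` (which appears in the diagnostic script) are hypothetical stand-ins for whatever `ModelFactory` and `models/catalog.py` actually do.

```python
# Hypothetical sketch of provider-aware routing; all names are illustrative only.
ADAPTERS_BY_PROVIDER = {
    "anthropic_compatible": "claude",
    "openai_compatible": "gpt4v",  # assumption: Gemini is also served through this path
    "qwen": "qwen",
}


def route_adapter(provider_type: str, model_name: str) -> str:
    """Return the (hypothetical) adapter key for a provider/model pair."""
    normalized = provider_type.strip().lower()
    if normalized not in ADAPTERS_BY_PROVIDER:
        raise ValueError(f"unsupported provider type: {provider_type!r}")
    adapter = ADAPTERS_BY_PROVIDER[normalized]
    # Assumed special case: GPT-5 models get their own adapter on the OpenAI-compatible path.
    if adapter == "gpt4v" and model_name.lower().startswith("gpt-5"):
        return "gpt5"
    return adapter
```

Keeping the table separate from the function means a new provider would only need a new entry, which is one plausible reason to centralize routing in a catalog.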

Reviewed changes

Copilot reviewed 74 out of 77 changed files in this pull request and generated 5 comments.

File	Description
tests/verify_windows_openai_computer_use_keys.py	Verifies Windows key normalization for computer-use keypress actions.
tests/verify_windows_local_info.py	Verifies Windows installed-app scanning behavior via env overrides.
tests/verify_windows_hotkey_mapper.py	Verifies Windows hotkey mappings and semantic shortcuts.
tests/verify_windows_dpi_coordinate_mapping.py	Verifies DPI-scaled coordinate mapping in verification flow.
tests/verify_visual_agent_mvp.py	Smoke-checks agent memory and visual verification pass/fail paths.
tests/verify_text_input_strategy.py	Verifies non-ASCII write uses clipboard paste strategy.
tests/verify_settings_refactor.py	Validates nested settings + provider routing for factory.
tests/verify_screen_semantic_ocr.py	Ensures legacy OCR/semantic fields are retired from the screen payload and checks OCR backend constructor errors.
tests/verify_request_timeout_diagnosis.py	Verifies request timeout persistence and runtime application.
tests/verify_prompt_image_archive.py	Verifies archived prompt image writing + naming/resize.
tests/verify_macos_doubleclick.py	Verifies macOS Quartz multi-click emission and verifier classification.
tests/verify_gpt5_reasoning.py	Verifies GPT-5 reasoning request options and settings validation.
tests/verify_disable_local_step_verification.py	Verifies skip-verification mode behavior and sleep cadence.
tests/test_config_center_red.py	Pytest red-check coverage for settings migration/validation and routing expectations.
tests/simple_test.py	Clarifies manual-only GUI smoke helper docstring/comment.
tests/session_store_red_check.py	Red-check script for SessionStore schema + Core init ordering.
tests/session_context_red_check.py	Red-check script for session history injection + request boundary reset.
tests/qwen_diagnostic.py	CLI diagnostic helper for Qwen/DashScope OpenAI-compatible probing.
tests/claude_diagnostic.py	CLI diagnostic helper for Claude adapter payload shape and (optional) live request.
tests/chat_ui_red_check.py	Red-check script for UI hydration, window sizing, and i18n copy registration.
requirements.txt	Removes `PyAudio` and `rubicon-objc` dependencies.
build.py	Improves PyInstaller path discovery and cross-platform `--add-data` handling; errors on unsupported zip platform.
app/verifier.py	Adds `StepVerifier` for local before/after image change classification.
app/utils/ocr.py	Introduces optional Vision-based OCR backend with safe fallback when unavailable.
app/utils/local_info.py	Switches installed-app enumeration to platform-aware helper with safer failure handling.
app/resources/old-context.txt	Removes legacy context file.
app/resources/context.txt	Replaces legacy guidance with single-step agent/tool/coordinate/output contract rules.
app/prompting/visual_context.py	Builds visual context section from frame metadata and grid usage rules.
app/prompting/tool_schema.py	Defines tool registry + schema text generation for prompt allowlist.
app/prompting/task_context.py	Builds structured task/session/constraints context from request_context + machine profile.
app/prompting/system_context.py	Composes system context including schema + custom instructions.
app/prompting/recent_details.py	Adds recent-step detailed breakdown section.
app/prompting/output_contract.py	Defines strict JSON output contract and example for models.
app/prompting/execution_timeline.py	Adds full step timeline rendering for prompt.
app/prompting/debug.py	Adds prompt package dump writer to `promptdump/`.
app/prompting/constants.py	Centralizes prompt schema version and sizing caps.
app/prompting/composer.py	Joins prompt sections into final user context text.
app/prompting/common.py	Utilities for block formatting and bounded parameter/text summarization.
app/prompting/builder.py	Builds `PromptPackage` (system + user contexts + debug text).
app/prompting/__init__.py	Exposes prompting public API surface.
app/platform_support/screen_adapter.py	Adds Windows DPI awareness and unified screen capture metrics.
app/platform_support/local_apps.py	Adds cross-platform installed-app sample enumeration.
app/platform_support/input_adapter.py	Adds hotkey normalization and macOS Quartz multi-click handling.
app/platform_support/hotkey_mapper.py	Adds platform-aware key normalization (cmd/option mapping, etc.).
app/platform_support/detector.py	Adds platform name detection helpers.
app/platform_support/clipboard_adapter.py	Adds clipboard read/write backend with platform fallbacks.
app/platform_support/__init__.py	Re-exports platform support adapters and helpers.
app/models/qwen.py	Adds Qwen adapter with VL enforcement and reasoning options via `extra_body`.
app/models/openai_computer_use.py	Updates computer-use-preview adapter to use visual prompt payload + percent coordinates + hotkey mapper.
app/models/gpt5.py	Updates GPT-5 adapter to prompt-package-based formatting and parsing.
app/models/gpt4v.py	Updates GPT-4V adapter to prompt-package-based formatting and parsing.
app/models/gpt4o.py	Updates GPT-4o assistants adapter to use prompt packages, visual prompt upload, and enriched frame context.
app/models/gemini.py	Replaces Gemini adapter with GPT4v alias (OpenAI-compatible routing).
app/models/factory.py	Adds provider-aware factory routing and argument normalization.
app/models/deprecated/__init__.py	Adds placeholder deprecated package init.
app/models/claude.py	Adds Claude adapter via Anthropic-compatible HTTP API with thinking support.
app/models/catalog.py	Adds provider/model catalog, defaults, and model capability helpers.
app/llm.py	Refactors LLM wrapper to provider-aware settings mapping and prompt runtime sync.
app/app.py	Adds initial session hydration and routes structured core events to UI.
app/agent_memory.py	Adds compact agent memory structure and payload builder.
README.md	Updates product description and documents new architecture/prompt system.
.gitignore	Adds new ignored paths (`.venv`, etc.).
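
The "before/after image change classification" attributed to `StepVerifier` in the table above might work along these lines. This is a rough sketch only: `classify_change`, its thresholds, and the plain-list frame representation are assumptions made for illustration, and the real verifier in `app/verifier.py` may use a different comparison entirely.

```python
def classify_change(before, after, pixel_tolerance=8, changed_ratio=0.01):
    """Classify a step as 'changed' or 'unchanged' from two grayscale frames.

    Frames are equal-sized lists of rows of 0-255 ints (a hypothetical format).
    Both thresholds are illustrative defaults, not values taken from the PR.
    """
    same_shape = len(before) == len(after) and all(
        len(row_b) == len(row_a) for row_b, row_a in zip(before, after)
    )
    if not same_shape:
        raise ValueError("frames must have identical dimensions")
    total = sum(len(row) for row in before)
    if total == 0:
        raise ValueError("frames must not be empty")
    # Count pixels whose brightness moved by more than the tolerance.
    differing = sum(
        1
        for row_b, row_a in zip(before, after)
        for pixel_b, pixel_a in zip(row_b, row_a)
        if abs(pixel_b - pixel_a) > pixel_tolerance
    )
    return "changed" if differing / total >= changed_ratio else "unchanged"
```

The per-pixel tolerance is there to absorb small rendering noise, so that only a meaningful share of changed pixels counts as evidence that the action had a visible effect.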

tests/claude_diagnostic.py

+    message = routed_model.format_user_request_for_llm(
+        SAMPLE_REQUEST,
+        0,
+        build_visual_payload(),
+        None,
+    )


app/models/openai_computer_use.py

            return [
                {
                    'function': 'moveTo',
-                    'parameters': {'x': start_x, 'y': start_y}
+                    'parameters': {
+                        'x_percent': start_coords['x_percent'],
+                        'y_percent': start_coords['y_percent'],
+                    },
                },
                {
                    'function': 'dragTo',
-                    'parameters': {'x': end_x, 'y': end_y, 'duration': 0.2, 'button': 'left'}
-                }
+                    'parameters': {
+                        'x_percent': end_coords['x_percent'],
+                        'y_percent': end_coords['y_percent'],
+                        'duration': 0.2,
+                        'button': 'left',
+                    },
+                },
            ]
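
The hunk above switches the drag steps from absolute `x`/`y` pixels to `x_percent`/`y_percent`. A conversion of that kind could be sketched as follows. The helper names are hypothetical, and the sketch assumes percentages run from 0 to 100 relative to the captured frame size; the diff does not show which scale the PR uses.

```python
def to_percent_coords(x, y, width, height):
    """Convert pixel coordinates to percent-of-frame coordinates (assumed 0-100 scale)."""
    if width <= 0 or height <= 0:
        raise ValueError("frame dimensions must be positive")
    return {"x_percent": 100.0 * x / width, "y_percent": 100.0 * y / height}


def to_pixel_coords(coords, width, height):
    """Map percent coordinates back to integer pixels for a given screen size."""
    if width <= 0 or height <= 0:
        raise ValueError("screen dimensions must be positive")
    return (
        round(coords["x_percent"] / 100.0 * width),
        round(coords["y_percent"] / 100.0 * height),
    )


# A point picked on a 1920x1080 screenshot, replayed on a 2560x1440 screen.
start_coords = to_percent_coords(960, 270, 1920, 1080)
print(to_pixel_coords(start_coords, 2560, 1440))
```

Resolution-independent coordinates let the same step be replayed when the screenshot sent to the model and the physical screen differ in size, for example under DPI scaling.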


app/models/openai_computer_use.py

@@ -182,14 +228,14 @@ def convert_action_to_steps(self, action: Any) -> list[dict[str, Any]]:
                return [{
                    'function': 'press',
                    'parameters': {
-                        'key': normalized_keys[0]
-                    }
+                        'key': normalized_keys[0],
+                    },
                }]
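
The `normalized_keys` value in the hunk above comes from the platform hotkey mapper. One way such a normalization could look is sketched below. `KEY_ALIASES` and `normalize_keys` are hypothetical, and the specific mappings (for example `cmd` to `ctrl` on Windows) are assumptions rather than the PR's actual table.

```python
# Hypothetical alias tables; the PR's real mappings may differ.
KEY_ALIASES = {
    "windows": {"cmd": "ctrl", "command": "ctrl", "option": "alt"},
    "darwin": {"cmd": "command", "alt": "option"},
}


def normalize_keys(keys, platform_name):
    """Lowercase each key name and apply platform-specific aliases."""
    aliases = KEY_ALIASES.get(platform_name, {})
    normalized = []
    for key in keys:
        lowered = key.strip().lower()
        normalized.append(aliases.get(lowered, lowered))
    return normalized


print(normalize_keys(["CMD", "C"], "windows"))
```

Platforms without an alias table fall through unchanged apart from lowercasing, which keeps the helper safe to call everywhere.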


app/models/openai_computer_use.py

            return [{
                'function': 'write',
                'parameters': {
                    'string': self.read_obj(action, 'text') or '',


tests/claude_diagnostic.py

+    routed_model = ModelFactory.create_model(
+        args.model,
+        args.api_key,
+        args.base_url,
+        '请只返回 JSON。',
+        provider_type='anthropic_compatible',
+    )


AmberSahdev · 2026-03-21T00:22:13Z

woah thank you @skyDuanXianBing, really extensive PR! Let me review.

skyDuanXianBing added 3 commits March 15, 2026 19:29

refactor request runtime around session history and prompt system v1

de42003

This unifies provider prompting, persists sessions and execution timelines, and adds local verification plus platform adapters so the desktop agent behaves more safely and consistently across runs.

update readme for prompt system v1 request flow

48e448d

This refreshes the project overview to match the current single-step runtime, shared provider prompt semantics, session-backed request flow, and local verification behavior.

remove outdated architecture audit docs

ca4253d

These research artifacts no longer reflect the current repository documentation set, so this drops them to keep the repo focused on the maintained request-flow reference and README.

Copilot AI review requested due to automatic review settings March 15, 2026 11:40

Copilot started reviewing on behalf of skyDuanXianBing March 15, 2026 11:41

Copilot AI reviewed Mar 15, 2026

update prompt image sizing and archive settings

d330f43

skyDuanXianBing wants to merge 4 commits into AmberSahdev:main from skyDuanXianBing:main

3 participants