add agent to one step agent #66

Open
skyDuanXianBing wants to merge 4 commits into AmberSahdev:main from skyDuanXianBing:main

Conversation

@skyDuanXianBing

No description provided.

This unifies provider prompting, persists sessions and execution timelines, and adds local verification plus platform adapters so the desktop agent behaves more safely and consistently across runs.
This refreshes the project overview to match the current single-step runtime, shared provider prompt semantics, session-backed request flow, and local verification behavior.
These research artifacts no longer reflect the current repository documentation set, so this drops them to keep the repo focused on the maintained request-flow reference and README.
Copilot AI review requested due to automatic review settings March 15, 2026 11:40

Copilot AI left a comment

Pull request overview

This PR refactors Open Interface into a structured single-step visual agent loop with an explicit prompt system (v1), expanded provider routing (OpenAI/GPT-5, Claude, Qwen, OpenAI-compatible Gemini), and new platform support utilities (Windows DPI, hotkeys, clipboard), plus a large set of verification/diagnostic scripts.

Changes:

  • Introduces app/prompting/ Prompt System v1 (tool registry/schema, task/timeline/visual context, output contract, prompt dumps).
  • Adds/updates model adapters and routing (models/catalog.py, ModelFactory, Claude/Qwen adapters, Gemini routed through OpenAI-compatible path).
  • Adds local verification and platform utilities (StepVerifier, platform_support/*, Windows DPI awareness), plus multiple verification scripts and diagnostics.
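
The local verification step described above (a before/after image comparison) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the grayscale-list frame representation, and both thresholds are hypothetical and are not taken from the PR's `StepVerifier`.

```python
from typing import Sequence


def changed_fraction(
    before: Sequence[int],
    after: Sequence[int],
    pixel_tolerance: int = 8,  # hypothetical per-pixel noise tolerance
) -> float:
    """Return the fraction of pixels whose grayscale value moved by more than the tolerance."""
    if len(before) != len(after):
        raise ValueError("frames must have the same number of pixels")
    if not before:
        return 0.0
    changed = sum(1 for b, a in zip(before, after) if abs(b - a) > pixel_tolerance)
    return changed / len(before)


def classify_step(
    before: Sequence[int],
    after: Sequence[int],
    min_changed_fraction: float = 0.01,  # hypothetical threshold
) -> str:
    """Label a step 'changed' if enough pixels differ between frames, else 'unchanged'."""
    if changed_fraction(before, after) >= min_changed_fraction:
        return "changed"
    return "unchanged"
```

A caller would pass flattened grayscale frames captured before and after an action; an "unchanged" result suggests the action may not have taken effect.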

Reviewed changes

Copilot reviewed 74 out of 77 changed files in this pull request and generated 5 comments.

File | Description
--- | ---
tests/verify_windows_openai_computer_use_keys.py | Verifies Windows key normalization for computer-use keypress actions.
tests/verify_windows_local_info.py | Verifies Windows installed-app scanning behavior via env overrides.
tests/verify_windows_hotkey_mapper.py | Verifies Windows hotkey mappings and semantic shortcuts.
tests/verify_windows_dpi_coordinate_mapping.py | Verifies DPI-scaled coordinate mapping in verification flow.
tests/verify_visual_agent_mvp.py | Smoke-checks agent memory and visual verification pass/fail paths.
tests/verify_text_input_strategy.py | Verifies non-ASCII write uses clipboard paste strategy.
tests/verify_settings_refactor.py | Validates nested settings + provider routing for factory.
tests/verify_screen_semantic_ocr.py | Ensures legacy OCR/semantic fields are retired from screen payload and OCR backend constructor errors.
tests/verify_request_timeout_diagnosis.py | Verifies request timeout persistence and runtime application.
tests/verify_prompt_image_archive.py | Verifies archived prompt image writing + naming/resize.
tests/verify_macos_doubleclick.py | Verifies macOS Quartz multi-click emission and verifier classification.
tests/verify_gpt5_reasoning.py | Verifies GPT-5 reasoning request options and settings validation.
tests/verify_disable_local_step_verification.py | Verifies skip-verification mode behavior and sleep cadence.
tests/test_config_center_red.py | Pytest red-check coverage for settings migration/validation and routing expectations.
tests/simple_test.py | Clarifies manual-only GUI smoke helper docstring/comment.
tests/session_store_red_check.py | Red-check script for SessionStore schema + Core init ordering.
tests/session_context_red_check.py | Red-check script for session history injection + request boundary reset.
tests/qwen_diagnostic.py | CLI diagnostic helper for Qwen/DashScope OpenAI-compatible probing.
tests/claude_diagnostic.py | CLI diagnostic helper for Claude adapter payload shape and (optional) live request.
tests/chat_ui_red_check.py | Red-check script for UI hydration, window sizing, and i18n copy registration.
requirements.txt | Removes PyAudio and rubicon-objc dependencies.
build.py | Improves PyInstaller path discovery and cross-platform --add-data handling; errors on unsupported zip platform.
app/verifier.py | Adds StepVerifier for local before/after image change classification.
app/utils/ocr.py | Introduces optional Vision-based OCR backend with safe fallback when unavailable.
app/utils/local_info.py | Switches installed-app enumeration to platform-aware helper with safer failure handling.
app/resources/old-context.txt | Removes legacy context file.
app/resources/context.txt | Replaces legacy guidance with single-step agent/tool/coordinate/output contract rules.
app/prompting/visual_context.py | Builds visual context section from frame metadata and grid usage rules.
app/prompting/tool_schema.py | Defines tool registry + schema text generation for prompt allowlist.
app/prompting/task_context.py | Builds structured task/session/constraints context from request_context + machine profile.
app/prompting/system_context.py | Composes system context including schema + custom instructions.
app/prompting/recent_details.py | Adds recent-step detailed breakdown section.
app/prompting/output_contract.py | Defines strict JSON output contract and example for models.
app/prompting/execution_timeline.py | Adds full step timeline rendering for prompt.
app/prompting/debug.py | Adds prompt package dump writer to promptdump/.
app/prompting/constants.py | Centralizes prompt schema version and sizing caps.
app/prompting/composer.py | Joins prompt sections into final user context text.
app/prompting/common.py | Utilities for block formatting and bounded parameter/text summarization.
app/prompting/builder.py | Builds PromptPackage (system + user contexts + debug text).
app/prompting/__init__.py | Exposes prompting public API surface.
app/platform_support/screen_adapter.py | Adds Windows DPI awareness and unified screen capture metrics.
app/platform_support/local_apps.py | Adds cross-platform installed-app sample enumeration.
app/platform_support/input_adapter.py | Adds hotkey normalization and macOS Quartz multi-click handling.
app/platform_support/hotkey_mapper.py | Adds platform-aware key normalization (cmd/option mapping, etc.).
app/platform_support/detector.py | Adds platform name detection helpers.
app/platform_support/clipboard_adapter.py | Adds clipboard read/write backend with platform fallbacks.
app/platform_support/__init__.py | Re-exports platform support adapters and helpers.
app/models/qwen.py | Adds Qwen adapter with VL enforcement and reasoning options via extra_body.
app/models/openai_computer_use.py | Updates computer-use-preview adapter to use visual prompt payload + percent coordinates + hotkey mapper.
app/models/gpt5.py | Updates GPT-5 adapter to prompt-package-based formatting and parsing.
app/models/gpt4v.py | Updates GPT-4V adapter to prompt-package-based formatting and parsing.
app/models/gpt4o.py | Updates GPT-4o assistants adapter to use prompt packages, visual prompt upload, and enriched frame context.
app/models/gemini.py | Replaces Gemini adapter with GPT4v alias (OpenAI-compatible routing).
app/models/factory.py | Adds provider-aware factory routing and argument normalization.
app/models/deprecated/__init__.py | Adds placeholder deprecated package init.
app/models/claude.py | Adds Claude adapter via Anthropic-compatible HTTP API with thinking support.
app/models/catalog.py | Adds provider/model catalog, defaults, and model capability helpers.
app/llm.py | Refactors LLM wrapper to provider-aware settings mapping and prompt runtime sync.
app/app.py | Adds initial session hydration and routes structured core events to UI.
app/agent_memory.py | Adds compact agent memory structure and payload builder.
README.md | Updates product description and documents new architecture/prompt system.
.gitignore | Adds new ignored paths (.venv, etc.).
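
The file summary mentions that non-ASCII text is written through a clipboard paste strategy (see tests/verify_text_input_strategy.py). One way such a decision could look is sketched below. The function name and the returned labels are hypothetical, and the PR's actual rule may consider more than ASCII-ness.

```python
def choose_write_strategy(text: str) -> str:
    """Pick how to enter text: simulated typing for ASCII, clipboard paste otherwise.

    Assumption: simulated key events are only reliable for ASCII characters,
    so anything else is routed through the clipboard.
    """
    # str.isascii() is True for the empty string and for pure-ASCII text.
    return "type" if text.isascii() else "clipboard_paste"
```

For example, `choose_write_strategy("hello")` would select typing, while a string containing CJK characters would select the clipboard path.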

Comment on lines +105 to +110
    message = routed_model.format_user_request_for_llm(
        SAMPLE_REQUEST,
        0,
        build_visual_payload(),
        None,
    )
Comment on lines 256 to 273
     return [
         {
             'function': 'moveTo',
-            'parameters': {'x': start_x, 'y': start_y}
+            'parameters': {
+                'x_percent': start_coords['x_percent'],
+                'y_percent': start_coords['y_percent'],
+            },
         },
         {
             'function': 'dragTo',
-            'parameters': {'x': end_x, 'y': end_y, 'duration': 0.2, 'button': 'left'}
-        }
+            'parameters': {
+                'x_percent': end_coords['x_percent'],
+                'y_percent': end_coords['y_percent'],
+                'duration': 0.2,
+                'button': 'left',
+            },
+        },
     ]
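
The change above replaces absolute `x`/`y` pixel values with `x_percent`/`y_percent`. A consumer of these steps would need to map percentages back to pixels at execution time. The sketch below shows one possible mapping. It assumes percentages range from 0 to 100 and that the target is the full screen; neither is confirmed by the diff, and `percent_to_pixel` is a hypothetical helper name.

```python
def percent_to_pixel(
    x_percent: float,
    y_percent: float,
    screen_width: int,
    screen_height: int,
) -> tuple[int, int]:
    """Map percent coordinates (assumed 0-100) to pixel coordinates clamped to the screen."""
    x = int(screen_width * x_percent / 100)
    y = int(screen_height * y_percent / 100)
    # Clamp so that 100% lands on the last valid pixel rather than one past it.
    x = min(max(x, 0), screen_width - 1)
    y = min(max(y, 0), screen_height - 1)
    return x, y
```

Percent-based coordinates keep the model's output independent of screen resolution and DPI scaling; the executor resolves them against whatever display metrics apply at run time.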
Comment on lines 222 to 233
@@ -182,14 +228,14 @@ def convert_action_to_steps(self, action: Any) -> list[dict[str, Any]]:
             return [{
                 'function': 'press',
                 'parameters': {
-                    'key': normalized_keys[0]
-                }
+                    'key': normalized_keys[0],
+                },
             }]
         return [{
             'function': 'write',
             'parameters': {
                 'string': self.read_obj(action, 'text') or '',
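
The excerpt above emits a `press` step for a single normalized key. A rough sketch of how platform-aware key normalization might feed such a step follows. The alias table, the `hotkey` step shape, and the function name `keys_to_step` are all assumptions for illustration; the PR's `hotkey_mapper.py` may map keys differently.

```python
# Hypothetical alias table: model-emitted key names -> names used on each platform.
KEY_ALIASES: dict[str, dict[str, str]] = {
    "windows": {"cmd": "ctrl", "command": "ctrl", "option": "alt"},
    "macos": {"cmd": "command", "alt": "option"},
}


def keys_to_step(keys: list[str], platform_name: str) -> dict:
    """Normalize key names for a platform and build a single press or hotkey step."""
    aliases = KEY_ALIASES.get(platform_name, {})
    normalized = [aliases.get(key.lower(), key.lower()) for key in keys]
    if len(normalized) == 1:
        # Single key: mirrors the 'press' step shape shown in the diff.
        return {"function": "press", "parameters": {"key": normalized[0]}}
    # Multiple keys: assumed 'hotkey' step shape, not confirmed by the diff.
    return {"function": "hotkey", "parameters": {"keys": normalized}}
```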
Comment on lines +87 to +93
    routed_model = ModelFactory.create_model(
        args.model,
        args.api_key,
        args.base_url,
        '请只返回 JSON。',
        provider_type='anthropic_compatible',
    )
@AmberSahdev
Owner

woah thank you @skyDuanXianBing, really extensive PR! Let me review.


3 participants