URL & content collection (src/source-collection/): scripts that generate source URLs (e.g. via an LLM or search) and scrape page content into the shared JSON format. See that folder’s README for setup and for usage of get_urls.py, collect_sources_from_urls.py, and collect_sources.py.
Scoring (src/content-scoring/scripts/): the main script, scoring.py, scores source pages (e.g. with Qwen) and writes the enriched results as JSON and CSV. See that folder’s README for usage and the expected --input-file format.