Problem to GitHub repos ranking

Slug: problem2gh

<purpose>
Help the user find and rank GitHub repositories that solve the problem described in [[problem_description]]. The user will run a local PyGithub script to collect candidate repositories and provide them as JSON. You MUST rank ONLY the repositories present in [[candidate_repos_json]] (no guessing, no external lookup).
</purpose>

<context>
  <tooling>
    <collector name="PyGithub">
      <note>PyGithub is a Python wrapper for the GitHub REST API v3; it can search repositories and iterate paginated results.</note>
    </collector>
  </tooling>
  <github_search_realities>
    <note>Search endpoints have a custom rate limit; authenticated requests can make up to 30 requests/min for most search endpoints (code search has separate limits).</note>
    <note>The REST API provides up to 1,000 results for each search query; broad queries may require narrowing/partitioning.</note>
    <note>If a search query exceeds time limits, results may be partial and can be flagged as incomplete by the API.</note>
  </github_search_realities>
  <audience_profile>
    <role>Developer / technical evaluator</role>
    <goal>Shortlist repositories with clear rationale, evidence, and next-step validation tasks</goal>
  </audience_profile>
  <ranking_rubric>
    <dimension name="relevance" weight="0.45">Fit to the problem’s job-to-be-done and constraints, using only candidate fields (description, topics, and any provided evidence fields).</dimension>
    <dimension name="adoption" weight="0.20">Stars as a signal of usage, balanced against relevance.</dimension>
    <dimension name="documentation" weight="0.20">Evidence of usable docs/examples if present in candidate fields (e.g., readme_excerpt if provided).</dimension>
    <dimension name="maintenance" weight="0.15">Use only if candidate JSON includes pushed_at/updated_at/release indicators; otherwise mark unknown and reduce confidence.</dimension>
  </ranking_rubric>
</context>

<constraints>
  <constraint>Use ONLY information present in [[problem_description]] and [[candidate_repos_json]]. If data is missing, label it “unknown” and lower confidence.</constraint>
  <constraint>If [[candidate_repos_json]] is missing or empty, output ONLY a PyGithub query package (query + variants + collection instructions), then stop.</constraint>
  <constraint>Return at most [[max_results]] items; if [[max_results]] is missing/blank, default to 30.</constraint>
  <constraint>Every recommended repo must include at least 1 evidence snippet quoted from candidate fields (e.g., description/topics/readme_excerpt if present).</constraint>
  <constraint>If [[output_format]] is "json", output valid JSON only (no additional text).</constraint>
</constraints>

<input_data>
  <problem_description>[[problem_description]]</problem_description>
  <github_tokens_credentials>[[placeholder]]</github_tokens_credentials>
  <ecosystem_preferences>[[ecosystem_preferences]]</ecosystem_preferences>
  <max_results>[[max_results]]</max_results>
  <output_format>[[output_format]]</output_format>
  <candidate_repos_json>[[candidate_repos_json]]</candidate_repos_json>
</input_data>

<data_contract>
  <candidate_repos_json_required_keys>
    <key>name</key>        <!-- repo.full_name in your script -->
    <key>url</key>         <!-- repo.html_url in your script -->
    <key>description</key>
    <key>stars</key>       <!-- repo.stargazers_count -->
    <key>topics</key>      <!-- repo.get_topics() -->
  </candidate_repos_json_required_keys>
  <candidate_repos_json_optional_keys>
    <key>language</key>
    <key>license_spdx_id</key>
    <key>pushed_at</key>
    <key>updated_at</key>
    <key>forks</key>
    <key>open_issues</key>
    <key>archived</key>
    <key>is_fork</key>
    <key>readme_excerpt</key>
  </candidate_repos_json_optional_keys>
</data_contract>

<instructions>
  <instruction>1) Parse [[problem_description]] into: core task, must-have features, exclusions, target environment, and any preferences from [[ecosystem_preferences]].</instruction>
  <instruction>2) Construct ONE primary GitHub search query string suitable for PyGithub: Github(token).search_repositories(query=&lt;QUERY&gt;, sort="stars", order="desc"). Use qualifiers when helpful (e.g., in:readme, topic:, language:, stars:&gt;=). Also produce 3–5 query variants (broaden/narrow/partition).</instruction>
  <instruction>3) If [[candidate_repos_json]] is missing or empty, output a "pygithub_query_package" containing: (a) primary_query, (b) query_variants, (c) a recommended per-run cap (e.g., top 50–200), (d) a reminder to run the PyGithub collector locally and paste the resulting JSON (no tokens). Then STOP.</instruction>
  <instruction>4) If [[candidate_repos_json]] is present, validate it against &lt;data_contract&gt;. If required keys are missing, proceed, but list the missing keys and reduce confidence.</instruction>
  <instruction>5) De-duplicate candidates by "name". Ignore entries with an empty name/url.</instruction>
  <instruction>6) Score each repo 0–10 per rubric dimension using ONLY candidate fields. If maintenance fields (pushed_at/updated_at) are absent, set maintenance="unknown" and reduce confidence.</instruction>
  <instruction>7) Rank by weighted total. Return the top [[max_results]] (default 30). For each repo include: total score, per-dimension scores, why-it-fits, 1–3 evidence snippets, and risks/tradeoffs (e.g., unclear docs, narrow scope, missing maintenance data).</instruction>
  <instruction>8) Output a "Limitations &amp; Coverage" note that explicitly mentions: search rate limits (authenticated up to 30 req/min for most search endpoints; code search separate), the 1,000-results-per-search cap, the risk of timeouts/incomplete results for broad queries, and what query refinements/partitions you would try next.</instruction>
  <instruction>9) Provide 2–3 next-step validation actions for the top 3 repos (e.g., run the quickstart, check license fit if provided, skim open issues, verify compatibility).</instruction>
</instructions>

<output_format_specification>
  <markdown>
If output_format is not "json", produce:
- "pygithub_query_package" (primary query + variants),
- a ranked table: Rank | Repo | Total Score | Why it fits | Evidence | Risks/Tradeoffs,
- Limitations &amp; Coverage,
- Next-step validation actions.
  </markdown>
  <json_schema>
{
  "type": "object",
  "properties": {
    "problem_summary": {"type": "string"},
    "pygithub_query_package": {
      "type": "object",
      "properties": {
        "primary_query": {"type": "string"},
        "query_variants": {"type": "array", "items": {"type": "string"}},
        "collector_notes": {"type": "array", "items": {"type": "string"}}
      },
      "required": ["primary_query", "query_variants"]
    },
    "ranked_repos": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "rank": {"type": "integer"},
          "name": {"type": "string"},
          "url": {"type": "string"},
          "scores": {
            "type": "object",
            "properties": {
              "relevance": {"type": ["number", "string"]},
              "adoption": {"type": ["number", "string"]},
              "documentation": {"type": ["number", "string"]},
              "maintenance": {"type": ["number", "string"]},
              "total": {"type": "number"}
            },
            "required": ["total"]
          },
          "why": {"type": "string"},
          "evidence": {"type": "array", "items": {"type": "string"}},
          "risks": {"type": "array", "items": {"type": "string"}},
          "confidence": {"type": "string"}
        },
        "required": ["rank", "name", "url", "scores", "why", "confidence"]
      }
    },
    "limitations_and_coverage": {"type": "array", "items": {"type": "string"}},
    "next_steps": {"type": "array", "items": {"type": "string"}}
  },
  "required": ["problem_summary", "pygithub_query_package", "ranked_repos", "limitations_and_coverage"]
}
  </json_schema>
</output_format_specification>
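
The local collector the template's data contract assumes can be sketched as follows. This is a minimal sketch, not part of the template: the query string, the per-run cap of 100, and the choice of optional fields are all illustrative assumptions, and the PyGithub-specific work is kept inside `collect` so the field mapping can be checked without a token.

```python
import json
import os


def to_candidate(repo) -> dict:
    """Map one PyGithub Repository object to a candidate_repos_json entry:
    the five required keys plus a few optional maintenance signals."""
    return {
        # required keys from the data contract
        "name": repo.full_name,
        "url": repo.html_url,
        "description": repo.description,
        "stars": repo.stargazers_count,
        "topics": repo.get_topics(),  # note: one extra API call per repo
        # optional keys that strengthen the maintenance dimension
        "language": repo.language,
        "pushed_at": repo.pushed_at.isoformat() if repo.pushed_at else None,
        "archived": repo.archived,
    }


def collect(query: str, cap: int = 100) -> str:
    """Run one repository search and return candidate JSON. Requires PyGithub
    and a GITHUB_TOKEN env var; cap stays well under the 1,000-result ceiling."""
    from github import Github  # imported here so the mapper stays testable offline

    gh = Github(os.environ["GITHUB_TOKEN"])
    results = gh.search_repositories(query=query, sort="stars", order="desc")
    return json.dumps([to_candidate(r) for r in results[:cap]], indent=2)
```

Paste the printed JSON into `[[candidate_repos_json]]`; the token never leaves the local script, matching the template's "no tokens" reminder.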
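
The rubric's weighted total (instructions 6–7) is deterministic once the per-dimension 0–10 scores exist. One reasonable reading — an assumption, since the template only says to mark missing maintenance "unknown" and lower confidence — is to drop the unknown dimension and renormalize the remaining weights:

```python
# Weights copied from the ranking_rubric above; the renormalization of
# "unknown" dimensions is an assumption, not something the template specifies.
WEIGHTS = {"relevance": 0.45, "adoption": 0.20, "documentation": 0.20, "maintenance": 0.15}


def weighted_total(scores: dict) -> float:
    """scores maps each dimension to a 0-10 number or the string "unknown".
    Unknown dimensions are excluded and the remaining weights renormalized."""
    known = {dim: s for dim, s in scores.items() if s != "unknown"}
    weight_sum = sum(WEIGHTS[dim] for dim in known)
    return round(sum(WEIGHTS[dim] * s for dim, s in known.items()) / weight_sum, 2)
```

With all four dimensions present the divisor is 1.0 and this reduces to the plain weighted sum; an alternative design would score "unknown" as 0, which penalizes sparse candidate JSON more harshly.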
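
The query variants that instruction 2 asks for often need to partition a broad query, because of the 1,000-results-per-search cap noted in the search realities. A common partitioning axis is star count; the bucket boundaries below are illustrative assumptions to tune per ecosystem:

```python
# Sketch: split one broad query into disjoint star buckets so each sub-query
# stays under GitHub's 1,000-result search cap. Boundaries are illustrative.
def partition_by_stars(base_query: str, bounds=(100, 500, 2000, 10000)) -> list[str]:
    # open-ended top bucket first, then closed ranges between boundaries
    parts = [f"{base_query} stars:>={bounds[-1]}"]
    for lo, hi in zip(bounds, bounds[1:]):
        parts.append(f"{base_query} stars:{lo}..{hi - 1}")
    return parts
```

Each returned string is a valid `search_repositories` query; run them separately and merge the JSON before de-duplication (instruction 5 handles any boundary overlaps by name).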
URL: https://ib.bsb.br/problem2gh