files to prompt (f2p)

Slug: f2p

98565 characters 5689 words

#1. INSTALLATION

#1.1 Primary Requirements

  • click (command-line interface support)
  • Optional modules:
    • rarfile for .rar archives
    • py7zr for .7z archives
    • pathspec for advanced .gitignore matching
    • jinja2 for templated output

Install everything:

pip install click rarfile py7zr pathspec jinja2

Or install only what you need (for example, omit rarfile if you don’t require .rar support).

#1.2 Script File Reference

The provided script is typically named f2p.py, so you would run:

python f2p.py [OPTIONS] [PATHS...]

#2. BASIC USAGE

Run the script as follows:

python f2p.py [OPTIONS] /some/path /another/path
  • The script recursively scans directories and extracts recognized archives:
    • .zip, .rar, .7z, .tar, .gz, .bz2
    • Office Open XML / OpenDocument formats: .docx, .xlsx, .pptx, .odt, .ods, .odp

(Legacy formats such as .doc, .xls, .ppt are not treated as archives.)


#3. FLAGS & OPTIONS

Flag/Option Type / Default Short Description
--extension multiple=True / None -e Restricts processing to specific extensions (archives & Office docs).
--include-hidden is_flag=True / False n/a Considers hidden/dot files and directories.
--ignore-gitignore is_flag=True / False n/a Ignores .gitignore rules in directories.
--ignore multiple=True / None n/a Excludes files matching glob patterns (e.g., *.log).
--output file path / None -o Writes output to the specified file instead of stdout.
--xml is_flag=True / False n/a Outputs content in an XML-like structure.
--template-file file path / None -t Uses a Jinja2 template for custom formatting (requires jinja2).
--max-depth int / 5 -d Limits recursion depth for nested archives (default: 5).

#4. FILE & ENCODING LOGIC

  1. Multiple Encoding Attempts:
    • Tries utf-8 first, then latin-1.
    • If both fail, the file’s content is omitted (with a warning logged).
  2. Archive & Office Document Extraction:
    • .rar extraction requires rarfile, and .7z extraction requires py7zr.
    • Office documents (OOXML/ODF) are processed as .zip archives.
    • Extraction is performed safely to prevent path traversal vulnerabilities.

#5. IGNORING LOGIC

  1. .gitignore / pathspec:
    • When pathspec is installed, advanced .gitignore rules apply.
    • Otherwise, a simpler fnmatch approach is used.
  2. Hidden Files:
    • Hidden items are skipped by default (unless --include-hidden is used).
  3. Additional Patterns:
    • Use --ignore to exclude files (e.g., --ignore="*.log" or --ignore="*_backup.*").

#6. OUTPUT FORMATS

  1. Plain Text (Default):
    • Prints each file’s path, followed by a separator and the file content.
  2. XML-like (--xml):
    • Wraps the content within <section>...</section> elements.
  3. Jinja2 Templates (-t/--template-file):
    • Applies a provided .j2 template to format each file’s content.

#7. EXAMPLES

  1. Restrict to .py & .md and ignore *.log:
    python f2p.py -e .py -e .md --ignore="*.log" /path/to/process
  2. Process hidden files, disable .gitignore, and output to a file:
    python f2p.py --include-hidden --ignore-gitignore -o out.txt /some/path
  3. Output as XML and limit recursion to 3 levels:
    python f2p.py --xml --max-depth=3 /path/to/archives
  4. Use a Jinja2 template:
    python f2p.py --template-file=custom_template.j2 /path/to/files
  5. Process multiple paths:
    python f2p.py /first/path /second/path

#8. Jinja2 EXAMPLES

When invoking:

python f2p.py -t my_template.j2 [PATHS...]

the template receives:

  • {{ path }}: the file’s path
  • {{ content }}: the file’s text content
  • {{ index }}: a numeric counter for labeling

#Example Templates

  1. Minimal Example: Plain-Text Highlight
    File: minimal_example.j2
    File #{{ index }}: {{ path }} ---------------------------- {{ content }} ----------------------------

    Rationale: Displays the file path and content with a simple separator.

  2. Numbered Lines Use Case
    File: numbered_lines.j2
    File: {{ path }} (Index: {{ index }}) ====================================== {% set lines = content.split('\n') %} {% for loop_index, line in lines | enumerate(start=1) %} {{ loop_index }}: {{ line }} {% endfor %} ======================================

    Rationale: Enumerates each line, useful for line-by-line reference.

  3. HTML Output for Browser Rendering
    File: html_output.j2
    <html> <head> <title>File {{ index }}</title> </head> <body> <h2>File Path: {{ path }}</h2> <p><strong>Index:</strong> {{ index }}</p> <hr /> <pre> {{ content }} </pre> </body> </html>

    Rationale: Formats the file content into a simple HTML page.

  4. Markdown Code Snippet
    File: markdown_snippet.j2
    ### File #{{ index }}: {{ path }}

    {{ content }}

    Rationale: Ideal for embedding code or text in a Markdown document.

  5. Summarized Headings Template
    File: summarized_headings.j2
    ["FILE #{{ index }}"] {{ path }} --------------------------------- {% set first_lines = content.split('\n')[:3] %} {% for line in first_lines %} {{ line }} {% endfor %} --------------------------------- (... Content Truncated ...)

    Rationale: Shows only the first few lines to save space.

  6. JSON-Inspired Output Template
    File: json_inspired.j2
    { "index": {{ index }}, "path": "{{ path | replace('\\', '\\\\') }}", "content_lines": [ {% set lines = content.split('\n') %} {% for line in lines %} "{{ line | replace('"','\\"') }}"{{ "," if not loop.last else "" }} {% endfor %} ] }

    Rationale: Outputs the file data in a JSON-like structure.

  7. Columnar Key-Value Template
    File: columnar_kv.j2
    =============== File #{{ index }} =============== Path: {{ path }} --------------- {% for key, val in { 'Characters': content|length, 'Lines': content.split('\n')|length, 'First Line': content.split('\n')[0] if content else '' }.items() %} {{ key }}: {{ val }} {% endfor %} --------------- {{ content }}

    Rationale: Displays statistics followed by the file content.

  8. Interactive-Like Script Template
    File: interactive_prompt.j2
    === File #{{ index }} === LOAD FILE: {{ path }} RUN COMMANDS: 1) SomeProcess --file "{{ path }}" 2) AnotherProcess --analyze "{{ path }}" 3) (Optional) Check content below: {{ content }} =================

    Rationale: Provides a stylized “script” output with follow-up commands.

  9. Task-List / To-Do Style Template
    File: task_list.j2
    ## File #{{ index }}: {{ path }} - [ ] Review lines for errors - [ ] Extract useful references - [ ] Create summary - [ ] Mark for final review Content: {{ content }}

    Rationale: Produces a checklist along with the file content.

  10. Blockquote Slicer Template
    File: blockquote_slicer.j2
    > **File #{{ index }}**: {{ path }} {% for i, line in content.split('\n') | enumerate %} > {{ "%02d" | format(i+1) }} {{ line }} {% endfor %}

    Rationale: Converts each line into a blockquote with a line number.

  11. Content by Word Count Buckets
    File: word_bucket.j2
    ## File #{{ index }}: {{ path }} {% set words = content.split() %} {% if words|length < 30 %} (SHORT FILE) {{ content }} {% elif words|length < 100 %} (MEDIUM FILE) ---BEGIN--- {{ content }} ---END----- {% else %} (LONG FILE - WORD COUNT: {{ words|length }}) [Preview: First 100 words] {{ words[:100]|join(' ') }} [... shortened ...] {% endif %}

    Rationale: Adjusts output based on the file’s length.

  12. Script-Inlining Template (Code + Comments)
    File: inline_script.j2
    ### SCRIPT SNIPPET (INDEX: {{ index }}) # File Path: {{ path }} cat <<'EOF' > output_file_{{ index }}.txt {{ content }} EOF # Explanation: # Writes the file content into "output_file_{{ index }}.txt" using a here-document.

    Rationale: Useful for recreating file content on another system.

  13. Directory Tree Logging Template
    File: directory_tree.j2
    [FILE ENTRY #{{ index }}] PATH: {{ path }} DIR OR FILE: {% if content == '' and '.' not in path.split('/')[-1] %} (Possibly a directory or empty file) {% else %} (File with content) {% endif %} ========= CONTENT START ========= {{ content }} ========= CONTENT END ===========

    Rationale: Distinguishes between empty directories and files with content.

  14. Quick Data Stats with Regex (Custom Filters)
    File: quick_regex_stats.j2
    {% set lines = content.split('\n') %} {% set import_lines = lines | select("match", "^(import|from) ") | list %} {% set todo_lines = lines | select("match", ".*TODO.*") | list %} File #{{ index }}: {{ path }} ============================ TOTAL LINES: {{ lines|length }} IMPORT STATEMENTS: {{ import_lines|length }} TODO MARKERS: {{ todo_lines|length }} -- EXCERPT (first 5 lines) -- {% for l in lines[:5] %} {{ l }} {% endfor %} ----------------------------

    Rationale: Analyzes text (e.g., counting “import” or “TODO” occurrences).


#9. FURTHER NOTES

  • Ensure you have installed all required modules (e.g., rarfile, py7zr) for handling specific archive types.
  • The script processes .docx, .pptx, .xlsx, .odt, .ods, and .odp as archives.
  • Legacy MS Office formats (such as .doc, .xls, .ppt) are not supported.
  • Adjust the --max-depth parameter when processing heavily nested archives.
#!/usr/bin/env python3 import os import sys import tempfile import shutil import zipfile import tarfile import click import logging from fnmatch import fnmatch from typing import Callable, List, Optional, Tuple # Attempt to import optional modules try: import rarfile except ImportError: rarfile = None try: import py7zr except ImportError: py7zr = None try: import pathspec except ImportError: pathspec = None try: from jinja2 import Environment, FileSystemLoader except ImportError: Environment = None FileSystemLoader = None logging.basicConfig( level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s", stream=sys.stderr, ) logger = logging.getLogger(__name__) def is_within_directory(directory: str, target: str) -> bool: """ Checks if the target path is within the specified directory, helping to avoid path traversal vulnerabilities. """ abs_directory = os.path.abspath(directory) abs_target = os.path.abspath(target) return os.path.commonprefix([abs_directory, abs_target]) == abs_directory def safe_extract(tar: tarfile.TarFile, path: str = ".", members=None) -> None: """ Safely extract tar contents, preventing directory traversal. """ for member in (members or tar.getmembers()): member_path = os.path.join(path, member.name) if not is_within_directory(path, member_path): raise ValueError("Detected path traversal attempt.") tar.extractall(path=path, members=members) def handle_zip(file_path: str, extract_dir: str) -> bool: """ Extracts ZIP archives, handling potential exceptions. """ try: with zipfile.ZipFile(file_path, "r") as zf: zf.extractall(extract_dir) return True except zipfile.BadZipFile as e: logger.warning(f"Bad ZIP file {file_path}: {str(e)}") return False def handle_rar(file_path: str, extract_dir: str) -> bool: """ Extracts RAR archives if rarfile is available. """ if not rarfile: logger.warning("RAR handling requires 'rarfile' to be installed.") return False try: with rarfile.RarFile(file_path, "r") as rf: rf.extractall(extract_dir) return True except rarfile.Error as e: logger.warning(f"RAR extraction failed: {str(e)}") return False def handle_7z(file_path: str, extract_dir: str) -> bool: """ Extracts 7z archives if py7zr is available. """ if not py7zr: logger.warning("7z handling requires 'py7zr' to be installed.") return False try: with py7zr.SevenZipFile(file_path, "r") as sz: sz.extractall(extract_dir) return True except py7zr.exceptions.Bad7zFile as e: logger.warning(f"7z extraction failed: {str(e)}") return False def handle_tar(file_path: str, extract_dir: str) -> bool: """ Extracts TAR archives, using safe_extract to avoid path traversal. """ try: with tarfile.open(file_path, "r:*") as tf: safe_extract(tf, extract_dir) return True except tarfile.TarError as e: logger.warning(f"TAR extraction failed: {str(e)}") return False ARCHIVE_HANDLERS = { ".zip": handle_zip, ".rar": handle_rar, ".7z": handle_7z, ".tar": handle_tar, ".gz": handle_tar, ".bz2": handle_tar, } OFFICE_EXTENSIONS = [".docx", ".xlsx", ".pptx", ".odt", ".ods", ".odp"] def read_gitignore(directory: str) -> List[str]: """ Reads lines from .gitignore if present. """ path = os.path.join(directory, ".gitignore") if os.path.isfile(path): with open(path, "r", encoding="utf-8") as f: return [line.strip() for line in f if line.strip() and not line.startswith("#")] return [] def should_ignore(path: str, rules: List[str]) -> bool: """ Basic fnmatch-based ignoring for files or directories. """ base = os.path.basename(path) if os.path.isdir(path): base += "/" return any(fnmatch(base, rule) for rule in rules) class GitignoreHandler: """ Handles ignoring of files or directories based on .gitignore (via pathspec or fallback). """ def __init__(self, directory: str): self.pathspec_spec = None self.fallback_rules = [] lines = read_gitignore(directory) if pathspec: self.pathspec_spec = pathspec.PathSpec.from_lines("gitwildmatch", lines) else: self.fallback_rules = lines def should_ignore(self, path_to_check: str) -> bool: if self.pathspec_spec: return self.pathspec_spec.match_file(path_to_check) return should_ignore(path_to_check, self.fallback_rules) class OutputFormatter: """ Outputs data in either plain text, XML-like format, or via Jinja2 templates if available. """ def __init__( self, writer: Callable[[str], None], xml_mode: bool = False, template_file: Optional[str] = None ): self.writer = writer self.xml_mode = xml_mode self.xml_index = 1 self.template_file = template_file self.jinja_env = None if template_file and Environment and FileSystemLoader: template_dir = os.path.dirname(template_file) self.jinja_env = Environment(loader=FileSystemLoader(template_dir)) def write(self, path: str, content: str) -> None: """ Decides the approach (Jinja2/XML/plain text) for output. """ if self.jinja_env and self.template_file: try: template_name = os.path.basename(self.template_file) template = self.jinja_env.get_template(template_name) rendered = template.render(path=path, content=content, index=self.xml_index) self.writer(rendered) except Exception as e: logger.warning(f"Jinja2 rendering error: {e}") self._fallback_write(path, content) elif self.xml_mode: self.writer(f'<## data-filename="xml_code-block xml" data-code="">') self.writer(f' {path}</source>') self.writer(' ') for line in content.splitlines(): self.writer(f' {line}') self.writer(' ') self.writer('</section>') self.xml_index += 1 else: self._fallback_write(path, content) def _fallback_write(self, path: str, content: str): """ Prints content with separators if not using templates or XML. """ self.writer(path) self.writer("--") self.writer(content) self.writer("") self.writer("--") self.xml_index += 1 class FileProcessor: """ Manages recursion through directories or archives and applies ignoring, formatting, etc. """ def __init__( self, extensions: Tuple[str, ...], include_hidden: bool, ignore_gitignore: bool, ignore_patterns: Tuple[str, ...], formatter: OutputFormatter, max_depth: int = 5, ): self.extensions = [ext.lower() for ext in extensions] self.include_hidden = include_hidden self.ignore_gitignore = ignore_gitignore self.ignore_patterns = ignore_patterns self.formatter = formatter self.max_depth = max_depth def process_path(self, path: str, depth: int = 0, extra_gitignore_rules: List[str] = None) -> None: if extra_gitignore_rules is None: extra_gitignore_rules = [] if depth > self.max_depth: logger.warning(f"Max recursion depth ({self.max_depth}) reached at {path}.") return if os.path.isfile(path): self._handle_file(path, depth) elif os.path.isdir(path): if not self.ignore_gitignore: extra_gitignore_rules.extend(read_gitignore(path)) self._handle_directory(path, depth, extra_gitignore_rules) def _handle_file(self, path: str, depth: int) -> None: ext = os.path.splitext(path)[1].lower() if ext in ARCHIVE_HANDLERS or ext in OFFICE_EXTENSIONS: self._extract_and_recurse(path, ext, depth) else: self._read_and_output(path) def _extract_and_recurse(self, path: str, ext: str, depth: int) -> None: handler_func = ARCHIVE_HANDLERS.get(ext) if ext in OFFICE_EXTENSIONS: # Office documents are ZIP-based archives handler_func = ARCHIVE_HANDLERS[".zip"] if not handler_func: logger.warning(f"No valid handler for extension: {ext}") return with tempfile.TemporaryDirectory() as tmpdir: success = handler_func(path, tmpdir) if success: self.process_path(tmpdir, depth + 1) else: logger.warning(f"Extraction failed for {path}") def _read_and_output(self, path: str) -> None: encodings_to_try = ["utf-8", "latin-1"] for encoding in encodings_to_try: try: with open(path, "r", encoding=encoding) as f: content = f.read() self.formatter.write(path, content) return except UnicodeDecodeError: continue except Exception as e: logger.warning(f"File read error {path}: {e}") return logger.warning(f"Could not read file {path} with provided encodings.") def _handle_directory(self, directory: str, depth: int, extra_gitignore_rules: List[str]) -> None: gitignore_handler = None if not self.ignore_gitignore: gitignore_handler = GitignoreHandler(directory) for root, dirs, files in os.walk(directory): if not self.include_hidden: dirs[:] = [d for d in dirs if not d.startswith(".")] files = [f for f in files if not f.startswith(".")] if gitignore_handler: dirs[:] = [d for d in dirs if not gitignore_handler.should_ignore(os.path.join(root, d))] files = [f for f in files if not gitignore_handler.should_ignore(os.path.join(root, f))] dirs[:] = [d for d in dirs if not should_ignore(os.path.join(root, d), extra_gitignore_rules)] files = [f for f in files if not should_ignore(os.path.join(root, f), extra_gitignore_rules)] if self.ignore_patterns: files = [ f for f in files if not any(fnmatch(f, pattern) for pattern in self.ignore_patterns) ] if self.extensions: files = [ f for f in files if any(f.lower().endswith(ext) for ext in self.extensions) ] for file_name in sorted(files): self.process_path(os.path.join(root, file_name), depth + 1, extra_gitignore_rules) @click.command() @click.argument("paths", nargs=-1, type=click.Path(exists=True)) @click.option("-e", "--extension", "extensions", multiple=True, help="Specify file extensions, e.g. .txt, .md.") @click.option("--include-hidden", is_flag=True, default=False, help="Include hidden files and subdirectories.") @click.option("--ignore-gitignore", is_flag=True, default=False, help="Do not apply .gitignore-based filtering.") @click.option("--ignore", "ignore_patterns", multiple=True, help="Specify one or more glob patterns to exclude.") @click.option("-o", "--output", "output_file", type=click.Path(writable=True), help="Output file path (stdout by default).") @click.option("--xml", "output_xml", is_flag=True, default=False, help="Output in XML-like format.") @click.option("-t", "--template-file", "template_file", type=click.Path(exists=True), help="Use a Jinja2 template for output.") @click.option("-d", "--max-depth", "max_depth", default=5, help="Maximum recursion depth for nested archives.") def cli(paths, extensions, include_hidden, ignore_gitignore, ignore_patterns, output_file, output_xml, template_file, max_depth): """ "f2p" -- Enhanced from "framework1" using "raw_data": 1) Safe recursion-based file and archive handling. 2) Advanced ignoring logic from .gitignore or pathspec. 3) Optional Jinja2-based templating for output formatting. """ writer = click.echo file_handle = None if output_file: try: file_handle = open(output_file, "w", encoding="utf-8") writer = lambda msg: print(msg, file=file_handle) except IOError as e: logger.error(f"Could not open output file {output_file}: {e}") sys.exit(1) formatter = OutputFormatter( writer=writer, xml_mode=output_xml, template_file=template_file ) if output_xml and not template_file: writer("<root>") processor = FileProcessor( extensions=extensions, include_hidden=include_hidden, ignore_gitignore=ignore_gitignore, ignore_patterns=ignore_patterns, formatter=formatter, max_depth=max_depth ) for path in paths: processor.process_path(path) if output_xml and not template_file: writer("</root>") if file_handle: file_handle.close() if __name__ == "__main__": cli()
URL: https://ib.bsb.br/f2p