gittech.site


AI Context: produce LLM-ready Markdown context files from various sources

Published at
Dec 24, 2024

AI Context


License: MIT

Generate AI-friendly markdown files from GitHub repositories, local source code, YouTube videos, or webpages.


A multi-architecture, multi-OS, command-line tool with concurrency support that produces context files in markdown from various sources to make interactions with LLM apps (like ChatGPT, Claude, etc.) easy.

Quickstart

ai-context -u "https://github.com/tanq16/ai-context" # single URL
ai-context -f urllist.file                           # URL file

Features

  • Local Directory Processing
    • this is mainly for locally available code bases (directories or already cloned git repos)
    • the context file includes directory structure and all file contents within context
  • GitHub Repository Processing
    • this clones the provided GitHub repository and then processes it the same way as Local Directory Processing
    • the clone is temporary, so no cleanup is needed
    • private GitHub repositories are also supported through the GH_TOKEN environment variable
  • YouTube Transcript Processing
    • this downloads the transcript for a given YouTube video link and stores it as markdown
    • the transcript also preserves time segments
  • WebPage Processing
    • this converts an HTML webpage to markdown text, stripping off JS and CSS
    • it also downloads all images from the page and stores them locally with UUID filenames
    • the markdown text includes links via local paths to the downloaded images

Installation

  • Binary
    • Download the latest release for your platform and OS from the releases page
    • Binaries are built via GitHub Actions for macOS, Linux, and Windows for both AMD64 (x86_64) and ARM64 (like Apple Silicon) architectures
    • You can also download specific versions if needed; however, the latest version is recommended
  • Go Install
    • Run the following command (requires Go v1.22+):
    go install github.com/tanq16/ai-context@latest
    
    • For specific versions, use the release binaries or build a specific commit, as I have not implemented (and will not implement) Go-native binary versioning
  • Local Build
    git clone https://github.com/tanq16/ai-context.git && \
    cd ai-context
    
    go build .
    

Usage

# Process a single path (local directory) with additional ignore patterns
ai-context -u /path/to/directory -i "tests,docs,*doc.*"

# Process one URL (GitHub repo or YouTube Video or Webpage URL)
ai-context -u "https://www.youtube.com/watch?v=video_id" # quoted so the shell does not treat ? as a glob

# Make a list of paths
cat << EOF > listfile
../notif
/working/cybernest
https://github.com/assetnote/h2csmuggler
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html
EOF

# Process URL list concurrently
ai-context -f listfile

# Process private GitHub repository
GH_TOKEN=$(cat /secrets/GH.PAT) ai-context -u https://github.com/ORG/REPO

[!WARNING] A directory path (in URL or listfile mode) should start with / (absolute) or with ./ or ../ (relative). For the current directory, always use ./ for correct regex matching.
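The prefix rule above can be illustrated with a minimal Go sketch. The helper name classifyInput is hypothetical and not taken from the ai-context source. The warning mentions regex matching, while this sketch uses plain prefix checks as a stand-in, so the tool's real behavior may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyInput is a hypothetical helper (an assumption, not ai-context code).
// It returns "dir" for inputs using the documented local path prefixes,
// "url" for http(s) links, and "unknown" for everything else.
func classifyInput(s string) string {
	switch {
	case strings.HasPrefix(s, "/"),
		strings.HasPrefix(s, "./"),
		strings.HasPrefix(s, "../"):
		return "dir"
	case strings.HasPrefix(s, "http://"),
		strings.HasPrefix(s, "https://"):
		return "url"
	default:
		return "unknown"
	}
}

func main() {
	inputs := []string{
		"./",
		"../notif",
		"/working/cybernest",
		"notif", // no ./ prefix, so it is not recognized as a directory here
		"https://github.com/assetnote/h2csmuggler",
	}
	for _, in := range inputs {
		fmt.Printf("%-45s %s\n", in, classifyInput(in))
	}
}
```

Under this assumption, a bare name such as notif falls through to "unknown", which is why the warning recommends writing ./notif or ./ explicitly.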

Output

  • The tool creates a local folder called context and puts everything converted into .md files in that folder
  • Filenames follow the pattern TYPE-PATHNAME.md (for example, gh-ffuf_ffuf.md)
  • In listfile mode, every path results in its own context file
  • All images (only downloaded via webpages) are named as UUIDs and stored in the context/images directory (images are downloaded as a convenience, but this does not take away from text-first context creation)
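As an illustration of the TYPE-PATHNAME.md naming, here is a small Go sketch that derives a name for the GitHub case. The function githubContextName and its exact rules (trimming slashes and a trailing .git, replacing / with _) are assumptions chosen to reproduce the gh-ffuf_ffuf.md example. The tool's actual naming code may differ, and the other input types are not covered.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// githubContextName is a hypothetical helper (not ai-context code).
// It maps a GitHub repository URL to a "gh-OWNER_REPO.md" style filename.
func githubContextName(repoURL string) (string, error) {
	u, err := url.Parse(repoURL)
	if err != nil {
		return "", err
	}
	// Drop leading/trailing slashes and an optional ".git" suffix.
	p := strings.TrimSuffix(strings.Trim(u.Path, "/"), ".git")
	if p == "" {
		return "", fmt.Errorf("no repository path in %q", repoURL)
	}
	return "gh-" + strings.ReplaceAll(p, "/", "_") + ".md", nil
}

func main() {
	name, err := githubContextName("https://github.com/ffuf/ffuf")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name) // gh-ffuf_ffuf.md
}
```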

Command Line Options

  • -u, --url: provide a path (GitHub repo, YouTube video, WebPage link, or relative/absolute directory path) to process
  • -f, --file: provide a file with a list of paths (URLs or directory paths) to process
  • -i, --ignore: add additional patterns to ignore during processing (comma-separated)
  • -t, --threads: (optional) number of workers for concurrent processing when passing a list file (default = 5)
  • --debug: verbose logging (helpful if something isn't working as expected or you want to see individual steps)
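The -t option describes a fixed number of workers for list files. The following is a minimal Go sketch of that general pattern, not the tool's implementation. The names processAll and handle are hypothetical, and the handler only returns a label instead of doing real work.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll runs handle on every item using a fixed number of workers.
// Hypothetical sketch of a worker pool; ai-context's internals may differ.
func processAll(items []string, workers int, handle func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, len(items))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is received by exactly one worker,
			// so writes to results never overlap.
			for i := range jobs {
				results[i] = handle(items[i])
			}
		}()
	}
	for i := range items {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	items := []string{
		"../notif",
		"/working/cybernest",
		"https://github.com/assetnote/h2csmuggler",
	}
	// 5 mirrors the documented default for -t.
	out := processAll(items, 5, func(s string) string { return "processed: " + s })
	for _, line := range out {
		fmt.Println(line)
	}
}
```

With this design, a slow item (for example a page with many images) occupies one worker while the others keep draining the queue, which matches the note that large runs can look stalled without verbose logs.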

[!TIP]

  • Run head -n 200 context/FILE.md (or use 500 lines) to view the content tree of the processed code base or directory and see what has been included. Then refine your -i flag arguments to ignore additional patterns.
  • When processing a large number of items, the tool can look stalled due to thread limits and image download times; use --debug to enable verbose logs and see what is running.

Default Ignores

The tool ships with sensible pre-defined ignore patterns covering common files and directories that typically don't add value to the context. These include:

  • Version control files (.git, .gitignore)
  • Dependencies (node_modules, vendor)
  • Compiled files (*.exe, *.dll)
  • Media files (images, videos, audio)
  • Documentation files
  • Lock files (package-lock.json, yarn.lock)
  • Build artifacts and caches

For a full list, see aicontext/ignores.go.
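To make the idea of ignore patterns concrete, here is a hedged Go sketch that parses a comma-separated -i value and tests each path component against glob patterns. The helpers parseIgnore and ignored are hypothetical, and matching with path.Match per component is an assumption; the real rules live in aicontext/ignores.go and may work differently.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// parseIgnore splits a comma-separated flag value into trimmed patterns.
// Hypothetical helper, not taken from the ai-context source.
func parseIgnore(flag string) []string {
	var out []string
	for _, p := range strings.Split(flag, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// ignored reports whether any slash-separated component of relPath
// matches any pattern (shell-style globs via path.Match).
func ignored(relPath string, patterns []string) bool {
	for _, part := range strings.Split(relPath, "/") {
		for _, pat := range patterns {
			if ok, err := path.Match(pat, part); err == nil && ok {
				return true
			}
		}
	}
	return false
}

func main() {
	// Same value as the -i example in the Usage section.
	patterns := parseIgnore("tests,docs,*doc.*")
	for _, p := range []string{"tests/a_test.go", "api/godoc.txt", "src/main.go"} {
		fmt.Printf("%-18s ignored=%v\n", p, ignored(p, patterns))
	}
}
```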

Acknowledgments

This project takes inspiration from, uses, or references:

  • repomix: inspiration for turning code into context
  • innertube: inspiration for code to get transcript from YouTube video
  • html-to-markdown: used to convert HTML to MD
  • go-git: git operations in Go