ColGREP

Semantic code search for your terminal and your coding agents, built on NextPlaid.
Combines regex filtering with semantic ranking, powered by LateOn-Code-edge multi-vector embeddings.
A single Rust binary. No server. No API. 100% local, your code never leaves your machine.

ColGREP demo

Quick Start · Search Modes · Agent Integrations · How It Works · Installation

Quick Start

Install:

# macOS / Linux
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/lightonai/next-plaid/releases/latest/download/colgrep-installer.sh | sh

# Windows (PowerShell)
powershell -c "irm https://github.com/lightonai/next-plaid/releases/latest/download/colgrep-installer.ps1 | iex"

> macOS binaries ship with Apple Accelerate + CoreML enabled: full hardware acceleration out of the box.
>
> Linux & Windows binaries work immediately but run on CPU only. For hardware acceleration, install via Cargo (see Installation).

Build the index:

colgrep init                   # current directory
colgrep init /path/to/project  # or a specific project
colgrep init -y                # auto-confirm for large codebases (>10K code units)

Search:

colgrep "database connection pooling"

No setup, no config, no dependencies. colgrep init builds the index the first time; after that, every search detects file changes and updates the index automatically before returning results. Use --model to override the ColBERT model and --pool-factor to control embedding compression.


Search Modes

ColGREP supports three search modes: semantic, regex, and hybrid (both combined).

Semantic Search

Find code by meaning, even when keywords don't match exactly:

colgrep "function that retries HTTP requests"
colgrep "error handling in API layer"
colgrep "authentication middleware" ./src

Regex Search

Use -e for traditional pattern matching (ERE syntax by default):

colgrep -e "async fn\s+\w+"
colgrep -e "TODO|FIXME|HACK"
colgrep -e "impl\s+Display" --include="*.rs"

Hybrid Search

Combine regex filtering with semantic ranking. Regex narrows the candidates, semantics ranks them:

# Find async functions, rank by "error handling"
colgrep -e "async fn" "error handling"

# Find Result types, rank by "database operations"
colgrep -e "Result<" "database operations" --include="*.rs"

# Find TODOs, rank by relevance to "security"
colgrep -e "TODO" "security concerns"

CLI Reference

Search Options

| Flag | Long | Description |
|------|------|-------------|
| `-e` | `--pattern` | Regex pre-filter (ERE syntax) |
| `-E` | `--extended-regexp` | ERE mode (default, kept for grep compat) |
| `-F` | `--fixed-strings` | Treat `-e` as literal string |
| `-w` | `--word-regexp` | Whole-word match for `-e` |
| `-k` | `--results` | Number of results (default: 15) |
| `-n` | `--lines` | Context lines to show (default: 6) |
| `-l` | `--files-only` | List matching files only |
| `-c` | `--content` | Show full function/class content |
| `-r` | `--recursive` | Recursive (default, for grep compat) |
| `-y` | `--yes` | Auto-confirm indexing |
|      | `--json` | JSON output |
|      | `--code-only` | Skip docs/config files |
|      | `--include` | Filter by glob (e.g., `"*.rs"`) |
|      | `--exclude` | Exclude files by glob |
|      | `--exclude-dir` | Exclude directories |
|      | `--model` | Override ColBERT model |
|      | `--no-pool` | Disable embedding pooling |
|      | `--pool-factor` | Set pool factor (default: 2) |

Filtering

# By file extension
colgrep --include="*.py" "database query"
colgrep --include="*.{ts,tsx}" "React component"

# By path pattern
colgrep --include="src/**/*.rs" "config parsing"
colgrep --include="**/tests/**" "test helper"

# Exclude files or directories
colgrep --exclude="*.test.ts" "component"
colgrep --exclude-dir="vendor" --exclude-dir="node_modules" "import"

# Search specific paths
colgrep "error handling" ./src/api ./src/auth

# Code-only (skip markdown, yaml, json, etc.)
colgrep --code-only "authentication logic"

Glob pattern syntax:

| Pattern | Matches |
|---------|---------|
| `*.py` | All Python files |
| `*.{ts,tsx}` | TypeScript and TSX files |
| `src/**/*.rs` | Rust files under src/ |
| `**/tests/**` | Files in any tests/ directory |
| `*_test.go` | Go test files |

Output Modes

# Default: filepath:lines with context
colgrep "authentication"

# Files only (like grep -l)
colgrep -l "database queries"

# Full content with syntax highlighting
colgrep -c "authentication handler" -k 5

# JSON for scripting
colgrep --json "auth" | jq '.[] | .unit.file'
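For scripting in Python rather than jq, the JSON output can be parsed directly. A minimal sketch, assuming the result shape implied by the jq example above (a top-level array of objects with a unit.file field):

# Query colgrep and print matching file paths
# (result shape assumed from the jq example above)
import json
import subprocess

out = subprocess.run(
    ["colgrep", "--json", "auth"],
    capture_output=True, text=True, check=True,
).stdout

for result in json.loads(out):
    print(result["unit"]["file"])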

Subcommands

| Command | Description |
|---------|-------------|
| `colgrep init` | Build or update the index |
| `colgrep status` | Show index status for current project |
| `colgrep clear` | Clear index for current project |
| `colgrep clear --all` | Clear all indexes |
| `colgrep set-model <ID>` | Change the default ColBERT model |
| `colgrep settings` | View or modify configuration |
| `colgrep --stats` | Show search statistics for all indexes |

Configuration

# Show current config
colgrep settings

# Set default results count
colgrep settings --k 20

# Set default context lines
colgrep settings --n 10

# Use INT8 quantized model (faster inference)
colgrep settings --int8

# Use FP32 full precision (more accurate)
colgrep settings --fp32

# Set embedding pool factor (2 = 50% smaller index, 1 = no pooling)
colgrep settings --pool-factor 2

# Set parallel encoding sessions (default: CPU count, max 16)
colgrep settings --parallel 8

# Set batch size per session (default: 1 for CPU, 64 for CUDA)
colgrep settings --batch-size 2

# Enable verbose output by default
colgrep settings --verbose

# Reset a value to default (pass 0)
colgrep settings --k 0 --n 0

Change Model

# Temporary (single query)
colgrep "query" --model lightonai/LateOn-Code

# Permanent (clears existing indexes)
colgrep set-model lightonai/LateOn-Code

# Private HuggingFace model
HF_TOKEN=hf_xxx colgrep set-model myorg/private-model

Config stored at ~/.config/colgrep/config.json.


Agent Integrations

| Agent | Install | Uninstall |
|-------|---------|-----------|
| Claude Code | `colgrep --install-claude-code` | `colgrep --uninstall-claude-code` |
| OpenCode | `colgrep --install-opencode` | `colgrep --uninstall-opencode` |
| Codex | `colgrep --install-codex` | `colgrep --uninstall-codex` |

> Restart your agent after installing.

Claude Code Integration

The Claude Code integration installs session and task hooks, so Claude Code automatically uses colgrep as its primary search tool once the index is ready.

Complete Uninstall

Remove colgrep from all AI tools, clear all indexes, and delete all data:

colgrep --uninstall

How It Works

flowchart TD
    A["Your codebase"] --> B["Tree-sitter"]
    B --> C["Structured representation"]
    C --> D["LateOn-Code-edge · 17M"]
    D --> E["NextPlaid"]
    E --> F["Search"]

    B -.- B1["Parse functions, methods, classes"]
    C -.- C1["Signature, params, calls, docstring, code"]
    D -.- D1["Multi-vector embedding per code unit · runs on CPU"]
    E -.- E1["Rust index binary · quantized · memory-mapped · incremental"]
    F -.- F1["grep-compatible flags · SQLite filtering · semantic ranking
100% local, your code never leaves your machine"]

    style A fill:#4a90d9,stroke:#357abd,color:#fff
    style B fill:#50b86c,stroke:#3d9956,color:#fff
    style C fill:#50b86c,stroke:#3d9956,color:#fff
    style D fill:#e8913a,stroke:#d07a2e,color:#fff
    style E fill:#e8913a,stroke:#d07a2e,color:#fff
    style F fill:#9b59b6,stroke:#8445a0,color:#fff
    style B1 fill:none,stroke:#888,stroke-dasharray:5 5,color:#888
    style C1 fill:none,stroke:#888,stroke-dasharray:5 5,color:#888
    style D1 fill:none,stroke:#888,stroke-dasharray:5 5,color:#888
    style E1 fill:none,stroke:#888,stroke-dasharray:5 5,color:#888
    style F1 fill:none,stroke:#888,stroke-dasharray:5 5,color:#888

1. Parse

Tree-sitter parses source files into ASTs and extracts code units: functions, methods, classes, constants, and raw code blocks (module-level statements not covered by other units). This gives 100% file coverage.
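To make the extraction step concrete, here is a minimal sketch of pulling function and class units out of Python source with tree-sitter. It assumes the py-tree-sitter (>= 0.23) and tree-sitter-python packages; ColGREP's own extraction is more thorough (methods, constants, raw blocks, 25 languages):

# Sketch: extract top-level functions/classes with tree-sitter
# (assumes py-tree-sitter >= 0.23 and tree-sitter-python)
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

parser = Parser(Language(tspython.language()))
source = b'''
def fetch(url):
    return client.get(url)

class Pool:
    def acquire(self): ...
'''

tree = parser.parse(source)
for node in tree.root_node.children:
    if node.type in ("function_definition", "class_definition"):
        name = node.child_by_field_name("name")
        print(node.type, name.text.decode())
# function_definition fetch
# class_definition Pool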

2. Analyze (5 Layers)

Each code unit is enriched with five layers of analysis:

| Layer | Extracts | Example |
|-------|----------|---------|
| AST | Signature, parameters, return type, docstring, parent class | `def fetch(url: str) -> Response` |
| Call Graph | Outgoing calls + reverse called_by | `Calls: range, client.get` |
| Control Flow | Loops, branches, error handling, cyclomatic complexity | `has_error_handling: true` |
| Data Flow | Variable declarations and assignments | `Variables: i, e` |
| Dependencies | Imports used within the function | `Uses: client, RequestError` |

3. Build Structured Text

Each unit is converted to a structured text representation before embedding. This gives the model richer signal than raw code alone:

Function: fetch_with_retry
Signature: def fetch_with_retry(url: str, max_retries: int = 3) -> Response
Description: Fetches data from a URL with retry logic.
Parameters: url, max_retries
Returns: Response
Calls: range, client.get
Variables: i, e
Uses: client, RequestError
Code:
def fetch_with_retry(url: str, max_retries: int = 3) -> Response:
    """Fetches data from a URL with retry logic."""
    for i in range(max_retries):
        try:
            return client.get(url)
        except RequestError as e:
            if i == max_retries - 1:
                raise e
File: src / utils / http client http_client.py

File paths are normalized for better semantic matching: separators become spaces, and snake_case and CamelCase are split (e.g., HttpClient → http client).
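A small Python sketch of this normalization; the exact rules ColGREP applies beyond these examples are an assumption:

# Sketch: normalize a file path for semantic matching
import re

def normalize_path(path: str) -> str:
    words = re.sub(r"[/\\._-]", " ", path)                 # separators -> spaces (also splits snake_case)
    words = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", words)  # split CamelCase
    return words.lower()

print(normalize_path("src/utils/HttpClient.py"))  # "src utils http client py"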

4. Encode with ColBERT

The ColBERT model produces multi-vector embeddings: ~300 token-level vectors of dimension 128 per code unit (instead of a single vector). At query time, each query token finds its best match across all document tokens (MaxSim scoring). This preserves fine-grained information that single-vector models lose.
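MaxSim is simple to express directly. A NumPy sketch of scoring one document against one query (illustrative only; the real index operates on quantized, memory-mapped embeddings):

# Sketch: MaxSim late-interaction scoring with NumPy
import numpy as np

def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """query_vecs: (q_tokens, 128), doc_vecs: (d_tokens, 128), L2-normalized.
    Each query token keeps only its best-matching document token."""
    sims = query_vecs @ doc_vecs.T        # (q_tokens, d_tokens) cosine similarities
    return float(sims.max(axis=1).sum())  # best match per query token, summed

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 128));   q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.standard_normal((300, 128)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim(q, d))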

The default model is LateOn-Code-edge (17M parameters), optimized for code search and fast enough to run on CPU.

5. Index with PLAID

The PLAID algorithm compresses multi-vector embeddings with product quantization (2-bit or 4-bit) and stores them in a memory-mapped index. Embedding pooling (default factor: 2) further reduces index size by ~50%. Indexes support incremental updates so only changed files are re-encoded.
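To illustrate what a pool factor of 2 means for index size, here is a rough sketch that mean-pools consecutive token vectors; the actual pooling strategy ColGREP uses (e.g., which tokens get grouped) is an assumption here:

# Sketch: pool token embeddings by averaging consecutive groups (factor 2)
import numpy as np

def pool_embeddings(vecs: np.ndarray, factor: int = 2) -> np.ndarray:
    n = (len(vecs) // factor) * factor  # drop the ragged tail for simplicity
    pooled = vecs[:n].reshape(-1, factor, vecs.shape[1]).mean(axis=1)
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)  # keep unit norm for MaxSim

emb = np.random.default_rng(1).standard_normal((300, 128))
print(pool_embeddings(emb).shape)  # (150, 128): half the vectors, ~50% smaller index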

6. Search

The search pipeline ties everything together: each search first detects file changes and incrementally updates the index, applies any regex and glob filters to narrow the candidate set (via SQLite), then ranks the remaining candidates semantically with MaxSim and returns grep-style results.


Index Management

colgrep init

Build or incrementally update the index for a project without running a search. If the index already exists, only changed files are re-encoded.

colgrep init                                        # current directory
colgrep init ~/projects/myapp                       # specific project
colgrep init -y                                     # auto-confirm for large codebases (>10K code units)
colgrep init --model lightonai/LateOn-Code          # use a specific model
colgrep init --pool-factor 1                        # disable embedding pooling (more precise)

This is useful for building the index ahead of time (for example, right after cloning a project) so the first search returns immediately.

# Check index status
colgrep status

# Clear current project index
colgrep clear

# Clear all indexes
colgrep clear --all

# Show statistics
colgrep --stats

Indexes are stored outside the project directory:

| Platform | Location |
|----------|----------|
| Linux | `~/.local/share/colgrep/indices/` |
| macOS | `~/Library/Application Support/colgrep/indices/` |
| Windows | `%APPDATA%\colgrep\indices\` |

Each project gets its own directory named {project}-{hash8}.

ColGREP automatically detects and repairs index/metadata desync from interrupted operations.


Supported Languages

Code (25 languages, tree-sitter AST parsing)

| Language | Extensions |
|----------|------------|
| Python | `.py` |
| TypeScript | `.ts`, `.tsx` |
| JavaScript | `.js`, `.jsx`, `.mjs` |
| Go | `.go` |
| Rust | `.rs` |
| Java | `.java` |
| C | `.c`, `.h` |
| C++ | `.cpp`, `.cc`, `.cxx`, `.hpp`, `.hxx` |
| C# | `.cs` |
| Ruby | `.rb` |
| Kotlin | `.kt`, `.kts` |
| Swift | `.swift` |
| Scala | `.scala`, `.sc` |
| PHP | `.php` |
| Lua | `.lua` |
| Elixir | `.ex`, `.exs` |
| Haskell | `.hs` |
| OCaml | `.ml`, `.mli` |
| R | `.r`, `.rmd` |
| Zig | `.zig` |
| Julia | `.jl` |
| SQL | `.sql` |
| Vue | `.vue` |
| Svelte | `.svelte` |
| HTML | `.html`, `.htm` |

Text & Config (11 formats, document-level extraction)

| Format | Extensions |
|--------|------------|
| Markdown | `.md` |
| Plain text | `.txt`, `.rst` |
| AsciiDoc | `.adoc` |
| Org | `.org` |
| YAML | `.yaml`, `.yml` |
| TOML | `.toml` |
| JSON | `.json` |
| Dockerfile | `Dockerfile` |
| Makefile | `Makefile` |
| Shell | `.sh`, `.bash`, `.zsh` |
| PowerShell | `.ps1` |

Installation

The pre-built binaries from Quick Start are the fastest way to get started. For hardware acceleration on Linux/Windows, or to build from source, use Cargo.

Cargo

Install from crates.io:

# CPU-only (all platforms)
cargo install colgrep

# macOS with full acceleration (same as pre-built binary)
cargo install colgrep --features "accelerate,coreml"

# Linux with OpenBLAS
cargo install colgrep --features openblas

# Linux with CUDA
cargo install colgrep --features cuda

# Linux with CUDA + TensorRT
cargo install colgrep --features "cuda,tensorrt"

# Windows with DirectML
cargo install colgrep --features directml

Build from Source

git clone https://github.com/lightonai/next-plaid.git
cd next-plaid
cargo install --path colgrep --features "accelerate,coreml"  # or your preferred features

Hardware Acceleration Features

| Feature | Platform | Description |
|---------|----------|-------------|
| `accelerate` | macOS | Apple Accelerate for vector operations |
| `coreml` | macOS | CoreML for model inference |
| `openblas` | Linux | OpenBLAS for vector operations |
| `cuda` | Linux/Windows | NVIDIA CUDA for model inference |
| `tensorrt` | Linux | NVIDIA TensorRT for model inference |
| `directml` | Windows | DirectML for model inference |
OpenBLAS setup (Linux): install your distribution's OpenBLAS development package (e.g., libopenblas-dev on Debian/Ubuntu) before running cargo install colgrep --features openblas.

ONNX Runtime

ONNX Runtime is downloaded automatically on first use. No manual installation required.

Lookup order: the library at ORT_DYLIB_PATH (if set), then the copy downloaded automatically into the colgrep data directory.


Environment Variables

| Variable | Description |
|----------|-------------|
| `ORT_DYLIB_PATH` | Path to ONNX Runtime library |
| `XDG_DATA_HOME` | Override data directory |
| `XDG_CONFIG_HOME` | Override config directory |
| `HF_TOKEN` | HuggingFace token for private models |
| `HUGGING_FACE_HUB_TOKEN` | Alternative HF token variable |

Citation

@software{next-plaid,
  title  = {NextPlaid, ColGREP: Multi-vector search, from database to coding agents.},
  url    = {https://github.com/lightonai/next-plaid},
  author = {Sourty, Raphaël},
  year   = {2026},
}

@misc{LateOn-Code,
  title  = {LateOn-Code: a Family of State-Of-The-Art Late Interaction Code Retrieval Models},
  author = {Chaffin, Antoine},
  url    = {https://huggingface.co/collections/lightonai/lateon-code},
  year   = {2026}
}

License

Apache-2.0

See Also