Scanning
odirscan uses a two-phase scanning process built on colly, an asynchronous web scraping framework for Go.
Phase 1: Scan
The Scanner.Scan() method crawls open directory URLs and collects file links.
Directory Validation
Before processing links on a page, odirscan validates that the page is a genuine open directory listing by checking that:
- The HTML <title> text matches the first <h1> text
- Both are non-empty
Pages that fail this check are skipped.
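The check above can be sketched as a small pure function. This is a minimal illustration rather than odirscan's actual code: the name isDirectoryListing is hypothetical, and trimming surrounding whitespace before comparing is an assumption the text above does not confirm.

```go
package main

import "strings"

// isDirectoryListing reports whether a page passes the validation rule
// described above: the <title> text and the first <h1> text must both be
// non-empty and must match.
//
// Assumption: surrounding whitespace is trimmed before the comparison.
// The function name is hypothetical and not taken from odirscan.
func isDirectoryListing(title, firstH1 string) bool {
	title = strings.TrimSpace(title)
	firstH1 = strings.TrimSpace(firstH1)
	return title != "" && title == firstH1
}
```

Inside a colly HTML callback, the two inputs could plausibly be read with e.ChildText("title") and e.DOM.Find("h1").First().Text(), although how odirscan extracts them is not stated here.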
Link Processing
For each <a href> on a validated page:
- Directories (href ends with /) -- Visited recursively, unless the absolute URL contains a keyword from SkipSubdirKeywords.
- Files (everything else) -- Added to findings, unless the file extension maps to a MIME type matching a prefix in SkipMimeTypePrefixes.
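As a rough sketch of the directory/file split described above, the following resolves an href against the page URL and classifies it by its trailing slash. The names classifyLink, linkKind, linkDirectory, and linkFile are hypothetical, and the sketch leaves out the skip rules for subdirectories and MIME types.

```go
package main

import (
	"net/url"
	"strings"
)

// linkKind distinguishes the two cases described above (hypothetical type).
type linkKind int

const (
	linkDirectory linkKind = iota // href ends with "/"
	linkFile                      // everything else
)

// classifyLink resolves href against pageURL and reports the absolute URL
// together with whether the link points at a directory or a file.
func classifyLink(pageURL, href string) (string, linkKind, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", 0, err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", 0, err
	}
	abs := base.ResolveReference(ref).String()
	if strings.HasSuffix(href, "/") {
		return abs, linkDirectory, nil
	}
	return abs, linkFile, nil
}
```

For a page at http://example.com/pub/, the href docs/ resolves to http://example.com/pub/docs/ and is treated as a directory, while notes.txt resolves to http://example.com/pub/notes.txt and is treated as a file.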
MIME-Type Filtering
Files are filtered by their extension's MIME type using Go's mime.TypeByExtension. The default skip list excludes:
- image/*
- font/*
- text/css
- audio/*
- video/*
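A minimal sketch of this filter, using only the standard library, might look like the following. It assumes the skip list is stored as plain string prefixes (for example image/ for image/*) and that files with an unknown extension are kept; neither detail is confirmed above, and the names shouldSkipFile and defaultSkipPrefixes are hypothetical.

```go
package main

import (
	"mime"
	"path"
	"strings"
)

// defaultSkipPrefixes is an assumed encoding of the default skip list:
// each wildcard entry is stored as a plain prefix.
var defaultSkipPrefixes = []string{"image/", "font/", "text/css", "audio/", "video/"}

// shouldSkipFile reports whether the MIME type derived from the file
// extension starts with one of the given prefixes.
func shouldSkipFile(urlPath string, prefixes []string) bool {
	mimeType := mime.TypeByExtension(path.Ext(urlPath))
	if mimeType == "" {
		// Assumption: unknown or missing extensions are not filtered out.
		return false
	}
	for _, prefix := range prefixes {
		// TypeByExtension may append parameters such as "; charset=utf-8",
		// so a prefix match is used instead of equality.
		if strings.HasPrefix(mimeType, prefix) {
			return true
		}
	}
	return false
}
```

Note that mime.TypeByExtension also consults the operating system's MIME tables where available, so results for less common extensions can differ between machines.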
Subdirectory Filtering
Directories containing any of the configured keywords in their absolute URL are skipped. The defaults target version control directories, dependency caches, and IDE folders.
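The keyword check can be sketched as a plain substring match on the absolute URL. The keywords below (.git, node_modules, .idea) are illustrative guesses at what such defaults could contain, not the actual default list, and the function name shouldSkipDir is hypothetical.

```go
package main

import "strings"

// exampleSkipKeywords is a hypothetical keyword list covering a version
// control directory, a dependency cache, and an IDE folder.
var exampleSkipKeywords = []string{".git", "node_modules", ".idea"}

// shouldSkipDir reports whether the absolute URL contains any keyword.
func shouldSkipDir(absURL string, keywords []string) bool {
	for _, keyword := range keywords {
		if strings.Contains(absURL, keyword) {
			return true
		}
	}
	return false
}
```

Because the match is on the whole URL rather than on a single path segment, a keyword such as .git would also match a directory like /my.github.io-mirror/. Keywords are best chosen with that breadth in mind.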
Phase 2: Tag
The Scanner.Tag() method performs HTTP HEAD requests on every file URL discovered in Phase 1. For each response, it extracts:
| Field | HTTP Header | Description |
|---|---|---|
| ContentType | Content-Type | MIME type of the file |
| ContentLength | Content-Length | File size in bytes |
| LastModified | Last-Modified | When the file was last modified (RFC 1123 format) |
| ScanTime | Date | When the server responded |
The result is a list of ScanFinding objects ready for display in the web UI.