Scanning

odirscan uses a two-phase scanning process built on colly, an asynchronous web scraping framework for Go.

Phase 1: Scan

The Scanner.Scan() method crawls open directory URLs and collects file links.

Directory Validation

Before processing links on a page, odirscan validates that the page is a genuine open directory listing by checking that:

The HTML <title> text matches the first <h1> text
Both are non-empty

Pages that fail this check are skipped.
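The check above can be sketched as a small predicate. This is a minimal illustration, not odirscan's actual code: the function name looksLikeOpenDirectory is hypothetical, and trimming whitespace before comparing is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeOpenDirectory reports whether a page's <title> text and first
// <h1> text are both non-empty and equal. Trimming surrounding whitespace
// is an assumption; the real check may compare the raw strings.
func looksLikeOpenDirectory(title, firstH1 string) bool {
	title = strings.TrimSpace(title)
	firstH1 = strings.TrimSpace(firstH1)
	return title != "" && title == firstH1
}

func main() {
	fmt.Println(looksLikeOpenDirectory("Index of /pub", "Index of /pub")) // true
	fmt.Println(looksLikeOpenDirectory("My Blog", "Latest posts"))        // false
	fmt.Println(looksLikeOpenDirectory("", ""))                           // false
}
```

Typical auto-generated listings (for example "Index of /pub") repeat the same text in both elements, which is why this heuristic separates them from ordinary pages.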

Link Processing

For each <a href> on a validated page:

Directories (href ends with /) -- Visited recursively, unless the absolute URL contains a keyword from SkipSubdirKeywords.
Files (everything else) -- Added to findings, unless the file extension maps to a MIME type that starts with one of the prefixes in SkipMimeTypePrefixes.
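The directory/file split can be sketched as follows. The names classifyLink and resolveLink are hypothetical, and resolving relative hrefs with net/url is an assumption about how the absolute URL is obtained (colly offers its own helper for this).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

type linkKind int

const (
	linkDirectory linkKind = iota
	linkFile
)

// classifyLink applies the rule above: an href ending in "/" is a
// directory, everything else is a file.
func classifyLink(href string) linkKind {
	if strings.HasSuffix(href, "/") {
		return linkDirectory
	}
	return linkFile
}

// resolveLink turns a possibly relative href into an absolute URL,
// using the page it was found on as the base.
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	for _, href := range []string{"docs/", "notes.txt"} {
		abs, err := resolveLink("http://example.com/pub/", href)
		if err != nil {
			fmt.Println("bad href:", err)
			continue
		}
		fmt.Println(abs, classifyLink(href) == linkDirectory)
	}
}
```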

MIME-Type Filtering

Files are filtered by their extension's MIME type using Go's mime.TypeByExtension. The default skip list excludes:

image/*
font/*
text/css
audio/*
video/*
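A minimal sketch of extension-based filtering with mime.TypeByExtension. The function shouldSkipFile is hypothetical, the default list is written here as plain string prefixes (an assumption about how the `image/*`-style entries are stored), and keeping files whose extension has no known MIME type is also an assumption.

```go
package main

import (
	"fmt"
	"mime"
	"path"
	"strings"
)

// skipMimeTypePrefixes mirrors the default skip list above, expressed as
// string prefixes (assumed representation).
var skipMimeTypePrefixes = []string{"image/", "font/", "text/css", "audio/", "video/"}

// shouldSkipFile looks up the MIME type for the file's extension and
// reports whether it starts with any of the given prefixes.
func shouldSkipFile(name string, prefixes []string) bool {
	mimeType := mime.TypeByExtension(path.Ext(name))
	if mimeType == "" {
		// Unknown or missing extension: keep the file (assumption).
		return false
	}
	for _, p := range prefixes {
		if strings.HasPrefix(mimeType, p) {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{"photo.png", "style.css", "report.pdf", "README"} {
		fmt.Println(name, shouldSkipFile(name, skipMimeTypePrefixes))
	}
}
```

Prefix matching matters because mime.TypeByExtension may append parameters: for `.css` it returns `text/css; charset=utf-8`, which an exact comparison against `text/css` would miss.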

Subdirectory Filtering

Directories containing any of the configured keywords in their absolute URL are skipped. The defaults target version control directories, dependency caches, and IDE folders.
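The keyword check amounts to a substring test on the absolute URL. In the sketch below, shouldSkipDir is a hypothetical name and the keyword values are illustrative examples of the three categories mentioned, not the actual defaults.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative keywords only; the real SkipSubdirKeywords defaults may differ.
var skipSubdirKeywords = []string{".git", "node_modules", ".idea"}

// shouldSkipDir reports whether the absolute URL contains any keyword.
func shouldSkipDir(absURL string, keywords []string) bool {
	for _, k := range keywords {
		if strings.Contains(absURL, k) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldSkipDir("http://example.com/pub/.git/", skipSubdirKeywords))    // true
	fmt.Println(shouldSkipDir("http://example.com/pub/docs/", skipSubdirKeywords))    // false
	fmt.Println(shouldSkipDir("http://example.com/pub/.github/", skipSubdirKeywords)) // true
}
```

Because the match is a plain substring test over the whole URL, a keyword such as `.git` also excludes `.github/` and any deeper path beneath a matching directory, so keywords are best chosen to be as specific as needed.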

Phase 2: Tag

The Scanner.Tag() method sends an HTTP HEAD request to every file URL discovered in Phase 1. From each response, it extracts:

Field	HTTP Header	Description
`ContentType`	`Content-Type`	MIME type of the file
`ContentLength`	`Content-Length`	File size in bytes
`LastModified`	`Last-Modified`	When the file was last modified (RFC 1123 format)
`ScanTime`	`Date`	When the server responded

The result is a list of ScanFinding objects ready for display in the web UI.
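The header extraction in the table can be sketched with the standard library alone. The ScanFinding struct below is a hypothetical stand-in: the field names come from the table, but the field types, the URL field, and the helper names findingFromHeader and tagURL are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// ScanFinding is a stand-in for odirscan's type; field types are assumed.
type ScanFinding struct {
	URL           string
	ContentType   string
	ContentLength int64
	LastModified  time.Time
	ScanTime      time.Time
}

// findingFromHeader fills a ScanFinding from response headers. Missing or
// malformed headers leave the corresponding field at its zero value.
func findingFromHeader(fileURL string, h http.Header) ScanFinding {
	f := ScanFinding{URL: fileURL, ContentType: h.Get("Content-Type")}
	if n, err := strconv.ParseInt(h.Get("Content-Length"), 10, 64); err == nil {
		f.ContentLength = n
	}
	if t, err := http.ParseTime(h.Get("Last-Modified")); err == nil {
		f.LastModified = t
	}
	if t, err := http.ParseTime(h.Get("Date")); err == nil {
		f.ScanTime = t
	}
	return f
}

// tagURL sends a HEAD request and converts the response headers.
func tagURL(client *http.Client, fileURL string) (ScanFinding, error) {
	resp, err := client.Head(fileURL)
	if err != nil {
		return ScanFinding{}, err
	}
	defer resp.Body.Close()
	return findingFromHeader(fileURL, resp.Header), nil
}

func main() {
	// Offline demonstration with hand-written headers. Against a live
	// server, call tagURL(&http.Client{Timeout: 10 * time.Second}, url).
	h := http.Header{}
	h.Set("Content-Type", "application/pdf")
	h.Set("Content-Length", "1234")
	h.Set("Last-Modified", "Mon, 02 Jan 2006 15:04:05 GMT")
	h.Set("Date", "Tue, 03 Jan 2006 15:04:05 GMT")
	fmt.Printf("%+v\n", findingFromHeader("http://example.com/pub/report.pdf", h))
}
```

A HEAD request returns the same headers as a GET would, without the body, so each file can be tagged without downloading it.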
