Scanning

odirscan uses a two-phase scanning process built on colly, an asynchronous web scraping framework for Go.

Phase 1: Scan

The Scanner.Scan() method crawls open directory URLs and collects file links.

Directory Validation

Before processing links on a page, odirscan validates that the page is a genuine open directory listing by checking that:

  • The HTML <title> text matches the first <h1> text
  • Both are non-empty

Pages that fail this check are skipped.
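
The check above can be sketched as a small predicate. This is a minimal illustration, not odirscan's actual code: the function name isDirectoryListing is hypothetical, it assumes the <title> and first <h1> texts have already been extracted from the page, and trimming surrounding whitespace before comparing is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// isDirectoryListing reports whether a page looks like an open directory
// listing: the <title> text and the first <h1> text must both be non-empty
// and equal. Trimming whitespace first is an assumption of this sketch.
func isDirectoryListing(title, firstH1 string) bool {
	title = strings.TrimSpace(title)
	firstH1 = strings.TrimSpace(firstH1)
	return title != "" && title == firstH1
}

func main() {
	fmt.Println(isDirectoryListing("Index of /pub", "Index of /pub")) // true
	fmt.Println(isDirectoryListing("Welcome", "Index of /pub"))       // false
	fmt.Println(isDirectoryListing("", ""))                           // false
}
```

Typical auto-generated listings (for example, pages titled "Index of /pub") repeat the same text in both elements, which is what makes this comparison a cheap filter.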

For each <a href> on a validated page:

  • Directories (href ends with /) -- Visited recursively, unless the absolute URL contains a keyword from SkipSubdirKeywords.
  • Files (everything else) -- Added to findings, unless the file extension maps to a MIME type matching a prefix in SkipMimeTypePrefixes.
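
The directory/file split above could look roughly like the following. It is a sketch under stated assumptions: classifyLink and linkKind are hypothetical names, and resolving the href against the page URL with net/url is one plausible way to obtain the absolute URL, not necessarily what odirscan does internally.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// linkKind is a hypothetical label for the two cases described above.
type linkKind string

const (
	linkDirectory linkKind = "directory"
	linkFile      linkKind = "file"
)

// classifyLink resolves href against the page URL and classifies it:
// an href ending in "/" is a directory, everything else is a file.
func classifyLink(pageURL, href string) (string, linkKind, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", "", err
	}
	abs := base.ResolveReference(ref).String()
	if strings.HasSuffix(href, "/") {
		return abs, linkDirectory, nil
	}
	return abs, linkFile, nil
}

func main() {
	for _, href := range []string{"docs/", "notes.txt"} {
		abs, kind, err := classifyLink("http://example.com/pub/", href)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(kind, abs)
	}
}
```

This sketch applies only the trailing-slash rule. Parent-directory links such as "../" and sort links with query strings are not treated specially here.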

MIME-Type Filtering

Each file's extension is mapped to a MIME type with Go's mime.TypeByExtension, and the file is skipped if that type starts with any prefix in SkipMimeTypePrefixes. The default skip list excludes:

  • image/*
  • font/*
  • text/css
  • audio/*
  • video/*
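
A prefix-based filter of this kind might be written as below. The helper name shouldSkipFile and the exact spelling of the prefixes (for example "image/" for image/*) are assumptions of this sketch; only mime.TypeByExtension and the list of skipped types come from the description above.

```go
package main

import (
	"fmt"
	"mime"
	"net/url"
	"path"
	"strings"
)

// skipMimeTypePrefixes mirrors the default skip list described above.
// How odirscan actually spells these prefixes is an assumption.
var skipMimeTypePrefixes = []string{"image/", "font/", "text/css", "audio/", "video/"}

// shouldSkipFile reports whether the MIME type inferred from the URL's
// file extension starts with one of the given prefixes. Files with no
// extension or an unknown extension are kept.
func shouldSkipFile(fileURL string, prefixes []string) bool {
	u, err := url.Parse(fileURL)
	if err != nil {
		return false
	}
	ext := path.Ext(u.Path)
	if ext == "" {
		return false
	}
	mimeType := mime.TypeByExtension(ext)
	if mimeType == "" {
		return false
	}
	for _, p := range prefixes {
		if strings.HasPrefix(mimeType, p) {
			return true
		}
	}
	return false
}

func main() {
	for _, u := range []string{
		"http://example.com/pub/logo.png",
		"http://example.com/pub/report.pdf",
	} {
		fmt.Println(u, "skip:", shouldSkipFile(u, skipMimeTypePrefixes))
	}
}
```

Because mime.TypeByExtension may return a type with parameters (such as "text/css; charset=utf-8"), prefix matching is more forgiving than an exact comparison. Note that the filter relies on the extension alone; no request is made to the server at this stage.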

Subdirectory Filtering

A directory is skipped when its absolute URL contains any keyword from SkipSubdirKeywords. The defaults target version control directories, dependency caches, and IDE folders.
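
As a rough illustration, the keyword check can be a plain substring match on the absolute URL. The function name shouldSkipDir and the example keywords (".git", "node_modules", ".idea") are hypothetical; the actual default keyword list is not given here.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldSkipDir reports whether the absolute directory URL contains any
// of the given keywords. Empty keywords are ignored so that a stray ""
// in the configuration does not skip every directory.
func shouldSkipDir(absURL string, keywords []string) bool {
	for _, kw := range keywords {
		if kw != "" && strings.Contains(absURL, kw) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical keywords, chosen only to illustrate the three
	// categories mentioned above.
	keywords := []string{".git", "node_modules", ".idea"}

	fmt.Println(shouldSkipDir("http://example.com/repo/.git/", keywords)) // true
	fmt.Println(shouldSkipDir("http://example.com/pub/docs/", keywords))  // false
}
```

Since the match runs against the whole absolute URL, a keyword that also appears in the host name or in a parent path segment would cause every directory beneath it to be skipped, so short or generic keywords deserve some care.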

Phase 2: Tag

The Scanner.Tag() method performs HTTP HEAD requests on every file URL discovered in Phase 1. For each response, it extracts:

Field         | HTTP Header    | Description
ContentType   | Content-Type   | MIME type of the file
ContentLength | Content-Length | File size in bytes
LastModified  | Last-Modified  | When the file was last modified (RFC 1123 format)
ScanTime      | Date           | When the server responded

The result is a list of ScanFinding objects ready for display in the web UI.
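
The header extraction in Phase 2 could be sketched with the standard library as follows. The finding struct is a hypothetical stand-in for ScanFinding (its real definition is not shown here), findingFromHeaders and tagURL are illustrative names, and the fallbacks for missing headers (-1 for the length, the zero time for dates) are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// finding is a hypothetical stand-in for odirscan's ScanFinding type.
type finding struct {
	URL           string
	ContentType   string
	ContentLength int64     // -1 when the header is missing or invalid
	LastModified  time.Time // zero when the header is missing or invalid
	ScanTime      time.Time // zero when the header is missing or invalid
}

// findingFromHeaders maps the four response headers to the fields above.
func findingFromHeaders(fileURL string, h http.Header) finding {
	f := finding{URL: fileURL, ContentType: h.Get("Content-Type"), ContentLength: -1}
	if n, err := strconv.ParseInt(h.Get("Content-Length"), 10, 64); err == nil {
		f.ContentLength = n
	}
	// http.ParseTime accepts the HTTP date formats, including the
	// RFC 1123 style used by Last-Modified and Date.
	if t, err := http.ParseTime(h.Get("Last-Modified")); err == nil {
		f.LastModified = t
	}
	if t, err := http.ParseTime(h.Get("Date")); err == nil {
		f.ScanTime = t
	}
	return f
}

// tagURL issues a HEAD request and extracts the fields from the response.
func tagURL(client *http.Client, fileURL string) (finding, error) {
	resp, err := client.Head(fileURL)
	if err != nil {
		return finding{}, err
	}
	defer resp.Body.Close()
	return findingFromHeaders(fileURL, resp.Header), nil
}

func main() {
	// Demonstrate the header mapping without touching the network.
	h := http.Header{}
	h.Set("Content-Type", "application/pdf")
	h.Set("Content-Length", "1024")
	h.Set("Last-Modified", "Wed, 21 Oct 2015 07:28:00 GMT")
	h.Set("Date", "Thu, 22 Oct 2015 08:00:00 GMT")
	fmt.Printf("%+v\n", findingFromHeaders("http://example.com/pub/report.pdf", h))
}
```

A HEAD request returns headers only, so each file can be tagged without downloading its body. A caller would typically pass an http.Client with a timeout to tagURL, since open directory servers can be slow or unresponsive.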