Supported file types

Curator extracts text and metadata from common document, text, and image files. Scanned PDFs and images are read with OCR for text and a vision model for content. That extracted content is what makes a file searchable and lets chat cite it.

Any file can still be browsed by filename in All Files, even when Curator cannot read its content. Such a file does not appear in content or semantic search until its text can be extracted.

Supported types

[!NOTE] PLACEHOLDER: the exact list of supported extensions is not confirmed here. Fill in the confirmed types in the table below and remove this note. Do not publish a precise extension list until it is verified in the app.

| Category | Types | Notes |
| --- | --- | --- |
| Documents | PLACEHOLDER | PLACEHOLDER |
| Text | PLACEHOLDER | PLACEHOLDER |
| PDFs | PLACEHOLDER | Text PDFs are read directly. Scanned PDFs go through OCR. |
| Images | PLACEHOLDER | Read with OCR for text and a vision model for content. |
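
The extraction routing and search behavior described on this page can be sketched in a few lines of Python. This is an illustrative model only, not Curator's implementation: `Kind`, `plan_extraction`, and `visibility` are hypothetical names, and file kinds stand in for extensions because the extension list is not confirmed. Scanned PDFs are assumed to get both OCR and the vision model, following the introduction.

```python
from enum import Enum, auto
from typing import Optional


class Kind(Enum):
    """Hypothetical file kinds; real detection rules are not documented here."""
    TEXT_PDF = auto()
    SCANNED_PDF = auto()
    IMAGE = auto()


def plan_extraction(kind: Kind) -> list[str]:
    """Return the extraction steps this page describes for a file kind."""
    if kind is Kind.TEXT_PDF:
        return ["direct_read"]           # text PDFs are read directly
    # Scanned PDFs and images: OCR for text, vision model for content
    # (assumption: both steps apply to scanned PDFs as well as images).
    return ["ocr", "vision_model"]


def visibility(extracted_text: Optional[str]) -> dict[str, bool]:
    """Where a file shows up, given whatever text was extracted from it."""
    has_text = bool(extracted_text and extracted_text.strip())
    return {
        "all_files": True,           # always browsable by filename
        "content_search": has_text,  # needs extracted text
    }
```

In this model a file with no extracted text still has `all_files` set to `True`, which mirrors the point that unreadable files stay browsable by filename.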

For how this content is turned into search, see How processing works.