Supported file types

Curator extracts text and metadata from common document, text, and image files. Scanned PDFs and images are read with OCR for text and a vision model for content. That extracted content is what makes a file searchable and lets chat cite it.

Any file can still be browsed by filename in All Files even when Curator cannot read its content. Such a file will not appear in content or semantic search until its text can be extracted.

[!NOTE] PLACEHOLDER: the exact list of supported extensions is not confirmed here. Fill in the confirmed types in the table below and remove this note. Do not publish a precise extension list until it is verified in the app.

| Category | Types | Notes |
| --- | --- | --- |
| Documents | PLACEHOLDER | PLACEHOLDER |
| Text | PLACEHOLDER | PLACEHOLDER |
| PDFs | PLACEHOLDER | Text PDFs read directly. Scanned PDFs go through OCR. |
| Images | PLACEHOLDER | Read with OCR for text and a vision model for content. |

For how this content is turned into search, see How processing works.