Question 1

What can I extract text from?

Accepted Answer

PDFs (digital and scanned), JPGs, and PNGs. Typical sources include research papers, slide decks, scanned book pages, screenshots of articles, and whiteboard photos. The free tier accepts up to 25 pages or 25 MB per file.

Question 2

How is this different from a basic OCR tool?

Accepted Answer

We use vision-capable AI models (Anthropic's Claude and OpenAI's models) instead of legacy OCR engines, so the output is structured and readable and preserves layout intent, even on tilted scans, multi-column papers, or handwritten notes. You get a clean .txt file and a structured .json artifact, not raw character salad.

Question 3

Do my files leave secure storage?

Accepted Answer

Files are stored in a private S3 bucket, served through signed download URLs, and removed after 90 days. To run extraction, file content is sent to the Anthropic and OpenAI APIs under our enterprise terms; neither provider trains on API content.

Question 4

Can I extract text from images of audio waveforms or video frames?

Accepted Answer

Yes, any image where the text is visible works, including a still frame taken from a video. To process a video file itself, run it through the standard transcribe path, which detects the video and produces a transcript. Text Extract is for static documents and images.

Question 5

How much does it cost?

Accepted Answer

The free preview shows the first page or image. Paid plans charge about 2 credits per PDF page and 1 credit per image. The Starter, Team, and Growth plans all include text extraction.
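As a rough illustration of the credit arithmetic above, here is a minimal Python sketch. The function name `estimate_credits` and the constants are hypothetical and not part of any product API; the per-page rate is stated as approximate, so treat the result as an estimate rather than a billed amount.

```python
# Rates taken from the answer above. The PDF rate is approximate ("about 2"),
# so the result is an estimate, not an exact charge.
CREDITS_PER_PDF_PAGE = 2
CREDITS_PER_IMAGE = 1


def estimate_credits(pdf_pages: int = 0, images: int = 0) -> int:
    """Estimate credits for a batch of PDF pages and standalone images."""
    if pdf_pages < 0 or images < 0:
        raise ValueError("counts must be non-negative")
    return pdf_pages * CREDITS_PER_PDF_PAGE + images * CREDITS_PER_IMAGE


# A 12-page PDF plus 3 screenshots: 12 * 2 + 3 * 1 = 27 credits.
print(estimate_credits(pdf_pages=12, images=3))
```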

Question 6

What languages are supported?

Accepted Answer

90+ languages, including non-Latin scripts (Arabic, Chinese, Japanese, Korean, Hebrew, Cyrillic, Devanagari, Thai). Mixed-language documents work too.

Extract text from PDFs and images

What people use it for

Research papers

Slide decks

Scanned pages

Screenshots

Multi-column docs

Foreign-language docs

How it works

Sign up free

Drop your file

Download clean text

Why this beats basic OCR

Frequently asked questions
