Extract text from PDFs and images
Drop a PDF or image and get clean, structured text in seconds. Research papers, slide decks, scanned pages, and screenshots are read by AI vision models from Anthropic (Claude) and OpenAI.
Drop in
Sign up free to upload. A sample of the output is shown below.
Get back
Sample output
Title: Attention Is All You Need
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
1. Introduction
Recurrent neural networks, long short-term memory and gated recurrent neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation [...]
(Extracted from Vaswani et al., "Attention Is All You Need", 2017)
What people use it for
Research papers
Pull abstracts, body text, and citations out of PDF papers for notes, summaries, or downstream analysis.
Slide decks
Conference talks, training decks, sales presentations — get all the slide content as text without retyping.
Scanned pages
Book pages, receipts, contracts, handwritten notes — readable, searchable text from any photo or scan.
Screenshots
Articles, social posts, error messages, dashboards — turn any screenshot into copyable, searchable text.
Multi-column docs
Newspapers, legal filings, scientific journals — column order preserved correctly, no scrambled output.
Foreign-language docs
Arabic, Chinese, Hebrew, Cyrillic, Devanagari, Thai — 90+ languages with full character fidelity.
How it works
1. Sign up free
Create a free account in 30 seconds, no credit card required.
2. Drop your file
Upload a PDF or image (JPG/PNG). Up to 25 MB per file in free preview.
3. Download clean text
Get a .txt and a structured .json: readable, searchable, ready to use.
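For readers who want to script against the structured .json, here is a minimal Python sketch of what post-processing might look like. The page does not publish the schema, so the field names used here (`title`, `sections`, `heading`, `text`) and the helper functions are assumptions made for illustration; check a real download and adjust the keys to match.

```python
import json

# Hypothetical structured output. The field names ("title", "sections",
# "heading", "text") are assumptions for illustration, not a documented schema.
SAMPLE_JSON = """
{
  "title": "Attention Is All You Need",
  "sections": [
    {
      "heading": "Abstract",
      "text": "We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely."
    },
    {
      "heading": "1. Introduction",
      "text": "Recurrent neural networks, long short-term memory and gated recurrent neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation [...]"
    }
  ]
}
"""


def list_headings(doc):
    """Return the section headings in document order."""
    return [s.get("heading", "") for s in doc.get("sections", [])]


def get_section(doc, heading):
    """Return the text of the first section with this heading, or None."""
    for s in doc.get("sections", []):
        if s.get("heading") == heading:
            return s.get("text")
    return None


def to_plain_text(doc):
    """Flatten the title and all sections into one plain-text string."""
    blocks = []
    if doc.get("title"):
        blocks.append(doc["title"])
    for s in doc.get("sections", []):
        blocks.append(f"{s.get('heading', '')}\n{s.get('text', '')}".strip())
    return "\n\n".join(blocks)


doc = json.loads(SAMPLE_JSON)
print(list_headings(doc))
print(get_section(doc, "Abstract"))
```

With a real file, replace `json.loads(SAMPLE_JSON)` with `json.load(open(path, encoding="utf-8"))`. Keeping headings separate from body text is what makes it possible to pull out a single section, such as the abstract, without searching the whole .txt.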
Why this beats basic OCR
Legacy OCR
- Garbled output on tilted scans
- Multi-column docs come back scrambled
- Handwriting fails or hallucinates
- No structure — just a wall of text
AI vision (Claude + OpenAI)
- Reads tilted, low-light, photo-of-screen captures
- Reading order preserved across columns
- Handles handwriting and mixed scripts
- Structured .json with sections and headings
Ready to extract?
Sign up free — no credit card. Get instant text from your first PDF or image, plus full access to transcripts, captions, translation, and subtitle repair.
Get started free