Migrating to 2.0#
Camelot 2.0 rolls up a backend migration, performance work, an optional
neural backend, and a few small breaking changes. For most users,
import camelot and camelot.read_pdf(...) keep working unchanged. This
page lists what to check when upgrading from 1.0.x, then points to the
new opt-in features.
The full, dated list of changes is in the changelog.
Breaking changes#
Python 3.10+#
Python 3.9 (EOL October 2025) is no longer supported. The minimum is now Python 3.10.
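A pre-upgrade check for the interpreter version could look like the sketch below. The helper name supports_camelot_2 is hypothetical and not part of Camelot; it only compares a version tuple against the 3.10 minimum stated above.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum stated on this page for Camelot 2.0


def supports_camelot_2(version_info=sys.version_info):
    """Return True if the given interpreter version meets the 3.10 minimum.

    Hypothetical helper for illustration; Camelot does not ship it.
    """
    # Only (major, minor) matter here; the patch level is ignored.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    print("OK to upgrade" if supports_camelot_2() else "Upgrade Python first")
```

Running this in each deployment environment before bumping the Camelot pin shows which environments still need a newer interpreter.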
line_scale default is 15 (was documented as 40)#
The CLI and read_pdf docstring used to say the flavor="lattice"
default line_scale was 40, but the Lattice parser always defaulted to
15. The docs now match the implementation. If you relied on the
documented-but-unimplemented 40, set it explicitly:
camelot.read_pdf("file.pdf", flavor="lattice", line_scale=40)
Table.to_excel drops the index/header by default#
Table.to_excel now defaults to index=False, header=False to match
Table.to_csv, so Excel exports no longer carry the pandas auto-generated
row index and column header. Opt back in with:
table.to_excel("out.xlsx", index=True, header=True)
TableList materialises its input#
TableList(...) now consumes an iterable into a list at construction (so
bool() / len() work on TableList(generator())). A generator
passed in is exhausted immediately rather than at first access.
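The effect of eager materialisation can be illustrated without Camelot. The EagerList class below is a hypothetical stand-in, not Camelot's TableList; it assumes only what is described above, namely that the constructor copies its input into a list.

```python
class EagerList:
    """Hypothetical stand-in: copies its input into a list at construction."""

    def __init__(self, items):
        self._items = list(items)  # the iterable is consumed right here

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


def numbers():
    """A generator standing in for a lazily produced sequence of tables."""
    yield from (1, 2, 3)


gen = numbers()
eager = EagerList(gen)

remaining = list(gen)  # nothing left: the constructor already drained it
size = len(eager)      # len() works because the items are stored
truthy = bool(eager)   # bool() falls back to __len__
```

If other code still needs the generator's items after constructing the container, iterate over the container itself rather than the original generator.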
PDFHandler.pages is a property#
PDFHandler.pages is now a lazily-resolved property (was an attribute).
Reads are unchanged; only code that set it after subclassing is affected.
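The difference between an attribute and a property only shows up on assignment. The sketch below uses a hypothetical Handler class, not Camelot's PDFHandler, and assumes a read-only property (no setter); whether Camelot's property defines a setter is not stated here, so treat this as the failure mode to test for rather than a guaranteed outcome.

```python
class Handler:
    """Hypothetical stand-in exposing a lazily resolved `pages` property."""

    def __init__(self, spec):
        self._spec = spec
        self._pages = None  # resolved on first read

    @property
    def pages(self):
        if self._pages is None:
            # Resolve once from the page spec, then reuse the cached list.
            self._pages = [int(p) for p in self._spec.split(",")]
        return self._pages


h = Handler("1,2,3")
read_ok = h.pages  # reading looks exactly like attribute access

try:
    h.pages = [1]  # a property without a setter rejects assignment
    assignment_error = None
except AttributeError as exc:
    assignment_error = exc
```

Code that previously assigned to pages is the part to review when upgrading.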
PDF backend is now playa-pdf#
The backend moved from pypdf + pdfminer.six to
playa-pdf: a smaller install set,
more accurate encrypted-PDF handling, and faster hot paths. Callers that
only use import camelot should see no API change. pdfminer.six is no
longer a direct dependency: playa.miner exposes a PDFMiner-compatible
layout API, so imports through Camelot keep working.
Default lattice engine is "combined"#
flavor="lattice" now defaults to engine="combined" (raster OpenCV
detection plus the PDF’s native vector ruled lines). It is safe by
construction — raster always runs and vector lines can only add — so it is
never worse than the old "raster" default. Pass engine="raster" to
restore the exact pre-2.0 behaviour. (There is no engine="auto".)
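The "can only add" argument can be sketched as a set union. This is an illustration of the reasoning, not Camelot's implementation: the function name combine_lines and the line coordinates are made up, and a real engine would presumably merge near-duplicate lines with some tolerance rather than require exact matches.

```python
def combine_lines(raster_lines, vector_lines):
    """Union of both detections; every raster line is always kept.

    Hypothetical helper for illustration only.
    """
    return set(raster_lines) | set(vector_lines)


# Lines as (x1, y1, x2, y2) tuples; the values are invented for the example.
raster = {(0, 0, 100, 0), (0, 50, 100, 50)}
vector = {(0, 50, 100, 50), (0, 100, 100, 100)}

combined = combine_lines(raster, vector)
```

Because the result is a superset of the raster lines, any line the raster step found is still present in the combined set.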
New (opt-in) features#
Neural backend for borderless / scanned tables#
A new optional flavor="ml" uses a Table Transformer model for table
structure and fills cell text from the PDF (so it can’t hallucinate
values). It targets borderless tables, where the heuristic parsers plateau.
With OCR it also reads scanned / image-only PDFs. Both extras pull in
heavier dependencies, which are imported lazily, so the core install is
unaffected:
pip install "camelot-py[ml]" # borderless
pip install "camelot-py[ml,ocr]" # + scanned PDFs
tables = camelot.read_pdf("report.pdf", flavor="ml") # device="cuda"/"xpu" optional
See How It Works for the design and a borderless benchmark, and How Camelot compares to other tools for how Camelot’s flavors line up against other tools.
Other additions worth knowing#
- flavor="auto" picks lattice or network per page.
- TableList.filter(...) drops low-quality tables by row/column count, accuracy, or whitespace.
- Table.confidence: a unified [0, 1] quality score in parsing_report.
- per_page= overrides, replace_text=, list-form strip_text=, bytes / file-like read_pdf input, and a cpu_count cap for parallel runs.