pdf_utils
pdf_split_pages(input_pdf, splits=None)
Splits a PDF into individual pages or multi-page chunks and writes them to a temporary directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_pdf | Union[BytesIO, BinaryIO, str] | Input PDF as an in-memory bytes buffer, a file opened in binary mode, or a file path | required |
splits | Optional[List[int]] | Optional list of page numbers to split at. If provided, pages are grouped into chunks that end at these page numbers. For example, splits = [4, 9] yields pages 1-4, 5-9, and 10-end. If omitted, the PDF is split into individual pages. | None |
max_workers | | Maximum number of concurrent workers for parallel processing | required |
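The grouping rule described for splits can be sketched in plain Python. This is only an illustration of the documented behaviour, not the library's implementation: `chunk_ranges` is a hypothetical helper, and its handling of an empty list, duplicates, and out-of-range split points is an assumption that the documentation does not confirm.

```python
from typing import List, Optional, Tuple


def chunk_ranges(num_pages: int, splits: Optional[List[int]] = None) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first_page, last_page) ranges for each chunk.

    Hypothetical helper: mirrors the documented rule that each split point
    is the last page of a chunk.
    """
    if num_pages < 1:
        return []
    if not splits:
        # Assumption: None (or an empty list) means one chunk per page.
        return [(page, page) for page in range(1, num_pages + 1)]

    ranges = []
    start = 1
    for end in sorted(set(splits)):
        # Assumption: skip split points that fall before the current chunk
        # or at/after the last page, so no empty chunk is produced.
        if end < start or end >= num_pages:
            continue
        ranges.append((start, end))
        start = end + 1
    ranges.append((start, num_pages))  # the trailing "to end" chunk
    return ranges


example = chunk_ranges(12, [4, 9])
```

For a 12-page document and splits = [4, 9], `example` holds the ranges 1-4, 5-9, and 10-12, matching the description above.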
Returns:
Type | Description |
---|---|
Tuple[List[Path], TemporaryDirectory[Any]] | Tuple of (1) the list of paths to the individual PDF pages or chunks and (2) the temporary directory object; the caller must call cleanup() on it |
Example
paths, tmp_dir = pdf_split_pages("input.pdf")
# Use paths...
tmp_dir.cleanup() # Clean up temp files when done
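Because the caller owns the temporary directory, an exception raised while working with the paths would skip the cleanup() call in the example above. One way to guard against that is a try/finally block. The sketch below uses `fake_split`, a hypothetical stand-in that only mimics the documented return shape (a list of paths plus a TemporaryDirectory) so the pattern can be run without a real PDF. It does not call the library.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def fake_split(n_chunks: int) -> Tuple[List[Path], tempfile.TemporaryDirectory]:
    """Hypothetical stand-in that mimics the documented return shape."""
    tmp_dir = tempfile.TemporaryDirectory()
    paths = []
    for i in range(1, n_chunks + 1):
        path = Path(tmp_dir.name) / f"chunk_{i}.pdf"
        path.write_bytes(b"%PDF-1.4\n")  # placeholder bytes, not a valid PDF
        paths.append(path)
    return paths, tmp_dir


paths, tmp_dir = fake_split(3)
try:
    # Work with the chunk files while the directory still exists.
    sizes = [path.stat().st_size for path in paths]
finally:
    # Runs even if the block above raises, so temp files are not leaked.
    tmp_dir.cleanup()

leftover = [path for path in paths if path.exists()]
```

After cleanup() the paths in the list no longer point to existing files, so anything that must outlive the call should be copied elsewhere first.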