From version 3.819.1007, additional advanced parameters were added under the “TABLE” option, as shown below.
To use these options, users must understand the functions and terminology of pdfplumber, the Python library this plugin is built on.
For more resources, please visit these websites.
- Operation mode: either TEXT mode or TABLE mode
- Input PDF file: full file path of the input .pdf file (digitally generated PDFs only)
- Output file name and path (TABLE option: .csv or .txt; TEXT option: .txt only)
- Page number
- Table number (when a page contains multiple tables, they are numbered from top to bottom)
- Separator used to separate values in the table
- Horizontal and vertical strategies: used to determine the boundaries of values in the table when the table's “lines” are not clearly drawn
- Lines – strict
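The parameters above map closely onto pdfplumber concepts. The sketch below is not the plugin's implementation; it is a minimal approximation written directly against pdfplumber, assuming that page and table numbers are 1-based and that cell values are simply joined with the separator. The function names `extract_table_to_file`, `extract_text_to_file` and `to_zero_based` are hypothetical.

```python
from pathlib import Path


def to_zero_based(number: int, count: int, label: str) -> int:
    """Convert a 1-based page/table number to a 0-based index (assumed convention)."""
    if not 1 <= number <= count:
        raise IndexError(f"{label} {number} is out of range (1..{count})")
    return number - 1


def extract_table_to_file(pdf_path, output_path, page_number, table_number,
                          separator=",", vertical_strategy="lines",
                          horizontal_strategy="lines"):
    """TABLE mode sketch: write one table from one page to a .csv or .txt file."""
    import pdfplumber  # imported here so to_zero_based works without pdfplumber installed

    settings = {
        "vertical_strategy": vertical_strategy,
        "horizontal_strategy": horizontal_strategy,
    }
    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[to_zero_based(page_number, len(pdf.pages), "Page")]
        # Each table is a list of rows; each row is a list of cell strings (or None).
        tables = page.extract_tables(table_settings=settings)
        if not tables:
            raise LookupError("No table found on this page")
        table = tables[to_zero_based(table_number, len(tables), "Table")]
    # Empty cells come back as None; write them as empty strings.
    lines = [separator.join(cell or "" for cell in row) for row in table]
    Path(output_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return str(Path(output_path).resolve())  # full file path of the output file


def extract_text_to_file(pdf_path, output_path, page_number):
    """TEXT mode sketch: write the text of one page to a .txt file."""
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[to_zero_based(page_number, len(pdf.pages), "Page")]
        text = page.extract_text() or ""
    Path(output_path).write_text(text, encoding="utf-8")
    return str(Path(output_path).resolve())
```

A call such as `extract_table_to_file("invoice.pdf", "invoice_table.csv", page_number=1, table_number=2, separator=";")` (file names are placeholders) would export the second table on the first page.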
Outputs:
- String: full file path for the output file
- Csv: full file path for the output file
- File: full file path for the output file
Return codes:
- 0: Execution successful
- 1: The table is not included in the PDF file
- 9: All other responses from the plugin
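A calling workflow will usually branch on these codes. The sketch below is hypothetical: `RETURN_CODES` and `run_with_return_code` are illustrative names, not part of the plugin, and it assumes the underlying extraction signals a missing table by raising a `LookupError`. The actual plugin may detect that case differently.

```python
RETURN_CODES = {
    0: "Execution successful",
    1: "The table is not included in the PDF file",
    9: "All other responses from the plugin",
}


def run_with_return_code(task, *args, **kwargs):
    """Run an extraction callable and translate its outcome into the documented codes.

    Assumption: the callable raises LookupError (or a subclass such as IndexError
    or KeyError) when the requested table cannot be found.
    """
    try:
        result = task(*args, **kwargs)
    except LookupError:
        return 1, None   # requested table not found
    except Exception:
        return 9, None   # any other failure
    return 0, result     # success: result is, e.g., the output file path
```

The workflow can then check `code == 0` before using the returned path, and look up `RETURN_CODES[code]` for a log message.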
(Screenshot: TABLE option)
(Screenshot: TEXT option)
More tips for the TABLE option
Table index: Select the table within the selected page.
Separator: Enter the separator to insert between values in the exported .txt file (default: ‘,’).
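How the separator shapes the exported file can be sketched with Python's standard `csv` module. `format_rows` is a hypothetical helper, not the plugin's code, and it assumes table cells arrive as pdfplumber returns them (lists of strings, with `None` for empty cells). As a design choice of this sketch, single-character separators go through `csv.writer` so that values containing the separator are quoted, while multi-character separators are simply joined.

```python
import csv
import io


def format_rows(rows, separator=","):
    """Render table rows as text, separating cell values with `separator`."""
    # pdfplumber uses None for empty cells; export them as empty strings.
    cleaned = [["" if cell is None else cell for cell in row] for row in rows]
    if len(separator) == 1:
        # csv.writer quotes any value that contains the separator.
        buf = io.StringIO()
        csv.writer(buf, delimiter=separator, lineterminator="\n").writerows(cleaned)
        return buf.getvalue()
    # csv.writer only accepts one-character delimiters, so join longer ones directly.
    return "".join(separator.join(row) + "\n" for row in cleaned)


rows = [["Item", "Price"], ["Apple, red", "1.20"], ["Pear", None]]
print(format_rows(rows))         # default ',' quotes "Apple, red"
print(format_rows(rows, ";"))    # ';' needs no quoting here
print(format_rows(rows, " | "))  # multi-character separator, plain join
```

With the default comma, the value `Apple, red` is written as `"Apple, red"` so it stays in one column; with `;` it is written unquoted.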
What are the VERTICAL and HORIZONTAL strategies?
pdfplumber's vertical_strategy and horizontal_strategy settings each accept one of the following values (among others):
- lines: Use the page's graphical lines, including the sides of rectangle objects, as the borders of potential table cells.
- lines_strict: Use the page's graphical lines, but not the sides of rectangle objects, as the borders of potential table cells.
- text: For vertical_strategy, deduce the (imaginary) lines that connect the left, right, or center of words on the page, and use those lines as the borders of potential table cells. For horizontal_strategy, the same but using the tops of words.
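Choosing strategies amounts to building pdfplumber's `table_settings` dictionary. The sketch below assumes the plugin forwards its two strategy fields unchanged to pdfplumber (not confirmed here). `table_settings` and `ALLOWED_STRATEGIES` are hypothetical names, and the validation deliberately limits values to `lines`, `lines_strict` and `text`, even though pdfplumber supports others.

```python
ALLOWED_STRATEGIES = ("lines", "lines_strict", "text")


def table_settings(vertical: str = "lines", horizontal: str = "lines") -> dict:
    """Build a pdfplumber-style table_settings dict from the two strategy choices."""
    for key, value in (("vertical_strategy", vertical), ("horizontal_strategy", horizontal)):
        if value not in ALLOWED_STRATEGIES:
            raise ValueError(f"{key} must be one of {ALLOWED_STRATEGIES}, got {value!r}")
    return {"vertical_strategy": vertical, "horizontal_strategy": horizontal}


# A table with ruled rows but no drawn column lines: columns are inferred from
# the alignment of words ("text"), rows from the graphical lines ("lines").
settings = table_settings(vertical="text", horizontal="lines")

# With pdfplumber installed, the dict would be passed like this (file name is a placeholder):
#   import pdfplumber
#   with pdfplumber.open("report.pdf") as pdf:
#       tables = pdf.pages[0].extract_tables(table_settings=settings)
```

Mixing strategies like this is often useful when only one direction of the table has visible ruling lines.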