Natural Language Processing (NLP)
DoxaScan Composer’s Natural Language Processing (NLP) function allows users to talk with documents in natural English speech and say things such as “redact social security numbers”, “whiteout email addresses”, or “blackout gender identities” to perform those functions with ease.
Installation, Opening/Closing the Application and Help
Installing DoxaScan Composer
Install the application:
- Launch the DoxaScan installer file. This program will install DoxaScan PDF Composer and its support components into the “Program Files (x86)\DoxaScan8” folder. It will also install the DoxaScan Scan Separator, which creates barcode separation pages, and DoxaScan ComPara; these may be launched from the desktop icons.
- Follow the standard prompts and instructions.
Doing so will install the DoxaScan product family, which will function in trial mode until it is licensed. To fully activate the program, it must be authorized with a license key.
Opening and Closing the Application
Start the application with the icon placed on the desktop or by selecting the program through the Windows program menu. To exit the app, select the Exit icon in the upper right corner of the application home screen or the “X.”
Authorizing DoxaScan Composer
To activate the product once it has been installed, the license must be approved with a license key. Email or call DoxaScan if you have not automatically received your key in an email after purchase. To access the application in trial mode, enter your email and click on the Free Trial button.
Illustration: DoxaScan Registration Screen displayed on installation.
If you have received your license key, enter your email address and the License Key on the Product Registration screen. Click the Register button that appears to complete the registration.
Accessing Application Training
Access the application training by selecting the appropriate tile from the Training panel of the Main screen.
Illustration: Access the User’s Manual through the Training panel of the dashboard.
Overview of the Composer Interface and Page Assembly Process
Overview of Working with Composer
With Composer, users select files or folders of data for easy page assembly and manipulation. Composer can be used to merge several files into one file or conversely, break a single record into multiple files.
Illustration: DoxaScan Overview of the Process
Illustration: File Assembly with PDF Composer and File Splitting with PDF Composer.
Some of the significant uses for Composer include:
File splitting and naming
Privacy information identification and redaction
Converting file formats
Encrypting PDF files
The Composer interface is designed to provide a simple user experience and shorten the learning curve, allowing for immediate use and benefits.
Illustration: DoxaScan Composer Home Dashboard Screen.
The Home Dashboard interface provides access to primary functions to work with your documents. The Projects tile selections allow for creating a project from scratch, opening an existing file, or importing from a designated scanning folder. Recent Files provides a simple interface to recall your most recent documents. Built-in Notifications and Training panels keep you informed and offer simple one-click access to information.
Creating a new project presents the project screen where settings and actions can be edited and processing initiated.
More advanced and future functionality within the application is enabled or disabled depending on your license key. Some of the license-controlled aspects of the product include:
- Privacy inspection and redaction
- Privacy Auditor and networked application management
- Cloud-based AI services
Starting and Naming a Project
Creating a New Project
A Composer Project can be created when you click on the New Project tile from the Home Dashboard. This triggers a popup window to name the new project. After naming the project, the Project screen is loaded. Tabs on the left of this screen are used to present settings and actions for:
- Home (returns to the Home Dashboard)
- Document Structure
- Search and Privacy Redaction
- Split Files
- Form Fields
Illustration: PDF Composer Project Screen.
Loading a File or Files into the Project
Composer allows you to add files to a project by selecting them individually with the Add icon. Multiple files are added to your project with the All Files in a folder icon. These icons present a standard file browse window to select the file or folder. Additionally, as a shortcut, the Pictures icon offers a browse dialog box directly to the user’s Picture file folder. Use the Before and After icons to add a file before or after the selected page in the right panel.
Illustration: DoxaScan Composer icons for loading file or files into a project.
Document Structure/Page Assembly and Editing
The Document Structure/Page Assembly Interface
Once a file or files are loaded into a project, the data is represented with thumbnails in the left panel, and all pages in the files are displayed in the right panel.
Illustration: DoxaScan Composer Document Structure Interface
Options to manipulate and view these pages in the right panel appear in the top right. These include:
- Edit “edit this page”
- Delete “delete selected pages”
- Rotate “rotate selected pages”
- Enlarge “make the thumbnails larger”
- Shrink “make the thumbnails smaller”
- Print “print selected pages or the entire document”
- Save “save selected pages or the entire document”
- Email “email the current document”
Single pages in the right panel can be selected by clicking on the thumbnails, and multiple pages can be selected with the standard CTRL and Shift key functions.
The right panel can be expanded to show more page thumbnails by selecting the Collapse icon in the top menu. To display the left panel with file thumbnails again, select the Expand icon or click on the Document Structure View tab.
Selecting a file thumbnail in the left panel highlights its pages in the right panel with a pink background and displays its file size information in a collapsible popup. See Optimizing File Size to learn about file sizing.
Illustration: Selected files show the associated pages highlighted in pink and the file size popup.
The file types of the files loaded in the project are represented with icons in the file thumbnails in the left panel. The numbers in the upper right of these icons indicate how many pages are in the file.
Reordering Pages and Files in the Page Assembly Panel
CAUTION: Composer is not intended to be a heavy-duty image manipulation application. High-resolution image files are rendered with lower resolution when added to a Composer project.
In the Page Assembly Panel, pages can be reordered by selecting them and then dragging and dropping them onto the thumbnail of another page. The selected pages will replace the page that they were “dropped” upon.
Double click on an individual page in the Page Assembly window to initiate editing it in the Page Editing window. Mouse over the Edit icon in the lower right corner to display the page editing tools. These include such functions as sketch, highlight, add text, whiteout, blackout, watermark, and more.
Illustration: The Page Editing window presents tools and commands to markup and edit pages.
Use the Copy Text icon on the top right Page Editing menu to extract paragraphs or selected text from image-based or text-based files with the application’s OCR capture technology. Select the tool and drag a rectangular region around the area of interest. The resulting OCR text is copied to your clipboard and can then be pasted into other applications. For advanced users, PDF Composer has settings that can be adjusted to improve OCR accuracy. Access the Settings tab’s OCR icon to present the OCR Settings screen.
Use the Rotate command to rotate the page.
See Optimizing File Size for details on manipulating the file size.
The In, Out, All, Window, and Pan icons all provide natural ways to change the page viewport or magnification.
Upon exiting the Page Editing screen, you will be prompted to save any changes you’ve made. This prompt appears when you have applied annotations to the page, such as lines, text, or redactions.
Optimizing File Size
As documents are added into the application, the file size can be managed. As you select documents in the Document Tree, a popup window displays the estimated size impact on the newly assembled document and whether it fits within your size limit. The popup can be closed by clicking on the double arrow Collapse icon on the popup. You can set your size limit as part of the settings interface. See the File Size Limits in Document Settings. NOTE: DoxaScan Composer is not intended to be a heavy-duty image manipulation application. Large, high-resolution images will be automatically resized when added to the application.
Illustration: Selected file presents the expandable and collapsible File Size window.
Color and greyscale images can be resized to alter the overall size of the document. Select the desired page to display the Page Editing functions and select the Sizing button to resize the page by reducing the number of colors, or converting it to pure black and white (Binarization or Thresholding).
Illustration: Selecting the Sizing setting within Page Editing displays settings to lower the file size.
As you adjust the Sensitivity or Threshold sliders, the image is converted to black and white. Increasing the sensitivity takes more neighboring pixels into account when determining the black and white content of the resulting image. Apply any desired changes or select Reset. Note that once Apply is selected, the change cannot be undone. Read more in Adaptive Thresholding Technology and Composer.
Illustration: Changing the Sensitivity and Threshold settings within Page Editing will display the altered page.
Illustration: Sensitivity and Threshold changes applied result in smaller file size. This file has decreased from 1118 KB to 139 KB.
CAUTION: Once the Apply icon has been clicked, you cannot undo your changes with the Reset icon.
Adaptive Thresholding Technology and Composer
Adaptive thresholding assists in cleaning “dirty” documents or documents that have a colored background which interferes with the foreground data. This can be especially helpful in cleaning forms to improve OCR accuracy. Thresholding converts a color or grayscale image to bitonal. Most scanner and capture software can apply necessary thresholding technology. DoxaScan uses Adaptive Thresholding with advanced algorithms and sensitivity settings, allowing you to optimize the thresholding for your documents. DoxaScan offers adaptive thresholding with the following controls when the Sizing icon is clicked in the Page Editing screen.
Illustration: Adaptive Thresholding submenu with applicable settings.
When working with intricate images, it is best to test your documents with various settings. The Threshold value should be set between 0 and 256. Higher values cause the resulting bitonal image to be darker. The exact value necessary depends on the paper, ink, and scanner brightness setting. The Threshold default value is 128. The Sensitivity value should be in the range of 0 to 100. High values bring out small fonts but also increase the number of specks or noise in the image. The default Sensitivity value is 50.
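The difference between a single Threshold value and adaptive thresholding can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DoxaScan’s actual algorithm: the function names, the neighborhood radius, and the way the 0 to 100 Sensitivity value is mapped to an offset are all hypothetical.

```python
def global_threshold(gray, threshold=128):
    """Convert a grayscale image (rows of 0-255 values) to bitonal.

    Pixels darker than `threshold` become black (0), all others white (255),
    so a higher threshold produces a darker result.
    """
    return [[0 if px < threshold else 255 for px in row] for row in gray]


def adaptive_threshold(gray, sensitivity=50, radius=1):
    """Compare each pixel with the mean of its own neighborhood.

    ASSUMPTION: sensitivity 50 means "no offset"; higher values push more
    pixels to black. The real product may map the slider differently.
    """
    height, width = len(gray), len(gray[0])
    offset = (sensitivity - 50) / 50 * 32  # hypothetical mapping
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            out_row.append(0 if gray[y][x] < local_mean + offset else 255)
        result.append(out_row)
    return result


# Dark text (40) on a dark background (100): a global threshold of 128
# turns the whole line black, while the adaptive version keeps the text.
dark_line = [[100, 100, 40, 100, 100]]
print(global_threshold(dark_line))    # every pixel becomes black
print(adaptive_threshold(dark_line))  # only the text pixel becomes black
```

Because the adaptive version judges each pixel against its surroundings, the same settings can handle both light and dark backgrounds, which is why it helps with colored or “dirty” forms.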
Change the settings by using the slider bars for the Threshold and Sensitivity for Reduce to Black/White and Quality for Reduce Colors. If you are not happy with the results, use the slider bars again or hit the Reset icon to return the page to its original form. The changes to the document do not apply automatically; you must click the Apply icon. Note that once the Apply icon has been clicked, you cannot undo your changes with the Reset icon.
CAUTION: Once the Apply icon has been clicked, you cannot undo your changes with the Reset icon.
To save your file with the Page Assembly changes, click the Save icon and make your selections in the resulting pop-up windows. Remember to save the files with a different name if you do not want to overwrite the original. Refer to Naming Files and Folders with Barcodes for more on automatically mining barcodes for naming purposes and Saving and Emailing Files with Security and Encryption for more details on the save settings.
- Click the Save icon
- Select your desired file options such as type of PDF file, encryption, and digital rights in the Save and Security popup and click the Save icon.
- Name the new file in the popup file browsing window.
- Select whether to save the selected pages or all pages in the popup window. Clicking Yes saves the chosen pages, and No saves all the pages.
Illustration: Saving changes made in the Page Assembly process.
CAUTION: Use a different file name if you do not want to overwrite an original file.
Privacy Inspector and Redaction
Privacy Inspector and Single Pattern Search Overview
A note about PDF Composer’s Privacy Inspection vs. Single Pattern Search functions: both functions allow you to locate potentially sensitive information in your documents, verify the selections, and then apply redaction.
The only difference is the Privacy Inspector allows you to select multiple search patterns in the Settings section and locate them in your documents in a single search. The Single Pattern Search function enables you to choose a unique search pattern from a pulldown menu and find that pattern in the text.
Illustration: Composer’s Privacy Inspector and Single Pattern Search screen.
Privacy Inspector Overview
Identifying the existence of sensitive data and personal identity on your documents has grown increasingly important.
From the Settings tab, select the Inspector icon from the upper right menu. Doing so displays the available search patterns and the setting for how redactions are to be treated (Whiteout/Blackout/Highlighted). Select your desired privacy search patterns. Several pre-existing scripts using regular expressions (regex) are supported and built-in. This provides common word matching patterns for Social Security Numbers, Credit Card Numbers, Email Addresses, IP Addresses, Gender, Cost, and others. You may also access this settings screen with the Privacy Settings icon from the Search and Privacy Redaction tab. See Creating your Search Patterns to learn how to create and name personalized regex patterns to add to the Privacy Inspector list.
See Text vs. Image-Based File Redactions to understand how file type affects redactions.
Illustration: Composer’s Privacy Inspector Settings screen.
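To make the idea of regex search patterns concrete, the following Python sketch runs several named patterns over a piece of text in a single pass, in the spirit of the Privacy Inspector. The patterns shown are simplified examples written for this illustration; they are not the scripts shipped with Composer, and the `inspect` function name is hypothetical.

```python
import re

# ASSUMPTION: simplified illustrative patterns, not Composer's built-in scripts.
PATTERNS = {
    "Social Security Number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "Email Address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IP Address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def inspect(text, selected):
    """Return (pattern name, matched text, position) for every hit,
    ordered by where the hit appears in the text."""
    hits = []
    for name in selected:
        for match in PATTERNS[name].finditer(text):
            hits.append((name, match.group(), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


sample = "Contact jane.doe@example.com, SSN 123-45-6789, host 10.0.0.1."
for name, value, position in inspect(sample, list(PATTERNS)):
    print(f"{position:>3}  {name}: {value}")
```

Simple patterns like these can produce false hits (for example, the IP pattern accepts octets above 255), which is one reason each hit should be reviewed before redaction is approved.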
Creating Search Patterns
You can create search patterns that are automatically added to the Privacy Inspector Settings list and the Single Pattern Search pulldown list. Click the Edit icon on the Settings tab’s Privacy Inspector screen to create your own regex search patterns in the Privacy Pattern Editor. Select the New icon and enter the appropriate information. When completed, select the Save icon or Cancel icon to return to the previous screen. To edit an existing pattern, select the Edit icon, choose the pattern from the pulldown, make your changes, and click on the Update icon.
Illustration: Create custom search strings with regex.
Initiating a Privacy Inspector Search
Select the Search and Privacy Redaction tab and click on the Inspect icon in the left panel. Privacy Inspector will cycle through your selected document set, regardless of the source, and identify suspected keywords or text strings that match your identity settings. Each “hit” is captured and provided in a tree called the Redaction List in the left panel.
Illustration: Initiate a Privacy Inspector search with the Inspect icon.
The hits can be viewed, cleared (Clear Results icon) or redacted (Approve All icon) using the icons in the Privacy Inspector area of this screen. They can also be exported to an external file. See Working with the Search Results List to learn how to view and delete items in the Search Results List. Also, see Text vs. Image-Based File Redactions to understand how file type affects redactions.
Single Pattern Search Overview
The Single Pattern Search function works largely the same way as the Privacy Inspector detailed above. The main difference is that Single Pattern Search looks for a single search string, chosen from a pull-down menu of the provided scripts (such as Social Security Numbers, Credit Card Numbers, Email Addresses, IP Addresses, Gender, and Currency) or typed in as your own text. See Privacy Inspector and Single Pattern Search Overview.
Illustration: The Single Pattern Search screen.
See Text vs. Image-Based File Redactions to see redaction differences in file types.
Initiating a Single Pattern Search
Select the Search and Privacy Redaction tab to load the Search & Redact screen. Use the pull-down menu to select the pattern to search and redact in the Single Pattern Search section of this screen. Optionally, you can type a custom text string.
Select Search Pages options and whether to match whole words and match letter case.
Click the Search icon to find matching patterns which are then highlighted in the document and summarized in the Redaction List.
Illustration: Use the pull-down list to select a single search pattern.
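The effect of the whole-word and letter-case options on a typed text string can be illustrated with Python’s `re` module. This is a sketch of the general technique only; the `build_search` function and its parameter names are hypothetical and do not describe how Composer implements these options.

```python
import re


def build_search(text, whole_word=False, match_case=False):
    """Compile a literal search string into a regex.

    whole_word wraps the text in word boundaries; match_case controls
    whether letter case must match exactly.
    """
    pattern = re.escape(text)  # treat the typed text literally
    if whole_word:
        # Word boundaries work as expected when the text starts and ends
        # with a letter, digit, or underscore.
        pattern = r"\b" + pattern + r"\b"
    flags = 0 if match_case else re.IGNORECASE
    return re.compile(pattern, flags)


page = "Invoice total: see invoices and INVOICE copy."
print(build_search("invoice").findall(page))
print(build_search("invoice", whole_word=True).findall(page))
print(build_search("invoice", match_case=True).findall(page))
```

With neither option set, the search also hits the “invoice” inside “invoices”; turning on whole-word matching drops that hit, and turning on case matching keeps only the lowercase occurrence.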
The hits can be viewed, cleared (Clear Results icon) or redacted (Approve All icon) using the icons in the Privacy Inspector area of this screen. They can also be exported to an external file. See Working with the Search Results List to learn how to view and delete items in the Search Results List. Also, see Text vs. Image-Based File Redactions to understand how file type affects redactions.
Working with the Search Results List
Select any hit in the Search Results List (also known as the Redaction List), and the page is presented with the search string highlighted in the Page View.
Illustration: Selecting a “hit” in the Search Results list to view it in the Page View.
To remove an item from the Search Results List, right-click to bring up the “Remove from Redaction List” selection and click it.
Illustration: Results can be removed from the list by right-clicking and selecting “Remove from Redaction List.”
See Text vs. Image-Based File Redactions to see redaction differences in file types.
Text vs. Image-Based File Redactions
Files that are pure text-based PDFs will behave differently than image-based files in how the data is redacted. Text-based files have the text replaced with dots “……” representing the number of characters replaced from the identified search pattern.
Illustration: Text-based files are shown in this example with highlighting and then redaction.
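The dot replacement described for text-based files can be sketched as follows. This is an illustrative Python example, assuming a simplified Social Security Number pattern and a hypothetical `redact_with_dots` helper; it is not Composer’s own code.

```python
import re

# ASSUMPTION: a simplified SSN pattern used only for this example.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_with_dots(text, pattern, dot="."):
    """Replace every match with one dot per matched character,
    so the redacted text keeps its original length."""
    return pattern.sub(lambda match: dot * len(match.group()), text)


original = "SSN 123-45-6789 on file"
redacted = redact_with_dots(original, SSN)
print(redacted)
print(len(original) == len(redacted))
```

Keeping one dot per character preserves the layout of the line while removing the sensitive value itself from the text.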
Images are redacted with a whiteout or blackout area as they are scanned image files that are OCR processed first. This selection is found in the Settings’ Privacy Inspector screen.
Illustration: Image-based files are shown in this example with highlighting and then redaction.
Image-based files need a searchable text to perform the redaction process. OCR capabilities are built into Composer and are intended to provide accurate text of the underlying image.
Illustration: DoxaScan Composer OCR’s image files to capture text for privacy searches.
Illustration: Redaction is applied to image-based files as specified in the Setting’s Privacy Inspector.
For advanced users, PDF Composer has settings that users can manipulate to improve OCR accuracy. Access the Settings tab’s OCR icon to present the OCR Settings screen. See OCR Settings for more on fine-tuning OCR results.
Redaction via Page Annotation Tools (Annotation Redactions)
Files can also have manual redactions applied using the whiteout or blackout annotation tool. You can draw a sketch, line, or rectangular region over the text or image data.
Illustration: Manual annotation redactions can be applied to image-based or text-based files using the whiteout or blackout icon tools.
Manual annotation redactions will place a markup annotation overlaying the image or underlying text. If the original source page was an image, future OCR attempts will not capture any underlying text. If the page was OCR’d before, PDF Composer will remove all OCR data on pages that contain manual annotations.
For pages that were originally text-based files (Word, RTF or PDF Text), the underlying text remains in the document. See the Caution below.
Illustration: An example of an annotation blackout redaction added to an image file.
CAUTION: The underlying text remains in the document for files that were initially text-based (Word, RTF or PDF Text, etc.). Therefore, the marked-up text can be recovered. To truly integrate any editing or redaction made via page markup tools, the files must be saved as image files.
Splitting and Naming Files with Barcodes
Barcode Options Screen
Several settings can be configured to define the barcode mining within the application. Click on the Settings tab and then the Barcode icon to access the Barcode Settings screen. Optionally, go to the Split Files tab and select the Barcode Settings icon.
Use these settings to define the type of barcode you are looking for, the barcode search order, and image enhancement techniques to use on the barcodes. Additionally, define the PDF input and output and separation page instructions. Define the barcodes to use for file and folder naming in the Document Settings. See Naming Files and Folders with Barcodes.
Illustration: The Barcode Settings screen provides selections to manage how barcodes are identified, improved, and used.
Selecting Options to Identify and Filter Barcodes
From the Barcode Settings screen set:
- Which barcode standards to detect
- Which barcode enhancements to run
- What search order to use when looking for barcodes
- How image enhancements are applied to increase barcode accuracy
Selecting Barcode Types
Illustration: The Barcode Type settings
Composer recognizes many different barcode standards, including Code 39 and Code 128 linear barcodes and PDF-417, QR Code, and Data Matrix 2D barcodes. To select which barcode types Composer should detect, check the box next to the barcode type. Multiple selections can be made. To avoid false reads of data, select only the barcode types found in your documents.
Selecting Barcode Enhancements
Several image enhancement functions are available. They are provided here for advanced users’ reference.
Illustration: The Barcode Enhancements and filters to improve barcode recognition.
Noise Reduction runs an image through a noise reduction filter before scanning for barcodes. The filter removes marks from an image that are unlikely to be part of a barcode. A larger value will remove larger marks from the image. This increases the chances of finding a barcode in a poor-quality image but also increases the time taken to process an image. A typical value for this option is 10.
Skew Tolerance in the Image Enhancements settings controls the maximum angle from the horizontal or vertical at which a barcode will be recognized. The table below shows the possible values for this property along with the approximate maximum angles:
0 = up to 5 degrees
1 = 13 degrees
2 = 21 degrees
3 = 29 degrees
4 = 37 degrees
5 = 45 degrees
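The Skew Tolerance values above step up by 8 degrees per setting. As a small illustration, the Python sketch below stores the table and picks the lowest setting that covers a measured skew angle. The `min_skew_setting` helper is hypothetical and is only meant to show how the table can be read.

```python
# Skew Tolerance setting -> approximate maximum angle in degrees (from the table above).
SKEW_MAX_DEGREES = {0: 5, 1: 13, 2: 21, 3: 29, 4: 37, 5: 45}


def min_skew_setting(angle_degrees):
    """Return the smallest setting whose maximum angle covers the skew,
    or None if the skew exceeds the largest supported angle."""
    angle = abs(angle_degrees)
    for setting in sorted(SKEW_MAX_DEGREES):
        if angle <= SKEW_MAX_DEGREES[setting]:
            return setting
    return None


print(min_skew_setting(10))   # a 10 degree skew needs setting 1
print(min_skew_setting(-30))  # direction does not matter, only the size of the angle
```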
When Composer scans an image for a barcode string, it does not scan every line of the image. The Line Jump setting controls how many scan lines are missed between checks for a barcode. A Line Jump value of 1 means that every scan line in the image will be checked. A lower value for Line Jump will impact the performance of the application but may be useful for poor quality images.
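The trade-off behind Line Jump can be seen by listing which scan lines are checked. The Python sketch below assumes that a Line Jump of n means every n-th line is examined, which matches the statement that a value of 1 checks every line; the `scan_lines` function is hypothetical.

```python
def scan_lines(image_height, line_jump):
    """Return the row numbers that would be checked for a barcode.

    ASSUMPTION: a Line Jump of n checks every n-th scan line.
    """
    if line_jump < 1:
        raise ValueError("line_jump must be at least 1")
    return list(range(0, image_height, line_jump))


print(len(scan_lines(3300, 1)))  # every line of a 3300-line page
print(len(scan_lines(3300, 8)))  # far fewer lines, so faster but easier to miss a thin barcode
```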
When the Median Filter box is selected, DoxaScan will apply a median filter to the image before checking for barcodes. This is a useful option for high-resolution images that contain speckles of black and white. It is not recommended for images where the black bars or white spaces are less than 2 pixels wide.
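A median filter replaces each pixel with the middle value of its neighborhood. The following Python sketch applies a three-pixel median filter to a single scan line to show both effects mentioned above: isolated speckles disappear, and so does any bar that is only one pixel wide. It is a generic illustration of the technique with a hypothetical function name, not DoxaScan’s filter.

```python
def median_filter_row(row, radius=1):
    """Apply a 1D median filter to one scan line.

    The ends are padded by repeating the edge pixels so that every
    window has the same odd size.
    """
    padded = [row[0]] * radius + list(row) + [row[-1]] * radius
    size = 2 * radius + 1
    return [sorted(padded[i:i + size])[radius] for i in range(len(row))]


# 0 = black, 255 = white. Index 2 is a one-pixel speckle; indexes 5-7 are a real bar.
line = [255, 255, 0, 255, 255, 0, 0, 0, 255, 255]
print(median_filter_row(line))
```

In the filtered line the speckle is gone and the three-pixel bar is unchanged. A genuine bar or space that is only one pixel wide would be removed in exactly the same way as the speckle, which is why the option is not suited to very fine barcodes.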
When the Use Over Sampling box is enabled, the barcode reader samples three lines at a time (skipping two lines between each sample) and takes the average pixel value. This is useful for images containing both black and white speckles.
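Oversampling can be pictured as averaging three scan lines, each three rows apart, column by column. The Python sketch below follows that description; the `oversample` function name and the final cut-off of 128 are assumptions made for this example.

```python
def oversample(image, start_row):
    """Average three scan lines, skipping two rows between each sample.

    Returns one averaged value per column.
    """
    rows = [image[start_row + 3 * k] for k in range(3)]
    return [sum(column) / 3 for column in zip(*rows)]


# Seven rows of the same bar pattern (0 = black, 255 = white),
# with a black speckle on row 3 in a column that should be white.
clean = [0, 255, 0, 255]
image = [list(clean) for _ in range(7)]
image[3][1] = 0

averaged = oversample(image, 0)
print(averaged)
print([0 if value < 128 else 255 for value in averaged])
```

The speckle lowers the average of its column but not enough to flip it, so the averaged line still reads as the original bar pattern.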
Use the Minimum and Maximum Barcode Length settings to help eliminate unwanted barcodes from the output stream.
See Working with Barcodes That Aren’t Recognized for more hints on increasing the readability of your barcodes.
Page Separation Settings
When splitting files on specified barcodes, a new file is created every time a page is detected with a valid barcode of any of the Barcode Types selected in the Barcode Type area. You may choose to select the Remove Separator Pages, Split on Common Barcodes, and Split if Contains options. These easily allow you to use separator pages and quickly delete them. Split on Common Barcodes enables users to save files using the same separation page barcode. This action works best when the resulting file name includes a timestamp %time.
If the Remove Barcode Separator Pages option is selected, separator pages are used for splitting and renaming but are not included in the output files. If a keyword like “invoices” is entered in the Split if Barcode Contains field and you use separator pages with the word “invoices” in the barcode, DoxaScan will split the scanned stack into separate files every time it detects a barcode with “invoices.” For more information, see Working with Separator Pages, Selecting Options to Identify Barcodes, and Renaming Files and Creating Folders.
Barcode Search Order
When opting to split files based on barcodes, users must indicate what order the application should use to search for barcodes on a scanned page: Top to Bottom, Left to Right and Right to Left.
Illustration: Barcode Search Order options.
DoxaScan Composer assigns a confidence ranking from 1 to 5 to each unique barcode found. The higher the confidence setting, the more confident the software must be to capture a barcode value. See Splitting Files with Barcodes to see an example of when the confidence level is displayed.
Illustration: Barcode Confidence setting options and results showing barcode 1, 2, and 3 were code 39 barcodes with confidence rankings of 5.
Barcode Minimum and Maximum Length
You can also control the barcode results by setting a minimum and maximum character length. Any barcodes found that are within this range are kept as valid barcodes. Others are ignored.
Illustration: Barcode Length settings.
Selecting Only Numeric Barcodes
Select the ‘Barcode is Numeric’ checkbox to locate numeric barcodes during barcode mining.
Illustration: Only Numeric Barcodes settings.
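The length and numeric settings act as simple filters on the barcode values that were found. A minimal Python sketch of that filtering is shown below; the `filter_barcodes` function, its parameter names, and its default limits are hypothetical.

```python
def filter_barcodes(values, min_length=1, max_length=64, numeric_only=False):
    """Keep barcode values whose length is within the range and,
    optionally, that consist of digits only."""
    kept = []
    for value in values:
        if not (min_length <= len(value) <= max_length):
            continue  # outside the allowed length range
        if numeric_only and not value.isdigit():
            continue  # contains something other than digits
        kept.append(value)
    return kept


found = ["12", "100245", "INV-100245", "99887766554433221100"]
print(filter_barcodes(found, min_length=4, max_length=12))
print(filter_barcodes(found, min_length=4, max_length=12, numeric_only=True))
```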
Working with Barcodes That Aren’t Recognized
Barcodes can be a bit finicky to work with in some cases. The quality of the original print, as well as the settings of the scanner, can have a significant impact on the ability to read barcode data.
The first step in diagnosing barcode problems is visually inspecting the barcodes in the original document. Look for bars that may be touching or full of dots. Below are examples of good and bad images from the same barcode. The good one is the original document before printing it out or scanning. The bad one is the result of a poor-quality printout or scan. It is always a good idea to zoom in or magnify a barcode to see what the bars actually look like.
Illustration: Barcode quality examples.
Good Image: Bad Image:
Hints for Creating High-Quality Barcode Scans
- Print barcode page with a laser printer set to at least 300 dpi.
- Increase the DPI of your scan device. Try 300 dpi or higher.
- Try scanning in color or greyscale. This action will automatically improve the resolution of the image. Because color images can be quite large, you can convert the output files back to black and white by checking the Save as Black and White box in the PDF Options.
Illustration: Save as Black and White option.
See Selecting Barcode Enhancements for settings which affect the readability of barcodes.
Working with Separator Pages and Using Specific Barcode Values
The Remove Separator Pages option in the Page Separation area of the Barcode Settings screen applies to situations where scans are created with barcode separator pages. These pages usually only contain barcodes which identify the following data pages in a stack of multiple documents for scanning. The barcode information can be used to name the files and then the separator pages can be removed.
If your documents do not have barcodes, DoxaScan provides a couple of options to create custom barcodes. With the DoxaScan Scan Separator included with your Composer license, create your own separation pages. Alternatively, use the Add Sep icon of the File Splitting screen to place a sequential barcode on pages you have selected in the page editing panel. This method will find open white space on the selected pages to add a barcode based on the definition entered in the Separator Script field of the Barcode Settings.
Illustration: The Add Sep icon adds a barcode to the selected pages and names the barcode as defined in the Separator Script field of the Barcode Settings.
CAUTION: Because checking the Remove Separator Pages checkbox removes all pages on which a barcode is detected, it should not be used when you have added a barcode to a data page with the Add Sep icon. It should only be used in situations where there are no barcodes on the data pages.
To remove separator pages, select the Remove Separator Pages checkbox, and every page with a barcode will be removed when a split action is initiated. The files will be named as directed in the Separation Output section of the Document Settings screen. See Using Keywords in File Names and Folders to understand how to assign a file name based on the barcodes if desired. See Document Settings for more. Learn about Scan Separator, the free separator page creator software included with your Composer license.
The Split if Contains field allows you to enter a custom barcode text string to prompt a file split. If you enter a string here, splits will only occur on barcodes that contain that string. A practical application of this option would be to enter “invoice” as the split string. If your scanning stack contains many multipage invoices that are barcoded with the invoice number, such as “invoice xxx”, on the first page of every invoice, DoxaScan would split the stack into multiple files at the first page of every invoice.
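The splitting behavior can be sketched in Python as a walk through the scanned stack, starting a new file at every page whose barcode contains the split string. This is a simplified model with hypothetical names; it assumes case-sensitive matching and one barcode per page, neither of which is stated by the product.

```python
def split_stack(page_barcodes, split_if_contains="", remove_separator_pages=False):
    """Group page numbers into output files.

    page_barcodes holds the barcode text found on each page, or None.
    An empty split string means every barcode triggers a split.
    """
    files, current = [], []
    for index, barcode in enumerate(page_barcodes):
        starts_new_file = barcode is not None and split_if_contains in barcode
        if starts_new_file and current:
            files.append(current)
            current = []
        if remove_separator_pages and barcode is not None:
            continue  # every page with a barcode is dropped from the output
        current.append(index)
    if current:
        files.append(current)
    return files


stack = ["invoice 1001", None, None, "invoice 1002", None, "PO 77", None]
print(split_stack(stack, split_if_contains="invoice"))
print(split_stack(stack, remove_separator_pages=True))
```

In the first call the “PO 77” page stays inside the second invoice because its barcode does not contain “invoice”. The second call shows why Remove Separator Pages is risky when data pages carry barcodes: all three barcoded pages are dropped.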
CAUTION: As checking the Remove Separator Pages checkbox removes all pages on which it detects a barcode, it should only be used in situations where there are no barcodes on the data pages.
CAUTION: The image quality of barcodes can significantly impact their readability. We recommend a minimum of 300 dpi scanning and using Barcode Enhancement options.
Illustration: Using the Remove Separator Pages option and renaming the file based on Barcode 1.
Illustration: Using the Split if Contains option with a value of “invoice” and renaming the file based on Barcode 1, barcode search order top to bottom.
CAUTION: If you select Process Splits and DoxaScan does not find any valid barcodes in a file, a pop-up warning is displayed asking you to set proper barcode settings.
CAUTION: If a file contains multiple occurrences of the same barcode and File Name is set to the barcode name, the file will overwrite itself. To prevent this, use the barcode name with a timestamp for the file name or page designator (%time or %page).
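The overwrite problem and its fix can be illustrated with a short Python sketch that builds file names from a barcode value plus a suffix containing %time or %page. The `build_file_name` function, the underscore separator, the .pdf extension, and the timestamp format are all assumptions made for this example; the actual expansion used by Composer may differ.

```python
from datetime import datetime


def build_file_name(barcode_value, suffix_template, page, now):
    """Expand %time and %page in the suffix and append it to the barcode value.

    ASSUMPTION: the timestamp format and the separator are illustrative only.
    """
    suffix = suffix_template.replace("%time", now.strftime("%Y%m%d-%H%M%S"))
    suffix = suffix.replace("%page", str(page))
    return f"{barcode_value}_{suffix}.pdf"


now = datetime(2024, 1, 2, 3, 4, 5)
# The same barcode appears on pages 1 and 4 of the scanned file.
names = [build_file_name("INV-1001", "%page", page, now) for page in (1, 4)]
print(names)
print(build_file_name("INV-1001", "%time", 1, now))
```

Because the page number differs for each split, the two files receive different names even though they share a barcode value.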
Naming Files and Folders with Barcodes
Naming and “routing” (creating folders) files based on barcodes allows you to scan many pages of documents and easily name and separate them based on the barcodes. Real-life examples may be to scan a set of invoices or forms into one file and then let Composer do the work of naming the files and routing them to folders created based on the barcodes. Additionally, if your documents do not have barcodes, DoxaScan provides a couple of options to create your custom barcodes. With the DoxaScan Scan Separator included with your Composer license, create your own separation pages. Alternatively, use the Add Sep icon to place a barcode on pages you have selected in the Page Assembly panel. This method will find open white space on the selected pages to add a sequential barcode that is based on the definition entered in the Separator Script field of the Barcode Settings.
Illustration: The Add Sep icon adds a barcode to the selected pages and names the barcode as defined in the Separator Script field of the Barcode Settings.
CAUTION: Checking the Remove Separator Pages checkbox removes all pages on which a barcode is detected. Do not use it when you have added a barcode to a data page with the Add Sep icon. It should only be used in situations where there are no barcodes on the data pages.
With Composer, you can define how files are named and folders created in the Settings tab’s Document Settings screen. To access this screen from the Settings tab, click on the Document icon. See Using Keywords and Text to Create File Names, Folders, and Bookmarks to learn how the barcode keywords are created.
Illustration: The Document Settings provides for file naming and folder creation based on barcodes.
Splitting Files with Barcodes
To split files based on barcodes, confirm that the selections in the Barcode Options and Document Settings screens are correct. See Barcode Options screen and Naming Files and Folders with Barcodes to learn about these settings.
Once a project is populated with a file or files, select the Split Files tab. The pages of the files are displayed in the right panel Page Assembly screen where they can be viewed using the viewing tools in the top-right menu. The following menu items are available in the File Splitting menu.
|Select this icon to begin the splitting process. Note that PDF files are temporarily converted to TIF for barcode recognition.|
|Adds a barcode into blank space within the document. The barcode text is based on the separator script defined in your settings.|
|Save the split files after confirming the splits in the right panel Page Assembly window. Pages with identified splits are shown highlighted with a purple background.|
|Click to go directly to the Barcode Settings screen.|
|Enlarge or reduce the thumbnails of the files in the left panel with these icons.|
Illustration: The Split Files tab interface.
Illustration: Split locations are shown by highlighting the page with a purple background, and the proposed new file breakdown is displayed in the file panel on the left with the number of pages in the file noted in the top right of the file’s thumbnail.
Note that you can click on one of the file thumbnails in the right panel to display the detected barcodes and the proposed file name and destination path.
Illustration: Clicking on the file thumbnail displays a popup with the detected barcodes and the proposed file names and destination paths. Below is the corresponding file in the Windows file system after splitting.
CAUTION: Remember to save the split files after confirming the splits in the right panel Page Assembly window. Pages with identified splits are shown highlighted with a purple background.
Bookmarking PDF Files
Inserting Bookmarks Based on Barcodes
DoxaScan Composer allows the user to create bookmarks based on barcodes contained in a project’s documents.
To set bookmarks on pages according to their barcodes, open the Barcode Settings screen by selecting the Settings tab and then the Barcode icon. Select the type of barcode to search for, the barcode search order for the pages, and any filters. Enter the codes for the barcodes you want to use as bookmarks in the PDF Bookmarks section of this screen. Set the Separator Script code to your pre-determined text pattern. See the Barcode Options Screen for more details on this screen and Using Keywords and Text to Create File Names, Folders, and Bookmarks to learn about keyword codes.
Entering the desired barcode key in the Bookmark Layout field of the PDF Bookmarks section of the Barcode Settings screen identifies the barcodes to use for the addition of bookmarks.
Illustration: The Barcode Settings screen with the PDF Bookmarks settings highlighted
Once the barcode settings have been identified, select the Bookmarks tab.
Illustration: Before processing, the Bookmarks tab shows all pages have a bookmark in the left panel.
To set the bookmarks based on the barcodes identified in the Barcode Settings, click on the On Barcodes icon. Composer briefly displays a few messages about reading the file and identifying the barcodes. When the file has been processed, the bookmarks are listed; in our example, illustrated below, three bookmarks are identified. You may view the pages in the Page Assembly to confirm the selections if desired; the Clear icon deletes the bookmarks. To save the bookmarks, click on the Save icon in the Page Assembly panel and select your save options.
Illustration: Three bookmarks have been identified in our example.
CAUTION: Remember to save the file with the bookmarks after confirming them in the right panel Page Assembly window.
Entering Data in a PDF Forms Document
Frequently PDF is used as a final publishing format. However, PDF has an option to be used as an entry form that can be edited and saved by the user. If you have documents with form fields, the data can be entered via Composer. To enter data, load a project with a PDF file with form fields. Click on the page to edit in the Page Assembly screen then click on the Edit Form Fields tab. The detected fields are shown in the left panel. Select a field in the left panel and then right-click to present the data entry Form Fields Screen. Enter the data in the entry screen and click the Update icon. Note that the entered data does not display until the next field is selected.
Alternatively, once the field has been selected, you can click on the Edit icon to bring up the data-entry Forms Field screen.
Be sure to save your entries by returning to the Document Structure View tab and saving your file. Learn more about saving files at Saving Files and the Save and Security Window.
Clicking on the Rename icon in the left panel allows you to change the name of the field in the left panel, but note, it does not change the name of the field in the actual document.
Illustration: The Forms Field screen to enter data into PDF fields.
Using Keywords and Text to Create File Names, Folders, and Bookmarks
Renaming Using Keyword Codes
Keyword codes are used in the Separation Output section of the Document Settings screen and the PDF Bookmarks section of the Barcode Settings to create the file and folder names and bookmarks based on the barcode data, date, time, and the page number. DoxaScan can detect multiple barcodes and reads the document from top to bottom, right to left or left to right. The brackets surrounding the keyword name are required, and keyword codes are not case sensitive.
Illustration: The Separation Output screen from the Document Settings screen lets you use keywords to name files and folders. The PDF Bookmark section of the Barcode Settings can contain keywords.
Produces today’s date using the format: MMDDYYYY. For example, if today’s date is July 14, 2010, the resulting value would be 07142010
%Date = 07142010
Produces the page number.
Returns the current hour of the day.
Returns the current minute of the hour.
Returns a sequential number starting at 1.
Produces military-format (24-hour) time using hours, minutes, seconds, and milliseconds. For example, if the time is 2:12 p.m. plus 10 seconds and 432 milliseconds, %Time = 141210432
%barn or %barcoden
Produces the barcode designated in the name. Barcode1 through Barcode10 can be used.
See Samples below
When there are multiple barcodes on a page, you may create the name based on any combination of the barcodes by using the Barcoden keyword. Remember that you must select the barcode search direction in the Barcode Settings screen. DoxaScan can use up to ten barcodes for file naming.
Illustration: Sample barcode used in the chart below
Samples: Keyword codes with examples using the scanned page with barcodes above in top to bottom order:
%bar1 = 202
%bar1-%bar2 = 202-M999999 AB
%date-%bar4 = 07142010-TESTFIRST with a date of July 14, 2010
Users often find it convenient to combine multiple keywords when naming files. For example, they may want to rename a file to the first detected barcode and add the date and time. Keywords may be combined in any order. To use multiple keywords, enter the codes in the same entry field (either File Name or Create Sub Folder).
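The way keyword codes combine into a name can be sketched in Python. This is a hedged illustration, not DoxaScan's own code: the function expand_keywords is hypothetical, only the %date, %time, %page, and %barN / %barcodeN keywords named in this manual are handled, the third barcode value is a placeholder, and leaving an unmatched barcode keyword unchanged is an assumption.

```python
import re
from datetime import datetime
from typing import List

# %barN / %barcodeN, or one of the simple keywords; matching ignores case.
KEYWORD = re.compile(r"%(?:(barcode|bar)(\d+)|(date|time|page))", re.IGNORECASE)


def expand_keywords(template: str, barcodes: List[str], now: datetime, page: int = 1) -> str:
    """Replace keyword codes in a naming template with their values."""

    def substitute(match: re.Match) -> str:
        bar, number, simple = match.groups()
        if bar:
            index = int(number) - 1  # %bar1 is the first detected barcode
            if 0 <= index < min(len(barcodes), 10):
                return barcodes[index]
            return match.group(0)    # assumption: leave unknown barcodes as typed
        simple = simple.lower()
        if simple == "date":
            return now.strftime("%m%d%Y")  # MMDDYYYY
        if simple == "time":
            # HHMMSS followed by three digits of milliseconds
            return now.strftime("%H%M%S") + f"{now.microsecond // 1000:03d}"
        return str(page)                   # %page

    return KEYWORD.sub(substitute, template)


# Barcodes in top-to-bottom order; "THIRD" is a placeholder value.
codes = ["202", "M999999 AB", "THIRD", "TESTFIRST"]
stamp = datetime(2010, 7, 14, 14, 12, 10, 432000)

print(expand_keywords("%bar1-%bar2", codes, stamp))
print(expand_keywords("%date-%bar4", codes, stamp))
print(expand_keywords("%Time", codes, stamp))
```

The inputs mirror the sample values used in this section, so the three printed names can be compared with the samples listed above.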
Renaming with Free Form Text
To name files or folders with your own free-form text, enter the text string in the File Name or Create Sub Folder fields in the Separation Output section of the Document Settings screen. For example, if the word “invoice” is entered for the file name, all documents will be named “invoice” followed by sequential numbers.
Illustration: Name File with text “invoice.” Resulting in files named invoice_1.pdf, invoice_2.pdf, invoice_3.pdf, etc.
Illustration: File name with Date keyword and text “invoice.” Resulting in files named 08272015 – invoice_1.pdf, 08272015 – invoice_2.pdf, 08272015 – invoice_3.pdf, etc.
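As a rough sketch of the free-form naming shown in the illustrations above, the following Python snippet appends a sequential number to a text string. The function name free_form_names is hypothetical, and the underscore separator and .pdf extension are assumptions read off the illustrated file names.

```python
from typing import List


def free_form_names(text: str, count: int, extension: str = ".pdf") -> List[str]:
    """Name `count` output files as text_1, text_2, ... (numbering starts at 1)."""
    return [f"{text}_{n}{extension}" for n in range(1, count + 1)]


names = free_form_names("invoice", 3)
print(names)
```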
Saving and Emailing Files with Security and Encryption
Saving Files and the Save and Security Window
With Composer, users select files or folders of files for easy page assembly and manipulation. Composer can be used to merge several files into one file or, conversely, break a single file or set of files into multiple files based on barcodes. If you have loaded many files and not elected to split them, a Save Merged action combines all the pages in all the files into one new file (a file merge function). If you have elected to split a file or set of files based on barcodes, clicking Save Splits saves the files to the folder named in the Separation Folder field in the Document Settings.
To save your file or files from within the Page Assembly right panel, click the Save Merged or the Save Splits icon as appropriate and make your selections in the resulting Save and Security popup window. If a split action has not been performed, Save Splits is not available. Remember to save the files with a different name if you do not want to overwrite the original. Refer to Naming Files and Folders with Barcodes for more on automatically mining barcodes for naming purposes.
The following options are available on the Save and Security Screen that appears when you select a save action.
- Save to PDF/A (often used for archival file storage)
- Save to Searchable PDF (usually used to make an image-based file searchable through an OCR process).
- Encrypt. Select this checkbox to apply security and digital rights to your document. An owner and user password will then be required.
- Disable Cut/Copy
- Disable Print
- Disable Modify
Adding Digital Rights to PDF Output Files
Users have the option of protecting their output files by applying password security and other digital rights to the document. Digital Rights Management is applied to outbound PDF files to restrict how users access the documents. You can elect to disable printing, modification, or text extraction through the cut/paste mechanism. To limit the permissions of the output PDF file, use the checkboxes on the bottom of the Save and Security screen.
To encrypt files with passwords, select the Encrypt checkbox. In addition to a password, you can also choose from any of the Digital Rights available.
Files that are saved to PDF/A are not eligible for encryption or digital rights; these options are disabled when Save to PDF/A is selected.
Illustration: The Save and Security popup offers settings to determine how the file should be saved.
Illustration: The Save with Encryption option displays input boxes for the user to create an owner password with a confirmation.
Illustration: Enabling any Digital Rights options will require a User password as well as an Owner password with a confirmation.
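The rules in this section can be summarized in a short Python sketch. It is an assumption-laden illustration rather than DoxaScan's logic: the SaveOptions class and validate function are hypothetical names, and the password rules follow the two illustrations above (encryption asks for an owner password; enabling any digital right also asks for a user password).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SaveOptions:
    pdf_a: bool = False
    searchable: bool = False
    encrypt: bool = False
    disable_copy: bool = False
    disable_print: bool = False
    disable_modify: bool = False
    owner_password: str = ""
    user_password: str = ""


def validate(options: SaveOptions) -> List[str]:
    """Return a list of problems; an empty list means the combination is allowed."""
    problems = []
    rights = options.disable_copy or options.disable_print or options.disable_modify
    if options.pdf_a and (options.encrypt or rights):
        problems.append("PDF/A output cannot be combined with encryption or digital rights")
    if (options.encrypt or rights) and not options.owner_password:
        problems.append("an owner password is required")
    if rights and not options.user_password:
        problems.append("a user password is required when digital rights are enabled")
    return problems


print(validate(SaveOptions(encrypt=True, owner_password="owner-secret")))
print(validate(SaveOptions(pdf_a=True, encrypt=True, owner_password="owner-secret")))
```

Returning a list of problems instead of raising on the first one lets a caller report every conflicting selection at once.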
Emailing Files and the Send Mail Window
With Composer, users select files or whole folders of files for easy page assembly and manipulation. Composer can be used to merge several files into one file or, conversely, break a single file or set of files into multiple files based on barcodes. After you have performed these actions, you may choose to email the results. To begin emailing files, enter your email settings in the Settings tab. See Email Settings for more on these selections.
Once the email settings have been established, clicking on the Email icon in the Page Assembly panel presents the Send Email popup, with options similar to those presented by the Page Assembly’s Save icon.
With DoxaScan Composer’s Privacy Auditor, users can automatically detect potential privacy items within documents, such as social security numbers, credit card numbers, phone numbers, IP addresses, email addresses, and more.
Inspect from within Search & Redact
Inspection tasks are initiated within Search and Redact and search for all patterns that the settings mark for inclusion in Privacy Auditing.
The results are displayed in the Search and Redact list with page number and occurrence count.
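A pattern inspection of this kind can be approximated in Python with regular expressions. This is a simplified sketch under stated assumptions: the three patterns are illustrative and far looser than a production detector, the names PATTERNS and scan_pages are hypothetical, and the sample text is invented test data.

```python
import re
from typing import Dict, List, Tuple

# Simplified, illustrative patterns; real detectors need stricter validation.
PATTERNS: Dict[str, re.Pattern] = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_pages(pages: List[str]) -> List[Tuple[int, str, int]]:
    """Return (page number, pattern name, occurrence count) for every hit."""
    results = []
    for number, text in enumerate(pages, start=1):
        for name, pattern in PATTERNS.items():
            count = len(pattern.findall(text))
            if count:
                results.append((number, name, count))
    return results


sample = [
    "Applicant SSN 123-45-6789, contact jane@example.com",
    "No personal data on this page.",
    "Alt SSN 987-65-4321 and 111-22-3333, host 192.168.0.1",
]
for page, kind, count in scan_pages(sample):
    print(page, kind, count)
```

Each result row pairs a page number with an occurrence count, mirroring the layout of the Search and Redact list described above.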
Editable Search Results
The search results list now has a right-click menu item that removes a selected candidate item from redaction. This was a planned enhancement.
A local or centralized database would control the auditing activity. In addition, it will control print, save, and email actions when privacy data is suspected on documents.
This offers uniqueness not found in other PDF tools and has the potential for companies to standardize on this tool instead of uncontrolled PDF tools.
The detailed Audit option will store all text infractions.
Changes to the Identity Inspector are now preserved.
Local or Network DB
Managers would be able to identify a local (MS Access) or shared network database to manage auditing settings and logging data. This offers IT staff a centralized management platform to control auditing and actions when infractions are encountered.
The Privacy Auditor now brings up a search tool that allows IT staff to search users’ privacy activity. Files with infractions are identified along with the user, date, and scanner (if applicable). Detailed auditing stores the individual text string and page ID for each infraction encountered.
Search by name and/or date. Search by email, date or both.
The privacy tree displays all users and associated files matching the search criteria (date and email string).
The user dashboard summarizes infractions and allows selecting to see detailed infractions.
When a user is selected, the dashboard displays the files encountered and summarizes the infraction rates for PII data groups (SSN, credit cards, etc.).
PII Tile Selections
Selecting any of the tiles will display the suspect text extracted from each document.
Tiles display green if no infractions are detected and red if infractions are detected.
When a file is selected, the manager sees the document (if not a local path) with the infraction highlighted.
Many file types
Works with common formats with visual clarity. Import folders, images, text-based files and everything in between. We support:
- Native TIF, PNG, JPG
- PDF Text, PDFOCR, PDFImage
The Privacy Auditor can inspect for defined patterns common in the personal identity world. Create custom scripts or use the built-in ones for credit cards, social security numbers, gender, email addresses, IBAN or SWIFT codes, and more.
File sizing tools
File sizing tools work to identify the sizing impact of documents you import. Use tools to reduce the file size using 2D adaptive thresholding or color reduction techniques.
Split and name files automatically at barcodes found within the document and save them to names from the extracted document barcodes or other content.
Page separation tool
A Separation Page Tool is included with the software allowing users to create barcode separation pages to insert into a scanning stack.
It supports multiple fields and drop down boxes for easy input.