Resting Anchor

The Anchorage

Personal website of Gregory K. Maxey, Commander USN (Retired)

Extract Data from Forms
(A Microsoft Word Help & Tip page by Gregory K. Maxey)

DISCLAIMER/TERMS OF USE

The information, illustrations and code contained in my "Microsoft Word Tips" are provided free and without risk or obligation.

Click to acces PayPal Verification Service Click to acces PayPal Verification Service

However, the work is mine. If you use it for commercial purposes or benefit from my efforts through income earned or time saved then a donation, however small, will help to ensure the continued availability of this resource.

If you would like to donate, please use the appropriate donate button to access PayPal. Thank you!

Click to donate British Pound Sterling                   Click to donate US dollars                   Click to donate EU euros

This Microsoft Word Tips & Microsoft Word Help page describes my "Extract Data from Forms Version 3.1" Word template add-in.

Summary\Applicability & Background

Published originally as version 2.0, this add-in significantly expands the capabilities of "Extract Content Control Form Data" which it replaced. With this expanded capability, the add-in provides tools to process and extract data from:

The extracted data can be stored or viewed:

The add-in was developed in Word 2010 but tested and applicable to Word versions 2007/2010/2013/2016. It compliments a similar add-in, Word add-in to extract data from protected forms developed and published by my long time friend and frequent collaborator Graham Mayor.

I have previously published other template add-ins to facilitate data extraction  including Extract Batch Form Data and Extract Data from Word Document File. This add-in refines and further develops many of the techniques I used in those.  I have left them posted on this site so visitors can study and reference the VBA code.

Version 3.0

Version 3.0 started life after Graham proposed using the ADODB engine and SQL to write the extracted data to Excel.  I had been using the ADODB\SQL method for data extraction to Access in my version 2.0 and immediately saw the benefit of using it for Excel also.

Like most projects that Graham and I collaborate on, the capabilities of version 3.0 expanded significantly beyond just using ADODB for Excel. We have upgraded of the title\tag utility to support naming formfields and created a new utility to convert formfields to content controls.

Site Note IconNote: One or more images in this tips page may still reflect version 2.0.  I did not take the time to update every image so you may see some minor differences between the dialogs shown here and the dialogs in the actual add-in.

Version 3.1

Version 3.1 shifts the emphasis from tagging (to process) content controls purposed for form data fields to tagging (to filter and exclude) content controls that perform other roles (e.g., a rich text control used to group boiler plate text and form data fields).

I have become convinced that the editing restriction "No Changes (Read Only) with exceptions (editors) is the most effective and robust method for using content controls in forms.  Therefore the use and need of content controls for purposes other than form data entry is minimized.

Version 3.2

Resolved an issued encountered when the utility attempted to write individual field data to an Excel worksheet when the length of the data string exceeded 255 characters.

Version 3.3

Excludes Access as a target data storage file if the form field count exceeds 255. Processes extraction to Excel when the form field count exceeds 255 using an instance of Excel (vice ADODB connection).

Version 3.4

Corrects data fields conflict errors when batch processing files.

Content Controls in Forms

Before going into more details about the add-in, I need to discuss a few points about defining and using content controls in the forms. Understanding these points and applying them when you build your form documents, should ensure complete success with the add-in.

Site Note IconNote: A checkbox control is not included with Word 2007. However, in many cases a two item dropdown list e.g., Yes and No/True or False etc., can be substituted for a checkbox.

The illustration below depicts a representation of each content control supported by the add-in. The illustration depicts the form in design mode with content control titles applied.

extract_form_data_2

In the illustration above I have used content control titles descriptive of the content control type.  Each content control is purposed for data entry by the user.

In practice content controls purposed for data entry should be assigned a unique title descriptive of the intended data (e.g., the dropdown list shown might be titled "Favorite Fruit").

The  next illustration depicts an example form in which a rich text content control is purposed for grouping\editing restrictions.  It is shown without a title and is tagged using the default filter tag. It contains the four form data fields shown titled with meaningful titles.

extract_form_data_3

Remember, as stated earlier, if your form includes content controls (excluding group content controls) not purposed for form data entry, then those controls must be tagged using a defined common filter tag.

Site Note IconNote: The extraction utility default common filter tag is Do not extract, as shown. However, when designing your forms you can define and apply any suitable common filter tag string.

Content control title and tag properties can be defined using the properties dialog shown above or by using the add-in Title/Tag Form Data Fields utility described below.

Site Note IconNote: For additional information on using content controls in forms see my:
Create Forms with Content Controls.

Getting Started

The Extract Data from Forms add-in is presented as a Word template and supplied in zipped format.  You can download the template using the link at the end of this tips page.

When installed, the add-in displays a control group and controls on the Word Ribbon Add-Ins tab as shown below:


extract_form_data

extract_form_data_1A
User interface Word 2007/2010/2013/2016 (Word 2010 shown)

"Launch Extractor" is the add-in primary interface control.  The controls in the "Utility Tools" menu initiate secondary support utilities. "Reset Defaults" resets the add-in stored values to the first use condition.

Add-in Utility Tools

As the utility tools are intended to simplify preparation for using the extraction utility, I will describe them first.

Convert Formfields to Content Controls

Site Note IconNote:  During development, Graham and I discovered that content controls are inherently processed considerably faster than formfields.  Additionally, the seemingly senseless and moronic [offense intended] internal changes that Microsoft made to the Word object module with Word 2013 significantly degraded many VBA looping processes including the process to  loop through document formfields.  Therefore, if you and your form users have access to Word 2010 or higher, I strongly encourage you to convert your forms that use formfields into forms that use content controls.

While the add-in fully supports data extraction from legacy formfields, this utility is available to assist users with Word 2007 or later who wish to upgrade legacy forms and take advantage of faster processing speed, increased flexibility in data field titles and spell checking afforded with content controls.

Provided the active document contains 1 or more defined formfields, this utility displays an interface similar to those shown below:

extract_form_data_38

Ready to process.

extract_form_data_38A

Forms must be unprotected to convert.

During conversion, the new content controls created are titled as determined by the following conditions:

The option "Add global editors to content controls" flags each content control created as an editable region in documents with restricted editing No changes (Read only) applied.  This is the restriction method that I recommend for using content controls in forms.

When global editors are added documents unprotected automatically by the utility or by a user entered password are re-protected using the No Changes (Read Only) method.

Title/Tag Form Data Fields

This utility assists you in to  cycle through, review and set data field names\bookmarks (formfields) or titles/tags (content controls).

Since the name or title applied  will eventually define the column/record headings in the extracted data storage files, you should take care apply meaningful and unique names/titles to each form data field.

When assigning names/titles to data fields and using Access or Excel as the target data storage file, you must avoid using reserved keywords in the title names. For example, "Date" and "Image" are both Access restricted keywords.

 You can avoid most keyword conflict issues by using a multi-word title (i.e., "Date_of_Issue", "Current_Picture", etc.)

If you use this utility to title content controls that will be mapped to an XML data node, you must avoid using invalid XML characters in the title name if the title is used to define the data node basename.

The add-in includes code to prevent using invalid XML characters and limited code to assist in preventing keyword conflicts. However, the keywords are subject to change and may not be  inclusive.  If you experience data extraction errors to either Access or Excel a possible cause is a keyword conflict.

extract_form_data_4

The illustration above depicts our first example form I created using the standard Word interface.  When the utility is launched ...

The following provides a description of the numbered labels shown above:

  1. Links to additional information on using the utility.
  2. Indicates the type of control selected and used to switch between formfields and content controls when comingled in the same document.

Site Note Icon Note: The data extraction utility does not support extraction of data from comingled formfields and content controls to the same target data file.

  1. Prevents using invalid XML characters in content controls titles. Spaces typed in the Title field are automatically converted "_" (underscores). Displayed in content control mode only.

Site Note Icon Note: As illustrated above, the space in "Rich Text" is an invalid XML character resulting in the flagged condition.

  1. Prevents defining the same title to two or more content controls.  Shown in content control mode only.
  2. Prevents using a reserved keyword as a formfield name or content control title.
  3.  Status indicator.  Provides visual queues to prompt user action.
  4. Displays the current name\title (if defined) of the selected control and facilitates entry of user defined titles.
  5.  Displays the current tag (if defined) of the selected content control and facilitates entry of user defined tags.  Shown in content control mode only.
  6. Displays the saved common filter tag. The default common filter tag is "Do not extract"
  7. Command buttons. 

The utility interface in Formfield mode is simplified with fewer required options.

extract_form_data_4A

The utility is intended to name/title controls for use with data extraction and therefore it is Draconian in nature. The "Set" command buttons on the utility interface are enabled only when a valid name\title is defined for the selected control as shown above.

No title (empty current name/title) field is not a valid title.  The utility can not be used to remove names/titles or selectively name/title controls.

Create Outlook Folders

Before you can process Outlook mail item attachments using the add-in, appropriate folders and message rules must be created and defined in Outlook.  This ensures that incoming messages from your form respondents are detected and properly routed.  The add-in ribbon controls "Create Outlook Folders" creates/validates the required folders.

extract_cc_form_data_5
Illustrates folders created in Outlook

Detailed steps for setting up Outlook rules is beyond the scope of this discussion.  There are plenty of examples in Outlook support forums for via Google search.  Basically you simply need to create a rule based on the subject line of forms returned by your respondents via e-mail.

extract_cc_form_Data_6
A representative form return subject line

extract_cc_form_data_7

A simply rule to move messages to the Forms Received folder created by the add-in.

Create Report

The "Create Report" is used to create a new document using information contained in a form open as the active document.  The control displays a dialog that allows you to define the template associated with document you want to create.  A more complete demonstration follows the discussion on using the add-in.

extract_form_data_8

First Use/Main Dialog

Disclaimer

The first time you click the add-in "Launch Extractor" control, the main userform interface dialog displays a disclaimer notice.  Before you can use the add-in to extract data or use the supporting utilities described above, you must accept the terms of the disclaimer.

extract_cc_form_data_9

The add-in main dialog contains four common controls illustrated above.

  1. Cancel control.  Cancels setup for data extraction and closes the dialog.
  2. Donate control.  Displays links and controls that enable you to contribute to the support of this website and my work with donations processed with PayPal.
  3. The "Resting Anchor" logo control.  Opens this tips page.
  4. Execute control (shown here a "Accept").  A multi-purpose command execute control.

Extract Data

After accepting the disclaimer, the "Extract Data" page is displayed.  Using the controls on this page you set the type of extraction, source file or files to extract from and the target.

extract_form_data_10
Default "First Use" dialog when the active document is a new blank document

The main dialog consists of four control sections 1) Data Fields , 2) Source File\Files, 3) Target and 4) Notification Panel that you can use to define parameters for data extraction. The numbers in the illustration above are indexed to the following:

  1. Use to select the type of data field.  The add-in supports extracting data from legacy form fields, content controls (all in document) or filtered content controls (excludes any content control tagged with the common filter tag).
  2. Defines the number of data fields in the active document or batch/attachment files to process. Used to establish and validate the columns in data storage files.  When "Active document" is selected as the source file, the field count is calculated and displayed  automatically based on number of data fields present in the active document.  For batch and attachment modes, you must enter the number of defined data fields in a batch or attached document.

Site Note IconNote:  The value you assign in this field is critical to validating batch documents for processing. If the number you enter does not match the physical number of tagged content controls or form fields in the batch documents, processing will terminate or be incomplete.

  1. Saves the displayed data field value to the registry.  Saved values are automatically recalled and displayed when subsequent batch or attachment process is initiated.
  2. One of three links used to display additional information.
  3. Extracts the file name along with defined data fields.  Must be checked or unchecked when the association between the form and the target data file is established.  Changing this value after after the association is establish will result in a data field conflict.
  4. Use to specify how incomplete data fields are processed.  1) Notify - when an incomplete field is encountered, you will be prompted to skip or process the individual form, 2) Skip  - forms with one or more incomplete fields are automatically skipped and not process, 3) Process - forms are processed using available data.
  5. Defines the data source.  The add-in supports extracting data field content from a) the active document, b) a batch folder containing similar forms, or c) the Outlook "Form Received" folder.  Modes b) and c) are hereafter referred to as the "batch" modes.  The data source can be selected by the user or set based on saved settings from previous runs. If the active document contains one or more defined data fields it is automatically set as the initial data source when launching the extractor.
  6. Allows users to browse to and select the browse folder path.
  7. Displays the defined batch folder containing the batch files.  All files in the batch folder must be similar with the same number of data fields. Valid paths have a green background. Invalid paths have a rose background.
  8. The extractor utility automatically creates a sub-folder "Processed" in the defined batch folder.  Check this box if you want to automatically transfer processed batch files to the "Processed" folder after data is extracted.
  9. Sets the target data file type.
  10. Browse to and set the target data file path.  Note - if extracting to an existing data target file not displayed, use the target file "Browse" command (14)
  11. Displays the target folder path.  Path may be entered manually or defined using the target path "Browse" command (12).  Valid paths have a green background. Invalid paths have a rose background.
  12. Browse to and set an existing target data file.
  13. Displays the target data file name.  Target data files may be pre-existing files created externally by the user, pre-existing files created automatically by the extractor utility or new files created automatically by the extractor utility. The target data file name can be typed manually or pre-existing files can defined using target file "Browse" command (14).  Note - the extractor utility default data target file name is "FormData" with the appropriate extension as shown above.  FileData can be change to any valid file name.
  14. Clears the data in the defined target data file and re-associates the form data fields with the target data file.  Note - This is basically the same thing as deleting and creating a new data file.
  15. Displayed when Access or Excel is set as the target file. Used to select the worksheet or table in the data file used to store data.  For new Excel data files Sheet1 (as shown) is the default sheet name.  For new Access data files Table1 is the default table name.
  16. Displayed when Excel is the target file only.  Allows data to be written to a new sheet row regardless of blank or duplicated data field names.  (Not recommended)
  17. Displays limited informational notifications to the add-in user. 

The extractor utility user interface automatically reconfigures based on options selected.  The following additional controls are not shown in the initial "First Use" display illustrated above.

extract_form_data_11
  1. A conditional control displayed only when Content Controls  (Filtered) is set as the defined form data field type. Use this text field to define and set a new default common filter tag string.  The add-in default common filter tag is "Do not extract" as shown.
extract_form_data_12

extract_form_data_13
  1. Displayed only when Outlook is set as the data source and and the Incomplete fields option is set to "Notify" or "Skip."  When checked displays the conditional E-Mail Return Message page tab (22).
  2. Defines the return incomplete form message.
extract_form_data_14

extract_form_data_15
  1. The "Extract images/content" is a conditional control displayed only in "Word table" target mode.  Checking this option extracts physical content (e.g., images, formatting, etc.) to the data storage file.
  2. The "Text delimiter" is a conditional control displayed only in "Delimited text" target mode.  It sets the delimiter character to use in delimited records.

Example Extracted Data Records

In the following illustrations and discussion, I will show the results of actual data extractions using the add-in.

Single Documents

Data in form field or content control data fields from single documents (i.e., open active document) can be extracted using any of the four target modes.

extract_form_data_19
Our first demonstration content control form as the active document

extract_form_data_20

Add-in configured to extract active document content controls to a Word table.
The data file path and name is "D:\Data Stores\FormData.docx"
Extracting physical content (i.e., images/formatting)

Site Note IconNote: Since all form content controls (excluding the group content control) is purposed for form data entry, the "content control" data field type is selected.


extract_form_data_21

A "Processing Report" is displayed following each execution.

 extract_form_data_22
D:\Data Stores\FormData.docx content

Site Note IconNote:  Most settings applied are "sticky" settings. This means that after you execute  "Extract Data" and redisplay the add-in, your previous settings are recalled.

extract_form_data_23

Here I have extracted the data a second time without the "Extract images\content" option.

Site Note IconNotes:
 
1. The example form data fields may seem whimsical or impractical but are intended to illustrate the unique capability to extract "rich content" (i.e., rich text formatting, graphics, and building block content) to a Word table target data file.

2. The extraction of "rich content" is only supported in the "Word table" target mode when the conditional "Extract images\content" option is selected.  In all other target modes only text content is extracted and recorded, checkboxes resolve as true (checked)or false (unchecked), and the filename is extracted for linked pictures in picture content controls.

3. While the numerical limit of a Word table is 62 columns, here you see the practical limit is probably no more 10-15 data columns.


extract_form_data_24
Our second demonstration form as the open active document.

Site Note IconNote:  Design mode displayed so you can see the rich text content control used for editing restrictions.


extract_form_data_24A

Add-in configured to extract active document content controls to a delimited text file.
The data file path and name is "D:\Data Stores\FormData.txt"
The pipe characters defines the text delimiting character.

Site Note IconNote: Since a rich text content control is purposed for editing restrictions and tagged with the common filter tag (Do not extract), Content Controls (Filtered) is selected as the data field type.


extract_form_data_25
D:\Data Stores\FormData.txt content

Site Note IconNote:  Checkbox content controls "checked" symbols resolve as "True" in text only data storage files.  The "unchecked" symbols resolve as "false."

Batch Documents

Data from formfield or content controls data fields in a batch of documents, stored either as Word documents in a file folder or as documents attached to e-mail in the Outlook "Forms Received" folder can be extracted using any of the four target modes.

The Word table and Delimited text file modes were demonstrated above.  In the following illustrations and discussion, I will demonstrate the Excel and Access target modes.

The batch documents consist of six numbered Survey Result documents stored in a local file folder named "Test Forms" on my PC desktop.

extract_form_data_26

Forms received from respondents saved as numbered files in a local folder.

extract_cc_form_data_27
Representative form document.  The form contains seven content control data field elements.

extract_form_data_28

Excel Target Mode

In the either batch mode, the user must define 1) The number of data fields contained in the form to process (here I have entered "7"), 2) The batch folder containing the files to process,
3) The path of the target data file, 4) Then name of the target data file (here the default Excel target file is used) and 5) In the case of Access or Excel, the sheet or table number (here the default Sheet1 is used).

Site Note IconNote: The add-in will disable the "Extract Data" command button until all of the require entries are completed.  It is up to you to ensure the correct data is entered.


 extract_form_data_29

Processing Report

extract_form_data_29A
Optional detailed processing report created as a new Word document.

extract_cc_form_data_30
D:\Data Stores\FormData.xlsx spreadsheet content.

Site Note IconNote:  Image files can be inserted, linked or inserted and linked in Word documents.  When a link is used, the image file name will be extracted.  Otherwise, a forward slash "/" indicating a Word inline shape is extracted to the data file.

In versions after 3.1, any Excel workbook file created automatically by the Add-In will include a hidden row 2 used to ensure that data from long data fields (>255 characters) can be extracted and transferred.

extract_form_data_30A
This row contains a dummy text string that overrides the Excel 255 character default text limit.
Once you have extracted actual data that exceeds this limit or if you do not require this capability, the dummy row can be unhidden and deleted.

extract_form_data_30B
  
extract_form_data_31

The add-in automatically creates a "Processed" sub-folder in the user defined batch folder. Processed files can be moved automatically to the new sub-folder.

Incomplete Fields

Form respondents may not always complete each field in your form. While this may be obvious when processing the active document, the add-in detects an incomplete field during processing.  If you elect the "Incomplete Field(s): Notify" option, you are alerted and asked to continue processing or skip the incomplete form.

extract_form_data_31A

Here the Access is the selected target mode and the data is being extracted to a user entered pre-existing file and user defined table. The user has elected to be notified when a form with incomplete fields is detected.
 
extract_form_data_32

The user elects to skip an incomplete form.

 extract_form_data_32A
Processing Report

extract_form_data_33

Skipped documents are retained in the original defined batch folder.

Processing Anomalies

The add-in contains over twenty predefined user informational notifications and alerts that are displayed based on the processing configuration\results.

So far you have only seen examples of the informational notifications.  When the add-in detects missing data or other conflicts, one of several alerts is displayed.

extract_form_data_34

extract_form_data_34A

While the example form does in fact have eight (8) content controls, only 7 are purposed as form data fields.  The rich text content control titled "Form Container" is tagged with the common filter tag "Do not extract" and excluded from data extration.  It also must be excluded from the user defined number data fields.  The numeral 8 entered as shown above results in a data field conflict (i.e., one more defined data fields than there are columns in the target data file).

extract_form_data_34C

extract_form_data_34D

Here the form designer has used an Access Reserved Keyword as a data field title.

Care and attention both when designing the form and setting the values for data extraction should minimize your experience with these alert messsages.

The examples provided should give you enough confidence to explore the add-in and employ it to meet your specific needs.

Create Report

Employing a user defined template and the "Create Report" control, you can create a report (or other summary type document) based on data field content in the open form as the illustrations and discussion below will demonstrate.

extract_form_data_35

Form document opened in Word as the active document.

extract_form_data_36
Representative user defined document template associated with the form document.

 extract_form_Data_37
Representative new report (summary document) created from user defined template.

You can download the template add-in here:  Extract Data From Forms.  For installation instructions, see: Organizing your Macros\Template Add-Ins.

That's it! I hope you have found this tips page useful and informative.

Share

DISCLAIMER/TERMS OF USE

The information, illustrations and code contained in my "Microsoft Word Tips" are provided free and without risk or obligation.

Click to acces PayPal Verification Service Click to acces PayPal Verification Service

However, the work is mine. If you use it for commercial purposes or benefit from my efforts through income earned or time saved then a donation, however small, will help to ensure the continued availability of this resource.

If you would like to donate, please use the appropriate donate button to access PayPal. Thank you!

Click to donate British Pound Sterling                   Click to donate US dollars                   Click to donate EU euros

Search my site or the web using Google Search Engine

Google Search Logo