Useful Regular Expressions

Introduction

This appendix lists some useful regular expressions for Print&Share. Note: Print&Share also includes its own list of example regular expressions (regex). You can find it in the Recognition dialog, on the Specific tab, in the drop-down list of regular expressions.

For more information about regular expressions and how they work, see http://www.regular-expressions.info or a similar website.

Examples

String and text (general)

Example 1

  • Goal: Find the number in the text between the dashes.
  • Input: ABC-1234-XYZ
  • RegEx: (?<=-)[^-]+(?=-)
  • Result: 1234
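
To try this kind of pattern outside Print&Share, here is a minimal sketch in Python. It assumes Python's standard re module as a test bench; the regex engine in Print&Share may differ in details. Python only accepts fixed-width lookbehind, so the sketch uses the pattern (?<=-)[^-]+(?=-). The variable names are illustrative.

```python
import re

text = "ABC-1234-XYZ"

# (?<=-)  lookbehind: a dash must come directly before the match
# [^-]+   one or more characters that are not a dash
# (?=-)   lookahead: a dash must come directly after the match
pattern = re.compile(r"(?<=-)[^-]+(?=-)")

match = pattern.search(text)
result = match.group(0) if match else None
print(result)  # 1234
```

The dashes themselves are not part of the result because lookbehind and lookahead only test the surrounding text without consuming it.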

Example 2

  • Goal: Find the string KLM in the text.
  • Input: ABCKLMXYZ or ABCklmXYZ
  • RegEx: (KLM)|(klm)
  • Result: KLM or klm
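
The alternation accepts exactly the two spellings listed and nothing else. The following Python sketch (an illustration with the standard re module, not Print&Share itself) compares it with a case-insensitive pattern built with the inline option (?i); the helper first_match and the third sample input are assumptions added for the comparison.

```python
import re

inputs = ["ABCKLMXYZ", "ABCklmXYZ", "ABCKlmXYZ"]

# Alternation: matches "KLM" or "klm" only.
alternation = re.compile(r"(KLM)|(klm)")
# Inline option (?i): matches every upper/lower-case mix of "klm".
ignore_case = re.compile(r"(?i)klm")

def first_match(pattern, text):
    """Return the first matched text, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

alternation_results = [first_match(alternation, s) for s in inputs]
ignore_case_results = [first_match(ignore_case, s) for s in inputs]
print(alternation_results)  # ['KLM', 'klm', None]
print(ignore_case_results)  # ['KLM', 'klm', 'Klm']
```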

File extensions

Example 1

  • Goal: Match all PDF files, that is, file names ending with the .pdf extension.
  • Input: document.pdf
  • RegEx: .*[.]pdf
  • Result: document.pdf

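The regular expression above is in its simplest form. To find a PDF file name inside a longer text, add word boundaries with \b:

  • RegEx: \b.*[.]pdf\b

\b matches a word boundary: the position at the start or end of a word, between a word character and a non-word character. Note that .* also matches spaces, so the match can begin earlier in the text than the file name itself; replace .* with \S+ to restrict the match to a single name without spaces.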

To match document.PDF, document.pDf and document.pdf alike, make the regex case-insensitive by starting it with (?i):

  • RegEx: (?i).*[.]pdf
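
The three variants can be compared side by side. This is a minimal sketch in Python's re module, used here only as a test bench. Two choices are assumptions of the sketch rather than Print&Share behaviour: fullmatch is used so the whole file name must fit the pattern, and \S+ replaces .* in the word-boundary variant so a match cannot span spaces.

```python
import re

names = ["document.pdf", "document.PDF", "document.pDf", "document.txt"]

# Simplest form: any characters followed by a literal ".pdf".
simple = re.compile(r".*[.]pdf")
# (?i) at the start makes the whole pattern case-insensitive.
ignore_case = re.compile(r"(?i).*[.]pdf")

simple_hits = [n for n in names if simple.fullmatch(n)]
ignore_case_hits = [n for n in names if ignore_case.fullmatch(n)]
print(simple_hits)       # ['document.pdf']
print(ignore_case_hits)  # ['document.pdf', 'document.PDF', 'document.pDf']

# Inside running text, \b pins both ends of the match to a word boundary.
in_text = re.compile(r"\b\S+[.]pdf\b")
found = in_text.findall("Please print document.pdf and report.pdf today")
print(found)  # ['document.pdf', 'report.pdf']
```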

XML-files

Example 1

  • Goal: Get the value of an XML-tag.
  • Input: <MY_TAG>my information</MY_TAG>
  • RegEx: (?<=<MY_TAG>).+(?=</MY_TAG>)
  • Result: my information
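
The tag names inside the lookbehind and lookahead must match the input character for character. The Python sketch below (standard re module, used only as a test bench) shows the pattern without spaces in the tag names returning the value, and the same pattern with a space before ">" finding nothing.

```python
import re

xml = "<MY_TAG>my information</MY_TAG>"

# Lookbehind and lookahead test for the tags without including them in the match.
pattern = re.compile(r"(?<=<MY_TAG>).+(?=</MY_TAG>)")
match = pattern.search(xml)
value = match.group(0) if match else None
print(value)  # my information

# A space before ">" is matched literally, so this variant finds nothing.
with_space = re.search(r"(?<=<MY_TAG >).+(?=</MY_TAG >)", xml)
print(with_space)  # None
```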

Example 2

  • Goal: Get the value of an XML-tag and remove leading zeros.
  • Input: <ORDERNUMB>000123</ORDERNUMB>
  • RegEx: (?<=<ORDERNUMB>.*?)[1-9].*(?=</ORDERNUMB>)
  • Result: 123
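
Python's re module accepts only fixed-width lookbehind, so the pattern above cannot be tested there unchanged. The sketch below swaps in a different technique to check the same idea: a capture group around the wanted digits instead of lookbehind and lookahead. The function name strip_leading_zeros and the two extra inputs are hypothetical.

```python
import re

def strip_leading_zeros(xml):
    """Return the ORDERNUMB value without leading zeros, or None if absent."""
    # 0*             skips any zeros directly after the opening tag
    # ([1-9][^<]*)   captures from the first non-zero digit up to the closing tag
    m = re.search(r"<ORDERNUMB>0*([1-9][^<]*)</ORDERNUMB>", xml)
    return m.group(1) if m else None

print(strip_leading_zeros("<ORDERNUMB>000123</ORDERNUMB>"))  # 123
print(strip_leading_zeros("<ORDERNUMB>0005</ORDERNUMB>"))    # 5
print(strip_leading_zeros("<ORDERNUMB>100200</ORDERNUMB>"))  # 100200
```

Zeros inside the number are kept, because only the zeros before the first non-zero digit are skipped.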

Example 3

  • Goal: Get the value of an XML-tag and remove the text from a specific phrase onward.
  • Input: <MY_TAG>my information I don’t need this</MY_TAG>
  • RegEx: (?<=<MY_TAG>).+(?= I don’t need this.*</MY_TAG>)
  • Result: my information
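
A minimal Python sketch of this case, with the standard re module as a test bench. It uses the pattern (?<=<MY_TAG>).+(?= I don’t need this.*</MY_TAG>): the closing tag is written without a space, and .* (rather than .+) precedes it because in the sample input the unwanted phrase sits directly against </MY_TAG>.

```python
import re

xml = "<MY_TAG>my information I don’t need this</MY_TAG>"

# (?<=<MY_TAG>)                      match starts right after the opening tag
# .+                                 the text to keep
# (?= I don’t need this.*</MY_TAG>)  match stops before the unwanted phrase
pattern = re.compile(r"(?<=<MY_TAG>).+(?= I don’t need this.*</MY_TAG>)")

match = pattern.search(xml)
kept = match.group(0) if match else None
print(kept)  # my information
```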

Example 4

  • Goal: Get the value of an XML-tag and remove the text up to and including a specific phrase.
  • Input: <MY_TAG> I don’t need this my information </MY_TAG>
  • RegEx: (?<=<MY_TAG>.*?I don’t need this\s+).+?(?=\s*</MY_TAG>)
  • Result: my information
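
This pattern relies on a variable-length lookbehind, which Python's re module does not accept. To check the idea in Python anyway, the sketch below swaps in a capture group: everything up to and including the unwanted phrase is matched but not captured, and the surrounding spaces are kept out of the group. Treat it as an illustration under that assumption, not as the pattern to enter in Print&Share.

```python
import re

xml = "<MY_TAG> I don’t need this my information </MY_TAG>"

# <MY_TAG>.*? I don’t need this   skipped: opening tag and unwanted phrase
# \s+                             skipped: whitespace after the phrase
# (.+?)                           captured: the text to keep
# \s*</MY_TAG>                    skipped: trailing whitespace and closing tag
pattern = re.compile(r"<MY_TAG>.*? I don’t need this\s+(.+?)\s*</MY_TAG>")

match = pattern.search(xml)
kept = match.group(1) if match else None
print(kept)  # my information
```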