Useful Regular Expressions
Introduction
This appendix contains some useful regular expressions for use in Print&Share. Note: Print&Share also provides its own list of example regular expressions (regex). You can find it in the Recognition dialog, on the Specific tab, in the drop-down list of regular expressions.
For more information about regular expressions and how they work, see http://www.regular-expressions.info or a similar website.
Examples
String and text (general)
Example 1
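- Goal: Find the number between the dashes in the text.
- Input: ABC-1234-XYZ
- RegEx: (?<=-).+(?=-)
- Result: 1234
(?<=-) requires a dash directly before the match and (?=-) requires a dash directly after it. Neither dash is part of the result.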
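To try the dash example outside Print&Share, here is a minimal sketch using Python's `re` module as a stand-in engine (Print&Share's own engine may differ in details). Python only accepts fixed-width lookbehind, so the sketch uses `(?<=-)`. The helper name `between_dashes` is hypothetical.

```python
import re
from typing import Optional

# (?<=-) requires a dash directly before the match and (?=-) a dash
# directly after it; neither dash becomes part of the result.
PATTERN = re.compile(r"(?<=-).+(?=-)")


def between_dashes(text: str) -> Optional[str]:
    """Return the text between the first and the last dash, or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


print(between_dashes("ABC-1234-XYZ"))  # 1234
```

Because `.+` is greedy, an input with more than two dashes such as `A-B-C-D` returns `B-C`. Using `[^-]+` instead of `.+` stops the match at the next dash.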
Example 2
- Goal: Find the string KLM in the text.
- Input: ABCKLMXYZ or ABCklmXYZ
- RegEx: (KLM)|(klm)
- Result: KLM or klm
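A minimal sketch of the alternation `(KLM)|(klm)`, using Python's `re` module as a stand-in engine. It shows that only the two listed spellings match; mixed case such as `Klm` matches neither. The helper `find_klm` is hypothetical.

```python
import re
from typing import Optional

# Alternation: at each position "|" tries "KLM" first, then "klm".
# Mixed case such as "Klm" matches neither alternative.
PATTERN = re.compile(r"(KLM)|(klm)")


def find_klm(text: str) -> Optional[str]:
    match = PATTERN.search(text)
    return match.group(0) if match else None


for text in ["ABCKLMXYZ", "ABCklmXYZ", "ABCKlmXYZ"]:
    print(text, "->", find_klm(text))
# ABCKLMXYZ -> KLM
# ABCklmXYZ -> klm
# ABCKlmXYZ -> None
```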
File extensions
Example 1
- Goal: Match all PDF files, that is, file names ending with the .pdf extension.
- Input: document.pdf
- RegEx: .*[.]pdf
- Result: document.pdf
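A minimal sketch of the simplest PDF pattern in Python's `re` module (a stand-in, not Print&Share's engine). It assumes the whole file name is checked, which is modelled here with `re.fullmatch`; whether Print&Share matches whole names or searches inside them is not stated in this appendix. The helper `is_pdf` is hypothetical.

```python
import re

# "[.]" is a character class that contains only a dot, so it matches a
# literal "." (a bare "." would match any character).
PATTERN = re.compile(r".*[.]pdf")


def is_pdf(filename: str) -> bool:
    # fullmatch: the whole file name must fit the pattern.
    return PATTERN.fullmatch(filename) is not None


for name in ["document.pdf", "document.txt", "documentpdf", "report.pdf.bak"]:
    print(name, is_pdf(name))
# document.pdf True
# document.txt False
# documentpdf False
# report.pdf.bak False
```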
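The regular expression above is in its simplest form. If you search for a PDF file name inside a longer text, add word boundaries with \b and use \w+ instead of .* for the name, so that the match does not run over the words in front of the file name:
- RegEx: \b\w+[.]pdf\b
\b matches at a word boundary, that is, right before the first or right after the last character of a word. \w matches a letter, digit or underscore.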
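The effect of word boundaries inside a sentence can be checked with a small sketch in Python's `re` module (a stand-in engine). It compares `.*` and `\w+` in front of `[.]pdf`; the sample sentence is made up for illustration.

```python
import re

text = "Please print invoice_2023.pdf and notes.txt today."

# With ".*" the match starts at the beginning of the text, because "."
# also matches spaces and the words in front of the file name.
greedy = re.search(r"\b.*[.]pdf\b", text)
print(greedy.group(0))  # Please print invoice_2023.pdf

# "\w+" only matches letters, digits and underscores, so the match is
# limited to the file name itself.
word = re.search(r"\b\w+[.]pdf\b", text)
print(word.group(0))  # invoice_2023.pdf
```

Since `\w` excludes hyphens and spaces, a name such as `my-file.pdf` would only be matched from `file.pdf` onward; a wider class such as `[\w-]+` covers hyphens.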
To match document.PDF, document.pDf and document.pdf alike, make the regex case-insensitive with (?i):
- RegEx: (?i).*[.]pdf
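A minimal sketch of the inline case-insensitive flag in Python's `re` module (a stand-in engine), again assuming whole-name matching via `re.fullmatch`.

```python
import re

# The inline flag (?i) makes the whole pattern case-insensitive; Python
# requires such a global flag at the very start of the pattern.
PATTERN = re.compile(r"(?i).*[.]pdf")

names = ["document.PDF", "document.pDf", "document.pdf", "document.txt"]
matches = [name for name in names if PATTERN.fullmatch(name)]
print(matches)  # ['document.PDF', 'document.pDf', 'document.pdf']
```

In Python, passing `re.IGNORECASE` to `re.compile` has the same effect as the inline `(?i)`.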
XML-files
Example 1
- Goal: Get the value of an XML tag.
- Input: <MY_TAG>my information</MY_TAG>
- RegEx: (?<=<MY_TAG>).+(?=</MY_TAG>)
- Result: my information
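A minimal sketch of the tag-value pattern in Python's `re` module (a stand-in engine). Both lookarounds are fixed-width, so Python accepts the pattern as written. It also shows what happens when the same tag appears twice on one line.

```python
import re

# Lookbehind for the opening tag, lookahead for the closing tag: only the
# text between the tags is returned.
PATTERN = re.compile(r"(?<=<MY_TAG>).+(?=</MY_TAG>)")

value = PATTERN.search("<MY_TAG>my information</MY_TAG>").group(0)
print(value)  # my information

# With two tags on one line, the greedy ".+" runs to the last closing tag,
# while the lazy ".+?" stops at the first one.
two_tags = "<MY_TAG>a</MY_TAG><MY_TAG>b</MY_TAG>"
greedy = PATTERN.search(two_tags).group(0)
lazy = re.search(r"(?<=<MY_TAG>).+?(?=</MY_TAG>)", two_tags).group(0)
print(greedy)  # a</MY_TAG><MY_TAG>b
print(lazy)    # a
```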
Example 2
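- Goal: Get the value of an XML tag and remove leading zeros.
- Input: <ORDERNUMB>000123</ORDERNUMB>
- RegEx: (?<=<ORDERNUMB>.*?)[1-9].*(?=</ORDERNUMB>)
- Result: 123
[1-9] makes the match start at the first non-zero digit. Using .* instead of .+ after it means that a single-digit value such as 000005 also returns a result.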
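Python's `re` module, used here as a stand-in engine, rejects variable-width lookbehind such as `(?<=<ORDERNUMB>.*?)`. This sketch therefore swaps in a capture group with the same effect: `.*?` skips the leading zeros and group 1 holds the value from the first non-zero digit on. The helper `order_number` is hypothetical.

```python
import re
from typing import Optional

# Capture group instead of a variable-width lookbehind.
# ".*?" skips as few characters as possible (the leading zeros) and
# "[1-9]" requires the first kept digit to be non-zero.
PATTERN = re.compile(r"<ORDERNUMB>.*?([1-9].*)</ORDERNUMB>")


def order_number(xml: str) -> Optional[str]:
    match = PATTERN.search(xml)
    return match.group(1) if match else None


print(order_number("<ORDERNUMB>000123</ORDERNUMB>"))  # 123
print(order_number("<ORDERNUMB>000005</ORDERNUMB>"))  # 5
print(order_number("<ORDERNUMB>000100</ORDERNUMB>"))  # 100
```

A value that consists only of zeros has no non-zero digit, so it returns `None`.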
Example 3
- Goal: Get the value of an XML tag and remove a specific phrase and all text after it.
- Input: <MY_TAG>my information I don’t need this</MY_TAG>
- RegEx: (?<=<MY_TAG>.*?).+(?= I don’t need this.*</MY_TAG>)
- Result: my information
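A minimal sketch of "keep the text in front of a phrase" in Python's `re` module (a stand-in engine). Because Python rejects the variable-width lookbehind `(?<=<MY_TAG>.*?)`, a capture group is used instead. `PHRASE` and `value_before_phrase` are hypothetical names; `\u2019` is the typographic apostrophe used in the example.

```python
import re
from typing import Optional

# Phrase to drop, together with everything after it ("\u2019" is the
# typographic apostrophe used in the example).
PHRASE = " I don\u2019t need this"

# The greedy (.+) keeps the text in front of the phrase; ".*" allows any
# remaining text between the phrase and the closing tag.
PATTERN = re.compile("<MY_TAG>(.+)" + re.escape(PHRASE) + ".*</MY_TAG>")


def value_before_phrase(xml: str) -> Optional[str]:
    match = PATTERN.search(xml)
    return match.group(1) if match else None


print(value_before_phrase("<MY_TAG>my information I don\u2019t need this</MY_TAG>"))
# my information
```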
Example 4
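- Goal: Get the value of an XML tag and remove a specific phrase and all text before it.
- Input: <MY_TAG> I don’t need this my information </MY_TAG>
- RegEx: (?<=<MY_TAG>.*? I don’t need this ).+(?= </MY_TAG>)
- Result: my information
The space at the end of the lookbehind and the space at the start of the lookahead keep the spaces around the value out of the result.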
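A minimal sketch of "keep the text after a phrase" in Python's `re` module (a stand-in engine). A capture group replaces the variable-width lookbehind, which Python does not accept, and `\s*` absorbs the spaces around the value so the result has none. `PHRASE` and `value_after_phrase` are hypothetical names.

```python
import re
from typing import Optional

# Phrase to drop, together with everything in front of it ("\u2019" is
# the typographic apostrophe used in the example).
PHRASE = "I don\u2019t need this"

# ".*?" skips text up to the first occurrence of the phrase, "\s*" absorbs
# the spaces around the value and the lazy "(.+?)" stops before any
# trailing spaces.
PATTERN = re.compile(r"<MY_TAG>.*?" + re.escape(PHRASE) + r"\s*(.+?)\s*</MY_TAG>")


def value_after_phrase(xml: str) -> Optional[str]:
    match = PATTERN.search(xml)
    return match.group(1) if match else None


print(value_after_phrase("<MY_TAG> I don\u2019t need this my information </MY_TAG>"))
# my information
```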