General
In this article we are going to create an XML file from a text file.
We use a .txt file as the source to convert to an XML file, but other document types can be used as well, for example to convert a PDF to XML.
Configuration
This article assumes that you know how to create and configure a File Processor channel.
- Create a new channel and call it TXT to XML.
- Input: Select Local/Network and specify the path where the .TXT files are located.
In our case we use the path: E:\FP\Winking\Ricoh\in
- Input Filter: We only want to process our .TXT files, therefore we apply an Input Filter.
- Click the Add button and add a Property-Filter Type.
- Configure the property: File name, Match Regex with regular expression: (?i).*[.]txt$
The regular expression accepts only file names with a .txt extension.
- Output: Select Local/Network and specify the path where the resulting .XML files should be saved.
In our case we use the path: E:\FP\Winking\Ricoh\result
- Post-Action: We want to delete the original input file after successful processing. Use Method-Type with option Delete (input file).
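The Input Filter expression from the steps above can be checked outside File Processor before you rely on it. The following is a minimal sketch in Python, used here only for illustration; the function name accepts is hypothetical, and the regex engine inside File Processor may differ in details from Python's re module.

```python
import re

# Same expression as configured in the Input Filter.
# (?i) makes the match case-insensitive, [.] matches a literal dot,
# and $ anchors the extension at the end of the file name.
TXT_FILTER = re.compile(r"(?i).*[.]txt$")


def accepts(file_name: str) -> bool:
    """Return True when the file name would pass the .txt filter (hypothetical helper)."""
    return TXT_FILTER.match(file_name) is not None


if __name__ == "__main__":
    for name in ["books.txt", "BOOKS.TXT", "books.txt.bak", "books.pdf"]:
        print(name, accepts(name))
```

Because of the (?i) flag, both books.txt and BOOKS.TXT are accepted, while a name such as books.txt.bak is rejected because .txt is not at the end.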
We have now configured a basic channel that moves files with a .txt extension from one folder to another. As you may have noticed, we did not add a Converter in the steps above.
In the following steps we will configure the conversion to an XML file.
Configuring the XML conversion
Our goal is to convert a text file to an XML file. Therefore we will use a converter. These steps guide you through configuring such a converter. Your actual implementation might differ from this example.
Our text file looks like this:
Title: Schaum's Outline of Signals and Systems
Author: Hwei Hsu
ISBN10: 0070306419
Pages: 470
Title: WPF 4 Unleashed
Author: Adam Nathan
ISBN10: 0672331195
Pages: 825
...
Our XML structure will look like this:
<?xml version="1.0" encoding="Windows-1252"?>
<Books>
  <Book>
    <Title />
    <Author />
    <ISBN />
    <Pages />
  </Book>
  <Book>
    <Title />
    ...
  </Book>
  ...
</Books>
Now we will create the schema for this XML structure:
- In the Channel Options, go to the Conversion-tab.
- Click Add Converter.
- Select Add to xml.
- In the Schema section, give the root element a name: Books.
- Click the Add button next to the root element, choose Container element (repeating) and click Add.
We have a repeating container because we have multiple books.
- Select the newly created element and, on the right side in the properties panel, give the repeating container a name: Book.
- Click the button next to the Start repeating group: label and disable the checkbox.
- Next we will add the child elements for a Book-element.
- Click the Add-button next to the Book-element, select String and click Add.
- Select the newly added child element and, on the right side, fill in Title next to the Name: label.
- Click on the button next to the Recognition: label to configure the recognition.
Here we will use recognition to define which text should be taken from the .txt file.
- Enable the Enable recognition for: Title checkbox.
- Select Property: Content.
- For Content Filter: use Label.
- Label: Whole word and fill in Title:
- Value position: Right
- Value type: Everything
- Click the Add-button at the bottom and select Trim. Change the value Start to Both.
- Click OK to close the recognition dialog.
So far we have added only the Title element.
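To make the recognition settings more concrete, the sketch below mimics them in Python: it looks for the label, takes everything to the right of it, and trims whitespace on both sides. This is only an approximation of what the converter does internally; the function recognize_label is hypothetical, and the Whole word option is simplified to a plain substring search.

```python
from typing import List


def recognize_label(content: str, label: str) -> List[str]:
    """Collect the value found to the right of each occurrence of label (hypothetical helper)."""
    values = []
    for line in content.splitlines():
        position = line.find(label)
        if position == -1:
            continue  # label not present on this line
        # Value position: Right, Value type: Everything
        value = line[position + len(label):]
        # Trim: Both (strip leading and trailing whitespace)
        values.append(value.strip())
    return values


if __name__ == "__main__":
    sample = "Title: WPF 4 Unleashed\nAuthor: Adam Nathan\n"
    print(recognize_label(sample, "Title:"))
```

Without the Trim step, the value would keep the space that follows the colon in the text file.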
If we now test our XML-converter by putting our text file in the input folder and starting the File Processor, the resulting file will have this content if you configured everything correctly:
<?xml version="1.0" encoding="Windows-1252"?>
<Books>
  <Book>
    <Title>Schaum's Outline of Signals and Systems</Title>
  </Book>
  <Book>
    <Title>WPF 4 Unleashed</Title>
  </Book>
  <Book>
    <Title>Mastering Serial Communications</Title>
  </Book>
</Books>
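Now you can add the other elements (Author, ISBN and Pages) in the same way to complete your XML. Note that the label for the ISBN element in the text file is ISBN10:.
After adding the other elements, your schema should contain the root element Books, the repeating container Book, and its child elements Title, Author, ISBN and Pages.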
Test Files
We used a text file (.txt) called books.txt with the following content:
Title: Schaum's Outline of Signals and Systems
Author: Hwei Hsu
ISBN10: 0070306419
Pages: 470
Title: WPF 4 Unleashed
Author: Adam Nathan
ISBN10: 0672331195
Pages: 825
Title: Mastering Serial Communications
Author: Peter W. Gofton
ISBN10: 0895881802
Pages: 289
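For comparison, the complete conversion of this test file can be sketched in a few lines of Python. This is not how File Processor implements the converter; it is an illustrative sketch that assumes every Title: line starts a new Book and that each label appears at the start of a line. The names LABELS and books_to_xml are hypothetical.

```python
import xml.etree.ElementTree as ET

# Maps each label in the text file to the XML element it fills.
LABELS = {
    "Title:": "Title",
    "Author:": "Author",
    "ISBN10:": "ISBN",
    "Pages:": "Pages",
}


def books_to_xml(text: str) -> ET.Element:
    """Build a Books element from labelled lines (hypothetical helper)."""
    root = ET.Element("Books")
    book = None
    for line in text.splitlines():
        for label, element_name in LABELS.items():
            if line.startswith(label):
                # Assumption: a Title line marks the start of a new Book.
                if element_name == "Title" or book is None:
                    book = ET.SubElement(root, "Book")
                # Take everything right of the label and trim both sides.
                ET.SubElement(book, element_name).text = line[len(label):].strip()
                break
    return root


SAMPLE = """Title: Schaum's Outline of Signals and Systems
Author: Hwei Hsu
ISBN10: 0070306419
Pages: 470
Title: WPF 4 Unleashed
Author: Adam Nathan
ISBN10: 0672331195
Pages: 825
Title: Mastering Serial Communications
Author: Peter W. Gofton
ISBN10: 0895881802
Pages: 289
"""

if __name__ == "__main__":
    print(ET.tostring(books_to_xml(SAMPLE), encoding="unicode"))
```

The label-to-element mapping mirrors the recognition configured per element in the schema: each child element has its own label, and the ISBN element reads its value from the ISBN10: label.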