In this article we are going to create an XML file from a text file.
We use a .txt file as the source to convert to an XML file, but it is also possible to use other document types, for example to convert a PDF to XML.
This article assumes that you know how to create and configure a File Processor channel.
- Create a new channel and call it TXT to XML.
- Input: Select Local/Network and specify the path where the .TXT files are located.
In our case we use path:
- Input Filter: We only want to process our .TXT files, therefore we apply an Input Filter.
- Click the Add button and add a Property-Filter Type.
- Configure the property:
Match Regex with regular expression:
The regular expression accepts only file names with a .txt extension.
- Output: Select Local/Network and specify the path where the resulting .XML files should be saved.
In our case we use path:
- Post-Action: We want to delete the original input file after successful processing. Use Method-Type with option Delete (input file).
Now we have configured a basic channel which moves files with a .txt extension from one folder to another. As you may have noticed, we did not add any Converter in the steps above.
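To illustrate what the Input Filter does, here is a minimal Python sketch of a file-name check. The pattern below is our own assumption, not the expression configured in the channel, and the function names are hypothetical. File Processor itself may evaluate the expression differently.

```python
import re

# Hypothetical pattern (an assumption, not the channel's actual expression):
# accept any non-empty name that ends in ".txt", ignoring case.
TXT_FILTER = re.compile(r"^.+\.txt$", re.IGNORECASE)


def passes_input_filter(file_name: str) -> bool:
    """Return True if the file name would pass the .txt-only filter."""
    return TXT_FILTER.match(file_name) is not None


def select_input_files(file_names: list[str]) -> list[str]:
    """Keep only the names the filter accepts, preserving their order."""
    return [name for name in file_names if passes_input_filter(name)]
```

With this pattern, a name such as books.txt is accepted, while books.pdf or books.txt.bak is rejected.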
In the following steps we will configure the conversion to an XML file.
Configuring the XML conversion
Our goal is to convert a text file to an XML file. Therefore we will use a converter. These steps will guide you through the process of configuring such a converter. Your actual implementation might differ from this example.
Our text file looks like this:
Title: Schaum's Outline of Signals and Systems
Author: Hwei Hsu
ISBN10: 0070306419
Pages: 470
Title: WPF 4 Unleashed
Author: Adam Nathan
ISBN10: 0672331195
Pages: 825
...
Our XML structure will look like this:
<?xml version="1.0" encoding="Windows-1252"?>
<Books>
  <Book>
    <Title />
    <Author />
    <ISBN />
    <Pages />
  </Book>
  <Book>
    <Title />
    ...
  </Book>
  ...
</Books>
Now we will create the schema for this XML structure:
- In the Channel Options, go to the Conversion-tab.
- Click Add Converter.
- Select Add to xml.
- In the Schema-section, give the root-element a name:
- Click the Add-button next to the root-element, choose Container element (repeating) and click Add.
We have a repeating container because we have multiple books.
- Select the newly created element and on the right side in the properties panel, give the repeating container a name:
- Click the button next to the Start repeating group: label and disable the checkbox.
- Next we will add the child elements for a Book-element.
- Click the Add-button next to the Book-element, select String and click Add.
- Select the newly added child-element and, on the right side, fill in Title next to the Name: label.
- Click on the button next to the Recognition: label to configure the recognition.
Here we will use recognition and define which text we should take from the .txt file.
- Enable the Enable recognition for: Title checkbox.
- Select Property: Content.
- For Content Filter: use Label.
- Label: Whole word and fill in
- Value position: Right
- Value type: Everything
- Click the Add-button at the bottom and select Trim. Change the value Start to Both.
- Click OK to close the recognition dialog.
For now we have added the Title-property.
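The recognition settings above can be pictured with a small Python sketch. It is only an approximation of how we understand the options, not the converter's actual implementation. It assumes that every field sits on its own line, that the label is the text Title: (the value is not shown in the steps above), and that Value type Everything means the rest of the line. The function name recognize_values is hypothetical.

```python
import re


def recognize_values(content: str, label: str) -> list[str]:
    """Return the trimmed text to the right of each whole-word label match."""
    # The leading \b approximates the "Whole word" option; "(.*)$" takes
    # everything to the right of the label up to the end of the line
    # (assumed meaning of Value position: Right, Value type: Everything).
    pattern = re.compile(r"\b" + re.escape(label) + r"(.*)$", re.MULTILINE)
    # str.strip() plays the role of the Trim step set to Both.
    return [match.group(1).strip() for match in pattern.finditer(content)]
```

Called with the sample text and the label Title:, the function returns one value per book, which corresponds to one Title element per Book element.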
If we now test our XML converter by putting our text file in the input folder and starting the File Processor, the resulting file should have this content, provided everything is configured correctly:
<?xml version="1.0" encoding="Windows-1252"?>
<Books>
  <Book>
    <Title>Schaum's Outline of Signals and Systems</Title>
  </Book>
  <Book>
    <Title>WPF 4 Unleashed</Title>
  </Book>
  <Book>
    <Title>Mastering Serial Communications</Title>
  </Book>
</Books>
Now you can add the other elements, like Author, ISBN, Pages to complete your XML.
After adding the other elements, your schema should look similar to this:
We used a text file (.txt) called books.txt with the following content:
Title: Schaum's Outline of Signals and Systems
Author: Hwei Hsu
ISBN10: 0070306419
Pages: 470
Title: WPF 4 Unleashed
Author: Adam Nathan
ISBN10: 0672331195
Pages: 825
Title: Mastering Serial Communications
Author: Peter W. Gofton
ISBN10: 0895881802
Pages: 289
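For readers who want to check the expected result outside File Processor, the whole conversion can be sketched in Python with the standard library. This is an illustration under our own assumptions, not the way the converter works internally: each field is on its own line, a Title line starts a new Book, and the ISBN10 label maps to the ISBN element. The names FIELDS and books_to_xml are hypothetical.

```python
import xml.etree.ElementTree as ET

# Label in the text file -> element name in the XML (taken from the samples).
FIELDS = {"Title": "Title", "Author": "Author", "ISBN10": "ISBN", "Pages": "Pages"}


def books_to_xml(content: str) -> ET.Element:
    """Build a <Books> tree, starting a new <Book> at every Title line."""
    root = ET.Element("Books")
    book = None
    for line in content.splitlines():
        # Split at the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        label = label.strip()
        if not sep or label not in FIELDS:
            continue  # skip blank lines and anything that is not a known field
        if label == "Title" or book is None:
            book = ET.SubElement(root, "Book")
        ET.SubElement(book, FIELDS[label]).text = value.strip()
    return root
```

Passing the content of books.txt to books_to_xml gives a tree with three Book elements. ET.tostring(root, encoding="unicode") returns the markup as a string for comparison with the file that the channel writes.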