Aim:
Take a UTF-8 encoded XML file, read its contents into an ElementTree, then iterate through it and print the text of each element.
Example:
<?xml version="1.0" ?>
<text>
  <group>
    <line>English</line>
    <line>Français</line>
  </group>
</text>
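from xml.etree.ElementTree import ElementTree

tree = ElementTree()
# Open in binary mode so the XML parser does the UTF-8 decoding itself.
with open("filename.xml", "rb") as f:
    tree.parse(f)

# ".//group" matches every <group> element below the root.
for group in tree.findall(".//group"):
    for line in group.findall("line"):
        # Encode explicitly as UTF-8 so non-ASCII text such as "Français" is handled.
        print(line.text.encode('utf-8'))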
The important part of the code is the .encode('utf-8') call. Internally, ElementTree stores element text as decoded Unicode strings, so if you print line.text directly, Python 2 tries to encode it using the default encoding, which is typically ASCII. This fails with a UnicodeEncodeError because the ç character is outside the ASCII range.
If you call line.text.encode('utf-8') instead, the text is encoded as UTF-8, which can represent every character, so the output comes out correctly.
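The difference between the two encodings can be checked without touching a file or the console. The following is a minimal sketch rather than part of the example above: it parses the same XML from an in-memory byte string (xml_bytes, a stand-in for filename.xml) and uses a hypothetical helper, try_encode, to attempt each encoding on every line.

```python
import xml.etree.ElementTree as ET

# In-memory stand-in for the UTF-8 encoded file (assumption: same content
# as the example XML). \u00e7 is the "ç" in "Français".
xml_bytes = (
    u'<?xml version="1.0" ?>'
    u'<text><group>'
    u'<line>English</line>'
    u'<line>Fran\u00e7ais</line>'
    u'</group></text>'
).encode("utf-8")

root = ET.fromstring(xml_bytes)
lines = [line.text for line in root.findall(".//line")]


def try_encode(text, encoding):
    """Hypothetical helper: return the encoded bytes, or None on failure."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# ASCII cannot represent "ç", so the second entry is None.
ascii_results = [try_encode(t, "ascii") for t in lines]
# UTF-8 can represent both lines.
utf8_results = [try_encode(t, "utf-8") for t in lines]

print(ascii_results)
print(utf8_results)
```

The ASCII attempt succeeds for "English" but fails for "Français", while the UTF-8 attempt succeeds for both, which is the behaviour described above.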