Parse and import documentation from XML and DOCX files into an LMS
15 Feb 2015 GitHub source code data-wrangling
A custom XML to CSV parser. It converts the paragraphs and chapters of technical and legal documentation into CSV tables.
####Code source on GitHub
https://github.com/alexbra/xmltocsv
####Applied technologies
Python, ElementTree, BeautifulSoup
###Description
There are dozens of legal and technical documents and more than 20,000 quiz questions stored in XML, HTML and DOCX files. The task is to parse them and convert them into CSV files, which are then imported into a learning management system.
####xml files format
####Parser
I used the `xml.etree.ElementTree` Python module to parse the XML files, and `mammoth` together with `BeautifulSoup` for the DOCX files.
The parser produced one CSV file for each document and one for each question file.
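As a rough illustration of the XML side, here is a minimal sketch of walking a document with `xml.etree.ElementTree` and writing one CSV row per part. The real XML schema is not shown in this post, so the `<book>` and `<part>` elements, their attributes, and the helper names `iter_parts` and `xml_to_csv` are assumptions made up for the example. It is not the code from the repository.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input: element and attribute names are assumptions,
# the real files may be structured differently.
SAMPLE_XML = """<book id="book_332" name="332. Book">
  <part code="1298385727632_34" name="2. Chapter" type="folder">
    <part code="1298385727632_35" name="" type="lesson"/>
  </part>
</book>"""

FIELDS = ["book_id", "book_name", "name", "code", "parent_code", "type"]


def iter_parts(element, book, parent_code=""):
    """Yield one row per <part>, depth-first, recording each part's parent."""
    for part in element.findall("part"):
        yield {
            "book_id": book.get("id", ""),
            "book_name": book.get("name", ""),
            "name": part.get("name", ""),
            "code": part.get("code", ""),
            "parent_code": parent_code,
            "type": part.get("type", ""),
        }
        # Nested parts become children of the current part.
        yield from iter_parts(part, book, part.get("code", ""))


def xml_to_csv(xml_text):
    """Parse the XML text and return the flattened parts as CSV text."""
    book = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(iter_parts(book, book))
    return out.getvalue()


csv_text = xml_to_csv(SAMPLE_XML)
print(csv_text)
```

Flattening the hierarchy into rows with a `parent_code` column keeps the nesting recoverable after the import, without needing a nested file format.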
parts.csv
documents structure
book_id | book_name | name | code | parent_code | type | url | paragraphs |
---|---|---|---|---|---|---|---|
book_332 | 332. Book | 2. Chapter | 1298385727632_34 | | folder | ../content/12983857.html | |
book_332 | 332. Book | | 1298385727632_35 | 1298385727632_34 | lesson | ../content/12983857.html | 1298385727632_36 |
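The `code` and `parent_code` columns are enough to rebuild the document hierarchy from the flat file. The sketch below shows one way to do that. It assumes a comma-delimited file, and it assumes that the empty cells in the sample rows are `parent_code` for the folder and `name` for the lesson. The helper names `build_tree` and `walk` are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Sample rows based on the table above; the delimiter and the position
# of the empty cells are assumptions.
PARTS_CSV = """book_id,book_name,name,code,parent_code,type,url,paragraphs
book_332,332. Book,2. Chapter,1298385727632_34,,folder,../content/12983857.html,
book_332,332. Book,,1298385727632_35,1298385727632_34,lesson,../content/12983857.html,1298385727632_36
"""


def build_tree(csv_text):
    """Map each parent_code to the list of rows directly below it."""
    children = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        children[row["parent_code"]].append(row)
    return children


def walk(children, parent_code="", depth=0):
    """Yield (depth, row) pairs top-down, starting from rows with no parent."""
    for row in children.get(parent_code, []):
        yield depth, row
        yield from walk(children, row["code"], depth + 1)


tree = build_tree(PARTS_CSV)
outline = [(depth, row["type"], row["code"]) for depth, row in walk(tree)]
for depth, part_type, code in outline:
    print("  " * depth + f"{part_type}: {code}")
```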
questions.csv
testing questions
bookcode | partcode | qcode | qname | type | is_correct | answname | bookname | partname | group |
---|---|---|---|---|---|---|---|---|---|
book_332 | 1298385727632_35 | 1298385727632_18_2 | Name of question | multiple | 1 | answ | 332.Book | name | coll_2008 |
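The `is_correct` and `answname` columns suggest that the file holds one row per answer option, so rows sharing a `qcode` belong to the same question. Under that assumption, a minimal sketch of grouping the rows back into questions could look like this. The second sample row, the comma delimiter, and the helper name `group_questions` are invented for illustration.

```python
import csv
import io

# First row follows the table above (a subset of its columns);
# the second row is a made-up extra answer option.
QUESTIONS_CSV = """bookcode,partcode,qcode,qname,type,is_correct,answname
book_332,1298385727632_35,1298385727632_18_2,Name of question,multiple,1,answ
book_332,1298385727632_35,1298385727632_18_2,Name of question,multiple,0,other answ
"""


def group_questions(csv_text):
    """Collect answer rows under their question, keyed by qcode."""
    questions = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        question = questions.setdefault(
            row["qcode"],
            {"qname": row["qname"], "type": row["type"], "answers": []},
        )
        # Store each option with a boolean flag instead of the "1"/"0" text.
        question["answers"].append((row["answname"], row["is_correct"] == "1"))
    return questions


questions = group_questions(QUESTIONS_CSV)
for qcode, question in questions.items():
    print(qcode, question["qname"], question["answers"])
```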
As a result, I converted more than 120 documents and 20,000 quiz questions and imported them into the learning management system, where they are available for training 100K+ company employees.