Jason, I'd hold off on doing much work on the style sheet approach for now. I see style sheets coming into their own at the next stage, where we have to turn the intermediate XML into SQL insert statements. But I could be wrong.

The input comes from the export zip file attached to the page [[Onyx Export and Purge]]. If you unzip it, you will find one directory for each "part" of the BRICCS questionnaire. Each directory holds a variables.xml file plus a data file for each participant exported from Onyx. It is the variables.xml files that contain the metadata for each "part".
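A minimal sketch of walking the export zip in Java, assuming the layout described above: each part is a directory holding `variables.xml` plus numbered participant files such as `0000001.xml`. The class name `OnyxExportLayout` and the rule that a participant file is any all-digit `.xml` name are hypothetical choices for illustration, not something Onyx defines. It reads the zip directly with `java.util.zip.ZipInputStream`, so nothing has to be unzipped first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: groups the entries of an Onyx export zip by part directory. */
class OnyxExportLayout {
    /** What one part directory contains. */
    static final class Part {
        boolean hasVariables;                              // true if the directory has variables.xml
        final List<String> dataFiles = new ArrayList<>();  // participant files, e.g. 0000001.xml
    }

    /** Reads the zip from a stream; keys are the part directory paths inside the zip. */
    static Map<String, Part> scan(InputStream zip) throws IOException {
        Map<String, Part> parts = new TreeMap<>();
        try (ZipInputStream zin = new ZipInputStream(zip)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                String name = entry.getName();      // zip entry names always use '/'
                int slash = name.lastIndexOf('/');
                if (slash < 0) {
                    continue;                       // top-level file, not inside a part directory
                }
                String dir = name.substring(0, slash);
                String file = name.substring(slash + 1);
                if (file.equals("variables.xml")) {
                    parts.computeIfAbsent(dir, d -> new Part()).hasVariables = true;
                } else if (file.matches("\\d+\\.xml")) {
                    // Assumption: participant data files have all-digit names.
                    parts.computeIfAbsent(dir, d -> new Part()).dataFiles.add(file);
                }
            }
        }
        return parts;
    }
}
```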
[[BR]]
We obviously need to explore this in conjunction with working out what the final input into i2b2 will look like. I expect a few iterations before we get anywhere near what is required. At the moment, I see producing an intermediate form of the ontology from Onyx as the first step. This intermediate stage (probably a number of XML files for each Onyx part) will be used to produce SQL for the Ontology Cell and for the ontology dimension table within the CRC Cell (the data mart). I believe the intermediate ontology might then be used to drive an intermediate format for the participant data files (the 0000001.xml type files within the export zip). We can then use whatever comes out of that process to produce CSV files for import into i2b2 (the fact and patient tables). That import is managed from within the i2b2 Workbench.
{{{
ONYX Export File
  |
  +-->variables.xml---(program?)--->intermediate ontology +---(XSLT)--> SQL inserts into Ontology Cell tables
  |                                       |               |
  |                                       |               +---(XSLT)--> SQL inserts into CRC ontology_dimension table
  |                                       |
  |                                       V
  +-->nnnnnnnnn.xml---(program?)--->intermediate data---------(XSLT)--> CSV file for import into CRC fact and patient tables
}}}
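To give a feel for the XSLT branches in the diagram, here is a minimal Java sketch that uses the JDK's built-in `javax.xml.transform` API to turn an intermediate ontology into SQL inserts. The intermediate format (an `<ontology>` root holding `<concept>` elements with `level`, `path` and `name` attributes), the table name `onyx_ontology` and its columns are all assumptions invented for this example, borrowing i2b2-style column names; the real intermediate ontology and i2b2 schema should drive the actual stylesheet. The JDK processor implements XSLT 1.0, which has no `replace()`, so single quotes are escaped with a recursive named template. Output goes to a `StringWriter` here; for large outputs a `StreamResult` wrapping a `File` or `OutputStream` avoids holding all the SQL text in memory.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OntologyToSql {
    // Hypothetical stylesheet: one INSERT per <concept> in an assumed intermediate
    // ontology format; table and column names are placeholders.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='text'/>",
        "  <xsl:template match='/'>",
        "    <xsl:for-each select='ontology/concept'>",
        "      <xsl:text>INSERT INTO onyx_ontology (c_hlevel, c_fullname, c_name) VALUES (</xsl:text>",
        "      <xsl:value-of select='@level'/>",
        "      <xsl:text>, '</xsl:text>",
        "      <xsl:call-template name='sql-escape'><xsl:with-param name='s' select='@path'/></xsl:call-template>",
        "      <xsl:text>', '</xsl:text>",
        "      <xsl:call-template name='sql-escape'><xsl:with-param name='s' select='@name'/></xsl:call-template>",
        "      <xsl:text>');&#10;</xsl:text>",
        "    </xsl:for-each>",
        "  </xsl:template>",
        "  <!-- XSLT 1.0 has no replace(), so double each single quote recursively -->",
        "  <xsl:template name='sql-escape'>",
        "    <xsl:param name='s'/>",
        "    <xsl:variable name='q' select='&quot;&apos;&quot;'/>",
        "    <xsl:choose>",
        "      <xsl:when test='contains($s, $q)'>",
        "        <xsl:value-of select='substring-before($s, $q)'/>",
        "        <xsl:text>''</xsl:text>",
        "        <xsl:call-template name='sql-escape'>",
        "          <xsl:with-param name='s' select='substring-after($s, $q)'/>",
        "        </xsl:call-template>",
        "      </xsl:when>",
        "      <xsl:otherwise><xsl:value-of select='$s'/></xsl:otherwise>",
        "    </xsl:choose>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Applies the stylesheet to an intermediate-ontology document and returns the SQL text. */
    static String toSql(String ontologyXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(ontologyXml)), new StreamResult(out));
        return out.toString();
    }
}
```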
I'm agnostic as far as techniques are concerned (the bits in brackets), but I see XSLT as admirably suited to the grunt work of producing SQL inserts and CSV files. The inserts into the Ontology Cell for the demo data system were held in a file of more than 250 MB, and I suspect even our first stab at Onyx will produce something fairly large. I've produced a first cut at processing a set of variables.xml files into an intermediate ontology, and will attach a complete set corresponding to my example export zip file. I've struggled with aspects of getting a reasonably systematic view from the collection of variables files, the idea being that one xsd file covers the whole structure, whatever variables file is chosen. Deriving variables and grouping them into different structures involves some complex convolutions, which I think are easier to explore programmatically, at least for the moment.[[BR]]
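One way to test the "one xsd file covers the whole structure" idea is to compile a single schema once and validate every per-part intermediate file against it. The sketch below does that with the JDK's `javax.xml.validation` API. The schema is a hypothetical stand-in for whatever shape the intermediate ontology settles on, and `IntermediateOntologyCheck` is an illustrative name only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical checker: one schema, applied to the intermediate file of any part. */
class IntermediateOntologyCheck {
    // Assumed shape of the intermediate ontology (illustrative only).
    static final String XSD = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        "  <xs:element name='ontology'>",
        "    <xs:complexType>",
        "      <xs:sequence>",
        "        <xs:element name='concept' minOccurs='0' maxOccurs='unbounded'>",
        "          <xs:complexType>",
        "            <xs:attribute name='level' type='xs:nonNegativeInteger' use='required'/>",
        "            <xs:attribute name='path' type='xs:string' use='required'/>",
        "            <xs:attribute name='name' type='xs:string' use='required'/>",
        "          </xs:complexType>",
        "        </xs:element>",
        "      </xs:sequence>",
        "    </xs:complexType>",
        "  </xs:element>",
        "</xs:schema>");

    // A compiled Schema is immutable and thread-safe, so build it once and reuse it.
    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** Empty if the document is valid, otherwise the first validation error message. */
    static Optional<String> check(String ontologyXml) {
        Validator v = SCHEMA.newValidator(); // Validators are not thread-safe: one per call
        try {
            v.validate(new StreamSource(new StringReader(ontologyXml)));
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```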
The project I've started is currently in SVN within my sandbox area: onyx-to-i2b2. I'm unsure how the project should be structured and how it should eventually look, which is why it is sandboxed for the time being. When we have a better idea, everything (code, examples, XSLT) should be moved into another area of SVN and mavenized. There may also need to be a separate one within the admin area holding scripts for exporting from Onyx and readying the output for import into i2b2.

=== Comments On Structures So Far ===