Changes between Version 12 and Version 13 of OnyxExportOntology


Timestamp:
01/28/11 10:03:36 (14 years ago)
Author:
jeff.lusted
Comment:

--

  • OnyxExportOntology

    v12 v13  
    330330The input is sourced from the export zip file which is attached to page [[Onyx Export and Purge]]. If you unzip this file you will find a directory structure for each "part" of the BRICCS questionnaire. Each directory has a variables.xml file plus a collection of data files for each participant exported from Onyx. It's the collection of variables.xml files which contain the metadata for each "part".
    331331[[BR]]
    332 We obviously need to explore this in conjunction with trying to ascertain what the final input into i2b2 will be like. I expect a few iterations before we get anywhere near what is required. At the moment, I see producing an intermediate form of ontology from Onyx as a first step. This intermediate stage (probably a number of xml files for each Onyx part) will be used to produce SQL for the Ontology Cell and the ontology dimension table within the CRC Cell (the data mart). I believe the intermediate ontology might then be used to drive an intermediate format for the participant data files (the 0000001.xml type files within the export zip). We can then use whatever comes out of that process to produce CSV files for import into i2b2 (fact table and the patient table). The latter is managed from within the i2b2 workbench.
     332
     333We obviously need to explore this in conjunction with trying to ascertain what the final input into i2b2 will be like. I expect a few iterations before we get anywhere near what is required. At the moment, I see producing an intermediate form of ontology from Onyx as a first step. This intermediate stage (probably an xml file for each Onyx part) will be used to produce SQL for the Ontology Cell and the ontology dimension table within the CRC Cell (the data mart). I believe the intermediate ontology might then be used to drive an intermediate format for the participant data files (the 0000001.xml type files within the export zip). We can then use whatever comes out of that process to produce CSV files for import into i2b2 (fact table and the patient table). The latter is managed from within the i2b2 workbench.
    333334{{{
    334335ONYX Export File
     
    344345I'm agnostic as far as techniques are concerned (the bits in brackets). But I see XSLT as being admirably suited to doing the grunt work on producing SQL inserts and CSV files. The inserts into the Ontology cell for the demo data system were held in a file exceeding 250M in size, and I suspect even the first stab at Onyx will produce something relatively large. I've produced a first cut at processing a set of variables.xml files into an intermediate ontology and will attach a complete set corresponding to my example export zip file. I've struggled with aspects of trying to get a relatively systematized view from the collection of variables files. The idea is that one xsd file covers the whole structure, whatever variables file is chosen. There are some complex convolutions to derive variables and their grouping into different structures, which I think are easier to explore programmatically, at least for the moment.[[BR]]
    345346
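To give a feel for the "SQL inserts" step, here is a minimal Python sketch (not the XSLT approach discussed above) that turns one ontology path into an INSERT statement. The table name `onyx_ontology` is hypothetical, and the three columns shown (c_hlevel, c_fullname, c_name) are only a small subset of what an i2b2 ontology table requires; the level numbering is also an assumption.

```python
def sql_literal(text):
    """Quote a string as a SQL literal, doubling any single quotes."""
    return "'" + text.replace("'", "''") + "'"


def ontology_insert(path, table="onyx_ontology"):
    """Build one INSERT for a backslash-delimited ontology path.

    Assumption: the root segment is level 0 and each further segment
    adds one level. The table name is a placeholder.
    """
    segments = [s for s in path.split("\\") if s]
    if not segments:
        raise ValueError("path has no segments")
    level = len(segments) - 1
    name = segments[-1]
    return (
        f"INSERT INTO {table} (c_hlevel, c_fullname, c_name) "
        f"VALUES ({level}, {sql_literal(path)}, {sql_literal(name)});"
    )
```

Some databases treat the backslash as an escape character inside string literals, so the paths may need further escaping depending on the target database.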
    346 The project I've started is currently in SVN within my sandbox area: onyx-to-i2b2. I'm uncertain about the structure of the project, and how it should eventually look, which is why it is sandboxed for the time being. When we have a better idea, code, examples, xslt, everything should be moved into another area of SVN and mavenized. There might need to be another within the admin area which depicts scripts for exporting from onyx and readying for import into i2b2.
     347The project I've started is currently in SVN within my sandbox area: onyx-to-i2b2. I'm uncertain about the structure of the project, and how it should eventually look, which is why it is sandboxed for the time being. When we have a better idea, code, examples, xslt, everything should be moved into another area of SVN and mavenized. There might need to be another project within the admin area of SVN which depicts scripts for exporting from onyx and readying for import into i2b2.
    347348
    348349=== Comments On Structures So Far ===
    349 
    350 
     350Looking at the collection of variables.xml files from an Onyx export:
     351   * Most parts of the Onyx export are stages, except for Participants.
     352   * Most stages are Questionnaire based, but not all: !BloodSamplesCollection, Consent and !UrineSamplesCollection are the exceptions.
     353   * Not all variables are question based. This is obviously true for stages that are not questionnaires, but even within questionnaire stages there are some variables that are not question based.
     354   * Some variables have structured names, such as "Admin.Participant.barcode", which are designed into Onyx. These are easy enough to unpack into a structured ontology path.
     355   * Some variables have structured names, such as "famhist_death_sudden.brother4_death_sudden", which are allowed for but are at the discretion of the questionnaire designer. The example is taken from the !RiskFactorQuesionnaire. These look like more of a problem to unpack in a meaningful way.
     356
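The unpacking of dotted variable names mentioned in the list above might be sketched as follows. This is only an illustration: the root segment `BRICCS`, the backslash-delimited path style and the function name `variable_to_path` are assumptions, and the sketch splits purely on dots, so it does nothing clever with designer-chosen names such as "famhist_death_sudden.brother4_death_sudden".

```python
def variable_to_path(name, stage=None, root="BRICCS"):
    """Turn a dotted Onyx variable name into a backslash-delimited path.

    root is a placeholder for the top of the ontology; stage, if given,
    is inserted between the root and the variable's own segments.
    """
    segments = [s for s in name.split(".") if s]
    if not segments:
        raise ValueError("variable name has no segments")
    parts = [root] + ([stage] if stage else []) + segments
    return "\\" + "\\".join(parts) + "\\"


print(variable_to_path("Admin.Participant.barcode"))
# \BRICCS\Admin\Participant\barcode\
```

For the designer-chosen names, the split still produces a path, but whether its segments mean anything is a separate question that the sketch does not address.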