Version 11 (modified by 14 years ago) ( diff ) | ,
---|
Deriving an Initial Ontology from an Onyx Export
This is about deriving a first ontology from Onyx for trialing the import of an ontology and subsequent data into i2b2.
Nick's comments from meeting with Jason, Jeff and Dave
Working from variables.xml in MedicalHistoryInterviewQuestionnaire Top level of xml is the questionnaire - nothing pulled from export xml to populate this. Second level is taken from the stage name - all variables have a stage name attribute in the export xml. Third level is section name - all variables have a section name attribute. Fourth level is questionName - all variables have a questionName attribute. Fifth level is name of variable - defined in the <variable name=""> section of the export xml (or possibly constructed from variable name and category name, see below). Note that in some cases this will be the same as the fourth level. If this isn't a good idea, then the name will need to be extended by means of an additional string (maybe '.Question'?) <label> is derived from the variable's attribute "label" <type> is derived from the <variable valueType=""> declaration. * * * I've done the high BP questions (Do you.., When did you..., Have you received...) and then skipped all the other conditions as the question structure is essentially the same, albeit some with multiple categories for (e.g.) type of diabetes, treatment of diabetes, or with multiple question sets for multiple event conditions (MI, etc). * * * Discussion This is messy. Pulling the data only from each <variable></variable> element definition makes sense, but the 'category' variables (Y,N,PNA,DK,etc) would then have 'labels' of 'Y', 'N' and so on - put hundreds of those into i2b2 and you've got a very confusing ontology. Jeff's idea of using the <category> child elements of the original <variables> might therefore be a better idea, but it requires a more complex filter - where a variable has category child elements then the filter needs to construct additional ontology entries from those child elements (combining the variable label and the category label) and ignore the following variables (with the same root variable name) or possibly add detail from them, but where the variable doesn't have category child elements then it must behave differently. Thinking further, it would be better if the primary question <label> element related to the fourth level, not the fifth. Thus in the hierarchy there would be the primary question label, with 1 or more variables underneath it. NOTE a GOTCHA with questions that generate an integer value: the boolean variable is named e.g. part_hist_highbp_onset_cat but the integer value associated with it is just part_hist_highbp_onset. Definitively do not need: page attribute, required attribute, condition attribute, validation attribute. I don't think we need to bring over either the category 'code' attribute, or the 'missing' attribute, but maybe we do? Also don't think we need exclusiveChoiceCategoryVariable attribute, as the ontology just needs to provide all possible variables for all possible participants. Does the stage attribute of a variable ALWAYS match it's questionnaire attribute? Looks like it does. What is the 'script' attribute for? Does the ontology need to include the category variable 'code' attributes? Depends on the structure of the participant answer files. Looks like the codes aren't even mentioned in the answer files, so ignore them for now.
This is Nick's original example:
<briccs_questionnaire> <MedicalHistoryInterviewQuestionnaire> <MAIN> <part_hist_highbp> <label>Have you ever suffered from high blood pressure?</label> <part_hist_highbp> <type>text</type> <label></label> </part_hist_highbp> <part_hist_highbp.N> <label>No</label> <type>boolean</type> </part_hist_highbp.N> <part_hist_highbp.Y> <label>Yes</label> <type>boolean</type> </part_hist_highbp.Y> <part_hist_highbp.PNA> <label>Prefer not to answer</label> <type>boolean</type> </part_hist_highbp.PNA> <part_hist_highbp.DK> <label>Don't know</label> <type>boolean</type> </part_hist_highbp.DK> <part_hist_highbp.comment> <label>Comment</label> <type>text</type> </part_hist_highbp.comment> </part_hist_highbp> <part_hist_highbp_onset_cat> <label>When did you first suffer from high blood pressure?</label> <part_hist_highbp_onset_cat> <type>text</type> <label></label> </part_hist_highbp_onset_cat> <part_hist_highbp_onset_cat.YEAR> <label>Year</label> <type>boolean</type> </part_hist_highbp_onset_cat.YEAR> <part_hist_highbp_onset> <label></label> <type>integer</type> </part_hist_highbp_onset_cat.YEAR> <part_hist_highbp_onset_cat.PNA> <label>Prefer not to answer</label> <type>boolean</type> </part_hist_highbp_onset_cat.PNA> <part_hist_highbp_onset_cat.DK> <label>Don't know</label> <type>boolean</type> </part_hist_highbp_onset_cat.DK> <part_hist_highbp_onset_cat.comment> <label>Comment</label> <type>text</type> </part_hist_highbp_onset_cat.comment> </part_hist_highbp_onset_cat> <part_hist_highbp_treat> <label>Have you received treatment for your high blood pressure? </label> <part_hist_highbp_treat> <type>text</type> <label></label> </part_hist_highbp_treat> <part_hist_highbp_treat.N> <label>No</label> <type>boolean</type> </part_hist_highbp_treat.N> <part_hist_highbp_treat.Y> <label>Yes</label> <type>boolean</type> </part_hist_highbp_treat.Y> <part_hist_highbp_treat.PNA> <label>Prefer not to answer</label> <type>boolean</type> </part_hist_highbp_treat.PNA> <part_hist_highbp_treat.DK> <label>Don't know</label> <type>boolean</type> </part_hist_highbp_treat.DK> <part_hist_highbp_treat.comment> <label>Comment</label> <type>text</type> </part_hist_highbp_treat.comment> </part_hist_highbp_treat> </MAIN> </MedicalHistoryInterviewQuestionnaire> </briccs_questionnaire>
Jeff's Revision of the above into a no-attributes style xml:
<?xml version="1.0" encoding="UTF-8"?> <source xmlns="http://briccs.org.uk/xml/v1.0/oi" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > <name>briccs</name> <stage> <name>MedicalHistoryInterviewQuestionnaire</name> <section> <name>MAIN</name> <question> <name>part_hist_highbp</name> <label>Have you ever suffered from high blood pressure?</label> <variable> <name>part_hist_highbp</name> <label></label> <type>text</type> <variable> <name>N</name> <label>No</label> <type>boolean</type> </variable> <variable> <name>Y</name> <label>Yes</label> <type>boolean</type> </variable> <variable> <name>PNA</name> <label>Prefer not to answer</label> <type>boolean</type> </variable> <variable> <name>DK</name> <label>Don't know</label> <type>boolean</type> </variable> <variable> <name>comment</name> <label>Comment</label> <type>text</type> </variable> </variable> </question> <question> <name>part_hist_highbp_onset_cat</name> <label>When did you first suffer from high blood pressure?</label> <variable> <name>part_hist_highbp_onset_cat</name> <label></label> <type>text</type> <variable> <name>YEAR</name> <label>Year</label> <type>boolean</type> </variable> <variable> <name>PNA</name> <label>Prefer not to answer</label> <type>boolean</type> </variable> <variable> <name>DK</name> <label>Don't know</label> <type>boolean</type> </variable> <variable> <name>comment</name> <label>Comment</label> <type>text</type> </variable> </variable> <variable> <name>part_hist_highbp_onset</name> <label></label> <type>integer</type> </variable> </question> </section> </stage> </source>
Jeff's comments after initial work on the above ideas:
Learned some obvious lessons that require revision of the approach...
Basically not all variables are the result of a question. And it comes in various guises...
- Even in a Questionnaire stage, there are some variables not attached to any question, eg: QuestionnaireRun.version and other sorts of metric-like data.
- Some stages are not questionnaires (and therefore contain no questions), eg: the sample stages.
- Some are not even stages: see the Participant file within the export. The Participant file contains participant data retrieved via the PMI plus "other" admin type data, eg: Admin.Participant.birthDate and Admin.Action.user.
Jason's 'bash' at a stylesheet
I've been working on a stylesheet to take the variables.xml onyx extract and make it into Jeff's suggested ontology schema format so here it is :-
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions"> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:template match="*"> <xsl:apply-templates select="variables"/> </xsl:template> <xsl:template match="/variables"> <xsl:element name="source"> <xsl:element name="name">briccs</xsl:element> <!-- only interested in variable elements with a child of categories --> <xsl:apply-templates select="variable[categories]"/> </xsl:element> </xsl:template> <!-- only want one stage element and subsequent section etc so apply only when it's the first in the list rather than all of them --> <xsl:template match="variable[categories][position()=1]"> <xsl:element name="stage"> <xsl:element name="name"> <xsl:value-of select="attributes/attribute[@name='stage']"/> </xsl:element> <xsl:element name="section"> <xsl:element name="name"> <xsl:value-of select="attributes/attribute[@name='section']"/> </xsl:element> <!-- to get the depth right we now need to go up the tree and get all the variables with categories and loop thru them --> <xsl:for-each select="//variable[categories]"> <xsl:element name="question"> <xsl:element name="name"> <xsl:value-of select="@name"/> </xsl:element> <xsl:element name="label"> <xsl:value-of select="attributes/attribute[@name='label']"/> </xsl:element> <!-- now get all the categories below where we are --> <xsl:apply-templates select="categories"/> </xsl:element> </xsl:for-each> </xsl:element> </xsl:element> </xsl:template> <xsl:template match="categories"> <xsl:element name="variable"> <xsl:element name="name"> <xsl:value-of select="../attributes/attribute[@name='questionName']"/> </xsl:element> <xsl:element name="type"> <xsl:value-of select="../@valueType"/> </xsl:element> <xsl:apply-templates/> </xsl:element> </xsl:template> </xsl:stylesheet>
Seems to work quite well but needs more effort around the variable/variable bit below question as I'm not entirely sure where we are sourceing this data from. I've attached the file as well for easier editing etc.
Feedback and intermediate results from Jeff
Attachments (5)
-
onyxontology.xml
(2.6 KB
) - added by 14 years ago.
Nick's initial example of an intermediate form of xml that could be used to derive an ontology.
-
mapVariablesToOnyxOntology.xslt
(2.1 KB
) - added by 14 years ago.
maps the variables xml file to the onyx schema
-
onyx-i2b2-ontology.png
(20.5 KB
) - added by 14 years ago.
Graphical representation of the schema
-
onyx-to-ont-examples.zip
(51.9 KB
) - added by 14 years ago.
Intermediate ontology files. Third attempt.
-
onyx-i2b2-ontology.2.xsd
(2.4 KB
) - added by 14 years ago.
Schema used to produce the 3rd attempt at intermediate ontology files.
Download all attachments as: .zip