== Onyx Export and Purge ==
=== Sources of Info ===
The Onyx User Guide has a useful Chapter 12 "Topics for System Administrators" with details of the export and purge functions.
* The User Guide can be found here http://wiki.obiba.org/confluence/display/ONYX/Onyx+User+Guide
Other useful links on the obiba wiki:
* Configuring export and purge http://wiki.obiba.org/confluence/display/ONYX16x/Configuring+Data+Export+and+Purge
* Data Exportation from Onyx (interesting, but I do not know how up to date it is): http://wiki.obiba.org/confluence/display/ONYX/Data+Exportation+from+Onyx
* Onyx Variables: http://wiki.obiba.org/confluence/display/ONYX/Onyx+Variables
=== Overview ===
'''Exporting data''' from Onyx means reading data from the Onyx database and writing it to one or more export destinations. Exporting does not delete any data from the Onyx database. Deleting data from the database is done by the purge function. An export destination is a compressed zip file. Participant data and experimental conditions data can be exported. Configuration of data export is done entirely in configuration files, not through the Onyx user interface. Some things that can be configured:
* Which data is selected for export
* Directory to which export files are written
* How many export destinations are defined
System administrators trigger an export via the Onyx web interface. The configuration file controls everything else. It is __NOT__ possible from within the web interface to choose which participants will be exported. Any selective export can only be tailored via the configuration file.
'''Purging data''' means deleting data from the Onyx database. Only participant data can be purged — not experimental conditions data.
Configuration of data purging is done entirely in configuration files, not through the Onyx user interface. As with data export, only a system administrator can trigger a purge, using a function within the user interface.
=== Sample Config Export File and resulting Export Zip File ===
The following files represent an export of only four participants. The zip file contains a lot of XML. It is worth opening it and pondering how we might approach this. Virtually everything regarding a participant and the interview process is captured. How much of this do we want in i2b2?[[BR]]
[raw-attachment:export-destinations.xml Export destinations file][[BR]]
[raw-attachment:BRICCS-20110106095220.zip Resulting export zip file]
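
As a first step in pondering the zip file, it helps to see what it contains without unpacking it. The following is a minimal sketch using only the Python standard library. The function name `list_export_entries` is my own invention for illustration. It makes no assumptions about how Onyx names the files inside the zip.

```python
import zipfile


def list_export_entries(zip_path):
    """Return (name, uncompressed size in bytes) for every file in the zip."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory placeholders
        ]
```

Calling `list_export_entries("BRICCS-20110106095220.zip")` on the attachment above would give a quick overview of how many files there are and which ones are largest.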
=== How much do we want to export and purge? ===
It looks as if the export config file attached results in almost all data being exported for those participants whose interview status is closed, completed or cancelled.[[BR]]
Some aspects are excluded which I (Jeff) do not fully comprehend:
1. Notably to do with the variable 'Participants:Admin.Interview.exportLog.destination'.
1. Some aspects of the Consent table are not exported.
How much we purge is an open question. Remember that purging may affect the reporting tool. I mention this here because I believe our default purge config file will remove almost all data for participants whose interview status is not open.[[BR]]
On the whole it seems sensible to export as much as we can and then archive the export files, i.e. retain them forever. Given the idea of retaining them in perpetuity, we may wish to consider encryption. [[BR]]
Why export everything? Because it gives us more than one bite of the cherry for the import into i2b2 (or any other piece of software). The detail shown in the export file is quite daunting. It is conceivable that if we filtered during the export we might get this wrong, or change our minds later.
=== Filtering the exported data ===
All of the exported data is in XML format. Given the large amount of detail exported, we need some way of marshalling this into a somewhat simpler form '''''prior''''' to organizing it for import into i2b2.[[BR]]
The idea is to come up with a programmable process (an automated process) that will act as a first filter which can be applied to all exports.[[BR]]
Whatever process we come up with is likely to have a number of steps, and we are unlikely to get it right the first time. It will involve manual inspection of example files from an export zip file, together with some programming, to decide what data can be '''''eliminated'''''.[[BR]]
The manual inspection is the thinking bit. Don't jump to conclusions on first inspection.[[BR]]
For instance, this is an extract from a Participant's file...
{{{
Ready
InProgress
Ready
InProgress
... similar lines removed ...
Interrupted
InProgress
Ready
InProgress
}}}
In my judgement this could probably be filtered out. But what about:
{{{
JeffLusted
JeffLusted
JeffLusted
... similar lines removed ...
JeffLusted
JeffLusted
JeffLusted
}}}
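
To support the manual inspection, a small script can count how often each value turns up in the export, so that highly repetitive data like the extracts above stands out. The following is a minimal sketch, not a finished filter. The function name `tally_leaf_values` is hypothetical, and the only assumptions are that the export is a zip file and that the files of interest inside it end in `.xml`. It does not assume any particular Onyx element names.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def tally_leaf_values(zip_path):
    """Count (tag, text) pairs of leaf elements across all XML files in a zip."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # ignore anything that is not XML
            with archive.open(name) as handle:
                # iterparse yields each element once it has been fully read
                for _, elem in ET.iterparse(handle):
                    text = (elem.text or "").strip()
                    # a leaf element has no child elements
                    if len(elem) == 0 and text:
                        counts[(elem.tag, text)] += 1
                    elem.clear()  # release text and attributes already seen
    return counts
```

Sorting the result with `counts.most_common(20)` shows the most repeated values first. Whether a repeated value can be eliminated is still a judgement call: a long run of status changes and a long run of the same user name may deserve different answers.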
=== Experiment with Exclusion at the Entities Level ===
I altered the export-destinations.xml file so that the type="EXCLUDE" scripts were commented out throughout the file. The following is just the first instance of this:
{{{
}}}
I then ran an export. The export produced a zip file containing all the participants that had a completed status on my test system, even those that had been previously exported. So the conclusion is that the exclude condition above ensures that duplication of exported participants does __NOT__ take place. As an aside, there are no participants on my test system with a closed or cancelled status.
[[BR]][[BR]]
I assume that there are a number of ways of achieving the same result using !JavaScript. On the Obiba web site, the equivalent of the above is given by:
{{{
...
}}}
The export-destinations.xml configuration file that shipped with the Briccs questionnaire has the following:
{{{
}}}