Changes between Initial Version and Version 1 of i2b2 - UHL clinical informatics system integration PATS v1


Ignore:
Timestamp:
03/12/13 10:58:27 (12 years ago)
Author:
Richard Bramley
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • i2b2 - UHL clinical informatics system integration PATS v1

    v1 v1  
     1[[i2b2 - UHL clinical informatics system integration PATS|back]]
     2
     3Version 1 of the i2b3 PATS data integration load uses a job on the UHL Data warehouse server to move data from the DWPATS database on the UHL data warehouse server to i2b2 database on the UHLSQLBRICCSDB\UHLBRICCSDB server, using the DWBRICCS database on the UHL data warehouse server as an intermediate.
     4
     5== Procedure
     6
     7The job on UHL Data warehouse server contains 3 steps.
     8
     9=== 1. Update I2B2 Patients
     10
     11Runs the stored procedure USP_DWH_IMPORT_BRICCS_PATIENTS in the DWBRICCS database on the UHL data warehouse, which copies patients from the destination database on the I2B2 database server to the DWBRICCS database using a linked SQL server.
     12
     13=== 2. Delete PATS Observations SSIS package
     14
     15Steps:
     16
     17    1. Deletes all records from the PATS_OBSERVATION_FACT staging table in the DWBRICCS database on the UHL data warehouse.
     18    1. Deletes PATS observations from the Observation_Fact table in the production i2b2 database with a concept code begining with 'PTS'.
     19
     20=== 3. Load_PATS_Observations SSIS package
     21
     22This package performs a separate import for each field in the Data Dictionary table in the DWBRICCS database into the PATS_OBSERVATION_FACT staging table.  There are two mechanisms depending on whether the field is a number or text.  The complete contents of the staging table are then moved en masse into the Observation_Fact in the production i2b2 database.
     23
     24This package actively ignores all records that fail when they are imported into the PATS_OBSERVATION_FACT staging table.  This results in two types of records being ignored:
     25
     261. Records for which the start date is NULL.
     271. Duplicate records.
     28
     29== NULL Start Dates
     30
     31An invetsigation showed 56 records in which the start date was NULL.  All of these records were from one record on the EPS_EPSINIVW table.  This might be an error or there may be a valid reason for this.
     32
     33The field DateOfAdmission is used as the start date for the EPS_EPSINIVW table.  Alternative fields for start date on this table are date_timeOfProcedure and procedureStartTime.  Both of these, however, also have records where the date is NULL.  All three fields could be coalesced to reduce the risk of an error.
     34
     35== Duplicate Processing
     36
     37Version one of the data load identifies some records as being duplicates because they have the same patient, sample collection datetime and concept code.  When a duplicate is identified it discards the most recent record.  This is probably not correct for several reasons:
     38
     391. If there are more that two duplicates, it only discards one record and so there will still be a duplicate.
     401. Common sense and reason 1 suggest that it should be keeping the most recent record.
     411. There may be a better way to identify which record is correct.  For example, if the result has been suppressed (result suppression will not solely solve the problem).
     421. Both records may be valid.
     43
     44Paul Smalley has agreed to look at the duplicate records to find out the reasons for the duplication.