wiki:i2b2 Data Import PATS Mapping v1

Version 45 (modified by Richard Bramley, 12 years ago)


The PATS mappings are split into text and numeric fields.

Text Fields

Text fields are mapped to the i2b2 Observation Fact table as follows:

i2b2 Field        Source
Encounter_Num     i2b2 Patient ID
Patient_Num       i2b2 Patient ID
Concept_Cd        Enum lookup value (see below)
Provider_Id       "@"
Start_Date        The source table observation date field (see below)
Modifier_Cd       "@"
ValType_Cd        "T" (text)
Units_Cd          "@"
Location_Cd       "@"
Update_Date       The current date using the GETDATE() function
SourceSystem_Cd   "BRICCS"
Upload_Id         "1"
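The text field mapping can be sketched in Python as a function that builds one Observation Fact row. This is only an illustration of the table above, not the load's actual code: the function name build_text_fact and the optional update_date parameter are assumptions (the parameter stands in for GETDATE()).

```python
from datetime import datetime


def build_text_fact(patient_id, concept_cd, start_date, update_date=None):
    """Build one Observation Fact row for a text field (hypothetical helper)."""
    return {
        "Encounter_Num": patient_id,  # the i2b2 patient ID is used for both columns
        "Patient_Num": patient_id,
        "Concept_Cd": concept_cd,  # the enum lookup value
        "Provider_Id": "@",
        "Start_Date": start_date,  # the source table observation date
        "Modifier_Cd": "@",
        "ValType_Cd": "T",
        "Units_Cd": "@",
        "Location_Cd": "@",
        # Stand-in for GETDATE(): default to the current date and time.
        "Update_Date": update_date or datetime.now(),
        "SourceSystem_Cd": "BRICCS",
        "Upload_Id": "1",
    }
```

Passing update_date explicitly means every row in a load can share one timestamp.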

SOMETIME (RB): Convert Update_Date to use a date parameter. It's just a little quicker.

SOMETIME (RB): Should the SourceSystem_Cd not be "PATS"? Or maybe "MINAP" or whatever.

SOMETIME (RB): The Upload_Id is presumably always one because everything is reloaded each time. Might be nice to change this to something more meaningful.

Enum Lookup Value

The Concept Code to use is found in the table DATA_DICTIONARY_ENUM_VALUES in the DWBRICCS database on the UHLDWH server. This table contains one record per concept code, keyed on source table name, column name and column value. NULL column values may be mapped to the column value '(empty)'.
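As a rough sketch of this lookup, the table can be modelled as a Python dictionary keyed on (table name, column name, column value). The function name, the sample column name and the sample concept codes are made up for illustration; only the three-part key and the '(empty)' convention come from the description above.

```python
EMPTY = "(empty)"


def lookup_concept_cd(enum_values, table_name, column_name, column_value):
    """Return the concept code for a source value, or None if it has no mapping."""
    # NULL (None) source values are looked up under the '(empty)' key.
    key_value = EMPTY if column_value is None else str(column_value)
    return enum_values.get((table_name, column_name, key_value))


# Hypothetical sample of DATA_DICTIONARY_ENUM_VALUES content.
sample_enum_values = {
    ("LRIMINAPNEW", "example_column", "Yes"): "EXAMPLE:YES",
    ("LRIMINAPNEW", "example_column", "(empty)"): "EXAMPLE:EMPTY",
}
```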

ACTION (RB): Check whether every possible column value has a mapping in the DATA_DICTIONARY_ENUM_VALUES table.
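One possible way to run this check, sketched in Python: collect the distinct (table, column, value) triples seen in the source data and report those with no enum entry. The function name and the in-memory inputs are assumptions; in practice the comparison would more likely be done in the database.

```python
def find_unmapped_values(enum_keys, observed):
    """Return the sorted (table, column, value) triples that have no enum mapping."""
    missing = set()
    for table_name, column_name, column_value in observed:
        # NULL values are checked against the '(empty)' key.
        value = "(empty)" if column_value is None else str(column_value)
        key = (table_name, column_name, value)
        if key not in enum_keys:
            missing.add(key)
    return sorted(missing)
```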

Source Table Observation Date Field

For each source table there is a single field that is used as the start date for all columns from that table. The observation date field to use for each table is stored in the pats_tables table in the DWBRICCS database on the UHLDWH server.
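A minimal Python sketch of this rule, assuming pats_tables has been read into a dictionary of table name to date field name. The function name and the field name procedure_date are hypothetical.

```python
from datetime import date


def get_start_date(pats_tables, table_name, source_row):
    """Return the observation date for a source row, using the table's date field."""
    date_field = pats_tables[table_name]  # one date field per source table
    return source_row[date_field]


# Hypothetical pats_tables content and source row.
sample_pats_tables = {"CATH_CATHINITIALVW": "procedure_date"}
sample_row = {"procedure_date": date(2011, 3, 14), "example_column": "Yes"}
```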

Numeric Fields

Numeric fields are mapped to the i2b2 Observation Fact table as follows:

i2b2 Field        Source
Encounter_Num     i2b2 Patient ID
Patient_Num       i2b2 Patient ID
Concept_Cd        "PTS:{TABLE_NAME}_{COLUMN_ID}"
Provider_Id       "@"
Start_Date        The source table observation date field (see above)
Modifier_Cd       "@"
ValType_Cd        "N" (numeric)
TVal_Char         "E" (see below)
Units_Cd          Value calculated by ufn_get_units function (see below)
Location_Cd       "@"
Update_Date       The current date using the GETDATE() function
SourceSystem_Cd   "BRICCS"
Upload_Id         "1"
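The numeric mapping can be sketched the same way in Python. The function names are assumptions, the update_date parameter stands in for GETDATE(), and the NVal_Num entry is added on the assumption that the numeric value itself is stored there (the table above does not list it).

```python
from datetime import datetime


def numeric_concept_cd(table_name, column_id):
    """Format the concept code as PTS:{TABLE_NAME}_{COLUMN_ID}."""
    return f"PTS:{table_name}_{column_id}"


def build_numeric_fact(patient_id, table_name, column_id, value,
                       start_date, units_cd, update_date=None):
    """Build one Observation Fact row for a numeric field (hypothetical helper)."""
    return {
        "Encounter_Num": patient_id,
        "Patient_Num": patient_id,
        "Concept_Cd": numeric_concept_cd(table_name, column_id),
        "Provider_Id": "@",
        "Start_Date": start_date,
        "Modifier_Cd": "@",
        "ValType_Cd": "N",
        "TVal_Char": "E",  # the stored number is the exact result
        "NVal_Num": value,  # assumption: the value is stored in NVal_Num
        "Units_Cd": units_cd,  # as returned by ufn_get_units
        "Location_Cd": "@",
        "Update_Date": update_date or datetime.now(),
        "SourceSystem_Cd": "BRICCS",
        "Upload_Id": "1",
    }
```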

TVal_Char

The TVal_Char describes the meaning of the value contained in the NVal_Num field.

Currently TVal_Char is set to "E" (equals) for every value. That is, the value in the NVal_Num field is the result of the test. All possible values of TVal_Char are:

Value   Meaning
E       Equals
NE      Not Equal
L       Less Than
LE      Less Than or Equals
G       Greater Than
GE      Greater Than or Equals

From a casual look at the data, I think that a value of less than X, or greater than Y is sometimes used for Pathology data.
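If such values ever needed loading, a raw result could be split into a TVal_Char code and an NVal_Num number. The Python sketch below assumes results arrive as strings such as "<0.5" or ">= 100"; that input format, the function name and the regular expression are all assumptions, not something observed in the data.

```python
import re

# Comparison prefix to TVal_Char code; no prefix means an exact value.
_PREFIX_TO_TVAL = {"": "E", "=": "E", "!=": "NE",
                   "<": "L", "<=": "LE", ">": "G", ">=": "GE"}

# Two-character prefixes are listed first so "<=" is not read as "<".
_RESULT_PATTERN = re.compile(r"^\s*(<=|>=|!=|<|>|=)?\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_result(text):
    """Split a raw result string into (TVal_Char, NVal_Num)."""
    match = _RESULT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a numeric result: {text!r}")
    prefix = match.group(1) or ""
    return _PREFIX_TO_TVAL[prefix], float(match.group(2))
```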

ufn_get_units Function

The function ufn_get_units in the DWBRICCS database is called with parameters of table_name and field_name to determine the units.

ACTION (RB): Work out what this function actually does. Answer: the function just selects the units column from the Data_Dictionary table in the DWBRICCS database. If the result is an empty string, it returns 'units' instead. It is not clear why it does that.
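The behaviour described here can be mimicked in Python as below, with the Data_Dictionary table modelled as a dictionary keyed on (table name, field name). This is a sketch, not the function's source. Returning None when no row exists is an assumption, and the sample field names are hypothetical.

```python
def get_units(data_dictionary, table_name, field_name):
    """Mimic ufn_get_units: return the units, or 'units' if they are an empty string."""
    units = data_dictionary.get((table_name, field_name))  # None if no row (assumption)
    return "units" if units == "" else units


# Hypothetical sample of Data_Dictionary content.
sample_data_dictionary = {
    ("LRIMINAPNEW", "example_weight"): "kg",
    ("LRIMINAPNEW", "example_count"): "",
}
```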

ACTION (RB): Rewrite numeric data extract to do the same, but with a join. Or it could be moved into the query that copies all the numeric fields into the recordset for looping with the for each. Either really.

Tables

The PATS i2b2 data integration load maps the types of source table columns to either Text or Numeric. All other types are ignored. The mapping is as follows:

Source Type       i2b2 Type
VARCHAR           Text
INT or FLOAT      Numeric
All other types   NOT LOADED
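The type mapping above amounts to a small classification function, sketched here in Python. The function name is hypothetical, and stripping a length suffix such as "(50)" and ignoring case are assumptions about how the source types are spelled.

```python
def i2b2_type(source_type):
    """Return 'Text', 'Numeric', or None (not loaded) for a source column type."""
    # Normalise e.g. "varchar(50)" to "VARCHAR" (assumed spelling of source types).
    base = source_type.upper().split("(")[0].strip()
    if base == "VARCHAR":
        return "Text"
    if base in ("INT", "FLOAT"):
        return "Numeric"
    return None  # all other types are not loaded
```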

In addition, all fields with certain names are ignored.

Ignored Field Names

Comments about data:

COMMENT (RB): Which fields should be removed?

COMMENT (RB): All date fields are ignored, but could they be converted into numeric fields? For example, length of stay, time between stays, etc. This may already be being done.

COMMENT (RB): There are quite a few denormalised fields. For example: drug1, drug2. Could these be normalised into one concept code? Or: readmittedDate1, readmittedDate2. Could these be different encounters? Are these already being recorded?

COMMENT (RB): Some dates are recorded as varchars and so presumably are being imported.

COMMENT (RB): Are some text fields also being ignored because they don't have entries in the enum data dictionary? Yes, they are ignored.

ACTION (RB): Rewrite the extraction of varchar fields to check first whether there are any entries in the ENUM Data Dictionary for that field. This should reduce the number of queries that need to be run in the for each and speed things up quite a bit.
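The idea behind this action can be illustrated in Python: build the set of (table, column) pairs that have at least one enum entry, then keep only the varchar columns in that set before looping. The function name and the in-memory inputs are assumptions; the real change would be made in the extract query.

```python
def columns_with_enum_entries(varchar_columns, enum_keys):
    """Keep only the (table, column) pairs that have at least one enum entry."""
    # enum_keys holds (table, column, value) triples; drop the value part.
    mapped = {(table, column) for table, column, _value in enum_keys}
    return [pair for pair in varchar_columns if pair in mapped]
```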

CATH_CATHINITIALVW

CS2003F_PostProcedure

CS2003F_PreProcedure

CS2010F_PostProcedure

CS2010F_PreProcedure

EPS_EPSINIVW

GGHMINAP_GGHMINAP

GGHMINAPNEW

GGHMINAPNEW_GGHMINAPINIVW

GGHPACEICD_GGHPACEICDINIALVW

GGHPCI_FLWGGHPCIVIEW

GGHPCI_INITIALGGHPCIVIEW

LRIMINAPNEW

LRIMINAPNEW_LRIMINAPNEWINITIALVW

VERTED_VERTEDINIALVW

Merging Tables

Currently all the tables are imported separately. However, some tables can probably be merged.

Potential Merges:

  • LRIMINAPNEW, GGHMINAP_GGHMINAP, GGHMINAPNEW, LRIMINAPNEW_LRIMINAPNEWINITIALVW, GGHMINAPNEW_GGHMINAPINIVW

The table MINAPGGH_MINAPGGHINIVW also needs mapping to the same merged view.

The table MINAPLRI_MINAPLRIINIVW only contains 2 patients, all of whose data has been moved to the MINAPGGH_MINAPGGHINIVW table.
