LEGACY - i2b2OntologyCRC-1 – LCBRU IT

Context Navigation

Version 15 (modified by jeff.lusted, 15 years ago) ( diff )
--

i2b2 Ontology and CRC Discussion: 1

Questions asked and points to ponder. Access to the demo database is useful here...

Ontology Cell tables compared to CRC concept_dimension table

There are three ontology tables within the demo Ontology cell: birn, i2b2 and custom_meta.

When I do a select count(*) from <table_name> where c_visualattributes like '%LA%' I get the following figures:

birn: 37
custom_meta: 0
i2b2: 93807

When I do a "select count(*) from concept_dimension" within the CRC cell I get a count of 73590.

The totals are 93844 as opposed to 73590. Why is there a difference and what does it mean?

In an Ontology table, what does c_columndatatype really mean?

The docs state: "either ‘T’ for text or ‘N’ for numeric and describes the datatype of the concept". Yet the length of the column is varchar 50.

Within the demo project:

select count(*) from i2b2 where c_columndatatype like '%N%' ===> returns 0
select count(*) from i2b2 where c_columndatatype like '%T%' ===> returns 134762
select count(*) from i2b2 ===> returns 134762

Yet within the query tool (from the Workbench) I can select terms from a Labtest and constrain by value. This is undoubtedly possible because of the optional metadataxml column. But even so, what does c_columndatatype really mean?

By comparison with the observation_fact table in the CRC cell:

select count(*) from observation_fact where valtype_cd like '%T%' ===> returns 5055
select count(*) from observation_fact where valtype_cd like '%N%' ===> returns 23731

Facts constrained by enumerated concepts

Facts can be constrained by simply pointing at a concept row within the concept_dimension table with very little else alongside it. For example, a numeric value like AGE can be divided into an enumeration of concepts, say one for each age from 0 to 114, as is the demo system. Here is a selection based upon this:

select * from observation_fact where concept_cd like 'DEM|AGE:50' ;

This retrieved 3 rows of patients aged 50 from the fact table:

ENCOUNTER_NUM	PATIENT_NUM	CONCEPT_CD	PROVIDER_ID	START_DATE	MODIFIER_CD	VALTYPE_CD	TVAL_CHAR	NVAL_NUM	VALUEFLAG_CD	QUANTITY_NUM	INSTANCE_NUM	UNITS_CD	END_DATE	LOCATION_CD	CONFIDENCE_NUM	OBSERVATION_BLOB	UPDATE_DATE	DOWNLOAD_DATE	IMPORT_DATE	SOURCESYSTEM_CD	UPLOAD_ID
2005000014	1000000014	DEM\|AGE:50	@	22-MAY-06	@	@	-	-	-	-	-	@	25-APR-07	@	-	-	25-APR-07	25-APR-07	25-APR-07	DEMOGRAPH\|DEMO	-
2005000082	1000000082	DEM\|AGE:50	@	25-MAY-06	@	@	-	-	-	-	-	@	25-APR-07	@	-	-	25-APR-07	25-APR-07	25-APR-07	DEMOGRAPH\|DEMO	-
2005000095	1000000095	DEM\|AGE:50	@	05-MAY-07	@	@	-	-	-	-	-	@	25-APR-07	@	-	-	25-APR-07	25-APR-07	25-APR-07	DEMOGRAPH\|DEMO	-

I'm in two minds about this. It is a neat way of arranging data as long as values (whether text or numeric) have a reasonably discrete range. It seems this is the reason why labtests results etc do not follow this paradigm for their numeric values.

Note: See TracWiki for help on using the wiki.

Download in other formats:

Plain Text