Data Dictionary File
Data Dictionary File
Column Name | Description | Examples | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
name | Name of the column. No spaces. | ind_id, site_id, etc. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
unit | Units of the data collected in the column. | For a column weight, the units can be pounds, kilograms, etc. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
type | Data type of the information collected in the column. It can be string, integer, decimal and can be combined with fixed_set, encoded, multi_fixed_set, or multi_encoded. |
* For fixed-set, encoded, multi_fixed_set, and multi_encoded values, the corresponding values in the file must match EXACTLY, in both case and content. Using the "string, fixed_set" example above, "schizophrenia,disorganized" or "schizophrenia, Disorganized" will NOT be recognized as "schizophrenia, disorganized"
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
min |
Valid for numeric data. The minimum value allowed for the data. |
You may use pre-defined variables in min/max columns, which will be substituted by the system. Available variables are current_year or current_date.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
max | Valid for numeric data. The maximum value allowed for the data. |
You may use pre-defined variables in min/max columns, which will be substituted by the system. Available variables are current_year or current_date.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
min_length |
Valid for string data. The data value must have a minimum length of min_length This column will be ignored if the entered "type" is integer or decimal |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
max_length |
Valid for string data. The data value must have a maximum length of max_length This column will be ignored if the entered "type" is integer or decimal |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
unique |
Assign a one or more labels (separated by a pipe '|') to identify a column, or a set of columns whose values should be unique. Note: Unique columns allow null values. |
For a column ind_id which should contain unique data the value can be set to u_ind_id For a set of columns site_id, family_id, subject_id which combined should contain unique data the value can be set to u_sfs_id. The same label u_sfs_id should be assigned for all three columns
Example: Desired effect (user, col_a), and (user, col_b) combinations should be unique.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
mandatory |
Columns which do not allow null values.
|
Expression Grammar Access value of a field using format, c["<column-name>"], i.e. c["sex"], etc. Constant string should be enclosed in single or double quotes, i.e "a" or 'a' Constant integers should not be quoted, i.e. 159, 35.0 Constant booleans should be either True or False Constant date's must be reprsented as date( "<date>"), where
<date> is date in IS0 8601 format
Examples
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
primary_key | For column, or set of columns which combined uniquely identify records in the phenotypic file set this to Y |
If site_id, family_id, subject_id uniquely identify records in a file
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
resolution | Valid for floating point values. |
Data contained in the column will be truncated to n digits after the decimal point. If the data is 4.1212, and resolution for column is specified to 2 the data will be truncated to 4.12. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
values | Valid when the type column contains fixed_set, encoded, multi_fixed_set, or multi_encoded value. |
Fixed Set * Specifies the set of values which are allowed in the column of the corresponding phenotypic file. Example: For a column Gender the valid values are Male, Female, or Unknown Consequently, the values column should contain Male|Female|Unknown Format: <value_1>|<value_2>|..|<value_n> separated by a pipe '|'. NOTE: In the corresponding phenotypic file, no more than one value may be entered per cell. That is, the defined options are mutually exclusive. Multi Fixed Set * Specifies the set of values which are allowed (either alone or in combination) in the column of the corresponding phenotypic file. Example: For a column TherapyDays, the valid values may be any combination of Monday, Wednesday, or Friday. Consequently, the values column should contain Monday|Wednesday|Friday
Format: <value_1>|<value_2>|..|<value_n> separated by a pipe '|'. NOTE: In the corresponding phenotypic file, any combination of these defined values may be entered per cell. Multiple values should be separated by a pipe '|'. Encoded * Specifies the set of values which are allowed in the column of the corresponding phenotypic file. These values are paired with their respective meanings. Example: For a column Gender the valid values are M, F, or U. The corresponding meanings are Male, Female, and Unknown, respectively. Consequently, the values column should contain M=Male|F=Female|U=Unknown Format: <value_1>=<meaning_1>|<value_2>=<meaning_2>|..|<value_n>=<meaning_n> separated by a pipe '|'. NOTE: In the corresponding phenotypic file, no more than one value may be entered per cell. That is, the defined options are mutually exclusive. Multi Encoded * Specifies the set of values which are allowed (either alone or in combination) in the column of the corresponding phenotypic file. These values are paired with their respective meanings. Example: For a column TherapyDays, the valid values may be any combination of M, W, or F. The corresponding meanings are Monday, Wednesday, and Friday, respectively. Consequently, the values column should contain M=Monday|W=Wednesday|F=Friday Format: <value_1>=<meaning_1>|<value_2>=<meaning_2>|..|<value_n>=<meaning_n> NOTE: In the corresponding phenotypic file, any combination of these defined values may be entered per cell. Multiple values should be separated by a pipe '|'. * For fixed_set, encoded, multi_fixed_set, and multi_encoded values, the corresponding values in the file must match EXACTLY, in both case and content. Using the "Encoded" example above, "m" will NOT be recognized as "M". |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
description |
Detailed text describing the information contained in this column. This field cannot be left empty. |
Advanced Examples
Consider a column Age which contains integer values between 0 and 120. The column has specially designated values -1 for Missing, -2 for Not collected
name | type | min | max | values |
---|---|---|---|---|
age | integer,encoded | 0 | 120 | -1=Missing|-2=Not Collected |
Consider a column Twins which can contain only the following values Monozygotic, or Dizygotic.
name | type | values |
---|---|---|
Twins | string,fixed_set | Monozygotic|Dizygotic |
Consider a column TherapyDays that represents which days a subject may receive theraupeutic intervention.
name | type | mandatory | values |
---|---|---|---|
TherapyDays | string,multi_encoded | n | M=Monday|W=Wednesday|F=Friday |
Since this field is not mandatory, "NULL" is an acceptable value if a given subject receives no therapy. But note that a "NULL" entry may only appear by itself - that is, it may not be piped (concatenated) to any of the defined values (M, W, or F). For example, if a subject receives therapy on Monday and Wednesday but not Friday, the correct entry would be "M|W" and not "M|W|NULL".
In addition, defined values may appear no more than once in a given entry. For example, if a subject receives therapy twice on Monday and once on Friday, the correct entry is "M|F" and not "M|M|F".