SEND Implementation Wiki - Define Fundamentals
What is the Define File? |
---|
The "define" file describes the SEND datasets: which domains are represented, which fields are present in each domain, how controlled terminology (CT) is used, and any special calculations or notes on how fields are populated or calculated. The define file has two primary benefits:
As of Define-XML 2.0, the define file is submitted as an XML file (previous versions allowed PDF). |
Where Do I Find the Specification? |
---|
|
Structural Basics |
---|
Key Concepts |
Most of the content of the define.xml file consists of the specifications for domains and variables. Variables are defined through ItemDef elements, usually one per variable per domain, although shared columns like STUDYID and USUBJID require only one definition regardless of how many domains use them. Domains are specified as ItemGroupDef elements, which are in turn collections of ItemRef elements, that is, references to the variables' ItemDef elements. Another key part of the define.xml is CT, which is included as CodeList elements. These are referenced by the ItemDef (variable) elements through a CodeListRef subelement. A CodeList can be either a single reference to an external code list (e.g., SEND CT) or an itemized list of terms for a sponsor-specific list. |
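The relationships described above can be traced with a few lines of Python. This is a minimal sketch, not a full define.xml reader: the fragment is heavily trimmed (it is not schema-valid), and the OIDs such as IG.DM, IT.DM.SEX and CL.SEX are made up for illustration. It resolves each domain's ItemRef elements to their ItemDef and reports any CodeListRef, and it shows STUDYID and USUBJID being defined once but referenced from two domains.

```python
import xml.etree.ElementTree as ET

# ODM 1.3 namespace, which Define-XML 2.0 builds on.
NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical, trimmed fragment; real files carry many more attributes.
DEFINE_XML = """\
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="STUDY1">
    <MetaDataVersion OID="MDV1" Name="Example">
      <ItemGroupDef OID="IG.DM" Name="DM" Repeating="No">
        <ItemRef ItemOID="IT.STUDYID" Mandatory="Yes"/>
        <ItemRef ItemOID="IT.USUBJID" Mandatory="Yes"/>
        <ItemRef ItemOID="IT.DM.SEX" Mandatory="Yes"/>
      </ItemGroupDef>
      <ItemGroupDef OID="IG.LB" Name="LB" Repeating="Yes">
        <ItemRef ItemOID="IT.STUDYID" Mandatory="Yes"/>
        <ItemRef ItemOID="IT.USUBJID" Mandatory="Yes"/>
      </ItemGroupDef>
      <ItemDef OID="IT.STUDYID" Name="STUDYID" DataType="text"/>
      <ItemDef OID="IT.USUBJID" Name="USUBJID" DataType="text"/>
      <ItemDef OID="IT.DM.SEX" Name="SEX" DataType="text">
        <CodeListRef CodeListOID="CL.SEX"/>
      </ItemDef>
      <CodeList OID="CL.SEX" Name="Sex" DataType="text">
        <EnumeratedItem CodedValue="F"/>
        <EnumeratedItem CodedValue="M"/>
      </CodeList>
    </MetaDataVersion>
  </Study>
</ODM>
"""


def summarize(xml_text):
    """Map each domain name to a list of (variable name, codelist OID or None)."""
    root = ET.fromstring(xml_text)
    # Index every ItemDef by OID so that ItemRef elements can be resolved.
    items = {item.get("OID"): item for item in root.iterfind(".//odm:ItemDef", NS)}
    summary = {}
    for group in root.iterfind(".//odm:ItemGroupDef", NS):
        rows = []
        for ref in group.iterfind("odm:ItemRef", NS):
            item = items[ref.get("ItemOID")]
            codelist_ref = item.find("odm:CodeListRef", NS)
            codelist_oid = None if codelist_ref is None else codelist_ref.get("CodeListOID")
            rows.append((item.get("Name"), codelist_oid))
        summary[group.get("Name")] = rows
    return summary


if __name__ == "__main__":
    for domain, variables in summarize(DEFINE_XML).items():
        print(domain, variables)
```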
Element Overview |
The following is a summary of the primary elements contained within a define.xml file.
|
Viewing |
---|
Opening a define.xml directly (e.g., with Notepad) shows only the raw XML, which is easy for software to read but hard for people. To view a define file in a more human-friendly way, you can use a style sheet: a companion file that tells a browser how to render the XML. Style sheets come in many flavours, and no single style sheet is "definitive"; the choice is mainly a matter of preference. From time to time, the CDISC XML Technology team publishes a define.xml style sheet for public use. These can be found here: https://wiki.cdisc.org/display/PUB/Stylesheet+Library |
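A browser is normally pointed at a style sheet through an `xml-stylesheet` processing instruction placed before the root element. The sketch below adds one using Python's standard library. The file name `define2-0-0.xsl` is only a placeholder for whichever style sheet you choose, and the one-element document stands in for a real define.xml.

```python
from xml.dom import minidom


def add_stylesheet(xml_text, href):
    """Return xml_text with an xml-stylesheet instruction before the root element."""
    doc = minidom.parseString(xml_text)
    instruction = doc.createProcessingInstruction(
        "xml-stylesheet", f'type="text/xsl" href="{href}"'
    )
    # Processing instructions that apply to the whole document go ahead of the root.
    doc.insertBefore(instruction, doc.documentElement)
    return doc.toxml()


if __name__ == "__main__":
    # Placeholder document and placeholder style sheet name.
    minimal = '<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"/>'
    print(add_stylesheet(minimal, "define2-0-0.xsl"))
```

The `href` is resolved relative to the define.xml, so the style sheet is usually kept in the same folder.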
Preparation |
---|
Getting a Base File |
If you are using a vendor solution to create SEND files, it typically will come bundled with functionality to output a define.xml file. There are also third party standalone define.xml products. If you need to create one yourself, then you have options:
These are a good starting point for your define.xml, since they create all of the structural basics for you; however, they cannot populate company-specific information such as comments, desired data types, custom controlled terminology, and so on. Another option is to use an example define file provided on the define-xml site. These have more realistic examples, although they are not SEND-based. |
Refining |
The raw define file needs several additions and refinements. Every study
Case by case:
|
Style sheet |
If you want the recipient to see the define file rendered the same way you do, include the style sheet you use in the package. This is not required, but can be helpful. |
Advanced Define Concepts |
---|
Value-level Metadata |
Value-level metadata needs to be defined when the data in all rows of a variable cannot be described by a single collection of metadata. Using the LB domain as an example, the LBORRES variable contains both qualitative and quantitative test results. The quantitative results may be integers or floating point values, and the floating point values may have different precisions. Some data may be collected and some derived. The qualitative results may use different result coding schemes that need to be documented in different codelists. Thus, LBORRES cannot be described with a single collection of metadata at the variable level, and value-level metadata is required. All of the attributes and child elements (data type, length, significant digits, codelist, origin, derivation method, comments) available for variable-level metadata are also available for value-level metadata. Additionally, value-level metadata needs a qualifier that identifies the subset of data being described. Continuing the LBORRES example, the entry in LBTEST or LBTESTCD might be a good way to break the values in the dataset into subsets, allowing LBORRES to be defined with one set of metadata per test. In other words, the values in LBORRES could be described separately for each test (LBORRES values for RETI could be described separately from LBORRES values for GLUC, and so on). As of Define-XML 2.0, multiple variables can be used, with different comparator operators, to create a WhereClause that identifies subsets of the dataset, e.g., a WhereClause that lets you define metadata for LBORRES when LBTESTCD=PROT AND LBSPEC=URINE. |
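To make the WhereClause idea concrete, here is a minimal sketch that reads a `def:WhereClauseDef` and applies it to a few data rows. The OIDs, the OID-to-variable mapping and the sample rows are all made up for illustration, the fragment is trimmed rather than schema-valid, and only four comparators (EQ, NE, IN, NOTIN) are handled. The subset is keyed on the test code, so it uses LBTESTCD together with LBSPEC; all RangeCheck conditions in a clause must hold for a row to match.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"
DEF = "http://www.cdisc.org/ns/def/v2.0"
NS = {"odm": ODM, "def": DEF}

# Hypothetical fragment: one WhereClauseDef with two conditions.
WHERE_XML = """\
<MetaDataVersion xmlns="http://www.cdisc.org/ns/odm/v1.3"
                 xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <def:WhereClauseDef OID="WC.LB.PROT.URINE">
    <RangeCheck Comparator="EQ" SoftHard="Soft" def:ItemOID="IT.LB.LBTESTCD">
      <CheckValue>PROT</CheckValue>
    </RangeCheck>
    <RangeCheck Comparator="EQ" SoftHard="Soft" def:ItemOID="IT.LB.LBSPEC">
      <CheckValue>URINE</CheckValue>
    </RangeCheck>
  </def:WhereClauseDef>
</MetaDataVersion>
"""

# Only the comparators needed for this sketch.
COMPARATORS = {
    "EQ": lambda value, checks: value == checks[0],
    "NE": lambda value, checks: value != checks[0],
    "IN": lambda value, checks: value in checks,
    "NOTIN": lambda value, checks: value not in checks,
}


def load_where_clause(xml_text, oid):
    """Return [(item OID, comparator, [check values])] for the named WhereClauseDef."""
    root = ET.fromstring(xml_text)
    for clause in root.iterfind(".//def:WhereClauseDef", NS):
        if clause.get("OID") == oid:
            conditions = []
            for check in clause.iterfind("odm:RangeCheck", NS):
                values = [cv.text for cv in check.iterfind("odm:CheckValue", NS)]
                item_oid = check.get(f"{{{DEF}}}ItemOID")
                conditions.append((item_oid, check.get("Comparator"), values))
            return conditions
    raise KeyError(oid)


def matches(row, conditions, oid_to_var):
    """True when the row satisfies every condition (conditions are ANDed)."""
    return all(
        COMPARATORS[comparator](row.get(oid_to_var[item_oid]), values)
        for item_oid, comparator, values in conditions
    )


if __name__ == "__main__":
    # Assumed mapping; in a real file it comes from the ItemDef elements.
    oid_to_var = {"IT.LB.LBTESTCD": "LBTESTCD", "IT.LB.LBSPEC": "LBSPEC"}
    rows = [
        {"LBTESTCD": "PROT", "LBSPEC": "URINE", "LBORRES": "NEGATIVE"},
        {"LBTESTCD": "PROT", "LBSPEC": "SERUM", "LBORRES": "6.1"},
        {"LBTESTCD": "GLUC", "LBSPEC": "URINE", "LBORRES": "TRACE"},
    ]
    conditions = load_where_clause(WHERE_XML, "WC.LB.PROT.URINE")
    print([row for row in rows if matches(row, conditions, oid_to_var)])
```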
Extending CT |
When official SEND CT is extended (values exist on study beyond those in the codelist), you'll need to provide these extensions in the define.
|
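One way to represent extensions in Define-XML 2.0 is to list the terms used on the study in the CodeList and flag those outside the published list with `def:ExtendedValue="Yes"`. The sketch below builds such a CodeList. The "standard" terms shown are a small illustrative subset rather than the actual published codelist, and the OID, name and study values are made up.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"
DEF = "http://www.cdisc.org/ns/def/v2.0"

# Prefixes used when serializing; ODM becomes the default namespace.
ET.register_namespace("", ODM)
ET.register_namespace("def", DEF)


def build_codelist(oid, name, standard_terms, study_values):
    """Build a CodeList of the study's values, flagging terms outside standard_terms."""
    codelist = ET.Element(
        f"{{{ODM}}}CodeList", {"OID": oid, "Name": name, "DataType": "text"}
    )
    for value in sorted(set(study_values)):
        attributes = {"CodedValue": value}
        if value not in standard_terms:
            # Term is not in the published list, so mark it as a sponsor extension.
            attributes[f"{{{DEF}}}ExtendedValue"] = "Yes"
        ET.SubElement(codelist, f"{{{ODM}}}EnumeratedItem", attributes)
    return codelist


if __name__ == "__main__":
    # Illustrative subset only, not the real published terminology.
    standard_terms = {"SERUM", "PLASMA", "URINE"}
    # Made-up study data; CAGE WASH stands in for a sponsor-added term.
    study_values = ["URINE", "SERUM", "CAGE WASH", "URINE"]
    codelist = build_codelist("CL.SPEC", "Specimen", standard_terms, study_values)
    print(ET.tostring(codelist, encoding="unicode"))
```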