Datasets

Each analysis can have a collection of Datasets (SDTM, ADaM, or Custom).

Users can track Dataset metadata (including all metadata required for Define-XML), QC/Test Results and Issues (bugs, change requests).

Dataset Index

Navigate to the Analysis that contains the Datasets you would like to view.
In the upper, right of the Analysis Details page, click DATASETS (42) link.
The Dataset Index page displays all the datasets defined for the analysis, their status, type, current review period structure, etc.
Status – displays the status of the Dataset (New, Programming, Ready, Tested).
SAS Dataset Name – For example, 'ADSL', 'Demog', 'AE', 'SAE', 'CONMED'.
Run Order – Can be used to specify the programming execution order of Datasets. Leave blank to auto-populate this field to 0. Valid values are 0, 1, 1.1, 1.01
Domain – Domain name of dataset. Required for SDTM. Not required for ADaM.
Description – displays the total number of variables and a list of keys for a given Dataset.
Type – Defines the dataset type (e.g. ADaM, SDTM, Custom). This is a managed drop down list.
Class – Defines the subset within a type (e.g. Events, Finding, Special Purpose). This is a managed drop down list
Structure – displays information regarding the structure of the Dataset (e.g., one record per subject).
Program – Name of program that generates the dataset.
Assignments – displays assigned programming responsibilities:
- P = Programmer
- T = Tester
- S = Statistician
Variables – displays the total number of variables and a list of keys for a given Dataset.

Filter Dataset Index

There are 2 different tools to help you filter the Dataset Index page.

As soon as you change the Status filter or type in the text box, the Dataset list will refresh with the filter applied.

Filter by Status

In the upper, left-hand corner, locate the heading that reads, Datasets at <status> (#).
Click one of the status options in the list to apply the filter.

Filter by Text

In the upper, right-hand side, click inside the text box that reads, Type to filter.
Enter your search criteria.
Hit Enter key or click the button.
The text-based filter will apply to the fields that appear in the index.

Note

If your search terms contain any space characters, surround the search terms with double-quotes (e.g., "Adverse Events").

Add Datasets

There are several ways to add Dataset metadata to an analysis:

Using an Excel template available in the system.
By cloning from existing analysis.
By adding them one at a time on the Dataset Index page.

Add Datasets via Excel

STEP 1: Get a copy of the dataset metadata import spreadsheet.

Get a copy of the Candid Dataset Import Excel template if you do not already have one: In the main menu, on the left, expand the Help main menu item. Click the Dataset Importing link.
Click the button.
A sample copy of the Datasets-only Excel Template file will be saved to the Downloads folder on your machine. Alternatively, work from a previous template you have worked with (as long as the structure matches the sample).
Add or Update rows in the Dataset worksheet (see Dataset Excel Template - Dataset Tab Column Definitions).

STEP 2: Import updated spreadsheet from STEP 1.

Navigate to the Analysis you would like to work on.
Click the DATASETS (76) index link in the upper, right corner of the page (above the TFL status doughnut chart).
Click the button.
Click the Import link.
On the Dataset Import Form select the Excel file you have been working on.
Check whether you want to update metadata in existing datasets or not. If you check the box the system will attempt to update any dataset that matches by Name.
Click the button.
The system will do a validation check and open the Dataset Import Preview page. If no validation errors are found the system will display the list of datasets it will attempt to add or update, depending on your previous selection, otherwise the validation errors will diplay messages about what to fix in your template.

Add Dataset by cloning an existing one

Navigate to the Analysis you would like to work on.
Click the DATASETS (29) index link in the upper, right corner of the page (above the Dataset status doughnut chart).
Locate the Dataset you would like to clone and click the button.
The dataset details page will open. In the header of the Details section click the button.
Complete the Clone Dataset wizard:
Provide a new name
Provide a new description
Click the button.

Add Datasets one at a time

Navigate to the Analysis you would like to work on.
Click the DATASETS (147) index link in the upper, right-hand corner of the page (above the Dataset status doughnut chart).
Click the button.
Complete the form that opens:
SAS Dataset Name – name for the Dataset (e.g., AE).
Run Order – Used to specify the programming execution order of Datasets. Leave blank to auto-populate this field to 0. Valid values are 0, 1, 1.1, 1.01.
Domain – (optional except for ADaM Define-XML).
Description – long name for the Dataset (e.g., Adverse Events).
Type – Dataset structure (e.g., ADaM, SDTM, or Custom).
Class – available classes are definied by the administrator (e.g., Basic Data Structure, Subject Level).
Structure – (optional).
Program – Name of program that generates the dataset.
Click the button to add.
Add another Dataset by following steps 1-5 above, or click the button (located in the upper right-hand side of the page) to get out of Add mode.

Edit Dataset Metadata

Datasets can be edited from the index page, the dataset's details page or via the Excel template.

Edit Dataset metadata from index page

Navigate to the Analysis you would like to work on.
Click the DATASETS (12) index link in the upper, right-hand corner of the page (above the Dataset status doughnut chart).
Click the button in the upper, right-hand corner of the page.
The Dataset index will be put into Edit mode. While in edit mode you can edit any cell in the grid. Your value is saved in any edited cell when you tab out, or click out of the cell.

Edit Dataset metadata from details page

Navigate to the Analysis you would like to work on.
Click the DATASETS (32) index link in the upper, right-hand corner of the page (above the Dataset status doughnut chart).
Click the button for the Dataset you would like to edit.
On the Dataset Details page, you can view the Dataset's status, update team assignments, review issues, add test results.

Dataset Detail Sections:

Status Displays an overview of the Dataset's workflow; the current status will be highlighted and underlined.

New -> Programming -> Ready -> Tested

Details Displays Dataset metadata: such as the name, type, class, program, structure, documentation, domain.

Issues Displays all issues associated with this Dataset: the Status (Open or Closed), Description, and Resolution for each issue on the Dataset; to view an issue in more detail, click the button.

Test Results For each test result, view the Status (Failed or Passed), Description, Test Method, Program, and Issue; you can also edit or delete individual test results by clicking on the and buttons, respectively.

Status Changes Lists the Dataset workflow audit trail.

Change Status in bulk

Navigate to the Analysis you would like to edit.
Click the DATASETS (20) index link in the upper, right-hand corner of the page (above the Dataset status doughnut chart).
Click the button.
Click
Click the green, left-hand drop down menu to select the FROM status., e.g: Programming, Ready, Tested, Canceled.
Click the green, right-hand drop down menu to select the TO status.
Click .
Check the box next to each Dataset you would like to update.
Click
The next step will ask you to confirm your changes; click to proceed.
Candid will display the results of your status update; any errors will also be displayed.
Click .

Update Assignments in Bulk

From the Dataset index page click the button.
Select the Manage Assignments option.
Click the button begin adding users, or click to begin removing assigments.
Click
In the left-hand dropdown list, select the role you would like to add.
Click on the empty box and choose the desired name of the person who has the role you selected above.
Click the button.
Repeat steaps 3 through 7 for additional roles/users you would like to add to add/remove.
For each TFL to which you would like to add your selected users, click the checkbox in the Selected column.
Click
If the assignments are correct, click the button, if everything is correct.

Note

Only users who are on the analysis team can be assigned to datasets.

Dataset Import Guide

Edit Dataset Metadata via Excel

For Define-XML

STEP 1: Export existing dataset metadata to Excel.

Navigate to the Analysis you would like to work on.
Click the DATASETS (77) index link in the upper, right-hand corner of the page (above the Dataset status doughnut chart).
Click the button.
Click the Export to Excel link.
An Excel file in the Candid dataset template format will be saved to the Downloads folder on your machine.
Edit the worksheets (see Dataset Excel Template - Dataset Tab Column Definitions).

STEP 2: Import updated spreadsheet from STEP 1.

Navigate to the Analysis you would like to work on.
Click the DATASETS (77) index link in the upper, right corner of the page (above the TFL status doughnut chart).
Click the button.
Click the Import link.
On the Dataset Import Form select the Excel file you have been working on.
Check whether you want to update metadata in existing datasets or not. If you check the box the system will attempt to update any dataset that matches by Name.
Click the button.
The system will do a validation check and open the Dataset Import Preview page. If no validation errors are found the system will display the list of datasets it will attempt to add or update, depending on your previous selection, otherwise the validation errors will diplay messages about what to fix in your template.

Define-XML Excel Template Worksheet Definitions

Datasets: Contains the dataset names and related metadata to import. If the Import column has the value 'Yes' Candid will create a Dataset record as well as look for a worksheet with the same name as the dataset containing the variables and try to add them. If there is no corresponding variable worksheet then only the dataset record will get created.
CDISC CT: Contains the standard CDISC controlled terminology. You are free to edit as you want. Recommended to delete any unused terminology for the given deliverable.
Custom CT: Here is where you can add any additional controlled terms.
External CT: For example MedDRA and WhoDrug external dictionary references.
VLM: Value Level Metadata parent records (where clause range checks (child records) are entered on the Where Clauses tab).
Where Clauses: These are the where clause range check values. The children of the VLM tab (1 VLM record must have at least 1 or more Where clause range check records)
Lookup: Please do not edit. These are the drop down values used by the Variable worksheets. For Variable Types, Roles, Origins, Cores, Value Level Metadata where clause comparators and where clause check types.
Template: This worksheet is for reference only. You can use this tab to copy and create new Variable worksheets from.

Note

Any tabs that have a tab name that matches a name in a row on the dataset tab are considered Variable metadata for that dataset you want to import. Add a separate tab that follows the Template rules to define the variable list for each dataset you wish to import. The name of the tab should exactly match the name of the dataset on the Datasets tab.

All other tabs are ignored.

Define Excel Template Data Entry Guide

Dataset Tab Column Definitions

Column A Domain: Required for SDTM. Not required for ADaM.
Column B Name: Required. Must be a unique value.
Column C Repeating: Required. Enter 'Yes' or 'No'. Indicates whether a domain contains more than one record per subject or only one record per subject.
Column D Class: Required. Must match a Dataset Class you have defined in the system.
Column E Reference Data: Required. Indicates whether the dataset contains reference data (not subject data).
Column F Structure: Optional.
Column G Description: Required.
Column H Standard: Required. Enter a value the matches one of the Dataset Type values defined in the system.
Column I Version: Required.
Column J Import: Required. Enter Yes or No. Entering 'Yes' means you want Candid to import that Dataset.
Column K Comment: Optional.
Column L Comment Page #: Optional. Enter page number of an external document containing the details about how the dataset was created.
Column M Program: Optional and Not related to Define. This field is for tracking testing details.
Column N Default Test Program: Optional and Not related to Define. Leave blank for Candid to auto-populate this field using default settings.
Column O Default Test Method: Optional and Not related to Define. Leave blank to auto-populate this field using default settings. Must enter a valid Test Method defined in the system to override the default.
Column P Run Order: Can be used to set the order of datasets in define.xml. Seting all dataset's run order value to 0 and they will be sorted by Name. Valid values are 0, 1, 1.1, 1.01.
Column Q Id: For system use only. May not be present. Does not output to define.xml. Safe to ignore.
Columns R - W Programmer1, Programmer2, Tester1, Tester2, Statistician1, Statistician2: Optional and not related to Define-XML. Represents assigned Programmers, Testers, and Statisticians. Must select from the drop down lists which are provided during export and can be found on the Lookup tab. These lists are made up of the Statisticians and Programmers listed on the Analysis team at the time of export. Names in these fields that do not match an analysis team member will not be added. A single name cannot be assigned to more than one role. If these columns are not present on the Dataset tab no processing will be done on existing assignments already in the system.

How to Enter Custom Controlled Terminology into the Template

The rules for adding data to the Custom CT tab are the same as the CDISC CT tab (the colors have no meaning for the import; are there for readability only).

It's best (not required) to not put CDISC controlled terminology on the Custom CT tab. The reason is that it would be very easy to put a duplicate CDISC code on the Custom CT tab (Candid's validation rules will reject duplicates).

CDISC CT & Custom CT Column Definitions

Adding the Controlled Term (parent row)

Column A Code: Required. Must be a unique value.
Column B Codelist Code: leave blank.
Column C Extensible: Required. Enter 'Yes' or 'No'.
Column D Codelist Name: Required. Enter the long name of the controlled term.
Column E Submission Value: Required. This will represent the OID in Define and the value that is used to map controlled terms to variables.
Column F SAS Format Name: Optional. This value will represent the SAS Format Name for the controlled term and must be 7 characters or less.
Column G Synonyms: not relevant to parent row.
Column H Definition: Optional. Enter the description of the controlled term.
Columns I-L not relevant to parent row
Column M Active: Required. Enter 'Yes' or 'No'. 'Yes', means the value is applicable to the deliverable.
Column N Rank: not relevant to parent row.
Column O DataType: Optional. Enter 'text', 'integer', or 'float' to override Candid's default of determining the type based on the Controlled Term's set of submission values.

Adding the Controlled Term Items

Adding the Controlled Term Items (child rows)

Column A Code: is required and must be a unique value.
Column B Codelist Code: enter the parent Code from the parent's value in column A.
Column C Extensible: Enter 'Yes' if this is an extended controlled term item (see Note below about how to add Extended CDISC items).
Column D Codelist Name: is required, long name of the controlled term item.
Column E Submission Value: is required, represents the preferred term.
Column F SAS Format Name: leave blank - not relevant to child rows.
Column G Synonyms: optional, represents the longer version of the preferred term or a semi colon separated list of synonyms for the preferred term.
Column H Definition: optional, description of the controlled term item.
Column I Std CRF Coded Value: Optional. Enter if your company has a standard CRF coded value.
Column J Std CRF Decode: Optional. Enter if your company has a standard CRF Decode value.
Column K CRF Coded Value: Optional. Coded term as specified on the CRF.
Column L CRF Decode: Optional. Decode term as specified on the CRF.
Column M Active: Required. Enter 'Yes' or 'No'. 'Yes', means the value is applicable to the deliverable. A 'No' means the value is part of the controlled terminology but was not used/or valid in the source data for the deliverable.
Column N Rank: Optional, allows specifying a controlled term Rank. Will output to define.xml if populated.
Column O DataType: not relevant to child rows.

Note

About Order Numbers: each controlled term item will get an order number automatically. That number is determined by the order they appear in the worksheet.

Note

How to add Extended CDISC Items: In your Candid metadata spreadsheet's CDISC CT tab:
1. Enter an asterisk character (*) before the unique 'dummy code' for the extended item in Column A (e.g. *C468721.1).
2. Enter 'Yes' in Column C - Extensible.
When you do this only the '*' character will output to define.xml rather than the full code value (e.g. [*]).

External Libraries

Column A Name: Required. Can be the official or internal name. Must be a unique value.
Column B Description: Optional. Your own internal description for the library.
Column C Dictionary: Required. Official name of the external dictionary. This value is output to define.xml. Must be a unique value.
Column D Version: Required. The version designator of the external codelist.
Column E Ref: Optional. Reference to a local instance of the dictionary.
Column F Href: Optional. URL of an external instance of the dictionary.

Variable Tab Column Definitions

Column A Variable: Required. SAS name for the variables.
Column B Label: Label: Required for Variables, optional for VLM. The variable's long name (aka description).
Column C Key: Key: Optional. An integer value that identifies the variable as uniquely identifying a row in a dataset (or is among a set of variables that uniquely identifies a row in a dataset).
Column D Order: Order: Required. Should be an integer. Defines the order variables will listed.
Column E Type: Type: Required. Defines the variable's SAS data type (e.g. text, integer, float, date).
Column F Length: Length: Required. Variable's SAS length property.
Column G Significant Digits: Optional. Number of significant digits for float variable types.
Column H CDISC Description: Optional. Not output to define.xml
Column I Comment: Comment (aka long description) value that will go into the define.xml.
Column J Internal Comment: Optional. Values in this field are for internal use only and will not be placed in the resulting define.xml.
Column K Origin: Required unless value level metadata is associated with this variable. Must be one of the values listed in the Origin section in the Lookup tab. Represents the description of the origin of the variable.
Column L CodeList: Optional. Enter the controlled term's Submission Value (aka, code list name, SAS Format Name) if applicable.
Column M Display Format: Optional. Display format supports data visualization of numeric float and date values.
Column N Method: Optional. Enter details about how the variable is derived (aka Derivation, Computational Algorithm), if applicable.
Column O Method Name: Optional. If applicable, enter the name representing a derivation.
Column P Role: Required, unless value level metadata is associated with this variable.
Column Q Core: Required. For example Cond, Exp, Perm, Req (see Lookup tab for complete list).
Column R CRF Page #: Required if Origin equals CRF, or Origin equals Predecessor. for CRF origins enter page number references in one of the following three ways: 1 or 5, 7 or 5 7 or 9-15. For Predecessor origins enter the DATASET.VARIABLE_NAME predecessor value.
Column S **Method Page #: Optional. Page number references to a Computational Algorithm pdf to package with the define. Enter page number references in one of the following three ways: 1 or 5, 7 or 9-15.
Column T Comment Page #: Optional. Page number references to a Comment.pdf to package with the define. Enter page number references in one of the following three ways: 1 or 5, 7 or 5 7 or 9-15. Adding Value Level Metadata.

VLM Tab column definitions

** Follows the same rules for variable tabs

Where Clause Tab column definitions

Columns A, B, and C must match a Dataset name, Variable name, and SAS Field name set on the VLM tab. Every row in the VLM tab must have at least 2 rows in the WhereClause tab. 1 row for representing the 'match' back to the VLM metadata and a subsequent row(s) for the RangeCheck values.

Column A Dataset: Required. Must match a corresponding Dataset value on the VLM tab.
Column B Variable: Required. Must match a corresponding Variable value on the VLM tab.
Column C SASFieldName: Required. Must match a corresponding SASFieldName value on the VLM tab.
Column D Row Type: Required. Must be 1 of 2 possible values ('Whereclauses' or 'Rangecheck'). There much be at least 1 row below every WhereClause row and columns E-J must be empty. For each RangeCheck row, columns A-C must be empty. There can be more than 1 RangeCheck row below each WhereClause row to represent all the RangeCheck options for a given where clause.
Column E Check Dataset: Required. The name of the source dataset for the RangeCheck.
Column F Check Variable: Required. The name of the source variable for the RangeCheck.
Column G Comparator: Required. RangeCheck comparator value (e.g EQ, IN, NOTIN).
Column H Check Value: Required. The RangeCheck value.
Column I Comment: Represents the 'Join Comment'. Only required if the Range Check variable is on a dataset different from the Value List Item dataset
Column J Type: Required. Value must be 'Soft' or 'Hard'.