Survival data requirements
Some analyses require survival data, for example Kaplan-Meier plots. Survival data is uploaded separately after a dataset has been created from uploaded TMA scores. A button labelled Attach survival is located on the dataset view, within the About this dataset panel.
Summary of main requirements:
- Survival data must have censoring information - that is, whether the given event (e.g. relapse, or death) had occurred or not occurred the last time the patient was seen.
- Sample identifiers must exactly match the identifiers in the original TMA data. Samples not present in the original TMA will be ignored. Complete survival information must be specified for at least 50% of the samples in the original TMA.
- Two layouts are accepted for survival data files. Both are tables containing sample identifiers in rows, with the top left entry (cell A1 if using Excel) labelled 'Sample'.
- Layout one contains, for each sample, patient survival time in months, and whether an event (such as patient death or relapse) occurred (0 or 1). The columns must be labelled 'Sample', 'SurvivalMonths' and 'EventOccurred'.
- Layout two contains, for each sample, a start date (e.g. date of patient operation), and either an event date (such as date of patient death or relapse) or a follow-up date (when the patient was last seen alive/relapse-free). Dates must be specified in YYYY-MM-DD format. The columns must be labelled 'Sample', 'StartDate', 'EventDate' and 'FollowUpDate'.
Please note that TMA Navigator converts the date-based layout (layout two) into survival time in months and event occurred (layout one). Layout one is also the format used for survival data downloaded from the dataset page. For the purposes of the conversion, a 'month' is taken to be 30 days.
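
The 30-day month convention described above can be illustrated with a short Python sketch. The function name `survival_months` is hypothetical, and whether TMA Navigator rounds the result is not stated here, so no rounding is applied.

```python
from datetime import date

DAYS_PER_MONTH = 30  # from the text: a 'month' is taken to be 30 days


def survival_months(start: date, end: date) -> float:
    """Return the time between two dates, counted in 30-day months."""
    return (end - start).days / DAYS_PER_MONTH


# 2011-01-01 to 2011-03-02 is 60 days, i.e. 2.0 thirty-day months.
print(survival_months(date(2011, 1, 1), date(2011, 3, 2)))  # 2.0
```

Because calendar months vary in length, the figure obtained this way can differ slightly from a count of calendar months.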
Further detail on the survival data requirements is given below.
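
Before uploading, it may help to check the identifier and coverage requirements from the summary above. The following is a minimal Python sketch under stated assumptions: the function `check_coverage` and its arguments are hypothetical, and it simply treats identifiers as strings that must match exactly.

```python
def check_coverage(tma_samples, survival_samples, min_fraction=0.5):
    """Compare survival identifiers against the original TMA identifiers.

    tma_samples: identifiers in the original TMA data.
    survival_samples: identifiers that have complete survival information.
    Returns (matched, ignored, ok), where ok is True when at least
    min_fraction of the TMA samples have survival information.
    """
    tma = set(tma_samples)
    survival = set(survival_samples)
    matched = tma & survival   # exact matches only; 'a3' does not match 'A3'
    ignored = survival - tma   # not present in the original TMA
    ok = len(matched) >= min_fraction * len(tma)
    return matched, ignored, ok


matched, ignored, ok = check_coverage(
    ["A1", "A2", "A3", "A4"],
    ["A1", "A2", "a3", "B9"],
)
print(sorted(matched), sorted(ignored), ok)  # ['A1', 'A2'] ['B9', 'a3'] True
```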
Example data
The following example datasets have survival data available for download. They can be opened using Excel or a text editor to look at the file layout, and re-uploaded if you wish to try out the upload facility.
Tab separated format (.tsv)
Excel format (.xls)
Input file formats
The following list summarises acceptable file formats:
- Microsoft Excel 97 or later (.xls or .xlsx)
  If the workbook has multiple worksheets, only the first one will be used. Survival data must start from the top-left corner (cell A1) and the worksheet must otherwise be empty. Please note that worksheets with formulas are not currently supported.
- Comma-separated values (CSV) (.csv)
  Columns are separated by commas; rows are separated by new lines. Fields may optionally be quoted using single or double quotes.
- Tab-separated values (TSV) (.tsv or .txt)
  Columns are separated by tabs; rows are separated by new lines. Fields may optionally be quoted using single or double quotes.
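
To preview how a text file will split into columns before uploading, the two delimited formats can be read with Python's standard `csv` module. This is only a sketch: the function `read_rows` is hypothetical and says nothing about how TMA Navigator parses files. Double-quoted fields are handled by `csv`. Single quotes are handled with a simplification that only strips quotes wrapping a whole field, so a single-quoted field containing the delimiter would not be read correctly.

```python
import csv
import io
from pathlib import Path

# Delimiter for each text file extension listed above.
DELIMITERS = {".csv": ",", ".tsv": "\t", ".txt": "\t"}


def _unquote(field):
    """Strip single quotes that wrap a whole field (simplification)."""
    if len(field) >= 2 and field[0] == field[-1] == "'":
        return field[1:-1]
    return field


def read_rows(filename, text):
    """Split delimited text into rows, choosing the delimiter by extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in DELIMITERS:
        raise ValueError(f"unsupported text format: {suffix!r}")
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=DELIMITERS[suffix])
    return [[_unquote(field) for field in row] for row in reader]


rows = read_rows("survival.tsv", "Sample\tSurvivalMonths\tEventOccurred\n1\t20.4\t0\n")
print(rows[0])  # ['Sample', 'SurvivalMonths', 'EventOccurred']
```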
Data layout one - survival and censoring
Layout one has three columns, which must be titled 'Sample', 'SurvivalMonths' and 'EventOccurred'. 'SurvivalMonths' specifies the length of time (in months) that the patient survived without an event (e.g. relapse or death) since a given start date (e.g. the date of operation). If the patient had not had an event when last seen, 'SurvivalMonths' specifies the time up to that point. The 'EventOccurred' column indicates whether the patient had experienced an event when last seen: 0 if no event had occurred (e.g. no relapse) or 1 if an event had occurred. Samples with an 'EventOccurred' value of 0 therefore represent censored data.
Example input data:
Sample | SurvivalMonths | EventOccurred |
---|---|---|
1 | 20.4 | 0 |
2 | 5.6 | 1 |
3 | 12.2 | 1 |
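
The layout one rules can be checked locally with a short Python sketch such as the one below. The function `parse_layout_one` is hypothetical, and rejecting negative survival times is an assumption rather than a documented rule. The rows are the example data above, already split into fields.

```python
LAYOUT_ONE_HEADER = ["Sample", "SurvivalMonths", "EventOccurred"]


def parse_layout_one(rows):
    """Validate layout one rows; return {sample: (months, event_occurred)}."""
    header, *body = rows
    if header != LAYOUT_ONE_HEADER:
        raise ValueError(f"unexpected header: {header}")
    parsed = {}
    for sample, months, event in body:
        if event not in ("0", "1"):
            raise ValueError(f"sample {sample}: EventOccurred must be 0 or 1")
        months_value = float(months)  # raises ValueError if not numeric
        if months_value < 0:  # assumption: negative times are invalid
            raise ValueError(f"sample {sample}: SurvivalMonths is negative")
        parsed[sample] = (months_value, event == "1")
    return parsed


rows = [
    ["Sample", "SurvivalMonths", "EventOccurred"],
    ["1", "20.4", "0"],
    ["2", "5.6", "1"],
    ["3", "12.2", "1"],
]
data = parse_layout_one(rows)
# Samples with EventOccurred = 0 are censored.
censored = [sample for sample, (_, event) in data.items() if not event]
print(censored)  # ['1']
```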
Data layout two - start, event and follow up dates
Layout two has four columns, which must be titled 'Sample', 'StartDate', 'EventDate' and 'FollowUpDate'. 'StartDate' indicates the date from which survival time is measured and is obligatory for every sample specified. 'EventDate' indicates the date an event occurred (e.g. death or relapse), or is left blank if the event had not occurred when the patient was last seen. 'FollowUpDate' refers to the date when the patient was last seen alive or relapse-free.
If both 'EventDate' and 'FollowUpDate' are specified, 'EventDate' takes precedence and the follow-up date is not used. For clarity, we recommend specifying only one of 'EventDate' and 'FollowUpDate' for each sample. An upload error will be given if any sample has neither an 'EventDate' nor a 'FollowUpDate'. 'EventDate' and 'FollowUpDate' cannot be earlier than 'StartDate'.
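
The precedence and validation rules above can be summarised in a small Python sketch. The function `resolve_end` is hypothetical and only mirrors the rules as stated here. It is not TMA Navigator's own code, and the error messages are illustrative.

```python
from datetime import date


def resolve_end(start, event_date=None, follow_up=None):
    """Return (end_date, event_occurred) for one sample in layout two."""
    if event_date is None and follow_up is None:
        raise ValueError("either EventDate or FollowUpDate is required")
    # Neither supplied date may be earlier than the start date.
    for name, value in (("EventDate", event_date), ("FollowUpDate", follow_up)):
        if value is not None and value < start:
            raise ValueError(f"{name} cannot be earlier than StartDate")
    if event_date is not None:
        return event_date, True   # EventDate takes precedence
    return follow_up, False       # censored at the last follow-up


end, occurred = resolve_end(date(1999, 4, 26), event_date=date(2004, 2, 13))
print(end.isoformat(), occurred)  # 2004-02-13 True
```

A sample with only a 'FollowUpDate' yields `event_occurred = False`, which corresponds to an 'EventOccurred' value of 0 in layout one.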
Please note that dates should be specified in yyyy-mm-dd format (i.e. ISO-8601 format). For example, 27th May 2011 would be specified as 2011-05-27. This is to ensure dates are specified unambiguously, rather than the region-specific dd-mm-yyyy or mm-dd-yyyy, for example. Excel and most spreadsheet programs are able to convert to this format.
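
If the dates are held as text in a region-specific format, they can also be converted outside a spreadsheet. The following Python sketch uses the standard `datetime` module. The helper name `to_iso` is hypothetical, and the caller must know which source format the data uses, since a string such as 05-06-2011 cannot be disambiguated automatically.

```python
from datetime import date, datetime


def to_iso(text, fmt):
    """Convert a date string in a known format to yyyy-mm-dd."""
    return datetime.strptime(text, fmt).date().isoformat()


print(to_iso("27-05-2011", "%d-%m-%Y"))  # 2011-05-27
print(to_iso("05-27-2011", "%m-%d-%Y"))  # 2011-05-27

# date.fromisoformat accepts yyyy-mm-dd and raises ValueError for
# a string such as '27-05-2011', so it can serve as a format check.
checked = date.fromisoformat("2011-05-27")
```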
Example input data:
Sample | StartDate | EventDate | FollowUpDate |
---|---|---|---|
1 | 1999-04-26 | 2004-02-13 | |
2 | 1999-07-02 | 2008-10-03 | |
3 | 1999-08-13 | 2007-01-03 |