IEEE International Conference on Data Mining
ICDM 2008 Data Mining Contest:
Radioxenon monitoring for verification of the Comprehensive Nuclear-Test-Ban Treaty
Instructions for Participants
Winner of the Contest Crown
The team of Wei Fan (1), ErHeng Zhong (2), Sihong Xie (2), Yuzhao Huang (2), Kun Zhang (3), Jing Peng (4), and Jiangtao Ren (1)
1) IBM T. J. Watson Research Center 2) Sun Yat-Sen University 3) Xavier University of Louisiana 4) Montclair State University
Winner of the Most Muscular
Zhongfeng Zhang, Institute of Automation, Chinese Academy of Sciences
The Kangaroo Prize was postponed/cancelled due to an anomaly in the data set.
The ICDM Data Mining Contest 2008 is now officially over. However, we do encourage you to participate in the tasks, or otherwise explore and mine the data. If you do plan on publishing any results, please let us know.
1. General
Description of the Problem:
Compliance verification of the Comprehensive Nuclear-Test-Ban Treaty
(CTBT) will employ the remote detection and measurement of radioactive forms of
a noble gas, xenon, called radioxenon that is potentially
emitted from the site of a nuclear explosion.
Specifically, four radioxenon isotopes,
Xe-131m, Xe-133m, Xe-133, and Xe-135, are measured in a procedure called
radionuclide monitoring. Different relative combinations of these isotopes
correspond to signatures that can be associated with distinct sources such as
nuclear power plants, medical isotope production facilities, or various types
of weapons.
In the first few weeks after an
explosion, the four isotopes are expected to be
released in “fingerprint” relative concentrations quite distinct from
those of background sources of radioxenon. The problem of
attributing a specific observation of airborne concentrations of radioxenon to an explosion is twofold. Firstly, since the CTBT stations are not
located at the source of the explosion, the radioxenon
is detected at a location which can be well over a thousand kilometres away.
This atmospheric transport process can take weeks, thus degrading the
distinctness of signature through radioactive decay and lessening the
likelihood of detecting one or more of the radioxenon
isotopes at all. Secondly, one can never
observe radioxenons emitted purely from an explosion
source, but only admixtures of this gas with the radioxenons
released from all other background sources.
The problem set
to the contestants is to devise the means to distinguish those radioxenon measurements that are due purely to normal
environmental emissions or background (B) from those measurements that contain
the signature of an explosion combined with background (B+E).
2.1 Contestant’s Package
In addition to these instructions, the contestant’s package includes:
2.2 Description of the data sets provided:
Two files containing the contest data are provided. One file contains the training data, in which the class of each datum, B or B+E, is labelled. A second file contains a test data set with instances of both classes but without B or B+E labels. In both files, an alphanumeric index is provided for each case. The numeric portion traces the background measurement, or the background measurement combined with a synthesized explosion observation, used to create the datum. The alpha portion of the code refers to one of 5 real-world measurement sites that have been collecting daily measurements of radioxenon concentrations for an extended period of time, and to the qualitative degree of complexity of the radioxenon background observed at these sites. Hence, the alpha codes qualitatively rank the sites’ radioxenon background in order of increasing complexity, from V to W to X to Y, with location Z being particularly complex relative to the 4 other sites.
The 6 contest data column headers are explained as follows:
a) The first column contains an index composed of a station code and the numeric scenario index (see above).
b) The second column identifies the type of datum: background (B) or a combined background and simulated explosion signal (B+E). This column is blank in the test data set.
c) The final 4 columns hold the activity concentrations associated with the index, one column each for Xe-133, Xe-133m, Xe-135, and Xe-131m.
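The six-column layout above can be read with a few lines of code. The following is only a sketch: it assumes the index is written as the station letter followed directly by digits (for example "W0042"), and that a record has already been split into its six fields. The names parse_index and parse_row are hypothetical helpers, not part of the contest package.

```python
import re
from typing import NamedTuple

# Isotope columns in the order given in the column description.
ISOTOPES = ("Xe-133", "Xe-133m", "Xe-135", "Xe-131m")


class CaseIndex(NamedTuple):
    station: str   # one of V, W, X, Y, Z
    scenario: int  # numeric part of the index


def parse_index(index):
    """Split an index into station code and scenario number.

    Assumption: station letter first, then digits only (e.g. 'W0042').
    """
    match = re.fullmatch(r"([VWXYZ])(\d+)", index.strip())
    if match is None:
        raise ValueError(f"unrecognised index: {index!r}")
    return CaseIndex(match.group(1), int(match.group(2)))


def parse_row(fields):
    """Convert one six-field record into a dict.

    A blank class label (as in the test data set) becomes None.
    """
    index, label, *concentrations = fields
    if len(concentrations) != len(ISOTOPES):
        raise ValueError("expected 4 concentration columns")
    case = parse_index(index)
    row = {
        "station": case.station,
        "scenario": case.scenario,
        "label": label.strip() or None,
    }
    for isotope, value in zip(ISOTOPES, concentrations):
        row[isotope] = float(value)
    return row
```

Keeping the station code as its own field makes it straightforward to split the data by station for Task 2.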

3. Contest Reporting:
A reporting template is provided; the template and its use are described in Appendix 2. For each final result of a task and task test, it is essential that contestants provide a filled-in electronic template to allow the contest evaluators to consider their work fully.
3.1 Point of Contact for Completed Templates:
Please send all completed templates (maximum 5 MB email) to jing_yi@hc-sc.gc.ca (underscore “_” between Jing and Yi).
4. Contest Tasks:
The primary goal of this contest is to produce
methods that are broadly applicable over different station background
measurement distributions and explosion source hypotheses. The best methods will also have a very
efficient learning curve in terms of the amount of data required to
successfully tune the classifier.
Recognition will also be given to methods more proficient in properly
categorizing data arising from specific classes of explosion release hypotheses
or station background types, because these methods add a forensic or diagnostic
dimension to the classifier that may not be evident in the overall best
classifier.
Software will be provided (see Appendix 1) to the contestants to calculate their relative degree of success at each task in terms of a number of characteristics (for example, numbers of false positives) and figures of merit (for example, % accuracy or Area Under Curve for the Receiver Operating Characteristic curve). Area Under Curve (AUC) will be used by the evaluators as the primary figure of merit to judge success, in conjunction with consideration of ease of tuning the methods and balance of performance over a range of sub-cases of explosion radioxenon emission scenarios.
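For contestants who want a quick check of these quantities outside the provided Windows tool, the sketch below computes AUC and the false-positive count in plain Python. It is not the evaluation software's implementation: it assumes labels are the strings "B" and "B+E", that a higher score means "more likely B+E", and it uses the standard rank-based identity (AUC equals the fraction of B+E/B pairs in which the B+E case scores higher, with ties counted as one half). The 0.5 threshold in confusion_counts is an arbitrary illustrative choice.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pairs = list(zip(labels, scores))
    positives = [s for label, s in pairs if label == "B+E"]
    negatives = [s for label, s in pairs if label == "B"]
    if not positives or not negatives:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


def confusion_counts(labels, scores, threshold=0.5):
    """Count true/false positives and negatives at one score threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == "B+E":
            counts["tp" if predicted_positive else "fn"] += 1
        else:
            counts["fp" if predicted_positive else "tn"] += 1
    return counts
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sanity check; a sort-based rank computation would scale better on large sets.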
Contestants may opt to do one, some or all of the tasks but their full participation is encouraged.
The tasks are as follows:
Task 1: The first task is to classify, as accurately as possible, the results as
B or B+E over the entire set of stations (V, W, X, Y and Z) with one classifier.
Contestants may combine data as they see fit.
They may separately tune classifier parameters for each station but they
may not have separate classifier parameter types for each station nor separate
classifiers. Contestants can report
on more than one classifier for this task if they so choose.
Task 2: The second task is to
classify, as accurately as possible, the results as B or B+E with an optimal
algorithm for each station (V, W, X, Y and Z) given.
Task 3: The third task
is to apply classifiers developed in Tasks 1 and 2 to help the panel of
evaluators assess the contestant’s methods. The contestants will apply their
methods to a second unlabelled data set. Furthermore, they will reprocess the first
data set under the prescribed conditions described below to allow consideration
of balance of performance (as below).
4.1 Task 3, Detailed Tests:
Contestants are requested to run their classifiers developed in Tasks 1 and 2 in a prescribed manner to help the panel of evaluators assess their submissions in detail. For each test trial, the attached template must be completed as a full report. Instructions on use of the template are included in Appendix 2.
4.1.1 Test 1
The contestants will classify a second unlabelled data set comprised of similar
data as used for the development of their classifiers. This test ensures the performance of the
methods for similar concentrations of radioxenon
sampled from the same distribution as the training set but not necessarily the
same proportions of B and B+E cases. A
report template must be provided for each classification run including the
methods provided for individual station types.
4.1.2 Test 2
The labelled training data sets employed by
the contestants may be the complete labelled data set and station subsets
provided by the contest sponsors or training sets created by the contestants
themselves using combinations and sub-sampling. For the actual data used to
develop their classifiers, the contestants are requested to calculate the AUC
for their classifiers for tuning with 20% of their employed training data set:
then similarly for 40%, 60%, 80%, and 100%.
This test examines the efficiency in use of data required to tune the
classifiers. Of course, recognition will
be given to contestants who employ relatively small subsets of the training
data provided in the first instance to develop their classifiers. A report
template must be provided for each classification run including the methods
provided for individual station types.
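One way to produce the five training sets for this test is sketched below. It assumes the contestant's employed training data is held as a list of cases, and it makes two choices the instructions do not prescribe: the subsets are nested (the 20% set is contained in the 40% set, and so on), and sampling is not stratified by class or station. The function name training_subsets and the fixed seed are illustrative only.

```python
import random

# Training-set fractions requested for Test 2.
FRACTIONS = (0.2, 0.4, 0.6, 0.8, 1.0)


def training_subsets(cases, fractions=FRACTIONS, seed=0):
    """Yield (fraction, subset) pairs drawn from one shuffled ordering.

    Using a single shuffle makes the subsets nested, so the learning
    curve reflects added data rather than a fresh random draw.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    for fraction in fractions:
        size = round(fraction * len(shuffled))
        yield fraction, shuffled[:size]
```

Each subset would then be used to tune the classifier from scratch, with the AUC recorded once per fraction.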
4.1.3 Test 3
For the classifier developed under Task 1 only, the contestants are asked to provide results for a factorial analysis of the sensitivity of the effectiveness of their method to small changes in their parameter values. Hence, the contestants are requested to provide results for 2^n trials, where “n” is the number of numerical parameters used by their methods. Parameter jumps on the order of 10% are requested where, as appropriate and as judged by the contestant, it is:
For example, if the classifier has 3 numerical parameters, 2^3 or 8 trials are needed for all possible combinations of high and low parameters. The trials to be conducted are, therefore:
|         | Parameter 1 | Parameter 2 | Parameter 3 |
| Trial 1 | +10%        | +10%        | +10%        |
| Trial 2 | +10%        | +10%        | -10%        |
| Trial 3 | +10%        | -10%        | +10%        |
| Trial 4 | -10%        | +10%        | +10%        |
| Trial 5 | -10%        | -10%        | +10%        |
| Trial 6 | -10%        | +10%        | -10%        |
| Trial 7 | +10%        | -10%        | -10%        |
| Trial 8 | -10%        | -10%        | -10%        |
For Task 1 classifiers employing station-specific tuning, it is requested that this analysis be performed for at least two station types.
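The full set of high/low combinations can be generated rather than written out by hand. The sketch below assumes the classifier's tuned numerical parameters are held in a dict of name to baseline value; factorial_trials is a hypothetical helper. It enumerates the same 2^n combinations as the table for n = 3, although in a different order.

```python
from itertools import product


def factorial_trials(baseline, step=0.10):
    """Return the 2**n settings with every parameter at +step or -step.

    baseline: dict mapping parameter name -> tuned value.
    step: relative change applied to each parameter (0.10 = 10%).
    """
    names = list(baseline)
    trials = []
    for signs in product((+1, -1), repeat=len(names)):
        trials.append(
            {name: baseline[name] * (1 + sign * step)
             for name, sign in zip(names, signs)}
        )
    return trials
```

For a three-parameter model this returns 8 dicts; each would be used to rerun the classifier and record one AUC.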
All contestants are asked to write a
short paper (approximately 4 pages) describing their method and results.
The paper can be as short as 2 pages or can be up to a maximum of 6 pages.
Please email your paper to
Trevor_Stocki@hc-sc.gc.ca by Nov 20th.
Please use the same format as the conference.
Please use the "camera ready format". See
here
for more details.
Do not use PDF eXpress for this. Please send the LaTeX, MS-Word document
or PDF file.
Please also note that this is not a double-blind
submission, so please put the authors' names and institutions on the paper. Also
include the corresponding author's contact information.
All the papers will be collected and distributed at
the conference as well as made available through our Website.
6. Contest Prizes
6.1 Contest Crown:
For the classifier judged to have the best overall performance.
6.2 Most Muscular:
For classifiers with highest AUC scores for the full labelled data set and by station.
6.3 Kangaroo Prizes:
For classifiers that are talented in unusual or unexpected respects.
Appendix 1:
Installing and using the provided evaluation software.
Installation
This software is known to work with Windows XP; it should work with other versions of Windows, but this has not been tested. The software is provided in the file named evalualtor.zip. Please note that the ActiveX control must be installed before running Setup.exe, because the tool uses it for charting. The ActiveX control is included in the package.
Installation Instructions:
1. Unzip
Unzip the compressed package, which contains all necessary executable files and
sample data files. The following should be done first if needed.
2. Install ActiveX control (MSCHRT20.OCX)
a) Click the start button.
b) Click Run.
c) The run window will pop up. Click browse.
d) Go to the directory in which unzip deposited all the files;
e) Type the following in the “open” field, thereby running the activeX installation:
regsvr32 MSCHRT20.OCX
3. Setup
a) Run Setup.exe from the unzipped package to install the tool.
b) Follow the usual setup menus.
Using the Software
A few example files are included with this software. In the following instructions these example files will be used to show you how to use the software to calculate an ROC curve and AUC. Below is an image of what your window should look like after running the example.

Steps:
1. You will need to generate two files from
your classification software: one with the actual values
of the training data and one with the predicted values from the model. How to generate these files
is explained in the next section below.
2. Start the program by double clicking on evaluation.application.
3. Input the filename which contains actual values
of test data by typing it into the ‘Actual
Values field’ or by clicking the ‘...’ button
to browse. (true.txt,
for our example, in the directory).
4. Input the filename for the predicted
results by typing it into the
‘Actual Values field’ or by clicking
the ‘...’ button to browse. (predicted1-DTJ48.txt, for our example, in the directory)
5. Select the predicted results in the list
box for evaluation
6. Click Evaluation
The related ROC
curves (2dXY) will then be displayed in a window (to work around a colour bug,
evaluate one result first, and then make multiple selections). The
results are displayed in a text box. You
can cut and paste these results into the template. These results are output to
the related files, which are explained below.
Formats for two inputs
The predicted results
can be in one of two formats, and the delimiters between columns can be tab,
comma, or space. Also, the tool can automatically sort the probabilities for
true positive cases.
#case TPProbability
xxxx 0.####
xxxx 0.####
...
or by leaving the TPProbability column empty
#case predClass
xxxx xx
xxxx xx
...
The file that contains the actual values
uses the second format only; in that file, the second column holds
the actual class rather than the predicted class.
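The two layouts above can be produced with a short script. This is a sketch under stated assumptions: it treats the "#case" line as a literal header row, uses a tab as the delimiter, writes probabilities with four decimals to mirror the "0.####" pattern, and assumes the class strings are "B" and "B+E". None of these details is confirmed by the tool's documentation, and format_predictions and format_classes are hypothetical helpers.

```python
def format_predictions(cases, probabilities, delimiter="\t"):
    """Build the text of a '#case TPProbability' file."""
    lines = [f"#case{delimiter}TPProbability"]
    for case, probability in zip(cases, probabilities):
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability out of range for {case}")
        lines.append(f"{case}{delimiter}{probability:.4f}")
    return "\n".join(lines) + "\n"


def format_classes(cases, classes, delimiter="\t"):
    """Build the text of a '#case predClass' file.

    The same layout is used for the actual-values file, with the
    actual class in the second column.
    """
    lines = [f"#case{delimiter}predClass"]
    for case, cls in zip(cases, classes):
        if cls not in ("B", "B+E"):  # assumed class strings
            raise ValueError(f"unknown class for {case}: {cls!r}")
        lines.append(f"{case}{delimiter}{cls}")
    return "\n".join(lines) + "\n"
```

The returned string can be written to a .txt file with an ordinary open(path, "w") call and then selected in the evaluation tool.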
Output Files
ResultsEvals.txt
This file contains
the evaluation results for the result files from all contestants.
xxxx_roc_curve_points.txt
This file contains
the ROC curve points for each contestant's predicted result, for general-purpose
use.
For example,
x y prediction (not use)
xxx xxxx 0.#####
xxx xxxx 0.#####
xxxx_roc_curve.gnu contains the
definition of the ROC curve points, which is used by gnuplot.
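The points file can also be used to recompute an area independently of the tool. The sketch below assumes that the first two columns of xxxx_roc_curve_points.txt are the false positive rate (x) and true positive rate (y), that columns are separated by whitespace or commas, and that any line whose first two fields are not numbers is a header to be skipped. These are guesses about the file layout, and the function names are illustrative.

```python
def read_roc_points(text):
    """Parse (x, y) pairs from the text of a ROC points file."""
    points = []
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        try:
            x, y = float(fields[0]), float(fields[1])
        except (IndexError, ValueError):
            continue  # header, blank, or malformed line
        points.append((x, y))
    return sorted(points)


def trapezoid_area(points):
    """Integrate y over x with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

If the curve runs from (0, 0) to (1, 1), the resulting area should be close to the AUC reported in the tool's text box.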
This software will
give you the AUC for a data set. You will need to segment your data sets
properly in order to fill in the template with the appropriate values of
AUC. You will also need to make listings
of the data for the template as well. This software will not do this.
If you have any
questions about this software, please contact Trevor Stocki
at Trevor_Stocki@hc-sc.gc.ca.
Only solutions entered in the data entry template will be evaluated and considered by the panel of evaluators. Each data classification attempt will require submission of the data entry template. The template has 4 sections labelled in blue: Biographical Information, Model Development Data, Analysis Software Results, and Raw Classifier Results. A brief description of each major section follows:
Biographical Information Section
This section is used to report the following information: team members, contact information, task number, and data set used. Additionally, there are fields to enter the algorithm name and description. Information on the algorithm should be of sufficient detail that it is possible for the panel to completely reconstruct the team’s results.

Model Development Data Section
This section contains the indices and associated xenon concentrations that were used in the development of the algorithm. An example of how a single datum is entered is shown below.

Analysis Software Results
By using the software tool provided (see Appendix 1), algorithm performance will be measured and assessed. All performance information from the software tool is entered into this section.

Raw Classifier Results
The results of your team’s algorithm should be the classification of the data points provided into background and background + explosion classes. This section should contain the indices and associated xenon concentrations as classified by your algorithm under the appropriate heading. You will have to insert rows to accommodate all data points.
Template Instructions:
Organization
For example, if ACME, with John Smith as the contact person, is participating in ICDM, and his team is using the training data set and
working on Task 2, station W, the team would use the following filename:
John_Smith_ACME_Task2_StnW_Training.xls
If the same
organization is now working on Task 3, Test 3, using the training data set, and has a 3-parameter model (Parameter 1 +10% (I
for increased), Parameter 2 -10% (D for decreased), Parameter 3 Normal (N for
normal)), it would save the template as:
John_Smith_ACME_Task3_IDN_Training.xls
Or more explicitly,
Contact Name_Organisation_Task#_Parameter States_Data Set Name.xls
If the same organization is
now working on Task 3, Test 2, using the training data set and 40% of the data,
it would save the template as:
John_Smith_ACME_Task3_Training_40.xls
Or more explicitly,
Contact Name_Organisation_Task#_Data Set Name_% of Data used.xls
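The naming patterns above are regular enough to generate programmatically, which helps when many templates are submitted. The following sketch reproduces the three example filenames; template_filename and its arguments are hypothetical, and the ordering of optional parts is inferred from the examples rather than stated as a rule.

```python
def template_filename(contact, organisation, task, dataset,
                      station=None, parameter_states=None, percent=None):
    """Build a template filename following the examples above.

    station: station letter for per-station work, e.g. 'W'.
    parameter_states: one letter per parameter, I (increased),
        D (decreased) or N (normal), e.g. 'IDN'.
    percent: percentage of the data used, e.g. 40.
    """
    parts = [contact.strip().replace(" ", "_"), organisation, f"Task{task}"]
    if station is not None:
        parts.append(f"Stn{station}")
    if parameter_states is not None:
        if set(parameter_states) - set("IDN"):
            raise ValueError("parameter states must use only I, D, N")
        parts.append(parameter_states)
    parts.append(dataset)
    if percent is not None:
        parts.append(str(percent))
    return "_".join(parts) + ".xls"
```

For instance, template_filename("John Smith", "ACME", 3, "Training", percent=40) gives the Test 2 filename shown above.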
Specific details on reporting on task 3.
1) For test 1 of task 3, please report the probabilities and we will
calculate the AUCs.
2) For test 2 of task 3,
a) please start the tuning of your classifiers
from scratch
b) please use X (where X is 20%,
40%, 60%, 80%, and 100%) of the labelled data set
to tune your classifier,
c) then please run your
classifier on the entire unlabelled data and
d) report the results from the
unlabelled data set in the reporting template.
Note that you should generate a reporting template for each of the percentages
(i.e., 5 templates in this test).
3) For test 3 of task 3 (which actually concerns the Task 1 classifier), please do all the
work with the labelled data set and
report your AUCs as for tasks 1 and 2.