Brief
Welcome to The NASA Breath Diagnostics Challenge!
The National Aeronautics and Space Administration (NASA) Science Mission Directorate (SMD) seeks innovative solutions to improve the accuracy of its NASA E-Nose as a potential clinical tool that would measure the molecular composition of human breath to provide diagnostic results. We invite data scientists and AI experts to participate in this challenge, leveraging their expertise to develop a classification model that can accurately discriminate between the breath of COVID-positive and COVID-negative individuals, using data obtained from a recent clinical study. The total prize pool for this competition is $55,000.
The objective of this challenge is to develop a diagnostic model by using NASA E-Nose data gathered from exhaled breath of 63 volunteers in a COVID-19 study. Challenge participants will use advanced data preparation and AI techniques to overcome the limited sample size of subjects in the COVID-19 study. The innovative solutions emerging from this challenge may assist NASA in advancing the technical capability of the NASA E-Nose for a wide range of clinical applications relevant to human space exploration.
Note: The challenge is open exclusively to participants from FAR-designated countries. Individuals and teams from these countries are encouraged to register and participate.
Prizes
- 1st Place - $20,000
- 2nd Place - $12,000
- 3rd Place - $8,000
- 4th Place - $4,000
- 5th Place - $4,000
- 6th Place - $3,000
- 7th Place - $2,000
- 8th Place - $2,000
Timeline
- Competition Starts – July 5th, 2024
- Competition Ends – September 6th, 2024
- Winners Announcement – November 8th, 2024
Data Breakdown
IMPORTANT: This competition uses a unique structure to determine which models are eligible to win. It is extremely important that participants read the rules carefully before continuing. In particular, section 9 should be very well understood before embarking on this challenge.
The objective of this challenge is to develop a classification model that can accurately diagnose patients with COVID-19 based on data that was captured using NASA's own E-Nose device.
The total number of patients, and therefore examples, is 63. This is a very limited dataset, so making efficient use of the provided data is extremely important in this challenge. We encourage creativity in handling the data in order to make the best use of it.
For this reason, it is also very important to understand the rules governing this event and how submissions will ultimately be scored.
The data consists of 63 txt files representing the 63 patients, numbered 1 to 63.
Each file contains the Patient ID, the COVID-19 Diagnosis Result (POSITIVE or NEGATIVE) and numeric measurements for 64 sensors D1 to D64. These sensors are evenly distributed inside the E-Nose device and they measure different biochemical signals that can be present in the breath of the patients.
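Each patient file can be parsed into its ID, diagnosis, and sensor table. The sketch below assumes a simple header layout (two `Key: value` lines followed by a whitespace-separated sensor table); the exact layout is an assumption, so adjust the parsing to match the actual files.

```python
import io
import pandas as pd

def load_patient(text):
    """Parse one patient file into (patient_id, diagnosis, sensor_df).

    Assumed layout: a 'PatientID: ...' line, a 'Diagnosis: ...' line,
    then a whitespace-separated table with a Time column and D1..D64.
    """
    lines = text.strip().splitlines()
    patient_id = lines[0].split(":")[1].strip()
    diagnosis = lines[1].split(":")[1].strip()
    # Remaining lines: timestamp followed by the sensor readings.
    df = pd.read_csv(io.StringIO("\n".join(lines[2:])), sep=r"\s+")
    return patient_id, diagnosis, df

# Tiny synthetic example with only 2 of the 64 sensors.
sample = """PatientID: 1
Diagnosis: NEGATIVE
Time D1 D2
00:00 0.11 0.22
00:01 0.12 0.23"""
pid, label, sensors = load_patient(sample)
```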
All sensor data is indexed by a timestamp with the format Min:Sec, which represents the minute of the hour, and the second of that minute in which that sensor was sampled. The hour is left out, but when the minute counter resets, it means that the next hour has begun. Keep this in mind when working with this time axis.
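The minute rollover can be handled by converting each Min:Sec stamp into seconds elapsed since the first sample, adding an hour whenever the minute counter wraps around:

```python
def to_elapsed_seconds(stamps):
    """Convert 'Min:Sec' stamps to seconds elapsed since the first sample,
    adding an hour whenever the minute counter resets."""
    hours = 0
    prev_min = None
    out = []
    for s in stamps:
        m, sec = (int(x) for x in s.split(":"))
        if prev_min is not None and m < prev_min:
            hours += 1  # minute counter reset -> the next hour has begun
        prev_min = m
        out.append(hours * 3600 + m * 60 + sec)
    start = out[0]
    return [t - start for t in out]

to_elapsed_seconds(["58:50", "59:10", "00:05"])  # -> [0, 20, 75]
```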
In order to achieve maximum consistency across patients, each breath sample was exposed to the E-Nose device from a pulsation bag that had previously collected the patient's breath. The E-Nose device also records an ambient-air signal that can be used to normalize the exposed breaths.
For every patient, the breath was exposed to the E-Nose device in timed windows, following this process:
1. 5 min baseline measurement using ambient air
2. 1 min breath sample exposure and measurement, using the filled breath bag
3. 2 min sensor “recovery” using ambient air
4. 1 min breath sample exposure and measurement, using the filled breath bag
5. 2 min sensor “recovery” using ambient air
6. 1 min breath sample exposure and measurement, using the filled breath bag
7. 2 min sensor “recovery” using ambient air

Total time = 14 min
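The exposure schedule above can be encoded programmatically, for example to label each sample with its protocol phase. This sketch assumes time offsets are measured in seconds from the start of the 14-minute protocol:

```python
# Phase schedule in minutes, following the exposure protocol above.
PROTOCOL = [
    ("baseline", 5), ("breath", 1), ("recovery", 2),
    ("breath", 1), ("recovery", 2), ("breath", 1), ("recovery", 2),
]

def phase_at(t_seconds):
    """Return the protocol phase at a time offset (seconds from start)."""
    t = 0
    for name, minutes in PROTOCOL:
        t += minutes * 60
        if t_seconds < t:
            return name
    return None  # past the end of the 14-minute protocol

assert sum(m for _, m in PROTOCOL) == 14  # sanity check: total = 14 min
```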
The data is distributed as follows:
- Train: 45 patients
- Test: 18 patients
Within the dataset there are also 2 other files: submission_example.csv and train_test_split.csv. The first file shows how all submission files should look. The values should be 0 for NEGATIVE and 1 for POSITIVE.
Failing to follow this format will result in an error or a lower score.
The second file (train_test_split.csv) indicates which patient IDs belong to Train (i.e., are labeled) and which ones belong to Test (not labeled).
The order of the predictions in the submission file should match the order of the TEST rows in the train_test_split.csv file.
The index column of the submission file is NOT the ID of the patient; the order of values in the Result column must follow the order in the train_test_split.csv file. This is very important.
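A sketch of assembling a correctly ordered submission. The column names and split file contents here are assumptions for illustration; check submission_example.csv and train_test_split.csv for the exact format.

```python
import pandas as pd

# Hypothetical contents of train_test_split.csv: one row per patient.
split = pd.DataFrame({
    "PatientID": [3, 7, 12, 21],
    "Set": ["TRAIN", "TEST", "TRAIN", "TEST"],
})
# Model predictions keyed by patient ID (0 = NEGATIVE, 1 = POSITIVE).
preds = {7: 1, 21: 0}

# Keep only the TEST rows, preserving file order, and emit Result values.
test_ids = split.loc[split["Set"] == "TEST", "PatientID"]
submission = pd.DataFrame({"Result": [preds[i] for i in test_ids]})
submission.index.name = "Index"  # index is row order, NOT the patient ID
submission.to_csv("submission.csv")
```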
The evaluation metric is Accuracy.
The leaderboard is split into a Public and a Private leaderboard. The preliminary results used to advance to the final evaluation stage will be determined by the Private Leaderboard, which will be revealed at the end of the competition period.
Again, please refer to the rules, in particular sections 7 to 9 to understand the particular evaluation criteria for this challenge.
Please note that the Public Leaderboard is mostly for reference, as it represents only a very rough assessment of a model's performance. The final score may deviate substantially from this score.
Any attempt to "game" this score or artificially inflate it will bring no benefit to the final score, and could end in disqualification if the model is found to be purposely overfitting this value.
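Since the final score incorporates internal cross-validation experiments (see rule 9), it is worth estimating your model's stability with your own cross-validation on the 45 labeled patients. A minimal sketch with scikit-learn, using synthetic stand-in features and LogisticRegression purely as a placeholder model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in: 45 labeled patients x a few extracted features.
rng = np.random.default_rng(0)
X = rng.normal(size=(45, 8))
y = np.arange(45) % 2  # placeholder labels, roughly balanced

# Stratified folds keep the POSITIVE/NEGATIVE ratio stable in each fold,
# which matters with so few examples.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy"
)
mean_acc = scores.mean()
```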
We wish you good luck in this challenge. If there are questions, please refer to the FAQ, Rules or send an email to [email protected].
Rules
1. This competition is governed by the following Terms of Participation. Participants must agree to and comply with these Terms to participate.
2. Users can make a maximum number of 2 submissions per day. If users want to submit new files after making 2 submissions in a day, they will have to wait until the following day to do so. Please keep this in mind when uploading a submission.csv file. Any attempt to circumvent stated limits will result in disqualification.
3. The use of external datasets is strictly forbidden.
4. It is not allowed to upload the competition dataset to other websites. Users who do not comply with this rule will be immediately disqualified.
5. The final submission must be selected manually before the end of the competition (you can select up to 2); otherwise it will be selected automatically based on your highest public score.
6. If, at the end of the competition, two or more participants have the same score on the private leaderboard, the participant who submitted the winning file first will be considered for the following review stage.
7. A competition prize will be awarded after we have received, successfully executed, confirmed the validity of both the code and the solution (see 8), and calculated the final challenge score (see 9).
Once the competition period ends, our team will reach out to top scorers based on the Private Leaderboard score, which will be revealed at this point. Top scorers will be asked to provide the following information by September 16th, 2024 to qualify for the final review stage. Failure to provide this information may result in disqualification.
a. All source files required to preprocess the data
b. All source files required to build, train and make predictions with the model using the processed data
c. A requirements.txt (or equivalent) file indicating all the required libraries and their versions as needed
d. A ReadMe file containing the following:
• Clear and unambiguous instructions on how to reproduce the predictions from start to finish including data pre-processing, feature extraction, model training, and predictions generation
• Environment details regarding where the model was developed and trained, including OS, memory (RAM), disk space, CPU/GPU used, and any environment configurations required to execute the code
• Clear answers to the following questions:
- Which data files are being used?
- How are these files processed?
- What is the algorithm used and what are its main hyperparameters?
- Any other comments considered relevant to understanding and using the model
8. The submitted solution should be able to generate exactly the same output that gives the corresponding score on the leaderboard. If the score obtained from the code is different from what’s shown on the leaderboard, the new score will be used for the final rankings unless a logical explanation is provided. Please make sure to set the seed or random state appropriately so we can obtain the same result from your code.
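A small helper like the following pins the common sources of randomness in Python so reruns reproduce the same predictions (the helper name is our own; extend it with framework-specific seeding, e.g. for PyTorch or TensorFlow, if you use them):

```python
import os
import random
import numpy as np

def set_seed(seed=42):
    """Pin the common Python-side sources of randomness so that
    rerunning the pipeline reproduces the same predictions."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)

# Two runs with the same seed yield identical random draws.
set_seed(42)
a = np.random.rand(3)
set_seed(42)
b = np.random.rand(3)
assert (a == b).all()
```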
9. Given the particularly small size of the dataset for this competition, additional measures are in place to determine eligibility for winning a prize.
To be considered a winner in this competition, the following criteria must be met, in order:
1. All the required information should have been provided. (see 7)
2. The scores in the private and public leaderboard should be reproducible (see 8)
3. The bitgrit and NASA teams will perform internal scoring, cross-validation, and similar experiments with the provided model, using the same test size. If the results are inconsistent with the score on the leaderboard (>10% difference between the Overall Score, i.e. Private + Public Scores, and the Internal Score), then only the Internal Score will be considered as the Final Score. If the results are consistent, then the Final Score will be calculated as the average of the Overall Score and the Internal Score.
4. The final ranking will be produced using the Final Score, and the winners will be awarded according to this ranking.
10. In order to be eligible for the prize, the competition winner must agree to transfer to the Host and the relevant transferee of rights in such Competition all transferable rights, such as copyrights, rights to obtain patents and know-how, etc. in and to all analysis and prediction results, reports, analysis and prediction model, algorithm, source code and documentation for the model reproducibility, etc., and the Submissions contained in the Final Submissions.
11. Any prize awards are subject to eligibility verification and compliance with these Terms of Participation. All decisions of bitgrit will be final and binding on all matters relating to this Competition.
12. Payments to winners may be subject to local, state, federal and foreign tax reporting and withholding requirements.
13. If you have any inquiries about this competition, please don’t hesitate to reach out to us at [email protected].
Non-Disclosure Agreement (NDA)
An agreement to not reveal the information shared regarding this competition to others.
- This Non-Disclosure Agreement (“Agreement”) is hereby entered into on 25th April 2026 (“Effective Date”) between you (“Participant”), as a participant in The NASA Breath Diagnostics Challenge (the “Competition”) hosted at bitgrit.net (the “Competition Site”), and bitgrit Inc. (“Bitgrit”).
- Purpose: This Agreement aims to protect information disclosed by Bitgrit to Participant (the “Purpose”).
- Confidential Information: (1) Confidential Information shall mean any and all information disclosed by Bitgrit to the Participant with regard to the entry and participation in the Competition, including (i) metadata, source code, object code, firmware, etc. and, in addition to these, (ii) analyses, compilations or any other deliverable produced by the Participant in which such disclosed information is utilized or reflected. (2) Confidential Information shall not include information which: (a) is now or hereafter becomes, through no act or omission of the Participant, generally known or available to the public, or enters the public domain through no act or omission by the Participant; (b) is acquired by the Participant before receiving such information from Bitgrit and such acquisition was without restriction as to the use or disclosure of the same; (c) is hereafter rightfully furnished to the Participant by a third party, without restriction as to use or disclosure of the same.
- Non-Disclosure Obligation: The Participant agrees: (a) to hold Confidential Information in strict confidence; (b) to exercise at least the same care in protecting Confidential Information from disclosure as the party uses with regard to its own confidential information; (c) not to use any Confidential Information except as it concerns the Purpose elaborated upon above; (d) not to disclose such Confidential Information to third parties; (e) to inform Bitgrit if it becomes aware of an unauthorized disclosure of Confidential Information.
- No Warranty: All Confidential Information is provided “as is.” None of the Confidential Information shall contain any representation, warranty, assurance, or integrity by Bitgrit to the Participant of any kind.
- No Granting of Rights: The Participant agrees that nothing contained in this Agreement shall be construed as conferring, transferring or granting any rights to the Participant, by license or otherwise, to use any of the Confidential Information.
- No Assignment: Participant shall not assign, transfer or otherwise dispose of this Agreement or any of its rights, interest or obligations hereunder without the prior written consent of Bitgrit.
- Injunctive Relief: In the event of a breach or the possibility of breach of this Agreement by the Participant, in addition to any remedies otherwise available, Bitgrit shall be entitled to seek injunctive relief or equitable relief, as well as monetary damages.
- Return/Destruction of the Confidential Information: (1) On the request of Bitgrit, the Participant shall promptly, in a manner specified by Bitgrit, return or destroy the Confidential Information along with any copies of said information. (2) Bitgrit may request the Participant to submit documentation to confirm the destruction of said Confidential Information to Bitgrit in the event that Bitgrit requests the Participant to destroy this Confidential Information, pursuant to the provision of the preceding paragraph.
- Term: The obligations with respect to the Confidential Information under this Agreement shall survive for a period of three (3) years after the effective date. Provided however, if the Confidential Information could be considered to fall under the category of “Trade Secret” of Bitgrit or any related third parties, this Agreement is to remain effective relative to that information for as far as the said information is regarded as Trade Secret under applicable laws and regulations. If the Confidential Information contains personal information, the terms of this Agreement shall remain effective on that information permanently.
- Governing Law: This Agreement shall be governed by and construed and interpreted under the laws of Japan without reference to its principles governing conflicts of laws.