Competition for test release on 28 Sep | bitgrit


Test
2 Participants
3 Submissions
Brief
yenta is a professional networking app launched by Japanese startup Atrae that uses artificial intelligence (AI) to optimize profile matching for its users. In hosting this competition, Atrae aims to improve the AI behind its platform so that yenta users can make new, valuable connections and expand their networks.

The goal of this competition is to optimize yenta's matching algorithm by predicting the compatibility of two app users. This ensures that the app recommends the most relevant profiles to each user, and winning algorithms submitted for this competition may be used to optimize yenta's profile recommendation algorithm. With your support, we aim to help people around the world make beneficial connections that last.

MAKE NEW CONNECTIONS WITH YENTA

Put simply, yenta is like Tinder for business professionals. Once you create a profile on the app, its native AI algorithm searches other users' profiles to find people you might be interested in meeting. You swipe right on profiles that interest you, and if the other person is also interested and swipes right on your profile, you can message each other, meet up, and submit reviews for one another.

Go to https://yenta.page.link/install_fbg to download and learn more about yenta, the professional networking app featured in this competition.

*The yenta app is currently only available in Japan and India. We encourage you to download the app to expand your network, get insider tips on new job opportunities, and gain insights that could give you an edge in this competition!
Prizes
  • 1st Prize ($10,000)
  • 2nd Prize ($7,500)
  • 3rd Prize ($2,500)

Timeline
  • 28 Sep 2020 Start
  • 31 Oct 2020 End
Data Breakdown
The goal of this competition is to predict the level of compatibility of two given users in order to improve yenta's profile recommendation algorithm. For this purpose, we classify the level of compatibility between user A and user B into 4 categories:

  • No Match = 0: At least one of user A or user B swiped left on the other, meaning there is no possibility of a match.
  • Match = 1: Both user A and user B swiped right on each other and matched.
  • Matched and met but unfavorable review = 2: Both user A and user B swiped right on each other and matched, then met. After the meeting, user A gave user B a review of 1-3 out of 5 (an "unfavorable" review).
  • Matched and met and favorable review = 3: Both user A and user B swiped right on each other and matched, then met. After the meeting, user A gave user B a review of 4-5 out of 5 (a "favorable" review).

To build this model, we provide subsets of data of 2 different types: user data and interaction data. Note that all of the data is anonymized through the use of IDs and multi-step vectorization models to ensure that user privacy is protected. IDs with low frequency are grouped into "other" categories with an ID of 999999.

  • User data: These files are connected through the user_id column (e.g. 41245)
    - user_ages.csv: User's age in years
    - user_educations.csv: User's school and (in some cases) degrees
    - user_works.csv: Companies where the user has worked, along with information about each company's industry and size
    - user_skills.csv: User's professional skills
    - user_strengths.csv: User's strengths, based on votes from other users after a meeting
    - user_purposes.csv: User's reasons for using the yenta app
    - user_self_intro_vectors_300dims.csv: User's vectorized profile biography (300 dimensions)
    - user_sessions.csv: User's app session logs
  • Interaction data: These files are indexed by from-to user_id pairs (e.g. 12345-52462)
    - interaction_review_comments_300dims.csv: User A's vectorized opinion of user B after meeting (300 dimensions)
    - interaction_swipes.csv: User A's swipe status of user B: 1 = right swipe (interested), -1 = left swipe (not interested)
    - interaction_review_strengths.csv: User A's review of the strengths of user B after meeting
  • Train and test files: To train the model, we provide a train.csv file with pairs of user IDs and their corresponding scores
    - train.csv: Score between (from) user A and (to) user B
    - test.csv: A list of "from-to" IDs in the form "userAid-userBid" to be predicted

The solution file should follow the form below. The from-to IDs must be the same IDs contained in the test.csv file, and they must be in the same order.

  • submission.csv
    from-to, score
    6280229-6293525, 1
    670384-50085, 2
    2271906-4685859, 1
    ...

A few minutes after submitting your solution, you will be able to see the accuracy of your solution on the submission page over a subset of the test data. Final competition results will be based on the Private Leaderboard, and the winner will be the person(s) at the top of the Private Leaderboard.
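As a rough illustration only, the label scheme and submission format described above can be sketched in Python. The helper names and the predict callback are hypothetical (there is no official starter code), and the assumption that an unmet or unreviewed match maps to label 1 is inferred from the category descriptions:

```python
import csv

def compatibility_label(swipe_a: int, swipe_b: int, review_a=None) -> int:
    """Map swipes (1 = right, -1 = left) and A's optional 1-5 review to 0-3."""
    if swipe_a != 1 or swipe_b != 1:
        return 0                      # no match
    if review_a is None:
        return 1                      # matched, but no review from A
    return 2 if review_a <= 3 else 3  # 1-3 unfavorable, 4-5 favorable

def write_submission(test_path: str, out_path: str, predict) -> None:
    """Write predictions in the same order as the from-to IDs in test.csv."""
    with open(test_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)                  # skip header row
        pairs = [row[0] for row in reader]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["from-to", "score"])
        for pair in pairs:
            # one decimal place, per the formatting FAQ
            writer.writerow([pair, f"{predict(pair):.1f}"])
```

The `predict` argument is a placeholder for your trained model; the key constraints are the row order and the header, which must match test.csv.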
FAQs
Who do I contact if I need help regarding a competition?
If you have any inquiries, please contact us at [email protected]
Why is my score 0.00037?
The predictions on your submission.csv file should be formatted with at least one decimal (e.g. 0.0 instead of 0, 1.0 instead of 1) as stated in the Guidelines, otherwise the accuracy will not be correctly displayed. We apologize for this inconvenience, and we appreciate your understanding as we work on supporting competition solutions following different formats.
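A minimal illustration of the required formatting (the prediction values here are made up):

```python
# The scoring system expects predictions with at least one decimal place,
# e.g. "1.0" rather than "1". Python's format spec handles this directly:
preds = [0, 1, 2, 3]
formatted = [f"{p:.1f}" for p in preds]
print(formatted)  # ['0.0', '1.0', '2.0', '3.0']
```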
Rules
1. This competition is governed by the following Terms of Participation. Participants must agree to and comply with these Terms to participate.

2. A competition prize will be awarded after we have received, successfully executed, and confirmed the validity of both the code and the solution. Once winners are announced and our team reaches out to them, the winners must provide the following by November 10, 2020 in order to avoid disqualification:
   a. All source files required to preprocess the data
   b. All source files required to build, train, and make predictions with the model using the processed data
   c. A requirements.txt (or equivalent) file indicating all required libraries and their versions
   d. A ReadMe file containing the following:
      • Clear and unambiguous instructions on how to reproduce the predictions from start to finish, including data pre-processing, feature extraction, model training, and prediction generation
      • Environment details regarding where the model was developed and trained, including OS, memory (RAM), disk space, CPU/GPU used, and any environment configurations required to execute the code
      • Clear answers to the following questions:
        - Which data files are being used?
        - How are these files processed?
        - What is the algorithm used and what are its main hyperparameters?
        - Any other comments considered relevant to understanding and using the model
   If these items are not provided or do not meet the minimum requirements listed above, we will not be able to award the winner their respective prize.

3. If two or more participants have the same score on the leaderboard, the participant who submitted the winning file first will be considered the winner.

4. The dataset used for this competition is derived from real-world data that has been anonymized, so please do not use any models developed with this data on similar matching services.

5. If you have any inquiries about this competition, please don’t hesitate to reach out to us at [email protected]. We ask that users do not contact Atrae directly.