Week 1 - Bioinformatics Focus - Gene-Disease Association Prediction

Project Scoping and Disease Selection

Objective: Define the project scope and select a single target disease for gene-disease association prediction.

Starting point: The general idea of predicting gene-disease associations using data from OpenTargets and the STRING database. Please check the Machine Learning Focus - Gene-Disease Association Prediction post for details of the initial data being used by the ML teams.

Tasks:

  • Select one specific disease for prediction:

    • Research disease categories in OpenTargets
    • Analyze data availability and quality for different diseases
    • Choose one disease based on criteria such as data abundance, research interest, and potential impact
    • Document selection criteria and rationale
  • Analyze data characteristics for the chosen disease:

    • Investigate the number of known gene associations
    • Assess the quality and completeness of available data
    • Identify any unique features or challenges associated with the selected disease
  • Define project scope:

    • Specify the exact prediction task (e.g., binary classification of gene-disease associations)
    • Outline the potential impact and applications of accurate predictions for the chosen disease
  • Create project documentation:

    • Write a detailed project objective
    • Justify the choice of the target disease
    • Specify evaluation criteria for the prediction task
    • Document relevant data sources (OpenTargets and STRING)
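
When comparing candidate diseases by data abundance, a small summary script can help. The following is a minimal sketch, not the team's actual pipeline: the rows, gene names (GENE_A, ...), scores, and the `min_score` cutoff are all hypothetical placeholders, and real rows would come from an OpenTargets bulk download.

```python
from collections import defaultdict

# Hypothetical placeholder rows: (disease, gene, association score in [0, 1]).
# These values are made up for illustration only.
associations = [
    ("diabetes mellitus", "GENE_A", 0.92),
    ("diabetes mellitus", "GENE_B", 0.61),
    ("diabetes mellitus", "GENE_C", 0.18),
    ("breast cancer", "GENE_D", 0.88),
    ("breast cancer", "GENE_E", 0.35),
]


def summarize_by_disease(rows, min_score=0.5):
    """Count distinct associated genes per disease, and how many of
    them have a score of at least min_score (an assumed cutoff)."""
    genes = defaultdict(set)
    strong = defaultdict(set)
    for disease, gene, score in rows:
        genes[disease].add(gene)
        if score >= min_score:
            strong[disease].add(gene)
    return {
        disease: {"n_genes": len(genes[disease]), "n_strong": len(strong[disease])}
        for disease in genes
    }


summary = summarize_by_disease(associations)
for disease, stats in sorted(summary.items()):
    print(disease, stats)
```

Counts like these cover only the "data abundance" criterion; research interest and potential impact still need to be argued in the written rationale.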

Expected outcome:

  • Selection of one target disease for prediction, with detailed justification
  • Analysis of data characteristics for the chosen disease
  • Clear project scope and objectives
  • Detailed project documentation including rationale for disease selection, objectives, and relevant data sources

@Sanzida @Savann @moneuron @Thuraya_Ayman @hahaharsini @Moh_Saiger

Are you available to meet with Anya this weekend to discuss your findings? If you have a preference, please let us know below (pick all time slots that may work):

  • Sat (08/03) morning Pacific
  • Sat afternoon Pacific
  • Sat evening Pacific
  • Sun morning Pacific
  • Sun afternoon Pacific
  • Sun evening Pacific

Please make sure to come prepared. We will use OpenTargets and STRING for bulk data. OMIM does not permit scraping and requires a license for API access, but it can still provide useful targeted data.

The meeting is open; all are welcome to attend.

Confirming Saturday 9am Pacific Meeting Time for the Bioinformatics Session

  • Please take a look at the tasks and come prepared for an interactive meeting.
  • If you are interested in this task but are not available for the Saturday meeting, please reply to this topic with a project proposal and/or questions.
  • Participation in one of the above is mandatory to be part of the Bioinformatics subteam.

Zoom Link: Launch Meeting - Zoom

Check out the ML meeting times as well. The ML meeting on Sunday will be led by Sam and will cover the initial analysis of the bioinformatics data.

Meeting Minutes (3 August 2024)

Project Definition

Project Scope

  • Level 1: develop a binary classification model for a single disease (diabetes mellitus or breast cancer)
  • Level 2: add a score/confidence level based on the amount and type of evidence from OpenTargets and STRING
  • Level 3: generalize the model to work with multiple diseases
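
For the Level 1 binary classifier, precision, recall, and F1 are one common choice of evaluation criteria. The sketch below is only an illustration of how they are computed: the labels and predictions are made-up placeholders (1 = gene associated with the disease, 0 = not), and the minutes do not specify which metrics the team will adopt.

```python
def binary_metrics(y_true, y_pred):
    """Compute precision, recall, and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions for six genes.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because known associations are usually far rarer than non-associations, precision and recall tend to be more informative here than plain accuracy.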

NOTE: you may need to explore the STRING API (documented on the "Access" page of the STRING site) to efficiently generate PPI networks for multiple diseases

PPI = protein-protein interaction
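
As a starting point for exploring the API, here is a rough sketch of requesting a PPI network from STRING's `network` endpoint in TSV format. The gene list, the `caller_identity` string, and the helper function names are placeholders made up for this example, and the column names used in the parser (`preferredName_A`, `preferredName_B`, `score`) should be checked against the current STRING API documentation before relying on them.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

STRING_API = "https://string-db.org/api"


def build_network_url(genes, species=9606, required_score=400):
    """Build a STRING network request URL (9606 = human taxon ID)."""
    params = {
        "identifiers": "\r".join(genes),  # carriage return encodes to %0D
        "species": species,
        "required_score": required_score,  # 0-1000 confidence cutoff
        "caller_identity": "gene_disease_project",  # placeholder name
    }
    return f"{STRING_API}/tsv/network?{urlencode(params)}"


def parse_network_tsv(text):
    """Turn the TSV response into (protein_a, protein_b, score) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        (row["preferredName_A"], row["preferredName_B"], float(row["score"]))
        for row in reader
    ]


def fetch_network(genes, **kwargs):
    """Download and parse a network (makes a real HTTP request)."""
    with urlopen(build_network_url(genes, **kwargs)) as response:
        return parse_network_tsv(response.read().decode("utf-8"))


url = build_network_url(["INS", "TCF7L2"])
print(url)

# Offline parsing demo with a made-up response row.
sample_tsv = "preferredName_A\tpreferredName_B\tscore\nGENE_A\tGENE_B\t0.7\n"
edges = parse_network_tsv(sample_tsv)
print(edges)
```

Calling `fetch_network` once per disease gene list would be one way to approach Level 3; for large gene sets, the STRING bulk download files may be a better fit than repeated API calls.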
