[Part-1]Data science should recognize the complexities hidden behind real data, acknowledging the life and responsibilities that these datasets represent
Consider this scientific paper as an example to illustrate that many theoretical methods need to make practical compromises when facing real-world data.
No one can claim to know the truth, just as no one can determine what a "normal" gene is. We merely take the common and stable baseline that satisfies the Law of Large Numbers as a reference, calling it "normal," while those that differ from it are termed "variants". Variants often appear as "outliers" in the data, but each outlier hides countless truths.
This paper describes a problem that can be translated into the mapping relationship between 61.2 million gene iterations and 19 types of cancer.
1. If possible, consider forming a feature spectrum for gene characteristics and cancer traits. The feature spectrum, for genes or traits, is a summary of a series of information. These feature spectrums enable us to understand and describe the inherent structure of data more accurately. Choosing the right features requires a deep understanding and professional knowledge of the field. After selecting the features, we map them to a high-dimensional space to form the feature spectrum. If conditions permit, we can construct feature spectrums of various types and levels, eliminating secondary factors to improve the interpretability of the model. Once the model is established and refined, we can increase its dimensions, continually expanding the model's scale through tuning. This process may be repetitive, adding new dimensions each time, adjusting and optimizing the model to ensure its performance and effectiveness.
However, we must be cautious not to inadvertently discard crucial biological signals as noise. Statistical methods should be used to ensure that discarded information does not affect the performance of the model.
......
(To be continuous)
xiaowen kang 2023.7.18.4.57 KangV1
1. Dietlein, F. et al. Genome-wide analysis of somatic noncoding mutation patterns in cancer. Science 376, eabg5601 (2022).
https://lnkd.in/ex4B-ucG
Marketing Manager & Editor: PharmiWeb.Jobs - Global Life Science Job Board
2wAbout #Fortea: Fortrea is known for providing customized clinical trial delivery solutions globally, using a consultative approach to meet individual sponsor needs. It's recognized for scientific excellence, innovation in clinical development, and a commitment to pushing the boundaries of what's possible in clinical trials.