Figure 3. Variations of token probability normalization. (A) Normalization strategies applied at two steps: retrieval of the individual token probabilities (1) and final calculation of the total gene probability from the tokens in the gene name (2). (B) Distribution of token lengths for protein-coding genes for which therapeutics are available (“is known target”) and for which they are not (“not known target”). (C) Validation metrics for the gene token normalization approaches in the target identification task.