# Effects of Redundancy on Mutation Rates

We now consider in more detail the assumptions behind the analysis made in the article "The Mutation Problem". The calculations made in that article assume the standard model of population genetics, in which the effects of various harmful mutations on fitness are independent. We now consider whether relaxing this assumption can permit populations to endure a larger number of mutations and thereby remove the difficulty from the theory of evolution.

Suppose a population has an average of n harmful mutations per individual. Suppose N is the total number of base pairs. Assume N >> n (N is much larger than n). Then the probability of a harmful mutation at a particular base pair will be n/N and the chance of no harmful mutation will be 1 - n/N, which is nearly 1. Now, the number of mutations per individual will follow a binomial distribution. Let p be n/N and q be 1 - n/N; then the mean is Np, which is n, and the standard deviation is sqrt(Npq), which is about sqrt(n) because q is nearly 1.
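The binomial mean and standard deviation can be checked numerically. The following is only a sketch: the function name and the values of N and n are illustrative assumptions, chosen so that N is much larger than n.

```python
import math

def binomial_mean_sd(N, n):
    """Mean and standard deviation of Binomial(N, p) with p = n / N."""
    p = n / N
    q = 1.0 - p
    return N * p, math.sqrt(N * p * q)

# Illustrative values only (assumptions): N base pairs, n harmful mutations on average.
mean, sd = binomial_mean_sd(N=3_000_000_000, n=400)
print(mean, sd, math.sqrt(400))  # sd is very close to sqrt(n) when N >> n
```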

Now, let us assume that the genome has a large amount of redundancy. Suppose that there is only one gene and over 400 copies of it (an extreme case, for purposes of analysis). To make the analysis even simpler, we assume that this gene has only one base pair. Suppose the average number of copies with harmful mutations per individual is 400. Then the standard deviation is 20. The chance of an individual having over 420 corrupted copies is about 0.16, and the chance of having over 440 is about 0.025, or 1/40. Suppose that an individual survives if at most 439 copies of this gene are corrupted, but dies if 440 or more copies have a harmful mutation. Then about 1/40 of the population will die, and each such individual has about 40 extra harmful mutations. In this way, the death of one individual removes an excess of 40 harmful mutations from the population. This means that the population can endure one mutation per generation at equilibrium with only 1/40 of the zygotes dying due to harmful mutations. So we obtain a considerable improvement in this way. In fact, it may be that organisms in harsh environments make use of this mechanism to improve their ability to endure harmful mutations.
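The tail probabilities used here can be checked with the normal approximation. This is a sketch only; the helper `upper_tail` is a name introduced for illustration. The two-standard-deviation tail comes out a little under the rounded figure of 1/40.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

mean, sd = 400.0, 20.0
p_over_420 = upper_tail((420 - mean) / sd)  # one standard deviation above the mean
p_over_440 = upper_tail((440 - mean) / sd)  # two standard deviations above the mean

# With the rounded figure of 1/40 dying, each carrying about 40 excess mutations:
removed_per_generation = (1 / 40) * 40

print(p_over_420, p_over_440, removed_per_generation)
```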

Suppose all individuals with more than 400 harmful mutations die. This means that half of the zygotes die due to harmful mutations, which may be enough of a handicap to cause the species to become extinct. Then the average number of mutations in the individuals remaining is about 400 - .8 * 20, or 400 - 16. The population can then endure 16 mutations per generation with a death rate of 50 percent.

We justify this figure as follows: Let phi(x) be the normal density with mean zero and standard deviation one. Then phi(x) is 1/sqrt(2 pi) e ^ (-x * x / 2). If we eliminate all individuals with x > 0, then the average of the remainder of the distribution is

I(x phi(x)) / I(phi(x))

where I is integration from negative infinity to zero. Now, I(phi(x)) is 1/2, and since the antiderivative of x phi(x) is -phi(x), I(x phi(x)) is -phi(0). Thus the average is -phi(0)/(1/2) or -2 phi(0) or -2/sqrt(2 pi), which is about -.8. For a general normal (or binomial) distribution, the average will be -.8 times the standard deviation, or, -.8 sqrt(n). Thus a population can tolerate .8 * 20 or 16 mutations per generation as stated above with half of the zygotes dying due to genetic causes.
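The figure of about -.8 can be confirmed by direct numerical integration. The sketch below uses a simple midpoint rule with the standard normal density exp(-x*x/2)/sqrt(2 pi); the function names, the lower cutoff of -10, and the step count are assumptions made for illustration.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def lower_half_mean(steps=200_000, lo=-10.0):
    """Midpoint-rule estimate of the mean of a standard normal restricted to x <= 0."""
    h = (0.0 - lo) / steps
    num = 0.0  # approximates the integral of x * phi(x)
    den = 0.0  # approximates the integral of phi(x)
    for i in range(steps):
        x = lo + (i + 0.5) * h
        num += x * phi(x) * h
        den += phi(x) * h
    return num / den

closed_form = -2.0 * phi(0.0)  # equals -2 / sqrt(2 pi)
numeric = lower_half_mean()
print(closed_form, numeric)
```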

Now, suppose that there are many groups of genes having high redundancy as above. Suppose for example that there are M groups of genes, and that the individual survives only if every group has at most 439 damaged genes. Suppose that n is such that each group has an average of 400 harmful mutations. Then the chance for a given group to have 440 or more mutations is about 1/40, so the chance for a group not to have this many mutations is 1 - 1/40, and the chance that no group has 440 mutations is (1 - 1/40) ^ M. Suppose M is such that this quantity is about 1/2 (that is, M is about 27). Each individual that dies has about 440 mutations in the failing group. Since half of the zygotes die, it follows that the remainder have an average of about 360 mutations, 40 below the mean. Thus the population can tolerate 40 mutations per generation with half of the zygotes dying each generation. So we obtain a large tolerance to mutations in this way.
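The value of M follows from solving (1 - 1/40) ^ M = 1/2 by taking logarithms. A short check (variable names are illustrative):

```python
import math

p_group_fails = 1 / 40  # chance that a given group reaches 440 mutations

# Solve (1 - p_group_fails) ** M == 0.5 for M.
M = math.log(0.5) / math.log(1.0 - p_group_fails)
survival_at_27 = (1.0 - p_group_fails) ** 27

print(M, survival_at_27)
```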

With numbers smaller than 400, of course, the ability to endure high rates of mutation is much less. Even a redundancy of 10 would seem extreme, since it would mean that a human would need only 10,000 genes (of 100,000) to survive. Suppose there are many groups of genes, and in each group, there are an average of 10 harmful mutations. Each group has more than 20 genes in all. If there are very many groups, then the chance is large that one of them will have a number of mutations much larger than average. For example, if there are 10,000 groups, then at least one of them will probably have a number of mutations which is 3.5 standard deviations more than the average. The standard deviation is sqrt(10) or about 3, so the chance is high that one group will have 20 or more mutations. Suppose that the organism dies if any of these groups of 20 or more genes has more than 20 genes damaged by mutations. If there are enough groups so that about half of the individuals die, then the population can endure over 20 mutations per generation.
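The claim about 10,000 groups can be illustrated with the normal approximation, which is admittedly rough for a mean as small as 10. The sketch computes the expected number of groups lying more than 3.5 standard deviations above the mean, and the chance that at least one does; the helper name is an assumption.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

groups = 10_000
p_extreme = upper_tail(3.5)                       # one group exceeds mean + 3.5 sd
expected_extreme = groups * p_extreme             # expected number of such groups
p_at_least_one = 1.0 - (1.0 - p_extreme) ** groups

print(expected_extreme, p_at_least_one)
```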

We now give tighter bounds on mutation rates possible with redundancy. Suppose that humans have 100,000 genes and 10-fold redundancy. Let p be the chance that a gene will have a harmful mutation which destroys its function. The chance that a given group of 10 will be non functional is p^10. The chance that it will be functional is 1 - p^10. The chance that all 10,000 groups will be functional is (1 - p^10)^10,000. The optimum rate of mutation is reached when (1 - p^10)^10,000 is about 1/2. If it is much larger than this, too many zygotes will die. If it is much smaller, then too few mutations will be removed by deaths. Thus the optimum rate of mutation is reached when p^10 is about 7/100,000, which means that p is about .38. At this rate, each group of 10 will have an average of 4 mutated genes, and thus the excess over average of those that are nonfunctional is only 6. So we can expect at most 6 mutations per generation at equilibrium without the population dying out. (We give a better figure below, which is slightly less.) This also implies that an individual with about 38,000 genes randomly destroyed has about a 50 percent chance of surviving. This is hard to believe!

For 5 fold redundancy, there will be 20,000 groups of 5. The optimum is reached when (1 - p^5) ^ 20,000 is about 0.5, or, p^5 is about .000035. Then p is about .13. The group of 5 will have an average then of .65 mutations, and the population will tolerate 4.35 mutations per generation at equilibrium. (The true figure is less, as shown below.) This implies that an individual with about 13,000 genes randomly destroyed has about a 50 percent chance of surviving. This is also very hard to believe.

For 2 fold redundancy, there will be 50,000 groups of 2. The optimum is reached when (1 - p^2) ^ 50,000 is about 0.5, or, p is about 0.004. Thus at equilibrium we obtain about 2 mutations per generation at the most. (The true value is about 1.4, as we will show.) A 2 or 3 fold redundancy seems to be about the most we can hope for in general. This implies that an individual with 400 genes randomly chosen and destroyed will have a 50 percent chance to survive. Even this is hard to believe.
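The three values of p above can be reproduced by solving (1 - p^fold) ^ groups = 1/2 directly. This is a sketch under the text's assumptions (100,000 genes, half of the zygotes surviving); the function name `optimum_p` is introduced for illustration.

```python
def optimum_p(fold, total_genes=100_000, survival=0.5):
    """Per-gene knockout probability p at which (1 - p**fold) ** groups == survival."""
    groups = total_genes // fold
    p_group_defective = 1.0 - survival ** (1.0 / groups)
    return p_group_defective ** (1.0 / fold)

for fold in (10, 5, 2):
    p = optimum_p(fold)
    # p, average mutated genes per group, and expected number of genes knocked out
    print(fold, round(p, 4), round(fold * p, 3), round(100_000 * p))
```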

We now give a more detailed calculation. Let us say a redundant group of genes (say, a group of 10) is defective if all the genes are mutated and functional if at least one gene is free from a harmful mutation. We assume that half of the zygotes die due to harmful mutation. If each group has M genes, then there are 100,000/M groups, and the chance that a group is defective is about .7 * (M/100,000). The chance that a gene has a harmful mutation (p) is about (.7 * (M/100,000))^(1/M). The average number of mutated genes per group is then about M * (.7 * (M/100,000))^(1/M). Let us call this quantity F, which depends on M.

Among all individuals, half of them have at least one defective group. Of the individuals which die, all of them have at least one defective group. Some of them have more than one defective group. A detailed calculation shows that the average number of defective groups among individuals which die due to genetic causes is about 1.4. Let F' be the average number of defective genes in functional groups. Then F' is slightly less than F. The average number of defective genes in an individual is then (100,000/M)*F' + .7 * (M - F'); the average for individuals which die is (100,000/M)*F' + 1.4 * (M - F') and the average for individuals which live is (100,000/M)*F'. After the defective individuals die, the average of the remaining individuals is (100,000/M)*F', which has been reduced by .7 * (M - F'). To maintain equilibrium, this number of mutations must be added each generation. Thus the population can bear at most .7 * (M - F') mutations per generation at a 50 percent death rate.

For M = 10, F (and F') are about 4, so we obtain 4.2 mutations per generation as the most the population can bear at this death rate. For M = 5, F' is about .65, so we obtain .7 * (5 - .65) or slightly over 3 mutations per generation. For M = 3, we obtain about 2.1 mutations per generation, and for M = 2, we obtain about 1.4. For M = 1, we obtain about .7. It turns out that, per copy of a gene, M = 1 is the most efficient in terms of enabling the population to endure a high rate of mutation, since the tolerable rate grows more slowly than M.
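These figures can be tabulated from the formula .7 * (M - F'). The sketch below substitutes F for F' (an assumption; the text notes that F' is slightly less than F), so the results are approximate. For M = 10 it gives a little over 4.2, because F is nearer 3.8 than the rounded value of 4. The function names are illustrative.

```python
def mutated_per_group(M, total_genes=100_000):
    """F: average number of mutated genes per group of M at a 50 percent death rate."""
    p_defective = 0.7 * M / total_genes  # chance that a whole group is defective
    p = p_defective ** (1.0 / M)         # chance that a single gene is mutated
    return M * p

def tolerable_rate(M):
    """Approximate bound .7 * (M - F), using F in place of F' (assumption)."""
    return 0.7 * (M - mutated_per_group(M))

for M in (10, 5, 3, 2, 1):
    rate = tolerable_rate(M)
    print(M, round(rate, 2), round(rate / M, 3))  # total rate and rate per gene copy
```

The last column, the tolerable rate divided by M, is largest at M = 1, which is one way to read the remark about efficiency.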

This suggests that 2 or 3 mutations per generation is about all that a species can endure. However, these scenarios are unreasonable for several additional reasons, meaning that the true rate must be significantly smaller. First, whenever a gene is damaged, there are usually some harmful effects to the organism. (This is a consequence even of the estimate that 9/10 of the mutations which change an amino acid are harmful.) Other genes can usually only partly make up for a damaged one. Also, genes have multiple effects, and each effect may be part of a different group of redundancies, which would tend to smooth out the effects of harmful mutations. Furthermore, many single mutations are known that have harmful effects. Genes thus affected cannot be part of such redundant groups.

Actually, it would not make sense for the genome to have 10-fold redundancy. This could only increase the acceptable mutation rate to about 4.2 per generation, as shown above. It would make more sense just to have one copy of each gene and a genome one-tenth the size. This would effectively reduce the mutation rate by a factor of 10, significantly more efficient than having 10-fold redundancy. The reason is that with less genetic material, there will be fewer total mutations at a given rate of mutation. The same argument applies to 3-fold or 2-fold redundancy. So it makes sense from the standpoint of survival to reduce redundancy in the genome as much as possible. This is another argument that the genome has very little redundancy, and therefore high rates of mutation cannot be tolerated.

Of course, some genes may be redundant. If there are only a small number, then their effect on the mutation rate will be small. In addition, there is another problem with assuming many redundant genes. Based on observation, human genes have a mutation rate of about 1 in 25,000 per generation. This is based on the rates of occurrence of genetic diseases. If these diseases derive from the failure of a group of 3 redundant genes, then each gene in the group would have to have a mutation frequency of the cube root of 1/25,000 in the population, which is about one in 30 individuals! If these genes were recessive, the rate would have to be the square root of 1/30, which is about one in 5.5. I am not aware that any harmful mutations are this common in the human population. In general, at equilibrium genetic diseases based on a redundant group of genes should have a similar frequency as others in the population, but a larger frequency of mutated genes contributing to them.
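The root calculations are easy to verify. In this sketch the variable names are illustrative, and the only input is the observed rate of 1 in 25,000.

```python
disease_rate = 1 / 25_000

# Three redundant genes must all fail: per-gene frequency f satisfies f ** 3 == disease_rate.
per_gene_3fold = disease_rate ** (1 / 3)

# If each gene is also recessive, the allele frequency g satisfies g ** 2 == per_gene_3fold.
allele_recessive = per_gene_3fold ** 0.5

print(1 / per_gene_3fold, 1 / allele_recessive)  # "one in ..." figures
```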

If there are no such diseases observed with these characteristics, but the human genome has redundancy, this would be another evidence that the human race is young and has not reached equilibrium, since it would take longer for redundant groups of genes to reach equilibrium.

It is known that some genes interact in complex ways. Sometimes a trait (such as skin color) is determined by many genes. But in this case, each gene has a small effect, so we do not have redundancy as discussed above. Another possibility is for one gene A to control another gene B by turning B on and off. This is not an example of redundancy, either, because if A has a harmful mutation, it will not control B properly, and if B has a harmful mutation, it will not function properly when turned on. So either mutation by itself has a harmful effect. There is some genuine redundancy in the genome -- at least one gene (perhaps the one for RNA polymerase) has several copies on the genome -- but such redundancy seems to be rare. For example, there is only one gene for insulin, one for hemoglobin, and so on. Thus, we have additional reason to doubt that redundancy is common, supporting the thesis that a population cannot endure a high rate of mutation (one mutation per generation or more). Apparently the Creator has chosen to make the genome small and to take precautions to reduce the number of harmful mutations as much as possible.

The redundancy scenarios given above assume that the death rate increases very slowly with the number of mutations up to some number, and then increases very rapidly. A more natural assumption is that the death rate increases smoothly with the number of harmful mutations. Let Q(x) be the probability that an individual with x harmful mutations dies due to harmful mutations. Then we can assume that Q(x) is monotonically increasing with x. We can also assume that this rate of increase does not get larger with increasing x, that is, the second derivative with respect to x is not positive.

Let us consider the possible values for Q(n), where n is the average number of harmful mutations per individual in the population. If Q(n) is about 1/2, then the slope at n is at most 1/(2n) (because Q is concave and nonnegative, its slope at n cannot exceed Q(n)/n), and we can expect about half of all individuals to die. There will not be much difference in the death rate for individuals in the population, since most of them will have a number of mutations differing from n by at most one or two times sqrt(n). A detailed calculation shows that the population can tolerate at most about Q(n)/(1 - Q(n)) mutations per generation. If Q(n) is 1/2, then this is at most one mutation per generation. If Q(n) is 3/4, then this is three mutations per generation. So we see that a large mutation rate will imply that most of the zygotes die and the species will be at a terrible handicap of fitness. This calculation assumes that the derivative of Q(x) with respect to x is constant up to about n; in fact, it will probably decrease, making the slope much smaller at n. This would mean that the population could tolerate only a much smaller rate of mutation.
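The bound Q(n)/(1 - Q(n)) is simple to evaluate for any assumed death rate. A minimal sketch, with a hypothetical function name:

```python
def max_mutations_per_generation(q_death):
    """Upper bound Q / (1 - Q) on tolerable mutations per generation, for death rate Q."""
    if not 0.0 <= q_death < 1.0:
        raise ValueError("death rate must be in [0, 1)")
    return q_death / (1.0 - q_death)

for q in (0.5, 0.75, 0.9):
    print(q, max_mutations_per_generation(q))
```

The bound grows quickly only as the death rate approaches 1, which is the handicap the argument points to.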

So, in summary, we see that an artificial scenario permits a population to endure huge rates of mutation, but natural assumptions reinforce the conclusion that large rates of mutation (one or more mutations per generation) lead to degeneration and extinction of a species.

## Partial Redundancy

There is another factor which makes it extremely unlikely that redundancy will affect the ability to endure high rates of mutation. This is the fact that each gene by itself will have some harmful effect if it is knocked out. Due to the many effects of a gene (pleiotropy), one would assume that each gene would have some selective disadvantage when it is knocked out (in both chromosomes). Suppose a gene has a selective disadvantage of 1/100. This means that its frequency of being knocked out in the population will be about 100 times the rate of mutation, probably at most about 1/1000. If we consider two genes, the probability that both of them will be knocked out in any individual is about one in a million. Even if it is fatal for both genes to be knocked out, this happens so rarely that it will not affect the survival of zygotes to any significant degree. It is only genes that have essentially no harmful effect when they are knocked out that might change the calculation of zygote survival bounds, but such genes should be very rare or even nonexistent. This means we can just as well ignore the effects of redundancy in these calculations.
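The arithmetic of this argument can be sketched as follows. The per-gene mutation rate of 1 in 100,000 is an assumption chosen so that the knockout frequency comes out at the 1/1000 upper figure mentioned above; the function name is illustrative.

```python
def knockout_frequency(mutation_rate, disadvantage):
    """Equilibrium knockout frequency, taken as mutation rate divided by selective disadvantage."""
    return mutation_rate / disadvantage

# Assumed values: mutation rate 1e-5 per gene, selective disadvantage 1/100.
freq = knockout_frequency(mutation_rate=1e-5, disadvantage=0.01)
both_knocked_out = freq ** 2  # two independent genes knocked out in the same individual

print(freq, both_knocked_out)
```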

According to (Proc. Natl. Acad. Sci. USA Vol. 94, pp. 8380-8386, August 1997), typical harmful mutations remain in the population for about 100 generations, which seems consistent with a selective disadvantage of 1/100 for them. We use this information to consider yet another model of redundancy. Suppose that organisms have a certain amount of adaptability, so they can compensate for a certain number of harmful mutations by change of habits. But when the number of harmful mutations becomes too large, this is no longer possible. Under the assumption of a selective disadvantage of 1/100 for harmful mutations, how many mutations per generation can the population bear, assuming an arbitrary amount of truncation selection?