23andMe’s New Admixture Date Estimator

I received a couple of inquiries asking what my thoughts were on the recently introduced 23andMe “Admixture Date Estimator”, which is detailed in their White Paper no. 23-14.

Their theory appears mathematically and statistically sound, and it relies heavily on the size and distribution of the ancestry segments in your genome. However, the model assumes that admixture is inherited in a single pulse, which is problematic for most users. To illustrate, I will use the results from one of the northern Iraqi Kurd samples I manage.

 

Fig 1 – Admixture Date Estimator for a northern Iraqi Kurd, Kurd C1

 

Here 23andMe suggests that this Kurd sample has:

  • 1 Chinese ancestor from 5+ generations ago
  • 1 Mongolian ancestor from 6+ generations ago
  • 1 Yakut ancestor from 6+ generations ago.

 

Fig 2 – Ancestry composition for Kurd C1

 

Looking at Fig 2, many would understandably feel that 23andMe’s ancestry composition report is inconsistent with the admixture dates for this Kurdish sample. How can a sample with such small Chinese, Yakut, Mongolian, and S Asian percentages have 100% Chinese, Yakut, Mongolian, and S Asian ancestors as recently as 5 or 6 generations ago, when an ancestor 5 generations back is expected to contribute about 3% of autosomal DNA on average?

I agree that this would be inconsistent, except one has to remember that 23andMe is extremely conservative in assigning minor admixture. I have previously written about how 23andMe’s ancestry analysis methodology can underestimate or minimize minor ancestry. I believe 23andMe realizes this themselves; otherwise they would not show this Kurdish sample as having Chinese, Yakut, Mongolian, and S Asian ancestors as recently as 5 or 6 generations ago, given the tiny percentages of minor admixture this sample was assigned on their ancestry composition report.

 

23andMe’s ancestry analysis method

First, I will digress a little and discuss how I believe 23andMe’s ancestry analysis method leads to an underestimation of minor admixture in an individual, which I think is relevant to this discussion. I will later return to discussing the estimation of admixture date for this case.

On average, an individual is expected to inherit about 3% of their autosomal DNA (1/32, or 3.125%) from a 100% Chinese ancestor 5 generations ago. The 3% is not an exact amount, because the actual share varies with how the DNA is reshuffled by recombination during meiosis.
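The arithmetic behind that figure can be sketched in a few lines of Python. This is only the textbook halving rule, assuming parents count as 1 generation ago; the function name is made up for illustration, and real inherited shares scatter around these averages because of recombination.

```python
def expected_fraction(generations_ago: int) -> float:
    """Average autosomal share inherited from one ancestor
    `generations_ago` generations back (parents = 1)."""
    return 0.5 ** generations_ago


# Print the average share for ancestors 1 to 7 generations back.
for g in range(1, 8):
    print(f"{g} generations ago: {expected_fraction(g):.2%}")
```

Under this rule an ancestor 5 generations back contributes 1/32 of the autosomal DNA on average, and one 6 generations back contributes 1/64.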

23andMe’s ancestry analysis algorithm basically divides your chromosomes into consecutive segments of 100 SNPs/markers each, and each segment is compared against their references. Minor admixture is very likely minimized or underestimated for the following reasons:

  1. The majority rule. For example, let’s say that in a particular 100 SNP segment you have 30 SNPs which are similar to their “Chinese” references, and 70 SNPs similar to their “S Asian” references. In this case, the “Chinese” markers would get ignored, and the whole segment would be assigned “S Asian”, because that is what the majority of the markers in that segment are.
  2. To make matters worse, the next step in their algorithm potentially reduces minor admixture further. To illustrate how this happens, let’s assume that the Support Vector Machine (SVM), which is a classification tool 23andMe uses, outputs the following assignment; chromosome 1: Z – X – Z – Z – Z – Z – Z – X – Z, where each X is a 100 SNP segment classified as “Chinese” and each Z is a 100 SNP segment classified as “Middle Eastern”.

During the next step, their DNA segment smoothing algorithm can change the X (Chinese) segments (the second and eighth segments above) to Z (Middle Eastern) segments, simply because they occur in a run of Z segments, even though those particular 100 SNP segments were originally classified as X, or Chinese, by their SVM. The following shows the changed output strand from the smoother:

chromosome 1, parent 1: Z – Z – Z – Z – Z – Z – Z – Z – Z
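The two effects described above can be mimicked with a toy Python sketch. To be clear, this is not 23andMe’s code: their classifier is an SVM and their smoother is more sophisticated than this. The functions `majority_label` and `smooth` are hypothetical stand-ins that only reproduce the behaviour in the example, namely a majority vote within a window and the relabelling of an isolated segment that sits between two agreeing neighbours.

```python
from collections import Counter


def majority_label(snp_labels):
    """Assign the whole window the label carried by most of its SNPs."""
    return Counter(snp_labels).most_common(1)[0][0]


def smooth(segments):
    """Relabel any single segment whose two neighbours agree with each
    other but not with it (a crude stand-in for a real smoother)."""
    out = list(segments)
    for i in range(1, len(segments) - 1):
        left, right = segments[i - 1], segments[i + 1]
        if left == right and segments[i] != left:
            out[i] = left
    return out


# Majority rule: 30 "Chinese" SNPs are outvoted by 70 "S Asian" SNPs.
window = ["Chinese"] * 30 + ["S Asian"] * 70
print(majority_label(window))

# Smoothing: the two isolated X segments are absorbed into the Z run.
chromosome_1 = ["Z", "X", "Z", "Z", "Z", "Z", "Z", "X", "Z"]
print(" - ".join(smooth(chromosome_1)))
```

In this toy version the 30 “Chinese” SNPs leave no trace after the majority vote, and the two X segments disappear after smoothing, which is the sense in which minor admixture can be lost twice.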

It is for those reasons that someone from Iran can get assigned 98% Middle Eastern by 23andMe, whereas someone from neighboring Pakistan may get 98% S Asian, in spite of the Pakistani and Iranian individuals sharing common ancestry in the not too distant past.

By minimizing or underestimating minor admixture, 23andMe’s ancestry analysis method can effectively hide older genetic input, say older than a couple of hundred years, and therefore would not be appropriate for those seeking to know about genetic contribution from ancestors who lived, say, 1000 years ago. This is because admixture from outside the individual’s current main clade is associated with smaller DNA segments, which most likely don’t make it through to 23andMe’s ancestry composition report. However, this is ok, because companies such as 23andMe or AncestryDNA tend to focus on more recent genealogical time frames.

On the flip side, small admixture percentages on 23andMe’s ancestry report, such as 0.2%, are most likely real, unlike similarly small outputs from allele frequency based programs such as ADMIXTURE, which are most likely noise.

Now back to this post’s main topic, which is 23andMe’s “Admixture Date Estimator”. I believe that 23andMe’s model assumes that the admixture is inherited in a single pulse, and that the other populations, including those which comprise the individual’s main clade, are themselves un-admixed. So in this example with the Kurdish sample, a 100% Chinese ancestor 5 generations ago would be reasonable if the other Kurds from whom Kurd C1 descended were themselves 100% “Middle Eastern” and had no “Chinese” admixture, which very likely would not be the case.

In fact, what is more likely is that Kurd C1’s other Kurdish ancestors had similar levels of Chinese admixture, perhaps from historical interactions with Scythians, Mittani, or Cimmerians, and therefore the date for Kurd C1’s 100% Chinese ancestor would get pushed back in time.
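To see why shorter segments point to an older date, here is a back-of-the-envelope Python sketch. It assumes the common approximation that, under a single pulse g generations ago, minor-ancestry segments average roughly 100/g centimorgans. This is a simplification I am using for illustration, not necessarily the formula in 23andMe’s white paper, and the segment lengths below are hypothetical, not Kurd C1’s actual values.

```python
def mean_tract_cm(generations_ago: float) -> float:
    """Rough mean length (cM) of minor-ancestry segments under a single
    pulse of admixture, using the ~100/g approximation."""
    return 100.0 / generations_ago


def implied_generations(mean_tract_length_cm: float) -> float:
    """Invert the approximation: the single-pulse date implied by an
    observed mean segment length."""
    return 100.0 / mean_tract_length_cm


# Hypothetical mean segment lengths: the shorter the segments,
# the older the implied admixture date.
for cm in (20.0, 10.0, 5.0, 2.0):
    print(f"mean segment {cm:4.1f} cM -> ~{implied_generations(cm):.0f} generations ago")
```

If many ancestors each carried a little old admixture, the segments would be short and scattered, so a single recent ancestor is not the only history that fits a small minor percentage.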

 

Admixture Date Estimation visualizes how admixed Kurds are relative to Middle Eastern references

Although the lower end of the scale (5 generations ago) may not be accurate, 23andMe’s admixture date estimation is nonetheless informative in that it visualizes how admixed an individual is relative to the reference populations they use. For this Kurdish sample, it shows how much East and South Asian admixture the individual carries relative to the Middle Eastern references. It can also quantify or visualize the amount of minor admixture you have and, if you think about it, hints that the percentages of minor admixture reported on 23andMe’s ancestry composition reports are extremely conservative.

 

References:

1- 23andMe Ancestry Composition

2- 23andMe Admixture Date Estimator

4 responses to “23andMe’s New Admixture Date Estimator”

  1. Dilawer Khan says:

    Others, including Kurds are encouraged to post their results here.

    To post your image, copy and paste your image url (BBC – for Bulletin Boards) from a photo sharing website such as imgur.com into the comments area

  2. Reza says:

    Referring back to 23andme’s analysis method, how does phasing against parents change the results of a segment?

    • Dilawer Khan says:

      Hi Reza,

      Nice to see you here. That’s a good question. In an allele frequency admixture program such as ADMIXTURE, phasing would not change the results, but with a haplotype segment comparison algorithm such as 23andMe’s, phasing can make a difference, and here is why. Suppose we have the following unphased segment output from 23andMe’s genotyping chip:

      AG
      TT
      AC
      CA

      With ADMIXTURE, each position on the chromosome independently conveys allele frequency information about the variant, so it does not matter if the variants are jumbled. With haplotype comparisons, however, it does. So let us assume that, after phasing the above segment with a parent, we find that the order is actually as follows:

      AG
      TT
      CA
      CA

      In other words, the child got A-T-C-C from the mother and G-T-A-A from the father, whereas UNPHASED it looked like A-T-A-C from the mother and G-T-C-A from the father.

      It may end up that the PHASED window A-T-C-C matches 23andMe’s Middle Eastern reference better, whereas the UNPHASED A-T-A-C window matches their S Asian reference better. So now, after phasing, the S Asian % would decrease slightly, and the Middle Eastern % would increase slightly.
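The example above can be written as a small Python sketch. It only shows how the same four genotypes resolve into different haplotype pairs depending on which allele is placed on which strand; the `haplotypes` helper and the `flips` argument are made up for illustration and do not represent how 23andMe or BEAGLE actually phase data.

```python
# The four unphased genotypes from the example, as (allele, allele) pairs.
genotypes = [("A", "G"), ("T", "T"), ("A", "C"), ("C", "A")]


def haplotypes(genotypes, flips):
    """Build two haplotypes from genotype pairs; where flips[i] is True,
    the two alleles at site i swap strands."""
    hap1, hap2 = [], []
    for (a, b), flip in zip(genotypes, flips):
        if flip:
            a, b = b, a
        hap1.append(a)
        hap2.append(b)
    return "-".join(hap1), "-".join(hap2)


# Reading the chip output as-is versus after learning from a parent
# that the third site was listed in the wrong order.
unphased = haplotypes(genotypes, [False, False, False, False])
phased = haplotypes(genotypes, [False, False, True, False])

print("unphased:", unphased)
print("phased:  ", phased)
```

Every single position carries the same two alleles in both cases, which is why an allele frequency method is unaffected, while the haplotype windows being compared against the references are different.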

      It is not an absolute requirement to have the parent’s data in order to phase. For example, I can phase using BEAGLE. However, having the parent’s data increases accuracy, as there is always a margin of error when using programs such as BEAGLE.

      Hope this helps.

  3. SeinundZeit says:

    I think 23andMe is capable of doing some interesting analyses with all the data that they have, like provide customers with a fineSTRUCTURE/ChromoPainter analysis.

    I really wish that they could go in that direction.
