Y-DNA is passed down from father to son relatively unchanged. The changes or mutations that do occur define your haplotype¹ or DNA signature and when combined with the surname and the genealogical record may reveal other men who descend from the same Most Recent Common Ancestor (MRCA)². In genetic genealogy the DNA signature IS NOT a unique identifier for any individual and does not reveal any personal characteristics about the individual.
The Y-DNA test is made up of a series of non-coding DNA segments called STR markers³. The markers selected have been shown to be useful for genealogical purposes. The results are reported as allele⁴ values, the number of repeats that gives the length of each marker, such as 13, 24, 14, 11, 11, etc. This set of values represents your DNA signature. If two men descend from the same common ancestor, they will have a match or a close match of their DNA signatures, which means their results will be the same or similar.
What constitutes a match depends on how many markers are tested. The most common tests available today involve 12, 25 or 37 markers, with nearly a hundred being available overall. The more markers tested, the more definition you will have for your DNA signature. In many cases 12 or even 25 markers are not enough to determine if two men descend from the same common ancestor, because certain DNA signatures are very common.
All modern men descend from only a couple of dozen population groups known as haplogroups⁵ that originated before the last ice age some 12 to 20 thousand years ago. Men of European descent are most likely to be in a very large haplogroup known as R1b1. Haplogroups are defined by a mutation on certain SNP markers⁶, which are different from STR markers. Your haplogroup will be predicted by the testing company using your STR markers but can only be confirmed by a SNP test. The first 12 markers are useful in predicting haplogroups but not particularly useful in defining individual families. DNA testing started for scientific purposes, not genealogical purposes.
Surnames only started being used in England 1000 years ago. Therefore, people from the same clan or family took different surnames. Two men from the same clan, one named Smith and one named Jones, may have shared a recent common ancestor. However, that usually cannot be determined from genealogical records. If they were also part of a large common haplogroup, their descendants might have exactly the same DNA signature when only 12 or 25 markers are compared. They would not be considered related today. Even two unrelated people from different clans may descend from the same haplogroup founder and have the same DNA signature due to random mutations.
The analysis of your DNA results involves the comparison of your DNA signature with that of one or more other people. A person might have a paper trail or suspect that they and another person share a common ancestor. The DNA test may confirm that. In some cases a person will match another person with the same surname where it was not known that the respective ancestors were related. This tells the researcher where to focus their research as well as which family line(s) to avoid because they are not related.
In some cases there will be a number of members in the project who are known to be related, or who at least have matching DNA signatures. The project administrator can group these people together and begin to do more advanced analysis. Each matching group will have the same values in certain defining markers which distinguish one group from another. The defining markers will be different for each group and usually have allele values which are uncommon in the general population (based on the frequency distribution of the known values for a given marker).
Defining markers are based on mutations of the marker value from a previous state such as the R1b1 modal haplotype or a family modal haplotype. Some mutations are more recent and thus do not define a large segment of the family. More distant mutations may be defining markers for a subset of the larger family.
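As a rough illustration of the idea above, defining markers can be found by comparing a group's values with a reference haplotype and keeping the markers that differ. This is only a sketch: the function name, the marker names and every value below are made up for the example and are not an actual R1b1 or family modal haplotype.

```python
# Sketch only: marker names and values are illustrative assumptions,
# not real modal haplotypes.

def defining_markers(group_modal, reference):
    """Return the markers whose group value differs from the reference.

    Each result maps marker -> (reference value, group value).
    Markers missing from the reference are skipped.
    """
    return {
        marker: (reference[marker], value)
        for marker, value in group_modal.items()
        if marker in reference and reference[marker] != value
    }


# Hypothetical reference (the "previous state") and family modal values.
reference_modal = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS385a": 11}
family_modal = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10, "DYS385a": 11}

result = defining_markers(family_modal, reference_modal)
print(result)  # {'DYS390': (24, 25), 'DYS391': (11, 10)}
```

In this made-up case two markers have changed from the reference, so those two would be the candidates for defining markers of the group. Whether a differing value is actually uncommon in the general population still has to be checked separately.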
The testing company reports matches based on a very narrow definition that does not take defining markers into consideration. It relies solely on the number of mismatches or mutations between any two people. This is known as genetic distance.⁷ The company does not have the knowledge or ability to analyze groups of people from a genealogical perspective. Therefore it is entirely possible that two people known to be related would not be declared a match by the testing company if the number of mutations for a given family was higher than normal.
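Genetic distance, as described above, is simple enough to sketch in a few lines. The example below is an assumption-laden illustration, not the testing company's actual method: the function name and marker values are hypothetical, and two counting rules are shown because the text does not say whether a marker that differs by two repeats counts as one mismatch or as two mutations.

```python
# Sketch only: haplotypes are dicts of marker name -> repeat count.
# Marker values are invented for illustration.

def genetic_distance(a, b, stepwise=False):
    """Count differences between two haplotypes on the markers both tested.

    stepwise=False: every mismatched marker counts as 1.
    stepwise=True:  every marker counts by how many repeats apart it is.
    """
    shared = a.keys() & b.keys()
    if stepwise:
        return sum(abs(a[m] - b[m]) for m in shared)
    return sum(1 for m in shared if a[m] != b[m])


man_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS385a": 11}
man_b = {"DYS393": 13, "DYS390": 26, "DYS19": 14, "DYS391": 10, "DYS385a": 11}

print(genetic_distance(man_a, man_b))                 # 2 mismatched markers
print(genetic_distance(man_a, man_b, stepwise=True))  # 3 repeat steps in total
```

Either way, the result is a single number that says nothing about which markers differ, which is why it cannot take defining markers into account.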
All mutation is random and not consistent over time. So, one family might not have had a single mutation in the last 1000 years and another family might have had more than one in the last 100 years.
The project administrator can look at groups of similar haplotypes and see possible patterns that are not apparent to the testing company. This usually involves people with the same surname. In many cases it is obvious that two people with similar DNA signatures who have not been declared a match by the testing company do in fact share a common ancestor.
A part of DNA analysis promoted by testing companies is a computer calculation known as the time to the most recent common ancestor (TMRCA), which is expressed as a range in the number of generations to the MRCA. A generation is usually considered to be 25 or 30 years. This is based on the genetic distance between two people and the known mutation rate of the Y-STR markers. Genetic distance is based on the transmission events⁸ for both lines being compared. The mathematical formula assumes that both lines conform to the known mutation rate, which is seldom true for individual families within genealogical time frames. While the TMRCA can be useful in general terms, it can also be misleading if a family or a branch of a family has more or fewer mutations than the mutation rate would indicate. The computer also cannot identify back mutations⁹ or parallel mutations¹⁰, which complicate the TMRCA calculations.
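To see why the TMRCA depends so heavily on the assumed mutation rate, a back-of-the-envelope version can be sketched as follows. This is not the calculation the testing companies use (they report a range, not a single number); it is a simplified point estimate. It assumes every marker mutates at the same rate, and that over g generations each marker passes through 2 x g transmission events (g in each line), so the expected genetic distance is roughly 2 x g x markers x rate. The function name and the rate value are placeholders chosen for illustration.

```python
# Simplified point estimate, NOT a testing company's TMRCA formula.
# ASSUMED_RATE is a placeholder, not a published mutation rate.

def estimate_generations(genetic_distance, markers_compared, mutation_rate):
    """Estimate generations to the MRCA from expected mutation counts.

    Expected distance ~= 2 * generations * markers * rate, solved for
    generations. The factor 2 covers the two lines being compared.
    """
    if markers_compared <= 0 or mutation_rate <= 0:
        raise ValueError("markers_compared and mutation_rate must be positive")
    return genetic_distance / (2 * markers_compared * mutation_rate)


ASSUMED_RATE = 0.004  # hypothetical mutations per marker per generation

generations = estimate_generations(2, 25, ASSUMED_RATE)
for years_per_generation in (25, 30):
    years = generations * years_per_generation
    print(f"about {generations:.0f} generations, roughly {years:.0f} years")
```

Halving the assumed rate doubles the estimate, which shows how a family that mutates faster or slower than average can make the computed TMRCA misleading.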
Therefore the project administrator groups similar haplotypes together. There may or may not be a connecting paper trail for any two people. As groups grow in size and include different branches it is possible to determine the ancestral haplotype or ancestral signature¹¹ for the common ancestor. This is also known as the modal haplotype.¹² At this point the individual DNA signature is compared to the ancestral signature rather than that of other individuals. In many cases this gives a better representation of the distance between the individual and the common ancestor.
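The modal haplotype described above (the most common result for each marker in a group) can be sketched as shown below. The function names and the three sample haplotypes are hypothetical, and the sketch makes no attempt to handle ties, back mutations or parallel mutations.

```python
from collections import Counter

# Sketch only: sample haplotypes are invented for illustration.

def modal_haplotype(haplotypes):
    """Return the most common value for each marker tested by everyone."""
    if not haplotypes:
        return {}
    markers = set.intersection(*(set(h) for h in haplotypes))
    return {
        m: Counter(h[m] for h in haplotypes).most_common(1)[0][0]
        for m in sorted(markers)
    }


def distance_to_modal(haplotype, modal):
    """Count the markers on which one person differs from the modal."""
    return sum(1 for m in modal if m in haplotype and haplotype[m] != modal[m])


group = {
    "A": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11},
    "B": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10},
    "C": {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11},
}

modal = modal_haplotype(list(group.values()))
distances = {name: distance_to_modal(h, modal) for name, h in group.items()}
print(modal)
print(distances)
```

In this invented group, B and C differ from each other on two markers, yet each is only one mutation away from the modal haplotype, which is the sense in which comparing to the ancestral signature can give a better picture than comparing individuals.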
While DNA testing can determine that two or more people share a common ancestor, the identity of that ancestor can only be determined through genealogical research. While this limitation is frustrating, the benefit of DNA testing is that people can focus their attention on ancestors they know they are related to and avoid those they know they are not related to.
In some cases a person will get their results and not match any other person or group of people. This is a discouraging start to the process. However, like genealogy, DNA testing is an ongoing process. The number of people being tested is growing rapidly. As more people are tested in the future the likelihood of a match becomes greater. As people are associated with an existing group it is hoped that the combined genealogies will allow people to identify earlier ancestors on the family tree.
#. DNA=Deoxyribonucleic acid. Chemicals=adenine (A), thymine (T), guanine (G), cytosine (C)
1. Haplotype=DNA results for a set of markers. Haplotypes are also known as signatures
2. Most Recent Common Ancestor=The most recent person from whom two people both directly descend.
3. STR=short tandem repeat. AKA Y-STR.
4. Allele=Number of times the DNA chemical repeats for a given marker. (pronounced UH-leel)
5. Haplogroup=A population group defined by a specific SNP mutation. There are about 20 major haplogroups.
6. SNP=single nucleotide polymorphism. (pronounced SNIP)
7. Genetic distance=The number of differences, or mutations, between two sets of results.
8. Transmission event=The passage of genetic material from one generation to the next.
9. Back mutations=A marker mutates once and then reverts to its previous state.
10. Parallel mutations=Two people acquire the same marker value independently rather than it being passed down from an ancestor.
11. Ancestral haplotype=The MRCA's haplotype, deduced by comparing matching descendants' haplotypes and eliminating the mutations.
12. Modal haplotype=The most common result for each marker tested in a group of results. See ancestral haplotype.