DNA Sequencing and Profiling

Image from Learn Genetics, shows Allele markers when printed.
Refer to https://learn.genetics.utah.edu/content/science/forensics

Introduction

When suspects are interviewed by police or the CPS they may be told that DNA gathered at the scene identifies them. Usually they are asked “why was your DNA found at the crime scene?” They are presented with a near certainty that the “gold standard” evidence will convict them. But how accurate is forensic DNA-based identification?

The state of the art is a technique called DNA Profiling. DNA is extracted from blood or a buccal (cheek) swab, and a small part of it is then isolated and processed. The technique relies on the observation that certain Alleles contain short tandem repeats (STR). The number of repeats is measured and used to estimate how many people would need to be tested before the same pattern turned up again. The numbers they come up with are staggering. This process is expensive.

In the past two decades another technique has emerged, known as Sequencing. The DNA molecule comprises 3 billion “base pairs”, organized into Chromosomes, Genes, and Alleles. Each base is coded as G, A, T or C. Every cell with a nucleus in an individual has Chromosomes in which the same Sequence of G, A, T and C is present. This process is very expensive.

The ultimate aim is to Sequence the Whole Genome (WGS). The result would be stored as a list of every base pair (all three billion of them). A few years ago this was expected to become routine, and a Human WGS was even advertised for $1000; however, you never see that offer anywhere today.

Technology marches on and thanks to an innovation in what is called “Next Generation Sequencing” (NGS) it may be possible to do a Human WGS for an affordable price. The problem is new technology can reveal new Biology. In this document I will discuss a possible transition to the new technology and its consequences.

DNA Testing Current State of Play

Nuclear DNA is said to comprise about three billion base pairs; you could extract it and in theory produce a string over a meter long. Groups of base pairs within that sequence are called genes, and there are roughly 20,000 to 25,000 of these. Some stretches of DNA are the same in everyone; others vary, and the variant forms are called Alleles. Alleles can vary from person to person. One type of variation is called a Short Tandem Repeat (STR), and these are of intense interest to forensic criminologists. Each person receives one Allele from each parent.

STRs are repeats of a short base pair sequence (sometimes known as a Sequence Motif). The number of Motifs in an Allele varies a lot from person to person. By standardizing on a set of Alleles it is claimed that a very useful tool for identifying individuals has been produced. Several organizations, most notably the FBI, have measured how frequently each variant of the chosen Alleles occurs in a population of volunteers. For each completed run of Capillary Electrophoresis (CE) we obtain a number of repeats, look up its frequency, and then multiply the fractions for all the Alleles together. Note: each person has two copies of each Allele, one from each parent, and the two copies may be the same, so the multiplication has to take that into consideration. The fractions produced are tiny; expressed as “odds”, the pattern should not be repeated even if we could compare several quintillion people. This is with the US standard set of 13 Alleles.
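The multiplication described here can be sketched in a few lines of Python. This is only an illustration: the Allele names and frequencies are invented, not taken from the FBI tables, and the treatment of the two copies follows the usual textbook assumption (the same repeat count twice gives p × p, two different repeat counts give 2 × p × q).

```python
from math import prod

# HYPOTHETICAL frequency table, invented for illustration only:
# Allele name -> {repeat count: fraction of the population carrying it}
FREQS = {
    "locus_A": {14: 0.10, 15: 0.25, 16: 0.20},
    "locus_B": {9: 0.15, 11: 0.30},
    "locus_C": {21: 0.20, 24: 0.10},
}


def genotype_frequency(p, q, same):
    """Expected fraction of people with this pair of repeat counts.

    Same count from both parents: p * p.
    Different counts: 2 * p * q (either parent could supply either one).
    """
    return p * p if same else 2 * p * q


def random_match_probability(profile, freqs=FREQS):
    """Multiply the per-Allele fractions together (assumes they are independent)."""
    fractions = []
    for locus, (a, b) in profile.items():
        p, q = freqs[locus][a], freqs[locus][b]
        fractions.append(genotype_frequency(p, q, a == b))
    return prod(fractions)


# A made-up three-Allele profile: two repeat counts per Allele.
profile = {"locus_A": (15, 16), "locus_B": (11, 11), "locus_C": (21, 24)}
rmp = random_match_probability(profile)
print(f"match probability {rmp:.6f}, about 1 in {1 / rmp:,.0f}")
```

With these made-up numbers, three Alleles already give odds of roughly 1 in 2,800. Each additional Allele multiplies in another small fraction, which is how 13 Alleles reach the enormous figures quoted.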

In the US the latest estimates put these odds at over one quadrillion to one. On those figures you would need far more people than live on Earth before you could expect to find a second individual with the same Allele variation. In the UK, using SGM+ and DNA-17, they stick to just one billion; even so it sounds quite impressive. With 13 Alleles the odds of finding another match are over a quintillion to one. With another 4 Alleles the odds should be multiple quintillions to one. This leaves the question of why the UK uses the one billion number. It is far smaller than anything the calculation would produce and yet big enough to impress a jury that a suspect has been identified. It must be a fiction.
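The scale of these odds can be checked with simple arithmetic. The sketch below assumes a world population of about eight billion, which is my own round figure and not one taken from the sources discussed here.

```python
WORLD_POPULATION = 8_000_000_000  # ASSUMPTION: rough present-day figure


def populations_needed(odds_to_one, population=WORLD_POPULATION):
    """How many Earth-sized populations make up the quoted 'N to one' odds."""
    return odds_to_one / population


for label, odds in [
    ("one billion", 1e9),
    ("one quadrillion", 1e15),
    ("one quintillion", 1e18),
]:
    print(f"{label} to one = {populations_needed(odds):,.3f} Earth populations")
```

On that assumption, one billion to one is less than a single Earth population, while one quadrillion to one corresponds to well over a hundred thousand of them.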

A few years ago the British CCRC referred a case where DNA testing revealed six matches. That was just in London, among people who were forced to give a DNA sample. Something seems to be wrong here: how can you get from needing several planets' worth of people to finding six matches just in London? One reason may be that the Database contained old entries, perhaps taken when only 4 Alleles were used.
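A rough way to see how a weak Profile could produce several matches is to estimate how many coincidental matches to expect when one Profile is searched against a whole Database. The Database size and match probabilities below are assumptions chosen for illustration; they are not figures from the CCRC case.

```python
def expected_coincidental_matches(match_probability, database_size):
    """Average number of unrelated people in the Database who match by chance."""
    return match_probability * database_size


def chance_of_at_least_one(match_probability, database_size):
    """Probability that at least one unrelated entry matches by chance."""
    return 1 - (1 - match_probability) ** database_size


DATABASE_SIZE = 5_000_000  # ASSUMPTION: illustrative size only

# ASSUMPTION: a weak Profile (few Alleles) versus the quoted UK figure.
for label, probability in [("1 in a million", 1e-6), ("1 in a billion", 1e-9)]:
    expected = expected_coincidental_matches(probability, DATABASE_SIZE)
    at_least_one = chance_of_at_least_one(probability, DATABASE_SIZE)
    print(f"{label}: expect {expected:.3f} matches, "
          f"chance of at least one {at_least_one:.3f}")
```

Under these assumptions a one-in-a-million Profile would be expected to match about five unrelated people, whereas a one-in-a-billion Profile would almost never do so. Several matches in one city are therefore what you would expect if the stored Profiles were far less discriminating than the headline odds suggest.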

There are known problems with STR analysis. What people are trying to do is take the results of natural Genetic evolution and map them on to a precise mathematical framework based upon statistics. Although the observation that some Alleles follow a repeat model is more or less valid, nature is not obliged to follow along. It has also been observed that the repeats themselves can vary, and so can the areas surrounding them, which are needed to identify the Allele and to extract it.

Two tests of the same DNA can give different results; this is known as Discordance. One reason is that the STR Alleles are isolated by a chemical process, and it does not always work as intended. Sometimes a sample that was processed using an older scheme has to be matched against one processed with a newer scheme (e.g. DNA-17), and they do not match. Although the UK likes to standardize on a single system, worldwide there are many systems in use. These can produce different results, for example when one system fails to detect an Allele that another picks up (a null allele). The practical reality of DNA Profile tests is a long way from the supposed certainty portrayed by criminal investigators.
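A toy model shows how two systems can disagree about the same person. Everything in it is hypothetical: the sequences, the kit names and the primers are invented, and a primer is treated as working only when it finds an exact match, which is a deliberate simplification of the real chemistry.

```python
# HYPOTHETICAL sequences and primers, invented for illustration only.
LEFT_FLANK = "CCTAGACGTTGCAGGCTT"
MUTATED_LEFT_FLANK = "CCTAAACGTTGCAGGCTT"  # one base changed inside kit_A's site
RIGHT_FLANK = "AACCGGTT"
MOTIF = "GATA"

# Each kit looks for a different stretch of the left flanking region.
KIT_PRIMERS = {"kit_A": "TAGACGTT", "kit_B": "GCAGGCTT"}


def build_allele(left_flank, repeats):
    """Return (full sequence, repeat count) for one copy of the Allele."""
    return left_flank + MOTIF * repeats + RIGHT_FLANK, repeats


def detected_repeat_counts(alleles, kit):
    """Repeat counts a kit reports; a copy is missed if its primer site is altered."""
    primer = KIT_PRIMERS[kit]
    return sorted(repeats for sequence, repeats in alleles if primer in sequence)


# One person: a normal copy with 12 repeats, and a copy with 14 repeats
# that also carries a single-base change in the flanking region.
person = [build_allele(LEFT_FLANK, 12), build_allele(MUTATED_LEFT_FLANK, 14)]

for kit in KIT_PRIMERS:
    print(kit, "reports repeat counts:", detected_repeat_counts(person, kit))
```

In this sketch kit_A reports only the 12-repeat copy, because the 14-repeat copy has become a null allele for that kit, while kit_B reports both. The two Profiles for the same person would not agree.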

One possible explanation is careless casework in the Lab. Another possibility is that the science behind these tests has a long way to go before it can be declared accurate.

Misleading DNA Evidence

Quite a lot can go wrong with poor Lab work (contamination). Peter Gill has written a book on this. It is a must-read for any Counsel defending a suspect who maintains innocence in spite of the “Gold Standard” of DNA evidence pointing to their guilt. To be honest, similar problems may occur with Sequencing.

Next Generation Sequencing

Sequencing the Genome has been a long-held goal in the Molecular Biology community; specifically, that means finding all the base pairs in the Chromosomes and the order in which they occur. Launched in October 1990 and completed in April 2003, the Human Genome Project was at the time an extraordinarily ambitious project. Since then there has been a revolution in the technology and the cost of Sequencing has fallen dramatically. However, the author is unaware of any case where somebody has been convicted or acquitted using Sequencing technology. It is not that it couldn’t be done, but you would need very deep pockets to do it. This is why STR length measurement remains the de facto approach.

Is there newer technology that can provide a more accurate identification? One very interesting Company to consider is Oxford Nanopore Technologies, from Oxford, England. They have invented a technology that can Sequence segments of DNA or RNA. What is really interesting is that they produce a range of devices that can do this, at prices running from what is appropriate for a Lab right down to a hobbyist price point. You could imagine a DNA Sequence being produced in a Solicitor’s office, but that is probably undue optimism. It is still not really possible to produce a WGS for a Human Genome using this technology, but it could be used to produce a Sequence for a part of Nuclear DNA.

A research team has taken the traditional set of Alleles used for forensic STR analysis and applied this Oxford company's NGS technology to them. I refer to the team led by Courtney Hall at the University of North Texas.

NGS is still a difficult technology to work with. It can be used to Sequence a relatively short DNA molecule very accurately, or a much longer one approximately. Now why would anyone want a less accurate Sequence? The answer is that it can be corrected.

Why would you need to apply NGS to a fragment of DNA when there is a process that has been approved by the Courts? The way the normal process works is that the STR Alleles are picked out and copied using what is known as the Polymerase Chain Reaction (PCR), with short chemical “primers” marking where each Allele starts and ends. This not only isolates the Allele but produces large numbers of copies of it, enough that they can be put on an Electrophoresis device for analysis. Once the process has finished you get a visual representation of the Alleles for that sample/person. I should say that this requires some very expensive lab equipment. The team instead started with an isolated Allele and applied the NGS technique to it. What they have focused attention on is quite stunning.

The Alleles should be a sequence of Short Tandem Repeats in which each repeat is the same. What they have focused attention on is that not all repeats are the same! There can be variation in some of the repeated Motifs, and not only in the STR itself but also in the flanking regions that surround it. It should be no surprise really; nature is not obliged to provide support for an identification technology. What is worse is that if the flanking region is not what is expected, that Allele can end up hidden.
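Variation inside the repeats is easy to picture once the repeat region is written out. The short sketch below splits a made-up repeat region into Motif-sized units and lists any unit that differs from the expected Motif. The sequence and the Motif are invented for illustration and do not come from the team's data.

```python
def split_into_units(repeat_region, motif_length):
    """Cut the repeat region into consecutive units of motif_length bases."""
    return [
        repeat_region[i:i + motif_length]
        for i in range(0, len(repeat_region), motif_length)
    ]


def variant_units(repeat_region, motif):
    """Return (position, unit) for every unit that is not the expected Motif."""
    units = split_into_units(repeat_region, len(motif))
    return [(index, unit) for index, unit in enumerate(units) if unit != motif]


# HYPOTHETICAL repeat region: four units, the third differing by one base.
region = "GATAGATAGACAGATA"
print(split_into_units(region, 4))
print(variant_units(region, "GATA"))
```

A length measurement would simply call this a four-repeat Allele. Only by reading the bases does the altered third unit become visible.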

NGS can see the hidden variation. This means it can accurately report on the structure of the Allele. Suppose a sample contains two variants of an Allele that happen to have the same length: NGS would see them as two Alleles, whereas measuring the length would recognize only one. Further, NGS can see the variation in the flanking region. It is simply more accurate than counting the repeats of the Motif.
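The difference between the two ways of calling an Allele can be shown with a small comparison. This is a sketch on invented reads, not real instrument output: one function reduces each read to a repeat count, as a length measurement effectively does, and the other keeps the full Sequence.

```python
MOTIF_LENGTH = 4  # ASSUMPTION: a four-base Motif, chosen for this toy example


def call_by_length(reads):
    """Distinct Alleles seen when only the number of repeats is measured."""
    return sorted({len(read) // MOTIF_LENGTH for read in reads})


def call_by_sequence(reads):
    """Distinct Alleles seen when the actual bases are read."""
    return sorted(set(reads))


# HYPOTHETICAL reads: two variants, both three units (12 bases) long.
variant_1 = "GATAGATAGATA"
variant_2 = "GATAGACAGATA"
reads = [variant_1, variant_2, variant_1, variant_2]

print("by length:  ", call_by_length(reads))
print("by sequence:", call_by_sequence(reads))
```

Here the length-based call reports a single three-repeat Allele, while the Sequence-based call reports two distinct Alleles.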

There is a project that will Sequence all the Alleles used for Forensic analyses.

The existing FBI Allele Frequency Databases were prepared before it was possible or economic to understand the consequences of the testing being blind to Motif variation. In the future when a complete Sequence is available for all Alleles it may be necessary to redefine the Alleles. Usually the Alleles are named following the location (Locus) on the Chromosome. The use of a Motif length may be regarded as too simplistic.

UK Forensic Community and Sequence Variation

The UK has now transitioned to DNA-17, a Profiling system. An obvious flaw is that the variation in Motifs discussed above is not recognized. Initially the UK used just 4 Alleles, moving to 6 Alleles in the SGM system, followed by SGMPlus with 10 Alleles. The UK NDNAD, though based upon SGMPlus, now also accepts Profiles with 17 Alleles. At one point the UK only allowed the SGMPlus system to be used. Part of this was a standard selection of “primers”, which are used to extract the Alleles. Because of genetic variation, a given choice of primers picks up some Alleles in preference to others. They now allow other systems (Multiplexes) to submit Profiles. This has generated a problem: the Profile produced can vary between Multiplexes, which is somewhat embarrassing. This is known as a Discordance. A new process known as Streamlined Forensic Reporting has been devised, which hopes to provide more consistency.

For somebody confronted with a test indicating that the DNA is theirs, it would be very useful to have the respective Alleles Sequenced to confirm or refute that analysis.

References

There is a brief video by the key researcher.

Courtney Hall.

STR-Spy
