5. Protein Techniques

Indirect Protein Sequencing Via Genomic Analyses

Topic summary

Created using AI

Indirect protein sequencing through genomic analyses is more efficient than direct methods like tandem mass spectrometry. Working with DNA is easier due to its stability, allowing faster and cheaper sequencing. While genomic analyses can derive amino acid sequences from nucleotide sequences, they cannot identify unknown proteins or detect chemically modified residues, which direct sequencing can. Understanding the genetic code is crucial for translating mRNA codons into amino acids, revealing peptide sequences. This synthesis of genomic and direct methods enhances our ability to analyze proteins effectively.

concept

Indirect Protein Sequencing Via Genomic Analyses

Video transcript

Hey, guys. In this video, we're going to talk about indirect protein sequencing via genomic analyses. Up until this point in our course, we've only focused on direct protein sequencing methods such as tandem mass spectrometry or Edman degradation. Direct protein sequencing is used on already extracted or isolated proteins, and it can directly identify the sequence of unknown proteins in a sample. However, direct protein sequencing does not account for how biochemists obtain most of their protein sequencing data. Most of that data is actually derived indirectly from genomic analyses, that is, by translating the nucleotide sequences of genes into amino acid sequences.

So this brings up the question: why is most of the protein sequencing data obtained via genomic analyses? Well, it turns out that it saves a boatload of time. Working with DNA is easier than working with proteins in a lab. Proteins are really sensitive to lots of conditions, and they can be pretty easily denatured if the temperature is off or if the pH is different, whereas DNA is more resistant to decomposing and breaking apart. Because DNA is more stable, it's easier and faster to work with. It also turns out that DNA sequencing is significantly faster, cheaper, and more informative than direct protein sequencing: direct protein sequencing only gives us the amino acid sequence, but DNA sequencing gives us the nucleotide sequence, and from that nucleotide sequence we can derive the amino acid sequence using the genetic code. So overall, genomic analyses allow us to collect more protein sequencing data faster.
And so that raises the question: why do we even need direct protein sequencing if genomic analysis lets us obtain more protein sequencing data faster? Well, it turns out that we can't just scrap direct protein sequencing, because it has its own set of advantages. One of them is that genomic analyses are not able to identify an unknown protein sample on their own, which is something direct protein sequencing can easily do. That's because genomic analyses need a DNA sample. If all we have is an unknown protein, there is no DNA to sequence, so we can't perform genomic analyses on it. That's a limitation of genomic analyses.

In addition to that, unlike genomic analyses, direct protein sequencing via tandem mass spectrometry can reveal chemically modified amino acid residues. Those modifications are made to the protein itself, so they cannot be read off the gene sequence. So that's another advantage of direct protein sequencing and another reason why we can't just scrap all of the direct protein sequencing techniques.

The rest of this video is going to refresh our memories on how the genetic code works, which is what allows us to perform genomic analyses. Recall from our previous videos that the genetic code reveals the connection between the codons of nucleic acids and the amino acids of proteins. In our example below, we're going to use the genetic code to reveal the peptide sequence in the example shown over here on the right. And what you'll see is that on the left here we have the genetic code.
Recall that the genetic code is read in codons of the mRNA, and each codon has 3 nucleotides. In this genetic code table, we have the first base of the codon on the left, the second base of the codon on the top, and the third base of the codon over here on the right. The first base of the codon limits us to one particular row. The second base limits us to one particular column. And the third base limits us to a specific position in a box.

What you'll see here is that we have a DNA coding sequence provided, and it has a 5 prime end and a 3 prime end. We know that this DNA coding sequence can be converted into an mRNA sequence through the process of transcription, represented by this arrow. The mRNA sequence is going to be exactly the same as the DNA coding sequence up above, except that all of the thymines (T's) are going to be converted into uracils (U's), because RNA has uracil in place of thymine. So these two thymines here are going to be converted into uracils in our mRNA sequence.

Now that we have our mRNA sequence, we know that the genetic code reads the mRNA in codons, which are sets of 3 nucleotides. Our first codon is these first three nucleotides, AUG. The first base of our codon is A, so it limits us to this row. The second base of our codon is U, so it limits us to one particular column. The overlap between these two is this box right here. And the third base of our codon is G, which limits us to this particular position within the box, which is AUG.
An AUG codon corresponds with a methionine amino acid residue, which is why we have methionine as our first residue on the N-terminal end of our peptide. Moving on to our next codon, we have GCU. G, the first base, limits us to this row. Then we have C, which limits us to this column, so now we're in this box. And then U limits us to this one particular position, GCU, which is an alanine amino acid residue. So over here, we can put an A for alanine in that position.

We can continue through this process and move on to our next codon, GGC. The first G is here in this row. The second base is G, which limits us to this column, so now we're in this box. And then C limits us to GGC, which is glycine, so glycine is our next residue. Now you guys are probably remembering how this works, so we can fill out the rest of these codons. After GGC, we have CGG, then AGC, and last but not least, AAA. CGG corresponds with an arginine, AGC corresponds with a serine, and AAA corresponds with a lysine.

So what we can see is that the amino acid sequence of our peptide is revealed through genomic analysis. We obtain the DNA and sequence it, and then, following transcription and translation with the genetic code, we are able to obtain the sequence of our peptide. This is an indirect method for sequencing our peptides, and that's exactly how indirect sequencing via genomic analyses works. In our next couple of videos, we'll get some practice using the genetic code and indirect protein sequencing. So I'll see you guys in those practice videos.
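The row, column, and position lookup described in this video can be sketched in a few lines of Python. This is only an illustration and is not part of the video: the names BASES, AMINO_ACIDS, lookup, and translate_coding_dna are made up for this sketch, the table is the standard genetic code written with the bases in U, C, A, G order (an assumed layout, though it matches most printed tables), and the input strand is reconstructed from the codons read aloud in the transcript.

```python
# Standard genetic code laid out like a printed codon table:
# first base picks the row, second base the column, third base the slot in the box.
BASES = "UCAG"  # assumed axis order of the table
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)  # one-letter codes; '*' marks a stop codon


def lookup(codon: str) -> str:
    """Return the one-letter amino acid for a single mRNA codon."""
    row, column, slot = (BASES.index(base) for base in codon)
    return AMINO_ACIDS[16 * row + 4 * column + slot]


def translate_coding_dna(coding_dna: str) -> str:
    """Transcribe a 5'->3' coding strand (T -> U), then translate codon by codon."""
    mrna = coding_dna.upper().replace("T", "U")
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing incomplete codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(lookup(codon) for codon in codons)


if __name__ == "__main__":
    # Coding strand reconstructed from the codons AUG GCU GGC CGG AGC AAA.
    print(translate_coding_dna("ATGGCTGGCCGGAGCAAA"))  # MAGRSK
```

The printed string MAGRSK is the one-letter form of the methionine, alanine, glycine, arginine, serine, lysine peptide worked out in the video. The sketch does not halt at stop codons; it simply writes them as '*'.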

Problem

Use the genetic code above & the coding DNA sequence below to determine the protein sequence.

Problem Transcript

Alright. So this practice problem wants us to use the genetic code above and the coding DNA sequence below to determine the protein sequence. Notice that up above we have our genetic code, and down below we have our coding DNA sequence. You'll also notice that this genetic code is specific to mRNA codons, and we know that because it has uracils in it instead of thymines. Uracils are found in RNA and thymines are found in DNA. Because we are given a coding DNA sequence, we first need to convert it into RNA to use this genetic code.

Recall from our previous lessons that a coding DNA sequence has the exact same sequence as the mRNA, except that in the mRNA all of the thymines are replaced with uracils. So if we go ahead and highlight all of the thymines in our DNA sequence, we can see there's one here, one there, two here, and one here. So we have a total of 5 thymines, and all of them are going to be replaced with uracils in the mRNA sequence. The mRNA sequence is otherwise exactly the same: we have an A here, and the T's are converted into U's. So we have AUG, then GCC, then we have UGCGUCUCAAG. This is in order from 5 prime to 3 prime. This is our mRNA sequence.

Now that we have our mRNA sequence, we can use the genetic code above to read out the codons, which are sets of 3 nucleotides in mRNA. Our first codon is AUG, our second codon is GCC, and then it's UGCGUCUCAAG. Now we can read these codons to determine what amino acids they correspond to. Again, our first codon is AUG. The first base of our codon is on the left-hand side, so because it's A, that limits us to this whole row. The second base is at the top, so because it's U, it limits us to this whole column.
The last base is G, so it limits us to this exact position within the box that we're limited to. And so AUG corresponds with a methionine amino acid residue. Down below, we can put methionine, or the one-letter code M, for this codon. Next is GCC: G is this row here, and C is this column, so now we're in this box. Then C limits us to this exact position, which corresponds to an alanine. So below, we can put A for GCC. Then we have UGC: U, G, and C limit us right here to this position, which is a cysteine. So down below, we can put C. We then have GUC: G, U, and C put us in this box at this position right here, which is valine, so we can put V. Then CUC: C, U, and C limit us to this position right here, so that's a leucine, and down below we can put L. Our last codon is AAG: A and A limit us to this box, and G limits us to this position, shown here. That is a lysine, and lysine's one-letter code is K.

So when the problem asks us to determine the protein sequence, the protein sequence is methionine, alanine, cysteine, valine, leucine, and lysine. This here is the answer to our practice problem, and that concludes this practice. So I'll see you guys in our next video.
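The three steps of this problem (swap T for U, split into codons, look each codon up) can be checked with a short Python sketch. Everything named here (CODON_TABLE, transcribe, split_codons, translate) is hypothetical and chosen for illustration. The input strand is also an assumption, since the figure is not reproduced on this page. It is reconstructed from the transcript so that it contains the five thymines counted above and matches the codons derived when the same sequence is treated as a template strand in the next problem. That makes the fourth codon GUU rather than the GUC spoken in the transcript. Both code for valine, so the answer does not change.

```python
from itertools import product

BASES = "UCAG"
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: map each mRNA codon (UUU, UUC, ...) to a one-letter
# amino acid code; '*' marks a stop codon.
CODON_TABLE = {
    "".join(bases): amino_acid
    for bases, amino_acid in zip(product(BASES, repeat=3), ONE_LETTER)
}


def transcribe(coding_dna: str) -> str:
    """The mRNA matches the coding strand, with every T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def split_codons(mrna: str) -> list[str]:
    """Cut the mRNA into consecutive triplets, dropping any incomplete tail."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


def translate(codons: list[str]) -> str:
    """Look up each codon and join the one-letter codes, N-terminus first."""
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    coding_dna = "ATGGCCTGCGTTCTCAAG"  # assumed input, see the note above
    print(coding_dna.count("T"))       # 5 thymines to replace
    mrna = transcribe(coding_dna)
    print(mrna)                        # AUGGCCUGCGUUCUCAAG
    print(split_codons(mrna))          # ['AUG', 'GCC', 'UGC', 'GUU', 'CUC', 'AAG']
    print(translate(split_codons(mrna)))  # MACVLK
```

MACVLK is the one-letter form of methionine, alanine, cysteine, valine, leucine, lysine, the answer reached in the transcript.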

Problem

Suppose the sequence below is a template DNA sequence. What is the corresponding protein sequence?

Problem Transcript

So at first glance, this practice problem might seem exactly identical to our previous practice problem, especially since the sequence of nucleotides is the same from 5' to 3'. However, there is a key difference in this practice problem. It states that the sequence below is a template DNA sequence; what is the corresponding protein sequence? In our last practice problem, the sequence given was a coding DNA sequence. Being informed that it is a template DNA sequence changes our answer entirely. Recall that the template DNA sequence is complementary and base pairs with the coding DNA sequence. The base pairing works as follows: adenines (A) pair with thymines (T), and cytosines (C) pair with guanines (G). To derive the coding DNA sequence from the template DNA sequence, we need to apply these base pairing rules. Let’s go ahead and do that below. We know that A pairs with T, T with A, G with C, C with G, repeating this sequence accordingly. This sequence here is our coding DNA sequence. We can label it here as the coding DNA sequence. Next, recall that in a double-stranded DNA molecule, the two strands are antiparallel to one another, indicating that the direction in terms of 5' to 3' is opposite. Thus, if the top strand runs from 5' to 3' from left to right, that means the bottom strand, the coding DNA sequence on the bottom, must go from 5' to 3' in the opposite direction from right to left. Thus, our 5' end is on this side, and the 3' end is on that side. To obtain a protein sequence, we need to use the genetic code, which involves converting the DNA coding sequence into an mRNA sequence. The mRNA sequence is essentially the same as the coding DNA sequence, except that thymines (T) are replaced with uracils (U). Also, remember that when using the genetic code, it reads the mRNA from the 5' end to the 3' end. We want to rewrite the mRNA sequence so that it's 5' to 3' from left to right, replacing T's with U's as we go. 
Thus, we start with a C, then two T's replaced by U's, giving us CUU. Following this pattern, we have GAG, AAC, GCA, GGC, and finally CAT, with T replaced by U, giving us CAU. This produces our mRNA sequence, which we can break into codons: CUU, GAG, AAC, GCA, GGC, and CAU. These codons correspond to amino acids, which we find using the genetic code exactly as in the previous practice problem. Without walking through each lookup again, CUU codes for leucine (L), GAG for glutamic acid (E), AAC for asparagine (N), GCA for alanine (A), GGC for glycine (G), and CAU for histidine (H). Therefore, our protein sequence from the N-terminal to the C-terminal end of the peptide is leucine, glutamic acid, asparagine, alanine, glycine, and histidine. This is our protein sequence, represented on this side. This concludes the practice problem, and I'll see you guys in our next video.
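The extra step for a template strand (base-pair it, then flip it so it reads 5' to 3') can be sketched in Python as a reverse complement followed by the usual translation. The function names and the COMPLEMENT table are hypothetical. The template string is an assumption too: the figure is not shown on this page, so it is reconstructed by working backward from the codons CUU GAG AAC GCA GGC CAU given in the transcript.

```python
from itertools import product

BASES = "UCAG"
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code keyed by mRNA codon; '*' marks a stop codon.
CODON_TABLE = {
    "".join(bases): amino_acid
    for bases, amino_acid in zip(product(BASES, repeat=3), ONE_LETTER)
}
# DNA base-pairing rules: A with T, C with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def coding_from_template(template_dna: str) -> str:
    """Given a template strand written 5'->3', return the coding strand 5'->3'.

    Complementing gives the partner strand in 3'->5' order (the strands are
    antiparallel), so the result is reversed to read 5'->3'.
    """
    return template_dna.upper().translate(COMPLEMENT)[::-1]


def protein_from_template(template_dna: str) -> str:
    """Template DNA -> coding DNA -> mRNA (T -> U) -> one-letter peptide."""
    mrna = coding_from_template(template_dna).replace("T", "U")
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


if __name__ == "__main__":
    template = "ATGGCCTGCGTTCTCAAG"  # assumed input, see the note above
    print(coding_from_template(template))   # CTTGAGAACGCAGGCCAT
    print(protein_from_template(template))  # LENAGH
```

LENAGH is the one-letter form of leucine, glutamic acid, asparagine, alanine, glycine, histidine. Reading the same letters as a coding strand instead would give a different peptide, which is the point of the problem.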

Problem

Even when the sequence of nucleotides for a gene is available and genomic analyses can be performed, direct chemical techniques on the physical protein are still required to determine:

The molecular weight of a simple protein.

The N-terminal amino acid residue.

The total number of amino acid residues in the protein.

The location of disulfide bonds.

Here’s what students ask on this topic:

What is indirect protein sequencing via genomic analyses?

Indirect protein sequencing via genomic analyses involves determining the amino acid sequence of a protein by first sequencing the DNA that encodes it. This method leverages the stability and ease of working with DNA compared to proteins. By sequencing the DNA, we can obtain the nucleotide sequence, which can then be translated into the corresponding amino acid sequence using the genetic code. This approach is faster, cheaper, and more efficient than direct protein sequencing methods like tandem mass spectrometry or Edman degradation.

Created using AI

Why is DNA sequencing preferred over direct protein sequencing?

DNA sequencing is preferred over direct protein sequencing because it is faster, cheaper, and more efficient. DNA is more stable and easier to work with in the lab compared to proteins, which are sensitive to conditions like temperature and pH. Additionally, DNA sequencing provides more comprehensive data, allowing researchers to derive the amino acid sequence from the nucleotide sequence. This makes genomic analyses a more practical and informative approach for obtaining protein sequencing data.

Created using AI

What are the limitations of genomic analyses in protein sequencing?

Genomic analyses have several limitations in protein sequencing. Firstly, they cannot identify unknown protein samples without a corresponding DNA sequence. Secondly, genomic analyses cannot detect chemically modified amino acid residues, which are important for understanding protein function and regulation. These limitations necessitate the use of direct protein sequencing methods, such as tandem mass spectrometry, which can identify unknown proteins and reveal post-translational modifications.

Created using AI

How does the genetic code facilitate indirect protein sequencing?

The genetic code facilitates indirect protein sequencing by providing a set of rules for translating nucleotide sequences (mRNA codons) into amino acid sequences. Each codon, a sequence of three nucleotides, corresponds to a specific amino acid. By using the genetic code, researchers can convert the mRNA sequence derived from DNA into the corresponding peptide sequence. This process involves reading the codons in the mRNA and matching them to their respective amino acids, thus revealing the protein's sequence.

Created using AI

What are the advantages of direct protein sequencing methods?

Direct protein sequencing methods, such as tandem mass spectrometry and Edman degradation, have several advantages. They can identify unknown protein samples without needing a corresponding DNA sequence. Additionally, direct methods can detect chemically modified amino acid residues, which are crucial for understanding protein function and regulation. These capabilities make direct protein sequencing essential for comprehensive protein analysis, complementing the data obtained from genomic analyses.

Created using AI

Your Biochemistry tutor

Jason Amores Sumpter

Biology, Biochemistry and Microbiology lead instructor
