Hi, in this video we're going to be talking about human genetic variation. So, the first question is: what is the scope of the human genome? What does it look like, and what's in it? We know this because we've been able to sequence the human genome, and that has given us information on both its size and its contents. I'm going to go through some different numbers. You don't need to memorize specific figures like gene counts or percentages. Just understand the overall composition: how many genes there are, and what the genome does and doesn't encode. That overall picture is the important thing to understand.
So the human genome contains around 3.2 × 10^9 (3.2 billion) nucleotide pairs, organized into 23 chromosomes. I think you'll find it interesting, and you'll want to know, that less than 2% of that encodes proteins. So what is the rest of it doing? Overall, the human genome contains around 20,000 to 25,000 protein-coding genes, and only about 1.2% of the genome encodes proteins. About 50% of the genome is made up of sequences that can currently move around the genome, or could in the past. These are called mobile genetic elements, or "jumping genes." There are around 9,000 known functional RNAs, and probably even more than that. And here's the interesting part: about 5% of the human genome is highly conserved across other mammals. But we already know that only about 2% of the genome encodes proteins. That leaves roughly 3% of highly conserved genomic material whose function we don't really know.
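If you want to check the arithmetic in that last paragraph, here's a minimal Python sketch. The inputs are just the approximate figures quoted above (3.2 × 10^9 base pairs, about 1.2% or "less than 2%" protein-coding, about 5% conserved); the function names are my own, made up for illustration.

```python
# Back-of-the-envelope genome arithmetic using the approximate figures above.
# These are rounded lecture values, not precise measurements.

GENOME_SIZE_BP = 3.2e9  # ~3.2 x 10^9 nucleotide pairs


def coding_megabases(genome_bp: float, coding_fraction: float) -> float:
    """Length of protein-coding sequence, in megabases (1 Mb = 10^6 bp)."""
    return genome_bp * coding_fraction / 1e6


def conserved_noncoding_fraction(conserved_fraction: float, coding_fraction: float) -> float:
    """Share of the genome that is conserved but not accounted for by protein-coding sequence."""
    return conserved_fraction - coding_fraction


if __name__ == "__main__":
    # Compare the ~1.2% estimate with the rounded "2%" upper bound.
    for coding in (0.012, 0.02):
        mb = coding_megabases(GENOME_SIZE_BP, coding)
        gap = conserved_noncoding_fraction(0.05, coding)
        print(f"coding {coding:.1%}: ~{mb:.0f} Mb protein-coding, "
              f"~{gap:.1%} conserved but non-coding")
```

With the rounded 2% figure, the conserved-but-unexplained share comes out to the roughly 3% mentioned above; with the 1.2% figure it's closer to 4%. Either way, it's more conserved sequence than protein-coding sequence alone can explain.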
This is really interesting, and it excites a lot of scientists: what is this other 3% doing? If we look at the overall composition of the human genome, here's a pie chart to give you an overall view. You don't need to know these percentages, or even what all of these categories are. Just notice that introns, this region here, make up 26%. That's a huge portion of the pie chart. These things called LINEs (long interspersed nuclear elements) are mobile genetic elements. You don't need to memorize that, but you can see they make up 20%, another huge portion of the genome. What's not a huge portion of the genome is the protein-coding region, here at about 2% of the pie chart. It's really tiny, yet it encodes every protein we make.
Now we're going to talk a little bit about comparing genomic sequences between humans and other organisms. You're going to see a lot of percentages and dates here, and you don't necessarily need to know them unless your professor has specifically said you do. I'm showing them so we can understand how the science advanced, and also how the protein-coding composition of genomes differs between organisms. Prokaryotes, that is, bacteria, were the first organisms to be sequenced, in 1995, and scientists found that about 90% of their genome is protein-coding. You can imagine the scientists doing this thinking, "Okay, then most organisms' genomes must be protein-coding." But as we continue down this list of model organisms (yeast in 1996, worms in 1998, fruit flies in 2000), the protein-coding fraction really starts to decrease, and they were thinking, "Good grief! What's going on? What is all of this genome that isn't protein-coding?"
Then in 2000, they sequenced Arabidopsis, a flowering plant. They were surprised, because it was the first organism in which they saw the protein-coding fraction go up instead of continuing the downward trend they had been seeing over the years. What they concluded here is really important, so I'm going to highlight it in a different color: the size of the genome does not dictate the complexity of the organism. In a flowering plant, about 25% of the genome is protein-coding. That's still only a quarter of the genome, but it broke the downward trend. When you compare these organisms and how much of each genome is protein-coding, it's a little crazy, a little ridiculous, and a little exciting: genome size doesn't dictate complexity or the number of protein-coding genes present. Let's look at a figure that shows this, and it's actually not super simple.
So let me walk you through this. Here you have pie charts for different organisms: humans, mice, fruit flies, flowering plants, and yeast up here. The pie charts show how much of each genome is coding. You can see up here that yeast has a lot of coding sequence, and down here humans have very little. Also listed are the size of each genome and roughly how many genes each organism has. You can see that Arabidopsis, which I just highlighted in green, has roughly 20,000 to 25,000 genes, similar to humans, although we've since figured out that humans have a little fewer than that. The third thing on here is an example of what a typical gene looks like in each of these organisms. In yeast, you have a start sequence and the rest of the gene is coding. As organisms get more complex, as in the flowering plants, their genes get more complex too. In fruit flies, you have a start sequence, but the coding sequence is split into several exons, here, making the gene more complex. Then as you get down to mammals like mice, and to humans, you can see multiple start regions. You have exons here, here, and here, and interspersed among them are repetitive sequences, some with known functions and many with unknown functions. The purpose of this isn't for you to memorize all of it. You don't need to know these percentages, these numbers, or these images. What I'm trying to convey is that genome size doesn't equal complexity; it's as simple as that. Genomes of different organisms have different proportions of protein-coding genes, different gene organizations, and different features within their genes, but more complex doesn't mean bigger. So with that, let's turn the page.
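To make the "bigger isn't more complex" point concrete, here's a small Python sketch that compares genome size with gene count and gene density (genes per megabase). The numbers are rough, rounded public estimates that I'm assuming for illustration; they are not read off the slide, so check current genome assemblies before relying on them. The helper names are hypothetical.

```python
from typing import List, NamedTuple


class Genome(NamedTuple):
    organism: str
    size_mb: float  # approximate genome size, megabases
    genes: int      # approximate number of protein-coding genes


# ASSUMED rough, rounded estimates for illustration only (not from the slide).
GENOMES: List[Genome] = [
    Genome("S. cerevisiae (yeast)", 12, 6_000),
    Genome("A. thaliana (flowering plant)", 135, 27_000),
    Genome("D. melanogaster (fruit fly)", 140, 14_000),
    Genome("M. musculus (mouse)", 2_700, 22_000),
    Genome("H. sapiens (human)", 3_200, 20_000),
]


def gene_density(g: Genome) -> float:
    """Protein-coding genes per megabase of genome."""
    return g.genes / g.size_mb


def largest_genome(genomes: List[Genome]) -> Genome:
    """Organism with the biggest genome (in megabases)."""
    return max(genomes, key=lambda g: g.size_mb)


def most_genes(genomes: List[Genome]) -> Genome:
    """Organism with the most protein-coding genes."""
    return max(genomes, key=lambda g: g.genes)


if __name__ == "__main__":
    # Sorted by genome size: gene count does not rise along with it.
    for g in sorted(GENOMES, key=lambda g: g.size_mb):
        print(f"{g.organism:<30} {g.size_mb:>7,.0f} Mb "
              f"{g.genes:>7,d} genes {gene_density(g):>7.1f} genes/Mb")
    print("Largest genome:", largest_genome(GENOMES).organism)
    print("Most genes:    ", most_genes(GENOMES).organism)
```

With these rough numbers, the human genome is the largest in the list, but the flowering plant has the most genes and a gene density roughly 30 times higher than ours, which is the same point the figure is making.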