PETER SKEWES-COX: I'm Peter Skewes-Cox. I'm a third-year bioinformatics student in Joe DeRisi's lab. I'm particularly interested in deep sequencing data and how to analyze it, looking for viruses in human disease. Pretty early on--

GRAHAM RUBY: I'm Graham Ruby, and I'm a post-doc in Joe DeRisi's lab. When I was in high school, I was interested in biology. But it really wasn't until graduate school that I started adding computer programming to the ways that I approached biology. So one of the particular problems that I've been looking at is infants with respiratory illness. So I go through mucus collected from infants and look at all of the nucleic acid that's in that mucus and look for sequences of nucleotides that haven't been described before but look like they might belong to a pathogen. So from there, you can go from this being an area of mystery in medicine to an area of knowledge where then if someone has that sickness, you can do something about it.

PETER SKEWES-COX: When people used to study genomes and things, they would study things one gene at a time. And they would do experiments on those single genes and look at their data and analyze it by hand. Well, nowadays, with the technology, we're able to do millions of experiments simultaneously. And it's no longer feasible to go through and look at one gene at a time. Rather, you need to do this in an automated fashion if you ever want to graduate or publish a paper. So what I do is I write computer programs that think like I do, and they're able to go through these data at a much faster rate than I am. This is the QB3 server room, and it's got thousands and thousands of computers in here, all stored in cabinets stacked right on top of one another. We have one cabinet down here, as well. And we have about 100 nodes in there, computers. What we're able to do is when we do a deep sequencing run, in the next building over, we're able to transfer the data across the network to this big computer down here.
So if you have a job that would take a month normally, you're able to do it on 30 computers in one day. Or if it would take a year to run on one computer, you could do it on 300 computers in a day. You're really able to increase your throughput and your productivity. That noise is the sound of computational biology. It's thousands of computers humming, and thousands of computers humming produce a lot of heat. And so we have to have air conditioning on down there. It's a cooled room and it's a loud room. So you don't want to spend too much time there at once.

GRAHAM RUBY: So these huge amounts of data, that increase in scale, is what really makes it a lot more efficient in terms of discovering new pathogens of interest. So if you take a little fragment of five or six nucleotides of sequence, five or six letters, and you say, OK, does this come from a virus? If you only look at one snippet of DNA, you have a very small chance that that fragment of DNA is actually going to come from the dangerous pathogen that you're interested in discovering. But if you have billions and billions of small fragments of DNA, then you have a very good probability that one of those small snippets is going to come from the pathogen that's causing a problem for humans. And so that's why you have to systematize the way that you go through this data, so that you can collect the huge amounts of data that make it likely that you'll discover something of interest but still be able to go through all that data in a sufficiently detailed way that if you see something that's of interest, you can identify it as being interesting.

PETER SKEWES-COX: When we actually do the deep sequencing run, we produce multiple terabytes of information, which is equivalent to thousands of CDs' worth of data. And I'll get on the order of 20 gigabytes of text. That won't even fit on your standard flash drive.
And so even though it's all just text data, just words and words and sequences, it takes a really long time to process these texts. You can't even open these files in Microsoft Word, for example. It will crash. You need special programs that go through line by line and do the analysis for you. We can process a human genome's worth of data on the order of hours now, which is really cool. It might take us a week or two to generate those data, but once we have the data in my hands and on these fast computer processors, we're able to go through them extremely quickly, which wasn't possible before. When you get your deep sequencing data, you need to pay close attention so that you don't miss anything subtle. Because the best projects and the best side projects often come out of these subtleties that you pick up on. And Joe encourages us to do this. And it happens all the time. So when we look at something, oh wait, what's this? This isn't normal. Whereas if you just ran it through a pipeline and put something in and expected the answer to come out, you'd miss all the subtleties along the way. So having all these nice computers and everything is great. But we also interact with them all the time. We're able to get our answers a little more quickly at each step, which is really awesome. But it still requires a lot of user intervention. And smart, intelligent thinking about the problems that we're trying to solve.

GRAHAM RUBY: The diversity of pathogens, especially viruses, is quite different from what you might think about from the diversity of other species. And they evolve so much faster. They don't take millions and millions and millions of years to build up differences in their genetic code. They take years or even months or sometimes even days, even in the course of a single infection. And so because they evolve so much faster, it's a lot harder, even if you know about a type of virus.
It can be a lot harder to identify that virus with a test that looks for some specific molecule, because those molecules change rapidly. The reason to focus on things like computer programming and computer science so much when you're doing this work goes back to a quote from Louis Pasteur, where he says, "Chance favors the prepared mind." So the better you are at that and the more elegant your analysis is on the computational side, the more able you'll be to notice really interesting and exciting biology when it's sitting right in front of you.
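
Editor's note: the back-of-the-envelope reasoning in the interview (splitting a job across many computers, the chance that at least one of many fragments comes from a pathogen, and going through a large text file line by line) can be illustrated with a minimal Python sketch. None of this is the speakers' code. The function names, the perfect-scaling assumption, the independence assumption, and the example numbers are all hypothetical and chosen only for illustration.

```python
import math
from typing import Iterable


def wall_clock_days(single_machine_days: float, n_computers: int) -> float:
    """Ideal run time if a job splits evenly across machines.

    Assumes perfect scaling with no coordination overhead, which real
    clusters rarely achieve.
    """
    return single_machine_days / n_computers


def prob_at_least_one(pathogen_fraction: float, n_fragments: int) -> float:
    """Chance that at least one of n fragments comes from the pathogen.

    Assumes each fragment independently comes from the pathogen with
    probability pathogen_fraction (0 <= fraction < 1), so the result is
    1 - (1 - f)**n. log1p/expm1 keep precision when f is tiny and n is huge.
    """
    return -math.expm1(n_fragments * math.log1p(-pathogen_fraction))


def count_matching_lines(lines: Iterable[str], motif: str) -> int:
    """Count lines containing motif, reading one line at a time.

    Works on an open file handle, so the whole file is never loaded
    into memory at once.
    """
    return sum(1 for line in lines if motif in line)


if __name__ == "__main__":
    print(wall_clock_days(30, 30))              # a month-long job on 30 machines
    print(wall_clock_days(365, 300))            # a year-long job on 300 machines
    print(prob_at_least_one(1e-6, 1))           # one fragment: very unlikely
    print(prob_at_least_one(1e-6, 10_000_000))  # many fragments: very likely
    print(count_matching_lines(["ACGT\n", "TTTT\n", "GACGA\n"], "ACG"))
```

With a real file, `count_matching_lines` could be given an open handle (for example `with open(path) as fh:`, where `path` is a placeholder) so the data is streamed rather than opened all at once. Under ideal scaling, a one-year job on 300 machines comes out to a little over a day, which matches the rough figure given in the interview.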