PreK-12 blog

Join the conversation and stay informed about the latest trends, perspectives, and successes in PreK-12 education.



    Why 'what works' doesn't: False positives in education research

    By Jay Lynch, PhD and Nathan Martin, Pearson

    If edtech is to help improve education research, it will need to kick a bad habit: focusing on whether or not an educational intervention ‘works’.

    Answering that question through null hypothesis significance testing (NHST), which explores whether an intervention or product has an effect on the average outcome, undermines the ability to make sustained progress in helping students learn. It provides little useful information and fails miserably as a method for accumulating knowledge about learning and teaching. For the sake of efficiency and learning gains, edtech companies need to understand the limits of this practice and adopt a more progressive research agenda that yields actionable data on which to build useful products.

    How does NHST look in action? A typical research question in education might be whether average test scores differ for students who use a new math game and those who don’t. Applying NHST, a researcher would assess whether the observed difference in scores (hoped to be positive, but formally any non-zero difference) is large enough relative to chance variation to be statistically significant, and therefore to conclude that the game has had an impact or, in other words, that it ‘works’. Left unanswered is why and for whom.
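    To make the procedure concrete, here is a minimal Python sketch of the kind of test described above. It uses invented scores for ten students per group and a large-sample z-test that allows unequal variances; with samples this small, a researcher would more typically run Welch's t-test (for example, `scipy.stats.ttest_ind(..., equal_var=False)`). The function name `nhst_two_sample` and all data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def nhst_two_sample(treatment, control):
    """Large-sample z-test of H0: the two population means are equal.

    Returns (mean difference, z statistic, two-sided p-value).
    """
    diff = mean(treatment) - mean(control)
    # Standard error of the difference in means, allowing unequal variances.
    se = sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))
    z = diff / se
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p


# Hypothetical post-test scores (invented for illustration, not real data).
game = [78, 82, 75, 90, 85, 80, 88, 79, 84, 86]
no_game = [74, 80, 72, 85, 78, 77, 83, 76, 79, 81]

diff, z, p = nhst_two_sample(game, no_game)
print(f"difference={diff:.1f}, z={z:.2f}, p={p:.3f}")
```

    Note what the result contains: a difference, a test statistic, and a p-value. Nothing in it says why the game helped or which students it helped.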

    This approach pervades education research. It is reflected in the U.S. government-supported initiative to aggregate and evaluate educational research, aptly named the What Works Clearinghouse, and frequently serves as a litmus test for publication worthiness in education journals. Yet it has been subjected to scathing criticism almost since its inception, criticism that centers on two issues.

    False Positives And Other Pitfalls

    First, obtaining statistical evidence of an effect is shockingly easy in experimental research. One of the emerging realizations from the current replication crisis in psychology is that, rather than serving as a responsible gatekeeper ensuring the trustworthiness of published findings, reliance on statistical significance has had the opposite effect: it has created a literature filled with false positives, overestimated effect sizes, and grossly underpowered research designs.

    Assuming a proposed intervention involves students doing virtually anything more cognitively challenging than passively listening to lecturing-as-usual (the typical straw-man control in education research), a researcher is very likely to find a positive difference as long as the sample size is large enough, because any non-zero true difference, however small, eventually becomes statistically significant. Showing that an educational intervention has a positive effect is therefore a feeble hurdle. It isn’t at all shocking that in education almost everything seems to work.
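    The sample-size point can be illustrated with a short calculation. The sketch below assumes a hypothetical effect of 0.05 standard deviations, small enough to be educationally negligible, and computes the two-sided p-value a two-sample z-test would report if exactly that difference were observed, at several sample sizes. It is an idealized illustration (no sampling noise), not a power analysis.

```python
from math import sqrt
from statistics import NormalDist


def p_value_for_effect(d, n_per_group):
    """Two-sided p-value from a two-sample z-test if the observed
    standardized mean difference were exactly d (outcome SD = 1),
    with n_per_group students in each arm."""
    se = sqrt(2 / n_per_group)  # SE of a difference of two means when SD = 1
    z = d / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# A hypothetical, educationally trivial effect of 0.05 SD.
for n in (50, 500, 5_000, 50_000):
    print(n, round(p_value_for_effect(0.05, n), 4))
```

    The effect never changes; only the sample size does, and the verdict flips from ‘no effect’ to ‘works’ somewhere between 500 and 5,000 students per group.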

    But even if these methodological concerns with NHST were addressed, there is a second serious flaw undermining the NHST framework upon which most experimental educational research rests.

    Null hypothesis significance testing is an epistemic dead end. It obviates the need for researchers to put forward testable models of theories to predict and explain the effects that interventions have. In fact, the only hypothesis evaluated within the framework of NHST is a caricature, a hypothesis the researcher doesn’t believe—which is that an intervention has zero effect. A researcher’s own hypothesis is never directly tested. And yet with almost universal aplomb, education researchers falsely conclude that a rejection of the null hypothesis counts as strong evidence in favor of their preferred theory.

    As a result, NHST encourages and preserves hypotheses so vague, so lacking in predictive power and theoretical content, as to be nearly useless. As researchers in psychology are realizing, even well-regarded theories, ostensibly supported by hundreds of randomized controlled experiments, can start to evaporate under scrutiny because reliance on null hypothesis significance testing means a theory is never really tested at all. As long as educational research continues to rely on testing the null hypothesis of no difference as a universal foil for establishing whether an intervention or product ‘works,’ it will fail to improve our understanding of how to help students learn.

    As analysts Michael Horn and Julia Freeland have noted, this dominant paradigm of educational research is woefully incomplete and must change if we are going to make progress in our understanding of how to help students learn:

    “An effective research agenda moves beyond merely identifying correlations of what works on average to articulate and test theories about how and why certain educational interventions work in different circumstances for different students.”

    Yet for academic researchers concerned primarily with producing publishable evidence of interventions that ‘work,’ the vapid nature of NHST has not been recognized as a serious issue. And because the NHST approach to educational research is relatively straightforward and safe to conduct (researchers have an excellent chance of getting the answer they want), it remains the dominant paradigm in edtech as well, as a quick perusal of the efficacy pages at leading edtech companies shows.

    Are there, however, reasons to think edtech companies might be incentivized to abandon the current NHST paradigm? We think there are.

    What About The Data You’re Not Capturing?

    Consider a product owner at an edtech company. Although evidence that an educational product has a positive effect is great for producing compelling marketing brochures, it provides little information regarding why a product works, how well it works in different circumstances, or really any guidance for how to make it more effective.

    • Are some product features useful and others not? Are some features actually detrimental to learners but masked by more effective elements?
    • Is the product more or less effective for different types of learners or levels of prior expertise?
    • What elements should be added, left alone or removed in future versions of the product?

    Testing whether a product works doesn’t provide answers to these questions. In fact, despite all the time, money, and resources spent conducting experimental research, a company actually learns very little about its product’s efficacy when it is evaluated using NHST. There is minimal ability to build on research of this sort. So product research becomes a game of efficacy roulette, with the company just hoping that findings show a positive effect each time it spins the NHST wheel. Companies truly committed to innovation and improving the effectiveness of their products should find this a very bitter pill to swallow.

    A Blueprint For Change

    We suggest edtech companies can vastly improve both their own product research and our understanding of how to help students learn by modifying their approach to research in several ways.

    • Recognize the limited information NHST can provide. As the primary statistical framework for moving our understanding of learning and teaching forward, it is misapplied because it ultimately tells us nothing that we actually want to know. Furthermore, it contributes to the proliferation of spurious findings in education by encouraging questionable research practices and the reporting of overestimated intervention effects.
    • Instead of relying on NHST, edtech researchers should focus on putting forward theoretically informed predictions and then designing experiments to test them against meaningful alternatives. Rather than rejecting the uninteresting hypothesis of “no-difference,” the primary goal of edtech research should be to improve our understanding of the impact that interventions have, and the best way to do this is to compare models that compete to describe observations that arise from experimentation.
    • Rather than dichotomous judgments about whether an intervention works on average, greater evaluative emphasis should be devoted to exploring the impact of interventions across subsets of students and conditions. No intervention works equally well for every student and it’s the creative and imaginative work of trying to understand why and where an intervention fails or succeeds that is most valuable.
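    One simple way to compare competing models, as the second recommendation suggests, is to ask how well each model's predicted effect accounts for the observed one. The sketch below assumes two hypothetical theories that predict effects of 0.20 and 0.50 standard deviations, an invented observed effect of 0.25 with a standard error of 0.08, and a normal sampling distribution; it computes the likelihood ratio between the two point predictions. This is only one of several model-comparison approaches (Bayes factors and information criteria are others), and the function name is hypothetical.

```python
from statistics import NormalDist


def likelihood_ratio(observed, se, predicted_a, predicted_b):
    """How much more likely the observed effect is under model A's
    predicted effect than under model B's, assuming the estimate is
    normally distributed around the true effect with standard error se."""
    like_a = NormalDist(predicted_a, se).pdf(observed)
    like_b = NormalDist(predicted_b, se).pdf(observed)
    return like_a / like_b


# Hypothetical numbers: observed effect 0.25 SD (SE 0.08).
# Model A predicts 0.20 SD; model B predicts 0.50 SD.
lr = likelihood_ratio(0.25, 0.08, 0.20, 0.50)
print(f"The data are {lr:.0f} times more likely under A than under B")
```

    Both models predict a positive effect, so a test against the null of zero could not tell them apart, whereas the likelihood ratio favors the model whose prediction lies closer to the data.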

    Returning to our original example, rather than relying on NHST to evaluate a math game, a company will learn more by trying to improve its estimates and measurements of important variables, looking beneath group mean differences to explore why the game worked better or worse for sub-groups of students, and directly testing competing theoretical mechanisms proposed to explain the game’s influence on learner achievement. It is in this way that practical, problem-solving tools will develop and evolve to improve the lives of all learners.
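    Looking beneath group means can start with something as simple as estimating the effect within each subgroup. The sketch below uses invented records in which students are tagged by prior expertise; the subgroup labels, scores, and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (prior expertise, used the game?, post-test score).
records = [
    ("novice", True, 72), ("novice", True, 75),
    ("novice", False, 60), ("novice", False, 63),
    ("expert", True, 85), ("expert", True, 86),
    ("expert", False, 87), ("expert", False, 88),
]


def effects_by_subgroup(rows):
    """Mean difference (game minus no game) within each subgroup and overall."""
    groups = defaultdict(lambda: {True: [], False: []})
    for subgroup, used_game, score in rows:
        groups[subgroup][used_game].append(score)
        groups["all"][used_game].append(score)  # pooled across subgroups
    return {name: mean(g[True]) - mean(g[False]) for name, g in groups.items()}


print(effects_by_subgroup(records))
```

    In this made-up data the overall difference of 5 points hides a 12-point gain for novices and a 2-point loss for more experienced students. A real analysis would need adequate samples in each subgroup and pre-specified comparisons, since slicing data many ways invites exactly the false positives discussed above.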

    This series is produced in partnership with Pearson. EdSurge originally published this article on February 12, 2017, and it was re-posted here with permission.

     


    Analysis: For ed tech that actually works, embrace the science of learning

    By Kristen DiCerbo, Aubrey Francisco, Bror Saxberg, Melina Uncapher

    This is the second in a series of essays surrounding the EdTech Efficacy Research Symposium, a gathering of 275 researchers, teachers, entrepreneurs, professors, administrators, and philanthropists to discuss the role efficacy research should play in guiding the development and implementation of education technologies. This series was produced in partnership with Pearson, a co-sponsor of the symposium co-hosted by the University of Virginia’s Curry School of Education, Digital Promise, and the Jefferson Education Accelerator. The first piece, “#ShowTheEvidence: Building a movement around research, impact in ed tech,” appears below.

    As education technology gains an increasing presence in American schools, the big question being asked is, “Does it work?”

    But as curricula and learning tools are prepared for rigorous evaluation, we should think about how existing research on teaching and learning has informed their design. Building a movement around research and impact must include advocating for products based on learning research. Otherwise, we are essentially taking a “wait and hope” strategy to development: wait until we have something built and hope it works.

    When we make a meal, we want to at least have a theory about what each ingredient we include will contribute to the overall meal. How much salt do we put in to flavor it perfectly? When do we add it in? Similarly, when creating a curriculum or technology tool, we should be thinking about how each element impacts and optimizes overall learning. For example, how much and when do we add in a review of already-learned material to ensure memory retention? For this, we can turn to learning science as a guide.

    We know a lot about how people learn. Our understanding comes from fields as varied as cognitive and educational psychology, motivational psychology, neuroscience, behavioral economics, and computer science. There are research findings that have been replicated repeatedly across dozens of studies. If we want to create educational technology tools that ultimately demonstrate efficacy, these learning science findings should serve as the foundation, integrating the insights from decades of research into how people learn and how teachers teach into product design from the beginning.

    Existing research on learning

    So what do we know about how people learn? For detail, you could turn to foundational texts such as Clark and Mayer’s e-Learning and the Science of Instruction, Dan Schwartz’s The ABCs of How We Learn, and Hattie and Yates’s Visible Learning. Or you could look to the excellent summaries compiled by Deans for Impact, LearningScientists.org, and Digital Promise Global.

    Here are a few examples:

    Spaced practice: We know that extending practice over time is better than cramming all practice into the few days before an exam. Spaced practice strengthens information retention and keeps it fresh over time, interrupting the “forgetting curve.” Implementing spaced practice could be as simple as planning out review time. Technology can help implement spaced practice in at least two ways: 1) prompting students to make their own study calendars and 2) proactively presenting already-learned information for periodic review.
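    The second approach, proactively resurfacing learned material, amounts to a scheduling rule. Here is a minimal sketch that generates review dates at expanding intervals after a first study session. The specific intervals (1, 3, 7, 14, and 30 days) and the function name are illustrative assumptions, not a schedule prescribed by the research cited here.

```python
from datetime import date, timedelta


def review_schedule(first_study, intervals_days=(1, 3, 7, 14, 30)):
    """Return review dates spaced out after an initial study session.

    Each gap is counted from the previous review, so the spacing expands.
    The default intervals are a hypothetical choice for illustration.
    """
    dates = []
    current = first_study
    for gap in intervals_days:
        current = current + timedelta(days=gap)
        dates.append(current)
    return dates


for review_day in review_schedule(date(2017, 9, 5)):
    print(review_day.isoformat())
```

    A production tool would more likely adapt the gaps to each learner's performance on the previous review, but the fixed version shows the core idea of distributing practice over time.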

    Retrieval practice: What should that practice look like? Rather than rereading or reading and highlighting, we know it is better for students to actually retrieve the information from memory, because retrieving it changes the nature of the memory itself. Retrieval strengthens and solidifies the learning, and it provides more paths to access the learning when you need it. Learners creating flashcards have known about this strategy for a long time. RetrievalPractice.org offers useful information and helpful applications building on this important principle. There is a potential pitfall here for designers not familiar with the learning literature. Since multiple-choice activities are easier to score with technology, it is tempting to create these kinds of easy questions for retrieval practice. However, learning will be stronger if students practice freely recalling the information rather than simply recognizing the answer from a list of choices.
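    Part of the appeal of multiple choice is that it is trivial to score. Short free-recall answers can also be scored automatically in simple cases; the sketch below normalizes a typed response and accepts it if it closely matches any accepted answer, using Python's `difflib.SequenceMatcher`. The 0.85 similarity threshold, the function names, and the example answers are assumptions, and longer open-ended answers would need more sophisticated scoring.

```python
import difflib
import re


def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def score_free_recall(response, accepted_answers, threshold=0.85):
    """Return True if a typed free-recall response closely matches any
    accepted answer. The similarity threshold is an assumed value."""
    resp = normalize(response)
    return any(
        difflib.SequenceMatcher(None, resp, normalize(ans)).ratio() >= threshold
        for ans in accepted_answers
    )


accepted = ["mitochondria", "the mitochondrion"]
print(score_free_recall("Mitochondria.", accepted))  # matches after normalizing
print(score_free_recall("mitochondira", accepted))   # minor misspelling
print(score_free_recall("ribosome", accepted))       # a different concept
```

    Tolerating small misspellings keeps the focus on whether the student retrieved the concept, not on typing accuracy.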

    Elaboration: Taking new information and expanding on it, linking it to other known information and personal experience, is another way to improve memory for new concepts. Linking new information to information that is already known can make it easy to recall later. In addition, simply expanding on information and explaining it in different ways can make retrieval easier. One way to practice this is to take main ideas and ask how they work and why. Another method is to have students draw or fill in concept maps, visually linking ideas and experiences together. There are a number of online tools that have been developed for creating concept maps, and current research is focusing on how to provide automated feedback on them.
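    One simple form of automated feedback on concept maps is to compare a student's links with those in a reference map. The sketch below represents each map as a set of (concept, relation, concept) triples and reports matched, missing, and extra links. Exact string matching is a simplifying assumption; a practical tool would need to handle synonyms and alternative but valid links. The function name and example map are hypothetical.

```python
def concept_map_feedback(student_links, reference_links):
    """Compare a student's concept map with a reference map.

    Each map is an iterable of (concept, relation, concept) triples.
    Returns the links the student matched, missed, and added.
    """
    student = {tuple(part.lower() for part in link) for link in student_links}
    reference = {tuple(part.lower() for part in link) for link in reference_links}
    return {
        "matched": student & reference,
        "missing": reference - student,  # candidates for a prompt or hint
        "extra": student - reference,    # may be errors or valid elaborations
    }


reference = [
    ("photosynthesis", "requires", "sunlight"),
    ("photosynthesis", "produces", "oxygen"),
    ("photosynthesis", "produces", "glucose"),
]
student = [
    ("Photosynthesis", "requires", "sunlight"),
    ("photosynthesis", "produces", "oxygen"),
    ("photosynthesis", "requires", "soil"),
]
for kind, links in concept_map_feedback(student, reference).items():
    print(kind, sorted(links))
```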

    So how many educational technology products actually incorporate these known practices? How do they encourage students to engage in these activities in a systematic way?

    Existing research on instructional use of technology

    There is also significant research about how technology supports teaching practices, which should inform how a product is designed to be used in the classroom.

    For example, there is a solid research base on how to design activities that introduce new material prior to formal instruction. It suggests that students should initially be given a relatively difficult, open-ended problem to solve. Students, of course, tend to struggle with this activity, and almost none are able to generate the “correct” approach. However, the effort they spend on it has been shown to build a better foundation for subsequent instruction, because students come away with a better understanding of the problem to be solved (e.g., Wiedmann, Leach, Rummel & Wiley, 2012; Belenky & Nokes-Malach, 2012). It is clearly important that this type of activity be presented to students as a chance to explore, and that failure is accepted, expected, and encouraged. In contrast, an activity meant to be part of practice following direct instruction would likely include more step-by-step feedback and hints. So, if someone wants to design activities to be used prior to instruction, they might 1) select a fundamental idea from a lesson, 2) create multiple cases for which students must find an all-encompassing rule, and 3) situate those cases in an engaging scenario.

    Schwartz of Stanford University tested this idea with students learning about ratios, without telling them they were learning about ratios. Three cases with different ratios were created based on the number of objects in a space. These were framed as different numbers of clowns in different-sized vehicles, and students were asked to develop a “crowded clowns index” to measure how crowded the clowns are in each vehicle. Students were not told about ratios; they had to uncover the concept themselves.
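    The concept students are meant to discover is that crowdedness depends on the ratio of clowns to space, not on the number of clowns alone. The sketch below uses invented vehicle sizes and clown counts (the original activity's exact numbers are not given here) to show how a count-only view and a ratio index rank the cases differently. The function name and data are hypothetical.

```python
from fractions import Fraction

# Hypothetical cases: vehicle -> (number of clowns, floor area in squares).
cases = {
    "compact car": (6, 4),
    "van": (9, 12),
    "bus": (12, 24),
}


def crowded_clowns_index(clowns, squares):
    """Clowns per floor square: the ratio students are meant to discover."""
    return Fraction(clowns, squares)


for vehicle, (clowns, squares) in cases.items():
    print(f"{vehicle}: {clowns} clowns, index {crowded_clowns_index(clowns, squares)}")

most_clowns = max(cases, key=lambda v: cases[v][0])
most_crowded = max(cases, key=lambda v: crowded_clowns_index(*cases[v]))
print("most clowns:", most_clowns, "| most crowded:", most_crowded)
```

    Ranked by count alone the bus looks most crowded, but the ratio index shows that the compact car is, which is the contrast such an activity is designed to surface.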

    Product developers should consider research like this when designing their ed tech tools, as well as when they’re devising professional development programs for educators who will use those technologies in the classroom.

    Product makers must consider these questions when designing ed tech: Will the activity the technology facilitates be done before direct instruction? Will it be core instruction? Will it be used to review? How much professional development needs to be provided to teachers to ensure the fidelity of implementation at scale?

    Too often, designers think there is a singular answer to this series of questions: “Yes.” But in trying to be everything, we are likely to end up being nothing. Existing research on instructional uses of technology can help developers choose the best approach and design for effective implementation.

    Going forward

    With this research as a foundation, though, we still have to cook the dish and taste it. Ultimately, applying learning science at scale to real-world learning situations is an engineering activity. It may require repeated iterations and ongoing measurement to get the mix of ingredients “just right” for a given audience or a given challenging learning outcome. We need to understand and tweak our learning environments carefully, using good piloting techniques to find out both whether our learners and teachers can actually execute what we intend, as we intended it (Is the learning intervention usable? Are teachers and students able to implement it as intended?), and whether the intervention delivers the learning benefits we hoped for (effectiveness).
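    The two pilot questions above, implementation and effectiveness, can each be tracked with a simple measure. The sketch below computes implementation fidelity as the share of planned sessions actually delivered, and an effect size as Cohen's d with a pooled standard deviation. The classroom data, the function names, and the 80 percent fidelity cut-off are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def fidelity(sessions_completed, sessions_planned):
    """Share of planned sessions actually delivered (implementation check)."""
    return sessions_completed / sessions_planned


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * stdev(treatment) ** 2
                  + (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical pilot: (sessions delivered, sessions planned) per classroom.
sessions = {"room_a": (18, 20), "room_b": (9, 20)}
MIN_FIDELITY = 0.8  # assumed cut-off for "implemented as intended"

for room, (done, planned) in sessions.items():
    f = fidelity(done, planned)
    status = "as intended" if f >= MIN_FIDELITY else "check implementation first"
    print(f"{room}: fidelity {f:.0%} ({status})")

pilot_scores = [74, 79, 81, 85, 88]
comparison_scores = [70, 75, 80, 85, 90]
print(f"effect size d = {cohens_d(pilot_scores, comparison_scores):.2f}")
```

    Checking fidelity first matters because a weak effect in a classroom that delivered half the planned sessions says more about implementation than about the design.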

    The key is that research should be informing development from the very beginning of an idea for a product, and an evidence-based “learning engineering” orientation should continue to be used to monitor and iterate changes to optimize impact. If we are building from a foundation of research, we are greatly increasing the probability that, when we get to those iterated and controlled trials after the product is created, we will in fact see improvements over time in learning outcomes.

    Follow the conversation on social media with the hashtag #ShowTheEvidence.

    Authors:

    • Kristen DiCerbo, Vice President, Education Research, Pearson
    • Aubrey Francisco, Chief Research Officer, Digital Promise
    • Bror Saxberg, Chief Learning Officer, Kaplan
    • Melina Uncapher, Assistant Professor, Department of Neurology, UC San Francisco

    This series is produced in partnership with Pearson. The 74 originally published this article on June 5, 2017, and it was re-posted here with permission.


    #ShowTheEvidence: Building a movement around research, impact in ed tech

    By Aubrey Francisco, Bart Epstein, Gunnar Counselman, Katrina Stevens, Luyen Chou, Mahnaz Charania, Mark Grovic, Rahim Rajan, Robert Pianta, Rebecca Griffiths

    This is the first in a series of essays surrounding the EdTech Efficacy Research Symposium, a gathering of 275 researchers, teachers, entrepreneurs, professors, administrators, and philanthropists to discuss the role efficacy research should play in guiding the development and implementation of education technologies. This series was produced in partnership with Pearson, a co-sponsor of the symposium co-hosted by the University of Virginia’s Curry School of Education, Digital Promise, and the Jefferson Education Accelerator.

    To improve education in America, we must improve how we develop and use education technology.

    Teachers and students are increasingly using digital tools and platforms to support learning inside and outside the classroom every day. There are 3.6 million teachers using ed tech, and approximately one in four college students take online courses — four times as many as a decade earlier. Technology will impact the 74 million children currently under the age of 18 as they progress through the pre-K–12 education system. The key question is: What can we do to make sure that the education technology being developed and deployed today fits the needs of 21st-century learners?

    Our teachers and students deserve high-quality tools that provide evidence of student learning, and that provide the right kind of evidence — evidence that can tell us whether the tool is influencing the intended learning outcomes.

    Evidence and efficacy can no longer be someone else’s problem to be solved at some uncertain point in the future. The stakes are too high. We all have a role to play in ensuring that the money spent in ed tech (estimated at $13.2 billion in 2016 for K-12) lives up to the promise of enabling more educators, schools, and colleges to genuinely improve outcomes for students and help close persistent equity gaps.

    Still, education is complex. Regardless of the quality of a learning tool, there will be no singular, foolproof ed tech solution that will work for every student and teacher across the nation. Context matters. Implementation matters. Technology will only ever be one element of an instructional intervention, which will also include instructor practices, student experiences, and multiple other contextual factors.

    Figuring out what actually works and why it works requires intentional planning, dedicated professional development, thoughtful implementation, and appropriate evaluation. This all occurs within a context of inconsistent and shifting incentives and, in the U.S., involves a particularly complex ecosystem of stakeholders. And unfortunately, despite the deep and vested interest in improving the system, the current ecosystem is many times better at supporting the status quo than at introducing a potentially better-suited learning tool.

    That’s the challenge to be taken up by the EdTech Efficacy Research Symposium in Washington, D.C., this week, and the work underway as part of the initiative convened by the University of Virginia’s Curry School of Education, Digital Promise, and the Jefferson Education Accelerator. People like us rarely have the opportunity to collaborate, but this issue is too important to go it alone.

    Over the past six months, 10 working groups consisting of approximately 150 people spent valuable hours together learning about the challenges associated with improving efficacy and exploring opportunities to address these challenges. We’ve looked at issues such as how ed tech decisions are made in K-12 and higher education, what philanthropy can do to encourage more evidence-based decision-making, as well as what will be necessary to make the focus on efficacy and transparency of outcomes core to how ed tech companies operate.

    Over the next six weeks, we’ll explore these themes here, sharing findings and recommendations from the working groups. Our hope is to stimulate not just discussion but also practical action and concrete progress.

    Action and progress might look like new ways to use research in decision-making such as informational site Evidence for ESSA or tools that make it easier for education researchers to connect with teachers, districts, and ed tech companies, like the forthcoming National Education Researcher Database. Collaboration is critical to improving how we use research in ed tech, but it’s not easy. Building a common framework takes time. Acting on that framework is harder.

    So, as a starting point, here are three broader issues that we’ve learned about efficacy and evidence from our work so far.

    Everyone wants research and implementation analysis done, but nobody wants to pay more for it

    We know it’s not realistic to expect that the adoption of each ed tech product or curricular innovation will be backed up by a randomized controlled trial.

    Investors are reluctant to fund these studies, and schools and developers rarely want to pick up the price tag for expensive studies. When Richard Culatta and Katrina Stevens were still at the U.S. Department of Education’s Office of Educational Technology, they pointed out that “it wouldn’t be economically feasible for most app creators (or schools) to spend $250k (a low price tag for traditional educational research) to evaluate the effectiveness of an app that only cost a total of $50k to build.”

    We could spend more efficiently, turning the 15,000 tiny pilots and decisions already underway into new work and new insights without spending more money. This could take the form of a few well-designed initiatives to gather and share relevant information about implementations and efficacy. Critically, we’ll need to find a sustainability model for that type of rigorous evaluation to ensure it becomes a key feature of how adoption decisions are made.
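    One way to extract insight from many small pilots is to pool their results, weighting each by its precision. The sketch below implements inverse-variance (fixed-effect) pooling of hypothetical effect estimates and standard errors, assuming the pilots measured comparable outcomes. Because this series stresses that context matters, a random-effects model or an analysis of moderators would often be the more appropriate choice; the fixed-effect version is shown only because it is the simplest. The function name and data are hypothetical.

```python
from math import sqrt


def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of (effect, standard_error) pairs.

    Returns the pooled effect and its standard error.
    """
    weights = [1 / se ** 2 for _, se in estimates]  # more precise pilots count more
    pooled = sum(w * eff for w, (eff, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical results (effect in SD units, standard error) from four small pilots.
pilots = [(0.30, 0.20), (0.10, 0.25), (0.22, 0.15), (-0.05, 0.30)]
effect, se = pool_fixed_effect(pilots)
print(f"pooled effect = {effect:.2f} SD (SE {se:.2f})")
```

    The pooled standard error is smaller than that of any single pilot, which is the sense in which many tiny studies can add up to more than the sum of their parts.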

    We need to recognize that evidence exists on a continuum

    Different types of evidence can support different purposes. What is important is that each decision is supported by an appropriate level of evidence. A guide by Mathematica provides a useful reference for educators on different types of evidence and how they should be viewed. Educators would be wise to look at the scale and cost of a decision and determine the appropriate type of evidence accordingly.

    Tools like the Ed Tech Rapid Cycle Evaluation Coach, Learn Platform, and Edustar can provide useful support in making decisions and evaluating the use of technology.

    It’s important to remember that researchers and philanthropists may use education research for different purposes than colleges, university systems, or districts would. Academic researchers may be looking to identify causal connections, learning gains, or retention rates, while a district is often focused on a specific context and implementation (what works for schools similar to mine).

    When possible, traditional randomized controlled trials provide useful information, but they’re often not affordable, feasible, or even necessarily appropriate. For example, many districts, schools, and colleges are not accustomed to, or well versed in, undertaking this type of research themselves.

    It’s easy to blame other actors for the current lack of evidence-driven decisions in education

    Everyone we spoke to agrees that decisions about ed tech should be made on the basis of merit and fit, not marketing or spin. But nearly everyone thinks that this problem is caused by other actors in the ecosystem, and this means that progress here will require hard work and coordination.

    For example, investors often don’t screen their investments for efficacy, nor do they necessarily push their portfolio companies to undertake sufficient research. Not surprisingly, this tends to be because such research is costly and doesn’t necessarily drive market growth. It’s also because market demand is not driven by evidence. It’s simply not the case that selection choices for tools or technologies are most often driven by learning impact or efficacy research. That may be shifting slowly, but much more needs to be done.

    Entrepreneurs and organizations whose products are of the highest quality are frustrated that schools are too often swayed by their competitors’ flashy sales tactics. Researchers feel that their work is underappreciated and underutilized. Educators feel overwhelmed by volume and claims, and are frustrated by a lack of independent information and professional support. We have multiple moving pieces that must be brought together in order to improve our system.

    Ensuring that ed tech investments truly help close achievement gaps and expand student opportunity will require engagement and commitments from a disparate group of stakeholders to help invent a new normal so that our collective progress is directional and meaningful. To make progress on this, we must bring the conversation of efficacy and the use of evidence to center stage.

    That’s what we’re hoping to help continue with this symposium. We’ve learned much, but we know that the journey is just beginning. We can’t do it alone. Feel free to follow and join the conversation on Twitter with #ShowTheEvidence.


    Authors:

    • Aubrey Francisco, Chief Research Officer, Digital Promise
    • Bart Epstein, Founding CEO, Jefferson Education Accelerator
    • Gunnar Counselman, Chief Executive Officer, Fidelis Education
    • Katrina Stevens, former Deputy Director, Office of Educational Technology, U.S. Department of Education
    • Luyen Chou, Chief Product Officer, Pearson
    • Mahnaz Charania, Director, Strategic Planning and Evaluation, Fulton County Schools, Georgia
    • Mark Grovic, Co-Founder and General Partner, New Markets Venture Partners
    • Rahim Rajan, Senior Program Officer, Bill & Melinda Gates Foundation
    • Robert Pianta, Dean, University of Virginia Curry School of Education
    • Rebecca Griffiths, Senior Researcher, Center for Technology in Learning, SRI International

    This series is produced in partnership with Pearson. The 74 originally published this article on May 1, 2017, and it was re-posted here with permission.