SEGMENT 5: Dr. Freiberg talks about the challenges and opportunities that lie ahead for Big Data.
Dr. Matthew Freiberg, University of Pittsburgh: When you look at an electronic medical record, it’s like when you go grocery shopping. All the food is in the market and what you would really like is to just have your basket and walk around, and say, I want an apple, I want a steak, I want a bottle of milk, and you just pull it off the shelf and put it in the basket, pay and you go. Great, I have everything that I need. EMRs were not built for research. They were built for clinical care. To me, the big challenge is that there is an enormous amount of data in there, but getting that data out easily and seamlessly is not easy. If you talk to the people who are the experts, like on our team Cynthia Brannon and Scott DuVall, they will tell you, we can see the data and we know it is in there, but to be able to build a tool to extract it and get it and pull it out accurately, consistently, and repeatedly, that is one of the first big challenges to me. How do we build these tools in an efficient way that can extract this data? The second thing facing us is how we can work with the people who build these EMRs. If we are really going to have a commitment from the NIH and other places that these health care systems are really going to be resources for Americans to understand disease, then what we want to be able to do is build these platforms to care for patients and to improve care. We need to try to make sure they are maximally friendly from a research perspective as well, if that can be done, so that we do not spend time trying to retrofit everything, and they are already built to do some of the stuff right off the bat so that we can target things later. Let me give you an example of that. As these EMRs get better, our tools are going to change. Right now, we have this tool for pulling out ejection fraction.
Well, it may be that five years from now everything that has ejection fraction in it will be automatically in an easily retrievable space within an EMR and you will not need our tools for finding ejection fraction anymore. That is fine. There is almost certainly going to be something else to pull out, because to get to the level of granularity in the data that we want, you can’t have every one of those things in some field that is easily retrievable; people are too complicated. There is going to be this evolution. What we want is that as these EMRs roll out and as they are being designed, it would be really useful to have input from the people who are trying to pull the data out of them, so that you fix or you create these things upfront that are maximally useful. Then we could spend our time working on tools that pull out that extra granular data that really gets us great phenotypic data from multiple projects and we do not pull out the stuff that the EMR could have done for us up front. The third issue is how well do these EMRs all talk to one another? Great, I have built this tool in the VA, but it does not work in EpiCare, it does not work in Cerner, or whatever system you happen to be using. We need to think carefully if there is going to be more of a national initiative, and truthfully, maybe even a global initiative if you really want to think big. I cannot imagine other places do not have EMRs that will not be worldwide, but certainly other countries are likely to have these things. We want to make sure that we have tools that are constructed to be maximally flexible so they need very little tweaking to work somewhere else. I think that is also advantageous, and that may not be easy, but I think it can be done. Which comes to what I said earlier: we are big into the multiple PI thing, as you can imagine, because we think that promotes science.
You get people who have different skill sets, basic scientists, clinicians, trialists, epidemiologists, and health services. If they are all multiple PIs really contributing their full thought to the grant, we get better science. Imagine if you have a similar thing where you have multiple PIs across different systems. We have a PI for a grant that does VA, and a PI for Kaiser who does EpiCare, and a PI for Pitt who has Cerner. It is the same kind of thing. If you have grants or opportunities to do that, these people can start to build tools up front, where they are saying that EpiCare needs this kind of code, Cerner needs this kind of code, and VA needs this kind of code. If we can do that up front more efficiently because there is funding to do it, we can move this initiative through faster. I believe that there are a lot of people out in the clinical research community who are excited about this, and I think if there were funding opportunities, people would apply. I think that they would work together. I do not think that is naïve; I think that is real. I think if we could do that, some of the tools that we have worked on now and put into VA could be replicated and put in elsewhere, and move science along faster.
Dr. Gary H. Gibbons, Director, National Heart, Lung, and Blood Institute: As we move to the close here, are there any final thoughts about particularly compelling opportunities that you see on the horizon that you are particularly excited about exploring?
Dr. Freiberg: I think the biggest opportunity is really this push, and I don’t want to sound redundant, but the fact that we really are trying to put all this information in a system that allows data to be retrievable, it is not paper, it is electronic, that is the biggest opportunity because the data is there. You do not have to create it. It is there. It is getting it out of there, out of these systems so that we can use it. That is the biggest opportunity.
If you start thinking instead of, take the Framingham Heart Study with 5,000 people, which is a great study and has led us on the path, and it is not an either/or, but imagine if you could take results from the Framingham Heart Study that go into the very granular data that we talked about. Imagine if you coalesced 10, 12, 14, 15, 20 different medical centers with, let us say, EpiCare data, and now you may have a data set that has 20 or 30 million people in it. You can really look at macro and micro with the stuff that the NIH has funded. That is the opportunity. If you can harness the power that clearly exists, there potentially isn’t a disease you cannot look at. You may be able to understand it in a way that completely informs how we drive basic science, how we drive biomarkers, how we drive genetics. To me, that is the big opportunity. All diseases could be on the table to study in a real meaningful way, because you have the power to do so if we do this right. I will say that in the VA, with what Amy Justice has built with the VACS and I have tried to help (the VACS is the Veterans Aging Cohort Study, in case people are curious, and it is based out of Yale), we have tried this vertical epidemiology model and I think this could be a model for the country as an opportunity. In that, we have our virtual cohort, which we described earlier in this talk, which is this 120,000-person cohort and we have all of their data out of the VA. Within that, we have 8,000 people who have also participated in survey data, so we get to ask them questions that you obviously can’t ask when you are just using data from a health care system. Within that, we have 2,500 people who have provided DNA, urine, plasma, peripheral blood mononuclear cells. Within that, we have people who have done CT testing for coronary calcium, graded treadmill testing, and other biomarker analyses. We are drilling down as we go. But they are linked, because they are in the same health care system.
We have all of their data that they have provided to us by virtue of their care and/or their survey information. So imagine if you had that amplified. Now if everyone in America were part of this virtual cohort, if you will, of America and then drilling down, where hospital systems then become places where we can collect blood specimens. Instead of having one site where everyone goes to, if they are already coming into clinical care, let’s say once they come in, they say yes I can participate today. The hospitals, these clinics, become places that are virtual cohorts. They can provide blood samples. Let’s say you see your doctor and you are supposed to get a lipid panel today as part of your routine care. They go to the lab to get their lipid panels. What happens if they are already seeing a phlebotomist, they are already in the chair, and you ask, can we also have a tube of blood because we want to look at lipid particle sizes for research, and they agree? Right there you can do some of the work that these other cohorts have done in a way that maximizes and builds on top of infrastructure that is already built by virtue of delivering care. That is what we have tried to do in the VA. To me, that is the opportunity. These health systems, it is not just the data. They also provide platforms where people go, where we can collect data and additional information, whether it be survey, blood specimens, DNA, or whatever. We may be able to help use these infrastructures and work with these healthcare systems to make new cohorts in addition to what we already have. If we can link that to all the stuff that is in their EMR that we have been talking about for minutes now, look what you have. You have this unbelievable resource to understand what is happening for health across a spectrum of diseases, whether they are rare or not, across a spectrum of people, old, young, black, white, it does not matter. I think that is the most exciting thing.
I do not have a timeline, but I can see that it can happen. How do I know that? We are working with these tools and these tools work. We have done a microcosm of that within the VA over the last 10 years. If you get a community of people that want to energize and contribute to this, I think this is possible. I think it can be done. It opens up a world of possibilities from my perspective, and it allows people from different disciplines to work together from big data down to DNA data. To me, that is what it is all about, and I think that is what is exciting.
Dr. Gibbons: I really enjoyed this conversation, Matt. We look forward to ongoing insights into the relationship between HIV and cardiovascular disease. Thanks for spending this time with us.