In an exciting, quickfire format, each speaker had eight minutes to present on data in government.
Better use of data is key to more effective government. Across government, teams are doing fascinating work with data. But those projects don’t get the attention they deserve. Data Bites aims to change that.
This event was the 48th in our series, where the speakers present their work in an exciting, quickfire format.
Each speaker had eight minutes, followed by eight minutes of questions from the audience.
This month’s speakers were:
Joe Cuddeford, Director at Smart Data Research UK
Frank Gauld, Chief Executive Officer at Smart Data Foundry
Ben Goldacre, Director at the Bennett Institute for Applied Data Science
Professor Gina Neff, Executive Director of the Minderoo Centre for Technology and Democracy
The event was chaired by Gavin Freeguard, Associate at the Institute for Government.
Intro by Gavin Freeguard – 0:00-9:12
Joe Cuddeford – 9:13-27:02
Frank Gauld – 27:03-44:32
Professor Gina Neff – 44:33-1:01:30
Ben Goldacre – 1:01:31-1:21:08
Closing remarks – 1:21:09-
Good evening everyone and welcome to the 48th edition of Data Bites getting things done with data in government kindly supported by Smart Data Research UK I’m Gavin Freeguard associate at the Institute for Government and it’s wonderful to welcome so many of you this evening here at the IfG and online let’s
Start in the usual way hands up if you’ve been to Data Bites before welcome back hands up if this is your first Data Bites welcome now can everyone hear properly good I was just checking that you have confidence in the speaker sorry thank you that never happens it’s a particular pleasure to
Welcome you all to the first Data Bites of 2024 happy New Year this is actually the longest gap we’ve had since the series started back in April 2019 110 days that’s longer than the current Welsh Labour leadership contest which will conclude just before our next event
It’s more than twice as long as Liz Truss’s government lasted it’s equivalent to listening to all Nadine Dorries’s audiobooks 16 times or 76 times Michelle Donelan’s tenure as education secretary though it’s only about a seventh of the time that Northern Ireland was without a government until a
Few weeks ago and about a sixth of the time since the government first announced its Rwanda asylum plan which as you can see continues to go very well indeed now we used that break wisely a sincere thank you to everyone who filled out our survey about improving Data
Bites giving us levels of approval that would put an autocracy to shame there will be a few changes we’ll be switching to a six-weekly cycle from April we’ll be looking to run more themed events like this one and we may experiment with some different types of uh times of day
As well I was particularly pleased by this person’s answer to what do you enjoy most about Data Bites Gavin’s good jokes slightly less pleased with their answer to what do you enjoy the least about Data Bites Gavin’s bad jokes I’m sorry to say that the jokes or
Attempts at them are here to stay and I’ll let you judge which category they all fall into let’s get the housekeeping out of the way tonight’s event is on the record and we are being live streamed obviously if you’d like to get involved on social media it’s #IfGDataBites and we are
Live tweeting still from IfG events and to put questions to our speakers if you’re online you can use the Slido page you’re almost certainly already on it’s bit. llod db48 capitals Capital DB if you’re here at the IfG you can also use that or you can raise your hand though do note
This feedback from our survey questions in the room are always long-winded consider yourselves told and if you are online please do get your questions in as soon as you can as you are on a very slight delay why does the IfG organise Data Bites well we aim to bring together the
Various different data communities in and around government show everyone what better data can achieve in practice and put interesting data projects on the record so we can all learn from them how does Data Bites work you’re going to see four presentations about data this evening each presentation will last for
8 minutes yes just eight minutes there are eight bits in a byte hence eight minutes in a Data Bite the presenter will then face questions for 8 minutes yes just 8 minutes and then we’ll move on to the next presentation so four presentations of 8 minutes each
Followed by questions for 8 minutes this is our 48th Data Bites bringing to an end our sixth cycle of eight events you can watch the previous 47 events on the IfG website so what’s happened in UK politics since we last met well as ever it’s been a quiet few
Months a few days after the last Data Bites there was a reshuffle a minister for common sense was appointed not sure what that implies about all of the other ministers but the big story was the return of former Prime Minister David Cameron four Conservative PMs before Rishi Sunak one Twitter user pointed out
That if Tony Blair had done that he’d have had Ramsay MacDonald as foreign secretary a reminder both of recent turnover and the stellar 20th century electoral success of the Labour Party while we’re on turnover that reshuffle gave us a 16th housing minister appointment since May 2010 though to be
Fair he had done the job for 7 weeks under Liz Truss it was a busy start to 2024 for IfG with the publication of our Whitehall Monitor report on the size and shape of the civil service numbers were up churn was down a bit but still high and morale
Fell down too was the number of charts in the report there’s the pink line for 2024 blue for 2023 with a run rate slightly lower than this time last year though still better than England’s against India the year also began with a nation gripped by a traitorous orgy of betrayal
Where only one person was left standing as Simon Clarke called on Rishi Sunak to quit a reminder that they used to work together in the Treasury now The Traitors wasn’t the only TV highlight with the Post Office Horizon scandal finally breaking through into mainstream consciousness let’s hope the salutary warning it provides on
Trusting automated systems without question transparency or redress is heeded by politicians of all stripes thinking AI is about to solve everything there were two more by-elections and two more resounding Conservative defeats we also have the Rochdale by-election to look forward to if that is the right phrase there’s no official Labour
Candidate but three former Labour candidates standing the Conservative defeat in Wellingborough was particularly crushing the second biggest fall in vote for a governing party since 1945 if we go back to 1979 and look at seats changing hands in by-elections we can see this Parliament now has the
Highest number that have changed hands and most government losses coupled with a high number of MPs changing allegiance most recently former Tory deputy chair Lee Anderson being suspended that means the government’s working majority has fallen by more than 30 since the 2019 election in MOG news here’s an old favorite showing cats in
The Cabinet Office they will soon be joined by another Cat Little who spoke here at the IfG back in November who’s been appointed new permanent secretary of the Cabinet Office and civil service COO and in other civil service appointment news the government has finally appointed a government Chief
Data Officer now some of you may remember me singing a sea shanty at a previous Data Bites which looked forward to that appointment that was three years ago #impact it’s actually a vacancy that’s existed for eight and a half years now it would be really cheap of me to point
Out how many Truss governments you could fit into that time 62 since you asked but it’s also longer than Harold Wilson was prime minister could yet be longer than Winston Churchill’s premiership by the time they start work and isn’t far short of some other long-serving PMs let’s hope it doesn’t take
The new CDO quite as long to make a difference to data in government and speaking of being smart with data let’s turn to tonight’s smart data special first we’ll hear from Joe Cuddeford of Smart Data Research UK on safe access to smart data for research then from Frank
Gauld of Smart Data Foundry on unlocking financial institutions’ community intelligence through private sector data sharing at scale after that will be Professor Gina Neff of the Minderoo Centre for Technology and Democracy at Cambridge on opportunities challenges and lessons from research and our final speaker joining us virtually
Will be Ben Goldacre from the Bennett Institute for Applied Data Science at Oxford on OpenSAFELY we’ll be back with Data Bites 49 on Monday the 18th of March you can sign up for that one on the IfG website or via bit. lfg databytes 49 capital ifg ifg DB then
We’ll be back on Wednesday the 17th of April when we’ll be celebrating 50 installments and five whole years of Data Bites what have I done with my life a huge thank you to Smart Data Research UK for supporting tonight’s event we’re only able to run Data Bites
Through the support of partners like SDR UK if you would like to follow in their virtuous footsteps please drop pesh an email and as ever if you’d like to follow in the footsteps of our speakers please drop me a line instead so that’s more than enough from me I’m now going
To hand over to [Applause] Joe check I can use the clicker uh right um hi everyone um so I’m Joe Cuddeford director of Smart Data Research UK I’m going to be talking about a new 59 million pound investment that UKRI is making in digital research infrastructure that will make it easier for researchers
To use new forms of data um I’ll talk about what is Smart data why do we find it so interesting and so valuable and what are we going to do to make it easier for researchers to do good things with data um so we Define smart data as all data generated through everyday
Interactions with digital services digital devices um there is very little you can do in the modern economy that doesn’t leave behind some kind of digital trace and it’s a very wide and growing definition covering everything from um social media usage smart home devices digital payments loyalty cards
Banking data um wearable devices and so on um key thing is it’s not data that’s collected primarily for research but it can be reused for research that has much wider public benefits now unlike survey data and unlike uh administrative data smart data tends to be held by private
Companies and they are using it as we know all the time to um develop and deliver products and services that we all know and love uh they’re using it as well for uh marketing purposes and um we are at a point now in the UK’s research data landscape where we have been seeing
Innovative research using smart data across a whole range of areas we have four opportunity areas in our program uh productivity and prosperity health and well-being digital society and sustainability and looking at just one of those as an example health and well-being so this is a study led by
James Flanagan and his colleagues um at Imperial UCL and Birmingham so they’re using loyalty card data from Boots and loyalty card data from Tesco to try and understand something about the relationship between shopping habits and the development of ovarian cancer so at the moment there’s no reliable way to
Screen for ovarian cancer and because the symptoms tend to be um non-specific it can go undiagnosed for quite a long time um it’s thought that during that period of you know you get symptoms but you haven’t got a diagnosis um women tend to begin by self-managing the symptoms using over-the-counter pain and
Indigestion medication but uh we have a real Gap in our understanding of that because studies tend to rely on um the ability of patients to recall when did you first start getting these symptoms what medication were you buying so it’s quite an unreliable um source of data
But the researchers in this study were able to look at real shopping data over a long period of time uh they had a health questionnaire and they had a sample of women with ovarian cancer and a um a control group and so they were able to isolate a
Particular relationship between purchasing particular over-the-counter pain and indigestion medications and the onset of ovarian cancer um on average eight months prior to a diagnosis and that’s four months prior to attending a GP appointment about this so a really important finding really important study the researchers think that they could
One day develop this into an alert system where we could choose to be notified if our shopping habits indicated something to be concerned about um but it also tells us something about the challenges of working with smart data because in this example and in many types of um research that are
Using smart data the researchers have to go to Great Lengths to build a relationship with those companies to acquire smart data to put in place all of the legal and um technical infrastructure that you need to have in order to be able to process these data
Sets safely so if we want to be able to support more research um like this then we are going to need to invest in institutions that can do this work for us because to have individual research teams all trying to put in place their own separate legal and Technical
Arrangements would be very inefficient not just for the researchers but for the companies as well who would have to have lots of separate bilateral relationships so what are we going to do to make this easier so the first thing to say is we have already got some very
Strong foundations in the UK the last decade has seen a real flourishing of digital research infrastructure trusted research environments that are providing secure computing environments where researchers can access sensitive data uh securely and do analysis without personal data ever being um at risk of leaving the secure environment so ADR UK
And HDR UK two great examples of that from the administrative and health data spheres two examples I wanted to mention um ESRC has been investing for the past 10 years in the Consumer Data Research Centre and the Urban Big Data Centre um these are very successful um centers that have been building
Research Partnerships with a whole range of private companies bringing data into a secure environment and supporting research in areas like Urban Development Health diet So based on these programs uh we’ve got a very specific understanding of what works but also where the gaps are and what we need to
Build uh and improve on so we are going to spend our 59 million pounds over the next five years um firstly on some new data services so this will be the most part of our investment uh we have a funding call that went live last Thursday and closes on the second of May
Um these will be um centers of expertise in established institutes across the UK specializing in different types of smart data different research themes and they will be responsible for acquiring data and providing the infrastructure and the processes for secure access to support research um we will also support
Innovative research uh so we’re going to be focused on research that develops um new data sets and demonstrates the value of smart data to challenging social and economic issues um we will also crucially be investing in a public engagement program so we’re very committed to making sure
That the public are very involved in the development of Smart Data Research UK obviously we’ll be fully compliant with all of the relevant data um uh protection legislation and so on but we also want to make sure that the public voice is at the heart of our program um
So we have a public dialogue that kicks off this summer we can say more on that soon um and finally um all of this will need some careful coordination so we’re going to have a central hub based at ESRC um so we’re taking uh a model that has been successfully uh implemented for
ADR UK um where we’ll have a team that will ensure that the data services are joined up help to develop new Partnerships with um a wide audience and work with government to uh identify how smart data and how smart data research uh can connect um and have an impact on public
Policy this is just um our road map so I think the only thing I’ll mention is the funding call again which closes on second of May and in September that’s when those data services are expected to be coming online um we are always on the hunt for
Interesting use cases so if you have a problem that smart data might be able to help with please do get in touch or if you want to discuss anything about the program you can contact us through any of these means thank you thank you very much Joe um a reminder if you’re watching us online um you can submit questions as many people are already doing uh on the Slido which is bit. l/ slid db48 if you’re not already there uh those of you in the
Room you can use that as well or you can put your hands up which is what I’ll ask you to do very shortly uh please do wait for the mic to come to you do tell us who you are and where you’re from if you
Can but do remember we are on the record so I’m going to come to the room first any questions in the room I’ve got a hand straight up there and I’ll come to you uh after a couple of online questions so uh James Robson data protection officer for the Labour
Party um great presentation really love what you’re doing um I just want to get an idea of how you would judge the efficacy of an SDE secure data environment or trusted research environment within the project you’re proposing and have you chatted to the DARE boys DARE UK digital analytics research
Environment uh yes thank you so good question so yes so the DARE UK program for those that don’t know it is um an initiative that’s been set up um to look at the particular question of how do these trusted research environments that we have now that are holding different
Kinds of data securely in their um different environments how can we think about the cohesion of those data services um are there ways that we can make it easier for users to look at data between the data services in Secure ways so we’re very much engaged with that
Team um and uh they’ve got you know a complicated and difficult question but very very supportive of that kind of work um did that answer your question is there another part of your question um how are you going to judge the accuracy of the proposal for it uh right yes so we so
Part of our funding call um we will have an expert panel that will look through all of the proposals that come in um and part of that panel will involve assessing uh the feasibility and um the compliance with all the sort of relevant security um privacy measures and so on
So it’ll be that kind of expert panel making those decisions great thank you I’m going to go online for the next one this is from Steve Black evening to you Steve for early diagnosis uses of smart data what is the cost and level of false positives
And how low does it have to be to make the idea valuable overall not sure I am best placed to answer that question so could false positives in early diagnosis yes so I think um where where the data might be showing that people are um potentially ill with something but it turns out
They’re not so I suppose how do you how do you correct for that or guard against that in some of these studies I’m not I don’t think I’ve got an answer to that right now but perhaps we can pick it up in the um discussion later or afterwards
Brilliant uh let’s come back into the room next we had a question just there hi David Durant excellent presentation thank you um I didn’t hear anything about public data sharing agreements or equivalent is there going to be a public list of which research groups are going to be accessing data
From where and for what reasons um so is your question about whether or not we’re going to be publishing lists of um the people using the data yes yes absolutely so yeah I didn’t mention that um so we think that transparency is really important um and we’re very committed to putting in place
Mechanisms that will make sure that uh if there’s an application to use data that’s in smart data research UK infrastructure that it’s that there’s a published record that that applications come in whether or not that application is successful or not I think that’s quite important to be able to see the
Things that are rejected and why as well so we’ll be um working with our new data services to work out the best way to do that great thank you um another online question this one’s from Sam uh from medConfidential evening Sam um given this weekend’s government messaging and I
Think there have been a few stories about DWP’s use of uh data and what it wants to do with people’s data can you think of any adverse consequences of making Clubcard data available in bulk to DWP for policy choices yeah so we so um we’re not planning to
Make Clubcard data available in bulk to DWP um so government researchers would be eligible to use Smart Data Research UK to run research um but the projects that anybody if anybody wants to use Smart Data Research UK and they have a proposal the project needs to be
Approved the people need to be approved um so all of the the necessary safeguards will be in place regardless of whether you’re a government researcher or an academic researcher um and when you’re using Smart Data Research UK you you don’t have access to the bulk um Clubcard file with personal
Data data is deidentified you can’t take any of the data out of the secure research environment so the safeguards in place will protect against um any of that great thank you let’s come back into the room I’m conscious our questioners have not been hugely diverse
So far uh anyone in the room want to ask a question I’ve got another one down here at the front thanks um Paul aen fellow of the Royal Society of Arts um I’d be interested have you done any research against smart data collection usage against people who do not use Smart
Devices so are you getting a correlation between people that don’t for instance use technology in abundance um that matches the results that you do with people who do a really good question um so so it’s one of the challenges with this data source is that um it’s only
Giving you information about the people that are using those services and we know that you know there’s different demographics and digital exclusion some people don’t have the same devices I I have an Apple Watch so I’m probably you know loads of biometric data on what I’m
Doing but I’m sure most people in the room don’t um and a key part of our program is actually not just kind of making data available but developing the methods and the tools to understand what is good about these data sets what can you do with them and what’s missing and
What can’t you do with them um and I think I’m also interested in thinking about how you can not just see a smart data as a sort of data source in isolation giving you the answers but how you can bring it alongside other data survey data administrative data um to be
Able to sort of start to provide uh the context um and much more richer picture so I think your question is really important um and hopefully through some of the research that we’re going to be supporting and the data services we’ll be able to start to understand who is it
That we have in in our sample here but crucially who’s missing thanks I’m going to go online for the next one this is from Tom King is there any International collaboration planned or comparable initiatives overseas yes um great question so it is a um something we’ve been looking at so
There are some interesting examples um and actually Gina might be able to talk about some of these later so we’ve been talking to colleagues in the University of Michigan uh they have something called the the Social Media Archive there where um they have I think recently launched a service that has
Data from Meta so Facebook Instagram available to researchers so we’ve been talking to them a lot there’s some great um researchers in Australia who have been making real advances with um a model of access which is more sort of data donation so that’s where you start
By approaching members of the public via an established research sample and asking them um for consent permission to access their own data you could we look at um Gavin’s Twitter account or something like that not sure you’d need to do that um and then another another place that um has been doing some really
Interesting work is um Netherlands they’ve been doing something similar um and there’s another one that I’m forgetting but there is oh and I think the other the other thing that’s interesting from kind of from an international point of view um is the regulatory landscape so there is um new
Legislation which I think kicked in in Europe last week or very recently the Digital Services Act which has specific provisions um to enable researchers to access data from the big platforms um and a whole regulatory process that goes around that um the UK you know I think will
Still be able to potentially benefit from that particular bit of of Regulation because I expect that what the big platforms do uh for that regulatory context they will also do for other countries that might not be part of that regulatory regime anymore um so certainly we have our eye on what’s
Going on internationally because you know we’re learning a lot and and there’s particular opportunities to collaborate and bang on time as well thank you for getting us off to a great start Joe thank you uh and sorry to those of you whose questions we didn’t get to some really
Good ones uh online uh again uh we now go to our second speaker that’s Frank thank you good evening everybody and everybody online and thank you to Data Bites um my name is Frank Gauld and I’m the CEO of Smart Data Foundry we’re a not-for-profit subsidiary of Edinburgh
University set up with a purpose to open finance for good and I’ll explain what that means um we’ve had a long belief and we’ve proven through our our partnership with NatWest Group that financial institutes hold untapped community intelligence from a coverage standpoint um it’s well known that just
Take top four banks in the UK hold over 75% of consumer current accounts and actually the same for SMEs and every day every hour millions of financial data points are created and in the deidentified aggregated data that we collect it’s incredible the number of use cases that that economic data can be
Applied to especially when combined with others and they are the data for good cases that we pursue particularly in three areas socio economics where we combine the economic data for social uh research Health economics where we’re we’re combining and there’s a lot of stages to go through Health Data with
Financial data and then finally new areas for us moving into combining economic data with climate data and as much as the financial institutes have this data they find it very challenging in themselves to leverage this for good um I’m from Banking and Technology background and banking is a risk
Environment so the perceived risk combined with GDPR compliance is rather constraining for banks to be able to do this themselves and apart from that um I’ve headed up the data warehousing and analytics companies uh groups within a bank and they literally are maxed out on regulatory reporting banking
Reporting um via analytics to support products they really don’t have the capacity to pursue in a priority sense these type of cases and then towards the end um the power really of the financial data is when you combine it with other ancillary data sets other private sector
Data sets or or health or administrative data and it the banks really aren’t in a position to take those data sets inside their environments to be able to combine them so uh enter Smart Data Foundry um we focus on private sector financial data with a clear purpose to use that data
For good and uh you can see in the top here but the power really in our work comes where we would combine it with traditional sources of data like a survey or or researchers or analytics um combining qualitative and quantitative data or some of the work we’re doing in
Edinburgh with the Usher Institute which is uh three research projects looking at combining health data with economic data that’s leading on to work we’re starting to do with HDR UK and then finally administrative data actually working with local authorities in Scotland pulling together their siloed data to help
Them make decisions and do a better job of allocating the services that’s gradually taking us into more more work with ADR UK and in Scotland this is SCADR um and so through that work we’ve pioneered new approaches to opening finance for good um we gather deidentified aggregated data which we update monthly
Which is normally the shock that we get when we talk to researchers um our data goes back to 2019 as a longitudinal data set and is right up to date with as of four weeks ago we can tell you what happened at Christmas um that comes from
Our trusted data Partners we have data Partnerships uh agreements with each one of them the basis being legitimate interest in particularly research in the public interest we hold that within a trusted research environment in Edinburgh which is which is actually also where the public health Scotland
Hold the NHS data for Scotland a very secure data with incredibly powerful computing to go with it and that is under the guidance of our risk management our information security and our specific information governance that we created in conjunction with the ICO through their sandbox all of that to
Create an environment where we bring together the best of research with our data science to go after use cases like these three during the Covid pandemic we created the Covid dashboard to help uh government decision making a simple dashboard which gave them an indication by data zone of income expenditure and
Balance which was helping them make decisions um off the back of a research project with CBR we know using Sage Cloud accounting data an dashboard that we release on a quarterly basis that’s now picked up by Small Business Commissioner Federation of Small Businesses and even reported on Bloomberg
And then finally the dashboard that we created at the end I’ll take you into more detail this is a dashboard that we’ve created for local authorities uh not specifically in Scotland that’s where we started uh to help them make decisions and some of the results have already been applied um around decision
Making this is a dashboard we created for an area just outside of Glasgow called East Renfrewshire and what you see here on the left hand side um this left hand side is an area of historical poverty whereas on the right is in the same local authority is an area of
Historical wealth the size of the circles indicate the use of overdraft yeah continuous use of overdraft and what you actually see is the circles in the wealthy area are actually the same size if not bigger than the other area the color indicates the uptake of money
Advice Services uh and you can see it’s high in areas of non financial distress but less so in that area that’s taken us on to a deeper and deeper study with that local Authority looking at the reason reasons why what has happened and we can literally Slide the data back to
2019 and actually see where it’s happened and then when there’s interventions bring it forward to see if it has an effect and uh the social uh cases that we’ve got in there speaking to a social worker in that group really what she said was um I didn’t believe
That you bunch of data Geeks could ever teach us something about the people we work with but we investigated and we literally found family problems that you identified there and she said we would never have seen them and we weren’t doing anything about them until they got
To a crisis point so it was a great study for us and we can tell you more so you can see is this is we bridge this gap between private sector data applying it for researchers and at the same time apply it in the public sector such that
They can improve their decision making but we need to go further to create greater impact and for us in our view it requires five things uh we’re actively working to increase the number of data Partners uh from Financial Services to find greater coverage of both consumer
And SME data um and then we need to streamline the process um it can take up to 18 months before we’re even at the stage of ingesting the data we’re going after that it’s great to have NatWest Group applying confidence and being able to explain to the other Financial
Institutes in that area but also we’re tackling them all in parallel um to get to that point where we’re linking data sets and we’re really excited by what Joe and Bruce are doing with smart data research UK getting more and more of the data sets together because that’s
The power that we see in the research we apply such that we really hope private sector data becomes essential to research in the UK and then the final point for me 37 seconds um is funding um this is an expensive business from my point of view you know when you take
Regular ingestion of data you hold it in a trusted research environment with all of the conditions the governance around this is extremely high and occasions can involve public sector public engagement you know for each one of the different groups and then you know if you think about the engagement we have
To work with all of the banks work with local authorities and work with researchers across the UK this is a really heavy-going business in a world which quite honestly by definition of the data Partnerships there’s not a lot of Revenue generation that you can
Really have in there so um we’re really excited about what’s happening with SDR UK and quite honestly we believe that we hope that’s the stage towards uh data becoming part of the infrastructure for research across the UK and with that I’ll stop thank you thank you very much Frank thanks uh
Reminder if you’re watching us online please submit your questions via slid bit. llod db48 uh let’s start in the room again who’d like to ask the first question otherwise online are already ahead of you so I’m going to jump online instead um we’ve got another question from Tom king um you gave some
Successful examples he says are there any failures and how do you identify and resolve insights that are unhelpful we haven’t had any failures yet um I think some of the most controversial is where we’re asked to do research studies that we won’t do we will never do anything
That’s of commercial benefit for a company we’ve been asked and we’ve been asked by areas of government to do uh research um into the financial viability of new taxes which we stayed away from uh unsurprisingly quite a common thing in Scotland unfortunately but um no no failures yet um actually some I would
Say in working with the Usher Institute on health uh one of the problems for our health counterparts is during Covid they almost had a wartime level of permission over Health Data which was taken away and so whereas we’ve been able to do some of the economic analysis quite
Quickly it’s really really slowed down that area which I know came up in the Covid inquiry so uh some areas take longer than others great thank you uh let’s come into the room I’ll come to you first I’ll come to you next time R thanks Deborah CW smart data research
UK could you just talk a little bit about the role that regulation has played in enabling this data access I think in the end the basis upon which the data sharing was the first part so um there’s different groups that we work with some work on a consent
Model it’s very difficult to work in a consent model to have a continuous data so we work around legitimate interest specifically um the data sharing is research in the public interest which is great um it took us a long time to establish the information governance processes that we’ll be happy
With uh one of the advantages in Edinburgh is that we have within the same University representatives from ADR UK from HDR UK and uh we could learn a great deal from each other in that area to be able to bolster the practices and quite honestly we had a trusted
Research environment ready for us uh to go in that area but um I think it will evolve you know um there’s always some degree of moral outrage whenever I talk about what we’re doing and uh I think it’s going to be evolving regulation as um I don’t know in some
Ways um some of the feedback we have is wow I can’t believe you’re using this data and others it’s thank goodness we’re actually using uh my data for good you know so um a bit of both yeah thank you thanks I’m going to go online for
The next one uh this is from Jonathan Flowers when you’re doing the local Authority support work how small a level of granularity do you go down to and he’s put efficacy versus anonymity we spend a lot of work making sure that there’s no degree of
Attribution uh at the point of ingestion of the data we won’t go down to a level where you could attribute a person or a street if a street has got too few on it we knock the data out so we handle enough of that at the point of data
Ingestion yeah uh the way in which you see in the graphs is we go to the data Zone level uh which is like um I don’t know EH1 the first part of a postcode is the level that we go down to at that point and no lower but there’s an
Awful lot of time by data scientist information governance to make sure that there’s no attribution yeah great thanks uh we had a question there thanks Frank emman trano University of Bristol related question actually I was wondering what data sources you use for the local Authority example you mentioned and whether you
Were able to uh extend beyond Scotland for instance msoas in in England and Wales I’d love to ask the guy on my team who sits behind you because he remembers every single bit of it but uh it’s a combination of the financial consumer data that we have and then we pull in
Some Census Data government data as well as consolidating what the local authorities have access to themselves to build up this picture of the decisions that they make so who did I miss anything he we have GB dat oh yes sorry GB data so UK not including Northern
Ireland where we don’t have enough data great this may be a record it’s 41 minutes into a Data Bites and I’m going to ask the first question from Anonymous how are you linking for example financial data with Health Data if it’s by a probabilistic method what’s the reliability of
Correctly linking individuals records this is point is dangerous cuz I have no data scientist you know um I mean the linking for most of what we do is um if the common index is a data Zone then that’s the area that we have it uh one of the Ping project for the Usher
Institute was actually to look whether we could understand the economic impact of long covid there’s something between 2 and 3% of the Scottish population got long covid and we can see that in the Health Data um and we’re try to how do you link deidentified data between
Financial data and Health Data so uh that’s actually a service by the National Records of Scotland that can do that type of work um we’re still working on it after about 9 months um they are the university is happy with that and the HDR UK is happy with it but the
Bank don’t recognize um that group to put their um data identifiable data over them to create the index so we keep working on it but right now our favorite index is data Zone thanks uh let’s come back in the room I’ve got a question down here at the front thanks Lucy behavioral insights
Team I was just wondering if you’ve been working with the Challenger Banks as well and if so what are you finding kind of different about the data that they’re collecting in their data infrastructures yeah no it’s interesting so um right now I’m working with SF which is the Scottish financial sector
Body and uh I presented to the whole of the banking sector there and what they came forward is so you take Scottish Building Society as a tiny Building Society who have savings and pensions another one um we’re working with Virgin Money right now um Santander UK is still
Quite small and um so we’re gradually picking them up what is the Challenger Banks is is my data useful enough and for us as we paint together this picture of banking across the UK it is and in fact it shows different areas for us but
For them what they worry about is the burden of being able to provide this on an ongoing basis but uh you know really it’s a patchwork quilt that we’re piecing together from all of the different banks um right now we don’t
Have the data in yet to be able to see where to go with it but um what we sometimes find with them is that they have Regional areas so if you take Virgin Money unsurprisingly it’s got Glasgow and Yorkshire and then down to Newcastle because it’s Clydesdale Yorkshire
Bank and Northern Rock within that area which is what we see or different types of financial products is typically what we find thanks I’m going to go online for I think will be the final question this is another question from Steve black what volume of data are you using and how
Much is the typical cost to use it I’m not going to tell you the cost because Joe Cuddeford’s sitting there um no um right now what we have is uh 5 million accounts yeah um across the UK from one of the banks and uh of course
Banks compete with each other so we want exactly the same from the other leading Bank um in the beginning the the aggregated sets that we have were uh about 22 we used the open banking data set because it was common but now we’ve moved to a deeper set so the data set
We’re working on now is 5 million consumers across GB uh with 220 uh different aggregated classes across income and expenditure so lots of data brilliant well Frank thank you very much thank you so much and we now move to our third speaker of the evening that’s
Gina oh thank you thank you it’s a delight to be here I have the honor and privilege of following two great talks tonight so you’ll hear me say everything Joe said I am a sociologist by training and my research has spanned how people have looked at data in their
Workplaces from the beginning of the commercial web to today these are three of the books that I have done about data including one on digital devices um for a project that we did with Intel and the Human-Centered Data Science book part of a 40 million pound $40 million investment from the Moore and
Sloan foundations into making a discipline of data science in the US at the universities of New York University University of Washington and University of California Berkeley um the book I’m working on right now is um though a bit of a throwback and it’s really the challenge
That I want to present you tonight and is a both a metaphor and a a warning um I’ve spent 15 years looking at the roll out of what is now called digital twins Technology Building information modeling was what it was called when we started the project and
My collaborator Carrie Sturts Dossick and I are completing that book manuscript so can data change a sector can data change an industry can data change jobs the short answer is it’s really hard when we look at the social and cultural and social institutional challenges of how people
Work with data and all the constraints that come with that we see that getting data into the hands of people to make change is actually really hard if you think working in government with data is hard let me introduce you to commercial construction and the challenges they’ve
Had in trying to get this data to flow so um one one of the things that we’ve been doing this is what Joe said um one of the things that we’ve been doing at the center that I run the minderoo center for technology and democracy is to really think about how
We can take on and change some of those social institutional and power relationships in our society that helps us get data and Technology to be used for good so so the kind of value proposition that I think brings all of us together really is that new kinds of data hold um potentially
Enormous um value in terms of helping us know about social social behavior um and as Joe said the data really help us ask new kinds of questions in fact it not only helps social science researchers ask different questions and answer different questions it changes the nature of our question
Asking so we see students we see our PhD students come in and they suddenly say not this is what I want to know but this is the kind of data that’s out there how might I find something out we really are thinking differently about what we could know um
However and those examples by the way that I flew through on those slides they’re from The Amazing um studies and case studies that smart data research have been doing um however as part of a Royal Society National Academy of Science working group on on researcher access to data we’ve identified some of
These challenges that we have for getting data into the hands of making making good I want to really drill into two of these for a moment and the first is access we’ve we’re we’ve talked in the beginning of tonight’s talks really about what can be or could be possible
But in the world of social media data which is the the the world that that I’m working in we’re still facing enormous challenges for getting that data into researchers hands the second I would say a lesson that we have learned from covid is connecting data across different institutions and jurisdictions still is
Incredibly difficult we can argue that may be for very good reasons as Citizens we don’t want our data flying around the world um all um higgledy-piggledy um but when our governments have a legitimate concern for why data should move and the kinds of choices
They can be making we might have different kinds of challenges so for example Covid researchers faced um challenges moving data during Covid across National jurisdictions so let me use the rest of my time in three case studies the first is a project that I’m working
In the European Union this is an EU Horizon Project thanks to the UK’s um Horizon guarantee program um AI for trust this is to build an early detection warning system in multiple languages multiple platforms multimodal multi-channel and multilingual um we are facing enormous challenges in getting
Training data to build that classifier um because we want to be able to look at circulating misinformation about climate and about health we need the data to be able to do that and it’s incredibly hard um I called this in Wired earlier this year a new digital Dark Age that we’re
Facing that researchers are actually um very much in the dark in terms of what we know compared to what social media platforms have access to um we need to figure out new ways to get that access to that data the work that we’re doing at the Minderoo Centre for
Technology and democracy really is um helping to advocate for researcher access to data in new UK EU and US legislation I can update you a bit about that in our discussion but I will say part of these International efforts are about both linking how legitimate public
Use and access to this data could be structured and figuring out how we build the regulatory I.E the legal mechanisms for making sure that that access is guaranteed um making this actionable though is actually quite challenging it’s challenging for those questions of getting the data into people’s hands getting building the infrastructure and
Getting the scope and scale that we need finally just a pitch for the work we’re doing at um the Responsible AI UK initiative this is a 35 million pound initiative from UKRI to create an international ecosystem for responsible AI research and Innovation and part of what we’re doing now is thinking through
What are those parameters around responsible AI that we want to make sure that we get in the research community that helps us think through these challenges um helps us ensure that we’ve got the data to build the models we’ve got the infrastructure to do the work that helps
Us build Technologies for good and for the future and finally just a um a mention of the center that I run you can look at our website for more of the work that we’re doing gav thank you very much Gina um a reminder if you’re watching us online bit. lsod
Db48 I’ll be saying that in my sleep tonight uh let’s come to the room first who’d like to ask Gina the first question plenty of ground to cover I’m going to hold the silence until somebody puts their hand up thank you very much uh just wait for the microphone hi Gina I’m Bill
Roberts do you think legal protections for um legitimate use of data are enough do we trust people to follow the law do we trust people to follow the law do you think the law will follow the law yeah um uh and and just so the the question
Is do we trust people to follow the law um I one of my hats is training doctoral students and training doctoral students in research ethics and time and again I face the question not from any of my students of course um you know why can’t
I just do this right so for example in the AI for trust project where we are trying to build a um you know early detection mis- and disinformation um detection tool that would automate and um supplement the work of human in the loop fact Checkers the question that some of those
Researchers might ask is well why can’t we just scrape the data why can’t we just take data and and and use it for building tool um and the short answer there is we’ve been tasked by the European commission to build something that Accords with European values and particularly these values under the
Digital Services Act um and GDPR and increasingly we are now having to Pivot in the middle of building our tool to think about what the EU AI Act will mean for what we have done so will people follow the law maybe not always but if we don’t have the law they can’t follow
It for social media platform companies I think we have to be very clear that the balance of power is completely and utterly skewed so that if we don’t have some kind of structures legal structures in place we won’t be able as researchers to get access to data that has enormous
Public benefit in the words of one of the anthropologists who sat with the election 2020 study this very splashy big study that um that came out earlier this Academic Year the the researcher tasked with watching how people did access this data through work said this was um access by assent from the company
And that’s not truly the way we can get science done so if we’re relying on the companies to give us permission for the data we won’t be able to ask those really great questions what could I do with this thank you I’m going to go online for the next one this is another
Question from Jonathan Flowers I’ll come to you next um is there a role for participatory methods such as Citizens assemblies in addressing some of the data ethics considerations here Jonathan thank you that’s brilliant question I feel like American softball we have this thing like getting the softball question
That’s a a slower pitch baseball for those of you um I know I know maybe test match cricket is more on the more topical but you know a slowly pitched ball so that the batter can really knock it as we say out of the ballpark um um
We have at responsible AI UK just released um a or about to release a set of um collaborations that will soon be announced that help us develop those participatory models for getting people involved in the decisions around data and around building AI models we also I’m working with the esrc digital good
Network and we too have this participatory angle how do we ensure that people um are both aware of how their data is being used but how they’re really brought in that we’re not simply we’re not simply um you know ticking a box please don’t read this fine print
We’re going to take your data I think those models for How We Do responsible data really are over we need to work with communities in a deep engaged honest participatory way to help co-design what this data can be and as a throwback to that I was incredibly
Inspired by the work I did um around self-tracking data um that resulted in the Self-Tracking book we spent a lot of time with the Quantified Self community and in that Community not unlike this one a lot of people who care very deeply about data they would come together and
They wanted to understand how they could gather and use this data about themselves for better insights for things they wanted to change I think data in the hands of people is one of the most powerful things you can do but we can’t simply say oh your data’s out
There go do something with it we need that kind of engaged sustained co-design and capacity building work that really helps us build new kinds of data Futures that’s the kind of Digital Society that I’m super excited about building and I think we have in the UK an extraordinary opportunity to get there
Through both events like this but also through the kinds of initiatives and data um um uh internationally leading data initiatives that both Frank and Joe are a part of thanks we’ve got a question right there at the back hi Gina I wonder how much artificial intelligence uh projects focus on
Basically large language models right how much the equation is between these two or whether they also Encompass recommender systems right because I think that recommender systems are very humble application but a very consequential application of machine learning and I think perhaps is a bit under-researched given its implications for
Democracy um yes putting on Democracy hat um ChatGPT large language models have dominated our conversation in the last 12 months 11 months since the public has become aware of ChatGPT and OpenAI there’s an enormous as you said enormous different ways of thinking about um what a responsible AI uh
Present and future would be and I think you’re absolutely right we can’t simply drill down and think just about large language models we have we have many more things you know larger language models um aren’t they’re not magic and they’re not the answer to everything so one of the
Things that we’re doing in Responsible AI UK is thinking um really how do we spark the research um Community to be working together in multidisciplinary ways around new kinds of data so so while there are um enormous opportunities to do things around large language models most of the
Projects that we’re looking at so far are not necessarily large language like most of them are not around LLMs so long-winded answer but there’s lots to do thanks there are two brilliant questions online but it would take 14 minutes to answer them I think rather
Than 14 seconds so Gina thank you very much indeed thank you we’re now going virtual for our final speaker of the evening uh hopefully we’re about to be joined by Ben goldacre Ben can you hear me hey hello yes hi excellent over to you whenever you’re
Ready hey great thanks and sorry to not be there um as our data infrastructure gets better I think our Railway infrastructure is deteriorating um I’m going to tell you just briefly about OpenSAFELY which is a very large uh trusted research environment that we built during the Covid pandemic and what
I hope to show you in the next eight minutes is not just that we built it but also a little bit about how we built it um so first up the general practice data that we have in this country is an extraordinary opportunity it’s got breadth and depth one record for every
Citizen and detailed information on all GP contacts so it can be used to do amazing research of course that’s relevant to a global population because of our ethnic diversity but also huge opportunity to Monitor and improve NHS care but it also presents huge problems privacy and transparency you can’t just
Give people download access to this data especially um uh well in particular not after just removing names and addresses which was the historic approach to protecting privacy but it also poses huge challenges around usability these are very large data sets that are hard to use it’s also worth noting this is a
Contentious space so there’s been lots of prior investment but there’s still no National Data access or at least until we came along there’s also been widespread Civic concern about previous efforts so when the NHS proposed extracting all the GP records into one big computer 3 million patients opted
Out of GP data for research and there was a catastrophic loss of public trust and also there’s been kind words about sharing code but that’s very rare in practice so in OpenSAFELY we developed a new way of working the data stays put we install our open source tools in the
Data center to where the data already resides secondly as I’ll show you in a moment the researchers don’t interact directly with real data they work on randomly generated dummy data and then lastly all the platform code and the analysis code is shared in real time so
What we have as a consequence is access to an unprecedented volume of data the entire nation the whole of England’s GP records available linked to other data sets with complete trust and support from the professions privacy campaigners citizens juries and so on and very high productivity so 63
Papers published after about a 10 million investment to date which I think is pretty good value in the scale of these platforms and 155 projects from 22 organizations so this is a a general purpose platform for users across the community so first up we do hands-off data analysis normally a researcher sits
On a machine and writes their code for programs to uh manage the data change its shape And format and then turn it into graphs and statistical models they usually do that by working directly with the data in OpenSAFELY we don’t let you do that we give you randomly generated
Dummy data the researcher uses that then when their code is ready to run it’s tested by the system against their dummy data and then it gets sent off into the real environment that contains the real data which no researcher ever gets to enter and then the machine comes back to
Them with an output folder full of their log files their graphs their insights and so on so when we started doing this the first push back we got from the community was how do we know you’re really working this way so we put all of the code for the entire platform up on
The internet free for scientific review Security review and efficient reuse next up people said well maybe we believe you about the Privacy protections but we still can’t see what people are doing with the data so we built a live realtime public log and if you go to jobs.opensafely.org you can see
Everything that’s being run against 58 million patients’ records and when I say everything if you click through you will get down to individual GitHub repositories containing all of the code that’s being run against those patient records just for the context GP practices Remain the data controller for the most fine-grained
Data and NHS England is the data controller for the service and this is the NHS England OpenSAFELY service so by doing this we’ve earned unprecedented trust from all the people who previously objected to access at this scale we’ve got strong formal letters of support from RCGP BMA medConfidential strong
Formal support from Joint GP IT citizens juries and so on and because of that when the Covid pandemic came to an end OpenSAFELY data access was not Switched Off and we’ve been able to continue running the service and we’re shortly going to expand it but we didn’t just
Want to be secure we also wanted to be productive so part of the way that we work is we had to do things like standardize all of the data management tooling automate things so that you could be sure that people who had written code on their own machine at
Home could be guaranteed that it would run at the other end so for a window into this GP electronic health record data is not made for researchers as Joe said at the beginning it’s made for um clinicians and patients it’s an aide-mémoire to help you remember what’s
Happened to the patient previously and then you want to turn it into something that looks a bit more like an analysis ready data set um in the table at the bottom right so to achieve this instead of having the previous model of effectively um closed Anarchy in data management we built standardized data
Preparation tools and a domain specific language electronic health record query language where you write your code once and it will run anywhere that the open safety tools have been built so that allowed us to have Federated analytics where we can have the same code running in multiple different data centers and
There are lots of other advantages to having standardized data management tooling for example it’s very legible every new us new user can read and understand every prior user’s code automating things makes it fast you can update your analyses quickly and you also get Federated analytics that works
Nicely out of the box because a lot of the people in our team are software developers and people who come from a commercial environment before they committed to public service and and open safely we’ve also got a really strong sense of delivering a service so a
Couple of examples we have a five person days of uh full-time user support up front we give uh we give people an experienced user who works along inside them over the first six weeks of them working on the platform there’s comprehensive technical documentation there’s about 70,000 words of user manual online at docs.
Work on this kind of data but who don’t have skills on um things like GitHub Python R and Stata so last up we impose open methods on the community and now that was critical because everybody says they love open methods but policy doesn’t necessarily lead to better practice so the way we
Built OpenSAFELY you write your code you put it on GitHub you tell us where the code is on GitHub and we run it or rather the machine runs it automatically against patient records so all code on OpenSAFELY has to be on GitHub before
It even runs and all code run on OpenSAFELY goes in the open for everyone to see and reuse and that brings all the benefits of open code that I hope everyone recognizes and agrees with which are it’s open for Quality checks it’s open for reuse it’s good for
Accountability it blocks p-hacking you could try and analyze the data 100 ways until you got the answer you want but everybody would know that you’ve done it and also it’s really good for public proof of delivery on a platform like this so tons of outputs in lots and
Lots of different Covid topic areas here are a few examples and most excitingly of all this is the announcement from Department of Health NHS England and ourselves that we’ve got stable funding for the NHS England OpenSAFELY service and that we’re going to be expanding out to do work on non-Covid platforms alongside
That we’re also um uh currently under review with ESRC to expand it to retail data where the interesting opportunity is that we can prove to retailers that the data they’ve shared has only been used for the purposes that have been agreed we’ve just got funded a project
To do OpenSAFELY as a layer on top of Education data with National Institute of teaching we’re doing some work with various Partners on using the tools in a network to Federate different data centers together and we’re also building non-OpenSAFELY infrastructure because we don’t think OpenSAFELY is the only way
To do things um so if anybody’s Keen then please do get in touch and sorry not to be there in 3D thank you very much Ben can you see my big two-dimensional face now we can we can see you your face is on um just a reminder if you’re online it’s bid.
Llid db48 to ask questions uh I’m going to come to the room first that was about 40 slides worth in 8 minutes there’s plenty to dig into and we’ve got a question there thanks so much bit bit like a masterclass that one um what’s the next step surely National Data intermediary organization able to
Scale up to you know all national data sets to bring together in the model as transparent and using sort of your your PETs bingo list of federation synthetic and TREs in various ways what is the next step because surely this is the next step an evolution to societal data
Sharing and ecosystems that will just enhance and allow for AI generation but in a transparent manner so the thing that we’re really keen to do next is twofold really and first of all as I said expand the network of data centers that have OpenSAFELY tools built in them and then
Secondly I think our model of standardized data preparation and then um uh that allows you to send code out for remote execution OpenSAFELY I think is probably the only sort of large thing doing that at scale in the UK I think um we would like to see others who are able
To adopt that way of working collaborating with us and in a wider network to get that kind of interoperability um I think the challenges here are not particularly technical once you’ve got good ideas and good teams that can build it there’s a bit of a um workforce shortfall but the real challenges are
Small-p political I guess it’s um getting funding in place and also it’s it is a contentious space you know it’s a space with lots and lots of organizations competing uh often with a long history um and often with a mixed history of delivery um and you know it’s
A challenge with fast moving spaces to to push through that sometimes thanks I’m going to take two online questions together both about uh the opt out uh Anonymous asks so those 3 million are no longer opted out question mark um I think they’re they’re just wondering does it mean that they’re they’re now
Back in the system um while Tom King asks what prospect is there of persuading people to stop opting out and change the minds of some of the millions who already have done so I think the opt-outs are a really difficult legacy of some not very
Sensible ways of working um I think it was a mistake to go to the public and to say that we were going to extract everyone’s records into one big machine and even worse than that to disseminate that out to multiple different locations um and I think it was a mistake that we
Gave false reassurance about the benefits of pseudonymization um as a way of protecting people’s privacy so we are where we are I think with opt outs we have to make sure that we don’t we have to make sure that we stop making new mistakes around privacy management and infrastructure to stop the problem
Getting any worse um I don’t think the time is right now to start revisiting um opt outs but I think when we’ve got a really good coherent data infrastructure for sensitive data and that could be health and could be other data sets as well then I think that’s the right time
To go back to the public um in terms of the NHS England OpenSAFELY service which is the service on 58 million patients’ records that uses the OpenSAFELY tools during the pandemic uh we were able to run code legally across the whole population data um now under our
Um permanent direction and under the new direction for non-OpenSAFELY for for non um COVID research uh type 1 opt-outs will be upheld um and I hope that over time we can justify to the public that there are different ways of accessing data and that for the most secure
Hopefully people will give um give their consent or assent to that data being used great thank you uh let’s come back into the room we’ve got a question down the front here um can you speak a little bit more about um the dummy data that you
Mentioned um I think I’m I’m from ADR UK and we’re we’re thinking about synthetic data we we’re using some of it and I I’m guessing that’s what you mean by that um and I know that there are different levels of fidelity and that I believe people who are creating synthetic data
Are still s we we’re still figuring out where the line is drawn um so if you could speak a little bit more about the dummy data that you mentioned and please don’t make your answer too technical because I’m not a data scientist so I wouldn’t understand thank you so so um
We use dummy data synthetic data in a very different way to most groups usually people take real data and then they add noise to that data and the ambition that they have is that they will add enough noise that they can protect people’s privacy but not so much
Noise that they destroy the true statistical signals in the data we think that probably doesn’t work and that’s because we’ve worked with it in various different ways in different settings before OpenSAFELY so we use synthetic data very differently we have completely randomly generated dummy data and the
User the analyst never uses that to actually run their real code they only use that randomly generated dummy data as a kind of test environment so they use that to write their code but when their code is ready to run against real data they press a button it gets wrapped up
In a container using something called Docker but a very highly refined and standardized version of that that we’ve built and then that gets sent off to run against the real data but researchers never use synthetic data to try and do their real analysis great thank you um I’m going to
Go online for the next question this is another one from Steve Black I think we can still hear you Ben although you seem to have frozen image-wise for us uh Steve asks um government seems keen to invest in analysis tools like like Palantir but do they invest enough in the
Underlying infrastructure and data quality um look I think it’s the biggest shortcoming in this space today I think um traditional research funders have really you know pretty good at picking winners when it comes to single research projects epidemiologists and so on but I think we haven’t been investing in
Innovation and tools and services around uh data infrastructure and that is a different set of skills that we’ve optimized for so I think um it’s natural when people feel that they haven’t got um other options that they reach for traditional approaches like a one big techbook procurement um I think it falls
To the whole community to prove out different ways of working um I suppose the the one thing that I really hope for the future is that we can get away from the idea that to link this data together you have to to make a giant data lake
Where you put all the data on one machine I think what we’ve been able to prove with OpenSAFELY is not just that you can do a remote analysis where users don’t have to tinker directly with the data but you can also federate data centers so that you do just-in-time
Linkage you take only what you need from each data center to the other data center in order to run your analysis I think I think the the era of data lakes is hopefully coming to an end but that does require that that funders and the community engage with teams of
Innovators not not just us who are pushing out in this direction thanks I think we’ve got time for one more quick question from the room straight up with a hand there hi thanks Paul Aon um I’m just interested how are you going to persuade the public uh about the benefits of sort
Of open source I was at um socon 24 a couple of weekends ago and one of the biggest problems they were having is going trying to explain to the public the benefits of open source um I don’t know if it’s necessary for us to directly explain that to the
Public I think explaining it to commissioners and payers is really critical um I think one of the things that I’m really struck by with always working in the open sharing our code as the best best way to drive technical collaboration not particularly as a moral good the thing I’m struck by is
That just drives delivery I think teams that are still in the old habits of not sharing their code uh just simply don’t deliver as well so you know when I look at the fact that OpenSAFELY has delivered 63 completed published papers for 10 million pounds in the space of just a
Couple of years which is very favorable productivity in comparison with other um platforms that work in more traditional ways um what I see there is just the power of of open it’s every new user gets to benefit from every previous user’s code and that is a beautiful
Beautiful thing to observe that if you can create the right structures for people to work in then it’s a bit like catching rainfall in a barrel instead of it just draining away into a ditch at the end of the road it means that you’ve you’ve created this resource every
Single useful technical act by every single user in the platform when it’s shared in a structured standard way is available for review and forking and reuse by every subsequent user so you know I I share I think your your uh your urge to evangelize for open methods but
I think actually what you can also do is just use open ways of working and prove that that’s what ships and in so far as this is a rational Market which I’m not entirely convinced it is but in so far as this is a rational market then the
The more you deliver the more people will recognize that that’s the right way to go well Ben thank you very much for rounding off our evening perfectly thank you for joining us thanks I say that I do have a few quick parish notices before we go uh we will
Aim to have video and audio of this event on the IfG website within 24 hours you can already watch it back as live on Slido or YouTube um the next IfG event there are plenty coming up uh will be on Tuesday the 5th of March it’s all about
Mission-driven government what does it really mean uh we’ve got the CEO of Nesta and Georgia Gould the leader of Camden uh on that panel also events coming up on fixing the center of government uh asylum policy and the role of think tanks in the general election
There’ll also be lots of budget coverage and much else besides on the IfG website Data Bites returns on Monday the 18th of March we’ve already got speakers confirmed from the ICO and from National Highways so that’s bit. L yifg databytes 49 uh if you’d like to
Sign up to that if you’d like to sponsor a future one or speak at a future one uh come and speak to me or pesh afterwards all that’s left for me to say before I release those of you in the in the room out to the reception on the landing uh
Three thank yous first of all to you our audience uh here in the room and online some brilliant questions tonight thank you for coming along uh second a huge thank you to Smart Data Research UK for sponsoring tonight’s event and bringing us some brilliant speakers because my
Third big thank you and please do join me in some applause in thanks to our wonderful speakers this evening thank you very much indeed