Data (Re)Makes the World Conference, March 31st & April 1st, 2023
Information Society Project, Yale Law School
https://law.yale.edu/isp/events/data-remakes-world
Panel 2: Trusting Sources
Matthew Jones, Columbia University, panel chair
Gabriel Grill, University of Michigan, “Constructing Certainty in Machine Learning: On the performativity of testing and its hold on the Future”
Kadija Ferryman & Odia Kane, Johns Hopkins, “Identifying and Interrogating Algorithmic Accounts in Medical AI”
Kushang Mishra, IIIT Bangalore, Bidisha Chaudhuri, IIIT Bangalore, “Data-driven ‘precision’ vs Farmer’s guesswork: How Data is (Re)Making Agriculture in India”
Alexander Campolo (Durham U) & Katia Schwerzmann (Ruhr-Universität Bochum), “From ‘Is’ to ‘Ought’: Data as Example in Machine Learning”
[Gabriel Grill] [...] of high accuracy and certainty are constructed by investigating testing in machine learning. Tests are one important way in which the adoption of algorithms is justified. There has been an avalanche of grandiose claims around the accuracy of algorithms in AI. For example, a team at IBM claimed to be able to
predict with 95% accuracy which workers will quit in the future. A new research project claimed to be able to detect lies in border control with 75% accuracy, and researchers claimed to be able to predict sexual orientation from face images with 91% accuracy. I aim to unpack the situation and continue the
project of a sociology of testing in the realm of machine learning. With the current hype around generative AI like ChatGPT, which is claimed to be a sort of universal model due to its complexity, inscrutability, and all-encompassing data, testing to show capabilities and accuracy is again receiving a lot of attention–
The clicker seems to be not working. Thanks. I will just say “next slide,” okay? Yeah, that’s the correct one. So… Okay. No, the next slide, please. Sorry. Yeah. Okay. Accuracy in machine learning, simply put, refers to a metric that quantifies a correlation between the results of an algorithm and test data.
This definition highlights how accuracy is not some absolute universal but instead depends on the chosen test data and perspective. Naming a metric “accuracy” can already be considered an act of power, as it suggests that situated functionality can be expressed with a single number.
This confusing naming in machine learning has led to false descriptions of certainty. For example, the Air Force reportedly developed a missile recognition system that, after initial tests, was believed to have an accuracy of 90%, but it had been tested only with images that contained one missile.
Another test later, with pictures of multiple missiles, revealed a much lower accuracy of 25%. Major General Daniel Simpson described the system as being “confidently wrong.” In recent years, researchers have highlighted how such accuracy numbers can misinform about functionality and hide problematic effects, even arguing that ML is experiencing a reproducibility crisis.
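A minimal sketch of the point just made, that a reported accuracy depends on which test data is chosen. Everything here is hypothetical and for illustration only: `toy_detector` stands in for a model that can report at most one object per image, and the two test sets are invented, not the data from the system described in the talk.

```python
def accuracy(predictions, labels):
    """Share of test cases where the prediction equals the test label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def toy_detector(true_count):
    """Hypothetical stand-in for a model: it never reports more than one object."""
    return min(true_count, 1)


# Invented test sets; each entry is the true number of objects in one image.
single_object_test = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]  # at most one object per image
multi_object_test = [2, 3, 1, 4]                      # mostly several objects per image

acc_single = accuracy([toy_detector(c) for c in single_object_test], single_object_test)
acc_multi = accuracy([toy_detector(c) for c in multi_object_test], multi_object_test)

print(f"accuracy on single-object test set: {acc_single:.2f}")
print(f"accuracy on multi-object test set:  {acc_multi:.2f}")
```

The same detector scores perfectly on the first test set and poorly on the second, so the single number says as much about the test set as about the system.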
Their work has challenged this avalanche of grandiose claims and deconstructed assumed universal accuracy. Next slide. Yet accuracy metrics still remain important. In part, this is due to how they are needed for developing machine learning algorithms, as they give direction and provide a sense of the capabilities of otherwise opaque, highly complex algorithms.
On the one hand, they are meant to be scientific, formal, and standardized quantifications of quality and progress; on the other hand, they are also performative, normative, and rhetorical numbers circulated to convince others that an algorithm works. The tests influence discourse around accuracy, as they are tied to a promise of mechanical objectivity.
I argue this duality marks a conflict of interest when those who conduct a test also benefit from favorable results. In the paper, I also discuss in more detail how the circulation of seemingly ever-increasing accuracy numbers and ever-bigger data sets within the field and industry
enabled the performance of continuous scientific progress worthy of investment and attention. Next slide. I draw on ignorance studies as a framework to theorize these current issues around unreliability, conflicts of interest, and politics in testing, illustrating how the construction of high accuracy claims also entails the production of ignorance.
I argue that the often opaque flexibility in testing and the concealment of human judgment enable the construction of accuracy claims. The ignorance thereby produced is not necessarily problematic, but it is always political, imbued with power, and productive, as its circulation entails world-making and can engender epistemic violence.
Various prior works have highlighted how high accuracy claims have been used to justify and objectify systems of oppression. Next slide, please. I also understand this ignorance as strategic, since the current dominant paradigm in ML incentivizes reporting ever higher accuracy to stay relevant and convince others.
This tendency to overpromise has been identified as part of the reproducibility crisis in other fields and has also been critiqued within the tech industry. The produced ignorance should in turn not be understood as a mere bug, a mistake, or a result of pitfalls, but as a feature from which some actors benefit.
The current incentives encourage actors to not conduct, and even to impede, investigations into harms and failures in order to avoid controversy and liabilities while claiming innocence. For example, Meta reportedly asked employees to avoid terms such as “discrimination” when talking about algorithms to avoid liability. Various companies also removed employees who highlighted risks of key technologies.
These strategies are reminiscent of tactics used by controversial companies with a high concentration of market power, such as big oil or big tobacco. In the paper, I highlight several of these tactics and how big tech companies have employed them recently. For example, ShotSpotter advertised its system for its accuracy,
but experts highlighted that, to achieve such accuracy numbers, the company must have excluded test cases for which it was unclear, after police arrived, whether a shot had been fired. Chicago’s Inspector General even noted that physical evidence of a gunshot was found in only 9% of all ShotSpotter alerts,
which only further suggests that ShotSpotter is likely misrepresenting the capabilities of its system by choosing what data to consider when calculating accuracy. Next slide. I will now explain several ways in which high accuracy claims in machine learning are constructed in testing by producing ignorance. I will illustrate this by unpacking testing for emotion recognition
algorithms trained on pictures with a few emotion categories. In the paper, I also mention several additional ways of producing ignorance in testing. Producing ignorance is unavoidable in testing, as it’s not possible to test for every eventuality, so a central question of testing is how priorities and perspectives are considered.
Next slide. Yeah. In order to enable high accuracy, the underlying data needs to comprise many stable and recognizable patterns. One central job of engineers is to scope problem spaces and data so they encompass predictable phenomena while excluding those that are unpredictable, messy, or resistant to measurement.
This practice of exclusion and simplification is essential for enabling high accuracy in algorithms and making such systems useful, but it is also a political practice. There are several accepted ways in which practitioners justify such scoping, for instance by excluding certain examples as outliers.
Such exclusions can also be made intentionally in obfuscated ways, but often they just go unnoticed, for instance because a certain way of framing a problem is hegemonic. In the case of emotion recognition, the appearance of high accuracy is made possible by
a focus on a few widely recognizable, stable emotion categories and corresponding standard facial expressions. Yet, as previous work has highlighted, facial expressions don’t necessarily correspond to actual inner emotions, and experiences of emotion cannot be fully captured by decontextualized categories. The high
accuracy is thereby enabled by the production of ignorance of the messiness of emotions. Next slide. The predictability that enables calculated high accuracy is also not just out there but made. For example, rules, culture, and material constraints can stabilize patterns that algorithms recognize as correlations. Algorithms, when deployed, can also co-produce predictability
by intervening in the world and influencing different actors. For example, this means emotional expressions are somewhat predictable also because they are learned as part of membership in a culture. Testing produces ignorance by not revealing how emotions and cultures could be otherwise. Next slide.
It is usually not possible to measure constructs directly, so proxies are used instead. Whether a proxy is accepted depends on whether it is seen as similar enough to a construct. For example, a consistent classification scheme mapping facial expressions to several widely recognized emotions can be created with high accuracy.
So researchers and technologists advocating for emotion recognition can, by convincing others that a proxy is equal to the construct, create the appearance of highly accurate emotion recognition. In the paper, I describe several rhetorical moves used to do this, like invoking the folk belief that emotions can be read from faces
and pointing to microexpressions visible only to algorithms and therefore difficult to challenge. Next slide. The optimization logic in machine learning entails that during training, majority perspectives are learned, since they maximize overall accuracy. In turn, minoritized perspectives or test cases have only little impact
on accuracy numbers and are often neglected in favor of majorities. This is an intended behavior to make algorithms work for majorities, while less visible minoritized perspectives can be ignored. For example, this means that minoritized expressions of emotion in test data are ignored while overall accuracy is seemingly not much affected,
because they comprise only a few test cases. Next slide, please. In the paper, I discuss several recommendations for how to deal with issues in testing. I argue practitioners should focus on careful naming, for example by renaming the accuracy metric “test correlation” to highlight that it does not represent some universal notion of accuracy
but actually a correlation based on a constructed test. I argue for developing different conceptions of accuracy that are more participatory, justice-focused, power-challenging, humble, and seamful. This would require embracing feminist sensibilities such as situatedness and local knowledges, and also encouraging more quality and deeper data and research.
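The earlier remark that minoritized test cases have little impact on overall accuracy can be made concrete with a small sketch. The group names, emotion labels, and the 95/5 split are assumptions invented for illustration; they are not taken from the paper or from any real data set.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return (overall accuracy, per-group accuracy) for (group, prediction, label) records."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, prediction, label in records:
        totals[group] += 1
        correct[group] += int(prediction == label)
    overall = sum(correct.values()) / sum(totals.values())
    per_group = {group: correct[group] / totals[group] for group in totals}
    return overall, per_group


# Hypothetical test set: 95 majority cases classified correctly,
# 5 minoritized cases all classified incorrectly.
records = (
    [("majority", "happy", "happy")] * 95
    + [("minoritized", "happy", "sad")] * 5
)

overall, per_group = accuracy_by_group(records)
print(f"overall accuracy: {overall:.2f}")
for group, value in per_group.items():
    print(f"{group} accuracy: {value:.2f}")
```

In this invented case the headline number stays high even though every minoritized test case is misclassified, which is why reporting only one aggregate figure can hide such failures.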
I also discuss how optimization algorithms often produce singular results, thereby reproducing one perspective, and argue instead that plurality and activity should be explored more, similar to simulation. Finally, I also argue for more social studies of accuracy that seek to unpack what understandings of accuracy are held
by whom, how they are co-produced and stabilized, and what their politics are. Finally, I want to end this talk with a short reflection on current regulatory trajectories. The EU AI Act, for example, proposes a technocratic agency that tests algorithms and AI for safety,
but this problematically depoliticizes testing and puts it in the hands of agencies already accused of being captured by the interests of big tech companies. I think more testing can improve the current situation, but it also comes with various challenges, like: who decides what testing is needed and when it is enough?
How can misrepresentation be recognized? How can testing be done to enable more democratic oversight? Beyond these approaches, which situate problems within the technology and its uses, more structural changes are also important, as it’s unlikely that the market or the scientific process by itself will fix these issues. Current incentive structures and dependencies make
it difficult for practitioners to not participate in the hype. Speaking up can result in stigmatization and exclusion, as people are framed as naysayers. The high overlap and interconnection of industry and academia make machine learning unique in contrast to other areas that face similar issues around corporate pressure, like environmental science.
In turn, it is important to create more opportunities for independent multidisciplinary work by introducing taxation for tech companies, improving labor standards, and supporting whistleblowers who point to these issues. Next slide, yeah. Next slide, please. Yeah, okay. Yeah. Thanks for your time. A preprint is available at this URL and, yeah, there’s currently
a labor strike of graduate student instructors going on at the University of Michigan, and I want to express my solidarity with that. Thank you for your time.
[Matthew Jones] Okay. Thank you very much for that. Next up we’re really lucky to have Kadija Ferryman and Odia Kane, from Hopkins,
talking to us in a really amazing paper on interrogating algorithmic accounts in medical AI.
[Kadija Ferryman] Great. Thank you so much. Can you all hear me? Okay, and I just want to check that the– Yes, it works. Okay, great.
So we want to thank the organizers for having us today. We’re really excited to be here, and we just want to say that being here has special resonance for both Odia and myself. Odia grew up in New Haven, so this is, you know,
special for her to be here. I also went to Yale as an undergrad. I’ve been back since graduation, but this is my first time presenting here as a scholar
rather than a student. I also worked here at the Yale Law School as an undergraduate; I was a circulation assistant at the Law Library, so it’s nice to be here in this capacity today. One thing we don’t have in our slides, but that I just wanted to make
a note of, is to acknowledge the lands of the various indigenous tribes of Connecticut. There are a number, including the Mohegan, Pequot, Niantic, and Quinnipiac, and again, for me, as a student, these were names that I had seen, but I did not really know the
history, so it’s really important, I think, to make those acknowledgements before we begin. Okay, so just a couple of disclosures before we start. I serve on the National Review Board for the All of Us Research Program,
and I’m also a member of the Digital Ethics Advisory Panel for Merck Germany, and Odia has no disclosures. All right, so the focus of our research is health equity in medical
AI, with a specific focus on medical device regulation by the Food and Drug Administration. We’ll talk a little bit more about that, but, if you didn’t know, the Food and Drug Administration is the federal agency that’s responsible for regulating
software that’s used in medicine, including medical AI. Today, some of the threads of that work that we want to bring together are a focus on thinking about health equity alongside algorithmic bias
and discrimination in medical AI, and we hope that we’ll bring those threads together in the presentation and that they’ll come out in the discussion. So, just a little overview of the research study that we conducted and that our paper draws on: it’s
the advancing health equity and AI and machine learning health regulation and policy study. This was a 10-month exploratory study funded by the Pew Charitable Trusts. Thank you, Pew. It included a couple of aims: we did a content analysis of publicly available FDA clearance and approval documents for medical AI;
we also did a national landscape analysis of organizations working at the intersection of health equity and AI; and we also did qualitative interviews with key stakeholders, again at this intersection of health equity and AI policy, such as national minority health organizations, national medical organizations,
device manufacturers, and scholars who research at this intersection. So before we delve in, I just want to give a brief overview of the way that the FDA has regulated medical AI. This is not the full history,
right, these are some key moments. One place to start, even though we can see threads of this earlier, is 2014, when the International Medical Device Regulators Forum issued
guidance on how to think about software as a medical device. As you can imagine, and as many people in this room know, software has been around for decades, and there was this move in 2014 to make a distinction
between software that was part of medical devices, right, software that was used in hardware devices, and software that was independently acting as a medical device. So in 2014, there was this explanation by this international body to say this is how
we define independent software that’s acting as a medical device, not software in a medical device. Then, in 2017, the FDA adopted this international definition as their own and also announced this pre-certification program.
This was one major step in thinking about how to regulate these independent software medical devices, and it was essentially an experiment in regulating the companies rather than the devices themselves. This was a pretty significant divergence from the way the FDA
had been regulating medical devices before: moving from regulating the devices to regulating the companies. And then, in the last couple of years, the agency has released several policy guidance
documents. In relation to what we heard just now, there were some cases very recently: researchers at the University of Michigan, actually, showed that an algorithm that had been approved and had a certain level of accuracy, once it was
actually being implemented clinically, saw its accuracy level go down. And so, just recently, in September of 2022, the agency issued updated guidance on clinical decision support software and how to think about it. Okay. So now I’m going to turn it over to Odia.
[Odia Kane] Thank you.
So, just to level-set further in terms of talking about policy at the FDA’s level, it’s important to consider what the FDA does not regulate, for the context of this conversation. The first is this concept of “homegrown” AI/ML devices; these would be tools that are developed
within a hospital, within a clinical setting, to do a set of tasks, whatever those tasks may be, but they are insulated to that institution or set of institutions within a partnership. The second would be software technologies that have an administrative function, so these would be algorithms that are
used for putting calendars together, for example, or for something else, like speech-to-text. And then the third would be software technologies that provide lists of recommendations to clinicians but don’t diagnose in any particular way.
So this would be a patient who comes in with flu-like symptoms, and then there would be an output that might suggest what kinds of treatment options would be best for that patient. So, with that in the background, we’re talking about a specific set of medical AI that tend to have more
sophisticated, more complicated, and often riskier implications when they’re used in healthcare. So I just want to take some time to focus a little bit on our Aim 1, which was a content analysis of these FDA summaries. To do this, we used a medical AI database that had
one hundred forty-one FDA-approved devices recorded in it. The focus of our research, as Kadija mentioned earlier, is on health equity, so we wanted to focus on the devices that collected and reported demographic information on the people in their sample and, of the one hundred forty-one devices,
we saw initially that only sixteen of them even collected and reported demographic information. So we don’t know the demographic makeup per se of these sixteen different devices, but we do know that they’ve reported on it, while the other almost one hundred thirty did not. And within that you’ll see that there’s huge
variability in the figure at the top (my left, your right), where you can see, broken down by demographic type, what was included. As you can see here, gender data is most consistently included across different devices, though not universally, and race data is far less
frequently found compared to data related to age. So we saw a ton of variability in the demographic information that was collected, but also in the way that the performance testing was conducted. Some of them had multi-site phasing, some had different phases where they would go and have a
more iterative process for testing the performance of their algorithms, others had simulated users, some used actual patients and participants to do their research to see how well their algorithms were working, and those weren’t consistent between any two devices.
So then, moving on to the main points of the paper that we submitted to the conference: we talk about these three key concepts that we want to break down, and apologies if this is repetitive based on your background. This is for audience members who might have diverse introductions to computer science,
and Kadija and I are not computer scientists, so we were introduced to these concepts through this research. So the first is ground truth, which was referred to a little bit in the past panel as well, which is the baseline or reference to which results from experiments and tests are compared.
So this is: can your algorithm do what you say it’s going to do? And, as I just mentioned, there’s a lot of variability in what that even means and where this information is coming from, and we’ll briefly give an example of a ground truth. The second is epistemic authority.
So, as applied to artificial intelligence, these are the ways in which AI tools are established as bearers of knowledge in a particular domain. Many times, the AI that we focus on in this paper has a specific task
that it’s trying to complete, so how do we know that we can trust this output and give the AI the authority to actually be entrusted with its conclusions? And then this third notion of accountability, from Neyland, a 2016 paper that was cited,
with accountability having these dual meanings: the first is being open to review, and the second is being able to give an account or an explanation of something, and applying these standards to how we view medical AI. Passing it back over to Kadija.
[Kadija Ferryman] Thanks.
So, again, just to reframe our paper’s points: what we’re trying to think about analytically is the way that ground truth is represented in these review documents submitted to the FDA, right, and in all of the documents that we reviewed there is a performance testing section, right.
So this is one of the things that medical AI developers have to do in order to have their device approved or cleared by the FDA: they have to show evidence of some kind of performance testing, right. They have to show that their medical AI does what it says it does.
And, in that section on performance testing, there is a description of the ground truth that was used to test the performance of that medical AI. This is where we are really trying to focus our argument for this particular paper and our work going forward:
how that ground truth establishes the epistemic authority of the medical AI within this context of FDA approval or clearance, right, as well as how that ground truth, in addition to establishing the epistemic authority, also provides an account of how the algorithm is seeing the world.
So, just very quickly, the one example that we focus on in the paper is a triage medical device. It reads images, alerts clinicians about an abnormality in a set of images, and triages those images for clinicians to review.
When we looked at the ground truth and the performance testing for that device, the ground truth was established by three radiologists, right, and the accuracy of the algorithm was based on the agreement of two out of those three radiologists. So we just wanted to complicate and
bring up that notion of who is establishing the ground truth, right, and what does that say about expertise and the value of expertise in medical AI? And the other issue as well is that this was just one
example of a kind of ground truth. And then, very quickly, two of the points that we came away with are that medical AI developers establish ground truth in different ways, that it’s critical to establishing the epistemic authority, and that this
issue of the variability of ground truth is something we’re thinking about as both an opportunity and a challenge for the FDA’s governance in this area, right. So we think that aspects of ground truthing in this domain should be more
transparent and subject to review, and that there should be an examination of the standards of the reference databases and things like that that are used for the literature. And then we just have a couple of questions, but I think we’re
out of time, and they’re in the paper as well, so I think we’ll just end there. So thanks.
[Matthew Jones] Okay, I believe our next paper is on Zoom, is that– good. So the next paper, while we’re getting it up, is from Kushang Mishra
of IIIT Bangalore and Bidisha Chaudhuri, also of IIIT Bangalore, and the subject is “Data-driven ‘precision’ vs Farmer’s guesswork: How Data is (Re)Making Agriculture in India.”
[Kushang Mishra] Hello, can you hear me?
[Matthew Jones] Yes. Welcome.
[Kushang Mishra] Yeah. Thank you so much for the introduction, Matthew.
So, my name is Kushang Mishra. I’m a research associate at IIIT Bangalore, and I wrote this paper as part of the Humanizing Automation Project at IIIT Bangalore along with my supervisor, Professor Bidisha Chaudhuri. In this project, we are essentially
looking at the impact of automation, AI, and machine learning, specifically in the agriculture sector. So, I will start by giving a brief background of this paper in the introduction, then I will present the state of the art of the current literature on this subject,
and then I will talk about the methods we used, followed by our findings and, finally, analysis and conclusion. As you can see in the slide, there is a sorting and grading machine over here which uses AI and machine learning to sort and grade onions into various qualities.
So it is going up the conveyor belt and it is being sorted into various categories over here. Yeah, so, in the next slide, as you can see, data-driven precision farming is essentially the introduction of AI and machine learning in farming, and it has
been adopted in the countries of the Global North, and it is being pushed by both the state and Silicon Valley companies and the private sector. Obviously, it is portrayed as a very revolutionary technology, especially in the context of climate change,
which, the narrative goes, is hampering our ability to grow food. The population is rising, the input prices for farmers are rising, and so we need data-driven precision: we need data from the soil, the climate, and the water to make precise, data-driven decisions,
and these are proposed in opposition to the traditional knowledge and skills of the farmers, which in this narrative are no longer considered relevant enough. As you can see in this image of how farming is envisioned,
the farmer is confused, but the future of farming is informed by data insights, and the farmer is now able to make better decisions because of data. So this is the kind of narrative which is being pushed by agri-tech
companies as well as by states in many of the Global North countries. Yeah, so, the existing literature on the impact of digital agriculture critiques this, essentially from STS, and it basically critiques this narrative of revolution, which devalues the farmers and puts them in the hands of the private companies.
The critique is that it doesn’t bring any systemic transformation but is merely a technological fix for highly complex social and economic problems in farming. There is also a critique of the datafication of the farm and how it hides the human infrastructure which ensures that the whole operation runs smoothly.
Yeah, a sec, yeah. Now the problem with the literature right now is that it is focused on the Global North. These technologies are now also coming to countries in the Global South like India, and these countries have their own colonial and postcolonial contexts.
Now, as you can see in this image, in colonial times the narrative pushed by the British was, essentially, that we need superior scientific knowledge over the ignorance and inertia of India’s agricultural classes,
and that science can increase the yield for the Empire to grow. A similar narrative was pushed post-independence, in postcolonial times, by the American state during the Green Revolution. Foundations like the Ford Foundation played a crucial role, and US
universities played a crucial role in setting up an infrastructure of agriculture in the country, which essentially pushed for scientific knowledge and technology over the local knowledge and skills of the farmers and the local agricultural practices. As you can see over here, this is Frank Schuman from the University of Illinois,
an agronomist who is essentially associated with this phrase, “Nitrogen Zindabad,” which means “Long Live Nitrogen,” and you can see what kind of narrative they were trying to push in that sense. So this is happening now, over the past 10 years, in a context
in which the agri-tech sector in India is booming. It already crossed a billion dollars in 2021, and even the Indian government is pushing for its promotion through projects like Agristack, which is essentially collecting and creating databases of farmers with the help of companies like Microsoft, and the government will give a unique
ID to each farmer, which will allow direct benefit transfers and help them in yield forecasting, and the idea is to increase farmers’ income through digital technologies like that. And so, in this context, the objective of this paper is firstly to question this narrative of preciseness, right,
that data-driven technologies are essentially precise vis-à-vis the quote-unquote “guesswork” of farmers, the traditional skills of farmers, and to show how this narrative is not new but was part of the colonial and postcolonial history of India. So we want to question this narrative by placing it in this historical context.
So I won’t go much into the methodology, but our aim was to look at both the context in which the technology is developed, so we interviewed people from various companies and Agri-tech startups, and we did a participant observation study
in the context in which the technology is deployed, which is the sorting and grading machine in a village in the state of Maharashtra, in India. We also interviewed several farmers to understand what they think of these technologies and how they’re using them.
So this was the narrative pushed by these Agri-tech companies, right, that essentially these decisions cannot be left to an element of guesswork; we need data-driven precision because the context in which this is happening is not conducive to the so-called guesswork decision-making by farmers.
And so one of the major findings from the study was that precision farming is neither precise, nor is precision even useful for those who use it. So basically, the irony is that the Agri-tech companies build their models on the feedback of the same farmers
whose skills are deemed “guesswork,” but the farmers still feel that their voice is not heard. As you can see, the idea is that the companies need the feedback to build these models, but even then the farmers feel that their voice is not being heard. Sometimes, farmers do not actually
find these machines, these sensors, to be precise enough for them. The farmers would rather listen to their fellow farmers, or to the rich dominant farmer who is successful in their own village, than use the predictions made by these sensors. As you can see,
these sensors don’t essentially tell these farmers what they actually want, and so, in fact, they rely on their own knowledge. They find their own knowledge to be more precise in certain contexts. For instance, in the northern part of the country there’s this festival of Chhath
Puja, during which grapes are essentially consumed. Now, while talking to grape farmers in Nashik, what I found was that in order to cater to the demand of this festival the farmers can’t follow the normal cycle of harvest and pruning,
because this festival comes in October and the normal season is in March and April. But the sensors installed in the farms are developed based on the normal season of harvesting and pruning, and so the recommendations which these sensors give are
not suitable if the farmers want to target the October cycle of grapes. And here you can see that while the farmers and the work that they do are portrayed as something which is not precise, they have a precise schedule of what kind of pesticide they want to use.
I saw that they maintain a diary in which they have a schedule of how much pesticide they will use on each day, so it is not that what they do is imprecise and mere “guesswork,” as the Agri-tech companies would like us to believe.
Another problem is that the Agri-tech company whose product we saw initially in the slide claims that accuracy is needed for targeting export markets. So they say that we need machines which can provide more accurate sorting and grading, so that we can sort better-quality onions for the export market,
and this is something which the human sorting and grading workers cannot do. You can see how women are sorting and grading onions over here. But if you ask the vendors to whom they sell these machines, the vendors
these companies are targeting, they don’t even want that accuracy because it will lead to more wastage: the machine will sort out onions which could potentially be sorted and graded for domestic markets. So they actually do not even want the kind of accuracy which the machines
can provide; they actually want a little less accuracy but certainty in terms of less wastage. So even here the so-called “accuracy” and precision are not even needed, even if they’re there. And so… Sorry, yeah. Finally, what I’d like to say is that the binary created here between
data-driven farming and the guesswork of these farmers devalues the local embedded knowledge which these farmers have, which these machines are not able to capture and work with. Even when the Agri-tech companies seek the feedback, they want it to better their
model and not necessarily to cater to the concerns of the farmers. So in the end we probably need to reimagine how we create these data-driven technologies. Historically, from colonial to postcolonial times, science and technology have been deemed superior to local knowledge and the local context.
Perhaps it is time that we change that approach. Thank you so much. [Matthew Jones] Okay, thank you very much. I really appreciated the illustrations that really brought home the onions and the farmers at the heart of the story. So our final paper comes from Alex Campolo, who’s at Durham,
and Katia Schwerzmann at Ruhr-Universität Bochum. From is to, oh no, “From Rules to Examples: Machine Learning’s Type of Authority.” Take it away, my friends. [Alexander Campolo] So, yeah, many thanks to everybody for including us in the conference, for coming, and also for reading the paper, so, yeah. I’m Alex Campolo,
I’m a postdoc on the Algorithmic Societies project at Durham, and I just want to acknowledge the European Research Council, which has supported our stuff. And I’ll just say our plan here is to rehearse just the basics of this argument
and, at the end, do a little bit of reflection on the problem of machine learning in relation to the type of social theory that we’re kind of aspiring to do. And, just for context, what you’ve read here is an article that’s under review at a
journal, and we’re sort of giving a response to our reviewers at this stage, but your feedback can potentially also affect the finished version. So our paper just began with seeing statements like this appear in the machine learning literature,
and we were really intrigued by the way that the community seems to have conceptualized it as, at least in this case, a very neat and symmetric reversal from programming with rules to training by examples. We began with a kind of hypothesis, or even just an intuition, that this
perceived change could shed light on deeper questions involving how concepts of rules, examples, algorithms, etc. have changed over time. These in turn, I think, opened onto a whole range of historical, sociological, and ethical issues.
The one we chose to pursue was to ask whether, and if so how, machine learning might entail a specific type of regulation of our conduct, governed by operations like classification and prediction, and whether there might be something different happening here than the kind
of widely known ideas like calculation, quantification, and rationalization that have been used in social theory to describe, you know, modernity at large. So this led us to look for theoretical frameworks that could help us make sense of this supposed transition from a kind of rules-based
to an examples-based form of knowledge, and there are tons of candidates. We took a lot of inspiration from Lorraine Daston’s recent book on rules, but as we proceeded we were increasingly drawn to a very old set of ideas from the sociologist Max Weber. In particular, his idea of a rational type of authority
gave us a good account of what connects rules and authority. And when we use the term authority in this paper, we use it in this kind of sociological sense, right: it refers not to the raw exercise of power or violence but rather to the specific reasons
that people follow commands or allow themselves to be governed in certain ways, the legitimacy of rules. With machine learning, this legitimacy could refer to why we accept being classified or interact with predictive systems in certain ways. So our understanding of programming rules and examples in machine learning
is comparative. We wanted to show how a certain understanding of rules emerged in connection with digital computing. We think this is kind of well known and established, but it’s important to rehearse because it gives us a concrete point from which to try to discern what might
be more novel in a type of authority based on examples enacted in machine learning. So of course, as Daston’s work shows, the idea of a rule covers a huge amount of historical diversity, and in the past the ideas of rules and examples very often worked in tandem, actually.
You know, examples illustrate rules. But for our purposes we focus on this set of Weberian ideas regarding rational rules because we think they formed the kernel of a type of authority that was intensified in programming logics, which Daston refers to as sort of
“thin algorithmic rules” that can later be separated analytically from examples. [Katia Schwerzmann] Foundational work in computer science in the middle of the 20th century linked the technical characteristics of digital programming with questions of rules and authority. This is, for instance, the case of Alan Turing’s account of programming rules in
his classic paper “Computing Machinery and Intelligence.” In the paper, Turing described computer rules and their functioning using the language of authority. The definition of programming rules he offers echoes many of the characteristics Weber attributes to rules. However, the technical features of the universal Turing machine, its discrete
Character and its infinite storage capacity induce important changes in the nature of rules. Rules become analytic. They are as numerous as necessary to break down complex behaviors into unambiguous steps. Their extreme specificity and the discontinuity between the computer states allow for a total control over the movement from input to output
and that is what leads to what we had in the earlier panel, this idea of determinism that is linked to the rule-based programming paradigm. Our second genealogy, this time the genealogy of examples, reconstructs the history of the concept of example. Traditionally,
an example is a concrete singularity which comprises the essential features of a type. So in a first, metaphysical or even theological sense, an example is a concrete individual that embodies a moral ideal like, for instance, Jesus or a saint; those are examples that Kant mentions. In a scientific sense,
an example is a specimen that expresses all the essential features of a species or a type, so that it is representative of the type, and so an example can be contrasted with something like single instances or tokens of a type. While rules tell or prescribe explicitly how something should be or how someone should behave,
examples show or reveal norms, leaving space for interpretation and implicit knowledge. However, this is not to say that examples have a lesser prescriptive force than rules. The difference is maybe that the subject is left with the difficult task of interpreting these norms and the relationship between examples and the norm.
As for the rational type of authority, machine learning’s, what we called “exemplary type of authority,” must appear legitimate to exert its influence, and the legitimacy of this new exemplary type of authority is based on what we call an “artificial naturalism,”
Ah, oh my God, I’m sorry, let’s roll back a little bit, okay? So, I’m coming now to the transformation induced by machine learning in the understanding of example. Machine learning examples are not singularities representative of types anymore. Instead, and it was not easy for us to define what examples become
in machine learning, we try to define them this way: they are complex assemblages by which data is aggregated, pluralized through scale, formatted, and processed through feature engineering and models, so that norms, also called representations, can emerge from them. The representations produced by machine learning become normative in a dual sense.
First, they affect how models will classify new instances by making generalization possible, and second, in a more traditional sense, these norms influence our own behaviors by making prediction and classification possible. So, to summarize, while examples in the traditional sense induce obedience through their perfection and their reference to a type, in machine learning,
examples induce obedience through the norms they inductively elicit in what we have come to define as a naturalistic way. And that’s what I’m coming back to now, the way machine learning’s exemplary type of authority legitimates itself: it legitimates itself through a kind of artificial naturalism.
The expression “artificial naturalism” can seem contradictory but it renders the messy tension that characterizes machine learning between a desire to let data speak for itself through models and the engineering practices that permit it to do so. And we think it’s a “naturalism” in the sense that first it presupposes
A world determined by deep statistical structures that are inaccessible to human perception and only accessible through representations produced by models. These representations should increasingly be learned from data itself rather than from human specified interventions. The motto being “scale is all you need.”
The goal is to discover regularities in the data that function as norms capable of accurately classifying new data in new contexts. So, unlike rules, which are transcendent to the order to which they apply, in machine learning
norms and examples are immanent to one another, so that they seem to be of one and the same kind. Thus, the representations only seem to express the regularities found in the data. [Alexander Campolo] Okay, and yeah. We’ll just end here by sort of reflecting a little bit on some issues that we were kind
of thinking about and struggling with while writing this paper. And, yeah, so first, it’s probably obvious but it’s worth saying explicitly that we’re kind of attempting to link developments in science and technology with concepts from social theory, and
we take very seriously that this is challenging to do in a kind of convincing way. We were thinking that one of the implications of our argument goes against the idea that we must build kind of analytical walls to distinguish technical and colloquial
or philosophical senses of notions like bias, training, examples, prediction, and classification. One implication of our argument commits us to a position that it’s in fact neither desirable nor, perhaps, even possible to separate these technical senses from the kind of sedimented ethical and political ramifications,
and that technical transformations add new layers to these. We hope that this kind of conceptual historical approach can navigate between a technological determinism and a more, you know, sort of idealistic position, although I think we have to be totally frank that we see
our work probably more on the side of the latter. A second point just concerns the scale or scope of the paper, which is big and kind of, like, epochal. This means that we had to be very selective, so we chose to analyze
a very few tendencies or phenomena that we hope can give a kind of analytical grip on our problems. But we also recognize that there are times when the paper goes into some technical detail, and that getting these details right doesn’t just matter to us for,
you know, our own integrity, but that they’re also related to the theoretical problems. And then just a final point concerns the question of, like, what kind of critique are we making? So I think it’s worth saying explicitly that we think there are probably some
Differences between our work and a lot of other very important existing scholarship on AI ethics where a kind of existing framework is used to identify and hopefully mitigate some concrete harms caused by machine learning. This work is super important but we just thought that it might be worth saying that
we see ourselves doing something a little bit different, which has to do with the way that machine learning makes possible different forms of ethical relationships, in the sense of different ways of conducting ourselves and producing or legitimating obedience.
And I think, you know, people could criticize us by saying that we’re kind of relativists, right, that in a certain case we suspend judgment on some of the statements from the machine learning community and instead work in a kind of more analytical mode to try to describe
possible contours and effects of this type of authority rather than advocate for or against it. So, yeah, we’ll just leave it there, and we’d sort of welcome your comments to hear what you think. Thanks. [Matthew Jones] Okay. Thank you very much.
And that last point is, I think, a great jumping-off point. So, our papers clearly... well, thank you to all four groups. And I think one of the richnesses of this panel is that it’s actually framed by two pieces which offer a kind of historical emplotment, a meta-analysis
of the development of machine learning, and two really important and very pressing case studies. And I think what’s characteristic of the papers that we’ve just seen is that they all take machine learning activity seriously without taking it literally.
What I mean by that is they are all interested in investigating how in fact machine learning operates while simultaneously underscoring the gulf with epistemic, marketing, and clinical claims and activity. And I think for our purposes, in thinking through what we are doing as the different
communities working on this, it shows that we need both, as it were, careful analysis of machine learning and its limits and the powers of data trained on it, and distinct accounts of the hype and claims and institutions built on them that are often not authorized at all by that machine.
In fact, they are often in tension with some of the epistemic building blocks of machine learning. So a way to ask about these papers, and I think probably quite a few other papers, is to look at a difference between the sets of inquiries. One takes the form of:
to what extent do we need to nerd harder, do we need to nerd better, that is, not hype inappropriately, or use better ground truth data? That’s slightly different from the sort of fairness discourse as it exists, but it has a technical side in all of these papers. I just lost my notes.
The machine learning doesn’t want me to nerd harder. The other kind of critique, and all of these papers are involved in it too, is: we need not to nerd this way at all, right. This is just wrong. Now what’s fascinating about all the papers,
and I don’t mean this in a critical way, is that they are all involved in both of those operations, in different ways, and recognize the ethical and political urgency. And one of the reasons I think machine learning as it actually is, as opposed to a lot of the glosses
on it as AI that would make it just absolutely continuous with other 20th-century sorts of things, and the breaks that all of our speakers are noting, is that machine learning is very much premised on a lot of the epistemic critiques that humanists would have made of strong AI and other sorts of
things: as being empirically unsound, as projecting a particular vision of reasoning onto all of humanity, as being incapable of understanding diverse perspectives, as being limited. And yet, for all of that, which is constitutive of machine learning, machine learning has an even greater hype machine that’s of greater political importance. And so
I think all of these papers charge us to think both about machine learning in practice and what that means, and about different layers of what I’ll gloss as kind of hype and thinking. Now they differ in a way, and I can’t get at the texture of all of them,
so I’m gonna sort of pose two big questions that I’m interested in, and I think they’re useful for these papers and perhaps for some others. The first is to ask the authors, plural: To what extent do you see your work as doing two different forms of critique?
I gloss them as nerding in different ways. One is a critique that is internal to machine learning and resolution and is open to bettering the very limited form of knowledge that machine learning claims in a technical sense through, say, better, more
representative data and ground truths, and we saw that that figures very prominently in the paper on medical AI as a really important facet. The other is better-delimited claims, which we saw in Grill’s paper, about not even using accuracy,
maybe even rhetorically, but understanding in fact what it is that machine learning claims. We saw that again with the paper on how to accurately talk about COMPAS data, right, in the last one. So that is an internal critique, and that has been part of the
way fairness has been instrumentalized in CS, but I think it can go beyond that. The second is the external need for radically different forms of knowledge making and institution building, and this could range all the way from “that metric is wrong” to “metrics are exactly the wrong thing” to more concrete
answers like “these technological schemes obfuscate the real sources of political inequality.” We see these invocations in our authors while they are also dealing with the other, and then in the last paper questions about the ethical self-making involved with these forms of knowledge.
Now what I think is interesting is that all the papers work at different levels between both of these ways, and there’s an ethical urgency and an epistemic urgency, and there’s, I think, a care with not flattening machine learning into something that it’s not. If you read the
hype you might think machine learning is something very different from what it is. And I don’t know whether this is fair, but in your paper you recommend, you know, you talk about remaining accuracy; and Ferryman and Kane’s paper is about discussions of ground truth; in Campolo and
Schwerzmann, and they didn’t really talk about this, there’s this great discussion in their paper about machine learning and the use of 23andMe data to draw sort of inferences; whereas the Mishra paper is much more skeptical of data in general, right,
it doesn’t, say, propose that actually we can build better systems for onions. Now that leads me to ask: to what extent in this conversation does a techno-solutionism defang some of your critiques? In what
ways do your papers contribute to better technical social making, and in what ways do they reject those? How to think about those levels of your project? Because I find they’re so rich in their articulation. A second set of questions: I think all of them say, well, there’s also a danger of worrying too
much about technical solutionism, and the medical AI paper in particular really brings out that many of the problems with medical AI arise because they automate pre-existing fundamental algorithmic issues of who has clinical authority and
who that clinical authority applies to. And so the problem is that the subjectivity of the ground truth, when automated, amplifies and accelerates, but the ground of critique is at machine learning but cannot be just machine learning, because it’s far more fundamental. Okay, I’ve already said way
too much. I’m just going to get back to where I began. I think the richness of these papers is, again, in looking concretely at what machine learning is in these contexts and then taking very seriously the inferences, the
hype that builds around them. I’ll just list a few and I won’t go into detail, but one is a gap between the modesty, in some sense the epistemic modesty, that’s at the heart of machine learning as opposed to old AI, versus the overpromising world of
you know, liberation through data, better farming through data. There is a subjectivity gap, in which again machine learning is premised on using human subjectivity in many cases as its ground truth, but even in things without ground truth, like training a large language model,
that is human data, right, it’s precisely human-based data. And the last paper really helps us see that these are connected to other gaps: an inference gap, about the gulf between how inferences are presented and what in fact is authorized;
a novelty gap, and that was very clear in the paper on farming practices, where the continuities with long-term transformations like the Green Revolution are very clear, but claims of radical novelty are both false and enormously important in the stories we tell.
Above all there’s a kind of hype gap, right, all of these are kind of a hype gap, but we can’t in our operations just strip away the hype and get at the real systems, because hype is part of the real systems. And those are
different levels of analysis that I commend, and I really learned so much from our speakers in helping me think through those different levels. Okay, I will end there, and if there’s anything you want to respond to in those questions, go ahead,
and I’m sure our audience is extremely good with questions, so we can turn things to you soon. [Kadija Ferryman] Thank you so much for bringing all the connections together, and, you know, you really brought up one of the kind of concerns or fears or something
that we had as we were writing this paper, which is of course in draft form, and so we have more to think through. But one of the things that we didn’t want our paper to come off saying is that we want
a better ground truth, that we want to tell the FDA, “hey, what you have for your ground truth here is bad because it’s only based on three radiologists, so what you need to do is just make sure you get 10,000 radiologists or 40,000 radiologists,” right.
Like, that is actually not what we are thinking about as the solution, right, because, you know, for me as an anthropologist, I believe that there is truth in one account, like in a single account there is truth, and that, you know, within a single
account is the truth of the whole world, right. So it’s not necessarily about having more numbers or a higher quantity of accounts to make a better ground truth. So I’m really glad that you sort of brought that up.
I think for us, what we are thinking about is, well, the kind of first step, going back to the idea of hype, is just bringing this up and sort
of exposing this, right, so saying that this is how this process is working, right, like right now this is how these developers and agencies work, and this is what the FDA is accepting as ground truth to approve or clear these devices, right. And we think
there’s value in just sort of doing that. And so as to the next step of, like, well, what do we do next, or what are our recommendations, or where do we want to go from there,
I think the next thing that we’re sort of considering is thinking about how that ground truth is operating in different ways. So, one, it’s establishing this epistemic authority of the tool itself, but it’s also establishing the epistemic authority of,
yes, these three radiologists in the case of AIMI Triage. And what does that do when the ground truth for a device is constructed as the knowledge of three US radiologists, right? What does that say about how we are thinking of the basis of medical evidence, you know,
today, right, that medical evidence for this particular tool can be based on three US-based radiologists? And it’s sort of like, what kinds of images and longer histories of expertise and knowledge generation, and where knowledge comes from, and where, you know, the privilege to make these
kinds of pronouncements comes from, right? Like, how is just saying that this is an acceptable ground truth enacting so many patterns from the past and pushing those things forward into the future? So I think thinking about what that account
as ground truth is actually doing in the world is the second part of why we think it’s important to sort of bring this up. So again, not necessarily to say “hey, let’s, you know, have more ground truthers,” right.
But, and this is the last thing I’ll say, we also do, right: we neglected to say in our intro that we are both situated at the Berman Institute of Bioethics
and the Health Policy and Management Department at Johns Hopkins, right, so we actually do have a great opportunity to be involved with policy makers, right, so with agencies who will come to us and say, “great, you found these limitations with our process, what should we do now to make this better?”
So there’s also that process of, like, you know, wanting to think about how to actually operationalize some of these recommendations that we might have for a distinct policy space. [Katia Schwerzmann] I just want to add something to what Kadija just said. I
think to insist that a single account can be true is really important, because what we show in our paper is that asking for more data actually contributes to the naturalistic claim, the idea that the map could cover the whole territory, so that just
by adding data we would come closer and closer to the truth. So that is a claim that, I think, on the basis of our paper we would question. [Gabriel Grill] Yeah, yeah. Thanks for the summarization and questions, yeah. So I think you pointed to two important critiques here about,
yeah, improvement and, again, delimiting claims. I hope this comes out better in a newer version of the paper, but part of the project is also moving away, to some degree, from a debate that is centered
around sort of a discourse on rationality in some sense, and focusing on, yeah, politics. I mean, I think accuracy is like this gateway: accuracy itself sounds like, again, if you make an accuracy claim, like you have some universal notion.
It comes with these ideas of universality embedded in it to some degree, yeah. And, yeah, I hope with the paper not to make the debate about how we improve rationality or improve accuracy necessarily, or some notion of accuracy,
but to highlight how accuracy is deeply political and how machine learning and this broad discourse need to engage more with the politics. I mean, I think there’s lots of research which is trying to do that, yeah.
And one danger of engaging so much in the paper with these technical claims is that it continues, I think, this rational discourse instead of moving to, yeah, questions around politics, which are, I think, central again to testing
and deciding on whether something counts as accurate or ground truth. [Matthew Jones] Yeah. Does our Zoom want to comment at all? Yeah, do we have any comments? Kushang, did you want to say anything?
[Kushang Mishra] No, I think I echo what Gabriel just said, that maybe we don’t need to just focus on the technical aspect of it, in terms of how we can improve accuracy, but essentially talk about whose accuracy we are talking about. I mean,
coming from my own example, the sorting and grading machines for onions, they are accurate for the purpose of exporting those onions to external markets in the West, to other countries which can offer that kind of money. But then
what the local onion vendors essentially ask is, “do we even need that accuracy, because for us the market is the domestic market?” So are we creating these technologies just to serve a certain quality which only serves a certain market, a certain understanding of
who can define what that quality essentially means? And so we probably need to think about the larger political and economic questions as to who we are building these technologies for, essentially. Yeah, so that’s my... [Matthew Jones] Great. No, I love those answers, and it really makes
me think about the way in which papers in this space, generally speaking, can be instrumentalized in ways that we find worrisome, right, that readings that do accord with bits of what we might be doing can be disaggregated from larger, other kinds of critiques.
Okay. I think at this point we should open it up to the floor given the quality of the questions that we’ve been having and then to the Zoom. So let’s begin with the floor and then we’ll go to the zoom. [Question from the audience] I’ll introduce myself. I’m David Stark
Of Sociology, Columbia. These are three– I want to make comments on each of these three things, which, it’s a pity we don’t have a half hour for each one of them. The problem of having three great papers and wanting to make comments is that the comments,
You had problems doing it in 12 minutes, comments have to be extremely cryptic. Okay so starting with Gabriel, my question to you is: under what circumstances in the problem of machine learning does efficacy depend on accuracy? Okay so you mentioned you wanted to contribute to the new sociology of testing, so Noortje
Marres and I just recently edited this special issue of the British Journal of Sociology, and one of the things we are seeing in the new sociology of testing is in a way a kind of move from thinking about the test results to thinking about the results of the test
And a key paper for us in that volume is this wonderful paper by Joan Robinson on the home pregnancy test and what she does in that paper is think about what is the result of the test so
How is it that the test has results for the social relations of the woman who just got tested? And she has all kinds of examples, great, great, great research behind it, so how does it affect the relation to the father, to the mother-in-law, to the swimming coach and other things. By the
Way she has a prior paper about the testing of the medical device at the FDA in which a critical thing was that a judge ruled that pregnancy was not a disease which is important. Okay that moves us to FDA and Kadija and Odia.
So my question to you it’s a kind of thought experiment about a two by two table in which you have accuracy and explainability and both could be either positive or negative so like, what matters if they’re both positive like you have accuracy and explainability like
When does that matter could you get away and have efficacy with neither accuracy nor explainability or under what circumstances could you have like low explainability but high accuracy again or low accuracy but high explainability and like how would that work in different
Kinds of settings and problems in even just for example the medical field. I think there’s something there you had three things I forgot the I remember accurate you call it ground truth and accountability I called it accuracy and explainability but I think there’s really something there.
Okay so for Alex and Katia, super cryptic. I love this paper very very much, I think it’s so so interesting, but to you: think about a set of oppositions, and I’m going to give you three, and then wondering like how do rules
And examples fit into that or not, and just to spark some thinking. So the first, risk versus uncertainty: so in economic sociology there is this idea of Knightian uncertainty, which is not a situation of risk where there is calculability, somehow the
Future can be seen in some kind of probabilistic terms and Knight says, the economist Frank Knight, “uncertainty is a situation where all bets are off” like we can’t assign probabilities to the future. Okay but that was just so risk, uncertainty, calculation, judgment
You can kind of see how that sets up. There’s another, confidence and trust, which are not the same. I can assign a confidence interval to something, I can be high or low, but trust is, so you see how these kind of line up: uncertainty, judgment, trust, confidence and
Then rules, examples, so does it fit in those and how? And I love the idea of this indexical, the pointing, the showing as opposed to the telling, and we would have to talk all afternoon about Weber’s three types of legitimacy and your fourth, but it’s really great, thank you.
[Katia Schwerzmann] So I want to jump on one of these couples of opposites that to me is really fundamental when we think about machine learning and algorithms: it’s the opposition between calculation and judgment, and the necessity to reintroduce this difference. A judgment is not reducible to a calculation
Because it entails an interpretation, and so I think that machine learning may present itself as pure computation but we know that judgment enters in many ways at different moments in the process. I think an issue is in the genealogy that we the current development of machine learning
There is this desire to move away from judgment and to use scaling, so data and models, and to present it as a way away from judgment, as pure computation, as if that would be possible, and I think it is not, and I think it’s a very problematic
Claim that goes in the direction of this kind of naturalism that we point out, yeah. [Alexander Campolo] Maybe I could just add one thing. I think the, yeah, the calculation and judgment is interesting. Like I could just speak for myself, you know, maybe not Katia,
But like I was also interested, like the Daston work on rules, I see her project is, you know, kind of implicitly a desire to reintroduce judgment into rules. You know, I think like she sort of is not happy with the kind of algorithmic sense of, thin algorithmic
Rules that we have here, and I guess one possibility maybe that our paper sort of raises is like we have to think about judgments differently. Like someone in the first panel talked about model selection, and we talk about
Like all these kind of like normative things that have to happen for data to be turned into examples so maybe we look at judgment you know judgments and calculation not as a binary but as this sort of like messily assembled sort of thing I think another way we could think
About it too is this like predictability too, which I think has to do with calculation too, so like predictability in the kind of programming paradigm had to do with this deterministic relationship between binary output states, you know, theoretically not
Much can go wrong there, whereas here we see like a much more, I don’t even know how to describe it, if we do it well, but like ways to associate input and output
States in this more kind of like stochastic but still like very powerful way you know I’m still kind of grappling with this sort of like forms of predictability type thing so yeah [Katia Schwerzmann] And just a last remark I think what you say about external and internal
Critique at the beginning was very, was one of the difficulties for us, because all the time we have to weigh between what the computer scientists say they are doing, what we think they’re doing, what the technology is actually doing, and all that points in slightly different directions and
So that is one of the difficulties and it’s very messy, I think you use a correct word here. [Odia Kane] I’ll take the question that you asked in terms of what matters most, accuracy or explainability. I mean my reflex says explainability, but then I hearkened back to one of
The key questions that we had in our paper towards the end which is “how will the commonplace use of medical AI influence our account of biomedical practices like research and health expertise as well as practices?” and I think that’s something critical that we kind of miss out when we talk about medical
AI and just the tools themselves is when they are actually used in practice on patients and what that dynamic between a physician or specialist and their patient might be, and explainability and communication is critical in that sense. So yes, patients obviously want accurate results and they
Want accurate answers but above all they want to know what’s going on and there’s one you can say that your algorithm is right but your algorithm explaining why it’s right or more importantly why it might be wrong is really critical in the discussion that we need to have about accounts
Because there is this preference there’s also this reflex when we talk about quantifiable measures to just rely on those numbers and those results and we talked on this panel at length of how this idea of accuracy is murky in general so being able to explain whether it’s the algorithm
Whether it’s the people who work with the algorithm what these outputs are and what they mean especially when we are dealing with patients and there are some algorithms that go as far as to diagnose certain cancers that’s going to be super critical to disentangle. [Gabriel Grill] Yeah, thanks for the question.
Yeah so when I talk about accuracy then I mean a specific expert discourse to some degree, which also influences, right, sort of policy, public understanding, which sort of like flows into all these different areas, and efficacy is something which I understand is situated, and in the paper
I discuss a bit more how I mean I had like one slide rethinking accuracy which was very short a bit more how other fields beyond machine learning have sort of dealt with issues around accuracy and efficacy and how it would be important for machine learning as a discipline to learn from
Those fields so I think there are fields who have figured this out much better than machine learning where we are again in this we just need to look on Twitter where people post accuracy yeah accuracy numbers now with ChatGPT like they post like a bunch of benchmarks and say
Look our model is this great and then people are like questioning that because they’re saying the benchmark so it’s sort of like uh back and forth um yeah so I think there are sort of ways that could be taken to make accuracy more useful and correspond more to like sort of like
In yeah sort of a stronger correspondence to sort of efficacy in some sense but no matter the case or no matter if accuracy works or not but it has this right social effects as you also discuss and yeah is performative and that’s really important to always consider.
[Matthew Jones] Aaron, we are at the end of our time. Do you want to call it now or leave room for another couple minutes? [Aaron Mendon-Plasek] That’s a great question. [Matthew Jones] There’s clearly a lot I mean I can talk about each paper for like
Three hours but you don’t need that I have 90 slides, if you have time. [Aaron Mendon-Plasek] Maybe Kushang wants to add anything? Yes and there was one question I could you ask. [Aaron Mendon-Plasek] Is it super fast? Yeah I could be really quick. [Aaron Mendon-Plasek] Okay.
[Aaron Mendon-Plasek] So Kushang do you want to say anything in response to the previous question before we take the last question? [Kushang Mishra] No, thank you. [Aaron Mendon-Plasek] Just making sure. Okay, please. [Question from the audience] I’ll try to be really brief I have a few questions for several
The panelists, so my name [__], I am a student at Harvard in STS so that’s where my question– I was just saying I’m a PhD student in STS, Science and Technology Studies, so that’s sort of where my questions are coming from. My question to Kadija and Odia is, you know,
I was really interested in kind of the way you spoke about epistemic authority and the way it changes what medical evidence is. I was wondering what you think the sort of ontological payoffs of that are, so now that medical evidence has been reshaped by AI, what does it do to kind of,
What it means to provide good healthcare or good medical advice? Does it sort of reconstitute what it means to be a patient or a doctor and the kind of relationship of what good healthcare and good medical advice means? My question to Kushang also. Thanks for such a good presentation online,
I know it’s very difficult to do. My question to you is you know you emphasized at several points how there are differences in the kind of needs and advice that farmers give to themselves and to their friends and the way in which these Agri-tech companies provide advice.
I was wondering if you can say a little bit about the political economy dimensions of it, sort of how is it that needs for generating capital and profit are intertwined with the kind of advice that these companies are able to give or the kind of modeling practices they have, so the
Implications of their need to build capital on the kind of modeling in AI that is possible? And finally to Alexander and Katia, thank you again. My question is to do actually with just the very last point, Alexander, that you made about kind of ethical framework, so you
Brought up of course Kant and Descartes and other really influential ethicists uh you know a lot of their work can be read through the lens of Ethics I was wondering, how you think the kind of ethical problems themselves are being reshaped through AI?
So not so much technological solutions but the kind of ethical problems that one makes up um how is that different from uh earlier paradigms. Thank you. Yeah, Kushang, do you want to go first? [Kushang Mishra] Yeah sure. Thank you for the question.
So in terms of the political economy, the way these algorithms are built, not just in India but across the world, is, I mean, these Agritech companies, they cater to larger farms, I mean, and to certain kinds of crops.
So for instance certain cash crops like grapes etc. are catered to, which are, you know, more profitable in that sense as compared to, let’s say, wheat or rice, which a small farmer usually grows. Secondly, in terms of the kind of farmers that they target, as of now they mostly
Target farmers, and even the interviews that I had with farmers, the main interviews were with farmers who can actually afford to have these kind of sensors installed, and even they themselves feel, you know, that these sensors, they do not really provide the
Kind of advice that we require. But yeah, in terms of the political economy of the kind of Agritech, you know, system that is coming up in India, it is largely towards larger farms, farmers who have more money, and towards crops which are cash
Crops which can generate that kind of money. So I think, I hope that answers the question. [Alexander Campolo] Yeah so as regards this question of ethics, yeah it’s a complicated one and we’re probably not so precise, but I think the general impetus of our paper is that we
Understand ethics here like less in the position that like we want to say this is good or bad or certain people should like do this or do that, and rather to ask questions like how do these techniques make different ways of regulating human conduct possible
And then like what effects are these likely to have? So again, like I certainly understand if people aren’t satisfied with this and say, you know, why don’t you take a position on this, but what
I would hope is that with these kind of analytical tools then, you know, others, like you could sort of like build more convincing normative positions on top of those. So I’d say it’s a sort of sociological, you know, approach to ethics, or like a, you know, Foucaultian
Kind of like “techniques of the self” or “conduct of conducts” that type of thing. [Katia Schwerzmann] So we mentioned Descartes, and Kant, and Weber in the context of our genealogy meaning that we are not relying necessarily on them to provide categories and allowing us to judge these technologies now I think we
Need other frameworks because the categories are transformed by the technology. And the critical dimension of our position I think is pretty clear, but we think that we are going to develop it in a further paper. Here it’s like a more general framework.
[Kadija Ferryman] Thank you so much for that question and, you know, a couple things. So, you know, when thinking about the sort of ontology of healthcare and what it means to be a physician and
What it means to be a patient in the context of you know ever increasing development and use of medical AI tools it I think that we talked more about it in the paper and we mentioned in the presentation that we’re trying to work through this idea of the algorithmic account that was
Proposed that sort of has these two senses, right, of like an account of something and being able to be open to an accounting, and in that paper they give an example of the introduction of AI to transportation security, right, so people who typically, you know, look, it’s like security
Personnel at airports and railway stations who look at tons and tons of videos, and bringing an AI into that space to help essentially kind of triage, instead of them having to look at
Hundreds of screens or a bunch of images all the time that the AI would sort of predict and flag for them suspicious images right so and what was interesting about that account and how I
Or that description of the AI being used in that space and how I relate it to the medical space is that, and this is something we neglected to kind of talk about either in the paper or the presentation, is that medical AI used in radiology is actually deemed highly
Accurate, so accurate that it has incited fears that radiologists will lose their jobs because AI in radiology is so accurate, and there have been, you know, tests showing that AI is, you know, more accurate than human radiologists, so there’s sort of this context, this background context of like
Are human radiologists going to lose their jobs to AI, and part of what we are trying to sort of think through by thinking about this idea of an account is that with a tool like AIMI Triage
Right AIMI Triage is it just like the security advisers because what [__] argues in that account is that it’s not that the using this tool it’s shifting the way those security folks interact with so instead of looking at a ton of data and then trying to pinpoint they get a smaller
Amount of data but then they actually usually when they’re presented with something that’s flagged as suspicious they ask for more they can ask the tool for more background and more images right so the argument there is that it’s not sort of putting this AI in this space is not
Putting security people out of a job because it can flag suspicious bags in an airport it’s just having them do their jobs in a different way and so that’s what we’re thinking about in terms of some of these AI tools in radiology maybe it’s not that it’s putting it’s going to put radiologists
Out of business but the way that they interact with sources of data in their and the way they do their jobs will be shaped and will be different because of the AI so I think for me that’s one way
Of getting to your question of sort of what is the ontology of these things like what does it mean to be a radiologist now and how is that going to be different with the introduction of you know
These kinds of tools and I think it remains to be seen right and I don’t know I don’t think that these you know tools will put radiologists out of business if you will but it will sort of change the way that they’re interacting with data with information with patients you know etc
The other you know the other important issue too is and it sort of it also ties back to epistemic authority is sort of thinking about what counts as evidence and in some ways right we can see the way
That these tools are sort of enshrining a certain a particular set of US-based medical expertise as the expertise right as the ground truth for these tools but there’s also some really interesting work in computer science especially when these tools fail or when they’re when they kind of break
Or are brittle, where computer science researchers are sort of looking back and seeing why they do it and are able to sort of test with different inputs if you will, and there’s a great article where an AI that was trained using, you know, physicians’ expertise did not work well,
Sort of had an overall accuracy rate but then when you looked at it across demographic groups it did not work well for certain marginalized demographic groups racial and ethnic minoritized groups and they said well how about we actually sort of not try to get better ground truth and
Get you know more data, more representative data, why don’t we actually not train this model using data from physicians at all, why don’t we use patients’ reports on pain, and this was a study about knee pain and images of the knees, and when they trained their model
Using patient-reported data those gaps that they saw between groups kind of shrank, right, so I think it’s actually providing, there are some openings where there is a possibility for epistemic authority to be reshaped as well as these tools are being developed more and
Then are breaking in practice as Odia was saying right like so much of how we need to think about these things is how they’re actually working and when they’re implemented and so there is I think an opportunity when they are used and then they break to say well let’s actually think critically
About what we’re enshrining here as epistemic authority, and can we use something else, can we like totally think of something else as our, you know, ground truth for expertise. [Matthew Jones] I think we can genuinely say that all four papers are something to greatly look forward to, to think differently with.
So let’s thank all of our presenters both present and virtual. Thank you, Kushang, for joining us. [Kushang Mishra] Thank you so much for the opportunity. [Matthew Jones] I’ll see you all there