Data (Re)Makes the World Conference, March 31st & April 1st, 2023
Information Society Project, Yale Law School
https://law.yale.edu/isp/events/data-remakes-world

Panel 2: Trusting Sources
Matthew Jones, Columbia University, panel chair
Gabriel Grill, University of Michigan, “Constructing Certainty in Machine Learning: On the performativity of testing and its hold on the Future”
Kadija Ferryman & Odia Kane, Johns Hopkins, “Identifying and Interrogating Algorithmic Accounts in Medical AI”
Kushang Mishra, IIIT Bangalore, & Bidisha Chaudhuri, IIIT Bangalore, “Data-driven ‘precision’ vs Farmer’s guesswork: How Data is (Re)Making Agriculture in India”
Alexander Campolo (Durham U) & Katia Schwerzmann (Ruhr-Universität Bochum), “From ‘Is’ to ‘Ought’: Data as Example in Machine Learning”

[Gabriel Grill] …of high accuracy and certainty are constructed, by investigating testing in machine learning. Tests are one important way in which the adoption of algorithms is justified. There has been an avalanche of grandiose claims about the accuracy of algorithms in AI. For example, a team at IBM claimed to be able to predict with 95% accuracy which workers will quit in the future. A new research project claimed to be able to detect lies in border control with 75% accuracy, and researchers claim to be able to predict sexual orientation from face images with 91% accuracy. I aim to unpack this situation and to continue the project of a sociology of testing in the realm of machine learning. With the current hype around generative AI like ChatGPT, which is claimed to be a sort of universal model due to its complexity, inscrutability, and all-encompassing data, testing to show capabilities and accuracy is again receiving a lot of attention–

The clicker doesn’t seem to be working. Thanks. I’ll just say “next slide.” Okay. Yeah, that’s the correct one. No, the next slide, please. Sorry. Okay. Accuracy in machine learning, simply put, refers to a metric that quantifies a correlation between the results of an algorithm and test data.

This definition highlights that accuracy is not some absolute universal but instead depends on the chosen test data and perspective. Naming a metric “accuracy” can already be considered an act of power, as it suggests that situated functionality can be expressed as a single number.

This confusing naming in machine learning has led to false descriptions of certainty. For example, the Air Force reportedly developed a missile recognition system that, after initial tests, was believed to have an accuracy of 90%. But it had been tested only with images that contained one missile, and a later test with pictures of multiple missiles revealed a much lower accuracy of 25%. Major General Daniel Simpson described the system as being “confidently wrong.” In recent years, researchers have highlighted how such accuracy numbers can misinform about functionality and hide problematic effects, even arguing that ML is experiencing a reproducibility crisis.
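A minimal Python sketch can make concrete the point that an accuracy figure describes agreement with one particular test set. The detector and both test sets below are hypothetical toy stand-ins, not the Air Force system or its data: the same fixed detector is scored against a single-target test set and a mixed test set.

```python
def accuracy(predictions, labels):
    """Share of test cases where the prediction matches the label.

    The result describes agreement with *this* test set only.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty test set")
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


def toy_detector(image):
    """Hypothetical detector that always reports exactly one target.

    It ignores its input on purpose: the point is how the score moves
    with the test set, not how a real detector works.
    """
    return 1


def score(test_set):
    """Run the toy detector over (image, true_count) pairs and score it."""
    predictions = [toy_detector(image) for image, _ in test_set]
    labels = [true_count for _, true_count in test_set]
    return accuracy(predictions, labels)


# Hypothetical test sets: (image placeholder, true number of targets in the image).
single_target_set = [(f"img{i}", 1) for i in range(8)]
mixed_set = [("a", 1), ("b", 3), ("c", 2), ("d", 1),
             ("e", 4), ("f", 2), ("g", 1), ("h", 3)]

print("single-target test set:", score(single_target_set))
print("mixed test set:", score(mixed_set))
```

Here the detector scores 1.0 on the single-target set and 0.375 on the mixed set; nothing about the detector changed, only the choice of test data.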

Their work has challenged this avalanche of grandiose claims and deconstructed assumed universal accuracy. Next slide. Yet accuracy metrics remain important. In part, this is because they are needed for developing machine learning algorithms: they give direction and provide a sense of the capabilities of otherwise opaque, highly complex algorithms.

On the one hand, they are meant to be scientific, formal, and standardized quantifications of quality and progress; on the other hand, they are also performative, normative, and rhetorical numbers, circulated to convince others that an algorithm works. Tests influence the discourse around accuracy because they are tied to a promise of mechanical objectivity.

I argue that this duality marks a conflict of interest when those who conduct a test also benefit from favorable results. In the paper, I also discuss in more detail how the circulation of seemingly ever-increasing accuracy numbers and ever-bigger datasets within the field and industry enabled the performance of continuous scientific progress worthy of investment and attention. Next slide. I draw on ignorance studies as a framework to theorize current issues around unreliability, conflicts of interest, and politics in testing, illustrating how the construction of high accuracy claims also entails the production of ignorance.

I argue that the often opaque flexibility in testing and the concealment of human judgment enable the construction of accuracy claims. The ignorance thereby produced is not necessarily problematic, but it is always political: imbued with power and productive, since its circulation entails world-making and can engender epistemic violence.

Various prior works have highlighted how high accuracy claims have been used to justify and objectify systems of oppression. Next slide, please. I also understand this ignorance as strategic, since the current dominant paradigm in ML incentivizes reporting ever-higher accuracy to stay relevant and convince others.

This tendency to overpromise has been identified as part of the reproducibility crisis in other fields and has also been critiqued within the tech industry. The produced ignorance should therefore not be understood as a mere bug, a mistake, or the result of pitfalls, but as a feature from which some actors benefit.

The current incentives encourage actors not to conduct, and even to impede, investigations into harms and failures, so as to avoid controversy and liability while claiming innocence. For example, Meta reportedly asked employees to avoid terms such as “discrimination” when talking about algorithms, to avoid liability. Various companies have also removed employees who highlighted the risks of key technologies.

These strategies are reminiscent of tactics used by controversial companies with a high concentration of market power, such as Big Oil or Big Tobacco. In the paper, I highlight several of these tactics and how big tech companies have employed them recently. For example, ShotSpotter advertised its system’s accuracy, but experts pointed out that, to achieve such accuracy numbers, it must have excluded test cases in which, after police arrived, it was unclear whether a shot had been fired. Chicago’s Inspector General even noted that physical evidence of a gunshot was found in only 9% of all ShotSpotter alerts, which further suggests that ShotSpotter is likely misrepresenting the capabilities of its system by choosing which data to consider when calculating accuracy.

Next slide. I will now explain several ways in which high accuracy claims in machine learning are constructed in testing by producing ignorance. I will illustrate this by unpacking testing for emotion recognition algorithms trained on pictures with a few emotion categories. In the paper, I also mention several additional ways of producing ignorance in testing. Producing ignorance is unavoidable in testing, as it’s not possible to test for every eventuality, so a central question of testing is how priorities and perspectives are considered.
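The ShotSpotter example, and the point that every test leaves something out, can be illustrated with a hypothetical toy set of alerts; the numbers are invented and this is not ShotSpotter’s data or method. Dropping the alerts whose outcome is unclear from the denominator raises the reported share of confirmed alerts, although the system’s behavior is unchanged. The `Alert` type and `confirmed_share` function are illustrative names, not part of any real system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    """A hypothetical alert; confirmed is None when the outcome is unclear."""
    alert_id: int
    confirmed: Optional[bool]


def confirmed_share(alerts: List[Alert], drop_unclear: bool) -> float:
    """Share of alerts confirmed, optionally excluding unclear cases first."""
    pool = [a for a in alerts if a.confirmed is not None] if drop_unclear else alerts
    if not pool:
        raise ValueError("no alerts left to score")
    return sum(a.confirmed is True for a in pool) / len(pool)


# Hypothetical toy data: 2 confirmed, 1 refuted, 7 with an unclear outcome.
outcomes = [True, True, False] + [None] * 7
alerts = [Alert(i, outcome) for i, outcome in enumerate(outcomes)]

print("all alerts counted:", confirmed_share(alerts, drop_unclear=False))
print("unclear alerts dropped:", round(confirmed_share(alerts, drop_unclear=True), 3))
```

With every alert counted, 2 of 10 are confirmed (0.2); once the seven unclear alerts are excluded, the same two confirmations become 2 of 3 (about 0.667). Deciding which cases count as “unclear” is precisely the kind of human judgment that the number itself conceals.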

Next slide. Yeah. To enable high accuracy, the underlying data needs to be composed of many stable and recognizable patterns. One central job of engineers is to scope problem spaces and data so that they encompass predictable phenomena while excluding those that are unpredictable, messy, or resistant to measurement.

This practice of exclusion and simplification is essential for enabling high accuracy in algorithms and making such systems useful, but it is also a political practice. There are several accepted ways practitioners justify such scoping, for instance by excluding certain examples as outliers.

Such exclusions can also be made intentionally, in obfuscated ways, but often they simply go unnoticed, for instance because a certain way of framing a problem is considered hegemonic. In the case of emotion recognition, the appearance of high accuracy is made possible by a focus on a few widely recognizable, stable emotion categories and corresponding standard facial expressions. Yet, as previous work has highlighted, facial expressions don’t necessarily correspond to actual inner emotions, and experiences of emotion cannot be fully captured by decontextualized categories. The high accuracy is thereby enabled by the production of ignorance about the messiness of emotions.

Next slide. The predictability that enables calculated high accuracy is also not just out there but made. For example, rules, culture, and material constraints can stabilize patterns that algorithms recognize as correlations. Algorithms, when deployed, can also co-produce predictability by intervening in the world and influencing different actors. For emotions, this means that expressions are somewhat predictable partly because they are learned as part of membership in a culture. Testing produces ignorance by not revealing how emotions and cultures could be otherwise. Next slide.

It is usually not possible to measure constructs directly, so proxies are used instead. Whether a proxy is accepted depends on whether it is seen as similar enough to the construct. For example, a consistent classification scheme mapping facial expressions to several widely recognized emotions can be created with high accuracy, so researchers and technologists advocating for emotion recognition can create the appearance of highly accurate emotion recognition by convincing others that the proxy is equal to the construct. In the paper, I describe several rhetorical moves used to do this, like the folk belief that emotions can be read from faces.

Another is pointing to micro-expressions that are visible only to algorithms and are therefore difficult to challenge.

Next slide. The optimization logic in machine learning entails that majority perspectives are learned during training, since they maximize overall accuracy. In turn, minoritized perspectives or test cases have only a small impact on accuracy numbers and are often neglected in favor of majorities. This is intended behavior: it makes algorithms work for majorities, while less visible minoritized perspectives can be ignored. For example, minoritized expressions of emotion in test data are ignored while overall accuracy is seemingly not much affected, because they make up only a few test cases.

Next slide, please. In the paper, I discuss several recommendations for dealing with issues in testing. I argue practitioners should focus on careful naming, for example by renaming the accuracy metric “test correlation” to highlight that it does not represent some universal notion of accuracy but a correlation based on a constructed test. I argue for developing different conceptions of accuracy that are more participatory and justice-focused, that challenge power, and that are humble and seamful. This would require embracing feminist sensibilities such as situatedness and local knowledges, and would also encourage more qualitative, deeper data and research.
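The point that minoritized test cases barely move an aggregate number can be shown with a hypothetical toy test set; the counts and group labels are invented for illustration. The function name `agreement_on_test_set` is likewise hypothetical, chosen in the spirit of the proposed renaming: it computes what is usually reported as “accuracy,” and computing the same metric per group exposes what the single overall number hides.

```python
from collections import defaultdict


def agreement_on_test_set(records):
    """Share of records where prediction equals label (what is usually called 'accuracy')."""
    if not records:
        raise ValueError("cannot score an empty test set")
    return sum(r["pred"] == r["label"] for r in records) / len(records)


def agreement_by_group(records):
    """The same metric, computed separately for each group in the test set."""
    groups = defaultdict(list)
    for r in records:
        groups[r["group"]].append(r)
    return {group: agreement_on_test_set(rs) for group, rs in groups.items()}


# Hypothetical toy test set: 95 majority-group cases, all predicted correctly,
# and 5 minority-group cases, of which only 1 is predicted correctly.
records = (
    [{"group": "majority", "label": 1, "pred": 1} for _ in range(95)]
    + [{"group": "minority", "label": 1, "pred": 1}]
    + [{"group": "minority", "label": 1, "pred": 0} for _ in range(4)]
)

print(f"overall: {agreement_on_test_set(records):.2f}")
for group, value in agreement_by_group(records).items():
    print(f"{group}: {value:.2f}")
```

The overall figure is 0.96, while the breakdown shows 1.00 for the majority group and 0.20 for the minority group. Breaking a score down by group is one way to surface this; it does not by itself address the questions of construction and naming raised in the talk.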

I also discuss how optimization algorithms often produce singular results, thereby reproducing one perspective, and argue instead that plurality and activity should be explored more, similar to simulation. I also argue for more social studies of accuracy that seek to unpack which understandings of accuracy are held by whom, how they are co-produced and stabilized, and what their politics are. Finally, I want to end this talk with a short reflection on current regulatory trajectories. The EU AI Act, for example, proposes a technocratic agency that tests algorithms and AI for safety, but this problematically depoliticizes testing and puts it in the hands of agencies already accused of being captured by the interests of big tech companies. I think more testing can improve the current situation, but it also comes with various challenges. Who decides what testing is needed, and when it is enough?

How can misrepresentation be recognized? How can testing be done to enable more democratic oversight? Beyond these approaches, which situate problems within the technology and its uses, more structural changes are also important, as it’s unlikely that the market or the scientific process will fix these issues by itself. Current incentive structures and dependencies make it difficult for practitioners not to participate in the hype. Speaking up can result in stigmatization and exclusion, as people are framed as naysayers. The high overlap and interconnection of industry and academia make machine learning unique in contrast to other areas that face similar issues around corporate pressure, like environmental science.

In turn, it is important to create more opportunities for independent multidisciplinary work by introducing taxation for tech companies, improving labor standards, and supporting whistleblowers who point to these issues. Next slide, please. Yeah, okay. Thanks for your time. The preprint is available at this URL. And there’s currently a labor strike going on at the University of Michigan by graduate student instructors, and I want to express my solidarity with it. Thank you for your time.

[Matthew Jones] Okay. Thank you very much for that. Next up, we’re really lucky to have Kadija Ferryman and Odia Kane from Hopkins, talking to us about a really amazing paper on interrogating algorithmic accounts in medical AI.

[Kadija Ferryman] Great. Thank you so much. Can you all hear me? Okay, and I just want to check that the– yes, it works. Okay, great.

So we want to thank the organizers for having us today. We’re really excited to be here, and being here has special resonance for both Odia and me. Odia grew up in New Haven, so this is special for her, and I went to Yale as an undergrad. I’ve been back since graduation, but this is my first time presenting here as a scholar rather than a student. I also worked here at Yale Law School as an undergraduate, as a circulation assistant at the Law Library, so it’s nice to be here in this capacity today.

One thing we don’t have on our slide, but that I wanted to note, is an acknowledgement of the lands of the various Indigenous tribes of Connecticut; there are a number, including the Mohegan, Pequot, Niantic, and Quinnipiac. As a student, these were names that I had seen, but I did not really know the history, so I think it’s really important to make those acknowledgements before we begin.

Okay, just a couple of disclosures before we start. I serve on the National Review Board for the All of Us Research Program, and I’m also a member of the Digital Ethics Advisory Panel for Merck Germany. Odia has no disclosures.

All right. The focus of our research is health equity in medical AI, with a specific focus on medical device regulation by the Food and Drug Administration, and we’ll talk a little more about that. If you didn’t know, the Food and Drug Administration is the federal agency responsible for regulating software used in medicine, including medical AI. Today we want to bring together some threads of that work: thinking about health equity alongside algorithmic bias and discrimination in medical AI. We hope those threads come together in the presentation and come out in the discussion.

So, just a little overview of the research study that our paper draws on. It’s the Advancing Health Equity and AI and Machine Learning Health Regulation and Policy study. This was a 10-month exploratory study funded by the Pew Charitable Trusts. Thank you, Pew. It had a couple of aims. We did a content analysis of publicly available FDA clearance and approval documents for medical AI.

We also did a national landscape analysis of organizations working at the intersection of health equity and AI, and we did qualitative interviews with key stakeholders at this intersection of health equity and AI policy, such as national minority health organizations, national medical organizations, device manufacturers, and scholars who research at this intersection.

So, before we delve in, I want to give a brief overview of how the FDA has regulated medical AI. This is not the full history; these are some key moments. We can see threads of this earlier, but one place to start is 2014, when the International Medical Device Regulators Forum issued guidance on how to think about software as a medical device. As many people in this room know, software has been around for decades, and in 2014 there was a move to distinguish software that was part of medical devices, that is, software used in hardware devices, from software that was independently acting as a medical device. So in 2014 this international body explained how to define independent software acting as a medical device, as opposed to software in a medical device. Then, in 2017, the FDA adopted this international definition as its own and also announced a pre-certification program. This was one major step in thinking about how to regulate these independent software medical devices: it was essentially an experiment in regulating the companies rather than the devices themselves, which was a pretty significant divergence from the way the FDA had been regulating medical devices before.

Then, in the last couple of years, the agency has released several policy guidance documents. In relation to what we just heard, there have been some cases very recently: researchers at the University of Michigan, actually, showed that an algorithm that had been approved with a certain level of accuracy saw its accuracy go down once it was being implemented clinically. And just recently, in September 2022, the agency issued updated guidance on clinical decision support software and how to think about it. Okay, so now I’m going to turn it over to Odia.

[Odia Kane] Thank you.

So, just to level-set further in talking about policy at the FDA level, it’s important to consider what the FDA does not regulate, for the context of this conversation. The first is the concept of “homegrown” AI/ML devices: tools developed within a hospital or clinical setting to do a set of tasks, whatever those tasks may be, but insulated to that institution or to a set of institutions within a partnership. The second is software with an administrative function, for example algorithms used for putting calendars together, or for something like speech-to-text. And the third is software that provides lists of recommendations to clinicians but doesn’t diagnose in any particular way.

For example, a patient might come in with flu-like symptoms, and there would be an output suggesting which treatment options would be best for that patient. So, with that in the background, we’re talking about a specific set of medical AI that tends to have more sophisticated, more complicated, and often riskier implications when used in healthcare.

I want to take some time to focus on our Aim 1, which was a content analysis of these FDA summaries. To do this, we used a medical AI database that recorded one hundred forty-one FDA-approved devices. As Kadija mentioned earlier, the focus of our research is health equity, so we wanted to focus on the devices that collected and reported demographic information on the people in their samples. Of the one hundred forty-one devices, we saw initially that only sixteen even collected and reported demographic information. For these sixteen we don’t necessarily know the full demographic makeup, but we do know that they reported on it, while the remaining 125 did not. And within that, there’s huge variability. In the figure at the top, on my left, your right, you can see the breakdown by demographic type of what was included. Gender data is the most consistently included across devices, though not universally, and race data is found far less frequently than age data. So we saw a lot of variability in the demographic information collected, but also in the way performance testing was conducted. Some had multi-site phasing; some had different phases with a more iterative process for testing the performance of their algorithms; others had simulated users; and some used actual patients and participants to see how well their algorithms were working. These approaches weren’t consistent between different devices.

Moving on to the main points of the paper that we submitted to the conference, we talk about three key concepts that we want to break down, and apologies if this is repetitive given your background. This is for audience members who have had diverse introductions to computer science; Kadija and I are not computer scientists, so we were introduced to these concepts through this research. The first is ground truth, which was referred to a little in the previous panel as well: the baseline or reference to which results from experiments and tests are compared.

In other words: can your algorithm do what you say it’s going to do? As I just mentioned, there’s a lot of variability in what that even means and where this information comes from, and we’ll briefly give an example of a ground truth. The second is epistemic authority.

Applied to artificial intelligence, these are the ways in which AI tools are established as bearers of knowledge in a particular domain. Often, the AI systems that we focus on in this paper have a specific task they’re trying to complete, so how do we know that we can trust this output and give the AI the authority to be entrusted with its conclusions? The third notion is accountability, drawn from Neyland’s 2016 paper, which we cited. Accountability has dual meanings: first, being open to review, and second, being able to give an account or explanation of something. We apply these standards to how we view medical AI. Passing back over to Kadija.

[Kadija Ferryman] Thanks. So, we wanted to–

So, again, just to reframe our paper’s points: what we’re trying to think about analytically is the way that ground truth is represented in these review documents submitted to the FDA. In all of the documents that we reviewed, there is a performance testing section. This is one of the things medical AI developers must provide to have their device approved or cleared by the FDA: they have to show evidence of some kind of performance testing, showing that their medical AI does what it says it does.

In that performance testing section, there is a description of the ground truth that was used to test the performance of that medical AI. This is where we are really trying to focus our argument for this paper and our work going forward: how that ground truth establishes the epistemic authority of the medical AI within the context of FDA approval or clearance, and how, in addition to establishing epistemic authority, that ground truth also provides an account of how the algorithm sees the world.

So, very quickly, the one example that we focus on in the paper is a triage medical device: it reads images, alerts clinicians to an abnormality in a set of images, and triages those images for clinicians to review.

When we looked at the ground truth and the performance testing for that device, the ground truth was established by three radiologists, and the accuracy of the algorithm was based on the agreement of two out of those three radiologists. So we wanted to complicate this and raise the question of who is establishing the ground truth, and what that says about expertise and the value of expertise in medical AI. The other issue is that this was just one example of a kind of ground truth. Then, very quickly, sorry, two of the points that we came away with: medical AI developers establish ground truth in different ways, and ground truth is critical to establishing epistemic authority. We think about this variability of ground truth as both an opportunity and a challenge for the FDA’s governance in this area. So we think aspects of ground truthing in this domain should be more transparent and subject to review, and there should be an examination of the standards of the reference databases and similar resources used in the literature. We also have a couple of questions, but I think we’re out of time, and they’re in the paper as well, so we’ll end there. Thanks.

[Matthew Jones] Okay, I believe our next paper is on Zoom, is that–? Good. While we’re getting it up: the next paper is from Kushang Mishra of IIIT Bangalore and Bidisha Chaudhuri, also of IIIT Bangalore, and the subject is “Data-driven ‘precision’ vs Farmer’s guesswork: How Data is (Re)Making Agriculture in India.”

[Kushang Mishra] Hello, can you hear me?

[Matthew Jones] Yes. Welcome.

[Kushang Mishra] Yeah. Thank you so much for the introduction, Matthew.

My name is Kushang Mishra. I’m a research associate at IIIT Bangalore, and I wrote this paper as part of the Humanizing Automation project at IIIT Bangalore, along with my supervisor, Professor Bidisha Chaudhuri. In this project, we are looking at the impact of automation, AI, and machine learning specifically in the agriculture sector. I will start by giving a brief background to this paper in the introduction; then I will present the state of the art in the current literature on this subject, then the methods we used, followed by our findings and, finally, analysis and conclusion. As you can see in the slide, there is a sorting and grading machine here, which uses AI and machine learning to sort and grade onions into various qualities.

The onions go up the conveyor belt and are sorted into various categories here. In the next slide, as you can see, data-driven precision farming is essentially the introduction of AI and machine learning into farming. It has been adopted in countries in the Global North, and it is being pushed by the state as well as by Silicon Valley companies and the private sector. It is portrayed as a very revolutionary technology, especially in the context of climate change, which, the narrative goes, is hampering our ability to grow food. The population is rising, input prices for farmers are rising, and so we need data-driven precision: data from the soil, the climate, the water, to make precise, data-driven decisions. These are proposed in opposition to the traditional knowledge and skills of farmers, which this narrative no longer considers relevant. As you can see in this image, which shows how farming is envisioned, the farmer of today is confused, but the future of farming is informed by data insights, and the farmer is now able to make better decisions because of data. This is the kind of narrative being pushed by agritech companies as well as by states in many Global North countries.

The existing literature on the impact of digital agriculture critiques this from an STS perspective. It critiques this narrative of revolution, which devalues farmers and puts them in the hands of private companies, arguing that it doesn’t bring any systemic transformation but is merely a technological fix for highly complex social and economic problems in farming. There is also a critique of the datafication of the farm and of how it hides the human infrastructure that ensures all the operations run smoothly.

Yeah, one second. Now, the problem with the current literature is that it is focused on the Global North. These technologies are now also coming to countries in the Global South like India, and these countries have their own colonial and postcolonial contexts.

As you can see in this image, in colonial times the narrative pushed by the British was that we need superior scientific knowledge over the ignorance and inertia of India’s agricultural classes, and that science can increase yields for the Empire. A similar narrative was pushed after independence, in postcolonial times, by the American state during the Green Revolution. Foundations like the Ford Foundation played a crucial role, and US universities played a crucial role in setting up an agricultural infrastructure in the country, which pushed scientific knowledge and technology over the local knowledge and skills of farmers and local agricultural practices. As you can see here, this is Frank Schuman from the University of Illinois, an agronomist who is associated with the phrase “Nitrogen Zindabad,” which means “Long Live Nitrogen,” and you can see what kind of narrative they were trying to push.

This is now happening over the past 10 years, in a context in which the agritech sector in India is booming. It had already crossed a billion dollars in 2021, and even the Indian government is promoting it through projects like Agristack, which is collecting and creating databases of farmers with the help of companies like Microsoft. The government will give each farmer a unique ID, which will allow direct benefit transfers and help with yield forecasting; the idea is to increase farmers’ income through digital technologies like that. In this context, the objective of this paper is, first, to question the narrative that data-driven technologies are precise vis-à-vis the quote-unquote “guesswork” of farmers, their traditional skills, and to show that this narrative is not new but was part of the colonial and postcolonial history of India. So we want to push back on this narrative by placing it in this historical context.

So I won’t go much into the methodology, but our aim was to look at both the context in which the technology is developed, so we interviewed people from various Agri-tech companies and startups, and the context in which the technology is deployed: we did a participant observation study with a sorting and grading machine in a village in the state of Maharashtra, in India. We also interviewed several farmers to understand what they think of these technologies and how they are using them.

So this was the narrative pushed by these Agri-tech companies, right: that these decisions cannot be left to an element of guesswork, that we need data-driven precision, because the context in which this is happening is not conducive to so-called guesswork decision-making by farmers.

And so one of the major findings from the study was that precision farming is neither precise nor is precision even useful for those who use it. The irony is that the Agri-tech companies build their models on the feedback of the same farmers whose skills are deemed “guesswork,” and yet the farmers still feel that their voice is not heard: the companies need the feedback to build these models, but even then the farmers feel that their voice is not being heard. Sometimes, farmers do not actually find these machines, these sensors, precise enough for them. The farmers would rather listen to their fellow farmers, or to the rich, dominant farmer who is successful in their own village, than use the predictions made by these sensors.

These sensors don’t tell these farmers what they actually want, and so, in fact, they rely on their own knowledge, which they find to be more precise in certain contexts. For instance, in the northern part of the country there is the festival of Chhath Puja, during which grapes are consumed. While talking to grape farmers in Nashik, I found that to cater to the demand of this festival, the farmers can’t follow the normal cycle of harvest and pruning, because the festival comes in October while the normal season is in March and April. But the sensors installed in the farms are developed based on the normal season of harvesting and pruning, so the recommendations they give are not suitable if the farmers want to target the October cycle of grapes. And here you can see that while the farmers and the work they do are portrayed as imprecise, they have a precise schedule of what kind of pesticide they want to use.

I saw that they maintain a diary with a schedule of how much pesticide they will use on each day, so it is not the case that what they do is imprecise, mere “guesswork,” as the Agri-tech companies would like us to believe.

Another problem is that the Agri-tech companies whose product we saw earlier in the slide claim that accuracy is needed to target export markets. So they say we need machines that can provide more accurate sorting and grading, so that we can sort better-quality onions for the export market.

And this is something which, they say, human sorting and grading workers cannot do; you can see how women are sorting and grading onions over here. But if you ask the vendors to whom these companies are selling these machines, they don’t even want that accuracy, because it will lead to more wastage: the machine will sort out onions that could potentially be sorted and graded for domestic markets. So they actually do not want the kind of accuracy the machines can provide; they want a little less accuracy but certainty in terms of less wastage. So even here, the so-called “accuracy” and precision are not needed, even when they are there. Sorry, yeah.

Finally, what I’d like to say is that the binary created here, between data-driven farming and the “guesswork” of farmers, devalues the local, embedded knowledge that these farmers have, which these machines are not able to capture and work with. Even when the Agri-tech companies seek the feedback, they want it to better their models and not necessarily to cater to the concerns of the farmers. So in the end, we probably need to reimagine how we create these data-driven technologies. Historically, from colonial to postcolonial times, science and technology have been deemed superior to local knowledge and the local context.

Perhaps it is time that we change that approach. Thank you so much.

[Matthew Jones] Okay, thank you very much. I really appreciated the illustrations, which brought home the onions and the farmers at the heart of the story. So our final paper comes from Alex Campolo, who’s at Durham, and Katia Schwerzmann at Ruhr-Universität Bochum: “From Is to,” oh no, “From Rules to Examples: Machine Learning’s Type of Authority.” Take it away, my friends.

[Alexander Campolo] So, yeah, many thanks to everybody for including us in the conference, for coming, and also for reading the paper. I’m Alex Campolo, a postdoc on the Algorithmic Societies project at Durham, and I just want to acknowledge the European Research Council, which has supported our work. Okay, I’ll just say our plan here is to rehearse just the basics of this argument.

And, at the end, we’ll do a little reflection on the problem of machine learning in relation to the type of social theory that we’re aspiring to do. And, just for context, what you’ve read is an article that’s under review at a journal, and we’re giving a response to our reviewers at this stage, so your feedback can potentially also affect the finished version. So our paper began with seeing statements like this appear in the machine learning literature.

And we were really intrigued by the way the community seems to have conceptualized, at least in this case, a very neat and symmetric reversal from programming with rules to training by examples. We began with a kind of hypothesis, or even just an intuition, that this perceived change could shed light on deeper questions involving how concepts of rules, examples, algorithms, etc. have changed over time. These, in turn, I think, opened onto a whole range of historical, sociological, and ethical issues.

The one we chose to pursue was to ask whether, and if so how, machine learning might entail a specific type of regulation of our conduct, governed by operations like classification and prediction, and whether something different might be happening here from the widely known ideas, like calculation, quantification, and rationalization, that have been used in social theory to describe modernity at large. This led us to look for theoretical frameworks that could help us make sense of this supposed transition from a rules-based to an examples-based form of knowledge, and there are tons of candidates. We took a lot of inspiration from Lorraine Daston’s recent book on rules, but as we proceeded we were increasingly drawn to a very old set of ideas from the sociologist Max Weber. In particular, his idea of a rational type of authority gave us a good account of what connects rules and authority. When we use the term “authority” in this paper, we use it in this sociological sense: it refers not to the raw exercise of power or violence but to the specific reasons that people follow commands or allow themselves to be governed in certain ways, the legitimacy of rules. With machine learning, this legitimacy could refer to why we accept being classified, or interact with predictive systems, in certain ways.

So our treatment of programming rules and examples in machine learning is comparative. We wanted to show how a certain understanding of rules emerged in connection with digital computing. We think this is fairly well known and established, but it’s important to rehearse because it gives us a concrete point from which to discern what might be more novel in a type of authority based on examples, enacted in machine learning. Of course, as Daston’s work shows, the idea of a rule covers a huge amount of historical diversity, and in the past rules and examples actually very often worked in tandem.

Examples illustrate rules. But for our purposes we focus on this set of Weberian ideas regarding rational rules, because we think they formed the kernel of a type of authority that was intensified in programming logics, in what Daston refers to as “thin algorithmic rules,” which can later be separated analytically from examples.

[Katia Schwerzmann] Foundational work in computer science in the middle of the 20th century linked the technical characteristics of digital programming with questions of rules and authority. This is the case, for instance, of Alan Turing’s account of programming rules in his classic paper “Computing Machinery and Intelligence.” In the paper, Turing described computer rules and their functioning using the language of authority. The definition of programming rules he offers echoes many of the characteristics Weber attributes to rules. However, the technical features of the universal Turing machine, its discrete character and its infinite storage capacity, induce important changes in the nature of rules. Rules become analytic: they are as numerous as necessary to break down complex behaviors into unambiguous steps. Their extreme specificity and the discontinuity between the computer’s states allow for total control over the movement from input to output.

And that is what leads to what we heard about in the earlier panel: the idea of determinism linked to the rule-based programming paradigm. Our second genealogy, this time of examples, reconstructs the history of the concept of example. Traditionally, an example is a concrete singularity that comprises the essential features of a type. In a first, metaphysical or even theological, sense, an example is a concrete individual that embodies a moral ideal, like, for instance, Jesus or a saint; those are examples that Kant mentions.

In a scientific sense, an example is a specimen that expresses all the essential features of a species or a type, so that it is representative of the type; an example can thus be contrasted with single instances or tokens of a type. While rules tell or prescribe explicitly how something should be or how someone should behave, examples show or reveal norms, leaving space for interpretation and implicit knowledge. However, this is not to say that examples have a lesser prescriptive force than rules. The difference is perhaps that the subject is left with the difficult task of interpreting these norms and the relationship between examples and the norm.

As with the rational type of authority, machine learning’s “exemplary type of authority,” as we call it, must appear legitimate to exert its influence, and the legitimacy of this new exemplary type of authority is based on what we call an “artificial naturalism.” Ah, oh my God, I’m sorry, let’s roll back a little bit, okay?

So, I’m coming now to the transformation induced by machine learning in the understanding of example. Machine learning examples are no longer singularities representative of types. Instead, and it was not easy for us to define what examples become in machine learning, we try to define them this way: they are complex assemblages through which data is aggregated, pluralized through scale, and formatted and processed through feature engineering and models, so that norms, also called representations, can emerge from them. The representations produced by machine learning become normative in a dual sense.

First, they affect how models will classify new instances, by making generalization possible; and second, in a more traditional sense, these norms influence our own behaviors by making prediction and classification possible. So, to summarize: while examples in the traditional sense induce obedience through their perfection and their reference to a type, in machine learning, examples induce obedience through the norms they inductively elicit, in what we have come to define as a naturalistic way. And that is what I’m coming back to now: the way machine learning’s exemplary type of authority legitimates itself, which is through a kind of artificial naturalism.

The expression “artificial naturalism” can seem contradictory, but it renders the messy tension that characterizes machine learning between a desire to let data speak for itself through models and the engineering practices that permit it to do so. We think it’s a “naturalism” in the sense that, first, it presupposes a world determined by deep statistical structures that are inaccessible to human perception and accessible only through representations produced by models. These representations should increasingly be learned from the data itself rather than from human-specified interventions, the motto being “scale is all you need.”

The goal is to discover regularities in the data that function as norms capable of accurately classifying new data in new contexts. So, unlike rules, which are transcendent to the order to which they apply, in machine learning norms and examples are immanent to one another, so that they seem to be of one and the same kind. Thus, the representations seem only to express the regularities found in the data.

[Alexander Campolo] Okay, and yeah. We’ll just end here by reflecting a little on some issues we were thinking about and struggling with while writing this paper. First, it’s probably obvious, but it’s worth saying explicitly that we’re attempting to link developments in science and technology with concepts from social theory.

We take very seriously that this is challenging to do in a convincing way. One of the implications of our argument goes against the idea that we must build analytical walls to distinguish technical from colloquial or philosophical senses of notions like bias, training, examples, prediction, and classification. Our argument commits us to the position that it is in fact neither desirable nor, perhaps, even possible to separate these technical senses from their sedimented ethical and political ramifications.

And technical transformations add new layers to these. We hope this kind of conceptual-historical approach can navigate between technological determinism and a more idealistic position, although I think we have to be totally frank that we see our work as probably more on the side of the latter.

A second point concerns the scale or scope of the paper, which is big and kind of epic, so we had to be very selective. We chose to analyze a very few tendencies or phenomena that we hope can give us an analytical grip on our problems. But we also recognize that there are times when the paper goes into technical detail, and getting these details right doesn’t just matter for our own integrity; they are also related to the theoretical problems.

And a final point concerns the question of what kind of critique we are making. I think it’s worth saying explicitly that there are probably some differences between our work and a lot of other very important existing scholarship on AI ethics, where an existing framework is used to identify and, hopefully, mitigate concrete harms caused by machine learning. This work is super important, but we thought it might be worth saying that we see ourselves doing something a little different, which has to do with the way machine learning makes possible different forms of ethical relationship, in the sense of different ways of conducting ourselves and of producing or legitimating obedience.

And I think people could criticize us as being kind of relativists, right, in that in certain cases we suspend judgment on some of the statements from the machine learning community and instead work in a more analytical mode, trying to describe the possible contours and effects of this type of authority rather than advocate for or against it. So, yeah, we’ll just leave it there, and we welcome your comments and would love to hear what you think. Thanks.

[Matthew Jones] Okay. Thank you very much.

And that last point is, I think, a great jumping-off point. So, our papers clearly... well, thank you to all four groups. I think one of the riches of this panel is that it’s framed by two pieces offering a kind of historical emplotment, a meta-analysis of the development of machine learning, and two really important and very pressing case studies. And I think what’s characteristic of the papers we’ve just seen is that they all take machine learning activity seriously without taking it literally.

What I mean by that is that they are all interested in investigating how machine learning in fact operates, while simultaneously underscoring the gulf between that activity and epistemic, marketing, and clinical claims. And I think, for our purposes, in thinking through what we are doing as the different communities working on this, it shows that we need both careful analysis of machine learning, its limits, and the powers of the data it is trained on, and distinct accounts of the hype, the claims, and the institutions built on them, which are often not authorized at all by the machine learning itself.

In fact, they are often in tension with some of the epistemic building blocks of machine learning. So a way to ask about these papers, and I think about quite a few other papers, is to look at the difference between two sets of inquiries. The first takes the form: to what extent do we need to nerd harder, to nerd better, that is, not to hype inappropriately, or to use better ground-truth data? That’s slightly different from the fairness discourse as it exists, but it has a technical side in all of these papers. I just lost my notes.

The machine learning doesn’t want me to nerd harder. The other kind of critique, and all of these papers are involved in it too, is that we need not to nerd this way at all, right: this is just wrong. Now, what’s fascinating about all the papers, and I don’t mean this in a critical way, is that they are all involved in both of those operations, in different ways, and recognize the ethical and political urgency. And one of the reasons, I think, is machine learning as it actually is, as opposed to a lot of the glosses on it as AI that would make it absolutely continuous with other 20th-century things, and the breaks that all of our speakers are noting: machine learning is very much premised on a lot of the epistemic critiques that humanists would have made of strong AI and other sorts of things, as being empirically unsound, as projecting a particular vision of reasoning onto all of humanity, as being incapable of understanding diverse perspectives, as being limited. And yet, for all of that which is constitutive of machine learning, machine learning has an even greater hype machine, one of greater political importance.

So I think all of these papers charge us to think both about machine learning in practice and what that means, and about different layers of what I’ll gloss as kind of hype. Now, they differ, and I can’t get at the texture of all of them.

So I’m going to pose two big questions that I’m interested in, and I think they’re useful for these papers and perhaps for some others. The first is to ask the authors, plural: to what extent do you see your work as doing two different forms of critique?

I gloss them as nerding in different ways. One is a critique that is internal to machine learning and its resolution, and is open to bettering the very limited form of knowledge that machine learning claims in a technical sense, through, say, better, more representative data and ground truths. We saw that figure very prominently in the paper on medical AI, as one really important facet. The other facet is the better-delimited claims we saw in Grill’s paper: maybe not even using accuracy, even rhetorically, but understanding what it is that machine learning in fact claims. We saw that again with the paper on how to talk accurately about COMPAS data, right, in the last one. So that is an internal critique, and it has been part of the way fairness has been instrumentalized in CS, but I think it can go beyond that.

The second is the external need for radically different forms of knowledge-making and institution-building, and this could range all the way from “that metric is wrong” to “metrics are exactly wrong” to more concrete answers like “these technological schemes obfuscate the real sources of political inequality.” We see such invocations in our authors, alongside the other kind of critique, and then, in the last paper, questions about the ethical self-making involved with these forms of knowledge.

Now, what I think is interesting is that all the papers work at different levels between both of these ways. There’s an ethical urgency and an epistemic urgency, and there’s, I think, a care with not flattening machine learning into something that it’s not; if you read the hype, you might think machine learning is something very different from what it is. And I don’t know whether this is fair, but in your paper you talk about accuracy; in Ferryman and Kane’s paper there are discussions of ground truth; and Campolo and Schwerzmann didn’t really talk about this today, but there’s a great discussion in their paper about machine learning and the use of 23andMe data to draw inferences. Whereas the Mishra paper is much more skeptical of data in general, right?

It doesn’t propose that we can actually build better systems for onions. Now, that leads me to ask: to what extent in this conversation does techno-solutionism defang some of your critiques? In what ways do your papers contribute to better technical and social making, and in what ways do they reject those? How should we think about those levels of your project? Because I find they’re so rich in their articulation.

A second set of questions: I think all of them say, well, there’s also a danger of worrying too much about technical solutionism. In particular, the medical AI paper really brings out that many of the problems with medical AI arise because it automates pre-existing, fundamental algorithmic issues of who has clinical authority and to whom that clinical authority applies. So the problem is that the subjectivity of the ground truth, when automated, is amplified and accelerated; the critique is aimed at machine learning but cannot be just about machine learning, because the problem is far more fundamental.

Okay, I’ve already said way too much. I’m just going to get back to where I began. I think the richness of these papers is, again, in looking concretely at what machine learning is in these contexts and then taking very seriously the inferences and the hype that build around it. I’ll just list a few gaps, and I won’t go into detail. One is a gap between the epistemic modesty at the heart of machine learning, as opposed to old AI, and the over-promising world of liberation through data, better farming through data. Another is a subjectivity gap, in which, again, machine learning is premised on using human subjectivity as its ground truth in many cases; but even in things without ground truth, like training a large language model, that is human data, right? It’s precisely human-based data. And the last paper really helps us see that these are connected to other gaps: an inference gap, about the gulf between how inferences are presented and what is in fact authorized; and a novelty gap, which was very clear in the paper on farming practices, where the continuities with long-term transformations like the Green Revolution are very clear, but claims of radical novelty are both false and enormously important in the stories we tell.

Above all, there’s a kind of hype gap, right; all of these are kinds of hype gap. But we can’t just strip away the hype and get at the real systems, because hype is part of the real systems. Those are different levels of analysis that I commend, and I really learned so much from our speakers in helping me think through those different levels. Okay, I will end there, and if there’s anything you want to respond to in those questions, go ahead.

And I’m sure our audience is extremely good with questions, so we can turn things over to you soon.

[Kadija Ferryman] Thank you so much for bringing all the connections together. You really brought up one of the concerns or fears we had as we were writing this paper, which is of course in draft form, so we have more to think through. One of the things we didn’t want our paper to come off saying is that we want a better ground truth: that we want to tell the FDA, “Hey, what you have for your ground truth here is bad because it’s only based on three radiologists, so what you need to do is make sure you get 10,000 radiologists or 40,000 radiologists,” right? That is actually not what we are thinking about as the solution, because, for me as an anthropologist, I believe that there is truth in a single account, and that within a single account is the truth of the whole world. So it’s not necessarily about having more numbers, a higher quantity of accounts, to make a better ground truth. So I’m really glad that you brought that up.

I think for us, the first step, going back to the idea of hype, is just bringing this up and exposing it, right: saying this is how this process is working right now, this is what these developers and agencies are doing, and this is what the FDA is accepting as ground truth to approve or clear these devices. We think there’s value in just doing that. And then as to the next step: what do we do next, what are our recommendations, where do we want to go from there?

I think the next thing we’re considering is how that ground truth is operating in different ways. One, it’s establishing the epistemic authority of the tool itself, but it’s also establishing the epistemic authority of, yes, these three radiologists in the case of AIMI Triage. And what does that do, when the ground truth for a device is constructed as the knowledge of three US radiologists? What does that say about how we are thinking of the basis of medical evidence today, that the medical evidence for this particular tool can be based on three US-based radiologists? It raises questions about what kinds of images, and what longer histories of expertise and knowledge generation, of where knowledge comes from and who has the privilege to make these kinds of pronouncements, are involved. How is saying that this is an acceptable ground truth enacting so many patterns from the past and pushing them forward into the future? So thinking about what that account as ground truth is actually doing in the world is the second part of why we think it’s important to bring this up. So again, it’s not necessarily to say, “Hey, let’s have more ground truthers,” right.

But, and this is the last thing I’ll say, we neglected to say in our intro that we are both situated at the Berman Institute of Bioethics and the Health Policy and Management Department at Johns Hopkins, so we actually do have a great opportunity to be involved with policymakers, with agencies who will come to us and say, “Great, you found these limitations with our process; what should we do now to make this better?”

So there’s also that process of wanting to think about how to actually operationalize some of the recommendations we might have for a distinct policy space. So...

[Katia Schwerzmann] I just want to add something to what Kadija just said. I think insisting that a single account can be true is really important, because what we show in our paper is that asking for more data actually contributes to the naturalistic claim: the idea that the map could cover the whole territory, so that just by adding data we would come closer and closer to the truth. That is a claim that, on the basis of our paper, we would question.

[Gabriel Grill] Yeah, yeah. Thanks for the summary and the questions. So I think you pointed to two important critiques here, about improvement and about, again, limiting claims. But part of the project, and I hope this comes out better in a newer version of the paper, is also to move away to some degree from a debate centered on a discourse of rationality, and to focus instead on politics. I think accuracy is a kind of gateway: if you make an accuracy claim, you have some universal notion.

It comes with these ideas of universality  embedded in it to some degree yeah and, yeah I hope with the paper not to  make the debate about how do   we improve rationality improve accuracy  necessarily or some notion of accuracy

But I can highlight how accuracy is deeply  political and how we need to or machine   learning and like this broad discourse  needs to engage more with the politics I mean I think there’s lots of research  which which is trying to do that yeah and  

And one danger of like engaging like I in the  paper like so much with like these technical   claims is that it continues I think this  this rational discourse instead of like   moving to yeah questions around politics  which are I think central again to testing  

And deciding on whether something  counts as accurate or ground truth. [Matthew Jones] Yeah. Do our  Zoom wants to comment at all? Yeah do we have any comments? Kushang, did you want to say anything  did you want to say anything?

[Kushang Mishra] No, I think I echo what Gabriel just said: maybe we don’t need to focus only on the technical aspect of it, on how we can improve accuracy, but essentially talk about whose accuracy we are talking about. Coming from my own example, the sorting and grading machines for onions are accurate for the purpose of exporting those onions to external markets, in the West and in other countries which can offer that kind of money. But what the local onion vendors essentially ask is, “Do we even need that accuracy, because for us the market is the domestic market?” So are we creating these technologies just to serve a certain quality, which serves only a certain market and a certain understanding of that quality, and who can define what that quality essentially means? So we probably need to think about the larger political and economic questions of who we are essentially building these technologies for. Yeah, so that’s my–

[Matthew Jones] Great. No, I love those answers, and they really make me think about the way in which papers in this space, generally speaking, can be instrumentalized in ways that we find worrisome: readings that do accord with bits of what we might be doing can be disaggregated from other, larger kinds of critique.

Okay. I think at this point we should open it up to the floor, given the quality of the questions we’ve been having, and then to the Zoom. So let’s begin with the floor and then we’ll go to the Zoom.

[Question from the audience] I’ll introduce myself. I’m David Stark of Sociology at Columbia. I want to make comments on each of these three papers, and it’s a pity we don’t have a half hour for each one of them. The problem with having three great papers and wanting to comment is that, just as you had problems doing it in 12 minutes, the comments have to be extremely cryptic.

Okay, so starting with Gabriel, my question to you is: under what circumstances, in machine learning problems, does efficacy depend on accuracy? You mentioned you wanted to contribute to the new sociology of testing. Noortje Marres and I just recently edited a special issue of the British Journal of Sociology, and one of the things we are seeing in the new sociology of testing is a move from thinking about the test results to thinking about the results of the test.

A key paper for us in that volume is a wonderful paper by Joan Robinson on the home pregnancy test. What she does in that paper is think about what the result of the test is: how does the test have results for the social relations of the woman who just got tested? She has all kinds of examples, with great research behind them: how it affects the relation to the father, to the mother-in-law, to the swimming coach, and other things. By the way, she has a prior paper about the testing of the medical device at the FDA, in which a critical thing was that a judge ruled that pregnancy was not a disease, which is important. Okay, that moves us to the FDA and to Kadija and Odia.

So my question to you is a kind of thought experiment about a two-by-two table in which you have accuracy and explainability, and each could be either positive or negative. What matters if they’re both positive, if you have both accuracy and explainability; when does that matter? Could you get away with, and have efficacy with, neither accuracy nor explainability? Under what circumstances could you have low explainability but high accuracy, or low accuracy but high explainability, and how would that work in different kinds of settings and problems, even just within the medical field, for example? I think there’s something there. You had three things; I forget the third, but I remember you called it ground truth and accountability, and I called it accuracy and explainability. But I think there’s really something there.

Okay, so for Alex and Katia, super cryptic. I love this paper very much; I think it’s so interesting. For you, think about a set of oppositions. I’m going to give you three, and then I’m wondering how rules and examples fit into them or not, just to spark some thinking.

The first is risk versus uncertainty. In economic sociology there is the idea of Knightian uncertainty, which is not a situation of risk, where there is calculability and the future can somehow be seen in probabilistic terms. The economist Frank Knight says, “uncertainty is a situation where all bets are off”: we can’t assign probabilities to the future. So that’s risk and uncertainty, calculation and judgment; you can see how that sets up. Another is confidence and trust, which are not the same. I can assign a confidence interval to something, it can be high or low, but trust is different. So you see how these line up: uncertainty, judgment, trust, confidence. And then rules and examples: do they fit in those, and how? I love the idea of the indexical, the pointing, the showing as opposed to the telling. And we would have to talk all afternoon about Weber’s three types of legitimacy and your fourth, but it’s really great. Thank you.

[Katia Schwerzmann] I want to jump on one of these pairs of opposites that to me is really fundamental when we think about machine learning and algorithms: the opposition between calculation and judgment, and the necessity to reintroduce this difference. A judgment is not reducible to a calculation, because it entails an interpretation. So I think machine learning may present itself as pure computation, but we know that judgment enters in many ways, at different moments in the process. I think one issue in the genealogy of the current development of machine learning is this desire to move away from judgment and to use scaling, of data and of models, and to present it as a way away from judgment, as pure computation, as if that were possible. I think it is not, and I think it’s a very problematic claim that goes in the direction of the kind of naturalism that we point out.

[Alexander Campolo] Maybe I could just add one thing. I think the opposition of calculation and judgment is interesting. Speaking for myself, maybe not for Katia, I was also interested in Daston’s work on rules. Her project is, implicitly, a desire to reintroduce judgment into rules; I think she is not happy with the kind of thin algorithmic rules that we have here. And I guess one possibility that our paper raises is that we have to think about judgments differently. Someone in the first panel talked about model selection, and we talk about all these normative things that have to happen for data to be turned into examples. So maybe we look at judgment and calculation not as a binary but as a messily assembled sort of thing. Another way we could think about it is predictability, which I think also has to do with calculation. Predictability in the programming paradigm had to do with a deterministic relationship between binary output states; theoretically, not much can go wrong there. Whereas here we see, and I don’t even know how to describe it well, ways to associate input and output states in a more stochastic but still very powerful way. I’m still grappling with these forms of predictability.

[Katia Schwerzmann] And just a last remark. I think what you said about external and internal critique at the beginning was one of the difficulties for us, because all the time we have to weigh between what computer scientists say they are doing, what we think they’re doing, and what the technology is actually doing, and all of that points in slightly different directions. So that is one of the difficulties, and it’s very messy. I think you used the correct word there.

[Odia Kane] I’ll take the question you asked about what matters most, accuracy or explainability. My reflex says explainability, but then I hearken back to one of the key questions we had toward the end of our paper: “How will the commonplace use of medical AI influence our account of biomedical practices like research and health expertise, as well as practices?” I think that’s something critical that we miss when we talk about medical AI and just the tools themselves: what happens when they are actually used in practice on patients, and what the dynamic between a physician or specialist and their patient might be. Explainability and communication are critical in that sense. Yes, patients obviously want accurate results and accurate answers, but above all they want to know what’s going on. You can say that your algorithm is right, but your algorithm explaining why it’s right, or more importantly why it might be wrong, is really critical to the discussion we need to have about accounts. Because there is this preference, this reflex, when we talk about quantifiable measures, to just rely on those numbers and those results, and we talked at length on this panel about how this idea of accuracy is murky in general. So being able to explain, whether it’s the algorithm or the people who work with the algorithm, what these outputs are and what they mean, especially when we are dealing with patients, and when some algorithms go as far as diagnosing certain cancers, is going to be super critical to disentangle.

[Gabriel Grill] Yeah, thanks for the question.

So when I talk about accuracy, I mean, to some degree, a specific expert discourse, which also influences policy and public understanding and flows into all these different areas, whereas efficacy is something I understand as situated. In the paper, where I go further than my one very short slide on rethinking accuracy, I discuss how other fields beyond machine learning have dealt with issues around accuracy and efficacy, and how it would be important for machine learning as a discipline to learn from those fields. I think there are fields that have figured this out much better than machine learning. We just need to look on Twitter, where people now post accuracy numbers with ChatGPT: they post a bunch of benchmarks and say, “Look, our model is this great,” and then people question that because of the benchmark, so it’s sort of back and forth. So I think there are ways that could be taken to make accuracy more useful and to give it a stronger correspondence to efficacy in some sense. But no matter whether accuracy works or not, it has these social effects, as you also discuss, and it is performative, and that’s really important to always consider.

[Matthew Jones] Aaron, we are at the end of our time. Do you want to call it now or leave room for another couple of minutes?

[Aaron Mendon-Plasek] That’s a great question.

[Matthew Jones] There’s clearly a lot. I mean, I could talk about each paper for like three hours, but you don’t need that. I have 90 slides, if you have time.

[Aaron Mendon-Plasek] Maybe Kushang wants to add anything? Yes, and there was one question, could I ask?

[Mendon-Plasek] Is it super fast? Yeah, I can be really quick.

[Mendon-Plasek] Okay.

[Aaron Mendon-Plasek] So Kushang, do you want to say anything in response to the previous question before we take the last question?

[Kushang Mishra] No, thank you.

[Aaron Mendon-Plasek] Just making sure. Okay, please.

[Question from the audience] I’ll try to be really brief. I have a few questions for several of the panelists. My name is [__]; I’m a PhD student at Harvard in STS, Science and Technology Studies, so that’s where my questions are coming from. My question to Kadija and Odia: I was really interested in the way you spoke about epistemic authority and the way it changes what medical evidence is. I was wondering what you think the ontological payoffs of that are. Now that medical evidence has been reshaped by AI, what does it do to what it means to provide good healthcare or good medical advice? Does it reconstitute what it means to be a patient or a doctor, and the relationship of what good healthcare and good medical advice mean?

My question to Kushang: thanks for such a good presentation online; I know it’s very difficult to do. You emphasized at several points how there are differences between the kind of needs and advice that farmers give to themselves and to their friends and the way in which these agri-tech companies provide advice.

I was wondering if you can say a little bit about the political economy dimensions of it: how are the needs for generating capital and profit intertwined with the kind of advice these companies are able to give, or the kind of modeling practices they have? That is, what are the implications of their need to build capital for the kind of modeling in AI that is possible?

And finally to Alexander and Katia, thank you again. My question actually has to do with the very last point you made, Alexander, about the ethical framework. You brought up, of course, Kant and Descartes and other really influential ethicists, and a lot of their work can be read through the lens of ethics. I was wondering how you think ethical problems themselves are being reshaped through AI. So not so much technological solutions, but the kind of ethical problems that one makes up: how is that different from earlier paradigms? Thank you. Yeah, Kushang, do you want to go first?

[Kushang Mishra] Yeah, sure. Thank you for the question.

So in terms of the political economy, the way these algorithms are built, not just in India but across the world, is that these agritech companies cater to larger farms and to certain kinds of crops.

For instance, they cater to certain cash crops like grapes, which are more profitable in that sense compared to, let’s say, wheat or rice, which a small farmer usually grows. Secondly, in terms of the kind of farmers they target: as of now they mostly target farmers who can actually afford to have these kinds of sensors installed, and even the main interviews I had were with such farmers. And even they themselves feel that these sensors do not really provide “the kind of advice that we require.” But yes, the political economy of the agritech system that is coming up in India is largely oriented toward larger farms, toward farmers who have more money, and toward cash crops which can generate that kind of money. So I hope that answers the question.

[Alexander Campolo] Yeah, so as regards this question of ethics, it’s a complicated one and we’re probably not so precise, but I think the general impetus of our paper is that we understand ethics here less as a position where we want to say this is good or bad, or certain people should do this or that, and rather to ask questions like: how do these techniques make different ways of regulating human conduct possible, and what effects are these likely to have? Again, I certainly understand if people aren’t satisfied with this and say, “Why don’t you take a position?” But what I would hope is that with these kinds of analytical tools, you could build more convincing normative positions on top of them. So I’d say it’s a sociological approach, ethics as a kind of Foucauldian “techniques of the self” or “conduct of conduct,” that type of thing.

[Katia Schwerzmann] We mentioned Descartes, Kant, and Weber in the context of our genealogy, meaning that we are not necessarily relying on them to provide categories that allow us to judge these technologies now. I think we need other frameworks, because the categories are transformed by the technology. Our critical dimension and our position are, I think, pretty clear, but we are going to develop them in a further paper; here it’s more of a general framework.

[Kadija Ferryman] Thank you so much for that question. A couple of things. When thinking about the ontology of healthcare, what it means to be a physician and what it means to be a patient in the context of the ever-increasing development and use of medical AI tools: we talked more about it in the paper, and we mentioned in the presentation, that we’re trying to work through this idea of the algorithmic account that was proposed, which has two senses: an account of something, and being open to an accounting. In that paper they give an example of the introduction of AI to transportation security. Security personnel at airports and railway stations look at tons and tons of videos, and an AI is brought into that space to help essentially triage, so that instead of them having to look at hundreds of screens or a bunch of images all the time, the AI would predict and flag suspicious images for them.

What was interesting about that account, that description of the AI being used in that space, and how I relate it to the medical space, is something we neglected to talk about either in the paper or in the presentation: within medical AI, AI used in radiology is actually deemed highly accurate, so accurate that it has incited fears that radiologists will lose their jobs, and there have been tests showing that AI is more accurate than human radiologists. So there’s this background context of whether human radiologists are going to lose their jobs to AI. Part of what we are trying to think through with this idea of an account is: with a tool like AIMI Triage, is it just like the security advisers? What [__] argues in that account is that using this tool shifts the way those security folks interact with data. Instead of looking at a ton of data and then trying to pinpoint something, they get a smaller amount of data, but when they’re presented with something that’s flagged as suspicious, they usually ask for more; they can ask the tool for more background and more images. So the argument there is that putting this AI in this space is not putting security people out of a job because it can flag suspicious bags in an airport; it’s just having them do their jobs in a different way.

And that’s what we’re thinking about in terms of some of these AI tools in radiology: maybe they are not going to put radiologists out of business, but the way radiologists interact with sources of data, and the way they do their jobs, will be shaped by and will be different because of the AI. So for me that’s one way of getting to your question of what the ontology of these things is: what does it mean to be a radiologist now, and how is that going to be different with the introduction of these kinds of tools? I think it remains to be seen. I don’t think these tools will put radiologists out of business, if you will, but they will change the way radiologists interact with data, with information, with patients, and so on.

The other important issue, which also ties back to epistemic authority, is thinking about what counts as evidence. In some ways we can see that these tools are enshrining a particular set of US-based medical expertise as the expertise, as the ground truth for these tools. But there’s also some really interesting work in computer science, especially when these tools fail, when they break or are brittle: computer science researchers are looking back to see why this happens, and are able to test with different inputs, if you will. There’s a great article where an AI that was trained using physicians’ expertise had an overall accuracy rate, but when you looked at it across demographic groups, it did not work well for certain marginalized demographic groups, racial and ethnic minoritized groups. And the researchers said, rather than trying to get better ground truth and more data, more representative data, why don’t we not train this model on physicians’ data at all? Why don’t we use patients’ reports on pain? This was a study about knee pain and images of the knee, and when they trained their model using patient-reported data, the gaps they saw between groups shrank. So I think there are some openings where epistemic authority could be reshaped as these tools are developed further and then break in practice. As Odia was saying, so much of how we need to think about these things is how they actually work when they’re implemented. So I think there is an opportunity, when they are used and then break, to say: let’s actually think critically about what we’re enshrining here as epistemic authority. Can we use something else? Can we think of something else entirely as our ground truth for expertise?

[Matthew Jones] I think we can genuinely say that all four papers are something to greatly look forward to, and to think differently with.

So let’s thank all of our presenters, both present and virtual. Thank you, Kushang, for joining us.

[Kushang Mishra] Thank you so much for the opportunity.

[Matthew Jones] I’ll see you all there.
