MedAI #104: Reveal to Revise - How to Uncover and Correct Biases of Deep Models

Title: Reveal to Revise: How to Uncover and Correct Biases of Deep Models in Medical Applications

Speaker: Maximilian Dreyer

Abstract:
Deep Neural Networks are prone to learning spurious correlations embedded in the training data, leading to potentially biased predictions. This poses risks when deploying these models for high-stakes decision-making, such as in medical applications. In this talk, we will explore the latest techniques to reveal and revise model biases. To reveal model misbehavior, we will study next-generation Explainable AI methods that communicate model behavior using human-understandable concepts (locally and globally). To revise biases, techniques based on full retraining, fine-tuning or no additional training (post-hoc) are discussed. Finally, possible ways to evaluate the success of bias unlearning are presented.

Speaker Bio:
Maximilian Dreyer is a PhD student in the Explainable AI group led by Sebastian Lapuschkin and Wojciech Samek of the Fraunhofer Heinrich Hertz Institute in Berlin (Germany).
His research focuses, on the one hand, on developing XAI methods that are human-understandable, insightful and yet require low human effort. On the other hand, Maximilian works on frameworks that allow AI models to be improved based on XAI insights. Specifically, his research here focuses on revealing and revising model (mis)behavior. Maximilian obtained his B.Sc. in Physics at Humboldt-University of Berlin and his M.Sc. in Computational Science at University of Potsdam.

——

The MedAI Group Exchange Sessions are a platform where we can critically examine key topics in AI and medicine, generate fresh ideas and discussion around their intersection and most importantly, learn from each other.

We will be having weekly sessions where invited speakers will give a talk presenting their work followed by an interactive discussion and Q&A.

Our sessions are held every Monday from 1pm-2pm PST.

To get notifications about upcoming sessions, please join our mailing list: https://mailman.stanford.edu/mailman/listinfo/medai_announce

For more details about MedAI, check out our website: https://medai.stanford.edu. You can follow us on Twitter @MedaiStanford

Organized by members of the Rubin Lab (http://rubinlab.stanford.edu) and Machine Intelligence in Medicine and Imaging (MI-2) Lab:
– Nandita Bhaskhar (https://www.stanford.edu/~nanbhas)
– Amara Tariq (https://www.linkedin.com/in/amara-tariq-475815158/)

Hi everyone, welcome to the 105th Stanford MedAI Group Exchange session. This week we have Max Dreyer from University of Berlin here with us to talk about his work on uncovering and correcting biases in deep models. Max is a PhD student in the Explainable AI group, and his research is focused on developing explainable AI methods that are human-understandable, insightful, and yet require very low human effort. So Max, thank you very much for joining us today. Before we start, do you have any preference on how you would like to take questions? Can we interrupt you in the middle, or do you have dedicated breaks in your talk for questions?

Feel free to just interrupt me if you have a question.

Thank you very much. Everyone, let’s try to make this session as interactive as possible, and without further ado let me hand it over to Max.

Thank you very much for the introduction, Amara, and thanks also to Nandita for inviting me today to present some of my methods and to talk about how to uncover and correct biases in deep neural networks for medical applications. Let’s begin. First I want to say that in this talk I focus mostly on deep neural networks, meaning convolutional neural networks, multi-layer perceptrons and, in principle, also Transformer-based architectures. Regarding the data types, I work mostly with image data, but most of the methods I’m going to present are also applicable to time series or text data.

Recently the trend continues and deep neural networks are very popular. They are very successful in playing games, for example beating humans, or in image and text generation, as we can see with ChatGPT. Deep neural networks are also successfully applied to medical tasks like skin cancer detection. However, they are not perfect; they also make mistakes. For example, they can be manipulated, or they can rely on spurious artifacts in the data, and this is especially problematic for safety-critical applications like medicine. The origin of such spurious behavior is often the data itself. Here I show you some spurious correlations I encountered in some open benchmark data sets, and maybe you can already guess what some of these correlations are. There are, for example, skin markings, medical instruments, band-aids, or, in the ImageNet data set and also in the real world, cats that in some cases correlate with cartons. There might also be color shifts, low-resolution training samples, shifts in brightness due to, for example, different medical devices, or blurry backgrounds. Whereas the artifacts here on the left are localized, the ones on the right are unlocalized, meaning that they spread over the entire input features, or pixels in this case.

Now, when we have biases or spurious correlations in the data, then during training the model is also likely to become biased in some way. This is because these biases or artifacts correlate with one of the output classes or targets, and the model becomes especially biased when these correlations are easier to detect than performing the actual task. Usually deep neural networks can be seen as a sequence of layers, and in these layers we can identify subunits that I refer to as neurons here. These neurons act as feature extractors, and usually we can identify some neurons that correspond to a bias.

The questions my talk addresses today are: first, how can we reveal such model misbehavior; then, how can we correct the model; and last, how can we evaluate whether we have successfully corrected the model. So let’s begin with the first one: how can we uncover
biases? Here I show you four examples of medical tasks for deep neural networks. The first two are skin cancer detection tasks, the third one is gastrointestinal tract classification, and the fourth one is bone age classification of radiographs. For the first two, the neural network says that it’s benign, so there’s no melanoma; for the third one it’s also harmless; and for the fourth one, this hand supposedly corresponds to a human with an age of less than 46 months. The question that can now arise, and that is also important in practice, is: why is this prediction taking place? What are the relevant features here? The field of explainable AI addresses this question and tries to open up the black box of deep neural networks.

The first generation of explainable AI are so-called heat maps, also called attribution maps. These give such a heat map for each prediction. Here I show heat maps from the LRP method, layer-wise relevance propagation, which is also from my lab. Popular alternatives are Grad-CAM, DeepLIFT or Integrated Gradients. What we can see in these heat maps are areas, marked here in dark red, that are relevant for the prediction, and, marked in blue, features that speak against the prediction. In all four cases we can clearly see some spurious behavior: in the first one the model focuses quite a lot on the background, in the second one on this band-aid, in the third one it seems to focus on the green patch, and in the last one on this lead marker, which indicates that this is the left hand.

These heat maps are very easy to understand and very simple, which also makes them very popular in practice. However, heat maps can be ambiguous, because they only tell us where something is relevant but not what exactly the model sees there. For example, it’s not clear whether the texture, the form of an object, or the color is relevant. Also, understanding the model behavior on the whole data set by looking at individual heat maps is quite a challenging task, because the bias could be present in only
1% of the samples.

As an alternative to heat maps, which are also called local explanations, there is also global explainable AI, and one popular technique is called feature visualization, which tries to reveal the role of individual neurons. Here, for example, we can take a specific neuron and then collect the most activating image patches over the data set. In this case we can identify one neuron that corresponds to this band-aid artifact, and there’s also one corresponding to skin markings. So these feature visualizations allow a global model understanding. However, there are also problems, because these neurons are not necessarily very human-interpretable: they can be polysemantic or redundant, which makes the interpretation more difficult. Further, and this is also quite important, we do not gain an understanding for individual samples, so we do not know how these features are necessarily used in combination.

As a side note, here on the right there’s also the idea of concept activation vectors. These vectors don’t refer to single neurons; they describe a direction in the latent space, so they can be seen as a kind of superposition of neurons. Here you usually define a set of concepts beforehand that you have in the form of data samples, for example samples with stripe texture, and then you use these samples to find a characteristic direction in the latent space. The advantage is that you know what these concepts should represent. However, you’re potentially missing concepts, and when you want to probe for spurious biases, the likelihood is also high that you don’t know what you’re looking for. So these global explanations again give you a global overview.
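
The idea of finding a characteristic direction in latent space from concept samples can be illustrated with a small sketch. It assumes the direction is simply the difference between the mean activation of the concept samples and the mean activation of other samples. That difference-of-means choice is a stand-in picked for brevity here, not necessarily how the methods in the talk compute their vectors. The toy activations and the names `concept_direction` and `concept_score` are hypothetical.

```python
import math


def mean_vector(rows):
    """Component-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def concept_direction(concept_acts, other_acts):
    """Unit vector pointing from the 'other' activations towards the concept activations."""
    mu_concept = mean_vector(concept_acts)
    mu_other = mean_vector(other_acts)
    diff = [c - o for c, o in zip(mu_concept, mu_other)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("both sample sets have the same mean activation")
    return [d / norm for d in diff]


def concept_score(activation, direction):
    """Dot product of one activation vector with the concept direction."""
    return sum(a * d for a, d in zip(activation, direction))


# Toy latent activations over 3 hypothetical neurons.
# In this made-up data the stripe concept only changes neuron 0.
striped = [[2.0, 0.1, 0.5], [2.2, -0.1, 0.4], [1.8, 0.0, 0.6]]
plain = [[0.0, 0.1, 0.5], [0.2, -0.1, 0.4], [-0.2, 0.0, 0.6]]

cav = concept_direction(striped, plain)
print(cav)
print(concept_score([2.1, 0.0, 0.5], cav) > concept_score([0.1, 0.0, 0.5], cav))
```

A new sample can then be scored by projecting its activation onto the direction, which is what `concept_score` does. In practice the activations would come from an intermediate layer of the trained network rather than from hand-written lists.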

But you do not know exactly what happens for individual samples, or how these features are used in combination. Can you combine this global view and the local view? Yes, you can, and this is what I’ve recently worked on. Here on the left you can see the local explanation side, where we compute the heat map, and on the right you can see the feature visualization. Such a heat map is computed as follows: we start at the output and propagate relevances through the layers until we reach the input, and in the input we receive this heat map. In the backward pass, neurons that contributed strongly to the output in the forward pass usually also receive a lot of relevance. What is important here is that when we do this propagation once, we get relevances for free not only in the input but also in the latent space, so we also get the relevance of these latent concepts, these latent neurons. We know for one sample which neuron is relevant. And we can do even more: we can restrict this backward pass and compute a heat map specifically for one neuron by only propagating the relevance flow through this neuron and stopping the relevance flow through all other neurons.

This, summarized, is given by our method called concept relevance propagation (CRP), which combines the where and the what question. With CRP we get for each concept, which in this case is each neuron, a localization, a heat map that allows us to see where this concept is present. We also get global relevance scores for each latent concept; for example, here we see that the snout and the fur concept are most relevant. And we also have these feature visualizations.

What does it mean in medical cases? Here I give you examples where I only show the two most relevant neurons. In the first one we see two that point to the outside and to this red color. For the second one we have one concept that focuses on the mole, but also one that focuses on the band-aid. For the third one we have one that focuses on a seemingly good concept in the middle, but also one that clearly points to this green patch. And for the last one, of course, this lead marker.

All right, can I quickly ask a question here?

Yes.

So you are mentioning neuron
numbers, like neuron 220 and neuron 332, but then these neurons correspond to image patches?

Yeah, let me go back. Here a neuron is a convolutional channel, for example, and in deep networks we usually have hundreds of neurons, so I show you the neurons that are most relevant, and these are two. Here I showed you the feature visualization for three neurons, and in this case we have five visualizations; here I only show you the first one. So you can think of having a lot of other samples that look like this one. I just cropped it to make it a little bit simpler, but if you looked at the feature visualization for neuron 220, you would see red color with hair.

Okay, thank you. We have one question in the chat: these examples are for incorrect predictions, is that right? Are all of the examples that you’re showing for incorrect predictions?

No, these are correct predictions. The red color, for example, or the band-aid correlates with no melanoma. This would of course be problematic if you really had a malignant patch and a band-aid, for example: the model would then classify it as harmless when it is actually harmful. But in this case it’s no problem, so it’s correct, but the behavior is suspicious in some cases.

For example, in the last example the model is looking at that letter L instead of looking at the actual bones, but it’s still making a right prediction?

Yes.

Okay, thank you.

So with this next generation of explainable AI methods that operate in the concept space, you get much more information. However, you still have to look at individual samples.

Exactly, this is what I was about to ask: even if it gives you a more intuitive explanation, you still have to look at each individual sample to actually derive some semantic explanation, right?

Exactly.

For example, in the first one, for no melanoma and neuron 220, you said that it is because of the red color, but it could be the edge, it could be the hair. I don’t know how you interpret that it is because of the red color.

Yes, exactly. In this case I looked not only at this one feature visualization but at more, and in this case there were only red color patches with hair. So the problem
of looking at individual samples remains. These concepts give you much more information, but they also increase the complexity. But we can make it easier. The idea now is to summarize similar explanations: there are a lot of samples in the data set that are similar and for which you receive a similar explanation, so the idea is to summarize them into prototypes. This is also a recent work; we call it prototypical concept-based explanations (PCX). Here I show you a UMAP embedding of all concept-based explanations on the training set, and we can see that we get three distinct clusters. To remind you, each point represents one prediction and the explanation for this prediction.

Let’s begin with prototype one. Here I show you, on the horizontal axis, representative samples that are in the center of this cluster, and now we can study the relevant concepts for this cluster, or this prototype. So we can go to the concept level and understand in detail what’s going on in this cluster. Here we can see that the most characteristic concepts, which in this case are neurons but could also come from different concept bases, correspond to this green patch. Alternatively, we can look at prototype 3, and here we can see, in the representative samples but also in the most relevant concepts in this cluster, that it is focusing on this instrument. Prototype 2 looks a bit better; these concepts seem to make a bit more sense. The second one here, for example, could correspond to these reflections. At this point, I’m not a medical expert, so for me it’s easier to detect when something obviously spurious is happening, but here it would of course be necessary to also have a medical expert look at these concepts. Also, as a side note, we understand these prototypes as a combination or composition of concepts and their relevances, and not as parts of instances, which is what is used for ProtoPNet.
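
The prototype idea, where each prediction yields a vector of concept relevances and similar vectors are summarized by a cluster centre, can be sketched as follows. This is only an illustration under stated assumptions: it uses plain k-means, which the talk names as the most basic clustering choice, whereas the speaker prefers Gaussian mixture models. The relevance values, the concept meanings and the helper names `kmeans` and `top_concepts` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=20):
    """Plain k-means. The first k vectors are used as initial centres (deterministic)."""
    centres = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assign every explanation vector to its closest centre.
        labels = [
            min(range(k), key=lambda j: squared_distance(v, centres[j]))
            for v in vectors
        ]
        # Move every centre to the mean of the vectors assigned to it.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels


def top_concepts(prototype, n=1):
    """Indices of the n concepts with the highest relevance in a prototype."""
    order = sorted(range(len(prototype)), key=lambda i: prototype[i], reverse=True)
    return order[:n]


# Hypothetical relevance vectors: one row per prediction, one column per concept.
# Column 0 might be a "green patch" neuron, column 2 a lesion-related neuron.
relevances = [
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.7, 0.2, 0.1],
    [0.2, 0.1, 0.7],
    [0.9, 0.0, 0.1],
    [0.1, 0.2, 0.7],
]

prototypes, labels = kmeans(relevances, k=2)
for idx, proto in enumerate(prototypes):
    print("prototype", idx, "most relevant concept:", top_concepts(proto)[0])
```

Each cluster centre plays the role of a prototype here, and inspecting its largest entries corresponds to asking which concepts characterize that group of predictions. The number of clusters `k` is a free parameter of the sketch.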

Maybe some of you have heard about ProtoPNet. To briefly describe the whole pipeline: you take samples of one class and feed them into your deep neural network. You compute explanations on the concept level, so for each sample you get a relevance score for each concept that you defined, and you get a vector. This vector you can basically cluster to find your prototypes.

Max, we have a question in our chat: how are the prototypes defined?

I’m not sure if the question is still valid or already answered, but I can say a bit more on how you can compute prototypes. Basically you can use any clustering method. The most basic approach would be to use k-means. In this case I like to use Gaussian mixture models, which are a bit more complex but also have nice properties, and then you would see the mean of one of the Gaussians in the mixture as a prototype. But the important part is that looking at this kind of embedding and finding these clusters decreases the workload significantly. It not only allows you to find spurious behavior, it also allows you to gain an understanding of what the model has learned on the whole data set, so it would also be nice to use for discovering new knowledge, for example. One parameter that is still free in this approach is the number of prototypes. I would say one should always also look at the embedding besides the prototypes, so you know how well you describe your explanation space, and it’s better to have too many prototypes, which might be redundant, than too few.

Okay, so next I want to briefly cover the topic of studying inliers versus outliers.

Max, can I have a quick question?

Yes.

Can you go to the previous
slide once? So these concepts that you are defining, these are predefined concepts, right?

Yes, and usually I define each neuron as a concept.

Okay, so it’s not really that these concepts are mapped to any particular semantic property of the image data, right? They are just kind of random concepts.

Yeah, when you study individual neurons, you study the concepts that the model has learned intrinsically.

Right, right. I was just curious whether it is somehow related to the semantics of the data. It’s not, right? It’s just a neuron that you are defining as a concept.

Yes, exactly. In principle there are now also more methods that summarize multiple neurons into more interpretable concepts. You can in principle also use these approaches, but in this case I show you individual neurons because it’s the simplest approach.

Okay, because I was thinking that if the individual concepts are interpretable, then you don’t need to follow the next step of clustering and identifying the prototypes individually, right? So I don’t know why you didn’t try the interpretable neurons instead of just taking the neurons and clustering.

You mean that if you have, for example, 500 neurons, you just look at these 500 neurons?

And the corresponding semantic mapping to the semantic properties of the data. So imagine you are going with dog versus cat: just go with semantic properties and map each concept to a semantic property, so that it doesn’t need any further exploration of the space.

Can you elaborate a little bit more?

Yeah, because there are some
techniques now available with which we can define each neuron as a single interpretable neuron, and by combining those neurons you can actually map them to the semantic embedding space, right?

What do you mean by semantic embedding space?

So imagine your bone density data set, where you have the kids’ age. The images have a lot of semantic properties: the gender of the baby, the body part, the actual X-ray protocol, and all these things. Those are properties of the data set that you can map to individual images.

Okay.

And a combination of those can be represented as your prototype, right? Some of them have a lateral marker, some of them don’t; some of them show a male patient versus a female patient, something like that.

Okay. Then, as far as I understood, you still need this label information.

Yes, definitely. But in that case your labels are actually related to your input data, rather than exploring the space, right?

Okay.

Because the main problem that I currently see with the architecture is that you still need to explore the space. I understand that you don’t need to do it individually, you can do it on a cluster basis, but this kind of cluster exploration is sometimes not feasible for computer science developers, because they don’t really know how to map a prototype to a clinical concept.

Yes, that’s true. If you don’t have a medical expert, I think it’s always good to have more labels in your data to also map it to the latent semantics, as you said. But in this case I assume we don’t have any labels besides the output classes.

Yeah, okay, makes sense. Thank you.

Okay, so there exist methods that try to cluster multiple explanations, for example
PCX, but there also exist methods that cluster concepts or neurons in the latent space. Usually these methods focus on outliers, and in these outliers you often find spurious behavior. For example, with DORA you find a neuron that corresponds to Chinese watermarks. In the PCX clusters you can see here an outlier cluster with cats in cartons, and you can also see that cat features are relevant in this cluster. However, for this carton class there’s also this Chinese watermark concept present in the overall prototype, so this watermark is used in the normal behavior of the model. So while outliers tend to be spurious, we should always also check the inliers.

Here I give you again an overview of uncovering spurious behavior using explanation-based methods. The first approach is to look at individual samples, for example at heat maps or at concept-based explanations such as CRP or CRAFT by Thomas Fel. This approach is very thorough, but it is quite expensive or infeasible for large data sets. Alternatively, you can use methods that summarize these local explanations, for example on the heat map level with SpRAy or on the concept level with PCX, which results in a much smaller workload and still captures the full model behavior. Alternatively, you can also study the latent representations. Here you can have predefined concepts, and if you have more labels you can also incorporate them into your concepts to identify good behavior more easily. Looking at latent representations often also requires a smaller workload, because you have maybe a couple of hundred concepts, or hopefully fewer, versus thousands of input samples. However, it should be noted that looking at neurons is not output-specific, so you often don’t know how exactly these features are used; you just know that the model has learned them.

Okay, so hopefully using some of these methods you’ve found, or I don’t know if I can call it hopefully, but maybe you found some spurious correlations and some bias in the model and you want to correct it. Let’s come to the next step: how can we unlearn
biases? Here I roughly group these methods into three groups: first, methods requiring a full retraining, then methods that are based on fine-tuning, and post-hoc model correction. Whereas post-hoc model correction usually requires modifying the model itself, fine-tuning regularizes the model in some way, and for retraining often the data is modified. Full retraining is of course computationally the most costly. On the other hand, it’s also a little bit more flexible, because post-hoc model correction is usually model-specific, and once you’ve cleaned your data, for example, you can take any new model architecture and just train it on the data.

Let’s begin with the first method based on full retraining, Discover and Cure (DISC). Here the idea is to add the bias to the data uniformly. In this example we need some samples of the bias, in this case a blanket and a bed, because in the data set cats correlate with beds or blankets. The idea is then to add the bias to other classes, and here they just overlay these bias samples over samples of other classes via so-called Mixup, which maybe looks a little bit weird but works remarkably well. However, for this method you need these bias samples, and they propose to generate them via Stable Diffusion. But if you have some very special objects that correlate with your data, for example medical instruments, it may not be possible to generate such samples with Stable Diffusion.

Another method based on full retraining is called FastDiME. Here the idea is not to add the bias to every sample but to remove the bias entirely from the data, and they propose to use diffusion-based models. You first have to train a diffusion model on your input data, and secondly you also have to train a bias classifier that also operates on the input image. In this case you further need localization masks for the artifact. The idea is then to take a biased sample and reconstruct this image via the diffusion model, guided by this mask and by optimizing to change the prediction of the bias detector to false. This way you generate a counterfactual that should mimic the original image but doesn’t have the bias anymore. As you can see, this method is quite complex, but when it works it’s quite
awesome.

Okay, so these were methods for full retraining. Now I will present some methods for fine-tuning, and one popular method is called Right for the Right Reasons. Here the idea is to penalize the use of bias features in the input, so we add an extra regularization term that basically tells the model not to use a specific part of the input data. For this you need to compute an input gradient, and you also need a localization of your artifact in the input, for example here of this band-aid. Here the vanilla model is the uncorrected model, and you can see its heat map: whereas at first the model uses this information according to the explanation, after fine-tuning it focuses much more strongly on the mole. Right for the Right Reasons is quite intuitive; however, it’s only really applicable to localized biases, and it
also requires a localization.

The three methods I just presented basically operate in the input space. Now I will also present methods that operate in the latent space, or in concept space. One framework that is quite big is called Class Artifact Compensation (ClArC). There are several methods, and the first one I present is called Augmentive ClArC. Here the idea is similar to the one of DISC, to add the bias uniformly, but now not in input space but in latent space. I show you the intuition here: this is a UMAP embedding of latent activations for two sets of samples, one without the spurious correlation and one with the bias. In this case it’s this color artifact that correlates with a man having dark hair color. The idea now is that we transform samples in latent space from this clean cluster towards this artifact cluster, so we add the information of the bias. For this we need a concept activation vector (CAV), a direction in latent space that allows us to transform from the one to the other, to just add the information of the bias.

As a side note, how can we compute these CAVs? We need these two sets of samples. Ideally we really have one set of samples with the bias and one that only
differs in the bias and not in any other attribute. This is often not possible, but this is the idea. Then, when we collect the latent activations of some intermediate layer, we compute the CAV, and the usual approach is to use classifiers, for example SVMs or logistic regression. However, we found out that these classifier-based CAVs are not optimal; they perform quite badly, because these classifiers try to find a good separation between the clusters, but their goal is not to model the concept well, that is, to describe this transformation from one cluster to the other. So classifier-based CAVs often diverge. As for why exactly, feel free to check out a recent paper from Frederik from our group, which tells you a little bit more about this problem. Here on the right I can also show you some quantitative experiments where we used classifier-based CAVs for model correction. We have controlled settings where we know the true direction, and we compute the cosine similarity between the CAVs and the true direction. We can see that they don’t have a very high cosine similarity, so they don’t model the concept fully, which, when we want to correct the model, only partially removes the bias. And in this paper from Frederik Pahde, he
also proposes a different approach for signal CAVs.

Okay, now to another method based on fine-tuning. This one is called Right Reason ClArC, and here the idea is similar to Right for the Right Reasons, but we don’t penalize the bias in the input space but again in the latent space. The idea is that we want to add a regularization term during fine-tuning such that, when we add or remove the concept slightly in the activations, it shouldn’t change the model output. We can achieve this by regularizing the dot product of the latent gradient with this CAV, which, for those of you who know the TCAV (testing with concept activation vectors) paper, is quite similar to the TCAV score. We also present this paper next week at the AAAI conference, so if you are by chance also there, feel free to reach out.

Okay, now the last method, which is not based on training but is applicable post hoc. This one is called Projective ClArC, and here the idea is to remove the bias in latent space. Again, this time you don’t add the bias information in latent space.

But you project it out and there are also similar methods um available or that have a similar idea they are called Spix and editing classifiers here editing classifiers kind of incorporates this operation into the weights of the model okay um at this point I just want to

introduce another viewpoint on this: you can also see it as input-space correction versus concept-space correction. In input space we are more model-independent, and it's much more interpretable, so we see better what's going on and how well we've erased or added the bias. However, it's also much more difficult to do, because we need, for example, diffusion models. Alternatively, there are concept-based model correction approaches that are usually more lightweight and also more universal: we can apply them to localized artifacts but also to unlocalized artifacts. However, they are a lot less interpretable, because you often need to find this direction in latent space, and it is really difficult to say whether you've modeled the bias correctly, or whether in this direction there is also good information present, or other correlations. So you only know how well you found this direction afterwards, when you evaluate. At this point I want to show you a small experiment we did, here for four datasets. For the ISIC dataset, we added a texture over the input

samples; for bone age estimation, we added a brightness increase; for ImageNet, we added a timestamp artifact; and for CelebA, we used this collar bias. Now the task was to correct these biases in the model, and we tried out different methods based on fine-tuning and post-hoc model correction. Here I show you some heatmaps. Maybe we can begin in the second row, with the uncorrected model. We can see that it focuses on the background for the unlocalized artifacts, and on the artifacts themselves for the localized ones, so the timestamp and the collar. Then we can continue to the correction methods. Here we've observed that the methods operating on the concept level and on activations tend to only partially unlearn the biases, because we do not directly enforce it on the model. Methods based on the gradient perform better. Right for the Right Reasons is very good for the localized artifacts; however, when we cannot describe the artifact via input masks anymore, for the unlocalized ones, some weird behavior develops in the model. Here, gradient-based correction on the concept level with RR-ClArC performed the most reliably. However, here we just look at heatmaps. Now the question is how we can also evaluate it quantitatively. One possible way is to evaluate performance. Here the question is how adding the bias, for example, affects the

model. For example, if we have the bias in all classes, we can look at subgroups of the data. When the bias correlates with class A, then usually the performance is higher in the subset of class A with the bias, whereas when the bias is present in another class, the performance usually goes down. The task would then be to measure this difference between the set with the bias and the set without, and ideally it would be as small as possible, so that the bias doesn't have an influence. The idea of this evaluation is quite simple; however, there could also be other influencing factors in each subgroup, especially when we do not have perfect pairs of samples, so one clean and one poisoned one. So this setting is also not always given perfectly. Alternatively, there are also approaches that artificially add this bias. For example, for localized biases it's sometimes possible to crop the bias out of a biased sample and then add it as an overlay to a clean sample, and then we can measure the change in the output. Alternatively, one could also leverage diffusion-based methods, for example FastDiME; however, here you need this diffusion model and the bias detector. So these methods are more direct, which is great, since you directly measure this influence; however, they require this kind of bias model, where

you can just add the bias to a clean sample. As a third alternative, what one can also do is take biased samples and compute an explanation. The idea is to measure the relevance on the bias. This is quite a bit easier for localized biases, because you can just measure the spatial relevance in the heatmap, and the less relevance, the better. Here the advantage is also that you do not have to change the input, so no input transformation is needed; however, you require localized biases in this case. Okay, so if we had one big toolbox covering all these steps, this would be quite nice. If we also had a lot of methods, we could just try out multiple methods, and if we have nice

evaluation schemes, we can then just choose the best-performing one. In order to go in this direction, we also published a paper last year about a method called Reveal to Revise, which aims to incorporate this whole explainable AI life cycle. Here's one quick overview. Reveal to Revise is supposed to be a highly automated framework for bias correction. We included two approaches for revealing biases: one that summarizes local explanations, and one that looks at the latent representations. Then we also have, in this case, three model correction approaches: Right for the Right Reasons, ClArC, and CDEP. And we also have evaluation schemes. However, as you may remember, Right for the Right Reasons, for example, needs input localizations, and in this framework we also propose a way to generate these localizations of the artifacts automatically: by finding the direction of the artifact in latent space, we can generate heatmaps specifically for the CAV. We can then use these localizations for model correction, but also for evaluation. Okay, let's come to the conclusion. We have seen now that

there are quite powerful explainable AI tools to discover biases, for example summarizing methods such as prototypical explanations like PCX, or methods that allow us to study the latent representations, for example D disk or CRP. Here we should also check outliers. There are also already various correction methods available. However, one part, the evaluation, I think is still really difficult, and there should be more focus on it, because in research papers you sometimes only see controlled settings, or settings where we have localized biases. I think it's time to also make better use of generative models. This is not very easy, but I hope we can see more methods soon. Another thing that's not really talked about in research papers is whether the evaluation, or the method, or the model is then good enough to be regulatory compliant. Here I also heard lately that there's a field called admissible machine learning; if you're interested in that, you should check it out. I think for the future it will be very crucial that we have some frameworks or toolboxes that combine all these steps and various methods, so you can easily understand the model and correct it. There are already first candidates, for example Reveal to Revise, or there's also a unified explanatory interactive learning typology by Friedrich. I think these frameworks or toolboxes then also allow for a large-scale benchmark of different methods, because that's also something I haven't seen yet: a comparison of not only fine-tuning and post-hoc model correction, but also methods requiring full retraining, so the whole

spectrum. If you have questions or want to collaborate, for example to apply our methods to your data domain, to not only find misbehavior but maybe also discover new knowledge, feel free to reach out. Thank you very much for your attention.

Thank you very much, Max. This was a very interesting talk, so let's give our speaker a virtual round of applause first. We are slightly over time, but if our audience has any questions, I think we can accommodate a few. Ramon, you have a question. Do you want to go ahead and

ask?

Yes. So in the example where you're showing multiple heatmaps for the attribution methods, you had this example where you added some texture noise onto the image. Was that a localized noise or a global noise? And how would your method be able to distinguish that as being a noisy concept versus an actual useful concept? Because sometimes for classification there might be useful changes in texture, let's say.

Yes, exactly. In this case it's called, as far as I remember, a least significant bit attack, and this is an artificial artifact that we added and that we know shouldn't have any good information. So we then specifically looked for this artifact, and we knew it is a bias. I hope this addresses your question. This least significant bit attack is spread over the whole input, so it kind of creates this texture over the whole input.

So did you actually do it only for one single class to create the bias, or did you do it for the whole dataset?

We only did it for one class.

One class.
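For readers who want to see what such a controlled artifact could look like, the following is a minimal sketch of a least-significant-bit perturbation applied only to the samples of a single class. The function names, the choice of three bits, and the use of random bits are assumptions made for illustration; the talk does not specify how the artifact was generated.

```python
import numpy as np


def add_lsb_artifact(image, n_bits=3, rng=None):
    """Overwrite the n_bits least significant bits of every pixel with random bits.

    Assumption: `image` is a uint8 array of any shape, e.g. (H, W, 3).
    The perturbation covers the whole input, so it is an unlocalized artifact.
    """
    rng = np.random.default_rng() if rng is None else rng
    if image.dtype != np.uint8:
        raise ValueError("expected a uint8 image")
    low_mask = (1 << n_bits) - 1  # e.g. 0b00000111 for n_bits=3
    noise = rng.integers(0, low_mask + 1, size=image.shape, dtype=np.uint8)
    # Keep the high bits of the image and replace the low bits with noise.
    return (image & np.uint8(255 - low_mask)) | noise


def poison_class(images, labels, target_class, fraction=1.0, n_bits=3, seed=0):
    """Add the artifact to a fraction of the samples of one class only.

    Returns the modified image array and a boolean flag per sample that
    records which samples were poisoned (useful for later evaluation).
    """
    rng = np.random.default_rng(seed)
    poisoned = images.copy()
    candidates = np.flatnonzero(labels == target_class)
    n_poison = int(round(fraction * len(candidates)))
    chosen = rng.choice(candidates, size=n_poison, replace=False)
    for i in chosen:
        poisoned[i] = add_lsb_artifact(images[i], n_bits=n_bits, rng=rng)
    is_poisoned = np.zeros(len(images), dtype=bool)
    is_poisoned[chosen] = True
    return poisoned, is_poisoned
```

Because only the lowest bits change, the artifact is hard to see by eye, yet it correlates perfectly with the chosen class, which is the kind of shortcut a model can pick up.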

Okay, and the same thing for your ImageNet also? When you add the timestamp, you probably did it only for one class?

Yes, for the timestamp, only for one class. But this timestamp was also chosen specifically because there are also ImageNet classes like time clock (right, right), which kind of resemble a similar shape (yeah, yeah), so for time clock it's actually a sensible concept. As a side note, with gradient-based methods you can more precisely control that it should only be corrected for one output class.
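The point about class-specific correction can be illustrated with a small sketch of a gradient penalty in latent space: the gradient of one chosen class logit with respect to the latent activations is projected onto the concept direction, and the squared result is penalized. The toy two-layer head, the function names, and the NumPy implementation with a hand-derived gradient are assumptions for illustration only; in practice an autodiff framework would compute the gradient, and the exact loss used in RR-ClArC may differ in detail.

```python
import numpy as np


def head_logits(a, W1, b1, W2, b2):
    """Toy model head (an assumption): latent activations a -> class logits."""
    hidden = np.maximum(W1 @ a + b1, 0.0)  # ReLU
    return W2 @ hidden + b2


def latent_gradient(a, W1, b1, W2, class_idx):
    """Gradient of one class logit with respect to the latent activations a."""
    active = (W1 @ a + b1) > 0  # derivative of ReLU: 1 where the unit is active
    return W1.T @ (active * W2[class_idx])


def class_specific_penalty(a, cav, W1, b1, W2, class_idx):
    """Squared directional derivative of the chosen logit along the concept direction.

    A value of zero means that moving the activations slightly along the
    concept direction does not change this class logit (to first order).
    """
    unit_cav = cav / np.linalg.norm(cav)
    return float(latent_gradient(a, W1, b1, W2, class_idx) @ unit_cav) ** 2
```

During fine-tuning, such a term would be added to the task loss with a weighting factor. Choosing `class_idx` restricts the correction to one output class, so the concept stays available to other classes, for example a class where a timestamp-like shape is meaningful.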

One output, right, right. Okay, because I think you are thinking of the bias more in a class-specific way. That was my confusion at the beginning, because there can also be a bias related to the dataset, right? Imagine that this noise is present in one dataset collected from one institution, versus another institution which doesn't have that noise. So there can also be a dataset-specific bias rather than a class-specific bias.

That's true, yeah.

Okay, yeah, it's nice.

Thanks.

Do we have any more questions for the presenter? If not, then let's thank our speaker again. Thank you very much for this interesting talk. We'll put this up on our YouTube channel. Hope to see you all next week. We are changing our regular time from Thursdays to Mondays, 1 to 2 p.m. Pacific time, so hopefully see you all next week. Thank you very much.

Thank you very much, Max.

Thank you very much. Have a good day.
