Title: Reveal to Revise: How to Uncover and Correct Biases of Deep Models in Medical Applications
Speaker: Maximilian Dreyer
Abstract:
Deep Neural Networks are prone to learning spurious correlations embedded in the training data, leading to potentially biased predictions. This poses risks when deploying these models for high-stakes decision-making, such as in medical applications. In this talk, we will explore the latest techniques to reveal and revise model biases. To reveal model misbehavior, we will study Explainable AI methods of the next generation that communicate model behavior using human-understandable concepts (locally and globally). To revise biases, techniques based on full retraining, fine-tuning or no additional training (post-hoc) are discussed. Finally, possible ways to evaluate the success of bias unlearning are presented.
Speaker Bio:
Maximilian Dreyer is a PhD student in the Explainable AI group led by Sebastian Lapuschkin and Wojciech Samek of the Fraunhofer Heinrich Hertz Institute in Berlin (Germany).
His research focuses, on the one hand, on developing XAI methods that are human-understandable, insightful and yet require low human effort. On the other hand, Maximilian works on frameworks that allow improving AI models based on XAI insights. Specifically, his research focuses here on revealing and revising model (mis)behavior. Maximilian obtained his B.Sc. in Physics at Humboldt-University of Berlin and M.Sc. in Computational Science at University of Potsdam.
——
The MedAI Group Exchange Sessions are a platform where we can critically examine key topics in AI and medicine, generate fresh ideas and discussion around their intersection and most importantly, learn from each other.
We will be having weekly sessions where invited speakers will give a talk presenting their work followed by an interactive discussion and Q&A.
Our sessions are held every Monday from 1pm-2pm PST.
To get notifications about upcoming sessions, please join our mailing list: https://mailman.stanford.edu/mailman/listinfo/medai_announce
For more details about MedAI, check out our website: https://medai.stanford.edu. You can follow us on Twitter @MedaiStanford
Organized by members of the Rubin Lab (http://rubinlab.stanford.edu) and Machine Intelligence in Medicine and Imaging (MI-2) Lab:
– Nandita Bhaskhar (https://www.stanford.edu/~nanbhas)
– Amara Tariq (https://www.linkedin.com/in/amara-tariq-475815158/)
Hi everyone, welcome to the 105th Stanford MedAI Group Exchange session. This week we have Max Dreyer from University of Berlin here with us to talk about his work on uncovering and correcting biases in deep models. Max is a PhD student in the Explainable AI group, and his research is focused on developing explainable AI methods that are human-understandable, insightful, and yet require very low human effort. So Max, thank you very much for joining us today. Before we start, do you have any preference on how you would like to take questions? Can we interrupt you in the middle, or do we have dedicated breaks in your talk for questions?
Feel free to just interrupt me if you have a question.
Thank you very much. Everyone, let's try to make this session as interactive as possible, and without further ado let me hand it over to Max.
Thank you very much for the introduction, Amara, and thanks also to Nandita for inviting me today to present some of my methods and to talk about how to uncover and correct biases in deep neural networks for medical applications. Let's begin. So first I want to just
say that in this talk I focus mostly on deep neural networks, meaning convolutional neural networks, multi-layer perceptrons and, in principle, Transformer-based architectures. Regarding the data types, I work mostly with image data, but most of the methods I'm going to present are also applicable to time series or text data.
So recently the trend continues and deep neural networks are very popular. They are very successful in playing games, for example beating humans, and also in image and text generation, as we can see with ChatGPT for example. Deep neural networks are also successfully applied to medical tasks like skin cancer detection. However, they are not perfect; they also make mistakes. For example, they can be manipulated, or they can rely on spurious artifacts in the data, and this is especially problematic for safety-critical applications like in medicine.
The origin of such spurious behavior is often the data itself, and here I show you some spurious correlations I encountered, for example, in some open benchmark data sets. Maybe you can already guess what some of these correlations are. There are, for example, skin markings, medical instruments and band-aids; in the ImageNet data set, or also in the real world, cats in some cases correlate with cartons. Then there might also be color shifts, low-resolution training samples, shifts in brightness due, for example, to different medical devices, or blurry backgrounds. Whereas these artifacts here on the left are
localized, the ones here on the right are unlocalized, meaning that they spread over the entire input features, or pixels in this case. Now, when we have biases or spurious correlations in the data, then during training the model is also likely to become biased in some way. This is because these biases or artifacts correlate with one of the output classes or targets, and the model becomes especially biased when these correlations are easier to detect than it is to perform the actual task.
Usually deep neural networks can be seen as a sequence of layers, and in these layers we can identify subunits that I refer to as neurons here. These neurons act as feature extractors, and usually we can identify some neurons that correspond to a bias.
The question my talk addresses today is, first, how can we reveal such model misbehavior, then how can we correct the model, and last, how can we evaluate whether we have successfully corrected the model. So let's begin with the first one: how can we uncover
biases? Here I show you four examples of medical tasks for deep neural networks. The first two are skin cancer detection tasks, the third one is gastrointestinal tract classification, and the fourth one is bone age classification of radiographs. For the first two, for example, the neural network says that it's benign, so there's no melanoma. For the third one it's also harmless, and for the fourth one this hand supposedly corresponds to a human with an age of less than 46 months.
The question that now can arise, and that is also important in practice, is: why is this prediction taking place? What are the relevant features here? The field of explainable AI addresses this question and tries to open up the black box of deep neural networks. The first generation of explainable AI methods are so-called heat maps, also called attribution maps. These give such a heat map for each prediction, and here I show heat maps from the LRP method, layer-wise relevance propagation, which is also from my lab. Popular alternatives are Grad-CAM, DeepLIFT or Integrated Gradients.
What we can see in these heat maps are usually areas, here marked in a dark red color, that are relevant for the prediction, and marked in blue are features that speak against the prediction. In all four of these cases we can clearly see some spurious behavior. For example, in the first one the model focuses quite a bit on the background, in the second one on this band-aid, in the third one it seems to focus on the green patch, and in the last one on this lead marker, which
indicates that this is the left hand. These heat maps are very easy to understand and very simple, and this also makes them very popular in practice. However, heat maps can be ambiguous, because they only tell us where something is relevant but not what exactly the model sees there. For example, it's not clear whether the texture is relevant here, the form of an object, or the color. Also, in order to understand the model behavior on the whole data set, looking at individual heat maps is quite a challenging task, because the bias could be present in only 1% of the samples.
As an alternative to heat maps, which are also called local explanations, there's also global explainable AI, and one popular technique is called feature visualization, which tries to reveal the role of individual neurons. Here, for example, we can take a specific neuron and then collect the most activating image patches over the data set. In this case we can identify one neuron that corresponds to this band-aid artifact, and there's also one corresponding to skin markings. So these feature visualizations allow a global model understanding.
However, there are also problems, because these neurons are not necessarily very human-interpretable. They can be, for example, polysemantic or redundant, which makes the interpretation more difficult. Further, and this is also quite important, we do not gain an understanding for individual samples, so we do not know how these features are
necessarily used in combination.
As a side note, here on the right there's also the idea of concept activation vectors. These vectors don't refer to single neurons; they describe a direction in the latent space, so they can be seen as a kind of superposition of neurons. Here you usually define beforehand a set of concepts that you have in the form of data samples, for example samples with this striped texture, and then you use these samples to find a characteristic direction in the latent space. The advantage is that you know what these concepts should represent. However, you're potentially missing concepts, and when you want to probe for spurious biases, the likelihood is also high that you don't know what you're looking for. So these global explanations, again, give you a global overview.
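The concept activation vector idea described above (collect latent activations for samples that show a concept, then find a characteristic direction in latent space) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: the direction is simply the difference between the two group means, chosen here only for brevity, and the function names and toy activation values are hypothetical.

```python
import math


def concept_direction(concept_acts, other_acts):
    """Unit vector pointing from the mean of `other_acts` to the mean of `concept_acts`.

    Assumption: a difference-of-means direction stands in for whatever
    estimator a real CAV method would use.
    """
    dim = len(concept_acts[0])
    mean_c = [sum(a[i] for a in concept_acts) / len(concept_acts) for i in range(dim)]
    mean_o = [sum(a[i] for a in other_acts) / len(other_acts) for i in range(dim)]
    diff = [c - o for c, o in zip(mean_c, mean_o)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("the two groups have identical mean activations")
    return [d / norm for d in diff]


def concept_score(activation, direction):
    """Projection of one latent activation vector onto the concept direction."""
    return sum(a * d for a, d in zip(activation, direction))


# Hypothetical latent activations with three neurons per sample.
striped = [[2.0, 0.1, 1.0], [2.2, -0.1, 1.0], [1.8, 0.0, 1.0]]  # concept present
plain = [[0.0, 0.1, 1.0], [0.2, -0.1, 1.0], [-0.2, 0.0, 1.0]]   # concept absent

cav = concept_direction(striped, plain)
print(cav)
print(concept_score([3.0, 5.0, 7.0], cav))
```

In this toy setup the two groups differ only along the first latent dimension, so the direction aligns with that axis and the score reads off how strongly a sample expresses the concept.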
But you do not know exactly what happens for individual samples, or how these features are used in combination. Can you combine this global view and the local view? Yes, you can, and this is also what I've recently worked on.
Here on the left you can see the local explanation side, where we compute the heat map, and on the right you can see the feature visualization. Such a heat map is computed as follows: we start at the output and then propagate relevances through the layers until we reach the input, and then in the input we receive this heat map. In the backward pass, neurons that contributed strongly to the output in the forward pass usually also receive a lot of relevance. What is important here is that when we do this propagation once, we have relevances for free not only in the input but also in the latent space. So we also get, for free, the relevance of these latent concepts, of these latent neurons, and we know for one sample which neuron is relevant.
And we can do even more: we can restrict this backward pass and compute a heat map specifically for one neuron by only propagating the relevance flow through this neuron and stopping the relevance flow through all other neurons. This, summarized, is given by our method called concept relevance propagation (CRP), which combines the where and
the what question. With CRP we get for each concept, so in this case each neuron, a localization, a heat map that allows us to see where this concept is present. We also get global relevance scores for each latent concept; for example, here we see that the snout and the fur concepts are most relevant. And we also have these feature visualizations.
What does this mean in medical cases? Here I give you examples where I only show the two most relevant neurons. Here, for example, we see two that point to the outside and to this red color. For the second one, we have one concept that focuses on the mole but also one that focuses on the band-aid. For the third one we also have one that focuses on a seemingly good concept, so this one in the middle, but also one that clearly points to this green patch, and then for the last one, of course, this lead marker.
All right, can I quickly ask a question here?
Yes.
So you are mentioning neuron
numbers, like neuron 220 or neuron 332, but then these neurons are corresponding to image patches?
Yeah, let me go back. Here a neuron is a convolutional channel, for example, and usually in deep networks we have hundreds of neurons, so I show you the neurons that are most relevant, and these are two. You can see here I showed you the feature visualization for three neurons, and in this case we have five visualizations; here I only show you the first one, so you can think of having a lot of other samples that look like this one. I just cropped it to make it a little bit simpler, but if you looked at the feature visualization for neuron 220, you would see red color with hair.
Okay, thank you. And we have one question in the chat: these examples are for incorrect predictions, is that right? All of the examples that you're showing, are these for incorrect predictions?
No, these are correct predictions.
Okay.
So the red color, for example, or the band-aid correlates with no melanoma, but this would of course be problematic if you really had a malignant patch and a band-aid, for example; then it would classify it as harmless when it is actually harmful. But in this case it's no problem, so it's correct, but the behavior is kind of suspicious.
In some cases, for example in the last example, the model is looking at that letter L instead of looking at the actual bones, but it's still making the right prediction?
Yes.
Okay, thank you.
So yeah, with this next generation
of explainable AI methods that operate in the concept space, you get much more information. However, you still have to look at individual samples.
So exactly, this was what I was about to ask: even if it gives you a more intuitive explanation, you still have to look at each individual sample to actually derive some semantic explanation, right?
Exactly.
So for example, for the first one, when you say no melanoma and neuron 220, you said that it is because of the red color, but it can be the edge, it can be the hair. I don't know how you interpret that it is because of the red color.
Yes, exactly. In this case I looked not only at this one feature visualization but at more, and in this case there were only red color patches with hair. So the problem of looking at individual samples remains, and these concepts give you much more information, but they also increase the complexity.
But we can make it easier. The idea now is to summarize similar explanations: there are a lot of samples in the data set that are similar and for which you receive a similar explanation, so the idea is to summarize them into prototypes. This is also a recent work; we call it prototypical concept-based explanations (PCX). Again, the idea is to summarize similar explanations via prototypes.
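Summarizing similar explanations into prototypes can be sketched as clustering per-sample concept relevance vectors. The following is a minimal sketch, assuming plain k-means with a deterministic first-k initialization, a simplification for illustration only; later in the talk the speaker names k-means as the most basic option and says he prefers Gaussian mixture models. The function name and the toy relevance values are hypothetical.

```python
def kmeans_prototypes(vectors, k, n_iter=20):
    """Plain k-means over relevance vectors; returns (centroids, labels).

    Assumption: centroids are initialised with the first k vectors so the
    result is deterministic. A real pipeline would use a proper initialisation.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each explanation goes to its nearest centroid.
        for i, v in enumerate(vectors):
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical per-sample relevance vectors:
# [relevance of a "lesion" concept, relevance of a "green patch" concept]
relevances = [
    [0.90, 0.10], [0.10, 0.90],
    [0.80, 0.20], [0.20, 0.80],
    [0.85, 0.15], [0.15, 0.85],
]

prototypes, labels = kmeans_prototypes(relevances, k=2)
print(prototypes)
print(labels)
```

Each centroid is then read as a prototype: a typical mix of concept relevances shared by the predictions assigned to it.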
Here I show you a UMAP embedding of all concept-based explanations on the training set, and we can see that we get three distinct clusters. Again, to remind you, each point represents one prediction and the explanation for this prediction.
Let's begin with prototype 1. Here I show you on the horizontal axis representative samples that are in the center of this cluster, and now we can study the relevant concepts for this cluster or prototype. So we can go to the concept level and understand in detail what's going on in this cluster. Here we can see that the most characteristic concepts, which in this case are neurons but could also come from different concept bases, correspond to this green patch.
Alternatively, we can also look here at prototype 3, for example, and here we can see, in the representative samples but also in the most relevant concepts in this cluster, that it is focusing on this instrument. For prototype 2 it looks a bit better; these concepts seem to make a bit more sense. The second one here, for example, could also correspond to these reflections. At this point, I'm not a medical expert, so for me it's easier to detect when something obviously spurious is happening, but here, for example, it would of course be necessary to also have a medical expert look at these concepts.
Also, as a side note, here we understand these prototypes as a combination or composition of concepts and their relevances, and not as parts of instances, which is what is used for ProtoPNet.
Maybe some of you have heard about ProtoPNet. So, to briefly describe the whole pipeline: you take samples of one class and feed them into your deep neural network, and you compute explanations on the concept level, so you get for each concept that you defined a relevance score for each sample, which gives you a vector, and these vectors you can cluster to find your prototypes.
Max, we have a question in our chat: how are the prototypes defined?
I'm not sure if the question is still valid or already answered, but I can say a bit more on how you can compute prototypes. Basically you can use any clustering method. The most basic approach would be to use k-means. In this case I like to use Gaussian mixture models, which are a bit more complex but also have nice properties, and then you would see the mean of one of the Gaussians in the mixture as a prototype.
But the important part is that looking at this kind of embedding and finding these clusters decreases the workload significantly, and it not only allows you to find spurious behavior but also to gain an understanding of what the model has learned on the whole data set. So it would also be nice to use for discovering new knowledge, for example. One parameter that is still free in this approach is the number of prototypes, so I would say one should always also look at the embedding besides the prototypes, so you know how well you describe your explanation space, and it's better to have too many prototypes, which might be redundant, than too few.
Okay, so next I want to just briefly cover the topic of studying inliers versus outliers.
Max, can I ask a quick question?
Yes.
Can you go to the previous
slide once? So these concepts that you are defining, these are predefined concepts, right?
Yes, and usually I define each neuron as a concept.
Okay, so it's not really that these concepts are mapped to any particular semantic property of the image data, right? They are just kind of random concepts?
Yeah, when you study individual neurons, you study the concepts that the model has learned intrinsically.
Right, right. I was just curious whether it is somehow related to the semantics of the data. No, right? It's just a neuron that you are defining as a concept.
Yes, exactly. In principle there are now also methods that summarize multiple neurons into more interpretable concepts. You can in principle also use these approaches, but in this case I show you individual neurons because it's the simplest approach.
Okay, because I was thinking that if these individual concepts are interpretable, then you don't need to follow the next step of clustering and identifying the prototype individually, right? So I don't know why you didn't try interpretable neurons instead of just taking the neurons and clustering.
You mean, if you have for example 500 neurons, just looking at these 500 neurons?
The neurons and the corresponding semantic mapping to the semantic properties of the data. So imagine you are going with dog versus cat: just going with semantic properties, and mapping these concepts to a semantic property, so that it doesn't need any further exploration of the space.
Can you elaborate a little bit more?
Yeah, because there are some techniques now available with which we can define each neuron as a single interpretable neuron, and by combining those neurons you can actually map them to the semantic embedding space, right?
What do you mean by semantic embedding space?
So imagine your bone density data set, right, where you have the kids' ages. They have a lot of semantic properties: the gender of the child, the body part, the actual X-ray protocol, and all these things. Those are properties of the data set that you can map to individual images.
Okay.
And a combination of those can be represented as your prototype, right? Some of them have a lateral marker, some of them don't; some of them have a male patient versus a female patient, something like that.
Okay. Then, as far as I understood, you still need this label information?
Yes, definitely. But in that case your label is actually related to your input data rather than exploring the space, right?
Okay.
Because the main problem that I see currently with the architecture is that you still need to explore the space. I understand that you don't need to do it individually, you can do it on a cluster basis, but this kind of cluster exploration is sometimes not feasible for computer science developers, right? Because they don't really know how to map this prototype to a clinical concept.
Yes, that's true. So if you don't have a medical expert, I think it's always good if you have some more labels in your data to also map it to the latent semantics, as you said. But in this case I assume we don't have any labels besides the output classes.
Yeah, okay, makes sense, thank you.
Okay, so there exist, for example, methods that try to cluster multiple explanations, for example
PCX, but there also exist methods that cluster concepts or neurons in the latent space. Usually these methods focus on outliers, and in these outliers you often find spurious behavior. For example, here for DORA you find a neuron that corresponds to Chinese watermarks, and in the PCX clusters, for example, you can see here an outlier cluster with cats in cartons, and you can also see that cat features are relevant in this cluster. However, for this carton class there's also this Chinese watermark concept present in the overall prototype, so in the normal behavior of the model this watermark is used. So whereas outliers tend to be spurious, we should always also check the inliers.
Here I give you again an overview of uncovering spurious behavior using explanation-based methods. The first approach is to look at individual samples, for example at heat maps or at concept-based explanations, for example CRP or also CRAFT by Thomas Fel. This approach is very thorough; however, it's quite expensive or infeasible for large data sets. Alternatively, you can use methods that summarize these local explanations, for example on the heat map level with SpRAy or on the concept-based level with PCX, which results in a much smaller workload and still captures the full model behavior.
Alternatively, you can also study the latent representations, and here you can have predefined concepts; if you have more labels, you can also incorporate these into your concepts to identify good behavior more easily. Looking at latent representations also often requires a smaller workload because you have maybe a couple of hundred concepts, or hopefully fewer, versus thousands of input samples. However, it is to be noted that looking at neurons is also not output-specific, so you often don't know how exactly these features are used; you just know that the model has learned
them.
Okay, so hopefully, using some of these methods, you've found (I don't know if I can call it hopefully), but maybe you found some spurious correlations and some bias in the model, and you want to correct it. Let's come to the next step: how can we unlearn biases?
Here I roughly group these methods into three groups: first, methods requiring a full retraining, then methods that are based on fine-tuning, and post-hoc model correction. Whereas post-hoc model correction usually requires modifying the model itself, fine-tuning regularizes the model in some way, and for retraining, often the data is modified. Retraining is of course computationally the most costly; however, on the other hand, it's also a little bit more flexible, because post-hoc model correction is usually model-specific, and once you've cleaned your data, for example, you can take any new model architecture and just train it on the data.
Let's begin with the first method, which is based on full retraining, from Discover and Cure. Here the idea is to add the bias to the data uniformly. In this example we need some samples of the bias, in this case a blanket and a bed, because in the data set cats correlate with beds or blankets. Then the idea is to add the bias to other classes, and here they just overlay these samples with the bias over samples of other classes via so-called mixup, which maybe looks a little bit weird but works remarkably well. However, for this method you need these bias samples, and here they propose to generate them via Stable Diffusion. But if you have some very special, for example, instruments that correlate with
your data, for example medical instruments, it's maybe not possible to generate such samples with Stable Diffusion.
Another method based on full retraining is called FastDiME, and here the idea is not to add the bias to every sample but to remove the bias entirely from the data. They propose to use diffusion-based models, so you first have to train a diffusion model on your input data, and secondly you also have to train a bias classifier that also operates on the input image, and you further need in this case also localization masks for the artifact. Then the idea is to take a biased sample and reconstruct this image via the diffusion model, generating a new sample, guided by this mask and by optimizing to change the prediction of the bias detector to false. This way you generate a counterfactual that should mimic the original image but doesn't have the bias anymore. As you can see, this method is quite complex, but when it works it's quite awesome.
Okay, so these were methods for full retraining; now I will present some methods for fine-tuning. One popular method is called Right for the Right Reasons. Here the idea is to penalize the use of bias features in the input, so we add an extra regularization term that basically tells the model not to use a specific part of the input data. For this you need to compute an input gradient, and you also need a localization of your artifact in the input, for example here of the band-aid. Then with fine-tuning, where at first (here "vanilla" is the uncorrected model, and you can see the heat map here) the model uses this information according to the explanation, after fine-tuning it focuses much more strongly on the mole. Right for the Right Reasons is quite intuitive; however, it's only really applicable to localized biases, and it
also requires a localization.
So the three methods I just presented operate in the input space, and now I will also present methods that operate in the latent space, or concept space. One framework that is quite popular, or quite big, is called class artifact compensation (ClArC). There are several methods, and the first one I present is called Augmentive ClArC. Here the idea is similar to the one of DISC, to add the bias uniformly, but now not in input space but in latent space.
The intuition I show you here: this is a UMAP embedding of latent activations for two sets of samples, one set of samples without the spurious correlation and one with the bias. In this case it's this collar artifact that correlates with a man having dark hair color. The idea now is that we transform samples in latent space from this clean cluster towards this artifact cluster, so we add the information of the bias. For this we need a concept activation vector, so a direction in latent space that allows us to transform from the one to the other, to just add the information of the bias.
As a side note, how can we compute these CAVs? We need these two sets of samples. Ideally we have one set of samples with the bias and one that only differs in the bias and not in any other attribute. This is often not possible, but this is the idea. Then, when we collect the latent activations of some intermediate layer, we compute the CAV, and the usual approach is to use classifiers, for example SVMs or logistic regression. However, we found out that these classifier-based CAVs are not optimal; they perform quite badly, because these classifiers try to find a good separation between these clusters, but their goal is not to model the concept well, that is, to describe this transformation from one cluster to the
other. So often classifier-based CAVs diverge. As for why exactly, feel free to also check out a recent paper from Frederik from our group, who tells you a little bit more about this problem. Here on the right I can also show you some quantitative experiments where we used classifier-based CAVs also for model correction. Here we have controlled settings where we know the true direction, and we compute the cosine similarity between the CAVs and the true direction, and we can see that they don't have a very high cosine similarity, so they don't model the concept fully, which, when we want to correct the model, only partially removes the bias. And in this paper, Frederik Pahde also proposes a different approach for signal CAVs.
Okay, now to another method based on fine-tuning. This one is called Right Reason ClArC, and here the idea is similar to Right for the Right Reasons, but we don't penalize the bias in the input space but now again in the latent space. For this, the idea is that we add a regularization term during fine-tuning such that when we add or remove the concept slightly in the activations, it shouldn't change the model output. This we can achieve by regularizing the latent gradient, namely its dot product with this CAV, which, for those of you who know the TCAV (Testing with Concept Activation Vectors) paper, is quite similar to the TCAV score. This paper we also present next week at the AAAI conference, so if you are by chance also there, feel
Free to reach out okay now uh the last method that is not based on training it’s just post talk applicable it’s uh this one is called projective Clark and here the idea is to remove the bias in latent space and um again this time you don’t add the bias information in later space
but you project it out. There are also methods available that have a similar idea; they are called SpuFix and Editing Classifiers. Here Editing Classifiers kind of incorporates this operation into the weights of the model.

Okay, at this point I just want to introduce another viewpoint on this: you can also see it as input space correction versus concept space correction. In input space we are more model-independent, and it's much more interpretable, so we better see what's going on, or how well we've erased or added the bias. However, it's also much more difficult to do, because we need, for example, diffusion models. Alternatively, there are also concept-based model correction approaches that are usually more lightweight and also more universal: we can apply them to localized artifacts but also to unlocalized artifacts. However, they are a lot less interpretable, because you often need to find this direction in latent space, and it's really difficult to say whether you've modeled the bias correctly, or whether in this direction there's also good information present, or other correlations. So you only
know how well you found this direction afterwards, when you evaluate.

At this point I want to show you a small experiment we did here for four datasets. For the ISIC dataset, we added a texture over the input samples; for bone age estimation, we added a brightness increase; for ImageNet, we added a timestamp artifact; and for CelebA we used a collar bias. Now the task was to correct these biases in the model, and we tried out different methods based on fine-tuning and post-hoc model correction. Here I show you some heatmaps, and maybe we can begin in the second row, with the uncorrected model. We can see that it focuses on the background for the unlocalized artifacts, and on the artifacts themselves for the localized ones, so the timestamp and the collar. Then we can continue to the correction methods. Here we've observed that the methods operating on the concept level on activations tend to only partially unlearn the biases, because we do not directly enforce it on the model. Methods based on the gradient perform better. Right for the Right Reasons is very good for the localized artifacts; however, when we cannot describe the artifact via an input mask anymore, for the unlocalized ones, some weird behavior develops in the model. Here, gradient-based correction on the
concept level with Right Reason ClArC performed the most reliably. However, here we just look at heatmaps. Now the question is how we can evaluate it also quantitatively.

One possible way is to evaluate performance. Here the question is how adding the bias, for example, affects the model. If we have the bias in all classes, we can look at subgroups of the data. When, for example, the bias correlates with class A, then usually in the subset of class A with the bias the performance is higher, whereas when we have another class and the bias is present, the performance usually goes down. The task would then be to measure this difference between the set with bias and the set without, and ideally this would be as small as possible, so the bias doesn't have an influence. This evaluation is quite easy in principle. However, there could also be other influencing factors in each subgroup, especially when we do not have perfect pairs of samples, so one clean and one poisoned one, and this setting is also not always given.

Alternatively, there are also approaches that artificially add this bias. For example, for localized biases it's sometimes possible to crop the bias out of a biased sample and then add it as an overlay to a clean sample, and then we can measure the change in the output.
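The overlay-based evaluation just described can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the experiments: images are nested lists, and `overlay_artifact`, `mean_output_change`, `biased_model` and `corrected_model` are hypothetical names invented for this example.

```python
from typing import Callable, List

Image = List[List[float]]


def overlay_artifact(clean: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of `clean` with `patch` pasted at position (top, left)."""
    out = [row[:] for row in clean]
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            out[top + i][left + j] = value
    return out


def mean_output_change(model: Callable[[Image], float], clean_images: List[Image],
                       patch: Image, top: int, left: int) -> float:
    """Average absolute change of the model output when the artifact is added."""
    changes = [
        abs(model(overlay_artifact(img, patch, top, left)) - model(img))
        for img in clean_images
    ]
    return sum(changes) / len(changes)


# Toy stand-ins for real networks (assumptions for illustration only).
def biased_model(img: Image) -> float:
    # Half of the score comes from the top-left pixel, where the artifact sits.
    return 0.5 * img[0][0] + 0.5 * img[1][1]


def corrected_model(img: Image) -> float:
    # Ignores the artifact location entirely.
    return img[1][1]


clean_images = [
    [[0.0, 0.2], [0.3, 0.4]],
    [[0.2, 0.1], [0.5, 0.6]],
]
patch = [[1.0]]  # a 1x1 "cropped" artifact

change_biased = mean_output_change(biased_model, clean_images, patch, 0, 0)
change_corrected = mean_output_change(corrected_model, clean_images, patch, 0, 0)
print(change_biased, change_corrected)
```

A smaller value means the artifact has less influence on the prediction; in this toy case the second model does not react to the overlay at all.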
Alternatively, one could also leverage diffusion-based methods; for example, you have FastDiME. However, here you need this diffusion model and the bias detector. These methods are more direct, which is great, so you directly measure this influence. However, they require kind of this bias model, where you can just add the bias to a clean sample.

As a third alternative, what one can also do is take biased samples and compute an explanation, and the idea is to measure the relevance on the bias. This is also easier for localized biases, because you can just measure the spatial relevance in the heatmap, and the less relevance, the better. Here the advantage is also that you do not have to change the input, so no input transformation is needed. However, you require localized biases in this case.

Okay, so if we only had one big toolbox covering all these steps, this would be quite nice. If we also had a lot of methods, we could just try out multiple methods, and if we have nice
evaluation schemes, we can then just choose the best-performing one. In order to go in this direction, we also published a paper last year about a method called Reveal to Revise, which aims to incorporate this whole explainable AI life cycle.

Here's one quick overview. Reveal to Revise is supposed to be a highly automated framework for bias correction. We included two approaches for revealing biases: one that summarizes these local explanations, and one that looks at the latent representations. Then we also have, in this case, three model correction approaches: Right for the Right Reasons, ClArC and CDEP. And we also have evaluation schemes. However, as you remember, Right for the Right Reasons, for example, needs input localizations, and in this framework we also propose a way to generate these localizations of the artifacts automatically: by finding the direction of this artifact in the latent space, we can generate heatmaps specifically for this CAV. Then we can use these localizations for model correction, but also for evaluation.

Okay, let's come to the conclusion. So we have seen now that
there are quite powerful explainable AI tools to discover biases: for example, summarizing methods such as prototypical explanations like PCX, or methods that allow us to study the latent representations, for example D disk or CRP. Here we should check not only [...] but also outliers. There are also already various correction methods available. However, one part, the evaluation, I think is still really difficult, and there should be more focus on it, because in research papers you sometimes only see controlled settings, or settings where we have localized biases. I think it's time to also make better use of generative models. This is not very easy, but I hope we can see more methods soon. Another thing that's not really talked about in research papers is whether the evaluation, or the method, or the model is then good enough to be regulatory compliant. Here I also heard lately that there's a field called admissible machine learning; if you're interested in that, you should check it out. And I think for the future it will be very
crucial that we have some frameworks or toolboxes that combine all these steps and various methods, so you can easily understand the model and correct it. There are already first candidates, for example Reveal to Revise, or there's also a unified explanatory interactive learning typology by Friedrich. I think these frameworks or toolboxes then also allow for a large-scale benchmark of different methods, because that's also something I haven't seen yet: some comparison of not only fine-tuning and post-hoc model correction, but also methods requiring full retraining, so the whole spectrum.

If you have questions or want to collaborate, for example to apply our methods to your data domain, to not only find misbehavior but maybe also discover new knowledge, feel free to reach out. Thank you very much for your attention.

Thank you very much, Max. This
was a very interesting talk, so let's give our speaker a virtual round of applause first. We are slightly over time, but if our audience has any questions, I think we can accommodate a few. So Ramon, you have a question; do you want to go ahead and ask?

Yes. So in the example where you're showing multiple heatmaps for the attribution methods, you had this example where you added some texture noise onto the image. Was that a localized noise or a global noise? And how would your method be able to distinguish that as being a noisy concept versus an actual useful concept? Because sometimes for classification there might be useful changes in texture, let's say.

Yes, exactly. So in this case it's called, as far as I remember, a least significant bit attack, and this is an artificial artifact that we added and that we know shouldn't carry any good information. So we then specifically looked for this artifact, and we knew this is a bias. I hope this addresses your question. And this least significant bit attack is spread over
the whole input, so it kind of creates this texture over the whole input.

So you actually did it only for one single class to create the bias, or you did it for the whole dataset?

We only did it for one class.

One class, okay. And the same thing for your ImageNet: when you add the timestamp, probably you did it only for one class?

Yes, for the timestamp, only for one class. But this timestamp is also chosen specifically because there are also ImageNet classes like time clock, which kind of resembles a similar shape.

Right, right.

Yeah, so for time clock it's actually a sensible concept. And as a side note, with gradient-based methods you can more precisely control that it should only be corrected for one output class.
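The point about correcting only one output class can be made concrete with a toy example. The sketch below assumes a linear classification head, for which the gradient of logit k with respect to the latent activations is simply row k of the weight matrix; it penalizes the squared dot product of that gradient with a concept direction, in the spirit of the latent-gradient regularization described in the talk. All names (`concept_penalty`, `correct_one_class`, the weights and the direction `cav`) are invented for illustration, and a real fine-tuning run would combine the penalty with the usual task loss.

```python
from typing import List

Matrix = List[List[float]]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def concept_penalty(weights: Matrix, cav: List[float], target_class: int) -> float:
    """Squared sensitivity of one class logit to a small step along the CAV.

    For a linear head (logits = W a), d logit_k / d a = W[k].
    """
    return dot(weights[target_class], cav) ** 2


def correct_one_class(weights: Matrix, cav: List[float], target_class: int,
                      lr: float = 0.1, steps: int = 200) -> Matrix:
    """Gradient descent on the penalty; only the target class row is updated."""
    w = [row[:] for row in weights]
    for _ in range(steps):
        sensitivity = dot(w[target_class], cav)
        # d penalty / d W[target_class] = 2 * sensitivity * cav
        w[target_class] = [
            wi - lr * 2.0 * sensitivity * hi
            for wi, hi in zip(w[target_class], cav)
        ]
    return w


weights = [[0.8, 0.3],   # class 0 relies on the biased direction
           [0.5, -0.2]]  # class 1 may legitimately use it
cav = [1.0, 0.0]         # assumed unit-norm concept direction

corrected = correct_one_class(weights, cav, target_class=0)
print(concept_penalty(weights, cav, 0), concept_penalty(corrected, cav, 0))
print(corrected[1])
```

The row for class 1 is left untouched, so that class can keep using the direction; the timestamp and time clock case is the kind of situation where this matters.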
One output, right. Okay, because I think you are thinking of the bias more in a class-specific way. That was my confusion at the beginning, because there can also be a bias related to the dataset, right? Imagine that this noise is present in a dataset collected from one institution versus another institution which doesn't have that noise. So there can also be a dataset-specific bias rather than a class-specific bias.

That's true, yeah.

Okay, yeah, it's nice. Thanks.

Do we have any more questions for the presenter? If not, then let's thank our speaker again. Thank you very much for this interesting talk, and we'll put this up on our YouTube channel. Hope to see you all next week. We are changing our regular time from Thursdays to Mondays, 1 to 2 p.m. Pacific time, so hopefully see you all next week. Thank you very much.

Thank you very much, Max.

Thank you very much. Have a good day.