Are you struggling to understand user complaints about application performance? When developers and IT pros look at system performance, they tend to focus on server-side metrics to help them optimize their applications and infrastructure. However, end users don't care about your CPU metrics or memory utilization.
Hi, everyone, and welcome to the latest installment of our ongoing editorial webinar series, Coffee Talk. Each hour-long, information-packed episode is organized by the hardworking folks at Redmond magazine and features the observations and insights of independent experts on a wide range of tech industry topics. Many thanks to the underwriting sponsor of this episode, ControlUp, which provides a digital employee experience management platform designed to equip IT teams with the tools they need to quickly resolve issues, proactively prevent tickets, and minimize costs. Without their support this series would not be possible. And thanks to you for joining us. I'm John K. Waters, editor in chief of the Converge360 Group of 1105 Media, and I'll be your moderator. Today's topic is end-user experience monitoring 101, and our lead presenter is Joey D'Antoni, principal consultant at Denny Cherry & Associates Consulting.
But before we get started, just a bit of housekeeping. This episode is being recorded for later access, so keep an eye out for an email with a link to that recording; it'll be coming your way in the next few days. We'll make some time during the talk for questions, so please feel free to type your questions into the Q&A box at any time. Our sponsors have provided some extra resources you definitely won't want to miss, and they're available now on your console. And as a small thank-you, we will be sending the first 200 attendees who stick with us to the end a $5 Starbucks gift certificate. It's a cup of joe to go with the info.
Now let's meet our first speaker. With more than two decades of experience at Fortune 500 companies, Joey D'Antoni has become a well-known thought leader in cross-platform IT. He's both a VMware vExpert and a Microsoft Data Platform MVP. He has worked extensively on database platforms and cloud technologies, and he has a high level of expertise in performance tuning, infrastructure, and disaster recovery. You guys are in for a great session. Take it away, Joey.
Thanks, John, and hi, everyone. I'm Joey, and like John mentioned, I'm a consultant at Denny Cherry & Associates Consulting. I do a lot of work with a wide variety of customers, doing all sorts of performance tuning tasks at every level. But I wanted to start this presentation out with a story, because I think it's always interesting to tell stories and learn from our shared experiences. I've been working with DCAC since, I think, 2014; I can't remember exactly. The last real job I had, as in not being a full-time consultant, was principal architect at Comcast. That was my title, and I was responsible for database systems and big data. What that meant was I didn't have to be on call, which was really nice, and I could tell the DBAs when they were wrong and what to do, which was my favorite part of the job. No, just kidding.
But it meant I was on a cross-functional team where we had the experts from each area of infrastructure. When we had big problems, we would get brought in to firefight, in addition to the planning work we did. Well, comcast.com at the time had some performance problems, and we looked at everything and evaluated the hardware. This was a long time ago, around 2012 or 2013, and Comcast was actually a little slow to adopt virtualization, so most of the hardware in question was physical, running on bare metal. Nothing was in the cloud at the time.
We evaluated a lot of things: databases, servers, and storage, and there were some database performance issues. One of the themes you'll notice throughout this presentation is that, a lot of times, we as IT professionals only think about the back end of systems. We have all these tools and techniques for analyzing database performance, server performance, and storage performance, and we can even use tools like Grafana and Kibana to combine all of those into a single dashboard. But one of the things we don't always look at is the end-user experience: what is that end user feeling, and what kind of timings is the end user seeing? So we did deep-dive performance analysis of the infrastructure layers, like any good server admin and DBA would. However, what no one mentioned until pretty late in the process was the website itself. When you went to comcast.com, it loaded 25 megabytes. I know that doesn't sound like a lot, especially in a world where we can allocate petabytes to a VM in the cloud, but 25 MB is still a big chunk of data to load for a single web page, especially your front page, when you're loading it directly from the web tier. At the time, we had no content delivery network in place, and I'm going to talk more about CDNs in a bit. What that means is that if you went to comcast.com, you were hitting our DNS, going through a load balancer, and then pulling 25 megabytes of data from our web server in our data center in the Philadelphia area to your browser. I think we might have had geo load balancing, so if you were on the West Coast you might have been going to Denver, but either way it was all coming straight from the website, with no content delivery network in place.
The way a CDN works is this. I'm in New York right now, speaking at a conference. We might have a CDN location in New York, because the population here is large enough, and images and large files loaded on the website would be cached there, so the second time someone from New York hit that endpoint, the site would load a lot faster. There was no caching anywhere in this layer, so we had all sorts of things wrong with our site. What happened is that, as system admins, we were chasing milliseconds and microseconds when we should have been chasing seconds. Where you lose that, a lot of times, as an IT professional is when you focus too much on the back-end systems and not enough on what your user is experiencing, and that's what we're going to talk about today. It's important to look at the back-end systems, but it's also important to understand what your users are seeing and how they're experiencing that.
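To put that 25 MB front page in perspective, here is a rough back-of-the-envelope sketch (not something shown in the talk). It converts page weight into raw transfer time at a few connection speeds that are purely hypothetical, and it deliberately ignores latency, TCP slow start, caching, and parallel downloads, so treat the result as an optimistic floor rather than a measured load time.

```python
# Back-of-the-envelope transfer time for one page load.
# Assumption: the whole link is available and there is no latency,
# TCP slow start, or caching, so real load times would be longer.

PAGE_MB = 25  # front-page weight from the comcast.com story, in megabytes


def transfer_seconds(page_mb: float, link_mbps: float) -> float:
    """Seconds to move page_mb megabytes over a link of link_mbps megabits/s."""
    megabits = page_mb * 8  # 1 byte = 8 bits
    return megabits / link_mbps


# Hypothetical connection speeds, chosen only for illustration.
for mbps in (5, 25, 100):
    print(f"{mbps:>3} Mbps -> {transfer_seconds(PAGE_MB, mbps):4.1f} s")
```

Even the fastest of these made-up links spends whole seconds on the page, which is the gap between tuning milliseconds on the servers and the seconds the user actually waits.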
One of the trends that I've, pardon the pun, observed in recent years is that what we used to call monitoring, we now tend to call observability. I don't really know when that changed; it seemed to be around when Kubernetes got popular.
But it's conceptually the same thing. I'm sure there's a full definition that would explain the differences, but to really understand your application's end-to-end performance behavior, you need to capture metrics at each layer, from the user down to the storage layer. This is something I always heard from performance tuning experts, and as I was learning to get better at performance tuning, it was something I always wanted to understand. When I was a young DBA, I thought: I can only see things in the database, so how do I know what this data is doing when it gets to the app tier, and how is it being served to a user? And that was when apps were really simple: we had a big fat client app and a database. Now apps are far more complex, so we have much more complex application topologies, and we'll talk through the topology of a modern application and how those tend to work. There tend to be a lot more layers. For one thing, a lot of systems now don't even have a single database behind them; they have multiple data sources, depending on the use case for that specific data in the app. Having that end-to-end picture can help you isolate where any performance bottlenecks are, and beyond that, you need a way to evaluate this data holistically.
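As a minimal sketch of what evaluating the data holistically can look like, assume you already have a timing for each layer of a single request (how you collect those is a separate problem). The class and function names below are hypothetical and every number is invented; the point is that once all layers are on one scale, a 12 ms database stops looking like the problem next to a slow last mile.

```python
from dataclasses import dataclass


@dataclass
class LayerTiming:
    layer: str  # name of the tier, from user device down to storage
    ms: float   # time attributed to that tier for one request


def bottleneck(timings: list[LayerTiming]) -> tuple[str, float, float]:
    """Return (slowest layer, its time in ms, its share of the total).

    Assumes the layers run one after another, so their times add up;
    real requests often overlap work, which this sketch ignores.
    """
    total = sum(t.ms for t in timings)
    worst = max(timings, key=lambda t: t.ms)
    return worst.layer, worst.ms, worst.ms / total


# Hypothetical timings for one page request; every number is made up.
request = [
    LayerTiming("device render", 180.0),
    LayerTiming("user ISP / last mile", 250.0),
    LayerTiming("CDN edge", 15.0),
    LayerTiming("app tier", 40.0),
    LayerTiming("database", 12.0),
    LayerTiming("storage", 3.0),
]

layer, ms, share = bottleneck(request)
print(f"Slowest layer: {layer} ({ms:.0f} ms, {share:.0%} of end-to-end time)")
```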
So you need to be able to say, okay, it's taking the end user, say, 23 milliseconds to get to the endpoint. For example, last week I was watching a bike race in Australia, and the updates being posted to Twitter were way ahead of the video on my screen, because I was using a VPN to Australia, and the latency from the East Coast to Australia is roughly 330 milliseconds. Not a great latency scenario, but I could tell you exactly where my bottleneck was: I had 330 milliseconds of latency, and text was faster. Being able to evaluate that data holistically is really important. It's the network, it's multiple layers, it's interchanges between systems. These are all challenges we run into a lot working with different customers, and many architects and administrators struggle with this concept of looking at the whole picture to see where things are. I know I'm guilty of this: I'm always quick to ask what's going on in the database, and it is always going to be the first thing I look at, because I know it's at the center of a lot of performance problems. But you have to look at the whole picture, especially if some users are having good experiences and other users are having bad ones. Is it down to a desktop problem, or is it down to a network issue? There's a lot of complexity there, and we'll talk through that in the next couple of slides.
Monitoring end-user experience. This ties back to the fact that IT wants to default to monitoring back-end performance, and I know this because it's what we know. I actually dreamed about this presentation last night, which I don't normally do, and I dreamed up this crazy Azure remote desktop demo, which I don't know how I was going to pull off, because I only have 30 minutes. But anyway, let's look inside of Windows at Performance Monitor. This is just Windows; Linux has its own ecosystem and a similarly large number of performance monitoring tools.
We have thousands upon thousands of counters that we can enable in Windows. In fact, if we want to generate a whole bunch of data in a hurry for demo purposes, one way is to turn on as many counters as we can in Performance Monitor. In IT, it's the nature of things that we're always thinking about the server side, because the servers are kind of our babies. Those are the systems we know, those are the ecosystems we know really well, and to be absolutely honest with you, those are the easiest things to monitor. I can go in tomorrow, or right now, light up 45 PerfMon counters, and tell you with 95% confidence what's going on in your SQL Server. I can do something similar with Oracle on Linux. Those are things that are pretty easy for IT to do.
Most observability tools are also built around server-side monitoring. If you're not building your own tools or using custom PerfMon scripts, you're buying a tool from a third-party vendor, and a lot of those tools are built around server-side monitoring. They want to be able to say what's going on at the application tier and what's going on at the database tier. Some tools, like Azure Application Insights or New Relic, do have some monitoring of the user's round trip through the web tier. Those tools are really good from a developer perspective, but they aren't necessarily built around user experience, and that's something you really want to understand better. What kind of lag time are your users having? What's the round trip? How quickly is the data getting returned, not just from the data store to the app, but to your user and their device?
Let's talk about why that's complicated. This is really kind of simple: I tried to get it all on one slide, I pretended we were in the cloud, and I made it a simple app. In reality there are a lot more layers and a lot more complexity. When we talk about modern applications, we have a user device, typically a mobile phone, tablet, laptop, PC, or whatever, so we're talking about a wide array of devices. Right now I'm presenting to you from an Apple MacBook Air with Apple silicon, and I have my iPhone 15, so I'm on some of the latest stuff. Chances are, if I'm experiencing a performance issue on my device, it's not going to be related to the capacity of my device. But when you're supporting a modern application, you have to support all sorts of devices, and you may have users with slower devices, older devices, older operating system versions, and different operating systems, and those are all things you have to be concerned about.
Those users are all coming through an ISP, and if they're mobile, that could be a cell phone company, and you don't have a lot of ways to get insight into that. If you're large enough, you can get metrics from the cell providers or ISPs, but you don't always have that, and some ISPs offer you better data than others. I have a coworker right now who's struggling: he recently upgraded his ISP plan to a gig and started having dropouts on his router when he did, and he does have professional-grade routing equipment. So if you're an application owner, you can't see any of this. You can only see response times, and you don't have any understanding around them, because you don't have anything on the user device.
Here I talk about the edge CDN, and this is the way a lot of modern apps are developed, with a content delivery network. The big ones are Akamai, and Azure and Amazon both have them as part of their cloud services. A CDN lands your large files closer to your users, wherever they are, and it's a really basic way to get the data in your applications closer to your users. The other thing a CDN does is reduce the load on your application and data store tiers. For example, say you're loading images onto your site and storing them in S3 or Blob storage. Instead of round-tripping to Blob storage every time you retrieve an image on your homepage, the image is loaded into your content delivery network, and that network is going to be closer to your users than your app, unless you have an app that's completely globally distributed, in which case you have entirely too much budget. But even when your app is globally distributed, the CDN is going to help by offloading some of that work.
Then we're typically going to have some kind of network connection from that CDN back into our back end. I'm assuming you're serving from the cloud, but this cloud could also represent your data center. You're going to have an application tier of servers, and that's what the rack represents. You're going to have some kind of data store or data stores: this could be object storage, or it could be a database, and in all likelihood, if you're building a modern application, it's a combination of both, maybe along with other things like key-value stores. All of these layers have metrics, and typically as an IT team we're going to be looking at the stuff in that back end, and maybe a little bit at the CDN to see what our utilization looks like. But that leaves out the whole user experience aspect: we don't know what those user devices are experiencing. We may have some rough ideas, and if we're big enough we may be able to get data from users' ISPs, but there's no guarantee of that. The other thing that's always nice to know is how the app performs for users who have bad internet connections, and what their experience is. We're going to have users who live off cell phones, who are in various parts of the world, who don't have a good internet connection, or who are at a conference, and you want to see those sorts of things. Monitoring this kind of application architecture from end to end isn't easy, and frankly, in my experience working with IT teams all over the world, it's not something a lot of IT teams do. They're mainly going to be focused on the back end, and maybe the CDN.
Networks are kind of everything. We were just talking over lunch with my team about various options for networking between cloud regions. I'm attending a Microsoft AI conference today when I'm not doing this presentation, and one of the things that came up this morning (you'll read about it in my column in Redmond magazine, probably tomorrow) is that Microsoft has laid 29,000 miles of InfiniBand connections in their data centers for OpenAI, to improve the performance of their AI offerings. InfiniBand is high-speed, low-latency, high-throughput networking. As we build these large-scale systems, and clouds are all just large-scale distributed systems, networks are everything, whether it's moving data between systems in the same cloud region or moving data out of that region to user devices or app tiers. It's a big world, there's never enough bandwidth, and bandwidth also costs us money, so paying attention to it can be important.
Monitoring those edge networks, like a CDN or other edge connections, is really critical to understanding your app's performance; you want to ensure that your content delivery network is doing what you think it is. The other reason that's so important, and this is more of a cloud architecture thing than a performance thing, is that in most cases you're going to be paying for cloud egress. Data that leaves your cloud provider is metered, and you pay on a per-gigabyte or per-terabyte basis, depending on your provider. So you want that CDN to be successful, so that you're returning orders of magnitude less data from your own infrastructure, and designing and monitoring around those things can help with that.
Another thing that's really important is understanding user device and connection behavior. We have a lot more bring-your-own-device now, especially around mobile phones. I think bringing your own laptop has faded a little, but it's still there, because some users will use Remote Desktop. We've done a lot of work on Remote Desktop Services in the last several months, because some mergers and acquisitions in the industry have made customers more interested in going to a cloud-based remote desktop solution. But it's impossible for you to test every device and ISP combination. I was using one particular piece of screen-sharing software that I can't name for disclosure reasons (it's not the fault of the software, and to be honest with you I don't even remember the name), but it required a physical Windows machine, and at the time I didn't have one. It was temporary, so I figured I'd just buy the cheapest one I could off Amazon, and I bought a $200 HP laptop that probably would have been a terrible machine in 1985, much less now. My experience was terrible, and it wasn't the fault of the application; there wasn't anything they could have done. But you're not going to be able to test that, and you're not going to be able to test every ISP. So understanding what your user devices are experiencing, and having some way to monitor it, is important, because if you notice a correlation among users who seem to have really bad metrics with your application, you want to be able to know why.
It's not 2020, and I know not everyone is working remotely, but a lot of folks still are. My wife works for a pretty conservative financial services firm, and she still works from home two days a week. We see that with lots of employees in lots of industries: lots of users still work remotely, at least part-time. Home networks and users with dodgy internet connections are definitely something IT has to deal with more than it did in the past, so having tools to better understand what that looks like is pretty important.
So why do we focus on the back end so much? I think this is kind of a universal IT bias: we all like to nerd out, go into the weeds, pull up things like debuggers and profilers, and look at what's going on in our systems in real time. And frankly, it's easy, because the people who write the applications we use are the same people who like to nerd out about such things, so they make it pretty easy for us to monitor. Like I said, it's really easy to light up 50 or 60 different PerfMon counters in Windows, and you can configure a multitude of metrics to fully understand what's going on in each layer. My favorite part of this, and something I always like to bring up when I'm doing holistic performance tuning, is that even if you're focused exclusively on the back end, it's really important to understand the granularity of each system. I remember working on a storage problem where the SAN was only aggregating data once every 15 minutes, while we had data coming in from Windows, SQL Server, and Oracle every 15 seconds, so the SAN data was basically useless to us.
Those things are all challenging, and I say networking is hard. What do I mean by that? There are just a lot of moving pieces, and, more importantly, we don't own all the pieces. Once the bytes (packets, I guess, is the right term) start to arrive in our data center or our cloud, we can monitor them as they travel through our network, but from the outside in, we have very little visibility into what's going on. I think those are a couple of the challenges we face, and they explain why you see so much focus from IT on the back end of systems rather than on what users are directly experiencing.
So there are some challenges for end-user monitoring. We've talked through some of these, but I wanted to highlight them before Jeff talks about ControlUp and some of the things they can do. We have a wide variety of supported devices: different operating systems, different device formats and types, and different sorts of connections. It's kind of crazy how the technology world has evolved. For a while there, in the '80s and '90s, we pretty much had a standard set of PCs, and we knew how everything worked. Then mobile phones really happened; I remember getting some of my first data phones, and that was a real adventure. The other thing that happened, and I can't really pinpoint when, is that devices got good enough to last a really long time, probably around when most computers started shipping with 8 GB of RAM and solid-state drives. You don't have the old "every four years you need a new computer" cycle, so in a lot of ways you end up with devices running older operating systems. You may ask why people are still running Windows 7, and it's because, A, they don't know how insecure it is, and B, it still works really well on their device and they don't know any better, so why would they want to change? So you end up with all these different formats. It's less of a concern in a corporate environment, but you still have customers, and you don't know what kind of devices your customers are going to be using.
Understanding your users' internet performance is a big challenge, simply because you can't monitor it. There are some ways to design around this: a lot of the ways we design systems, using content delivery networks and caching at various layers, are meant to minimize the impact of slow internet performance. But websites keep growing, because connections keep getting bigger and developers want to give richer experiences, so that's a challenge. All of these factors will impact your metrics at multiple layers, and you can't always correlate them to find a bottleneck. In performance tuning, finding the bottleneck is kind of like finding the Holy Grail: okay, we found the problem, now we can tune around it, figure out what the next bottleneck is going to be, and decide whether we need to make any architectural changes. But getting to that point, especially when the problem is happening outside your organization, is challenging even for the best performance teams.
A good example of this is virtual desktops. Some live in the cloud, some live on-prem, and we have users who may be coming in remotely. I've worked with customers where I had to come in through a virtual desktop, and in a lot of cases that's not a great experience, but it depends; some of the newer solutions are better. For partner reasons I had to get certified in Azure Virtual Desktop last year, so I learned a lot about it. But it's important to understand what that user experience is like, what it involves, and how you should monitor it, so there's no Holy Grail there.
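To make that correlation problem concrete, here is a minimal sketch, assuming you already collect per-session load times tagged with each user's ISP and device OS. The records, field layout, and 1.5x threshold are all invented for illustration, and this is not any product's method. It slices the same sessions two ways: if the slow sessions cluster on one ISP but not on any operating system, the bottleneck is more likely outside your data center than inside it.

```python
from collections import defaultdict
from statistics import median

# Hypothetical session records: (user, ISP, device OS, page load in ms).
# In practice these would come from whatever telemetry you collect.
sessions = [
    ("ana", "ISP-A", "Windows 11", 1200),
    ("ben", "ISP-A", "macOS", 1100),
    ("chen", "ISP-B", "Windows 11", 4800),
    ("dana", "ISP-B", "Windows 7", 5200),
    ("eli", "ISP-B", "macOS", 4500),
    ("fay", "ISP-C", "Windows 7", 1400),
]


def slow_groups(rows, key_index, factor=1.5):
    """Group sessions by one attribute (column key_index) and return the
    groups whose median load time is at least `factor` times the overall
    median. A crude heuristic: the overall median includes the slow group."""
    overall = median(r[3] for r in rows)
    groups = defaultdict(list)
    for r in rows:
        groups[r[key_index]].append(r[3])
    return {
        key: median(loads)
        for key, loads in groups.items()
        if median(loads) >= factor * overall
    }


print(slow_groups(sessions, key_index=1))  # by ISP -> {'ISP-B': 4800}
print(slow_groups(sessions, key_index=2))  # by device OS -> {}
```

With real data you would want far more sessions per group and a baseline that excludes the group being tested, but the shape of the analysis stays the same.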
I'm sure Jeff's going to tell you how great ControlUp is at this, and it is. But you need some kind of tool like that, one that gives you a comprehensive view, because outside of something like that there's not necessarily a single metric you can look at. When you have users coming in through an ISP you don't control, maybe on a device you don't manage, you want to understand what their experience is like, and that's really important for giving them a great experience. And with that, I'm going to take some Q&A before I hand off to Jeff. Do we have any questions, John?
We sure do. Remember, everybody, you can type your questions into the Q&A box at any time, and we'll do our best to get to all of them. Let's start with this one from Shaa, who's wondering: Joey, how can you identify the impact of an ISP versus a home network on user experience?
This is really challenging, and it's exactly the question that came up with my coworker the other day: is this my ISP, or is it my home network? Typically, as an application vendor, even if you're embedding some callback or telemetry in your application, it's going to be really challenging to isolate whether that callback is slow because the ISP connection was slow or because the user is on a Linksys router from 2003. So identifying that is quite challenging. Hopefully you can kind of isolate it, and where you tend to...
Sorry, this is Jeff. We'll address that in the next half hour.
Yeah, good point, Jeff. To do that well, you need to be able to correlate all your user activity and then see whether it's one user who's standing out or all the users. I'll bag on Comcast, since I used to work there: if all the users who have Comcast as their ISP are having issues, then you can pin it on the ISP. But that takes data correlation, and that's where tools like ControlUp can be really helpful.
Okay. By the way, that extra voice you heard there was Jeff Johnson. He's going to be our second speaker; more about him later. Meanwhile, we've got a question here from Lynn, who is a regular attendee at our webcasts. Lynn is wondering: what are the key metrics and indicators that organizations should monitor to assess the end-user experience effectively?
That's a really good question, Lynn, and I'm sure Jeff's going to tell you some excellent stuff about what ControlUp can do. What I'm going to tell you is that if you're trying to do this yourself, you need to build a telemetry pipeline as part of your application stack, so that you can collect the metrics that tell you what the user experience is like. There's not a single tool for this. Some telemetry tools are built into application frameworks and libraries and will let you do that, but there's no single metric you can capture. Remember, most of what you're running has to run in the context of a program or website that you're shipping out to other people, so you have to collect the data that way.
Okay, we've got one here from Lisa, who's wondering, since you used this term many times in your presentation: when you say the network is everything, does that mean that when there are poor network experiences, nothing will work?
I don't want to say nothing will work. What I mean is that we're building applications and systems that are more distributed than ever, and cloud has really accelerated this. Systems used to be a lot more monolithic: we'd have one or two servers, or maybe a cluster of four. Now we can have containers and various other things, and when we run out of network bandwidth, it can suddenly become a big bottleneck. And when we talk about user experience, there's a whole other network that we're not really in charge of, so it's even harder, because of that connection between the end user and our application. We can do everything on our side, like using a CDN and edge caching to try to improve the user experience, but if users are coming in over dial-up, that doesn't really help us. It just means there's more complexity in the stack.
Okay, here's one from Keith, who's wondering: does the location of users matter to their experience?
Absolutely. I mentioned that Australia example, but I've had the good fortune to travel all around the world, and one of the ways I've done that is by being a consultant and supporting my customers back home while I'm traveling. I remember upgrading a customer to SQL Server 2016 from a train between the Netherlands and France, and their server was in a data center in Washington state. So it matters, depending on the nature of the application and how things are constructed. It can be really impactful: you can have a really bad experience if the application isn't designed well. I'll give you a really basic example, because I manage the DCAC website with Denny. We have a web tier in the US and one in, I think, West Europe in Azure, and we have a CDN and so on, so we can see load times from all over the world. We pay attention to that because people care, and we geek out about performance tuning. If you design things well, it can work pretty well, and distance can have less of an impact. But it also depends on the nature of what you're doing. If you're trying to build something that's really transactional, with a lot of what we call synchronous communication, where a call can't complete until it hears back, and you have an extra 100 or 200 milliseconds of latency, that's going to be a painful experience.
Okay, looks like we've got time for one more question for Joey. This one: Joey, how can we help an application with users all around the world?
Just to reiterate what we were just mentioning, CDNs are really helpful. If you have budget (Azure has 54 regions, Amazon has 40-something regions), you can put your edge resources closer to your users, and if you have critical systems or critical transactions, it makes all the sense in the world to do that.
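A quick way to see why distance hurts synchronous, transactional workloads, and why putting resources in a region near your users helps: if every call has to hear back before the next one starts, each round trip pays the full latency. The sketch below assumes a hypothetical transaction of 20 back-to-back calls with made-up round-trip times. It ignores bandwidth and treats server time as zero, so only the latency term changes between the two cases.

```python
def sequential_wait_ms(round_trips: int, rtt_ms: float, server_ms: float = 0.0) -> float:
    """Total time when each call must wait for the previous reply.

    With strictly synchronous calls, every round trip pays the full
    network latency (plus any server time), so the costs simply add up.
    """
    return round_trips * (rtt_ms + server_ms)


# Hypothetical transaction that makes 20 back-to-back synchronous calls.
CALLS = 20
for label, rtt in (("nearby region", 20.0), ("150 ms farther away", 170.0)):
    print(f"{label}: {sequential_wait_ms(CALLS, rtt) / 1000:.1f} s")
```

Under these assumptions, an extra 150 ms per round trip turns a 0.4 s transaction into 3.4 s, even though the servers did no extra work.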