Unsure of how to use the PivotTable fields to slice and dice your data? Check out part 2 of the PivotTable tutorials by Sparx Analytics LLC.
Disclaimer: All data used by Sparx Analytics LLC is simulated or publicly available. The scenario in the video is used solely as an analytics example and should not be used as part of clinical decision making. Captions are autogenerated.
Hey everyone and welcome to Part 2 of the three-part intro to PivotTables by Sparx Analytics. In this video, we're going to focus on the various PivotTable fields and how they can be used to slice your data set to answer specific questions. So jumping back into our PivotTable from part one, let's say we want to explore any possible trend by clinic between the average age of patients who missed their visit compared to those who complete their appointment. To do this, let's talk about the various PivotTable fields and how we can make just a couple of adjustments to answer this question. To start off, the top pane is going to display all of the column names from the data set that we're used to create the PivotTable. Any column that's currently being used in one of the fields below will have a check next to it, so we're currently using Sparx Clinic within the rows and the Simulated_Patient column is currently in the values where we are doing an aggregation on that column. The filters pane below, any column that has this in it will allow us to filter by that column. So, let's say we wanted to actually just filter by Sparx Clinic. We could drag this here and you can see now we have a count of all of the visits within our data set. But we have this new field above that will let us just filter. So if we wanted to only see the count by Cardiology Clinic, we can filter here. I'm going to undo this filter and another way we can filter if this column is currently being displayed within the PivotTable is just clicking this dropdown header in the row labels and then selecting the column of interest. So if we wanted to do cardiology we can now see it displayed again. So really this filter is good only if you want to filter by a column that you're not going to be displaying within your PivotTable. Otherwise, you can filter directly on the PivotTable itself. The rows and columns help you to shape the dimensions of your PivotTable. So currently we have Sparx Clinics within the rows and we can now see a unique set of the values for the Sparx Clinic column displaying as rows. But, we could drag this over to the columns and we can now see those unique values again but just being displayed in a wide format, each value representing a different column. If I'm only going to have one column displayed in my PivotTable, or one of these columns listed above, I typically will put it into the rows field of the PivotTable. It's just a little cleaner to look at if there's a lot of entries and it spills vertically rather than horizontally. If I'm going to have multiple columns from our data set being displayed, so maybe one in the rows, one in the columns, I tend to have whatever variable will have the most unique values displayed as the rows and then the one with the fewest as columns. However, this can vary by the type of problem you're trying to solve. And especially if you're going to be converting this to some sort of chart, that will help to determine what is the X and Y axes. So, for example, if you are finding a trend by a time series, typically I'll put the time series in the rows one. It has a lot of different options for the values, especially if you're looking over a long period of time. But, also if you were to do a line graph as part of a PivotChart, the X axis is going to be based off of this rows field, so having the time series in the rows is beneficial. But again, a lot of this is personal preference and you can really mix and match depending on whatever scenario you're trying to solve for. So in this instance, we want to know the average age of patients who completed versus missed their appointments for each of the clinics. So I'm going to grab the visit status and move it into the column section. So you can see we now have a listing of all of our clinics by their visit status. And it's currently doing a count based off of simulated_patients which is really giving us the count of the visits. So I'm going to get rid of this because we want to know information about the patient age and because patient age is numeric data it is defaulting to the sum aggregation. But in this case we don't want the sum, we want the average. So I'm going to click and see this list of options and click value field settings and instead of using sum I'm going to change it to average and click OK. You can see now we have the average. I'm going to format these numbers just a little bit because I don't want it to have so many zeros. So let's just do it to a whole number and here we have it. We now have the average age by their visit status and we can see those who missed their visit compared to completed it are typically older. So if we look at our grand total, on average patients who missed their visits are 71 years old compared to those completed are 55. So this is an interesting trend that you might want to drill deeper into and then figure out what barriers do patients who are older populations have compared to maybe younger populations. Other things to note about pivot tables though are the data within it is static unless you update it. So, compared to functions that update automatically when you change values, if we were to change some of the values from our data set, so let's just say we want to make all of these visits completed. Our pivot table is still going to show the exact results before because that data set is stored in cache. We have to tell the pivot table to update. So I'm going to go to PivotTable Analyze within the ribbon and I'm going to click refresh and you can see now our age values update. And you can see because of how we changed that data, there's no longer any patients who have missed their visit. So for example, if we went to the endocrinology clinic and let's just default somebody's age that has missed to 0. So, let's create one here to missed and 0. Refresh this. We now get a different answer, so it's showing 37. However, if we actually just blank the cell out so I'll make it with no value and then refresh. You can see it goes back to being 73. So even though we have a patient for the Endocrinology clinic who missed their appointment because the value of their age is currently null, it's not calculating that in as part of the average formula. So keep that in mind as you are analyzing different data and understand how different functions as well as PivotTable handle missing data that could be erroneous or missing data that's intentional and make sure that it's actually fulfilling the calculation that you need. Hope this helps as you begin to answer questions with Pivot tables. Stay tuned for part three of the series where we'll go over formatting of PivotTable.