Working with Categorical Variables

The columns cut, color, and clarity are categorical variables: their values represent discrete categories into which the diamonds can be classified. Each possible value that a categorical variable can take is referred to as a level of that variable. As mentioned at the beginning of these instructions, the levels of each of these variables have a natural ordering, or ranking. However, Pandas will not know the order these levels should be in unless we specify it ourselves.

Create a markdown cell that displays a level 2 header that reads: "Part 3: Working with Categorical Variables". Add some text explaining that we will be creating lists to specify the order for each of the three categorical variables.

Create three lists named clarity_levels, cut_levels, and color_levels. Each list should contain strings representing the levels of the associated categorical variable in order from worst to best.

We can specify the order for the levels of a categorical variable stored as a column in a DataFrame by using the pd.Categorical() function. To use this function, you will pass it two arguments: the first is the column whose levels you are setting, and the second is a list or array containing the levels in order. The function returns a new Categorical (array-like) object, which can be stored back in place of the original column. An example of this syntax is provided below:

df.some_column = pd.Categorical(df.some_column, levels_list)

Create a markdown cell explaining that we will now use these lists to communicate to Pandas the correct order for the levels of the three categorical variables. Use pd.Categorical() to set the levels of the cut, color, and clarity columns. This will require three calls to pd.Categorical().

Create a markdown cell explaining that we will now create lists of named colors to serve as palettes for visualizations later in the notebook. Create three lists named clarity_pal, color_pal, and cut_pal. Each list should contain a number of named colors equal to the number of levels of the associated categorical variable. The colors within each list should be easy to distinguish from one another.
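The steps in this part can be sketched roughly as follows. This is a minimal illustration, not a reference solution. It assumes the standard diamonds dataset level orderings (cut from Fair to Ideal, color from J to D, clarity from I1 to IF). The three-row DataFrame named diamonds is a hypothetical stand-in for the DataFrame loaded earlier in the notebook, and the specific palette colors are illustrative choices. The ordered=True argument goes beyond the syntax shown above. Without it, pandas records the levels in the order given but does not treat them as ranked, so comparisons and min()/max() raise a TypeError.

```python
import pandas as pd

# Assumed level orderings, worst to best, for the standard diamonds dataset.
cut_levels = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
color_levels = ["J", "I", "H", "G", "F", "E", "D"]
clarity_levels = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

# Hypothetical stand-in for the real diamonds DataFrame (three sample rows).
diamonds = pd.DataFrame({
    "cut": ["Ideal", "Fair", "Premium"],
    "color": ["E", "J", "G"],
    "clarity": ["SI2", "IF", "VS1"],
})

# One pd.Categorical() call per column. The second positional argument is the
# ordered list of levels (pandas calls them "categories"); ordered=True marks
# the levels as ranked so that <, >, min(), and max() work.
diamonds.cut = pd.Categorical(diamonds.cut, cut_levels, ordered=True)
diamonds.color = pd.Categorical(diamonds.color, color_levels, ordered=True)
diamonds.clarity = pd.Categorical(diamonds.clarity, clarity_levels, ordered=True)

# Named-color palettes with one easily distinguished color per level
# (illustrative choices; any CSS/matplotlib named colors would do).
cut_pal = ["firebrick", "orange", "gold", "seagreen", "royalblue"]
color_pal = ["firebrick", "orange", "gold", "limegreen", "teal", "royalblue", "purple"]
clarity_pal = ["firebrick", "orangered", "orange", "gold",
               "limegreen", "teal", "royalblue", "purple"]

# Sanity check: each palette has exactly one color per level.
for levels, pal in [(cut_levels, cut_pal),
                    (color_levels, color_pal),
                    (clarity_levels, clarity_pal)]:
    assert len(levels) == len(pal)

# Sorting now follows the specified ranking rather than alphabetical order.
print(diamonds.sort_values("cut"))
print(diamonds.cut.min(), diamonds.clarity.max())
```

With the levels set, sorting by cut places Fair before Premium before Ideal, even though alphabetical order would put Ideal first. The palette lists can later be passed to any plotting function that accepts a list of colors.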