How To Change Index Values In Pandas Dataframe

A Pandas DataFrame is essentially an in-memory representation of a spreadsheet, such as an Excel sheet, that you work with from Python. Its row labels are stored in an Index object, which is an immutable array. Indexing lets us access a row or column by its label.

A Pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels: a 2D labeled data structure whose columns can have different types. DataFrames are widely used in data science, machine learning, scientific computing, and many other data-intensive fields.

Pandas set index

The Pandas set_index() method sets the DataFrame index using one or more existing columns, or arrays such as a list, Series, or Index of the correct length. It takes keys, drop, append, inplace, and verify_integrity as parameters and returns the DataFrame with the new index.

The new index can either replace the existing index or be appended to it.

Syntax

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Set the DataFrame index (row labels) using one or more existing columns. By default, it returns a new DataFrame and leaves the original unchanged.

Parameters

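keys: A column label, an array, or a list of column labels and/or arrays.
drop: A Boolean; if True (the default), the columns used for the new index are removed from the DataFrame.
append: A Boolean; if True, the columns are appended to the existing index (creating a MultiIndex) instead of replacing it.
inplace: A Boolean; if True, the DataFrame is modified in place and the method returns None.
verify_integrity: A Boolean; if True, the new index is checked for duplicates and a ValueError is raised if any are found.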
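To make the parameters concrete, here is a minimal sketch on a small DataFrame. The column names id, city, score, key, and val and their values are hypothetical, chosen only to show what drop, append, and verify_integrity do.

```python
import pandas as pd

# Hypothetical data used only to illustrate the parameters
df = pd.DataFrame({
    "id": [101, 102, 103],
    "city": ["Paris", "Oslo", "Rome"],
    "score": [7, 9, 8],
})

# drop=True (default): 'id' becomes the index and is removed from the columns
by_id = df.set_index("id")
print(by_id.columns.tolist())  # ['city', 'score']

# drop=False: 'id' becomes the index but also stays as a regular column
by_id_kept = df.set_index("id", drop=False)
print(by_id_kept.columns.tolist())  # ['id', 'city', 'score']

# append=True: keep the existing 'id' index and add 'city' as a second level
nested = by_id.set_index("city", append=True)
print(list(nested.index.names))  # ['id', 'city']

# verify_integrity=True: duplicate keys in the new index raise ValueError
dupes = pd.DataFrame({"key": ["a", "a"], "val": [1, 2]})
try:
    dupes.set_index("key", verify_integrity=True)
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True
print(duplicates_rejected)  # True
```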

Example

We will use real data, which can be found at the following Google Docs link.

https://docs.google.com/spreadsheets/d/1zeeZQzFoHE2j_ZrqDkVJK9eF7OH1yvg75c8S-aBcxaU/edit#gid=0

Pandas DataFrames are data structures that contain:

Data organized into two dimensions: rows and columns.
Labels that correspond to the rows and columns.

Now, open the Jupyter Notebook and import the Pandas Library first.

Write the following code inside the first cell in Jupyter Notebook.

import pandas as pd

Run the cell by pressing Ctrl + Enter.

Next, we will use the Pandas read_csv() function to load the file into a DataFrame. Write the following code in the next cell.

data = pd.read_csv('data.csv', skiprows=4)
data

We used read_csv() to skip the first four lines of the file and then displayed the remaining rows. Run the cell and see the output. When a DataFrame has more rows than the display.max_rows option allows, Jupyter shows only the first and last few rows; how many depends on your pandas version and display settings.

Our data file has more than 29,000 rows, which is why the output is truncated in this way.


If you get the above output, then you have successfully imported the data.

The first row holds the column labels (City, Edition, Sport, Discipline, Athlete, NOC, Gender, Event, Event_gender, and Medal). The first column holds the row labels (0, 1, 2, and so on). All other cells are filled with data values.

There are several ways to create a Pandas DataFrame. Sometimes you will import the data from a CSV or Excel file. In other cases, you will use the DataFrame constructor and supply the data, labels, and other information yourself: you can pass the data as a two-dimensional list, tuple, or NumPy array, or as a dictionary, a Pandas Series instance, or one of many other data types not covered in this example.
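As a rough illustration of the constructor route, the sketch below builds DataFrames from a dictionary, a two-dimensional list, and a NumPy array. All names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# From a dictionary: the keys become the column labels
from_dict = pd.DataFrame({"name": ["Ann", "Ben"], "age": [28, 35]})

# From a two-dimensional list, supplying the column labels explicitly
from_list = pd.DataFrame([["Ann", 28], ["Ben", 35]], columns=["name", "age"])

# From a NumPy array, with custom row labels (the index) as well
from_array = pd.DataFrame(
    np.array([[1, 2], [3, 4]]),
    columns=["x", "y"],
    index=["r1", "r2"],
)

print(from_dict.equals(from_list))  # True: same labels, values, and dtypes
print(from_array.loc["r2", "y"])    # 4
```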

Now, let's check the type of the index object. Type the following code in the next cell.

type(data.index)

The output is pandas.core.indexes.range.RangeIndex. Because we did not tell read_csv() which column to use as the index, Pandas created a default RangeIndex of consecutive integers starting at 0.

Remember that Index objects are immutable: you cannot change individual labels in place. You can, however, replace the whole index with a new one, which is exactly what set_index() does.
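The following minimal sketch, using a hypothetical single-column DataFrame, shows the difference between mutating one label (rejected) and replacing the entire index (allowed).

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]})

# Changing a single label of an Index raises TypeError
try:
    df.index[0] = 100
    mutation_blocked = False
except TypeError:
    mutation_blocked = True
print(mutation_blocked)  # True

# Assigning a completely new index is allowed
df.index = ["a", "b", "c"]
print(df.index.tolist())  # ['a', 'b', 'c']
```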

Pandas DataFrame set_index() Example

Now, we will set an index for the DataFrame using the set_index() method.

There are two ways to set the DataFrame index.

Pass the parameter inplace=True to change the index of the current DataFrame.
Assign the DataFrame returned by set_index() to a variable and use that variable for the indexed result.

Let's look at the first way. We will choose the Athlete column and set it as the index.

Write the following code in the next cell and see the output.

data.set_index('Athlete',inplace=True)

Run the cell and now display the DataFrame using the following code in the next cell.

data

In the output, we can see that the DataFrame is now indexed by athlete name.


Here, we passed inplace=True as a parameter, which modifies the current DataFrame directly instead of returning a new one, so data is now indexed by Athlete.

Pandas DataFrames can be very large, making it impractical to look at all the rows at once. You can use .head() to show the first few rows and .tail() to show the last few rows.
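A minimal sketch on a hypothetical 100-row DataFrame: head() returns the first five rows by default, and both methods accept a row count.

```python
import pandas as pd

# Hypothetical DataFrame with 100 rows
df = pd.DataFrame({"n": range(100)})

first_rows = df.head()   # first 5 rows by default
last_three = df.tail(3)  # last 3 rows

print(first_rows["n"].tolist())  # [0, 1, 2, 3, 4]
print(last_three["n"].tolist())  # [97, 98, 99]
```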

Each column of the Pandas DataFrame is an instance of Pandas Series, a structure that contains one-dimensional data and its labels. Thus, you can get a single element of a Series object the same way you would with a dictionary using its label as the key.

The attributes .ndim, .shape, and .size return the number of dimensions, the number of data values along each dimension, and the total number of data values, respectively.
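Here is a small sketch of both ideas, using a few rows typed in by hand in the style of the people.csv data used later: a column is a Series that supports dictionary-style label lookup, and the three attributes describe the DataFrame's size.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [41, 42, 32], "height": [74, 68, 70]},
    index=["Alex", "Bert", "Carl"],
)

ages = df["age"]            # each column is a pandas Series
print(type(ages).__name__)  # Series
print(ages["Bert"])         # 42, looked up by label like a dict key

print(df.ndim)   # 2
print(df.shape)  # (3, 2)
print(df.size)   # 6
```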

Reset Index in Pandas DataFrame

The Pandas reset_index() method resets the index of a DataFrame. By default, it moves the current index back into a regular column and replaces it with integers from 0 to len(data) - 1. Let's reset the Athlete index with the following code.

data.reset_index(inplace=True)
data

In the output, Athlete is a regular column again, and the rows are labeled 0, 1, 2, and so on.
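The sketch below contrasts the default behavior with drop=True, which discards the old index instead of turning it into a column. The athlete names and medals are hypothetical placeholders.

```python
import pandas as pd

df = pd.DataFrame(
    {"Medal": ["Gold", "Silver"]},
    index=pd.Index(["Athlete A", "Athlete B"], name="Athlete"),
)

# Default: the old index becomes a regular column named 'Athlete'
restored = df.reset_index()
print(restored.columns.tolist())  # ['Athlete', 'Medal']
print(restored.index.tolist())    # [0, 1]

# drop=True: the old index is thrown away instead of becoming a column
discarded = df.reset_index(drop=True)
print(discarded.columns.tolist())  # ['Medal']
```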

Now, see the second way to use the set_index() method.

Write the following code in the next cell.

indexedData = data.set_index('Athlete')
indexedData

The output shows the same Athlete-indexed table as before. This time, we did not pass inplace=True. Instead, set_index() returned a new DataFrame, which we saved in the indexedData variable and displayed in the Jupyter Notebook; data itself keeps its default index.

So, in this tutorial, we have seen both ways to use a column as the index, and we have seen how to reset that index using the reset_index() method.

Other Examples of Python Set Index

Python is an excellent language for data analysis, primarily because of its ecosystem of data-centric Python packages.

Pandas is one of those packages, and it makes importing and analyzing data much easier.

The Pandas set_index() method sets a column, a list of columns, or an array-like object such as a Series or Index as the index of a DataFrame.

The index can also be set while creating the DataFrame. Sometimes, however, a DataFrame is built from two or more other DataFrames, and the index has to be changed afterwards with set_index().
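Here is a minimal sketch of setting the index up front, either with the index_col argument of read_csv() or the index argument of the DataFrame constructor. The CSV text is a hypothetical stand-in for a real file, wrapped in io.StringIO so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file on disk
csv_text = "month,year,sale\n1,2012,55\n4,2014,40\n"

# Set the index while reading the file
from_csv = pd.read_csv(io.StringIO(csv_text), index_col="month")
print(from_csv.index.tolist())  # [1, 4]

# Set the index while constructing the DataFrame
built = pd.DataFrame({"year": [2012, 2014], "sale": [55, 40]}, index=[1, 4])
built.index.name = "month"
print(from_csv.equals(built))  # True
```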

>>> df = pd.DataFrame({'month': [1, 4, 7, 10],
...                    'year': [2012, 2014, 2013, 2014],
...                    'sale': [55, 40, 84, 31]})
>>> df
   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31

Set the index to become the 'month' column:

>>> df.set_index('month')
       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31

Create a MultiIndex using the columns 'year' and 'month':

>>> df.set_index(['year', 'month'])
            sale
year  month
2012  1     55
2014  4     40
2013  7     84
2014  10    31

Create the MultiIndex using an Index and a column:

>>> df.set_index([pd.Index([1, 2, 3, 4]), 'year'])
        month  sale
   year
1  2012  1      55
2  2014  4      40
3  2013  7      84
4  2014  10     31

Create a MultiIndex using two Series:

>>> s = pd.Series([1, 2, 3, 4])
>>> df.set_index([s, s**2])
      month  year  sale
1 1       1  2012    55
2 4       4  2014    40
3 9       7  2013    84
4 16     10  2014    31

Python DataFrame set_index() not setting the index

Let's say you have a DataFrame and want to set its index to the 'Timestamp' column.

Currently, the index is just a row number, and the Timestamp values look like 2019-10-02 15:42:00.

You can write the following code to set the index.

df.set_index('Timestamp', inplace=True, drop=True)

By default, set_index() returns a new DataFrame and leaves the original unchanged, so you need to either specify inplace=True or assign the result to a variable.
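The sketch below reproduces the "not setting" symptom and both fixes. The reading column and its values are hypothetical; only the Timestamp format comes from the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2019-10-02 15:42:00", "2019-10-02 15:43:00"]),
    "reading": [1.5, 1.7],
})

# Pitfall: the new DataFrame is returned but discarded, so df is unchanged
df.set_index("Timestamp")
print(df.index.tolist())  # [0, 1]

# Fix 1: assign the returned DataFrame to a variable
indexed = df.set_index("Timestamp")
print(type(indexed.index).__name__)  # DatetimeIndex

# Fix 2: modify df in place; note that the call itself returns None
result = df.set_index("Timestamp", inplace=True)
print(result is None)           # True
print(type(df.index).__name__)  # DatetimeIndex
```

A common mistake is combining the two, as in df = df.set_index('Timestamp', inplace=True), which leaves df set to None.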

Convert index of pandas DataFrame into column

You can access a column in a Pandas DataFrame the same way you would get a value from a dictionary.

Let's figure out how to convert an index of the data frame to a column.

For example, to copy the index into a new column named sales, write the following code snippet.

df['sales'] = df.index

Alternatively, we can use reset_index(), which moves the index into a column and replaces it with a default integer index.

df.reset_index(level=0, inplace=True)
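A minimal sketch contrasting the two approaches on a hypothetical month-indexed DataFrame: copying keeps the index as it is, while reset_index() moves it out of the index. The column name month_col is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"sale": [55, 40]}, index=pd.Index([1, 4], name="month"))

# Option 1: copy the index into a new column; the index itself is kept
copied = df.copy()
copied["month_col"] = copied.index
print(copied.columns.tolist())  # ['sale', 'month_col']
print(copied.index.tolist())    # [1, 4]

# Option 2: move the index into a column and fall back to 0, 1, 2, ...
moved = df.reset_index()
print(moved.columns.tolist())  # ['month', 'sale']
print(moved.index.tolist())    # [0, 1]
```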

Pandas set index to multiple columns

In this example, two columns are set as index columns.

The drop parameter controls whether the columns used for the index are removed from the DataFrame, and the append parameter adds the passed columns to the existing index instead of replacing it.

df.set_index(["Month", "Year"], inplace=True, append=True, drop=False)

With df.reset_index(level=df.index.names, inplace=True), you can convert a whole MultiIndex into columns.
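Here is a sketch of both operations on a hypothetical sales DataFrame with Month and Year columns. The first call mirrors the set_index() line above (without inplace, so the original stays available); the second converts a whole MultiIndex back into columns.

```python
import pandas as pd

sales = pd.DataFrame({
    "Month": [1, 4, 7],
    "Year": [2012, 2014, 2013],
    "Sale": [55, 40, 84],
})

# append=True keeps the existing 0, 1, 2 index as the first level;
# drop=False keeps Month and Year as ordinary columns as well
appended = sales.set_index(["Month", "Year"], append=True, drop=False)
print(list(appended.index.names))  # [None, 'Month', 'Year']
print(appended.columns.tolist())   # ['Month', 'Year', 'Sale']

# Turn a whole MultiIndex back into columns
multi = sales.set_index(["Month", "Year"])
multi.reset_index(level=multi.index.names, inplace=True)
print(multi.columns.tolist())  # ['Month', 'Year', 'Sale']
```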

You can change the index at any time with the set_index() method, as explained above.

You don't need to swap rows with columns manually; Pandas provides a transpose() method (also available as the .T attribute) that does it for you.
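A minimal sketch of transpose() on a hypothetical two-row DataFrame: the row labels become column labels and vice versa.

```python
import pandas as pd

df = pd.DataFrame({"year": [2012, 2014], "sale": [55, 40]}, index=["Q1", "Q2"])

flipped = df.transpose()  # df.T is shorthand for the same operation
print(flipped.index.tolist())    # ['year', 'sale']
print(flipped.columns.tolist())  # ['Q1', 'Q2']
print(flipped.equals(df.T))      # True
```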

How to assign multi-index in Pandas DataFrame

You can use the set_index() function so that multiple columns can be assigned as multi-index. By specifying a list of column names in the first argument keys, multiple columns are assigned as multi-index.

Let's say we have this data: people.csv

Okay, let's create a DataFrame from the CSV file.

import pandas as pd

data = pd.read_csv('people.csv')
df = pd.DataFrame(data)
print(df.head(10))

Output

   Name Sex  Age  Height  Weight
0  Alex   M   41      74     170
1  Bert   M   42      68     166
2  Carl   M   32      70     155
3  Dave   M   39      72     167
4  Elly   F   30      66     124
5  Fran   F   33      66     115
6  Gwen   F   26      64     121
7  Hank   M   30      71     158
8  Ivan   M   53      72     175
9  Jake   M   32      69     143

Okay, now let's set two columns as an index. See the following code.

import pandas as pd

data = pd.read_csv('people.csv')
df = pd.DataFrame(data)
df10 = df.head(10)
df_mul_index = df10.set_index(['Sex', 'Age'])
print(df_mul_index)

Output

         Name  Height  Weight
Sex Age
M   41   Alex      74     170
    42   Bert      68     166
    32   Carl      70     155
    39   Dave      72     167
F   30   Elly      66     124
    33   Fran      66     115
    26   Gwen      64     121
M   30   Hank      71     158
    53   Ivan      72     175
    32   Jake      69     143

From the output, you can see that we have assigned a multi-index.

Sorting the index with the sort_index() method groups the rows so that the output is easier to read.

import pandas as pd

data = pd.read_csv('people.csv')
df = pd.DataFrame(data)
df10 = df.head(10)
df_mul_index = df10.set_index(['Sex', 'Age'])
df_mul_index.sort_index(inplace=True)
print(df_mul_index)

Output

         Name  Height  Weight
Sex Age
F   26   Gwen      64     121
    30   Elly      66     124
    33   Fran      66     115
M   30   Hank      71     158
    32   Carl      70     155
    32   Jake      69     143
    39   Dave      72     167
    41   Alex      74     170
    42   Bert      68     166
    53   Ivan      72     175

Now, it's neat and clean.

Pandas set index: change index to another column

If you set another column as the index with set_index(), the original index is discarded. If you want to keep the original index as a column, first call reset_index(), which moves it into a column and reassigns the index to sequential numbers starting from 0. The following code uses reset_index() to turn the Sex and Age index levels back into columns.

import pandas as pd

data = pd.read_csv('people.csv')
df = pd.DataFrame(data)
df10 = df.head(10)
df_mul_index = df10.set_index(['Sex', 'Age'])
df_re_index = df_mul_index.reset_index()
print(df_re_index)

Output

  Sex  Age  Name  Height  Weight
0   M   41  Alex      74     170
1   M   42  Bert      68     166
2   M   32  Carl      70     155
3   M   39  Dave      72     167
4   F   30  Elly      66     124
5   F   33  Fran      66     115
6   F   26  Gwen      64     121
7   M   30  Hank      71     158
8   M   53  Ivan      72     175
9   M   32  Jake      69     143
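To see why calling reset_index() before switching matters, here is a sketch with a hypothetical named index row_id. Calling set_index() alone drops it, while chaining reset_index() first preserves it as a column.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alex", "Bert"], "Age": [41, 42]},
    index=pd.Index([100, 200], name="row_id"),  # hypothetical original index
)

# set_index() alone discards the old 'row_id' index
replaced = df.set_index("Name")
print(replaced.columns.tolist())  # ['Age']

# reset_index() first turns 'row_id' into a column, so it survives
kept = df.reset_index().set_index("Name")
print(kept.columns.tolist())  # ['row_id', 'Age']
```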

Select rows and elements using index

You can select rows and elements by index label using loc[].

import pandas as pd

data = pd.read_csv('people.csv')
df = pd.DataFrame(data)
df10 = df.head(10)
df_index = df10.set_index(['Name'])
daloc = df_index.loc['Gwen']
print(daloc)

Output

Sex         F
Age        26
Height     64
Weight    121
Name: Gwen, dtype: object
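loc[] can also pick out a single element or work with a MultiIndex. The sketch below types in a few rows from the people.csv output above rather than reading the file, so it runs on its own.

```python
import pandas as pd

# A few rows from the people.csv output above, typed in directly
df = pd.DataFrame({
    "Name": ["Alex", "Elly", "Gwen"],
    "Sex": ["M", "F", "F"],
    "Age": [41, 30, 26],
    "Height": [74, 66, 64],
})

# Single element: row label first, then column label
by_name = df.set_index("Name")
print(by_name.loc["Gwen", "Height"])  # 64

# With a MultiIndex, a tuple selects one row and a single key selects a group
by_sex_age = df.set_index(["Sex", "Age"]).sort_index()
print(by_sex_age.loc[("F", 26), "Name"])   # Gwen
print(by_sex_age.loc["F"].index.tolist())  # [26, 30]
```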

That completes our Pandas set_index() examples.