Subsets of DataFrames#
Let’s load the DataFrames that we have been working with.
import pandas as pd
small_data = {
    'ticker': ['AAPL', 'MSFT', 'GOOG', 'AMZN'],
    'price': [137, 229, 1677, 3535],
    'name': ['Apple Inc', 'Microsoft Corp', 'Alphabet Inc', 'Amazon.com Inc'],
    'assets_current': [162.82, 181.92, 152.58, 96.33],
    'liab_current': [105.72, 72.31, 45.22, 87.81]
}
small_df = pd.DataFrame(small_data)
small_df.set_index('ticker', inplace=True)
big_data = {
    'year': [2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014],
    'ticker': ['AAPL', 'AAPL', 'AET', 'AET', 'BA', 'BA', 'BUD', 'BUD', 'CAT', 'CAT', 'COKE', 'COKE', 'CVX', 'CVX', 'DAL', 'DAL', 'DIS', 'DIS', 'DOW', 'DOW', 'DTV', 'DTV', 'GM', 'GM', 'HOG', 'HOG', 'IBM', 'IBM', 'LMT', 'LMT', 'LUV', 'LUV', 'MSFT', 'MSFT', 'PFE', 'PFE', 'TAP', 'TAP', 'TWC', 'TWC', 'VZ', 'VZ'],
    'assets_current': [73286, 68531, 9718.9, 11764, 65074, 67785, 18690, 18541, 38335, 38867, 239.215, 287.001, 50250, 42232, 9651, 12465, 14109, 15176, 24977, 24267, 5953, 8819, 81501, 83670, 3988.803, 3948.095, 51350, 49422, 13329, 12329, 4456, 4404, 101466, 114246, 56244, 57702, 1537.7, 1578.9, 2144, 2316, 70994, 29623],
    'liab_current': [43658, 63448, 12602.9, 15356.5, 51486, 56717, 25627, 27208, 27297, 27877, 208.816, 227.42, 33018, 31926, 14152, 16879, 11704, 13292, 11971, 11593, 6530, 6959, 62412, 65701, 2509.586, 2389.286, 40154, 39600, 11120, 11112, 5676, 5923, 37417, 45625, 23366, 21631, 2142.1, 2325.3, 5226, 4497, 27050, 28064],
    'price': [476.75, 100.75, 68.59, 88.83, 136.49, 129.98, 106.46, 112.32, 90.81, 91.53, 73.19, 88.03, 124.91, 112.18, 27.47, 49.19, 64.49, 89.03, 44.4, 45.61, 69.06, 86.7, 40.87, 34.91, 69.24, 65.91, 187.57, 160.44, 148.66, 192.57, 18.84, 42.32, 34.545, 41.7, 30.63, 31.15, 56.15, 74.52, 135.5, 152.06, 49.14, 46.78],
}
big_df = pd.DataFrame(big_data)
big_df.set_index( ['ticker', 'year'], inplace=True)
Selecting a single cell/value#
small_df
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| AAPL | 137 | Apple Inc | 162.82 | 105.72 | 
| MSFT | 229 | Microsoft Corp | 181.92 | 72.31 | 
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
| AMZN | 3535 | Amazon.com Inc | 96.33 | 87.81 | 
If you want to get a single value from a dataframe, use the .at() method.
the_name = small_df.at['AAPL', 'name']
the_name
'Apple Inc'
We can also use .at() to set a new value in a specific cell. For example, if you want to change the price of MSFT to $200.
small_df.at['MSFT', 'price'] = 200
small_df
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| AAPL | 137 | Apple Inc | 162.82 | 105.72 | 
| MSFT | 200 | Microsoft Corp | 181.92 | 72.31 | 
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
| AMZN | 3535 | Amazon.com Inc | 96.33 | 87.81 | 
How to do it with a multi index:
big_df.head(10)
| assets_current | liab_current | price | ||
|---|---|---|---|---|
| ticker | year | |||
| AAPL | 2013 | 73286.0 | 43658.0 | 476.75 | 
| 2014 | 68531.0 | 63448.0 | 100.75 | |
| AET | 2013 | 9718.9 | 12602.9 | 68.59 | 
| 2014 | 11764.0 | 15356.5 | 88.83 | |
| BA | 2013 | 65074.0 | 51486.0 | 136.49 | 
| 2014 | 67785.0 | 56717.0 | 129.98 | |
| BUD | 2013 | 18690.0 | 25627.0 | 106.46 | 
| 2014 | 18541.0 | 27208.0 | 112.32 | |
| CAT | 2013 | 38335.0 | 27297.0 | 90.81 | 
| 2014 | 38867.0 | 27877.0 | 91.53 | 
Need to use a tuple as the first argument.
my_price = big_df.at[ ('BUD', 2014), 'price']
my_price
112.32
Selecting columns#
We can select a single column by “indexing” that column. This is very similar to how we referenced items in dictionaries.
small_df['name']
ticker
AAPL         Apple Inc
MSFT    Microsoft Corp
GOOG      Alphabet Inc
AMZN    Amazon.com Inc
Name: name, dtype: object
Since we only grabbed one column, it returns a pandas series, which is basically a list. We could also get multiple columns at the same time by indexing a list of columns and storing the result in fewer_cols.
col_list = ['name', 'price']
fewer_cols = small_df[col_list]
fewer_cols
| name | price | |
|---|---|---|
| ticker | ||
| AAPL | Apple Inc | 137 | 
| MSFT | Microsoft Corp | 200 | 
| GOOG | Alphabet Inc | 1677 | 
| AMZN | Amazon.com Inc | 3535 | 
Selecting rows#
The loc method#
Starting with the small_df DataFrame.
small_df
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| AAPL | 137 | Apple Inc | 162.82 | 105.72 | 
| MSFT | 200 | Microsoft Corp | 181.92 | 72.31 | 
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
| AMZN | 3535 | Amazon.com Inc | 96.33 | 87.81 | 
Basic usage#
Suppose you just want the Amazon data. It is very easy to do this with the .loc method. Notice that with this function we use square brackets.
Side Note:
Anytime we select a single row (or a single column), we create something called a pandas Series. It is not a pandas DataFrame. It is more like a list.
amzn_data = small_df.loc['AMZN']
amzn_data
price                       3535
name              Amazon.com Inc
assets_current             96.33
liab_current               87.81
Name: AMZN, dtype: object
Rather that just getting AMZN, let’s grab MSFT and GOOG.
firm_list = ['MSFT', 'GOOG']
two_firms = small_df.loc[firm_list]
two_firms
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| MSFT | 200 | Microsoft Corp | 181.92 | 72.31 | 
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
Selecting ranges#
We can select a range of index values by using the colon :
For example if we want everything from MSFT to AMZN.
small_df.loc['MSFT':'AMZN']
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| MSFT | 200 | Microsoft Corp | 181.92 | 72.31 | 
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
| AMZN | 3535 | Amazon.com Inc | 96.33 | 87.81 | 
Columns with .loc#
We can also pass a second argument to .loc to specify the columns we want.
firm_list = ['MSFT', 'GOOG']
col_list = ['assets_current', 'liab_current']
two_firms = small_df.loc[ firm_list, col_list ]
two_firms
| assets_current | liab_current | |
|---|---|---|
| ticker | ||
| MSFT | 181.92 | 72.31 | 
| GOOG | 152.58 | 45.22 | 
col_list = ['assets_current', 'liab_current']
two_firms = small_df.loc[ 'AAPL':'GOOG', col_list ]
two_firms
| assets_current | liab_current | |
|---|---|---|
| ticker | ||
| AAPL | 162.82 | 105.72 | 
| MSFT | 181.92 | 72.31 | 
| GOOG | 152.58 | 45.22 | 
We can use .loc to select rows using the index. To select rows using columns that are not the index, see the Filters section below.
.loc with multi index#
For a DataFrame with a multi index, the process is similar. We can still select rows for AAPL since that is the first level of the index:
big_df.head()
| assets_current | liab_current | price | ||
|---|---|---|---|---|
| ticker | year | |||
| AAPL | 2013 | 73286.0 | 43658.0 | 476.75 | 
| 2014 | 68531.0 | 63448.0 | 100.75 | |
| AET | 2013 | 9718.9 | 12602.9 | 68.59 | 
| 2014 | 11764.0 | 15356.5 | 88.83 | |
| BA | 2013 | 65074.0 | 51486.0 | 136.49 | 
big_df.loc['AAPL']
| assets_current | liab_current | price | |
|---|---|---|---|
| year | |||
| 2013 | 73286.0 | 43658.0 | 476.75 | 
| 2014 | 68531.0 | 63448.0 | 100.75 | 
Like we did earlier, we can store this AAPL subset in a new DataFrame if we want to use it later:
aapl_df = big_df.loc['AAPL']
aapl_df
| assets_current | liab_current | price | |
|---|---|---|---|
| year | |||
| 2013 | 73286.0 | 43658.0 | 476.75 | 
| 2014 | 68531.0 | 63448.0 | 100.75 | 
We can also use ranges the same way.
big_df.loc[ 'BA':'DAL' ]
| assets_current | liab_current | price | ||
|---|---|---|---|---|
| ticker | year | |||
| BA | 2013 | 65074.000 | 51486.000 | 136.49 | 
| 2014 | 67785.000 | 56717.000 | 129.98 | |
| BUD | 2013 | 18690.000 | 25627.000 | 106.46 | 
| 2014 | 18541.000 | 27208.000 | 112.32 | |
| CAT | 2013 | 38335.000 | 27297.000 | 90.81 | 
| 2014 | 38867.000 | 27877.000 | 91.53 | |
| COKE | 2013 | 239.215 | 208.816 | 73.19 | 
| 2014 | 287.001 | 227.420 | 88.03 | |
| CVX | 2013 | 50250.000 | 33018.000 | 124.91 | 
| 2014 | 42232.000 | 31926.000 | 112.18 | |
| DAL | 2013 | 9651.000 | 14152.000 | 27.47 | 
| 2014 | 12465.000 | 16879.000 | 49.19 | 
But if we want to use deeper index levels in a multi index — e.g., data for AAPL in 2013 — then we can pass a tuple to .loc.
specific_row = ('AAPL', 2013)
big_df.loc[specific_row]
assets_current    73286.00
liab_current      43658.00
price               476.75
Name: (AAPL, 2013), dtype: float64
We can also pass a tuple of lists for multiple rows:
firm_list = ['BA', 'GM']
tuple_with_list = ( firm_list, 2013 )
big_df.loc[ tuple_with_list, : ]
| assets_current | liab_current | price | ||
|---|---|---|---|---|
| ticker | year | |||
| BA | 2013 | 65074.0 | 51486.0 | 136.49 | 
| GM | 2013 | 81501.0 | 62412.0 | 40.87 | 
Notice the difference between a list of tuples and a tuple of lists:
list_of_tuples = [
    ('BA', 2013),
    ('GM', 2014),
]
big_df.loc[list_of_tuples, :]
| assets_current | liab_current | price | ||
|---|---|---|---|---|
| ticker | year | |||
| BA | 2013 | 65074.0 | 51486.0 | 136.49 | 
| GM | 2014 | 83670.0 | 65701.0 | 34.91 | 
tuple_of_lists = (
    ['BA', 'GM'],
    [2013, 2014],
)
big_df.loc[tuple_of_lists, :]
| assets_current | liab_current | price | ||
|---|---|---|---|---|
| ticker | year | |||
| BA | 2013 | 65074.0 | 51486.0 | 136.49 | 
| 2014 | 67785.0 | 56717.0 | 129.98 | |
| GM | 2013 | 81501.0 | 62412.0 | 40.87 | 
| 2014 | 83670.0 | 65701.0 | 34.91 | 
But if we want to reference only the second index — year — we need the IndexSlice.
# we want all tickers, but just the year 2014
just_2014 = pd.IndexSlice[ :, 2014 ]
# pass this "slice" to .loc[], and also specify the columns
big_df.loc[ just_2014, : ]
| assets_current | liab_current | price | ||
|---|---|---|---|---|
| ticker | year | |||
| AAPL | 2014 | 68531.000 | 63448.000 | 100.75 | 
| AET | 2014 | 11764.000 | 15356.500 | 88.83 | 
| BA | 2014 | 67785.000 | 56717.000 | 129.98 | 
| BUD | 2014 | 18541.000 | 27208.000 | 112.32 | 
| CAT | 2014 | 38867.000 | 27877.000 | 91.53 | 
| COKE | 2014 | 287.001 | 227.420 | 88.03 | 
| CVX | 2014 | 42232.000 | 31926.000 | 112.18 | 
| DAL | 2014 | 12465.000 | 16879.000 | 49.19 | 
| DIS | 2014 | 15176.000 | 13292.000 | 89.03 | 
| DOW | 2014 | 24267.000 | 11593.000 | 45.61 | 
| DTV | 2014 | 8819.000 | 6959.000 | 86.70 | 
| GM | 2014 | 83670.000 | 65701.000 | 34.91 | 
| HOG | 2014 | 3948.095 | 2389.286 | 65.91 | 
| IBM | 2014 | 49422.000 | 39600.000 | 160.44 | 
| LMT | 2014 | 12329.000 | 11112.000 | 192.57 | 
| LUV | 2014 | 4404.000 | 5923.000 | 42.32 | 
| MSFT | 2014 | 114246.000 | 45625.000 | 41.70 | 
| PFE | 2014 | 57702.000 | 21631.000 | 31.15 | 
| TAP | 2014 | 1578.900 | 2325.300 | 74.52 | 
| TWC | 2014 | 2316.000 | 4497.000 | 152.06 | 
| VZ | 2014 | 29623.000 | 28064.000 | 46.78 | 
Important
When using multi indexes with .loc, you should always include the columns argument (even if you want to include all of the columns).
For example, do this:
big_df.loc[ ('AAPL', 2013), : ]
Instead of this:
big_df.loc[ ('AAPL', 2013) ]
Filters#
The .loc method is great for selecting rows using the DataFrame’s index. But in order to select rows based on values in the other columns (using conditional statements), we should use filters.
small_df
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| AAPL | 137 | Apple Inc | 162.82 | 105.72 | 
| MSFT | 200 | Microsoft Corp | 181.92 | 72.31 | 
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
| AMZN | 3535 | Amazon.com Inc | 96.33 | 87.81 | 
This is what a filter returns on its own. Notice, it is essentially just a list of True/False values.
price_filter = ( small_df['price'] > 500 )
price_filter
ticker
AAPL    False
MSFT    False
GOOG     True
AMZN     True
Name: price, dtype: bool
Rule of Thumb
When using conditionals with pandas DataFrames, it is best practice to always wrap conditional statements in parentheses.
Now let us index our DataFrame using that filter:
small_df[price_filter]
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
| AMZN | 3535 | Amazon.com Inc | 96.33 | 87.81 | 
.contains()#
Here is another filter that checks whether a string is contained in a certain column:
fancy_filter = ( small_df['name'].str.contains('Inc') )
small_df[fancy_filter]
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| AAPL | 137 | Apple Inc | 162.82 | 105.72 | 
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
| AMZN | 3535 | Amazon.com Inc | 96.33 | 87.81 | 
Here is another filter that combines multiple filters.
fancy_filter2 = ( ( small_df['name'].str.contains('Inc') ) & (small_df['price'] > 1000) )
small_df[fancy_filter2]
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
| AMZN | 3535 | Amazon.com Inc | 96.33 | 87.81 | 
Note
See here for a discussion on & versus and for logical expressions in pandas.
TLDR; Use & in pandas.
Suppose we have the following DataFrame which contains student data:
Note
None is a way of storing a missing or empty value in python.
student_data = {
  'student': ['981', '428', '762', '304', '698', '170'],
  'quiz1': [90, None, 80, 85, 75, 95],
  'major': ['FIN', 'FIN', 'ACT', None, 'FIN', 'MGMT']
}
student_df = pd.DataFrame(student_data)
student_df.set_index('student', inplace=True)
student_df
| quiz1 | major | |
|---|---|---|
| student | ||
| 981 | 90.0 | FIN | 
| 428 | NaN | FIN | 
| 762 | 80.0 | ACT | 
| 304 | 85.0 | None | 
| 698 | 75.0 | FIN | 
| 170 | 95.0 | MGMT | 
.between()#
If we want to select only the students that got 90-100 on the quiz, we can use the .between() method.
grade_filter = ( student_df['quiz1'].between(90, 100) )
student_df[grade_filter]
| quiz1 | major | |
|---|---|---|
| student | ||
| 981 | 90.0 | FIN | 
| 170 | 95.0 | MGMT | 
.isin()#
If we want only the students that are finance or accounting majors, we can use the .isin() method.
major_list = ['FIN', 'ACT']
major_filter = ( student_df['major'].isin(major_list) )
student_df[major_filter]
| quiz1 | major | |
|---|---|---|
| student | ||
| 981 | 90.0 | FIN | 
| 428 | NaN | FIN | 
| 762 | 80.0 | ACT | 
| 698 | 75.0 | FIN | 
.isna()#
If we want only the students for whom we do not have data about their major, we can use the .isna() method:
missing_filter1 = ( student_df['major'].isna() )
student_df[missing_filter1]
| quiz1 | major | |
|---|---|---|
| student | ||
| 304 | 85.0 | None | 
.notna()#
If we want only the students that are not missing data for their major or their quiz score, we can use the .notna() method:
missing_filter2 = ( student_df['major'].notna() & student_df['quiz1'].notna() )
student_df[missing_filter2]
| quiz1 | major | |
|---|---|---|
| student | ||
| 981 | 90.0 | FIN | 
| 762 | 80.0 | ACT | 
| 698 | 75.0 | FIN | 
| 170 | 95.0 | MGMT | 
The query method#
Another way to select rows is the .query() method. In many situations it is easier to implement than filters or .loc.
small_df
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| AAPL | 137 | Apple Inc | 162.82 | 105.72 | 
| MSFT | 200 | Microsoft Corp | 181.92 | 72.31 | 
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
| AMZN | 3535 | Amazon.com Inc | 96.33 | 87.81 | 
small_df.query('name == "Apple Inc"')
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| AAPL | 137 | Apple Inc | 162.82 | 105.72 | 
small_df.query('price > 1000')
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
| AMZN | 3535 | Amazon.com Inc | 96.33 | 87.81 | 
.query() also works with named indexes.
small_df.query('index == "GOOG"')
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
Or similarly, using the name of the index instead of index:
small_df.query('ticker == "GOOG"')
| price | name | assets_current | liab_current | |
|---|---|---|---|---|
| ticker | ||||
| GOOG | 1677 | Alphabet Inc | 152.58 | 45.22 | 
student_df
| quiz1 | major | |
|---|---|---|
| student | ||
| 981 | 90.0 | FIN | 
| 428 | NaN | FIN | 
| 762 | 80.0 | ACT | 
| 304 | 85.0 | None | 
| 698 | 75.0 | FIN | 
| 170 | 95.0 | MGMT | 
If we want to use functionality like .isin() or .isna() inside .query(), then we need to include one more option — engine='python'.
student_df.query("student.isin( ['428', '698'] )", engine='python')
| quiz1 | major | |
|---|---|---|
| student | ||
| 428 | NaN | FIN | 
| 698 | 75.0 | FIN | 
student_df.query('major.isna()', engine='python')
| quiz1 | major | |
|---|---|---|
| student | ||
| 304 | 85.0 | None | 
It is also possible to reference other/external variables in the query statement by using the @ character.
best_major = 'FIN'
student_df.query('major == @best_major')
| quiz1 | major | |
|---|---|---|
| student | ||
| 981 | 90.0 | FIN | 
| 428 | NaN | FIN | 
| 698 | 75.0 | FIN | 
A final example#
Here is an animal example (from the pandas documentation) of a DataFrame with three levels to the multi index. How would we reference the various elements?
d = {
  'num_legs': [4, 4, 2, 2, 2],
    'num_wings': [0, 0, 2, 2, 2],
    'category': ['mammal', 'mammal', 'mammal', 'bird', 'bird'],
    'animal': ['cat', 'dog', 'bat', 'penguin', 'robin'],
    'locomotion': ['walks', 'walks', 'flies', 'walks', 'flies'],
    'pet': [True, True, False, False, False],
}
df = pd.DataFrame(d)
df.set_index( ['category', 'animal', 'locomotion'], inplace=True )
df
| num_legs | num_wings | pet | |||
|---|---|---|---|---|---|
| category | animal | locomotion | |||
| mammal | cat | walks | 4 | 0 | True | 
| dog | walks | 4 | 0 | True | |
| bat | flies | 2 | 2 | False | |
| bird | penguin | walks | 2 | 2 | False | 
| robin | flies | 2 | 2 | False | 
Using .loc#
If we want just the mammals:
df.loc['mammal']
| num_legs | num_wings | pet | ||
|---|---|---|---|---|
| animal | locomotion | |||
| cat | walks | 4 | 0 | True | 
| dog | walks | 4 | 0 | True | 
| bat | flies | 2 | 2 | False | 
If we want just the animals that fly:
fly_slice = pd.IndexSlice[ :, :, 'flies' ]
df.loc[ fly_slice, :]
| num_legs | num_wings | pet | |||
|---|---|---|---|---|---|
| category | animal | locomotion | |||
| mammal | bat | flies | 2 | 2 | False | 
| bird | robin | flies | 2 | 2 | False | 
If we want just the dog and penguin:
animal_list = ['dog', 'penguin']
animal_slice = pd.IndexSlice[ :, animal_list ]
df.loc[ animal_slice, : ]
| num_legs | num_wings | pet | |||
|---|---|---|---|---|---|
| category | animal | locomotion | |||
| mammal | dog | walks | 4 | 0 | True | 
| bird | penguin | walks | 2 | 2 | False | 
Or, in one line (even though it is harder to read):
df.loc[ pd.IndexSlice[:, ['dog', 'penguin'] ], : ]
| num_legs | num_wings | pet | |||
|---|---|---|---|---|---|
| category | animal | locomotion | |||
| mammal | dog | walks | 4 | 0 | True | 
| bird | penguin | walks | 2 | 2 | False | 
If we want mammals that fly:
fly_mam_slice = pd.IndexSlice[ 'mammal', :, 'flies' ]
df.loc[ fly_mam_slice, : ]
| num_legs | num_wings | pet | |||
|---|---|---|---|---|---|
| category | animal | locomotion | |||
| mammal | bat | flies | 2 | 2 | False | 
Using .query#
See below for how to do the same things with the .query method.
df.query('category == "mammal"')
| num_legs | num_wings | pet | |||
|---|---|---|---|---|---|
| category | animal | locomotion | |||
| mammal | cat | walks | 4 | 0 | True | 
| dog | walks | 4 | 0 | True | |
| bat | flies | 2 | 2 | False | 
df.query('locomotion == "flies"')
| num_legs | num_wings | pet | |||
|---|---|---|---|---|---|
| category | animal | locomotion | |||
| mammal | bat | flies | 2 | 2 | False | 
| bird | robin | flies | 2 | 2 | False | 
df.query('animal.isin(["dog", "penguin"])', engine='python')
| num_legs | num_wings | pet | |||
|---|---|---|---|---|---|
| category | animal | locomotion | |||
| mammal | dog | walks | 4 | 0 | True | 
| bird | penguin | walks | 2 | 2 | False | 
df.query('category=="mammal" & locomotion=="flies"')
| num_legs | num_wings | pet | |||
|---|---|---|---|---|---|
| category | animal | locomotion | |||
| mammal | bat | flies | 2 | 2 | False |