Reading and writing data#

import pandas as pd

Reading data#

Reading dictionaries#

We have already seen examples of reading data into pandas from a python dictionary.

big_data = {
    'year': [2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014],
    'ticker': ['AAPL', 'AAPL', 'AET', 'AET', 'BA', 'BA', 'BUD', 'BUD', 'CAT', 'CAT', 'COKE', 'COKE', 'CVX', 'CVX', 'DAL', 'DAL', 'DIS', 'DIS', 'DOW', 'DOW', 'DTV', 'DTV', 'GM', 'GM', 'HOG', 'HOG', 'IBM', 'IBM', 'LMT', 'LMT', 'LUV', 'LUV', 'MSFT', 'MSFT', 'PFE', 'PFE', 'TAP', 'TAP', 'TWC', 'TWC', 'VZ', 'VZ'],
    'assets_current': [73286, 68531, 9718.9, 11764, 65074, 67785, 18690, 18541, 38335, 38867, 239.215, 287.001, 50250, 42232, 9651, 12465, 14109, 15176, 24977, 24267, 5953, 8819, 81501, 83670, 3988.803, 3948.095, 51350, 49422, 13329, 12329, 4456, 4404, 101466, 114246, 56244, 57702, 1537.7, 1578.9, 2144, 2316, 70994, 29623],
    'liab_current': [43658, 63448, 12602.9, 15356.5, 51486, 56717, 25627, 27208, 27297, 27877, 208.816, 227.42, 33018, 31926, 14152, 16879, 11704, 13292, 11971, 11593, 6530, 6959, 62412, 65701, 2509.586, 2389.286, 40154, 39600, 11120, 11112, 5676, 5923, 37417, 45625, 23366, 21631, 2142.1, 2325.3, 5226, 4497, 27050, 28064],
    'price': [476.75, 100.75, 68.59, 88.83, 136.49, 129.98, 106.46, 112.32, 90.81, 91.53, 73.19, 88.03, 124.91, 112.18, 27.47, 49.19, 64.49, 89.03, 44.4, 45.61, 69.06, 86.7, 40.87, 34.91, 69.24, 65.91, 187.57, 160.44, 148.66, 192.57, 18.84, 42.32, 34.545, 41.7, 30.63, 31.15, 56.15, 74.52, 135.5, 152.06, 49.14, 46.78],
}

big_df = pd.DataFrame(big_data)

index_cols = ['ticker', 'year']
big_df.set_index(index_cols, inplace=True)
big_df.head()
assets_current liab_current price
ticker year
AAPL 2013 73286.0 43658.0 476.75
2014 68531.0 63448.0 100.75
AET 2013 9718.9 12602.9 68.59
2014 11764.0 15356.5 88.83
BA 2013 65074.0 51486.0 136.49

Reading CSV files#

You can read data from CSV files by using the read_csv() method from pandas.

You can import data directly from the web if you know the URL. Here is some airline data from GitHub.

airline_link = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/airline-safety/airline-safety.csv'
df_airlines = pd.read_csv(airline_link)
df_airlines.tail()
airline avail_seat_km_per_week incidents_85_99 fatal_accidents_85_99 fatalities_85_99 incidents_00_14 fatal_accidents_00_14 fatalities_00_14
51 United / Continental* 7139291291 19 8 319 14 2 109
52 US Airways / America West* 2455687887 16 7 224 11 2 23
53 Vietnam Airlines 625084918 7 3 171 1 0 0
54 Virgin Atlantic 1005248585 1 0 0 0 0 0
55 Xiamen Airlines 430462962 9 1 82 2 0 0

Here is another example using data on traffic fatalities.

fatalities_link = 'https://vincentarelbundock.github.io/Rdatasets/csv/AER/Fatalities.csv'
df_fatal = pd.read_csv(
  fatalities_link,
  usecols=['beertax', 'drinkage', 'miles', 'fatal', 'state', 'year', 'pop'],
  index_col=['state', 'year'],
)
df_fatal
beertax drinkage miles fatal pop
state year
al 1982 1.539379 19.00 7233.887207 839 3.942002e+06
1983 1.788991 19.00 7836.347656 930 3.960008e+06
1984 1.714286 19.00 8262.990234 932 3.988992e+06
1985 1.652542 19.67 8726.916992 882 4.021008e+06
1986 1.609907 21.00 8952.853516 1081 4.049994e+06
... ... ... ... ... ... ...
wy 1984 0.049451 19.00 9994.155273 157 5.129998e+05
1985 0.047669 19.00 10611.010742 152 5.089996e+05
1986 0.046440 19.00 10619.331055 168 5.069999e+05
1987 0.045000 19.00 10953.049805 129 4.900005e+05
1988 0.043311 19.50 11812.115234 155 4.789997e+05

336 rows × 5 columns

You can do the same thing with data that is stored locally on your computer, but instead of passing the URL, you need to pass the file’s location/path.

A Mac (or Linux) path to a CSV file will look something like this: /Users/tyson/Desktop/my_file.csv.

A Windows path to a CSV file will look something like this: C:\Users\tyson\Desktop\my_file.csv. However, the backslash character — \ — is problematic is many programming languages. Instead, you should use forward slashes like this: C:/Users/tyson/Desktop/my_file.csv.

This is how you could load a local file instead of one from a website.

Warning

The following code will not work on your computer. You will need to provide your own unique path to the data on your machine.

other_data_path = '/Users/tyson/Desktop/other_data.csv'
df_other = pd.read_csv(other_data_path)

Writing data#

Writing CSV files#

Suppose you wanted to save a version of the airlines data to your computer. You could use pandas to write it to a CSV file.

Warning

Again, this will not work on your machine since you will have a different path.

out_file_name = 'C:/Users/tyson/airline_data.csv'
df_airlines.to_csv(out_file_name, index=False)

Notice that I also included index=False since I don’t care about the arbitrary numbers that were created for the index when I imported it.

Public data examples#