Reading and writing data
Contents
Reading and writing data#
import pandas as pd
Reading data#
Reading dictionaries#
We have already seen examples of reading data into pandas from a python dictionary.
big_data = {
'year': [2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014],
'ticker': ['AAPL', 'AAPL', 'AET', 'AET', 'BA', 'BA', 'BUD', 'BUD', 'CAT', 'CAT', 'COKE', 'COKE', 'CVX', 'CVX', 'DAL', 'DAL', 'DIS', 'DIS', 'DOW', 'DOW', 'DTV', 'DTV', 'GM', 'GM', 'HOG', 'HOG', 'IBM', 'IBM', 'LMT', 'LMT', 'LUV', 'LUV', 'MSFT', 'MSFT', 'PFE', 'PFE', 'TAP', 'TAP', 'TWC', 'TWC', 'VZ', 'VZ'],
'assets_current': [73286, 68531, 9718.9, 11764, 65074, 67785, 18690, 18541, 38335, 38867, 239.215, 287.001, 50250, 42232, 9651, 12465, 14109, 15176, 24977, 24267, 5953, 8819, 81501, 83670, 3988.803, 3948.095, 51350, 49422, 13329, 12329, 4456, 4404, 101466, 114246, 56244, 57702, 1537.7, 1578.9, 2144, 2316, 70994, 29623],
'liab_current': [43658, 63448, 12602.9, 15356.5, 51486, 56717, 25627, 27208, 27297, 27877, 208.816, 227.42, 33018, 31926, 14152, 16879, 11704, 13292, 11971, 11593, 6530, 6959, 62412, 65701, 2509.586, 2389.286, 40154, 39600, 11120, 11112, 5676, 5923, 37417, 45625, 23366, 21631, 2142.1, 2325.3, 5226, 4497, 27050, 28064],
'price': [476.75, 100.75, 68.59, 88.83, 136.49, 129.98, 106.46, 112.32, 90.81, 91.53, 73.19, 88.03, 124.91, 112.18, 27.47, 49.19, 64.49, 89.03, 44.4, 45.61, 69.06, 86.7, 40.87, 34.91, 69.24, 65.91, 187.57, 160.44, 148.66, 192.57, 18.84, 42.32, 34.545, 41.7, 30.63, 31.15, 56.15, 74.52, 135.5, 152.06, 49.14, 46.78],
}
big_df = pd.DataFrame(big_data)
index_cols = ['ticker', 'year']
big_df.set_index(index_cols, inplace=True)
big_df.head()
assets_current | liab_current | price | ||
---|---|---|---|---|
ticker | year | |||
AAPL | 2013 | 73286.0 | 43658.0 | 476.75 |
2014 | 68531.0 | 63448.0 | 100.75 | |
AET | 2013 | 9718.9 | 12602.9 | 68.59 |
2014 | 11764.0 | 15356.5 | 88.83 | |
BA | 2013 | 65074.0 | 51486.0 | 136.49 |
Reading CSV files#
You can read data from CSV files by using the read_csv()
method from pandas.
You can import data directly from the web if you know the URL. Here is some airline data from GitHub.
airline_link = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/airline-safety/airline-safety.csv'
df_airlines = pd.read_csv(airline_link)
df_airlines.tail()
airline | avail_seat_km_per_week | incidents_85_99 | fatal_accidents_85_99 | fatalities_85_99 | incidents_00_14 | fatal_accidents_00_14 | fatalities_00_14 | |
---|---|---|---|---|---|---|---|---|
51 | United / Continental* | 7139291291 | 19 | 8 | 319 | 14 | 2 | 109 |
52 | US Airways / America West* | 2455687887 | 16 | 7 | 224 | 11 | 2 | 23 |
53 | Vietnam Airlines | 625084918 | 7 | 3 | 171 | 1 | 0 | 0 |
54 | Virgin Atlantic | 1005248585 | 1 | 0 | 0 | 0 | 0 | 0 |
55 | Xiamen Airlines | 430462962 | 9 | 1 | 82 | 2 | 0 | 0 |
Here is another example using data on traffic fatalities.
fatalities_link = 'https://vincentarelbundock.github.io/Rdatasets/csv/AER/Fatalities.csv'
df_fatal = pd.read_csv(
fatalities_link,
usecols=['beertax', 'drinkage', 'miles', 'fatal', 'state', 'year', 'pop'],
index_col=['state', 'year'],
)
df_fatal
beertax | drinkage | miles | fatal | pop | ||
---|---|---|---|---|---|---|
state | year | |||||
al | 1982 | 1.539379 | 19.00 | 7233.887207 | 839 | 3.942002e+06 |
1983 | 1.788991 | 19.00 | 7836.347656 | 930 | 3.960008e+06 | |
1984 | 1.714286 | 19.00 | 8262.990234 | 932 | 3.988992e+06 | |
1985 | 1.652542 | 19.67 | 8726.916992 | 882 | 4.021008e+06 | |
1986 | 1.609907 | 21.00 | 8952.853516 | 1081 | 4.049994e+06 | |
... | ... | ... | ... | ... | ... | ... |
wy | 1984 | 0.049451 | 19.00 | 9994.155273 | 157 | 5.129998e+05 |
1985 | 0.047669 | 19.00 | 10611.010742 | 152 | 5.089996e+05 | |
1986 | 0.046440 | 19.00 | 10619.331055 | 168 | 5.069999e+05 | |
1987 | 0.045000 | 19.00 | 10953.049805 | 129 | 4.900005e+05 | |
1988 | 0.043311 | 19.50 | 11812.115234 | 155 | 4.789997e+05 |
336 rows × 5 columns
You can do the same thing with data that is stored locally on your computer, but instead of passing the URL, you need to pass the file’s location/path.
A Mac (or Linux) path to a CSV file will look something like this: /Users/tyson/Desktop/my_file.csv
.
A Windows path to a CSV file will look something like this: C:\Users\tyson\Desktop\my_file.csv
. However, the backslash character — \
— is problematic is many programming languages. Instead, you should use forward slashes like this: C:/Users/tyson/Desktop/my_file.csv
.
This is how you could load a local file instead of one from a website.
Warning
The following code will not work on your computer. You will need to provide your own unique path to the data on your machine.
other_data_path = '/Users/tyson/Desktop/other_data.csv'
df_other = pd.read_csv(other_data_path)
Writing data#
Writing CSV files#
Suppose you wanted to save a version of the airlines data to your computer. You could use pandas to write it to a CSV file.
Warning
Again, this will not work on your machine since you will have a different path.
out_file_name = 'C:/Users/tyson/airline_data.csv'
df_airlines.to_csv(out_file_name, index=False)
Notice that I also included index=False
since I don’t care about the arbitrary numbers that were created for the index when I imported it.