Reading and writing data

Reading and writing data#

import pandas as pd

Reading data#

Reading dictionaries#

We have already seen examples of reading data into pandas from a python dictionary.

big_data = {
    'year': [2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014, 2013, 2014],
    'ticker': ['AAPL', 'AAPL', 'AET', 'AET', 'BA', 'BA', 'BUD', 'BUD', 'CAT', 'CAT', 'COKE', 'COKE', 'CVX', 'CVX', 'DAL', 'DAL', 'DIS', 'DIS', 'DOW', 'DOW', 'DTV', 'DTV', 'GM', 'GM', 'HOG', 'HOG', 'IBM', 'IBM', 'LMT', 'LMT', 'LUV', 'LUV', 'MSFT', 'MSFT', 'PFE', 'PFE', 'TAP', 'TAP', 'TWC', 'TWC', 'VZ', 'VZ'],
    'assets_current': [73286, 68531, 9718.9, 11764, 65074, 67785, 18690, 18541, 38335, 38867, 239.215, 287.001, 50250, 42232, 9651, 12465, 14109, 15176, 24977, 24267, 5953, 8819, 81501, 83670, 3988.803, 3948.095, 51350, 49422, 13329, 12329, 4456, 4404, 101466, 114246, 56244, 57702, 1537.7, 1578.9, 2144, 2316, 70994, 29623],
    'liab_current': [43658, 63448, 12602.9, 15356.5, 51486, 56717, 25627, 27208, 27297, 27877, 208.816, 227.42, 33018, 31926, 14152, 16879, 11704, 13292, 11971, 11593, 6530, 6959, 62412, 65701, 2509.586, 2389.286, 40154, 39600, 11120, 11112, 5676, 5923, 37417, 45625, 23366, 21631, 2142.1, 2325.3, 5226, 4497, 27050, 28064],
    'price': [476.75, 100.75, 68.59, 88.83, 136.49, 129.98, 106.46, 112.32, 90.81, 91.53, 73.19, 88.03, 124.91, 112.18, 27.47, 49.19, 64.49, 89.03, 44.4, 45.61, 69.06, 86.7, 40.87, 34.91, 69.24, 65.91, 187.57, 160.44, 148.66, 192.57, 18.84, 42.32, 34.545, 41.7, 30.63, 31.15, 56.15, 74.52, 135.5, 152.06, 49.14, 46.78],
}

big_df = pd.DataFrame(big_data)

index_cols = ['ticker', 'year']
big_df.set_index(index_cols, inplace=True)
big_df.head()

		assets_current	liab_current	price
ticker	year
AAPL	2013	73286.0	43658.0	476.75
AAPL	2014	68531.0	63448.0	100.75
AET	2013	9718.9	12602.9	68.59
AET	2014	11764.0	15356.5	88.83
BA	2013	65074.0	51486.0	136.49

Reading CSV files#

You can read data from CSV files by using the read_csv() method from pandas.

You can import data directly from the web if you know the URL. Here is some airline data from GitHub.

airline_link = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/airline-safety/airline-safety.csv'
df_airlines = pd.read_csv(airline_link)
df_airlines.tail()

	airline	avail_seat_km_per_week	incidents_85_99	fatal_accidents_85_99	fatalities_85_99	incidents_00_14	fatal_accidents_00_14	fatalities_00_14
51	United / Continental*	7139291291	19	8	319	14	2	109
52	US Airways / America West*	2455687887	16	7	224	11	2	23
53	Vietnam Airlines	625084918	7	3	171	1	0	0
54	Virgin Atlantic	1005248585	1	0	0	0	0	0
55	Xiamen Airlines	430462962	9	1	82	2	0	0

Here is another example using data on traffic fatalities.

fatalities_link = 'https://vincentarelbundock.github.io/Rdatasets/csv/AER/Fatalities.csv'
df_fatal = pd.read_csv(
  fatalities_link,
  usecols=['beertax', 'drinkage', 'miles', 'fatal', 'state', 'year', 'pop'],
  index_col=['state', 'year'],
)
df_fatal

		beertax	drinkage	miles	fatal	pop
state	year
al	1982	1.539379	19.00	7233.887207	839	3.942002e+06
	1983	1.788991	19.00	7836.347656	930	3.960008e+06
	1984	1.714286	19.00	8262.990234	932	3.988992e+06
	1985	1.652542	19.67	8726.916992	882	4.021008e+06
	1986	1.609907	21.00	8952.853516	1081	4.049994e+06
...	...	...	...	...	...	...
wy	1984	0.049451	19.00	9994.155273	157	5.129998e+05
	1985	0.047669	19.00	10611.010742	152	5.089996e+05
	1986	0.046440	19.00	10619.331055	168	5.069999e+05
	1987	0.045000	19.00	10953.049805	129	4.900005e+05
	1988	0.043311	19.50	11812.115234	155	4.789997e+05

336 rows × 5 columns

You can do the same thing with data that is stored locally on your computer, but instead of passing the URL, you need to pass the file’s location/path.

A Mac (or Linux) path to a CSV file will look something like this: /Users/tyson/Desktop/my_file.csv.

A Windows path to a CSV file will look something like this: C:\Users\tyson\Desktop\my_file.csv. However, the backslash character — \ — is problematic is many programming languages. Instead, you should use forward slashes like this: C:/Users/tyson/Desktop/my_file.csv.

This is how you could load a local file instead of one from a website.

Warning

The following code will not work on your computer. You will need to provide your own unique path to the data on your machine.

other_data_path = '/Users/tyson/Desktop/other_data.csv'
df_other = pd.read_csv(other_data_path)

Writing data#

Writing CSV files#

Suppose you wanted to save a version of the airlines data to your computer. You could use pandas to write it to a CSV file.

Warning

Again, this will not work on your machine since you will have a different path.

out_file_name = 'C:/Users/tyson/airline_data.csv'
df_airlines.to_csv(out_file_name, index=False)

Notice that I also included index=False since I don’t care about the arbitrary numbers that were created for the index when I imported it.

Reading and writing data

Contents