Sqlite#

Useful resources#

Example database#

For this tutorial, you first need to download the example database. This database contains NBA data. Save the file as nba.sqlite in the same directory as your code.

Importing the packages#

import sqlite3
import pathlib
import pandas as pd
db_file = pathlib.Path('__file__').parent / 'nba.sqlite'
db_file.exists()
True

Connecting to a database#

First, we need to connect to the database. This is done using the sqlite3.connect() function. This function returns a connection object, which is used to interact with the database.

We then need to create a cursor object. This is used to execute SQL commands.

Once we have the cursor object, we can execute SQL commands using the execute() method. We just use it here to list the tables in the database.

Lastly, we close the cursor and the connection.

# connect to the database
con = sqlite3.connect(db_file)

# create a cursor object
# this is used to execute SQL commands
cur = con.cursor()

# the SQL command to list the tables in the database
sql_command = '''
SELECT name
FROM sqlite_master
WHERE type='table';
'''

# execute the SQL command, fetch all the results, and put them into the "tables" variable
tables = cur.execute(sql_command).fetchall()

# loop over the tables and print them
for table in tables:
    print(table)

# close the connections
cur.close()
con.close()
('game',)
('game_summary',)
('other_stats',)
('officials',)
('inactive_players',)
('game_info',)
('line_score',)
('play_by_play',)
('player',)
('team',)
('common_player_info',)
('team_details',)
('team_history',)
('draft_combine_stats',)
('draft_history',)
('team_info_common',)

Load table into a pandas DataFrame#

We can load a table from the database into a pandas DataFrame using the pd.read_sql_query() function. This function takes a SQL query and a connection object as arguments. It returns a DataFrame.

# connect to the database
con = sqlite3.connect(db_file)

# load the table into a pandas DataFrame
df = pd.read_sql_query('SELECT * FROM player', con)

# close the connection
con.close()

# show the first few rows of the DataFrame
df.head(10)
id full_name first_name last_name is_active
0 76001 Alaa Abdelnaby Alaa Abdelnaby 0
1 76002 Zaid Abdul-Aziz Zaid Abdul-Aziz 0
2 76003 Kareem Abdul-Jabbar Kareem Abdul-Jabbar 0
3 51 Mahmoud Abdul-Rauf Mahmoud Abdul-Rauf 0
4 1505 Tariq Abdul-Wahad Tariq Abdul-Wahad 0
5 949 Shareef Abdur-Rahim Shareef Abdur-Rahim 0
6 76005 Tom Abernethy Tom Abernethy 0
7 76006 Forest Able Forest Able 0
8 76007 John Abramovic John Abramovic 0
9 203518 Alex Abrines Alex Abrines 0