vendredi 28 août 2015

Incorrect number of rows returned by JOIN query in Python script

I wrote the following code in Python that connects to DB, creates two tables and joins them. Then it prints the result of the JOIN query.

The problem is that the number of rows is 3, though I expected to get 2 rows. Also, if I run the same query in a command prompt using sqlite>, then the number of rows returned by the JOIN is correct, i.e. 2.

import sqlite3 as lite
import pandas as pd

# Connecting to the database. The `connect()` method returns a connection object.
con = lite.connect('getting_started.db')

with con:
    cur = con.cursor()
    cur.execute("DROP TABLE IF EXISTS cities")
    cur.execute("DROP TABLE IF EXISTS weather")

    cur.execute("CREATE TABLE cities (name text, state text)")
    cur.execute("CREATE TABLE weather (city text, year integer, warm_month text, cold_month text, average_high integer)")

    # Filling 'cities' with the data
    cur.execute("INSERT INTO cities VALUES('Washington', 'DC')")
    cur.execute("INSERT INTO cities VALUES('Houston', 'TX')")

    # Filling 'weather' with the data
    cur.execute("INSERT INTO weather VALUES('Washington', 2013, 'July', 'January', 59)")
    cur.execute("INSERT INTO weather VALUES('Houston', 2013, 'July', 'January', 62)")

    # Joining data together
    sql = "SELECT name, state, year, warm_month, cold_month FROM cities " \
          "INNER JOIN weather " \
          "ON name = city"
    cur.execute(sql)

rows = cur.fetchall()
cols = [desc[0] for desc in cur.description]

# Loading data into pandas
df = pd.DataFrame(rows, columns=cols)

for index, row in df.iterrows():
    print("City: {0}, The warmest month: {1}".format(row['name'],row['warm_month']))

The result is:

City: Washington, The warmest month: July
City: Washington, The warmest month: July
City: Houston, The warmest month: July

Aucun commentaire:

Enregistrer un commentaire