Create , insert, read, modify, drop data from POSTGRESQL db using python
I have supplied here, some of the code that i used to interact with a db for a carpooling service.
So after creating the DB and the algorithm, i wanted to test my algorithm against live fuzzy data, so i created some functions that removed data from the DB, added new random data while respecting some rules.
Here we explore the rules that i saw fit for the website.
Here we explore the rules that i saw fit for the website.
def create_tables():
statements = [['DROP TABLE recommendations'],
['DROP TABLE offer CASCADE'],
['DROP TABLE request CASCADE'],
['DROP TABLE event CASCADE'],
['CREATE TABLE "event" (id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), lat NUMERIC NOT NULL, lon NUMERIC NOT NULL, event_time TIMESTAMP NOT NULL)'],
['CREATE TABLE "offer" (id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), lat NUMERIC NOT NULL, lon NUMERIC NOT NULL,capacity NUMERIC NOT NULL, event_id UUID REFERENCES event)']
['CREATE TABLE "request" (id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), lat NUMERIC NOT NULL, lon NUMERIC NOT NULL, event_id UUID REFERENCES event)']]
statements = [['DROP TABLE recommendations'],
['DROP TABLE offer CASCADE'],
['DROP TABLE request CASCADE'],
['DROP TABLE event CASCADE'],
['CREATE TABLE "event" (id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), lat NUMERIC NOT NULL, lon NUMERIC NOT NULL, event_time TIMESTAMP NOT NULL)'],
['CREATE TABLE "offer" (id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), lat NUMERIC NOT NULL, lon NUMERIC NOT NULL,capacity NUMERIC NOT NULL, event_id UUID REFERENCES event)']
['CREATE TABLE "request" (id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), lat NUMERIC NOT NULL, lon NUMERIC NOT NULL, event_id UUID REFERENCES event)']]
Examining the statements one by one,
DROP TABLE means remove that table, if any other table references this table then it will return an error.
DROP TABLE event CASCADE means remove the table event and ignore any references to it.
CREATE TABLE "event" means create a table named event and notice the double quotes.
id UUID PRIMARY KEY DEFAULT uuid_generate_v4() means create a column named id of type UUID, PRIMARY KEY means each id is unique and there mustn't be duplicate ids, DEFAULT uuid_generate_v4() means if no value is supplied then create a new id automatically using this function uuid_generate_v4().
lat NUMERIC NOT NULL means create a column named lat that containes number and it can't be null.
event_id UUID REFERENCES event means create a column named event_id that holds a uuid value and that value must exist in the column that has the property "primary key" in the table event.
DROP TABLE means remove that table, if any other table references this table then it will return an error.
DROP TABLE event CASCADE means remove the table event and ignore any references to it.
CREATE TABLE "event" means create a table named event and notice the double quotes.
id UUID PRIMARY KEY DEFAULT uuid_generate_v4() means create a column named id of type UUID, PRIMARY KEY means each id is unique and there mustn't be duplicate ids, DEFAULT uuid_generate_v4() means if no value is supplied then create a new id automatically using this function uuid_generate_v4().
lat NUMERIC NOT NULL means create a column named lat that containes number and it can't be null.
event_id UUID REFERENCES event means create a column named event_id that holds a uuid value and that value must exist in the column that has the property "primary key" in the table event.
# postgresql://Algoguy:Ilovealgorithms@127.0.0.1/carpool
HOSTNAME = 127.0.0.1
USERNAME = Algoguy
PASSWORD = Ilovealgorithms
DATABASE = carpool
with psycopg2.connect(host = HOSTNAME,
user = USERNAME,
password = PASSWORD,
dbname = DATABASE) as conn:
with conn.cursor() as cur:
for statement in statements:
cur.execute(statement[0])
Then we execute the statements one by one using the cursor.
Now that we have created our tables, we can insert data into the tables
Lets start by inserting random events.
First we generate IDs such that there are no duplicate ones.
import uuid
def getID(no):
ID_list = []
for i in range(no):
ID = str(uuid.uuid4())
while ID in ID_list: # generate a new ID if the generated one exists
ID = str(uuid.uuid4())
ID_list.append(ID)
return ID_list
import io
import sys
import numpy as np
import datetime
Event_ID = getID(Events_no)
Event_lat = np.random.uniform(51.3, 51.7,
(1,Events_no))[0]
Event_lon = np.random.uniform(0.61, 0.37,
(1,Events_no))[0]
Event_time = np.fromfunction(np.vectorize(lambda i,j:(datetime.datetime.now() + datetime.timedelta(days=0)).strftime('%Y-%m-%d %H%M%S')),(1,Events_no))
(1, Events_no) is the desired array shape such that its only one row holding a number of "Events_no" random values .
[0] chooses the first row of the array which is the only row that exists and the one we need.
np.fromfunction uses a function to create each value within the array
np.vectorize returns an array aware function, for example:
def dumb_func(a):
return a+1
dumb_func([1,2,3,4,5]) will return an error as it expects a number not an array
if we do,
dumb_vect = np.vectorize(dumb_func) then try
dumb_vect([1,2,3,4,5]) it will return [2,3,4,5,6]
lambda i,j: creates a function that takes i,j as input, basically np.fromfunction passes two values to the function: i which is the row number and j which is the column number, we don't need them but our function must take them as input.
datetime.datetime.now() + datetime.timedelta(days=0)).strftime('%Y-%m-%d %H%M%S') is the function that returns the date that we wish, i didn't need datetime.timedelta as i just created events that will happen now but i included it in case you wanted to make that a variable as well, strftime controls the format, for my case it was, %Y year, %m month, %d day, %H hour, %M minute, %S second.
(1, Events_no) is the desired array shape.
("{}\t{}\t{}\t{}\t{}".format(Event_ID[i], Event_lat[i], Event_lon[i], Event_time[i]) for i in range(Events_no)) creates a generator object which can be iterated over to print all the values, "\t" is a tab space.
Next we have two methods to write to the db,
Method 1 which is extremely slow
with psycopg2.connect(host=HOSTNAME,
user=USERNAME,
password=PASSWORD,
dbname=DATABASE) as conn:
with conn.cursor() as cur:
for ID, lat, lon, time in zip(Event_ID, Event_lat, Event_lon, Event_time):
cur.execute("INSERT INTO event (id, lat, lon, event_time) VALUES('{}','{}','{}','{}')".format(ID, lat, lon, time))
Basically we iterate over the values and insert them in rows one by one,user=USERNAME,
password=PASSWORD,
dbname=DATABASE) as conn:
with conn.cursor() as cur:
for ID, lat, lon, time in zip(Event_ID, Event_lat, Event_lon, Event_time):
cur.execute("INSERT INTO event (id, lat, lon, event_time) VALUES('{}','{}','{}','{}')".format(ID, lat, lon, time))
Or Method 2 which is much faster
with psycopg2.connect(host = HOSTNAME,
user = USERNAME,
password = PASSWORD,
dbname = DATABASE) as conn:
with conn.cursor() as cur:
cur.copy_from(f, # input 'event', # table name
columns =('id','lat','lon','event_time'))
https://gist.github.com/jsheedy/ed81cdf18190183b3b7d
return a file like object for reading those strings """
Next we insert data into the offers table which are the people offering rides along the way.
f = IteratorFile(("{}\t{}\t{}".format(Event_ID[i], Event_lat[i], Event_lon[i], Event_time[i]) for i in range(Events_no)))
class IteratorFile(io.TextIOBase):
""" given an iterator which yields strings,return a file like object for reading those strings """
def __init__(self, it):
self._it = it
self._f = io.StringIO()
def read(self, length=sys.maxsize):
try:
while self._f.tell() < length:
self._f.write(next(self._it) + "\n")
except StopIteration as e:
# soak up StopIteration. this block is not necessary because of finally, but just to be explicit
pass
except Exception as e:
print("uncaught exception: {}".format(e))
finally:
self._f.seek(0)
data = self._f.read(length)
# save the remainder for next read
remainder = self._f.read()
self._f.seek(0)
self._f.truncate(0)
self._f.write(remainder)
return data
def readline(self):
return next(self._it)
offers_no = 2000
offer_ID = getID(offers_no)
offer_lat = [randint(512,518)/10 for i in range(offers_no)] # this is a different method to create random numbers between 51.2 and 51.8
offer_lon = [randint(-62,38)/100 for i in range(offers_no)]
offer_capacity = [randint(1,2) for i in range(offers_no)]
offer_evID = [Event_ID[randint(0,Events_no-1)] for i in range(offers_no)] #choose a random event id from with the Event_ID list
f = IteratorFile(("{}\t{}\t{}\t{}\t{}".format(offer_ID[i], offer_lat[i], offer_lon[i], offer_capacity[i], offer_evID[i]) for i in range(oL)))
with psycopg2.connect(host=HOSTNAME,
user=USERNAME,
password=PASSWORD,
dbname=DATABASE) as conn:
with conn.cursor() as cur:
cur.copy_from(f,
'offer',
columns=('id','lat','lon','capacity','event_id'))
requests_no = 2000
request_ID = getID(requests_no)
request_lat = [randint(512,518)/10 for i in range(requests_no)]
request_lon = [randint(-62,38)/100 for i in range(requests_no)]
request_evID = np.fromfunction(np.vectorize(lambda i,j:np.random.choice(Event_ID)),(1,Events_no))[0] # a different method to choose random values from Event_id and returns a numpy array of the shape (1,Events_no)
for ID, lat, lon, trID, evID in zip(request_ID,request_lat,request_lon,request_tripID,request_evID):
with psycopg2.connect(host = HOSTNAME,
user = USERNAME,
password = PASSWORD,
dbname = DATABASE) as conn:
with conn.cursor() as cur: # create a named server-side cursor
cur.execute("INSERT INTO request (id,lat,lon,trip_id,event_id) VALUES('{}',{},{},'{}','{}')".format(ID,lat,lon,trID,evID))
Comments
Post a Comment