Create , insert, read, modify, drop data from POSTGRESQL db using python

I have supplied here, some of the code that i used to interact with a db for a carpooling service.
So after creating the DB and the algorithm, i wanted to test my algorithm against live fuzzy data, so i created some functions that removed data from the DB, added new random data while respecting some rules.
Here we explore the rules that i saw fit for the website.

def create_tables():
           statements = [['DROP TABLE recommendations'],
                                 ['DROP TABLE offer CASCADE'],
                                 ['DROP TABLE request CASCADE'],
                                 ['DROP TABLE event CASCADE'],
                                 ['CREATE TABLE "event" (id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), lat NUMERIC NOT NULL, lon NUMERIC NOT NULL, event_time TIMESTAMP NOT NULL)'],
                                 ['CREATE TABLE "offer" (id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), lat NUMERIC NOT NULL, lon NUMERIC NOT NULL,capacity NUMERIC NOT NULL, event_id UUID REFERENCES event)']
                                 ['CREATE TABLE "request" (id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), lat NUMERIC NOT NULL, lon NUMERIC NOT NULL, event_id UUID REFERENCES event)']]

            
Examining the statements one by one,
DROP TABLE means remove that table, if any other table references this table then it will return an error.
DROP TABLE event CASCADE means remove the table event and ignore any references to it.
CREATE TABLE "event"  means create a table named event and notice the double quotes.
id UUID PRIMARY KEY DEFAULT uuid_generate_v4()  means create a column named id of type UUID, PRIMARY KEY means each id is unique and there mustn't be duplicate ids, DEFAULT uuid_generate_v4() means if no value is supplied then create a new id automatically using this function uuid_generate_v4().

lat NUMERIC NOT NULL  means create a column named lat that containes number and it can't be null.

event_id UUID REFERENCES event means create a column named event_id that holds a uuid value and that value must exist in the column that has the property "primary key" in the table event.

# postgresql://Algoguy:Ilovealgorithms@127.0.0.1/carpool

HOSTNAME = 127.0.0.1
USERNAME = Algoguy
PASSWORD  = Ilovealgorithms
DATABASE   = carpool

with psycopg2.connect(host          =  HOSTNAME,
                                      user          =  USERNAME,
                                      password  =  PASSWORD,
                                      dbname     = DATABASE) as conn:
          with conn.cursor() as cur:
                     for statement in statements:
                                 cur.execute(statement[0])
Now we use psycopg2 which can be installed through pip3 install psycopg2  to interact with the database, we open a database cursor using with conn.cursor() as cur , we can open it this way as well cur = conn.cursor() but using "with" ensures that the connection is terminated in the right way even if an error occured on our side, as opposed to us closing it on our own which we can easily forget.
Then we execute the statements one by one using the cursor.

Now that we have created our tables, we can insert data into the tables
Lets start by inserting random events.
First we generate IDs such that there are no duplicate ones.


import uuid

def getID(no):
     ID_list = []
     for i in range(no):
          ID = str(uuid.uuid4())
          while ID in ID_list:      # generate a new ID if the generated one exists
                    ID = str(uuid.uuid4())
          ID_list.append(ID)
     return ID_list

Next we start generating and inserting the random events


import io
import sys
import numpy as np 
import datetime

Events_no   = 100                                   #number of events that we wish to insert
Event_ID    = getID(Events_no)
Event_lat    = np.random.uniform(51.3, 51.7,
                                                     (1,Events_no))[0]
Event_lon   = np.random.uniform(0.61, 0.37,
                                                     (1,Events_no))[0]
Event_time = np.fromfunction(np.vectorize(lambda i,j:(datetime.datetime.now() + datetime.timedelta(days=0)).strftime('%Y-%m-%d %H%M%S')),(1,Events_no))

np.random.uniform(51.3,51.7) generates a numpy array having random values between 51.3 and 51.7 (location in the UK) 
(1, Events_no) is the desired array shape such that its 
only one row holding a number of "Events_no" random values .
[0] chooses the first row of the array which is the only row that exists and the one we need.
np.fromfunction uses a function to create each value within the array
np.vectorize returns an array aware function, for example:

def dumb_func(a):

      return a+1

dumb_func([1,2,3,4,5]) will return an error as it expects a number not an array
if we do,
dumb_vect = np.vectorize(dumb_func) then try
dumb_vect([1,2,3,4,5]) it will return [2,3,4,5,6]

lambda i,j:  creates a function that takes i,j as input, basically np.fromfunction passes two values to the function: i which is the row number and j which is the column number, we don't need them but our function must take them as input.
datetime.datetime.now() + datetime.timedelta(days=0)).strftime('%Y-%m-%d %H%M%S')    is the function that returns the date that we wish, i didn't need datetime.timedelta as i just created events that will happen now but i included it in case you wanted to make that a variable as well, strftime controls the format, for my case it was, %Y year, %m month, %d day, %H hour, %M minute, %S second.
(1, Events_no) is the desired array shape.
("{}\t{}\t{}\t{}\t{}".format(Event_ID[i],  Event_lat[i], Event_lon[i], Event_time[i]) for i in range(Events_no)) creates a generator object which can be iterated over to print all the values, "\t" is a tab space.

Next we have two methods to write to the db, 
Method 1 which is extremely slow
with psycopg2.connect(host=HOSTNAME,
                                      user=USERNAME,
                                      password=PASSWORD,
                                      dbname=DATABASE) as conn:
                   with conn.cursor() as cur:
                              for ID, lat, lon, time in zip(Event_ID, Event_lat, Event_lon, Event_time):
                                     cur.execute("INSERT INTO event (id, lat, lon, event_time) VALUES('{}','{}','{}','{}')".format(ID, lat, lon, time))
Basically we iterate over the values and insert them in rows one by one,

Or Method 2 which is much faster
 with psycopg2.connect(host          = HOSTNAME,
                                       user          = USERNAME,
                                       password  = PASSWORD,
                                       dbname    = DATABASE) as conn:
              with conn.cursor() as cur:
                     cur.copy_from(f,                           # input  
                                              'event',                  # table name
                                              columns  =('id','lat','lon','event_time')) 

Next we define the Iteratorfile class

https://gist.github.com/jsheedy/ed81cdf18190183b3b7d
f = IteratorFile(("{}\t{}\t{}".format(Event_ID[i],  Event_lat[i], Event_lon[i], Event_time[i]) for i in range(Events_no)))

class IteratorFile(io.TextIOBase):
    """ given an iterator which yields strings,
    return a file like object for reading those strings """

    def __init__(self, it):
        self._it = it
        self._f = io.StringIO()

    def read(self, length=sys.maxsize):
        try:
            while self._f.tell() < length:
                self._f.write(next(self._it) + "\n")
             
        except StopIteration as e:
            # soak up StopIteration. this block is not necessary because of finally, but just to be explicit
            pass

        except Exception as e:
            print("uncaught exception: {}".format(e))
         
        finally:
            self._f.seek(0)
            data = self._f.read(length)

            # save the remainder for next read
            remainder = self._f.read()
            self._f.seek(0)
            self._f.truncate(0)
            self._f.write(remainder)
            return data

    def readline(self):
       return next(self._it)
Next we insert data into the offers table which are the people offering rides along the way. 
offers_no = 2000
offer_ID = getID(offers_no)
offer_lat = [randint(512,518)/10 for i in range(offers_no)]   # this is a different method to create random numbers between 51.2 and 51.8
offer_lon = [randint(-62,38)/100 for i in range(offers_no)]
offer_capacity = [randint(1,2) for i in range(offers_no)]
offer_evID = [Event_ID[randint(0,Events_no-1)] for i in range(offers_no)]  #choose a random event id from with the Event_ID list
         
f = IteratorFile(("{}\t{}\t{}\t{}\t{}".format(offer_ID[i], offer_lat[i], offer_lon[i], offer_capacity[i], offer_evID[i]) for i in range(oL)))
with psycopg2.connect(host=HOSTNAME, 
                                      user=USERNAME, 
                                      password=PASSWORD, 
                                      dbname=DATABASE) as conn:
             with conn.cursor() as cur:
                    cur.copy_from(f, 
                                             'offer',
                                             columns=('id','lat','lon','capacity','event_id'))
Next we insert data into the requests table which contains people requesting rides.


requests_no      = 2000
request_ID       = getID(requests_no)
request_lat       = [randint(512,518)/10 for i in range(requests_no)]
request_lon      = [randint(-62,38)/100 for i in range(requests_no)]
request_evID   = np.fromfunction(np.vectorize(lambda i,j:np.random.choice(Event_ID)),(1,Events_no))[0] # a different method to choose random values from Event_id and returns a numpy array of the shape (1,Events_no)
for ID, lat, lon, trID, evID in zip(request_ID,request_lat,request_lon,request_tripID,request_evID):
       with psycopg2.connect(host          = HOSTNAME,
                                             user          = USERNAME, 
                                             password  = PASSWORD, 
                                             dbname    = DATABASE) as conn:
             with conn.cursor() as cur:    # create a named server-side cursor
                     cur.execute("INSERT INTO request (id,lat,lon,trip_id,event_id) VALUES('{}',{},{},'{}','{}')".format(ID,lat,lon,trID,evID))
Now we are done with creating the db , which the algorithm will use.

Comments

Popular posts from this blog

Create a route optimization algorithm with zero costs using google's OR-tools and OSRM Part 3

Learn python programming through algorithms - Binpacking part 2

Create a route optimization algorithm with zero costs using google's OR-tools and OSRM Part 1