Create a route optimization algorithm at zero cost using Google's OR-Tools and OSRM, Part 1
This is a tutorial for an algorithm that has matured over years of experience building algorithms and websites for carpooling, an Uber-like service, freight routing, and many other applications.
I will present the current algorithm and discuss my decisions on certain code implementations, as well as some of the earlier decisions that proved to be wrong, slow, or error prone.
This algorithm is capable of optimizing for thousands of locations and is built to be highly customizable.
Features of the algorithm are:
- Reading data from and exporting data to an Excel sheet or a database.
- Multiple time windows per location.
- Multiple vehicle types, capacities, buffers, depots.
- Custom optimization parameters like distance, time, fuel consumption, etc.
- Pickup and dropoff.
- Custom loading and unloading time per stop.
- Custom service time at certain locations.
- Custom max idle time waiting for the next stop's time window.
- Custom distributed workload amongst all vehicles.
- Custom vehicle start and end times.
- Maximum distance, stops, and time per vehicle.
Graphical interface:
- Can be done using Google Maps or any of the many free alternatives.
- Routes for each vehicle can be colored differently, and marker icons can be changed as well, even dynamically (bouncing markers, for instance).
- An algorithm to get around Google's limit on the number of waypoints per route, for free, where paying your way past it could otherwise cost thousands (a minimal sketch follows this list).
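As a taste of that workaround, the trick is to split a long route into legs that each fit under the per-request waypoint cap, chaining each leg's last stop into the next leg. The function name and the cap of 25 below are our assumptions, not code from this project:

# Hypothetical sketch: split a long stop list into legs that fit a
# per-request waypoint cap, reusing each leg's last stop as the next
# leg's origin so the drawn route stays connected.
def chunk_route(stops, limit=25):
    legs = []
    i = 0
    while i < len(stops) - 1:
        legs.append(stops[i:i + limit + 1])
        i += limit
    return legs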
[Figure: sample of the route optimization algorithm; optimized routes rendered with Leaflet]
In case you want to create a database and read from and write to it, check out my other tutorial first.
First we create a Python file named initializers.py that holds all our constants, to keep our main code clean.
initializers.py
import logging
from logging.handlers import RotatingFileHandler
#------------------------------------------ database -----------------------------------------------------------
DB_READ_INTERVAL = 10 # secs
ZERO_LIMIT = 60 * 60 # secs
UNABLE_LIMIT = 30 * 60 # secs
REQUEST_LIMIT = 2000
HOSTNAME = '127.0.0.1'
PASSWORD = 'Ilovealgorithms'
DATABASE = 'carpool'
USERNAME = 'Algoguy'
#-----------------------------------------------------------------------------------------------------------------
# --------------------------------------- algorithm ------------------------------------------------------------
OSRM_URL = 'http://127.0.0.1:5000'
OSM_READ_INTERVAL = 5 # secs
TIME_FACTOR = 60 # mins
IGNORE_TIME = True
KM_LIMIT = 500 # km
#------------------------------------------------------------------------------------------------------------------
#--------------------------------------------- Algorithm logging --------------------------------------------
LOGFILE = 'Algorithm_log.log'
MAX_LOG_SIZE = 5 # MB
MAX_LOG_NO = 20
log_formatter = logging.Formatter('%(asctime)s, func_name: %(funcName)s %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s','%Y-%m-%d -- %H:%M:%S')
app_handler = RotatingFileHandler(LOGFILE,
mode='a',
maxBytes = MAX_LOG_SIZE*1024*1024,
backupCount = MAX_LOG_NO,
encoding = None,
delay = 0)
app_handler.setFormatter(log_formatter)
app_handler.setLevel(logging.INFO)
app_log = logging.getLogger('algorithm')
app_log.setLevel(logging.INFO)
app_log.addHandler(app_handler)
#-----------------------------------------------------------------------------------------------------------------
#------------------------------------------------ Monitor -----------------------------------------------------
MEMORY_USAGE_LIMIT = 90 # Percent
CPU_USAGE_LIMIT = 90 # Percent
#---------------------------------------------------------------------------------------------------------------
#----------------------------------------------- Monitor logging -------------------------------------------
LOGFILE = 'Monitor_log.log'
MAX_LOG_SIZE = 5 # MB
MAX_LOG_NO = 20
Log_formatter = logging.Formatter('%(asctime)s, func_name: %(funcName)s %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s','%Y-%m-%d -- %H:%M:%S')
my_handler = RotatingFileHandler(LOGFILE,
mode='a',
maxBytes = MAX_LOG_SIZE*1024*1024,
backupCount = MAX_LOG_NO,
encoding = None,
delay = 0)
my_handler.setFormatter(Log_formatter)
my_handler.setLevel(logging.INFO)
monitor_log = logging.getLogger('monitor')
monitor_log.setLevel(logging.INFO)
monitor_log.addHandler(my_handler)
#----------------------------------------------------------------------------------------------------------
#----------------------------------------------- Send_mail ---------------------------------------------
RETRY_SEND_LIMIT = 10
filenames = ['Algorithm_log.log',
'Monitor_log.log',
'serveme_log.log']
logging.Formatter('%(asctime)s, func_name: %(funcName)s %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s','%Y-%m-%d -- %H:%M:%S')
initializes the format of our output to the log file, so that each line starts with the date and time, then the name of the function from which the logging call was made, then the severity (level) of the message, then the filename that contains the function and the line number.
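As a hedged illustration (the timestamp, filename, and line number below are made up), a call from inside a function would produce a line shaped like this:

app_log.info("Reading request table")
# appends to Algorithm_log.log something like:
# 2024-05-01 -- 09:30:05, func_name: read_events INFO     [main.py:42] Reading request table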
First we start reading the data from the db and building our events list.
We have two ways to do it. The first method (wrapped below in a function we'll call read_event_arr; the name is ours) lets the db do all the hard work, but the db could crash or return weird results if it is big.
app_log.info("Reading request table to prepare event_arr")
time_now = datetime.datetime.now().strftime("%Y-%m-%d") + ' 00:00:00'
time_future = (datetime.datetime.now() + datetime.timedelta(days = 1)).strftime("%Y-%m-%d") + ' 23:59:00'
Zero_time = False
while True:
try:
with psycopg2.connect(host=HOSTNAME,
user=USERNAME,
password=PASSWORD,
dbname=DATABASE) as conn:
with conn.cursor(name='my_cursor',
cursor_factory=psycopg2.extras.DictCursor) as cur: # create a named server-side cursor
cur.execute("SELECT offer.lat, offer.lon, offer.id, offer.capacity, request.lat, request.lon, request.id, request.trip_id, offer.event_id, event.event_time, event.lat, event.lon FROM ((offer INNER JOIN request ON request.event_id = offer.event_id)INNER JOIN event ON offer.event_id = event.id) WHERE event.lat > 51.2 AND event.lat < 51.8 AND event.lon > -0.62 AND event.lon < 0.38 AND event.event_time between '{}' and '{}';".format(time_now, time_future))
df = pd.DataFrame(np.array(cur.fetchall()))
if len(df) > 0:
df.columns = ['offer_lat', 'offer_lon', 'offer_acc_id', 'offer_capacity',
'request_lat', 'request_lon', 'request_acc_id',
'event_id', 'event_time', 'event_lat', 'event_lon']
Zero_time = False
else:
app_log.warning("returned 0 results")
if Zero_time == False:
Zero_time = time()
else:
if time() - Zero_time > ZERO_LIMIT:
Send_mail()
Zero_time = False
continue
except psycopg2.OperationalError as e:
if "violate" in str(e):
app_log.critical(e)
Send_mail()
sleep(DB_READ_INTERVAL)
exit()
app_log.error("warning: Unable to connect")
app_log.error("Retrying in %s Secs"%DB_READ_INTERVAL)
app_log.debug(str(e))
if Zero_time == False:
Zero_time = time()
else:
if time() - Zero_time > ZERO_LIMIT:
Send_mail()
Zero_time = False
sleep(DB_READ_INTERVAL)
else:
app_log.info("Done reading request table")
event_arr = []
for ID in np.unique(df['event_id']).tolist():
event_arr.append(df[df['event_id']==ID].reset_index(drop=True))
return event_arr
continue
What comes after the JOIN keyword is called the secondary table in some texts.
The column list after SELECT (offer.lat, offer.lon, offer.id, offer.capacity, request.lat, request.lon, request.id, request.trip_id, offer.event_id, event.event_time, event.lat, event.lon) names the columns we want returned from the main tables, respectively, once the INNER JOIN results with the secondary tables have been assembled.
We don't need to query the db for all the days in one go, since future offers and requests will change anyway, and once the db grows, that would be a resource-intensive query with no obvious benefit.
So we query between time_now, which is the current day datetime.datetime.now().strftime("%Y-%m-%d") (year-month-day) starting at midnight ' 00:00:00', and time_future, which is one day later (datetime.timedelta(days = 1)), again as year-month-day but just before midnight + ' 23:59:00'.
Zero_time tracks the period during which I keep getting undesirable results from the db, be that zero results or the db not responding at all.
cur.execute() runs the query on the server, and cur.fetchall() then pulls all the resulting rows from the server-side cursor.
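As a quick worked example with a made-up current date of 2024-05-01, the two boundary strings come out as:

import datetime
time_now = datetime.datetime.now().strftime("%Y-%m-%d") + ' 00:00:00'
# -> '2024-05-01 00:00:00'
time_future = (datetime.datetime.now() + datetime.timedelta(days = 1)).strftime("%Y-%m-%d") + ' 23:59:00'
# -> '2024-05-02 23:59:00'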
np.array transforms the result into a numpy array.
pd.DataFrame transforms the numpy array into a pandas DataFrame.
pandas has a function pd.read_sql which can read a table easily without all the fuss, but when you need to execute more involved queries, that function won't be helpful.
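For completeness, here is what the pd.read_sql route looks like for a plain table read, reusing our connection constants (a minimal sketch, not part of the algorithm):

import pandas as pd
import psycopg2
with psycopg2.connect(host=HOSTNAME, user=USERNAME,
                      password=PASSWORD, dbname=DATABASE) as conn:
    events = pd.read_sql("SELECT * FROM event", conn)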
If any data is returned (if len(df) > 0:), we give column names to the pandas dataframe (df.columns = [....]); the data returned by the SQL query respects the order of the SELECT clause, but it comes back as bare values without names.
If no data is returned, we check whether this is the first time (if Zero_time == False:) and start the timer; otherwise we check whether more than ZERO_LIMIT has passed, and if so we send an email via Send_mail() (a function we will discuss later) and restart the timer (Zero_time = False). Either way we then continue the db-querying loop.
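Distilled out of the db loop, the timer pattern looks like this (a minimal sketch restating the logic above):

from time import time

Zero_time = False  # False: timer not running; otherwise: the start timestamp
def note_bad_result():
    global Zero_time
    if Zero_time == False:
        Zero_time = time()                   # first bad result: start timing
    elif time() - Zero_time > ZERO_LIMIT:
        Send_mail()                          # bad results for too long: alert me
        Zero_time = False                    # and restart the timer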
All of the above lies within the try: block.
If an exception happens, except psycopg2.OperationalError as e: handles it; connection errors are the only thing that can happen here, and a connection timeout is not to be considered dangerous.
I couldn't find good documentation for psycopg2, but sometimes when a violation happens it can surface as an OperationalError, so I look for "violate" in the error text; my code intentionally exits on that (if "violate" in str(e):) and writes a critical error to the log.
If an exception happens, we do the Zero_time check again, but now we also sleep(DB_READ_INTERVAL) instead of swarming the server with queries and slowing it down.
If everything in the try: block works, we fall into the else: block.
np.unique(df['event_id']).tolist() returns a list with the duplicates in df['event_id'] removed; the dataframe will hold duplicate event ids, one per matching request and offer row.
We loop over that list to generate an events array such that each entry holds the requests and offers sharing one event_id.
df['event_id']==ID returns a boolean Series of the same length, True for rows whose event_id equals ID and False elsewhere.
df[df['event_id']==ID] returns only the rows with that event_id.
reset_index(drop=True) discards the original row indices and reindexes from 0, so we can address rows by incremental index later.
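A tiny self-contained illustration of that grouping step, with made-up data:

import pandas as pd
df = pd.DataFrame({'event_id': ['a', 'b', 'a'],
                   'request_acc_id': [1, 2, 3]})
event_arr = []
for ID in df['event_id'].unique().tolist():
    event_arr.append(df[df['event_id'] == ID].reset_index(drop=True))
# event_arr[0] now holds both 'a' rows, reindexed 0 and 1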
The method above was method 1; as you can see, it doesn't involve much coding, but it lets the db do all the work.
Next is method 2, which breaks the job into three smaller queries: events first, then offers, then requests.
def read_events():
    time_now = datetime.datetime.now().strftime("%Y-%m-%d") + ' 00:00:00'
    time_future = (datetime.datetime.now() + datetime.timedelta(days = 1)).strftime("%Y-%m-%d") + ' 23:59:00'
    Zero_time = False
    while True:
        app_log.info('Reading events to places')
        try:
            with psycopg2.connect(host=HOSTNAME,
                                  user=USERNAME,
                                  password=PASSWORD,
                                  dbname=DATABASE) as conn:
                with conn.cursor(name='my_cursor',
                                 cursor_factory=psycopg2.extras.DictCursor) as cur:  # create a named server-side cursor
                    cur.execute("SELECT event.id, event.event_time, event.lat, event.lon FROM event WHERE event.lat > 51.2 AND event.lat < 51.8 AND event.lon > -0.62 AND event.lon < 0.38 AND event.event_time between '{}' and '{}';".format(time_now, time_future))
                    df = pd.DataFrame(np.array(cur.fetchall()))
            if len(df) > 0:
                Zero_time = False
                df.columns = ['event_id', 'event_time', 'event_lat', 'event_lon']
                app_log.info('found ' + str(len(df)) + ' events')
            else:
                app_log.warning("returned 0 results for events to places")
                if Zero_time == False:
                    Zero_time = time()
                else:
                    if time() - Zero_time > ZERO_LIMIT:
                        Send_mail()
                        Zero_time = False
                app_log.info("Retrying in %s Secs" % DB_READ_INTERVAL)
                sleep(DB_READ_INTERVAL)
                continue
        except psycopg2.OperationalError as e:
            if "violate" in str(e):
                app_log.critical(e)
                Send_mail()
                sleep(DB_READ_INTERVAL)
                continue
            app_log.error("Unable to connect to db")
            app_log.info("Retrying in %s Secs" % DB_READ_INTERVAL)
            app_log.debug(str(e))
            if Zero_time == False:
                Zero_time = time()
            else:
                if time() - Zero_time > UNABLE_LIMIT:
                    Send_mail()
                    Zero_time = False
            sleep(DB_READ_INTERVAL)
            continue
        else:
            return df
Next we use those events to query for the offers and requests going to those events.
def read_offers(Events):
    Zero_time = False
    reading = time()
    while True:
        try:
            Final_df = pd.DataFrame({'offer_lat': [],
                                     'offer_lon': [],
                                     'offer_acc_id': [],
                                     'offer_capacity': [],
                                     'event_id': []})
            spliced_arr = splice_arr(Events)
            for arr in spliced_arr:
                statement = 'SELECT lat, lon, id, capacity, event_id FROM offer WHERE'
                for event in arr:
                    statement += " event_id = '" + event + "' OR"
                statement = statement[:-3]  # drop the trailing ' OR'
                with psycopg2.connect(host=HOSTNAME,
                                      user=USERNAME,
                                      password=PASSWORD,
                                      dbname=DATABASE) as conn:
                    with conn.cursor(name='my_cursor',
                                     cursor_factory=psycopg2.extras.DictCursor) as cur:
                        cur.execute(statement)
                        df = pd.DataFrame(np.array(cur.fetchall()))
                if len(df) > 0:
                    df.columns = ['offer_lat', 'offer_lon', 'offer_acc_id', 'offer_capacity', 'event_id']
                    Final_df = pd.concat([Final_df, df], ignore_index=True)  # accumulate this chunk
            if len(Final_df) > 0:
                app_log.info('Got ' + str(len(Final_df)) + ' offers')
            else:
                app_log.warning("got 0 offers for the events\n Retrying in %s Secs" % DB_READ_INTERVAL)
                if Zero_time == False:
                    Zero_time = time()
                else:
                    if time() - Zero_time > ZERO_LIMIT:
                        Send_mail()
                        Zero_time = False
                sleep(DB_READ_INTERVAL)
                return False
        except psycopg2.OperationalError as e:
            if "violate" in str(e):
                app_log.critical(e)
                Send_mail()
                sleep(DB_READ_INTERVAL)
                continue
            app_log.error("Unable to connect to get offer_to_event")
            app_log.info("Retrying in %s Secs" % DB_READ_INTERVAL)
            app_log.debug(str(e))
            if Zero_time == False:
                Zero_time = time()
            else:
                if time() - Zero_time > UNABLE_LIMIT:
                    Send_mail()
                    Zero_time = False
            sleep(DB_READ_INTERVAL)
            return False
        else:
            app_log.info('Time reading offers %i' % int(time() - reading))
            app_log.info("Done reading offers going to events")
            return Final_df
Instead of querying for offers going to all of the events in one go, we splice them so that we query for at most REQUEST_LIMIT events per query;
that's what splice_arr does, and we do that so as not to bog down the db.
Assuming REQUEST_LIMIT is 2 and we have 5 events ['a','b','c','d','e'], splice_arr will return [['a','b'], ['c','d'], ['e']].
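splice_arr itself isn't shown in this part; here is a minimal sketch that matches the behavior described above, assuming Events is the dataframe returned by read_events (this implementation is ours, not the original):

import numpy as np
def splice_arr(Events):
    # unique event ids, then consecutive chunks of at most REQUEST_LIMIT
    ids = np.unique(Events['event_id']).tolist()
    return [ids[i:i + REQUEST_LIMIT] for i in range(0, len(ids), REQUEST_LIMIT)]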
Next we iterate over that array to build the query statement. Taking the first chunk ['a', 'b'], our statement becomes "SELECT lat, lon, id, capacity, event_id FROM offer WHERE event_id = 'a' OR event_id = 'b' OR", and then statement = statement[:-3] removes the unnecessary trailing " OR".
Then we execute that statement and append the result to Final_df.
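As an aside, building queries by string concatenation is fragile; an equivalent, safer formulation (a sketch, not what the code above does) uses a parameterized IN clause, since psycopg2 adapts Python tuples for us:

statement = 'SELECT lat, lon, id, capacity, event_id FROM offer WHERE event_id IN %s'
cur.execute(statement, (tuple(arr),))  # psycopg2 expands the tuple into IN (...)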
Next we query for the requests using the events of the offers; querying for all the events would be wasteful, since we only need events that actually have offers.
def read_requests(Events):
    app_log.info("reading requests going to events")
    Zero_time = False
    reading = time()
    while True:
        try:
            Final_df = pd.DataFrame({'request_lat': [],
                                     'request_lon': [],
                                     'request_acc_id': [],
                                     'request_trip_id': [],
                                     'event_id': []})
            spliced_arr = splice_arr(Events)
            for arr in spliced_arr:
                statement = 'SELECT lat, lon, id, trip_id, event_id FROM request WHERE'
                for request in arr:
                    statement += " event_id = '" + request + "' OR"
                statement = statement[:-3]  # drop the trailing ' OR'
                with psycopg2.connect(host=HOSTNAME,
                                      user=USERNAME,
                                      password=PASSWORD,
                                      dbname=DATABASE) as conn:
                    with conn.cursor(name='my_cursor',
                                     cursor_factory=psycopg2.extras.DictCursor) as cur:
                        cur.execute(statement)
                        df = pd.DataFrame(np.array(cur.fetchall()))
                if len(df) > 0:
                    df.columns = ['request_lat', 'request_lon', 'request_acc_id', 'request_trip_id', 'event_id']
                    Final_df = pd.concat([Final_df, df], ignore_index=True)  # accumulate this chunk
            if len(Final_df) > 0:
                app_log.info('Got ' + str(len(Final_df)) + ' requests')
                Zero_time = False
            else:
                app_log.warning("got 0 requests for the events\n Retrying in %s Secs" % DB_READ_INTERVAL)
                if Zero_time == False:
                    Zero_time = time()
                else:
                    if time() - Zero_time > ZERO_LIMIT:
                        Send_mail()
                        Zero_time = False
                sleep(DB_READ_INTERVAL)
                return False
        except psycopg2.OperationalError as e:
            if "violate" in str(e):
                app_log.critical(e)
                Send_mail()
                sleep(DB_READ_INTERVAL)
                continue
            app_log.error("Unable to connect to get request_to_event")
            app_log.info("Retrying in %s Secs" % DB_READ_INTERVAL)
            app_log.debug(str(e))
            if Zero_time == False:
                Zero_time = time()
            else:
                if time() - Zero_time > UNABLE_LIMIT:
                    Send_mail()
                    Zero_time = False
            sleep(DB_READ_INTERVAL)
            return False
        else:
            app_log.info('total request time %i' % int(time() - reading))
            app_log.info("Done reading requests going to events")
            return Final_df
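Tying the three readers together, a main loop could look roughly like this (a hypothetical sketch; the actual wiring comes in the next part):

while True:
    Events = read_events()
    Offers = read_offers(Events)
    if Offers is False:                 # no offers yet: start over
        continue
    Requests = read_requests(Offers)    # only events that actually have offers
    if Requests is False:
        continue
    # ... prepare the data and feed it to OR-Tools (next tutorial)
    sleep(DB_READ_INTERVAL)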
app_log.info("reading requests going to events")
Zero_time = False
reading = time()
while True:
try:
Final_df = pd.DataFrame({'request_lat': [],
'request_lon': [],
'request_acc_id': [],
'request_trip_id': [],
'event_id': []})
spliced_arr = splice_arr(Events)
for arr in spliced_arr:
statement = 'SELECT lat, lon, id, trip_id, event_id FROM request WHERE'
for request in arr:
statement += " event_id = '" + request + "' OR"
statement = statement[:-3]
with psycopg2.connect(host=HOSTNAME,
user=USERNAME,
password=PASSWORD,
dbname=DATABASE) as conn:
with conn.cursor(name='my_cursor', cursor_factory=psycopg2.extras.DictCursor) as cur:
cur.execute(statement)
df = pd.DataFrame(np.array(cur.fetchall()))
if len(df) > 0:
df.columns = ['request_lat', 'request_lon', 'request_acc_id', 'request_trip_id', 'event_id']
Final_df = Final_df.append(df,ignore_index=True)
if len(Final_df) > 0:
app_log.info('Got '+str(len(Final_df))+ ' requests')
Zero_time = False
else:
app_log.warning("got 0 requests for the events\n Retrying in %s Secs"%DB_READ_INTERVAL)
if Zero_time == False:
Zero_time = time()
else:
if time() - Zero_time > ZERO_LIMIT:
Send_mail()
Zero_time = False
sleep(DB_READ_INTERVAL)
return False
except psycopg2.OperationalError as e:
if "violate" in str(e):
app_log.critical(e)
Send_mail()
sleep(DB_READ_INTERVAL)
continue
app_log.error("Unable to connect to get request_to_event")
app_log.info("Retrying in %s Secs"%DB_READ_INTERVAL)
app_log.debug(e)
if Zero_time == False:
Zero_time = time()
else:
if time() - Zero_time > ZERO_LIMIT:
Send_mail()
Zero_time = False
sleep(DB_READ_INTERVAL)
return False
else:
app_log.info('total request time %i'%int(time()-reading))
app_log.info("Done reading requests going to events")
return Final_df
Next, we prepare the data that we will feed into OR-Tools in our next tutorial.