Learn Python Programming Through Algorithms - Bin Packing
Bin packing algorithm for regular and irregular shapes
Part 1: theory and data preparation
This is not intended to be just a programming tutorial with the one “right” route to solving a problem. I will state the philosophy behind each choice, and by philosophy I mean that everything has a good side and an underside; I will detail both, along with some of the mistakes and weird undocumented Python behaviors I learned from.
For example:
Most of my clients ask for fitting the largest items first, but it almost always fails to bring good results.
I will explain the theory and give examples from my code as I go, instead of just stating the theories and explaining them in one bulk.
Many clients have asked me to write a variant of an algorithm that packs items into the tightest space possible, as an alternative to software like Fastcam's Fastcut optimizer, SketchCut and many others, as each had its own limitations and problems.
Saving space/ cutting on waste materials is the ultimate goal.
Packing items into the tightest space possible is termed bin packing.
Cutting shapes out of a larger sheet can be termed a cutting stock problem, and if the cutting pattern has to be kept in mind, as in, the cuts must be aligned as much as possible while cutting, then that's called a guillotine problem.
Applications included:
Cutting doors from big slabs of wood along the grain, while accounting for the saw's waste (explained later).
Cutting wooden frames for windows, horizontal or vertical to the slabs provided.
Cutting L-shapes and rectangles out of marble/stone for kitchens; some shapes were to be glued together later for the same kitchen, so I added an option to link those shapes such that they are cut from the same slab and have the same shade of color.
Cutting frames and shapes for steel reinforcements and doors.
My first variant of this algorithm was for an extremely nice postmodern artist and the smartest client I ever had by a large margin, Coralie Vogelaar, who wanted it for a project of her own mixing A.I. with art: packing 3D .stl sculptures into the tightest space possible, which I will discuss more later.
Sample of my output, including L-shapes
So first we need to define shapes in space in a form an algorithm can understand, keeping precision vs. performance in mind, which is a major trade-off in computer graphics.
When talking about performance, and considering the computing power of modern-era processors, I work with a certain philosophy:
using RAM is better than using the HDD/SSD,
using the CPU's cache is better than using RAM.
Modern-day processors have very fast ALUs, so I found that recomputing something as the product of cached values is always faster than storing that product and then restoring it later.
For instance,
Assume we have a rectangle and we need to compute its area many times every now and then, and our cache can only hold its length and width, forcing us to store the computed area in RAM to recall it later. Calling the length and width from cache and multiplying them to get the area is always faster (less expensive) than fetching the area from main memory.
The reason is that RAM is much slower than a modern CPU; for example:
DDR4 RAM's memory frequency is about 2400 MHz.
Cache frequency is the same as the processor's frequency, which can be 3.9 GHz or more.
Not to mention the number of cycles wasted requesting and delivering data from RAM vs. cache, and don't forget about cache misses.
Normally a CPU caches frequently used variables automatically.
Using keywords like const and static in C++ can bias the compiler towards keeping some data in cache.
Python doesn't have that, unfortunately; there are workarounds, but Python is sometimes not well documented.
Now there are many methods to represent shapes (meshing):
Binary trees:
Some developers resort to quadtrees or octrees: basically, a volume is split into 4 or 8 nodes/leaves (subparts) whenever we need to represent something at a finer resolution.
Quadtrees are generally used for 2D spaces while octrees are used for 3D spaces, and there is a general consensus that they are for static objects, but that doesn't make it true.
That's textbook talk, but we can apply it in many different ways to meet our purposes, as we have to balance performance vs. precision.
As an example,
Check this lotus flower. If we would like to determine whether it will collide with something, or would like to pack a second lotus flower next to it, we can just contain it in one big square: if the other lotus's square touches this one, that's a collision. We can use space a bit better by splitting the two bottom rectangles again; that way we decrease the wasted space, as we release the two smaller empty squares (coloured red) inside each parent square. If we need more precision we can split the upper two squares as well, or split down again. It's a matter of performance vs. precision.
A “parent” is called a “node”, and each square inside it is called a “leaf” if it has no leaves inside it.
Nodes and leaves don't have to be squares; they can be rectangles or triangles as well, and they don't have to be split at the center (loose trees).
The reason why I used squares is performance as I wanted to minimize memory usage.
Imagine we have a class Node; each node has to store data about the area it bounds, which is basically its vertices or its AABB (axis-aligned bounding box).
I will be calling each square a node, let it be a parent or not.
Each node can store a vertex:
We store everything as tuples (x, y) instead of lists [x, y], as tuples are immutable (constant); accordingly they are stored once and referenced when needed, as opposed to lists that have to be built at runtime.
From my experience, tuple instantiation and access is much faster than for lists.
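That claim is easy to spot-check with the standard library's timeit module; the exact numbers vary by machine and interpreter, so treat this as a sketch rather than a benchmark:

```python
import timeit

# CPython compiles a tuple of constants into a single stored constant,
# while a list literal must be rebuilt on every execution.
tuple_time = timeit.timeit("(3, 4)", number=1_000_000)
list_time = timeit.timeit("[3, 4]", number=1_000_000)

print(f"tuple: {tuple_time:.4f}s, list: {list_time:.4f}s")
```

On a typical CPython build the tuple literal comes out several times faster, because it is loaded as a prebuilt constant.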
class Node():
    def __init__(self, x1, y1, x2, y2):
        self.A = (x1, y1)
        self.B = (x2, y1)
        self.C = (x2, y2)
        self.D = (x1, y2)
Each node can store a starting vertex A = (x1, y1) and the opposite vertex C = (x2, y2).
B and D can be inferred from the locations of A and C easily when needed; that way we save on memory.

class Node():
    def __init__(self, x1, y1, x2, y2):
        self.A = (x1, y1)
        self.C = (x2, y2)
        # derived when needed:
        # B = (C[0], A[1])
        # D = (A[0], C[1])
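If the two missing corners are needed often, one way (a sketch, not necessarily this series' final design) is to expose them as read-only properties, so they are recomputed from A and C on access instead of being stored:

```python
class Node:
    # axis-aligned box stored by two opposite corners
    def __init__(self, x1, y1, x2, y2):
        self.A = (x1, y1)  # starting vertex
        self.C = (x2, y2)  # opposite vertex

    @property
    def B(self):
        return (self.C[0], self.A[1])  # shares C's x and A's y

    @property
    def D(self):
        return (self.A[0], self.C[1])  # shares A's x and C's y

node = Node(0, 0, 4, 3)
print(node.B, node.D)  # (4, 0) (0, 3)
```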
We can also flatten the whole Cartesian space into a series of points such that each point has a single value for its location.
Imagine each row has 1024 points; then point 0 is directly below point 1024, the vertical distance between two points A and C is (C // 1024) - (A // 1024), and the horizontal distance between them is (C % 1024) - (A % 1024). Collision detection between a box (with corners A and C) and a point becomes:

if A % 1024 < point % 1024 < C % 1024 and A // 1024 < point // 1024 < C // 1024:
    return "collision!"
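As a runnable sketch of the flattened-index test (the row width of 1024 is just the example's choice):

```python
ROW = 1024  # points per row in the flattened space

def collides(a, c, point):
    # True if `point` lies strictly inside the box whose opposite
    # corners are the flattened indices a (lower-left) and c (upper-right)
    return (a % ROW < point % ROW < c % ROW
            and a // ROW < point // ROW < c // ROW)

# box from (x=1, y=1) to (x=5, y=5), encoded as y * ROW + x
a, c = 1 * ROW + 1, 5 * ROW + 5
print(collides(a, c, 3 * ROW + 3))  # point (3, 3) is inside  -> True
print(collides(a, c, 3 * ROW + 9))  # point (9, 3) is outside -> False
```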
class Node():
    def __init__(self, p1, p2):
        self.A = p1
        self.C = p2
Now we can let the length and width of a box be multiples of 2, as a division or multiplication by 2 is as cheap as shifting a bit backwards or forwards.
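In Python the shift operators make that bit-level view explicit (though CPython's interpreter overhead dwarfs any real speed difference here):

```python
width = 1024  # a power of two

half = width >> 1    # shifting right by one bit halves the value
double = width << 1  # shifting left by one bit doubles it

print(half == width // 2, double == width * 2)  # True True
```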
Ray tracing:
Always keep in mind that I am discussing the easiest, most encompassing of the methods, not the only one; there are many methods and philosophies for traversal and construction.
For example:
Here is a very beautiful bamboo flower; did you know that a bamboo, no matter how long it lives, grows its flower only when it's about to die?
We start by drawing two horizontal lines (rays) touching the outermost points of the shape, then another two vertical lines touching the outermost points as well.
Now that we have constructed the outermost bounding box, we traverse the shape to construct a tighter bounding space: starting at one of the outermost points, we cast a ray until it intersects the shape at a point, then we cast a second ray until it touches the shape again, and so on until we have constructed a tighter space.
The smaller the angle the emerging ray makes with the shape, the tighter the bounding box, the better the precision, and the worse the performance.
But check the blue line that is coloured red at the end, and the sharp edges ignored there: for performance reasons we can just let one line pass straight through these sharp edges. Although some of the edges might fall outside the lines, that's acceptable, as we sacrifice minimal precision for a huge performance gain; drawing and testing lines that contain every one of those edges is computationally heavy and pointless for most applications.
Next we go to the representation,
Normally linked lists are used to construct a graph. Some store the constructed tree as is, and some re-sort the tree in different ways with every insertion; that way traversing the tree is computationally cheaper, but sorting can be expensive as well, so there is always the balance, the application, and testing.
Stuttering in some games' graphics is the result of not mastering that balance.
Now discussing my algorithm for packing 2D shapes:
My clients normally ask me to read data from an Excel sheet, so I use pandas to read the Excel sheet into a DataFrame; xlrd is needed by pandas to read tabs from within the Excel sheet, and both can be installed through pip.
My preferred OS is Debian, but I have been using Ubuntu recently, as it has more support and more recent packages that work instantly with no headache.
To install Python, pip, pandas and xlrd:
sudo apt-get update
sudo apt-get -y install python3 python3-pip
pip3 install pandas xlrd
Next I read the items to be inserted from a tab named “Items” in the Excel sheet, and the containers to choose the best from, from a tab named “Containers”.
import pandas as pd

FILE = 'pack_data.xlsx'  # file name to read data from
ITEMS_TAB = 'Items'
CONTAINERS_TAB = 'Containers'

containers_table = pd.read_excel(FILE, sheet_name=CONTAINERS_TAB, index_col=False)
items_table = pd.read_excel(FILE, sheet_name=ITEMS_TAB, index_col=False)
One of my clients wanted items to be linked, as in, they must end up within the same container. Items belonging to linked group 0 don't have to be linked; otherwise, all items belonging to the same group number must be linked together.
Some clients required trying to fit the largest-area items first, so first I calculate the area of each item:

table['area'] = table['Length'].mul(table['Width'])

Series.mul: pandas multiplies each Length value with the corresponding Width value, element-wise.
linked_values = sorted(list(set(table['Linked'].values)))

table['Linked'].values returns all the values inside the Linked column as an array.
set() drops duplicates, keeping each value only once.
list() converts the set back to a list.
sorted() sorts the values in ascending order.
We separate each linked group of items into a table_list.
The client can choose to sort using LAFF by setting it to True; otherwise, he sets it to False and we just use the sheet's arrangement.
For now:

table_list = []
if LAFF == False:
    for i, j in enumerate(linked_values):  # i is the running index, j is the linked-group value itself
        x = table[table.Linked == j]
        if not x.empty:  # checking whether the table is empty might be superfluous here, but I kept it
            table_list.append(x)
else:  # if LAFF is True then sort each group by area, descending
    for i, j in enumerate(linked_values):
        x = table[table.Linked == j].sort_values(by=['area'], ascending=False).reset_index(drop=True)
        if not x.empty:
            table_list.append(x)
table.Linked and table['Linked'] are the same thing. table.Linked == j returns a boolean column, the same size as the Linked column, with each value replaced by True or False depending on whether it equals j; putting table[] around it returns a table containing the rows where a True is found.
The rows in that table keep the indices of the original table, so we add reset_index(drop=True) to drop the original indices and start indexing from 0 again.
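The same masking idea can be mimicked in plain Python, which makes the two steps explicit (build a boolean mask, then keep the rows where the mask is True); the row data here is made up for illustration:

```python
# each row is (Part_number, Linked); j is the group we filter for
rows = [(101, 0), (102, 1), (103, 1), (104, 2)]
j = 1

# step 1: the equivalent of `table.Linked == j` -- one boolean per row
mask = [linked == j for _, linked in rows]
print(mask)  # [False, True, True, False]

# step 2: the equivalent of wrapping it in `table[...]` -- keep rows where True
selected = [row for row, keep in zip(rows, mask) if keep]
print(selected)  # [(102, 1), (103, 1)]
```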
Bin packing is an NP-hard combinatorial problem, which means there is no known efficient algorithm for solving it exactly, and the time needed to reach the optimal solution rises exponentially with the number of bins and items to pack.
The most naive method is first fit, which is the fastest; it is greedy: place the item in the first container that fits, otherwise open a new container and fit the item inside.
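A one-dimensional sketch of first fit, using plain numbers for item and bin sizes (a simplification of the 2D problem handled later in this series):

```python
def first_fit(items, bin_capacity):
    # greedy first fit: each item goes into the first bin with room,
    # otherwise a new bin is opened
    bins = []  # each bin is a list of item sizes
    for item in items:
        for b in bins:
            if sum(b) + item <= bin_capacity:
                b.append(item)
                break
        else:  # no existing bin fits -> open a new one
            bins.append([item])
    return bins

print(first_fit([4, 8, 1, 4, 2, 1], bin_capacity=10))
# [[4, 1, 4, 1], [8, 2]]
```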
A modification of this method is modified first fit decreasing (MFFD). It classifies items into sizes like small, medium, large, and x-large (or more classes if you like), then places the items within each class, starting with the largest class, into containers, allocating a new container when none fits; it then fits the smaller classes one by one into the already-allocated containers.
Another method, and the one always suggested and almost worshipped by my clients, is largest area fit first (LAFF): placing the items inside each container by size, largest first.
From my experience, LAFF doesn't produce the best result most of the time, and it is not necessarily even close.
On to the next section of the code. If the client wants to randomize the order and not pack the items in the order supplied by the file, he sets the variable Randomize = True.
if Randomize == True:
for i in range(len(linked_values)):
table_list[i] = table_list[i].sample(frac=1).reset_index(drop=True)
Here I iterate over table_list and shuffle each table inside separately, to respect the items being linked.
sample(frac=1) means return a random sample from the DataFrame, and frac=1 means the sample contains all the rows, i.e. a full shuffle.
Every saw used to cut shapes out of materials has a thickness (the kerf); that thickness must be added to the size of the shape, as it is always wasted by the saw, so the length and width each increase by the saw_blade value.
def saw_part(length):
return length+saw_blade
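For example, with an assumed kerf of 3 units (the actual saw_blade value is machine-specific):

```python
saw_blade = 3  # assumed kerf; the real value comes from the client's saw

def saw_part(length):
    # the blade consumes saw_blade extra units along each dimension
    return length + saw_blade

print(saw_part(100))  # 103
print(saw_part(100) * saw_part(50))  # effective footprint: 103 * 53 = 5459
```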
One of my clients wanted L-shapes, so I solved this problem in two ways, which I will detail later; for now, deleted length and deleted width are used to describe the dimensions of the L-shape.
Next we define a wrapper class that holds data about each item to be used later for printing.
class wrapper:
def __init__(self,
ID,
No,
length,
width,
remainder = False):
self.ID = ID
self.No = No
self.length = length
self.width = width
self.remainder = remainder
ID is the id of the shape to be inserted,
No is the number of times the shape has to be inserted,
remainder holds the deleted length and deleted width for the L-shapes in a tuple.
stone_list = []
Iter = 0  # running index into stone_list for the linked groups (assumed to start at 0)
for i in range(len(table_list)):
    stone_list.append([])
    if i == 0:
        for j in range(len(table_list[i]['Linked'].values)):
            stone_list[i].append(wrapper(table_list[i]['Part_number'][j],
                                         table_list[i]['Sets'][j],
                                         saw_part(table_list[i]['Length'][j]),
                                         saw_part(table_list[i]['Width'][j]),
                                         (table_list[i]['Deleted_length'][j],
                                          table_list[i]['Deleted_width'][j])))
    else:
        for l in range(table_list[i]['Sets'][0]):
            stone_list.append([])
            Iter += 1
            for j in range(len(table_list[i]['Linked'].values)):
                stone_list[Iter].append(wrapper(table_list[i]['Part_number'][j],
                                                1,
                                                saw_part(table_list[i]['Length'][j]),
                                                saw_part(table_list[i]['Width'][j]),
                                                (table_list[i]['Deleted_length'][j],
                                                 table_list[i]['Deleted_width'][j])))
Iterating over the tables list:
0 is the id of the non-linked group, so we put those items in the list as-is, with the number of times each is to be inserted;
if there is a linked group, we put its items in together, repeated for the number of times they are to be inserted.
Next we get the containers' list:

container_list = []
table = pd.read_excel(FILE, sheet_name="Containers", index_col=False)
table['vol'] = table['Length'].mul(table['Width'])
table = table.sort_values(by=['vol'], ascending=True).reset_index(drop=True)
for i in range(len(table['Length'].values)):
    container_list.append(Node((table['Length'][i], table['Width'][i]), 0))
Now we go to the algorithm itself, in another post.