Optimize Food Bank Supply Chain

Abstract:


The United Nations' recent data shows that one in every nine people on the planet is not getting enough food to lead a healthy and active life. Ending hunger is the second of the UN's 17 goals for making the world a better place. The unfortunate reality is that we currently produce more than enough food to feed everyone, yet 35% of the edible food supply is wasted. This research aims to tackle this issue through a two-pronged approach. First, an intelligent system called "Foodle" is implemented to make the donation and distribution of food more efficient. The system consists of a mobile app and a sensor-enabled donation basket, allowing anyone to take a picture of surplus food and place it in the basket, where someone in need can find it via the mobile app and pick it up. Second, the research focuses on utilizing food banks in the United States as a means of redistributing excess food from commercial establishments to those in need. However, there are not enough food banks in each state to serve all food-insecure individuals. A machine learning algorithm is used to identify the best location for a new food bank in Kentucky, with the recommendation being Clay County to serve the most food-insecure individuals and make the most of excess food in the system. This method can be applied to identify new food bank locations anywhere in the US.

Food insecurity is the condition of not having access to sufficient food, or food of adequate quality, to meet one's basic needs. According to the United Nations' latest estimate, 828 million people in the world are food insecure. Every 6 seconds, a child dies of hunger. The US, the most developed country in the world, has 49 million people who are food insecure.



Disparity: Food Waste

40% of the food produced is wasted, which totals a whopping 80 billion pounds of food waste per year.

And the sadder part is that the world is estimated to produce enough food to feed its population 2.5 times over.

It feels paradoxical and frustrating. Why can't this excess food, good edible food, simply be given to food-insecure people to solve the problem once and for all? I dove deeper to understand what can be done to achieve this.

Solution: Redistribute food before it is wasted

Now I will summarize the gap my research is filling. Food insecurity and food waste are on the rise in the United States. Only 5% of current research addresses food redistribution. The current research focused on making the food bank supply chain more efficient is very limited in its scope. Additionally, no studies have suggested establishing new food banks near food waste sources, nor have they utilized machine learning techniques to optimize food bank locations based on food waste sources and food-insecure populations. My goal is to explore intelligent solutions to improve the efficiency of the food bank supply chain.

Now I want to give a brief overview of the methods. First, I will gather and preprocess data on food waste sources. Then I will create a machine learning-based clustering model that groups all the food waste sources in a state into k clusters, where each cluster represents the jurisdiction of one food bank. Finally, I will test it by applying the algorithm to Kentucky to find a location for a new food bank and see the effect that food bank would have.


Now I will go into the method justification. The justification for the dataset I am using is that 62% of the food that a food bank receives is donated by commercial sources like farmers, manufacturers, processors, and retailers. These donations are usually food that would have been wasted if not donated, so this is the data used in the research.

I also considered other types of algorithms, like Density-Based Spatial Clustering, but as I looked through the body of knowledge, these other clustering algorithms weren't really used for clustering geographic data. The pattern I did notice is that many of the papers that clustered geographic data made use of machine learning. In particular, one mentor source from Oleg Moskvichev uses a machine learning-based clustering approach to find the ideal number and locations of container storage and distribution units and then evaluates the model using the weighted distance. His method shows statistically significant results, which provides a premise for this sort of approach to be successful in my case.

I also relied heavily on another mentor source for the evaluation of the model I created. That research used a case study to evaluate the efficiency of grouping customers of a retail store. Initially, I was planning on doing a case study for every state in the US and then averaging the food waste averted and the food-insecure population served, but with the resources I had this was really difficult to do, so I narrowed the scope to Kentucky in particular.

Now I will talk about data collection. I first collected food waste data for Kentucky. I got the data from the EPA, which provides 15 different datasets showing where the most food waste was happening at the commercial level.

The data was then preprocessed and cleaned. I did this in 3 main phases: inspection, cleaning, and verification. The inspection phase consisted of identifying which columns were relevant and which were not, and which keys could be used to combine the datasets. The cleaning phase consisted of removing incorrect data and geocoding, that is, converting addresses to latitude and longitude using the Google Maps API. Here is the associated code that I pasted, and I also made all the code in this project publicly accessible on GitHub.
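As a rough illustration of the geocoding step (not the project's actual code), the sketch below looks up each distinct address only once, which matters when a geocoding service bills per request. The names `normalize`, `geocode_all`, and `geocode_fn` are hypothetical; `geocode_fn` stands in for whatever wrapper makes the real API call.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

LatLng = Tuple[float, float]


def normalize(address: str) -> str:
    """Collapse whitespace and case so trivially different spellings share one cache entry."""
    return " ".join(address.split()).lower()


def geocode_all(
    addresses: Iterable[str],
    geocode_fn: Callable[[str], Optional[LatLng]],
) -> Dict[str, Optional[LatLng]]:
    """Geocode each distinct address once.

    geocode_fn is a hypothetical wrapper around the real geocoding call;
    it returns (latitude, longitude) or None when the address is not found.
    """
    cache: Dict[str, Optional[LatLng]] = {}
    for address in addresses:
        key = normalize(address)
        if key not in cache:  # only pay for addresses not seen before
            cache[key] = geocode_fn(address)
    return cache
```

Writing the cache to disk between runs would extend the same idea to repeated executions of the cleaning script.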

During the verification phase, I found that the addresses were not compatible with the mapping libraries, so I had to convert each address to a latitude and longitude point. Unfortunately, these points were still not compatible with one of the libraries I was using, so I had to convert the latitude and longitude points to the Cartesian coordinate plane using the haversine distance formula, which accounts for the Earth's curvature when scaling the points onto the plane. These are some associated code snippets for this. I did the geocoding using the Google Maps API, and as I was dealing with thousands of data points, I called the API one too many times and actually got charged $100, but I was able to explain my situation to the Google support team and have the charges reversed.
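The conversion described above could look roughly like the following sketch. The haversine formula itself is standard; the way it is used here to build planar coordinates (signed east-west and north-south distances from a reference point) is an assumption, since the exact projection used in the project is not shown. The function names and the reference point are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the value slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def to_xy_km(lat, lon, ref_lat, ref_lon):
    """Approximate planar (x, y) in km relative to a reference point (assumed approach).

    x is the signed east-west distance measured at the reference latitude,
    y is the signed north-south distance measured along the reference meridian.
    """
    x = haversine_km(ref_lat, ref_lon, ref_lat, lon)
    y = haversine_km(ref_lat, ref_lon, lat, ref_lon)
    return (math.copysign(x, lon - ref_lon), math.copysign(y, lat - ref_lat))
```

For an area the size of a single state, this kind of local approximation keeps distances on the plane close to the true great-circle distances; it would not be appropriate for continental scales.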

As mentioned, the goal is to divide the state into clusters of food waste sources. To achieve that, I used an unsupervised machine learning technique called K-means clustering. In K-means, the number of clusters, "k", is predetermined. The algorithm assigns each data point to one of the k clusters based on its similarity to (here, its distance from) the centroid of that cluster. Initially, the centroids are randomly assigned, and the algorithm then iteratively updates the cluster assignments and recomputes the centroids until the centroid positions converge.
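The loop described above can be sketched in plain Python as follows. This is a minimal illustration of standard K-means (Lloyd's algorithm) on 2-D points, not the project's code; in practice a library implementation would normally be used. The function names, the fixed seed, and the iteration cap are assumptions made for the example.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: (point[0] - centroids[c][0]) ** 2 + (point[1] - centroids[c][1]) ** 2,
    )


def kmeans(points, k, iterations=100, seed=0):
    """Cluster a list of (x, y) points into k groups; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        assignment = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    # Final assignment against the returned centroids
    return centroids, [nearest(p, centroids) for p in points]
```

Because the starting centroids are random, different seeds can give different clusterings on real data, so running the algorithm several times and keeping the best result is a common precaution.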

Now I will apply k-means to my case of finding optimal locations for food banks. Each data point here represents the location of a food waste source, for example a grocery store, on the Cartesian coordinate plane. Each cluster centroid represents the location of a food bank, and the cluster represents the jurisdiction of that food bank. The value of k is always set to one more than the number of food banks currently in the state. The centroid with no neighboring food bank is then chosen as the location of the new food bank.
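One way to make "the centroid with no neighboring food bank" concrete is to pick the centroid whose nearest existing food bank is farthest away. This is an interpretation of the selection rule, not necessarily the exact rule used in the project; `pick_new_site` and the coordinates below are hypothetical.

```python
import math


def pick_new_site(centroids, existing_banks):
    """Return the centroid whose nearest existing food bank is farthest away."""
    def nearest_bank_distance(centroid):
        return min(math.dist(centroid, bank) for bank in existing_banks)

    return max(centroids, key=nearest_bank_distance)


# Made-up planar coordinates: two existing banks, so k = 3 centroids
existing_banks = [(0.5, 0.2), (9.0, 1.0)]
centroids = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
new_site = pick_new_site(centroids, existing_banks)
```

Here two of the centroids sit next to an existing bank, so the remaining one is returned as the candidate site.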

To evaluate the clustering algorithm for a state, we make use of 2 metrics: the food waste averted by the food bank and the food-insecure population served by the food bank. The food waste averted is calculated by summing the food waste from all sources within a 50-mile radius of the food bank, and the food-insecure population served is calculated by summing the food-insecure population within a 75-mile radius. I had to make 2 principal assumptions for this.

Assumptions:

The first assumption is that people will receive food from food banks no farther than 50 km away. This assumption is justified by research by Gruber et al., which found that the average distance between food banks and food pantries tends to be 50 km. So this is a justifiable assumption to make.
The second assumption is that food banks are not willing to travel more than 75 km to collect edible food waste. This assumption is justified by research by Bisong et al., which found that food sources typically do not travel farther than the nearest city for their supply, which is approximately 75 miles, so this is also a fair assumption to make.
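Both metrics reduce to the same operation: summing a quantity over all sites within a fixed radius of the food bank. A minimal sketch, assuming every site is already expressed as planar (x, y) coordinates in the same unit as the radius; the function name, the sample data, and the radius values are placeholders, not the study's inputs.

```python
import math


def total_within_radius(center, sites, radius):
    """Sum the amounts of all sites lying within `radius` of `center`.

    center: (x, y) of the food bank on the Cartesian plane
    sites:  iterable of (x, y, amount) tuples
    radius: cut-off distance, in the same unit as the coordinates
    """
    cx, cy = center
    return sum(
        amount for x, y, amount in sites
        if math.hypot(x - cx, y - cy) <= radius
    )


# Illustrative, made-up inputs
food_bank = (0.0, 0.0)
waste_sources = [(10.0, 5.0, 120.0), (30.0, 30.0, 80.0), (90.0, 0.0, 200.0)]    # (x, y, waste amount)
insecure_areas = [(20.0, 20.0, 1500), (60.0, 10.0, 900), (100.0, 100.0, 4000)]  # (x, y, people)

waste_averted = total_within_radius(food_bank, waste_sources, radius=50)
population_served = total_within_radius(food_bank, insecure_areas, radius=75)
```

Keeping the radius as a parameter makes it easy to rerun the evaluation under different travel-distance assumptions.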

As mentioned previously, we are conducting a case study on Kentucky to evaluate our model. Kentucky was a good choice because it has only 5 food banks, 1 in 8 people there face hunger, and it is the 15th worst state in terms of food insecurity. To fully understand the significance of the results of our case study, we first need to understand the current situation in Kentucky in terms of food waste, food insecurity, and food banks.

This map shows all the food waste sources in Kentucky. The bigger the dot, the more food is wasted at that source.

To see how current food banks are helping to rescue the excess food from these regions, I overlaid the current food banks on the excess food locations in Kentucky. Currently, there are 5 food banks in Kentucky, shown by the black markers.

Now I wanted to plot the food insecurity situation in Kentucky. I got food insecure population data from Feeding America after making a request to them as it’s not publicly available. I plotted the food insecurity data in Kentucky using a heatmap. Here the darker areas represent more food insecurity.

Now that I have established the context of the current food banking situation in Kentucky, I will go through the results of the case study. The value of k is set to 6, as there are already 5 food banks and I was trying to find the optimal location for one additional food bank.


The algorithm finds that the optimal location is in Clay County. By adding this one food bank, based on the procedure described earlier in the presentation, we can reduce food insecurity by 24% and food waste by 41% in Kentucky.

Future Direction:

An interactive website could be created for policymakers and other relevant stakeholders to have easy and efficient access to the algorithm.

Work on an intelligent system, such as an app, that connects users with other users directly, without requiring an intermediary, to help account for food wasted in residential areas.

Create an algorithm to optimize the value of k using some existing methodologies like the elbow method.
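As a rough idea of what the elbow method could look like here: run K-means for a range of k values, record the within-cluster sum of squared distances for each, and pick the k where the curve bends most sharply. The sketch below uses one common heuristic for locating that bend (the point farthest from the straight line joining the first and last values). The function name and the sample numbers are hypothetical, and the sums of squares are assumed to have been computed elsewhere.

```python
def elbow_k(inertias, k_start=1):
    """Pick the elbow of a within-cluster sum-of-squares curve.

    inertias[i] is the within-cluster sum of squares for k = k_start + i.
    Returns the k whose point lies farthest from the chord joining the
    first and last points of the curve.
    """
    n = len(inertias)
    if n < 3:
        raise ValueError("need at least three k values to locate an elbow")
    x1, y1 = 0, inertias[0]
    x2, y2 = n - 1, inertias[-1]

    def gap(i):
        # Point-to-line distance up to a constant factor (the factor does not change the argmax)
        return abs((y2 - y1) * i - (x2 - x1) * inertias[i] + x2 * y1 - y2 * x1)

    return k_start + max(range(n), key=gap)


# Made-up curve for k = 1..6: a steep drop that flattens after k = 3
suggested_k = elbow_k([100, 40, 15, 12, 10, 9])
```

The elbow is a heuristic rather than a guarantee, so the suggested k would still need to be checked against practical constraints such as how many food banks can realistically be funded.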