The Digital Supply Chain podcast

Deep Reinforcement Learning - What Is It, And What Are Its Uses In Supply Chain? A Chat with Pathmind CEO Chris Nicholson

July 02, 2021 Tom Raftery / Chris Nicholson Season 1 Episode 144

A relatively new field of AI called Deep Reinforcement Learning is starting to open up and shows a lot of promise.

One company making use of Deep RL in the supply chain area is Pathmind. I invited the founder and CEO of Pathmind, Chris Nicholson, to come on the podcast to explain what Deep RL is, why it is better than other forms of AI/ML, and how it can be used in the supply chain context.

We had an excellent conversation and, as is often the case, I learned loads. I hope you do too...

If you have any comments/suggestions or questions for the podcast - feel free to leave me a voice message over on my SpeakPipe page or just send it to me as a direct message on Twitter/LinkedIn. Audio messages will get played (unless you specifically ask me not to).

To learn more about how Industry 4.0 technologies can help your organisation, read the 2020 global research study 'The Power of change from Industry 4.0 in manufacturing' (https://www.sap.com/cmp/dg/industry4-manufacturing/index.html)

And if you want to know more about any of SAP's Digital Supply Chain solutions, head on over to www.sap.com/digitalsupplychain and if you liked this show, please don't forget to rate and/or review it. It makes a big difference to help new people discover it. Thanks.

And remember, stay healthy, stay safe, stay sane!

Chris Nicholson:

You want two things: you want to minimize the cost of carrying those extra goods, and you want to maximize the number of happy customers. If everybody had infinite safety stock, your customers would always be happy, but your costs would be infinite. Right? So it's a trade-off, and that optimization problem is finding the right point where you're meeting both of those objectives in a dynamic and highly complex scenario. So that, I think, is one crucial supply chain use case that deep RL can be applied to.

Tom Raftery:

Good morning, good afternoon, or good evening, wherever you are in the world. This is the Digital Supply Chain podcast, the number one podcast focusing on the digitization of supply chain. And I'm your host, global vice president at SAP, Tom Raftery. Hi, everyone. Welcome to the Digital Supply Chain podcast. My name is Tom Raftery with SAP, and with me on the show today I have my special guest, Chris. Chris, welcome to the podcast. Would you like to introduce yourself?

Chris Nicholson:

Hi, Tom. Thanks for that. So my name is Chris Nicholson. I'm the founder and CEO of a startup called Pathmind. Pathmind applies a type of AI called deep reinforcement learning to problems in industrial control and supply chain. The interesting thing about deep reinforcement learning (I'll call it deep RL just to save some time) is that it's at the forefront of a lot of current research, and it's capable of making strategic decisions that can improve operations. So we all suffered through a lot of supply chain breakdowns in the last year or two, and we anticipate more on the horizon. This is a kind of AI that can help make many of those physical systems more resilient. And that's what makes it interesting now.

Tom Raftery:

Okay. I mean, we're familiar with AI, theoretically. We've all heard about the way AI is helping in supply chain. How is this deep reinforcement learning, or deep RL as you call it, different from the AI that we've all come to know and love in the last few years?

Chris Nicholson:

It has similarities with a lot of the machine learning that's been in the news for the last decade or so, because it uses deep neural networks, just like a lot of other machine learning does these days. But it uses them in a different context. So a lot of the machine learning you will hear about is engaged in perceptive tasks, things that you and I can do in under a second, like recognize an image. So I see your face, I see that's Tom. You see me, you say that's Chris. That happens really quickly, snap. Deep RL is often engaged in sequential decision making, so a path of decisions that takes place over time, where the later decisions depend on the earlier decisions you make. Image recognition isn't like that. So, you know, with sequential decisions, we can begin to solve strategic problems, right? So what do you do as you proceed deeper into a chess game? Right? What do you do as you proceed deeper into, say, processing an item as it goes through a factory, or routing that item across the world? So those sequential decisions actually open up a whole array of applications that are pretty interesting, and that we humans have a hard time doing well all the time. And when you do them well, you're often considered smart. So these algorithms are helping us, I guess, be smarter, or seem smarter at least.

Tom Raftery:

Okay. We know, from having listened to a lot of people on the podcast here, and from having worked with these things, that machine learning algorithms, and AI in general, typically require large data sets to train on. Is that similar for deep RL?

Chris Nicholson:

Yeah, it's hard to avoid. A lot of work is being done on that. But yeah, you do need data of one sort or another. Now, the interesting thing about sequential decision making in physical systems is that you don't have infinite data based on infinite historical scenarios. You have a narrow corridor of historical data: what happened. Right? And that's what you can train on if you're constrained to historical data. And as people who watch the stock market are fond of saying, the future is not the same as the past. New things always happen, right? So you can't necessarily predict all of the future simply by training on the past. And that's where deep reinforcement learning tends to rely on simulations. So a simulation is a virtual model of a physical system. Some people call them "what if" machines. You set them up to explore possibilities, and in that simulation, that replica of the real world, you can add variability, right? You can copy the physical system and then you can say: but what if demand varied by 50% over this period? How would the system respond? Where would the bottlenecks be? Right? What if these machines failed? Then what would be my options for responding? So simulations have been crucial. For example, agent-based models have been crucial in responding to COVID. They were at the heart of many governmental responses. And one reason why people shut the schools down is because they identified schools as a primary vector of transmission for the pandemic. So you can get these strategies that might be painful, and might be unforeseen, right, but can be highly effective. And that might go against our gut instinct, right? So simulations allow algorithms to train on data, synthetic data, that goes beyond the historical corridor.
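The "what if" idea described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not Pathmind's simulator: the function names `simulate_stockouts` and `average_stockouts`, the uniform demand swing, and every number below are hypothetical, chosen just to show how a simulation can generate scenarios (here, demand varying by up to 50%) that a narrow historical record would not contain.

```python
import random

def simulate_stockouts(variability, base_demand=100.0, daily_supply=100.0,
                       start_stock=20.0, days=365, seed=0):
    """Run one what-if scenario and count the days demand exceeded stock.

    All parameters are illustrative assumptions, not real supply chain data.
    """
    rng = random.Random(seed)
    stock = start_stock
    stockout_days = 0
    for _ in range(days):
        stock += daily_supply
        # Demand swings uniformly by +/- `variability` around the base level.
        demand = base_demand * (1.0 + rng.uniform(-variability, variability))
        if demand > stock:
            stockout_days += 1
            stock = 0.0  # everything on hand is sold, the rest is a lost sale
        else:
            stock -= demand
    return stockout_days

def average_stockouts(variability, runs=200):
    """Average the stock-out count over many simulated years (synthetic data)."""
    return sum(simulate_stockouts(variability, seed=s) for s in range(runs)) / runs

calm = average_stockouts(0.0)       # the "historical corridor": demand never moves
volatile = average_stockouts(0.5)   # what if demand varied by up to 50%?
print(f"stock-out days per year, steady demand:   {calm:.1f}")
print(f"stock-out days per year, volatile demand: {volatile:.1f}")
```

Each simulated year is a piece of synthetic experience; a learning algorithm could train on many such years rather than on the single history that actually happened.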

Tom Raftery:

Very often, machine learning and AI are referred to as kind of black boxes, because it's very hard to know how an outcome has been achieved by an AI. Is that the same for deep RL as well?

Chris Nicholson:

Yeah. So the thing with neural nets, including the neural nets embedded in deep RL, is that they are what we call large parametric models, meaning they're just a bunch of numbers, or cubes of numbers called tensors. Right? They're available for us to stare at, but they're more machine-readable than human-readable. So then you get to the question of explanatory power: why do they decide what they do? Right? And you can't say, "Well, because there's a five there and a seven there." Right? That's not really a good explanation. So you come to the question of auditing those models, right? That arises a lot when talking about bias and fairness. And by auditing, I mean you look at the decisions they produce when you expose them to data. Right? And you see if those decisions seem to make sense. So there's a human in the loop, maybe not for every decision, but definitely for the auditing stage. With these deep RL models, you see their decisions play out in a simulation. One thing you're looking for is: what is it doing? And does that make sense in an old way? Does it make sense in a new way that I haven't thought of? Is this plausible enough for me to actually implement in the real world? Right? So those human audits are a way for the humans to bring explanatory power to the algorithms, because they're developing a new theory of the world based on what the algorithm may surface for them.

Tom Raftery:

Yeah, because that was going to be my next question, in fact: how do you double-check that their output is a good one? I mean, for want of a better way of putting it.

Chris Nicholson:

Yeah. Before we go there, I just want to say, people overestimate the explanatory power that human experts have. I think the algorithms should be audited, but there's a bit of a double standard being applied, because with a lot of human experts, I should say, some can explain very well, and others find plausible explanations post facto, right? Which is all fine. That's just what we do as a culture, right? But the demand that these algorithms explain very precisely how they do what they do is really a sign that people are less certain about them. Right? It's really a sign that they need to build trust. I think the best way to build trust, actually, is to combine these decision-making algorithms with domain experts, right? So the two of them, a machine that's kind of an extension of our thought and our senses, and a person who can handle context and knows the history and can kind of cut through ambiguity, those two things need to combine, right? That's the most powerful configuration for us right now. And of course, we can see this, because I'm speaking to you across many miles, right, through undersea cables: the most powerful combination is humans plus machines. So I don't propose that the algorithms act all by themselves. Right? I do think that they can augment us, and we can augment them, in ways that will make kind of a fusion that is more powerful.

Tom Raftery:

Okay. You will need, though, to have people who are capable of ceding a certain amount of trust to the algorithms, which, you know, for some can be challenging.

Chris Nicholson:

Well, you know, Rome was not built in a day. Trust is not won in a day. But there are lots of ways to get these algorithms up and running, to show you their results, where they're not actually controlling the machines. They might just be suggesting things. So decision support is a typical structure, where you start presenting the algorithms' decisions or suggestions as optional, right? And you start habituating people to seeing those suggestions, and kind of seeing how things play out without the suggestions. Maybe try them some time; it depends on how high-stakes the system is. But the longer people are exposed to decisions that they see might make sense, right, or that they see cause better results, the more they're going to trust. Right? And then what you really need to build over the long term are systems that ensure that the algorithms continue to train off of recent data, right, and adapt to the world. So if you imagine, first of all, that the algorithms are useful, they have to keep being used. So you have to keep feeding them, and you have to keep them in touch with the world.

Tom Raftery:

And can you speak to some supply chain examples, given that this is the Digital Supply Chain podcast? Where in supply chains could this kind of thing be used? And, you know, when we speak of supply chain on this podcast, we typically cover all aspects of it: everything from the planning and engineering of the manufacture of something, through to the manufacture of it, through to the logistics and delivery of it, through to the operations of it, through to recycling, et cetera, and back to stage one again.

Chris Nicholson:

That's a good question. So I think one of the primary use cases in supply chain is multi-echelon inventory optimization. Meaning every supply chain we're talking about is multi-echelon; there are just many nodes in the graph. Those supply chains involve many components kind of coming together to form fully assembled items, most of the time, that go from factories on one continent to retail and distribution centers on another continent. And those supply chains, in order to be resilient, cannot run too lean, right? People have made a big deal about lean. For the sake of efficiency, that's great. But if you don't have a buffer, you're going to hit a stock-out, and that's going to make some people unhappy, right? Yep. So the question is, what is the size of the buffer you need? What are the safety stocks that you need? Right? And getting those levels right is an optimization problem. So it's an optimization problem that people have been working on for many years, and there are some great solutions out there. Not all of them are very dynamic. One of the advantages of deep reinforcement learning is that it's very dynamic. It can respond to a lot of variability in the data, and supply chains are obviously subject to a ton of variability in supply, and demand, and breakdowns in the network itself. Deep reinforcement learning, in the context that we've discussed, can train on that variability, and it can find the best ways to allocate resources in order to maximize the efficacy and efficiency of that network at the same time. So you want two things: you want to minimize the cost of carrying those extra goods, right, and you want to maximize the number of happy customers. If everybody had infinite safety stock, your customers would always be happy, but your costs would be infinite. Right?
So it's a trade-off, and that optimization problem is finding the right point where you're meeting both of those objectives in a dynamic and highly complex scenario. So that, I think, is one crucial supply chain use case that deep RL can be applied to.
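The trade-off just described (carrying cost against unhappy customers) can be made concrete with a small sketch. To be clear about the assumptions: this is a single-location toy with a brute-force search over candidate buffers, not deep RL and not a multi-echelon model, and the name `expected_cost`, the cost figures, and the normally distributed demand are all hypothetical values picked for illustration.

```python
import random

def expected_cost(safety_stock, mean_demand=100.0, sd_demand=30.0,
                  holding_cost=1.0, stockout_penalty=10.0,
                  periods=2000, seed=42):
    """Estimate the average per-period cost of holding a given safety stock.

    Every number here is an illustrative assumption.
    """
    rng = random.Random(seed)  # same seed for every candidate: a fair comparison
    available = mean_demand + safety_stock
    total = 0.0
    for _ in range(periods):
        demand = max(0.0, rng.gauss(mean_demand, sd_demand))
        leftover = max(0.0, available - demand)    # extra goods we paid to carry
        shortfall = max(0.0, demand - available)   # unhappy customers
        total += holding_cost * leftover + stockout_penalty * shortfall
    return total / periods

# Try buffers from "as lean as possible" up to "very large".
costs = {s: expected_cost(s) for s in range(0, 151, 10)}
best = min(costs, key=costs.get)

print(f"no buffer (0 units):      average cost {costs[0]:.1f}")
print(f"best buffer ({best} units): average cost {costs[best]:.1f}")
print(f"huge buffer (150 units):  average cost {costs[150]:.1f}")
```

Running too lean is expensive because of stock-outs, and a very large buffer is expensive because of carrying cost; the cheapest level sits somewhere in between. A static search like this is only a baseline. The dynamic case Chris describes, where supply, demand and the network itself keep changing, is what a learned policy is meant to handle.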

Tom Raftery:

Okay, and the others?

Chris Nicholson:

Routing. So fleet routing, you could call it a traveling salesman problem, you know, on the scale of an army. Fleet routing, obviously, is something that most large organizations confront when they're delivering things. And that too is an optimization problem, where you care about speedy delivery, minimizing carbon emissions, distance traveled, all the costs associated with it. So fleet routing is another problem that this can be applied to. And then inside the walls, right, if we just pretend for a moment that intralogistics or industrial control is really a part of supply chain: most scheduling problems, processing those items through machines, even put-away problems, stocking them in a warehouse. Those are coordination problems that deep RL can apply to, and where we're working with partners to show how that can be done.
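For readers unfamiliar with the traveling salesman framing mentioned above, here is a minimal sketch of a single-vehicle route built with the classic nearest-neighbour heuristic. This is a deliberately simple stand-in, not deep RL and not anything Pathmind has described; the function names and the coordinates are hypothetical. It only shows what "route quality" means as an objective: the same stops, visited in a different order, cost a different total distance.

```python
import math

def nearest_neighbour_route(depot, stops):
    """Greedy heuristic: always drive to the closest stop not yet visited."""
    route = [depot]
    remaining = list(stops)
    while remaining:
        here = route[-1]
        nxt = min(remaining, key=lambda p: math.dist(here, p))
        remaining.remove(nxt)
        route.append(nxt)
    route.append(depot)  # return to the depot at the end
    return route

def route_length(route):
    """Total distance travelled along consecutive points of the route."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))

# Hypothetical depot and delivery stops laid out along one road.
depot = (0.0, 0.0)
stops = [(5.0, 0.0), (1.0, 0.0), (3.0, 0.0), (2.0, 0.0)]

as_listed = [depot, *stops, depot]          # visit stops in the order received
planned = nearest_neighbour_route(depot, stops)

print("as listed:", route_length(as_listed))
print("planned:  ", route_length(planned))
```

A real fleet problem adds many vehicles, time windows, emissions and cost terms to the objective, which is where simple greedy rules tend to fall short.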

Tom Raftery:

And how does your solution work with existing systems? Because, you know, if you're thinking of your last example, working within a warehouse, most organizations would have some kind of warehouse management system already in place. How do you fit in with that?

Chris Nicholson:

So a lot of the data is contained in ERP systems like SAP. It's not hard to get that data out. So that historical data we can train on. For the domain expertise, we will talk to people operating those physical systems. And where it's necessary, we'll work with people to create the virtual models, the simulations of the physical systems, so that we can train on greater variability and offer predictive resilience. And then the decisions, say scheduling decisions, right, are just like the scheduling decisions that are being made already by optimizers like IBM CPLEX or Gurobi, optimizers out there that are already helping people make decisions. Right? There are slots in the system that are open to the decisions being made. And we would simply be another source of decisions or, in many cases, decision support. So yes, people work with all kinds of, you know, warehouse management software, other kinds of industrial control software. You know, it's easy to put something on a dashboard. Right?

Tom Raftery:

Fair enough. And, I mean, I know you are based on the US West Coast, but Pathmind, is it, you know, US West Coast? Is it all North America? Is it all of, you know, North America, Europe, the Middle East, Africa, Asia? What's your kind of market reach?

Chris Nicholson:

Yeah. So you're right, we're headquartered in the Bay Area. That's been, how can I say, a funny situation for the last year or year and a half with COVID. Right? Being headquartered anywhere, I'm not sure that even makes sense when everybody's working remote and you're under a lockdown. But we have a team here, and we have a lot of remote team members as well, in Europe, and Asia, and Latin America. So we see a lot of interest from Europe, particularly the manufacturing centers of Europe, like Germany and northern Italy. We see a lot of interest from Latin America, particularly natural-resource-intensive parts of Latin America, because, you know, things like mining are obviously the start of the supply chain, feeding raw elements into the system. And, you know, if you want to talk about manufacturing centers, obviously East Asia is a major one. So we're working with partners that are located on several continents, and many of them are themselves global companies with intensely complex supply chains stretching across oceans.

Tom Raftery:

So Chris, we're coming towards the end of the podcast now. Is there any question I haven't asked you that you think I should cover? Any topic we've not touched on that you think people should be aware of?

Chris Nicholson:

So I've mentioned deep reinforcement learning a couple of times. I think it's a term your readers and your listeners are going to want to understand. And I can say a few words about it, but they should really do their own research as well, because they're going to be hearing it a lot more in the years to come. They're going to be hearing more because it's a kind of AI that applies to the problems they care about. So in May of this year, DeepMind, so Alphabet's main AI think tank, DeepMind, released a paper saying reinforcement learning is probably the path to AGI, artificial general intelligence, which is to say, to a certain superintelligence that can solve most problems better than humans. Right? So this kind of AI that I'm talking about, that solves multi-echelon inventory optimization, could also be the path to superintelligence. Right? That's an extraordinary thing to be able to say. There's a lot of hype in the field, and people are rightly skeptical when they hear such claims. They've probably heard them many times before. But this is a very credible source, which is DeepMind. They're at the forefront of the field. They're sitting there making a very big claim. They're putting their reputation on the line. And I don't think they'd do it if they didn't believe it and have evidence for it. So that'll have repercussions in our society way beyond the supply chain, and people should start thinking about that. So what they're saying is that reinforcement learning learns from rewards. If you've ever had a pet, and you've ever had to train the pet to do things you want it to do, and not make a mess in the house, you've given it treats, and you've probably swatted it on the butt, right? That's reinforcement learning: positive and negative rewards for behavior. Right?
So what DeepMind is saying is that intelligence itself is a response to positive and negative rewards. So intelligent behavior can emerge in a system where you simply give it positive and negative rewards, rewards and penalties just like you and I face in life itself, right? So we humans are evolutionary specimens that have evolved a certain intelligence, some of the time, in response to an environment that is pretty clear in its rewards and penalties to us, or at least has been over many generations. So what DeepMind is saying is that those analogies apply very well to how we can grow intelligence artificially, in silicon, in AI. And I think, so that's creepy, and it's exciting. Skynet. It's exciting, right? And if people want to understand what that technology is, they can, especially in a domain where they're already experts. They can see what we're doing. Right? And we can show them: look, here are some unforeseen strategies, or emergent behavior, that our algorithms have learned in response to the systems they've built, right, that show them a new path through it. And often you get these moments where it sends a shiver down your back, where you're like, wow, that was smart. And I didn't know that that's what it should do.
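The treats-and-swats picture of learning from rewards can be sketched with tabular Q-learning, one of the simplest reinforcement learning algorithms. This is a toy under stated assumptions, not deep RL (there is no neural network) and not DeepMind's or Pathmind's code: the five-position corridor, the reward values, and the hyperparameters are all hypothetical. An agent starts in the middle; reaching the right end earns a treat (+1), reaching the left end earns a swat (-1), and nothing tells it which way to go except those rewards.

```python
import random

ACTIONS = (-1, +1)                 # step left or step right along the corridor
TREAT_STATE, SWAT_STATE = 4, 0     # positions that end an episode

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values purely from rewards and penalties (Q-learning)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}
    for _ in range(episodes):
        state = 2                              # always start in the middle
        for _ in range(100):                   # safety cap on episode length
            if rng.random() < epsilon:         # sometimes explore at random
                action = rng.choice(ACTIONS)
            else:                              # otherwise act on what was learned
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt = state + action
            if nxt == TREAT_STATE:
                reward, done = 1.0, True       # treat
            elif nxt == SWAT_STATE:
                reward, done = -1.0, True      # swat
            else:
                reward, done = 0.0, False
            future = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Nudge the estimate toward reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            if done:
                break
            state = nxt
    return q

q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in (1, 2, 3)}
print(policy)   # the action the agent now prefers in each position
```

Because later rewards are propagated back to earlier positions, the agent ends up preferring the step that leads toward the treat even from positions where no immediate reward is given, which is the sequential decision-making aspect discussed earlier in the conversation.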

Tom Raftery:

Very cool. Very cool. Fascinating. Chris, if people want to know more about yourself, Chris Nicholson, or about Pathmind, or deep RL, or any of the other topics we chatted about today, where would you have me direct them?

Chris Nicholson:

You know, they can go to our website, pathmind.com. We have a lot of examples and demos there of different supply chain problems that we work on. We have a wiki that talks a lot about AI. It's like Wikipedia, but it just covers a lot of AI topics. So you can read about deep reinforcement learning on the Pathmind wiki. If you want to know more about me and my background, I've got a LinkedIn page. You know, I have a checkered history as a journalist; they can read all about that.

Tom Raftery:

Super, super. Chris, that's been great. Thanks a million for coming on the podcast today.

Chris Nicholson:

Thanks, Tom. Thanks for having me.

Tom Raftery:

Okay, we've come to the end of the show. Thanks, everyone, for listening. If you'd like to know more about digital supply chains, head on over to sap.com/digitalsupplychain or simply drop me an email at tom.raftery@sap.com. If you liked the show, please don't forget to subscribe to it in your podcast application of choice to get new episodes as soon as they're published. Also, please don't forget to rate and review the podcast. It really does help new people to find the show. Thanks. Catch you all next time.