Beyond Neural Networks: How Pattern Discovery Is Solving Cancer, COVID, and the World's Hardest Problems Without a Hypothesis
Guest: Mark Anderson, CEO at Pattern Computer
Hosts: Seth Earley, CEO at Earley Information Science
Chris Featherstone, Sr. Director of AI/Data Product/Program Management at Salesforce
Published on: March 2, 2022
In this episode, Seth Earley and Chris Featherstone sit down with Mark Anderson - CEO of Pattern Computer, winner of the 2021 Alexandra Jade Nobel Award for science discovery, and co-author of a landmark paper on the next generation of AI with 35 co-authors. Mark is a theoretical physicist who built the world's first whale museum, spent decades developing what he calls resonance theory in physics, and then built an entirely proprietary computing system designed for one purpose: finding patterns that no other system can find. The conversation covers why pattern discovery is fundamentally different from neural networks, why eliminating the hypothesis and starting from the data changes everything, how the approach has already identified combination drug treatments for triple negative breast cancer currently in their fifth round of positive testing, how a 98.5% accurate COVID test with no reagents emerged in three seconds from two drops of saliva, and what it means to build a technology powerful enough that you have to be very deliberate about who you will and will not work for.
Key Takeaways:
- Pattern discovery is not neural networks - it is a purpose-built system grounded in new mathematics that finds correlations in high-dimensional data that deep learning cannot find, with full explainability built in from the start rather than added as an afterthought.
- The fundamental shift is eliminating the hypothesis entirely: instead of starting with what you think might be true, you start with the data and a Y value - the outcome you are after - and let the discovery engine generate the hypotheses from the data itself.
- A dataset everyone in the world had already studied for 20 years - the METABRIC triple negative breast cancer dataset with roughly 1,000 women - yielded 17 novel combination drug candidates on the first run, two of which are now in their fifth round of positive testing at major research institutions including Berkeley Labs and Charles River.
- Pattern Computer's Starbright Project took on all five of the top cancer killers simultaneously; the colorectal cancer work at Fred Hutch Cancer Research Center used 155,000 patients with 39 million features per person and improved on existing state-of-the-art accuracy by moving results from the low sixties up toward 100% through clustering advances.
- The COVID detection system that emerged from hyperspectral work with Berkeley Labs delivers 98.5% balanced accuracy in three seconds per test for approximately two dollars with no reagents required - just two drops of saliva - and can be pushed to capture every positive at a slight reduction in specificity.
- The system is deliberately locked down rather than open-sourced because the team has already turned away a major bank that wanted to use it for market manipulation - the technology is powerful enough that controlling who uses it is treated as a core organizational responsibility.
- Clean data remains the price of admission regardless of how sophisticated the analytical system is - but when the data is good, pattern discovery consistently finds correlations in datasets that neural networks have already been applied to and pronounced exhausted.
Insightful Quotes:
"Today, when someone says AI, 99.5% of the time they mean neural networks. We don't. And neural networks have some really big limitations. You don't have explainability. You don't make big discoveries in general with the tools that Google uses. You just don't. So instead of having a hypothesis, you start with the data itself." - Mark Anderson
"The Y value is not the hypothesis - it's what you're after. It's the solution you're after without knowing what hypothesis to make. So the way we got hypotheses was after we ran the data. We call the machine our discovery engine. It's a hypothesis generator. And it doesn't come from our bias or our teacher at Stanford - it comes from the data itself." - Mark Anderson
"I had complete faith in our own technology, but I also had rather complete faith in how screwed up the data world is in general and how quickly you can get screwed up by garbage in, garbage out. We have had people give us sheets of paper in PDFs and say: here is our data. Modern companies." - Mark Anderson
"The price of admission is still good data. And despite what a lot of people think, AI is not going to fix your bad data." - Seth Earley
Tune in to hear Mark explain why Leonardo da Vinci got flow right but missed interactions, why Curtis Wong, Bill Gates' curator of the Codex Leicester, told Mark he had achieved what Leonardo aspired to, and why a nincompoop with a toolkit for pattern recognition can make major physics discoveries that the world's specialists missed.
Links:
Book: The Pattern Future: Finding the World’s Great Secrets and Predicting the Future Using Pattern Discovery https://www.amazon.com/gp/product/B07659RJGB/ref=dbs_a_def_rwt_bibl_vppi_i0
About Pattern Computer https://www.patterncomputer.com/
Paper: Learning from learning machines: a new generation of AI technology to meet the needs of science https://arxiv.org/abs/2111.13786
Podcast Transcript: New Science - Pattern Discovery, No-Hypothesis AI, and Solving the World's Hardest Problems
Transcript introduction
This transcript captures a conversation between Seth Earley, Chris Featherstone, and Mark Anderson covering the foundations of pattern discovery as a scientific discipline, why eliminating the hypothesis changes what problems become solvable, how Pattern Computer's proprietary system has already identified novel cancer treatments and built the world's most accurate rapid COVID test, and what it means to build a technology powerful enough to require deliberate ethical guardrails on who you will work with.
Transcript
Seth Earley: Welcome to today's podcast. I'm Seth Earley. And Chris, I know you've had a busy week. I'm out here in Las Vegas, my least favorite part of the world. But at least you're closer to your time zone now.
Chris Featherstone: And I'm Chris Featherstone. It's refreshing actually - welcome to the West, where you get to inherit all the things from the East as they're up earlier than you.
Seth Earley: The problem is I've been going to bed on West Coast time and getting up on East Coast time. That doesn't work very well for me.
Chris Featherstone: It also makes you feel really good about yourself when you spend time on the East Coast and then come to the West - you're like, I'm up early, early bird gets the worm, all those cliches.
Seth Earley: I switch time zones really fast. Anyway, let's get started. But before we do, I want to thank our sponsors: Simpler Media, the Marketing AI Institute, and Earley Information Science.
Our guest today is a scientist, an entrepreneur, and a winner of the prestigious Alexandra Jade Nobel Award for science discovery in 2021. He is the author - among 35 other prestigious authors - of a new paper titled "Learning from Learning Machines: A New Generation of AI." He is here today to challenge your fundamental notions about how science is done and what is next for AI. Please welcome the CEO of Pattern Computer, Mark Anderson.
Chris Featherstone: It's great to have you on, Mark. I really appreciate you spending the time with us. From our early interactions, everything I thought I had preconceived notions about really needs to be questioned and thrown against the wall. It truly feels like it is the collective work of your life to challenge those ideas and try to get to the truest form of: how do we actually go and get the truth? How do we make sense of it, and then how do we apply it to give us more abilities to gain knowledge and insights?
Mark Anderson: Thank you.
Seth Earley: I want to get the audience oriented around Mark. I know there is so much to talk about. Mark, maybe you could begin by telling us a little bit about yourself. I know you had a really great 2021. And I have been listening to your Google Talks, and I am up to chapter 15 in The Pattern Future - so I am just about to solve climate change.
Chris Featherstone: I am super excited to have him on. Go ahead, Mark. We will let you take the stage.
Mark Anderson: Well, I have always been a science guy since I was about five. I always knew I was going to be a science guy. I was disappointed when I finally figured out during my PhD program that I was not going to stay in formal academia - but I think I still am a science guy.
Seth Earley: I am a scientist at heart as well. I am a chemistry undergraduate and I still read science publications every week. Carry on.
Mark Anderson: I published three papers - physical review category papers, so physics theory papers - this year in about eight weeks. That was added to a family of three before, so I now have six in a series called Resonance Theory. I am still actively working in theoretical physics essentially on my own dime.
Seth Earley: I read a lot about your Resonance Theory and how it correlated with string theory and the subatomic resonances. It was fascinating and actually gave me a clearer understanding of some of the quantum physics that has always been beyond my grasp. Very refreshing.
Mark Anderson: I am happy to hear that. The summary of all the work in Resonance Theory is that the physical properties of otherwise empty space - space is not empty - lead directly to the laws of physics. You see everything through space itself and its physical nature instead of seeing all the billiard balls.
Seth Earley: And what was fascinating from your book was the whole idea of everything being a flow. I have always talked about information flows in the enterprise and looking at patterns from those flows. When we talk about information management, we talk about speeding up the information metabolism of the organization. So that work in complexity theory and pattern recognition really resonated with me.
Mark Anderson: Let me add to what you said. Everyone believes in flow, I think. The great discovery was not flows - which Leonardo himself actually discovered - it was flow and interaction as a pairing of complementary fundamental actions. That discovery is probably the most important I have ever made in my entire life. And that is what we were talking about earlier that Curtis Wong said: you have now achieved what Leonardo aspired to.
The idea that there were these two complementary things behind chaos or complexity mathematics - even the people at Santa Fe Institute lack a fundamental theory. Here it is. When Murray Cantor, the well-known IBM Fellow, was at our conference, his comment was: "Mark, this applies to everything." He said everything. Everything in the universe.
Chris Featherstone: Will you double-click on that comment that Curtis made? Because it should not be taken lightly - the work Curtis did, how it applies, and what Leonardo da Vinci was actually doing.
Mark Anderson: It took me 30 years to get there. Not of intense effort - it just took a long time. When I finally figured it out, I had the luxury of running the Future in Review Conference, so I was able to ask top speakers to come with flow and interaction in mind. Every one of them said yes, that is a perfect fit. Bill Janeway, Murray Cantor, a lot of very famous people. We did a whole day covering everything from politics and economics to the shape of leaves, the structure of the eye, complexity mathematics - all through flow and interaction.
One of the people there was the founder of the Dent Conference. He asked if I could come do this at Dent. I said sure. How much time do I have? He said eight minutes. So my EA and I trained to deliver the whole theory plus eight different fields in eight minutes. Curtis was there at Dent. At the speakers' wine event later I said, what did you think of that piece on flow and interaction? And he said: "Mark, you have achieved what Leonardo aspired to."
He is not just blowing smoke - he is a serious guy. Curtis is Bill Gates' curator for the Codex Leicester, which is a Leonardo manuscript. I went back and reread it. Here is what Leonardo was doing: he was standing on bridges looking at water, seeing the flow of a stream and finding a fundamental truth in it. He would compare it to a woman's hair and say - that is the same thing you are seeing in the stream.
Seth Earley: So he was seeing fractals.
Mark Anderson: We could go there too. What Leonardo was missing - and that is why Curtis said it the way he said it - was interactions. He got the flow part right, but he had the chance to get the other part and did not.
Chris Featherstone: Finding just the flow pattern, then the patterns on top of it. That is interesting. And at some level this does and will apply to artificial intelligence, machine learning, and how we get business insights out of that. But before we get to all of that - Seth, you were going to jump into some areas around Mark's background.
Seth Earley: Sure. I know from your book that from an early age you were building - as you say - your bio-computer, by being exposed to classical music and scientific experiments and mechanical problems. That set your mind up from a very early age to identify patterns and to take very diverse concepts and find relationships between them. You went on to get multiple degrees. Talk about that path toward pattern discovery and how it evolved in your career.
Mark Anderson: So I managed to get myself fired from a job where I had built the world's first research-based whale museum. They wanted to take all the money out of it, we had a deal not to do that, and they fired me. Suddenly I was on the street. I had been really bothered by something from my time at Stanford - they were teaching physics in a way that just did not make sense to me around conservation of energy. So I thought: here is a chance to come back to that.
What I did was really fun. I had a waterfront house and an eight-foot table, 150 feet of butcher paper, and all these colored pens. I started looking at all the force laws of physics with an eye toward finding the mathematical patterns in them. Instead of studying physics the way Stanford taught it, I took whatever I was good at in pattern recognition and applied that raw to the math in that science and saw what I got.
The beautiful thing compared to a computer is you have eight feet by three feet, and if you fold it twice you have 23 times as much area to look at. It was not that hard - and it worked. I was able to make a discovery right away that no one had made before. I reduced all these laws to a single mathematical formula, which I proved by putting it into a Commodore 64 program, displayed on a color TV.
I realized something right then: a nincompoop can use this toolkit of pattern recognition to make major discoveries in physics that had been unknown. And I thought - this is very interesting. There is something here I did not expect. It was not just that we all use pattern recognition every day in our brains, it was more than that. Now I call it pattern discovery, but I did not have a word for it at the time.
The short answer is: if you have a very broad funnel instead of being more and more specialized the way they teach you at Stanford - if you look very widely and apply a lot of pattern recognition skills to that - you are going to make major discoveries that even the world's experts did not make.
Later I started publishing SNS, a Strategic News Service, which has been going every week since 1995. The whole predicate was to make predictions - accurate, graded predictions in public - and turn it into a science rather than an op-ed piece. Take it seriously. That whole dismissive thing about not being able to predict the future? Yes, you can.
Chris Featherstone: You have made some pretty spot-on predictions. And with Pattern Computer and the way you think about these things - what data do you need to actually look at to prove the pattern works? And how does that go against hypothesis-based science?
Mark Anderson: Let me flip it. The whole point is: you do not start with a hypothesis. That is true at Pattern Computer and that is true in the science I am talking about. Instead, you do the opposite: you take the data first.
And today, when someone says AI, 99.5% of the time they mean neural networks. We do not. And neural networks have some really big limitations, including explainability. We have done explainable AI already. There is a paper we just co-authored with 35 co-authors about this. You do not have explainability with deep neural networks. You do not make big discoveries in general with the tools that Google uses. You just do not.
So instead of having a hypothesis, you start with the data itself. You can find out if there is missing data. We did this with an aerospace customer - we found latent variables, we told them roughly where they occurred and when, they went and found them, and we helped them improve their manufacturing quality for a tier-one aerospace company.
You do have to know what we call a Y value. You have to know what you are after. We had one fail - the only big fail I have ever seen us have. We had a dataset from a very famous geneticist, healthy young tech workers, and there was no Y value. There was nothing to go after. It was useless.
Here is a good example. We went after METABRIC, which is a publicly available dataset on triple negative breast cancer. Berkeley Labs has held it for about 20 years. Everybody in the world who cares had been through it. The average paper written about that dataset discussed one gene expression - one gene. Three papers we found talked about two or three genes. We did our very first run with no hypothesis. The Y value was: did women die or not die? That is what we wanted to know. So with that in hand, we immediately came up with what are now two different combination drug candidates, both of which are in their fifth round of positive testing at major institutions including Berkeley Labs and Charles River, and both of which killed 100% of the cancer cells in the only no-direct-treatment category in the world. We found 17 combination drugs picked out of roughly 50 to 100 gene expressions that had never been found before.
This is what Murray Cantor calls new mathematics. We gave him an NDA and he looked. It is not just a deep neural network. This is literally going back to the 1950s, picking up some lost threads, bringing them forward, building new mathematics to do this one job: pattern recognition and pattern discovery. So you see 50 to 100 new correlations of gene expressions, you see how they relate to each other and to the problem, and you look at the mechanism - then you take two pre-approved FDA drugs together to go after that unknown mechanism and see if you can kill the cancer.
Seth Earley: The inputs for that - genomics data, what else?
Mark Anderson: RNA-seq data on about 1,000 women. These are gene expressions. And the Y value was who died and who did not.
Seth Earley: Let me ask this question. How is the Y value different from a hypothesis? Is that not premised on a question?
Mark Anderson: There is no hypothesis. Look - in every company, until we showed up, anyone who wanted to go after this - from Merck on down - had a hypothesis. What if it is this gene? What if it is this enzyme? What if it is a lifestyle problem? One or two hypotheses. We did not do any of that. None, zero. The Y value is not the hypothesis - it is what you are after. It is the solution you are after without knowing what hypothesis to make. The way we got hypotheses was after we ran the data. We call the machine our discovery engine. It is a hypothesis generator. And it does not come from our bias or our teacher at Stanford - it comes from the data itself.
Chris Featherstone: So you are saying: the Y value is the outcome - people die or do not die. Now let's look backwards at the data and all the genetic markers and patterns to find the causal relationships. That is it.
Mark Anderson: Yes. Exactly.
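The distinction between a Y value and a hypothesis can be made concrete with a minimal Python sketch. It is not Pattern Computer's method, which the conversation describes as proprietary; it is a plain univariate correlation screen on synthetic data. The names `rank_features`, `gene_0` through `gene_19`, and the signal planted in `gene_7` are assumptions made purely for illustration. The only thing supplied up front is the outcome; no feature is singled out in advance.

```python
import random
from statistics import mean, pstdev

def correlation(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)

def rank_features(rows, outcome):
    """Score every feature against the outcome and sort, strongest first."""
    names = list(rows[0].keys())
    scores = {
        name: abs(correlation([row[name] for row in rows], outcome))
        for name in names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Synthetic stand-in for expression data: 200 samples, 20 features.
rng = random.Random(0)
outcome = [rng.randint(0, 1) for _ in range(200)]  # the Y value
rows = []
for y in outcome:
    row = {f"gene_{i}": rng.gauss(0, 1) for i in range(20)}
    row["gene_7"] += 2.0 * y  # hypothetical planted signal
    rows.append(row)

ranking = rank_features(rows, outcome)
top_feature = ranking[0][0]
print(top_feature)
```

The candidate worth investigating comes out of the ranking rather than going in as a guess. A screen this simple scores one feature at a time, so it would miss the multi-gene combinations Mark describes.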
Seth Earley: And the patterns are there. It is not a matter of opinion, it is not something you make up. It is what is in nature. The patterns are there and it is a matter of identifying them. Talk about that.
Mark Anderson: Sure. We made up a little elevator speech: who knew that the world was made of patterns? We all did. Who knew that you needed a pattern computer to understand them? We did. And the stunning thing is - when you look at our story - we built Pattern Computer originally at our Future in Review Conference on stage in 2015. The idea that no one had done it yet, that was an amazing opportunity for us. With all the work that had gone into computer design, chip design, mathematics, programming - with all of that by 2015, it had apparently not occurred to someone to build a pattern computer. When in fact all the problems around us, whether it is airplane manufacturing or how a leaf works, are all based on patterns. And our brains work in pattern recognition, I believe, because our brains evolved in a world made of nature. To be effective, we had to be good at that.
Seth Earley: We are pattern-making, pattern-identifying machines. When we start to learn as a child, we are looking for patterns. That is what is imprinting on us and building the circuitry to identify more patterns.
Mark Anderson: It is kind of surprising that no one had thought to build this until we built it.
Seth Earley: What is fundamentally different about how you built it?
Mark Anderson: Every part of the system is proprietary except for the chips. Even Linux got stripped down to a very thin layer and we rebuilt all the drivers for it. Everything above that is ours alone. We were in stealth mode for three years before our first coming-out event in San Francisco, which we called Splash One. We were not even called Pattern Computer - we were Coventry Computer. We worked very hard on mathematics and engineering. Very smart veterans out of Amazon and Microsoft in math and architecture. We built the whole architecture from scratch.
I had a hint this was going to work - not just from my other work, but I had written a paper about three years earlier called "The Most Important Chip Not Yet Invented." I called it the pattern recognition processor. Then I got a call from a guy who became a good friend, Dharmendra Modha, who was secretly the project lead on an eight-year DARPA-IBM project called TrueNorth. TrueNorth turned out to be the very first pattern recognition processor, and I had not heard of it. He invited me to help launch it at Almaden. When I saw the amount of worldwide effort that had gone into that chip, I realized - this is not just an idea. This is an idea whose time has come. And if IBM and DARPA had put roughly $200 million into this, who is going to build the computer?
Seth Earley: So you have addressed life sciences problems. You are looking at treatment of breast cancer. What other classes of problems are you currently focusing on? And I believe you mentioned a $1.2 billion valuation?
Mark Anderson: Yes, 1.2.
Seth Earley: What is the trajectory? Where do you see the biggest opportunities?
Mark Anderson: It is agnostic - you get to pick whatever field you want. It works as well in agriculture as it does in biology or markets or airplane design. We have to be careful what to pick because we are still a small team. We picked bio for a number of reasons. It is the hardest. It is the most competitive. It had the most venture money going into it. We love saving lives - that is a team unifier. And when you succeed, you do not have to beg for approval. If you kill the cancer cells, you killed the cancer cells.
We decided one good way to showcase how good the system is would be to take the top five cancer killers and move the needle on all of them. We call it the Starbright Project. It is not just breast cancer - it is also lung cancer, ovarian cancer, prostate cancer, and colorectal cancer.
For breast cancer we are in the fifth round of testing treatments. We are about to begin testing treatments for ovarian. For colorectal, we are working with the Fred Hutch Cancer Research Center. We have been working for about two years on the largest dataset in the world of that type - 155,000 patients worldwide, seven different consortia doing the collection, 39 million features per person. We were able to advance our own mathematics while doing this, particularly in clustering. We call it AM - advanced mathematics, not AI. We beat their existing state-of-the-art numbers - going from the low sixties up toward 70%, 80%, and 100% through the clustering work. That is a lot of lives saved if you get it right.
When you gain that knowledge based on gene expressions, what it frees you to do is diagnostics, treatments, and ultimately I hope prevention.
Seth Earley: And you are referring to patient data that includes environmental data, full genome, gene expression - so clinical, observational, and environmental data all together?
Mark Anderson: Yes. 39 million features per patient - very wide, including everything from environmental data to full genome and gene expression. And we are not done yet, but we have already done what I described primarily with the genomic data.
Seth Earley: And if someone listening worked for a large global manufacturer - an industrial company - what kinds of problems would be appropriate?
Mark Anderson: We like to say: give us your biggest problem. As an example, we are helping a mining company - iron miners who want to double their throughput. We told them no coal mining, iron is fine. With the data they were giving us, we very quickly found things about the machines they use that were wrong - not performing well, costing a lot of money. Very specific information about chemicals and types of ore and sizes of crushers that meant a lot of money to them and were not available through their current techniques.
In the aerospace example, a company was making very important equipment. You do not want it to fail when you are flying. All the components were passing tests individually, but the assembly was failing increasingly. We were able to find enough about that to help them turn it around.
Seth Earley: Are there any downsides, weaknesses, or flaws in the approach? What is it not appropriate for?
Mark Anderson: I do not know of any flaws. There are things it is better at or not so good at. We did not start out being good at computer vision. We got involved in it at the request of a defense firm. Here is what we did: we applied normal techniques including deep learning to computer vision and did what anybody would do. Then we improved it with our proprietary math. That is still normal territory. Then we did something beyond that - we took one of our discovery engines and applied it to the output. Now we had two different types of engines working together, which was really special. The second of which was a world-class performer at very high-dimensional reduction in very complex datasets without losing the original connection information. We call that Leonard Island - our first pattern discovery engine.
In one thing we did, we had to take a universe of 10 to the 40th correlations in a microbiome, drop it down to 39 proteins and then 9 proteins in about 10 days. Larry Smarr brought us that problem - he is the co-inventor of personalized medicine and has his own supercomputer. He could not do it.
Then we got into computer vision. We took a problem where, at that time about four years ago, you could not use x-rays for diagnosing pneumonia in children - they had to do biopsies, which was painful and not good. We found we could use x-rays with our system to identify that the child had pneumonia, where it was localized in the lung, and - more importantly - whether it was bacterial or viral. All from x-rays. If you are in Africa and you are many miles from a lab, that time delay can be life or death. So that was a meaningful thing.
Chris Featherstone: I hate that most truly great technology that gets out in the wild gets used for nefarious activities first before business figures out how to use it. How do you keep it locked up so it is not put in the hands of those who want to use it to do harm?
Mark Anderson: We lock it up. Murray Cantor came to one of our annual math retreats - we just had our latest in Berkeley - and he was so excited he said this has to be a paper. We said no, Murray, it is not a paper. By end of day he was saying it is a Springer book. We are a company. But the real reason is not just to honor our shareholders - it is exactly what you said, Chris.
I will give you an example. A major bank came to us after a private meeting with the DOE in Chicago. They wanted us to help them. They were asking about flows of this and flows of that. We worked for about four weeks thinking it over and then said: we are not interested in this project. We have the ability to control our destiny. We have done enough good things that we do not have to go to the dark side.
And here is an example on the good side: COVID. Berkeley Labs had said a couple of years ago that they thought our mathematics was a good fit for what is called hyperspectral work - broadband light. When COVID appeared, they said - take a look. We now have the best COVID detection system in the world for high throughput. We are monitoring 400 competitors. Three seconds per test, about two dollars, almost free, no reagents.
Seth Earley: What are the markers you are looking at for the COVID test? Is it an antigen test?
Mark Anderson: There is no hypothesis. There is a Y value, and the Y value is: COVID, or not COVID. Everything we did was measured against the PCR gold standard. Two drops of saliva, three seconds, 98.5% balanced accuracy. We can push it to 100% if we are willing to take a small hit on specificity, down to about 94%. So we can capture everyone going into a cruise liner or a baseball game.
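Balanced accuracy is the mean of sensitivity (the share of true positives caught) and specificity (the share of true negatives cleared). The sketch below shows the trade-off described here on hand-made toy scores: lowering the decision threshold catches every positive while letting more false positives through. The scores, labels, and thresholds are invented for illustration and have nothing to do with the actual test. Reading "push it to 100%" as sensitivity is an assumption, though it matches the stated goal of capturing everyone who is positive.

```python
def confusion_rates(scores, labels, threshold):
    """Return (sensitivity, specificity, balanced accuracy) at a threshold."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2

# Toy data: 4 true positives, 6 true negatives (label 1 = positive).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.45, 0.2, 0.2, 0.1, 0.1]

strict = confusion_rates(scores, labels, threshold=0.5)
lenient = confusion_rates(scores, labels, threshold=0.4)
print("strict :", strict)
print("lenient:", lenient)

# Under the sensitivity reading, the figures quoted above would combine as:
quoted = (1.00 + 0.94) / 2
print("balanced accuracy at 100% / 94%:", round(quoted, 2))
```

On the toy data the lenient threshold reaches full sensitivity at the cost of specificity. Under the same reading, 100% sensitivity with about 94% specificity works out to a balanced accuracy of about 97%.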
Seth Earley: And there is a sensor on a chip essentially that is processing some signal from the saliva and then correlating that with the illness?
Mark Anderson: I appreciate the curiosity, Seth - but that would be getting into IP.
Chris Featherstone: We do not want to have attorneys jumping in and locking down the podcast to just the four of us listening. Let us move on. You mentioned processing power. We are hopefully getting more powerful processing over time. Do you need to get to the chip level and design your own, or are the off-the-shelf processors powerful enough?
Mark Anderson: We were hoping to use TrueNorth, but the Air Force Research Lab took it over so you cannot buy it. Moore's Law is not important to us. The idea of a pattern recognition processor is very interesting to us, and there are people going that direction now - even Google with TensorFlow, and others. There will be more and more chips at least trying to go to AI. Unfortunately for us, they are mostly trying to go to deep neural networks. That does not help us at all.
There are people like IBM who were trying to go toward the pattern recognition processor, and there will be more of those. We are not going to make chips - I do not want to make chips. But we have done a lot of things in software and mathematics to avoid pitfalls others have encountered.
When Larry Smarr was trying to do his work on the microbiome on a supercomputer, you can blow up the supercomputer memory stack because everything grows at n-squared using normal neural networks. We found a system that is only linear in that respect. That allows us to go faster, do more, do it better. We also found a way to run parallel systems very efficiently. When the Phi chip first came out, we called Intel - we had some deep contacts there - and told them we were running tests on it and could run it faster and with higher capacity across all the parallel pathways than they could. At every step we have tried to design this system to be more efficient and more focused on this one task.
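A short arithmetic sketch shows why quadratic growth overwhelms memory. The conversation does not say exactly what grows at n-squared, so this assumes the simplest case: storing one value for every pair of features. It borrows the 39 million features per patient mentioned earlier purely as a scale reference. The function names and the 4-byte value size are assumptions.

```python
def pairwise_count(n):
    """Number of distinct unordered pairs among n features."""
    return n * (n - 1) // 2

def dense_pair_bytes(n, bytes_per_value=4):
    """Memory to hold one value per pair (4 bytes assumes float32)."""
    return pairwise_count(n) * bytes_per_value

n_features = 39_000_000
pairs = pairwise_count(n_features)
petabytes = dense_pair_bytes(n_features) / 1e15

# Doubling the feature count roughly quadruples the number of pairs.
growth = pairwise_count(2_000) / pairwise_count(1_000)

print(f"{pairs:,} pairs")
print(f"about {petabytes:.2f} PB at 4 bytes per pair")
print(f"pairs grow {growth:.2f}x when features double")
```

At that width an all-pairs table runs to petabytes, which is why an approach whose cost grows linearly in the number of features changes what is feasible.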
Chris Featherstone: The neural nets are trying to take in everything at the same time, and they are designed for the optimal use case of applying whatever they have been trained on.
Mark Anderson: I see them as being incremental. Take the problem of Google advertising - I want to sell more shoes to girls between the ages of 12 and 18 - where a 3% improvement is worth a lot of money to them. But that is not going to tell you what a girl fundamentally is, or why. It is just not very interesting from a discovery standpoint.
Seth Earley: When you start looking at these problems, you are looking at data and looking for patterns in data. A lot of organizations do not have their data house in order. What needs to be in place, and do you need things like ontologies or knowledge architectures or reference data to have a good foundation for what you are doing?
Mark Anderson: I will start by saying I am not the person best positioned to talk about this - but we have people who are. I was smart enough to recognize the difficulty of what you are describing as being the most dangerous thing in our path. Because I had complete faith in our own technology, but I also had rather complete faith in how screwed up the data world is in general and how quickly you can get screwed up by garbage in, garbage out. We have had people give us sheets of paper in PDFs and say: here is our data. Modern companies.
So to put it quickly: it is no secret to anyone in this business that you need to have good data and it has to be clean. Having said that, when that is true, we can do a lot. We have built really excellent ingestion engines that can take clean data - even if it has missing values - and fill them in if we understand from a subject matter expert how to fill them in. We can manipulate it wonderfully - move it around, run parts of it instead of all of it, compress it, run it while it is still compressed. We can do it in the cloud or in our own data center. But it has to be decent data.
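Aside (not part of the conversation): filling in missing values "if we understand from a subject matter expert how to fill them in" can be pictured as applying a per-column rule supplied by that expert. The sketch below is a minimal illustration under that assumption. The function name, the rule format, and the sample rows are all invented for the example, and it says nothing about how Pattern Computer's ingestion engines actually work.

```python
from statistics import median


def fill_missing(rows, rules):
    """Return a copy of rows with None values filled according to rules.

    rules maps a column name to either the string "median" or a constant.
    Columns without a rule are left untouched, so their gaps stay visible.
    """
    # Compute medians from observed (non-None) values only.
    medians = {}
    for col, rule in rules.items():
        if rule == "median":
            observed = [r[col] for r in rows if r.get(col) is not None]
            if observed:
                medians[col] = median(observed)

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for col, rule in rules.items():
            if new_row.get(col) is None:
                if rule == "median":
                    # Stays None if the column had no observed values at all.
                    new_row[col] = medians.get(col)
                else:
                    new_row[col] = rule
        filled.append(new_row)
    return filled


# Invented sample data, for illustration only.
rows = [
    {"age": 52, "size_mm": 21.0, "smoker": "no"},
    {"age": None, "size_mm": 14.0, "smoker": None},
    {"age": 61, "size_mm": None, "smoker": "yes"},
]
# Hypothetical expert guidance: median for age, an explicit label for smoker.
rules = {"age": "median", "smoker": "unknown"}
clean = fill_missing(rows, rules)
```

Leaving columns without a rule unfilled is a deliberate choice in this sketch: a gap that nobody has guidance for remains a visible gap rather than a guessed value.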
Seth Earley: The price of admission is still good data. And despite what a lot of people think, AI is not going to fix your bad data. It can help in certain circumstances, but that is still the price of admission.
Mark Anderson: And what we have found - which I think is important for your audience who may be using neural networks - is that if you have gotten so far with whatever you are trying to do and hit a wall, those limitations do not have to be permanent. Here is an example: we published something called Three Things on our website under press releases. We had one of our guys - brilliant, really smart - take new, already published science papers and improve on them. Scientists were using AI to do some kind of work, and he beat them. In one case it was the Wisconsin breast cancer dataset, well known to everyone. He beat their results, sometimes in about two hours. The data ingestion takes longer than the runs. The point is there is more there than you thought. You think you have hit the ceiling with a neural network. You have not.
Chris Featherstone: Are there areas where you would say the neural net is fine for that use case? Like NLP, speech recognition, call center sentiment analysis - that is pretty vanilla and maybe that is exactly right for that application?
Mark Anderson: I am not saying do not use them. I am just saying there is a particular use case where they shine. One way we express what we are good at: the closer you are to a complex, very high-dimensional dataset, the better we will do compared to normal approaches. If all you are trying to do is sell gym shoes, you probably do not need us.
Chris Featherstone: Unless you are trying to predict if rubber prices are going to go up.
Mark Anderson: Maybe - yeah, we could do that actually. We have done property values, why did certain cars sell in 1980, all kinds of fun demo projects.
Seth Earley: Great. Well, I love the approach, love the thinking, and unfortunately we are at the top of the hour. It has been a tremendous pleasure speaking with you today, Mark and Chris. Before I close, I want to remind everyone of our sponsors: Simpler Media, Earley Information Science, and the Marketing AI Institute. There will be show notes with URLs for Mark and his company. Thank you, Mark. This has been tremendous. And thank you, Chris.
Chris Featherstone: Thank you. Where can people find you quickly - tell people about your book and what you are doing next.
Mark Anderson: The book is The Pattern Future - it is on Amazon. And patterncomputer.com has the papers we talked about today. If you go to stratnews.com and futureinreview.com, you will see the other subjects we talked about and a lot of different things to read.
Seth Earley: We would love to have you back, Mark. We just scratched the surface in so many of these areas. This has been great. Thank you. Thanks, Chris. Thanks, everyone. See you next time.
Mark Anderson: Thank you.
