Deep Dive #01 - How AI and humans learn how to count

Show notes

Tune in for the first episode of our brand-new podcast series Deep Dive into Applied Science for a discussion on how AI works and why it struggles so much with numbers. Abrar Ahmed (PhD researcher at Trier University of Applied Sciences) and I are joined by Prof. Dr. Daniel Ansari of Western University (Canada) to see what pre-trained models have to do with how humans learn numerics.

Show transcript

00:00:00: Welcome to Deep Dive into Applied Science.

00:00:03: This is our very first episode and we're here with Abrar.

00:00:07: Hi, my name is Abrar Ahmed.

00:00:09: I'm currently doing my PhD here at the Hochschule Trier

00:00:12: in collaboration also with the University of Trier.

00:00:16: I'm doing my PhD in the area of applied statistics and data science,

00:00:21: working with large language models, to use both buzzwords at once.

00:00:29: Large language models or AI are everywhere.

00:00:32: We use AI chatbots for homework or business reports,

00:00:35: for pattern recognition and medical research

00:00:38: or even to make animals talk,

00:00:40: like in a recently opened museum in Cambridge in the UK.

00:00:44: In this episode we dive into how large language models work and don't work,

00:00:49: why they struggle with learning numbers

00:00:52: and why AI differs from humans doing the same.

00:00:56: I am Regina Zielecke and this is Deep Dive into Applied Science.

00:01:01: The idea of language models is not actually a new concept,

00:01:05: it is actually a very old one.

00:01:07: I think it was during the 1960s that the first language model was created.

00:01:11: I think it was called ELIZA, where the idea was that

00:01:15: it had certain built-in answers: you gave it a question

00:01:19: and it would give one of those answers.

00:01:21: And then, at some point, you could see that it had become illogical.

00:01:24: But then in modern times, new neural networks came into development

00:01:28: which could be applied,

00:01:30: neural networks which could base their answers on previous words.

00:01:36: For example, you have a sentence, you have a word,

00:01:39: and then it could predict the next word.

00:01:41: But the revolution kind of came, I think in 2017,

00:01:45: when Google published the paper called "Attention Is All You Need",

00:01:50: where they gave the idea of something called a transformer,

00:01:55: which used multiple of these so-called neural networks.

00:01:59: And the way it works is, if I have to explain it in a very simple way,

00:02:04: you know those toys where, when you press on one side,

00:02:08: you get a pattern on the other side.

00:02:10: So imagine that, but you put a machine in the middle.

00:02:14: So you put input in from one side, it goes through a lot of steps inside,

00:02:19: and it maps the patterns from that side over to this side.

00:02:23: And then it gives an output on the other side.

00:02:25: So these large language models are trained on huge amounts of text.

00:02:29: Like you have books, web pages and everything.

00:02:33: And given that they're learning from such large amounts of text,

00:02:37: these texts are first, as you said, tokenized.

00:02:41: Tokenized means the text is broken down into small bits or pieces.

00:02:46: And then these are formed into embeddings.

00:02:50: Embeddings means these small tokens or bits

00:02:53: are turned into some numerical structure

00:02:56: and placed in some sort of space,

00:03:00: or, to put it simply, let's say you're putting them on a big chart

00:03:04: with numbers on it.

00:03:05: Like a coordinate system.

00:03:06: Exactly, in a coordinate system.

00:03:08: And then you have words which are similar to one another.

00:03:11: For example, if you say Shakespeare and book,

00:03:14: they have a close connection to one another,

00:03:16: because Shakespeare wrote a lot of books.

00:03:18: But then again, you have the term book and a worm.

00:03:21: In general sense, they might not seem similar,

00:03:24: but there is a term called a book worm.

00:03:26: Hence you will see them together.

00:03:28: So this is how, in many different ways, the connections between them are created.

00:03:33: And in this, let's say, multi-dimensional space,

00:03:36: they have these connections created because of these embeddings.

00:03:39: And given that, because of these embeddings,

00:03:42: you have these different probabilities which are created.

00:03:45: So for example, if you ask, okay, who was the Queen of England,

00:03:51: then it's going to give, based on what it can predict,

00:03:54: it's going to say, okay, the Queen of England was Elizabeth the Second, and so on.

00:03:58: So it works kind of like that.

00:04:00: Yeah, so basically, the probabilities are always dependent on the words that come before.

00:04:08: So it takes what it has seen in the structure before,

00:04:12: and then goes through the masses of data it has seen

00:04:15: and tries to predict the most probable next word.

00:04:18: Exactly, exactly.

00:04:19: So really like improv, where we just switch around and you complete my sentences.

00:04:27: Yes, exactly.

00:04:29: Let's recap.

00:04:31: A large language model trains on large amounts of natural language data,

00:04:35: but a computer needs numbers to calculate and do stuff.

00:04:39: So the individual words that are in the original data

00:04:43: need to be tokenized and transferred first into numerical values.

00:04:48: The abstract tokens are then embedded in the model

00:04:51: based on their sequential probabilities in the data,

00:04:53: and this is how the patterns or neural networks are identified.

00:04:58: And because language represents a view on the world and its meaning,

00:05:03: these patterns represent applicable knowledge or, as we prefer to call it, intelligence.
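
[Aside: a minimal toy sketch of the pipeline just described: tokens, embeddings in a coordinate space, and next-word probabilities. The vocabulary, the three-dimensional vectors and the softmax step are all invented for illustration; real models learn embeddings with hundreds or thousands of dimensions from massive corpora and compute next-word distributions with transformer layers rather than a bare dot product.]

```python
import numpy as np

# Toy vocabulary with hand-made 3-dimensional "embeddings".
# The numbers are made up; a real model learns these from data.
vocab = ["shakespeare", "book", "worm", "queen"]
embeddings = np.array([
    [0.9, 0.8, 0.1],  # shakespeare
    [0.8, 0.9, 0.2],  # book
    [0.3, 0.7, 0.9],  # worm
    [0.1, 0.2, 0.8],  # queen
])

def cosine_similarity(a, b):
    """How close two vectors are in the embedding space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "shakespeare" and "book" land close together in this space...
print(cosine_similarity(embeddings[0], embeddings[1]))
# ...while "shakespeare" and "worm" are further apart.
print(cosine_similarity(embeddings[0], embeddings[2]))

def next_word_probabilities(context_vector):
    """Turn similarity scores into a probability over the vocabulary (softmax)."""
    scores = embeddings @ context_vector
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Predict the most probable next word given the context word "book".
probs = next_word_probabilities(embeddings[vocab.index("book")])
for word, p in zip(vocab, probs):
    print(f"P({word} | 'book') = {p:.2f}")
```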

00:05:09: But while chatbots can answer questions in full sentences that are seemingly meaningful,

00:05:15: this form of intelligence is somewhat limited.

00:05:19: They are only an improv partner, as Helen Toner called it in Time Magazine in 2023,

00:05:25: or autocomplete on steroids, as NYU professor Gary Marcus called it in the Washington Post.

00:05:32: So let's take a closer look at those limitations.

00:05:36: One limitation is that, unless trained to do so, it won't acknowledge user intent,

00:05:44: so it won't know what the human interacting with it actually wants.

00:05:49: Yeah, yeah.

00:05:50: So, yeah, that's the thing.

00:05:52: These large language models are trained on like massive amounts of data,

00:05:55: so you don't know, like, what it can give out.

00:06:00: So there are many ways to actually tackle these limitations, for example.

00:06:04: So I saw this example a few days back: if you ask GPT-1, I think, "What is Belgium?"

00:06:14: then it's going to say it's just barren land and villages.

00:06:17: And as time went on, if you ask, "Okay, what is Belgium?"

00:06:22: it's going to give this beautiful paragraph, like, "Okay, it has a lot of historical heritage," and so on.

00:06:28: So the reason why I'm saying this is because it has no context as to what you want.

00:06:34: So one way of tackling it is, firstly, if you're using a chatbot, then you could give it prompts saying that,

00:06:41: "Okay, these are the specific things I need, so give that to me accordingly."

00:06:46: Let's say, "Okay, in bullet points, give me five points regarding the geographical location of Belgium,"

00:06:54: and it's going to give you that, or something like that.

00:06:57: Also, there's this certain thing called a fine-tuning.

00:07:01: So what a fine-tuning does is you have the complete model, but you tell it to do,

00:07:08: "Okay, I need it for these particular aspects," for example, someone who's working with medical data.

00:07:14: So you can say that, "Okay, I have these, these, these symptoms.

00:07:19: According to the medical thing that I taught you, you give me the outcome for this."

00:07:25: And then that whole thing will be based for just medical purposes, for example.

00:07:30: Or like recently, there was this thing that it passed a law exam.

00:07:35: So there's still those masses of data that are in the first step, pre-training,

00:07:39: and then in the fine-tuning you just have a little bit of data that you feed extra

00:07:44: that's sort of narrowing your purpose data.

00:07:47: Exactly, exactly.

00:07:49: Some of the other limitations are that obviously, I mean, you've said so in the beginning,

00:07:56: the output depends on the input.

00:07:58: If the data is harmful in some way, or biased, as data usually is

00:08:04: when it's collected by humans, so will be the output.

00:08:13: And what I'm really, what I think is really funny to watch and has been quite popular

00:08:21: in social media as well is to watch those large language models to hallucinate.

00:08:27: Yes.

00:08:28: So would you go into that?

00:08:30: Yeah, so the idea of hallucination is that it produces certain information

00:08:34: which it is not supposed to produce.

00:08:36: For example, it could be that you give it a very complex situation

00:08:41: and it was just going to produce some rubbish or gibberish or thing that you won't even understand.

00:08:46: And this used to be somewhat of a very common thing, and to a certain extent it still happens.

00:08:52: Like, you could ask a very weird question about something that probably doesn't even exist,

00:08:58: like, "Okay, give me a complex number of poems," or something like that.

00:09:02: It just won't know what to produce.

00:09:05: But there is this thing called human interaction fine-tuning, let's say.

00:09:13: I forgot the complete term for it.

00:09:15: But what it does is, if it does produce an answer, it asks the human, like,

00:09:20: "Is this answer correct or not?"

00:09:22: And so it's going to ask, like, "Okay, is this correct?"

00:09:25: And then if the user says, "No, this is not correct," then it's going to produce a different answer.

00:09:30: And like, it's going to go on like this till it, let's say, produces a proper answer.

00:09:33: And the human says, "Yes, this is the correct answer."

00:09:36: And the next time it's going to be asked that, it's going to then give out that particular answer.
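
[Aside: the mechanism described here is usually called reinforcement learning from human feedback, or RLHF; that name is our addition, not the speaker's. The snippet below is only a cartoon of the feedback loop, with a hard-coded stand-in for the human rater and a fixed list of candidate answers instead of a real model.]

```python
# A deliberately simplified sketch of a human-feedback loop.
# Nothing here is a real training procedure: the "model" is just a list
# of candidate answers, and the rater is a hard-coded placeholder.
candidate_answers = [
    "The Queen of England was Elizabeth the Third.",   # rejected by the rater
    "The Queen of England was Elizabeth the Second.",  # accepted by the rater
]

def human_says_correct(answer: str) -> bool:
    # Placeholder for a human clicking "yes, this is correct" or "no, it isn't".
    return "Second" in answer

preferred = None
for answer in candidate_answers:
    if human_says_correct(answer):
        preferred = answer
        break

# In a real system, the accepted answer is reinforced so the model becomes
# more likely to produce it the next time the same question comes up.
print(preferred)
```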

00:09:40: I mean, that sort of requires that the humans themselves know the answers.

00:09:45: So basically we transport our own limitations back to the model.

00:09:49: In a way, yes.

00:09:51: But then you have a lot of experts using it.

00:09:53: That's true. Thankfully, yes.

00:09:56: Hopefully.

00:09:57: One of those, well, I would classify this as a hallucination,

00:10:01: and I think we're getting closer to your topic here,

00:10:04: is an example posted on Twitter a couple of years ago.

00:10:12: So the input the user gave to GPT was Barbara, Mary and Harry had red balloons.

00:10:20: Ted had a blue balloon.

00:10:22: Sam also had a red balloon.

00:10:24: How many children had red balloons?

00:10:27: And GPT answers, Barbara, Mary, Harry and Sam had red balloons.

00:10:32: In total, there were five children with red balloons.

00:10:36: So what's going on here?

00:10:38: And how does this relate to your topic?

00:10:40: So it's still learning.

00:10:41: And I think over the last month or so, in terms of numerical or more logical conclusions,

00:10:49: it has become better.

00:10:51: But it is still kind of difficult for it to go into more in-depth mathematical ideas.

00:10:58: For example, if you ask it, okay, give me the value of pi to the 1,000th decimal place,

00:11:04: it probably won't be able to do it, because that's a very, very long number.

00:11:09: And it's still not capable of performing mathematical, let's say, calculations which are very complex.

00:11:17: And how does that relate to your PhD?

00:11:20: Okay, so what I tried to do is like, okay, since we already have so much information embedded within a language model,

00:11:29: I thought that, okay, maybe we could utilize that information to perform some sort of regression analysis,

00:11:36: like predicting continuous values.

00:11:39: Because beforehand, there have been papers which have done it with, like, let's say binary classification.

00:11:46: So I thought that, okay, you know, we have different data sets which are quite complex.

00:11:52: For example, you have data sets which have tons of text and tons of numerical values.

00:11:57: And if you're doing it with like the traditional statistical methods that we use,

00:12:02: you have to do like a lot of fine tuning, not fine tuning in this regard,

00:12:06: but rather you have to pre-process the data in different ways so that the model can understand it or do proper calculations.

00:12:12: Whereas when you're feeding these sort of data into the regression model for using an LLM,

00:12:18: you don't need to do all of this, you just feed it quite raw, basically.

00:12:23: So, and given that it has the ability to understand complex languages, like be it text or numbers,

00:12:31: it can understand that.

00:12:33: Then you ask for predictions.

00:12:35: This is where problems sometimes happen: since it's not completely capable of doing difficult calculations,

00:12:44: it kind of gives out some erratic results, let's say.

00:12:49: So my idea was in this regard that, okay, what if we change these numbers into a more text-related format?

00:12:57: A very simple idea was that, okay, maybe, you know, you have lots of continuous numbers,

00:13:02: you maybe break it into small bins, and then...

00:13:05: What are bins?

00:13:06: I mean, small categories.

00:13:08: So, for example, so let's say you have a particular variable which can take numbers from one to 100.

00:13:18: So we say that, okay, one to 50 could be low and 50 to 100 could be high.

00:13:23: And then instead of just having one to 50 and 50 to 100, we give the model additional information, saying, okay,

00:13:30: so let's say 48 is low, 79 is high.

00:13:36: So you're giving it more information.

00:13:39: As a result, when it comes to predicting, it can do better prediction, per se.

00:13:44: Okay, so it doesn't need to know that 76 is one step higher than 75.

00:13:53: It doesn't need to know that specifically, but it just needs to know where on the...

00:13:59: Yeah, on the range.

00:14:00: Yeah, yeah, yeah.

00:14:01: Like the whole range.

00:14:02: So, I mean, I'm saying it in very layman's terms, but there could be even more categories.

00:14:09: So, for example, I could say, okay, one to 10, 11 to 20 and so on.

00:14:13: So you could make it even larger so that it understands better.

00:14:16: But I think like over time, it has learned to at least recognize numbers better to a certain extent.
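
[Aside: a minimal sketch of the binning idea described above: mapping continuous values onto coarse text labels before showing them to a language model. The bin edges, labels and example values are invented for illustration; they are not taken from the actual experiments.]

```python
def to_label(value, edges=(0, 50, 100), labels=("low", "high")):
    """Return the text label of the bin that `value` falls into."""
    for upper, label in zip(edges[1:], labels):
        if value <= upper:
            return label
    return labels[-1]

# Finer bins are just more edges and labels, e.g. one to 10, 11 to 20, and so on.
fine_edges = (0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
fine_labels = tuple(f"bin_{i}" for i in range(1, 11))

for v in [48, 79, 12, 95]:
    # The prompt would then carry both the raw number and its label,
    # e.g. "48 (low)", giving the model an extra textual anchor.
    print(v, to_label(v), to_label(v, fine_edges, fine_labels))
```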

00:14:23: How do you know that the performance gets better?

00:14:26: So we use different evaluation metrics when we're performing these regression analysis,

00:14:32: things such as mean squared error and mean absolute error.

00:14:38: So those are like standard statistics?

00:14:40: Yes, these are evaluation metrics.

00:14:43: And the lower they are, the better.

00:14:46: So when you first perform those, it's quite high.

00:14:49: But when I've used my methods, it has come down.
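
[Aside: the two metrics just mentioned, written out. The prediction vectors below are made up to show the before-and-after comparison; they are not results from the actual study.]

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

y_true = [48, 79, 12, 95]
pred_raw_numbers = [30, 90, 40, 60]   # hypothetical predictions without binned labels
pred_with_labels = [45, 75, 20, 90]   # hypothetical predictions with binned labels

print("MSE:", mse(y_true, pred_raw_numbers), "->", mse(y_true, pred_with_labels))
print("MAE:", mae(y_true, pred_raw_numbers), "->", mae(y_true, pred_with_labels))
```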

00:14:56: This improvement has come with, of course, some limitations.

00:15:00: When you're doing these regression methods using traditional methods, you have the power of interpretability.

00:15:08: Whereas using these models, you cannot really interpret why things are happening.

00:15:12: Yes, the model is improving.

00:15:13: You're getting better predictions, but you don't know why these things are happening.

00:15:16: So basically, we go back to the beginning of the limitations of large language models in general.

00:15:22: Yeah, but I mean, this is the case of any modern, let's say neural network problems.

00:15:28: So there's this fine line between, let's say, what we say, statistics and data science.

00:15:34: Whereas data scientists mostly try to predict and get the best model or prediction.

00:15:41: And us statisticians try to understand why these predictions happen.

00:15:46: So, I mean, it's actually a trade-off: you might get the best model,

00:15:55: but you might not be able to say why these things are happening.

00:15:58: But when it comes to statistics, like, okay, you've got a good model, good, but justify it.

00:16:03: Why is it kind of like that?

00:16:06: Okay. And you're trying to combine these two in your...

00:16:09: Yeah, bringing basically both of them together, which isn't the easiest task.

00:16:13: Sounds like it.

00:16:16: [Music]

00:16:24: Lacking any sense of context or intention, even a model trained on good data can only produce output

00:16:30: that, while statistically plausible, may also be false.

00:16:34: Without improving or fine tuning the model to perform a specific task, it fails.

00:16:40: And in the case of handling something as complex as the numeric system, it gets overwhelmed and starts hallucinating, almost like humans.

00:16:50: The problem that Abrar wants to solve in his PhD research is that, on the one hand, more data doesn't automatically make models work better.

00:16:59: In fact, a smaller, well-chosen set of data can outperform a larger one.

00:17:03: It's all about providing the amount of information that is just right, like dividing numbers into rough categories.

00:17:11: On the other hand, the numeric system in and of itself is so complex that fine-tuning a model for it requires understanding what these categories have to look like.

00:17:22: So maybe if we want to find the best way to make AI learn to handle numbers, we should take a look at how humans do that.

00:17:30: So for our next part, we invited developmentalist Daniel Ansari to dive into that.

00:17:36: Nice to be here. My name is Daniel Ansari. I'm a professor and Canada Research Chair in Psychology and Education at Western University in Canada.

00:17:46: So would you like to tell us a bit more about how this whole concept actually develops in humans? How do we understand numbers? How do we conceptualize them?

00:17:59: Yeah, so there's of course a variety of perspectives on this, but from a sort of cognitive neuroscience, cognitive psychology perspective,

00:18:10: I think we have a wealth of data to suggest, as I alluded to earlier, that we share with other species basic intuitions of quantity.

00:18:21: Whether they are numerical

00:18:23: in content is something that is still debated.

00:18:26: But some people argue that they are numerical in content.

00:18:29: I tend to think that is not the case.

00:18:31: But let's just assume they are numerical in content.

00:18:34: And it's been argued that we learn the symbolic representations

00:18:38: that we've invented by connecting them to the pre-existing

00:18:42: intuitive representations of quantity.

00:18:46: And that we go beyond essentially these more rudimentary

00:18:49: coarse representations.

00:18:51: So for example, there's work demonstrating that six-month-old infants,

00:18:56: Fei Xu and Elizabeth Spelke showed this back in 2000:

00:18:59: six-month-old infants can discriminate between eight and 16 dots,

00:19:03: but they can't discriminate between eight and 12 dots.

00:19:06: So they have this sort of approximate sense of number.

00:19:09: And nowadays that's called the approximate number system.

00:19:12: And some people argue that this approximate number system

00:19:15: is what forms the representational basis of our symbolic math learning.

00:19:21: My position is a little bit different.

00:19:23: I argue that the approximate number system exists.

00:19:27: But symbolic numbers are something that's deeply cultural

00:19:30: and that's passed on through generations.

00:19:32: And we just need to look across the globe and see that there are cultures,

00:19:36: remote cultures that don't have fully-fledged economies,

00:19:40: that don't have a fully-fledged symbolic number system.

00:19:43: They might have number words for one to three,

00:19:45: but beyond three they just refer to many, many more,

00:19:48: one hand, two hands, because they just don't deal with large numbers.

00:19:52: There's no necessity, there's no pressure on them to deal with large numbers.

00:19:56: It doesn't mean that they're not capable of learning it,

00:19:58: they just don't have the environments that foster that.

00:20:01: So, and some of our research also shows that once children learn number symbols,

00:20:06: it shapes the way they process quantities in the world around them.

00:20:09: It's almost like an attentional filter.

00:20:12: That's what we're still trying to figure out.

00:20:14: So, for now we know we did some longitudinal work with kids

00:20:17: between the ages of about five and seven.

00:20:20: And we tested them on various tests of dot discrimination,

00:20:23: but also symbol discrimination.

00:20:25: And you might assume that dot discrimination drives symbol discrimination.

00:20:30: That would be sort of the most intuitive way of thinking about it.

00:20:33: But it's actually the other way around.

00:20:35: Once you learn symbols, you could become better at dot discrimination.

00:20:40: So, we think and we don't have the data to really say this is ultimately

00:20:45: what's going to turn out to be true.

00:20:47: But my hypothesis is that when you start developing symbolic representations

00:20:52: that are extremely exact, that you view the world around you much more numerically.

00:20:57: So, when you look at a set, you don't just say it's a set of stones,

00:21:01: but you might say it's about 10 stones.

00:21:04: And that's because you have that symbolic reference that kind of guides your perception.

00:21:08: We know that knowledge in general, in cognitive science,

00:21:12: we know that knowledge influences the way we perceive the world.

00:21:15: Our brains are predictive machines;

00:21:18: we're constantly making predictions on the basis of what we know and what we've learned.

00:21:22: And I think learning about numbers for children also gives them a way

00:21:25: to make predictions about things in the world around them.

00:21:28: So, first we have an approximate number system

00:21:33: that gives us basic intuitions about quantities

00:21:36: and the ability to make rough decisions that may or may not pan out.

00:21:41: And then we learn a more specific but also more abstract symbolic representation

00:21:46: in the form of numbers.

00:21:48: That helps us make more accurate predictions.

00:21:51: That does sound quite similar to what Abrar is teaching AI.

00:21:55: Basic pre-training followed by a task specific but abstract fine-tuning.

00:22:00: For example, me as a statistician who works with a lot of numbers,

00:22:05: we have to predict a lot of stuff.

00:22:07: So, for example, you have certain conditions and you say,

00:22:10: "Okay, what would the output be in that regard?"

00:22:12: So, for example, in regression and stuff.

00:22:14: So, I thought that okay, I mean, since these large language models,

00:22:18: these are trained on like huge amount of data.

00:22:21: So, let's see how well these models work when it comes to doing the same thing.

00:22:25: So, when it came to kind of asking it to generate outputs

00:22:31: in terms of numeric values, they weren't doing very well.

00:22:34: So, then I, of course, like, because computers don't learn the way that we humans do

00:22:40: or understand the way that we humans do.

00:22:43: So, I thought that okay, maybe I could specify it in a textual way.

00:22:46: Then maybe the outcomes could be a bit different.

00:22:49: So, what I did was that okay, when I'm teaching these models from the data that I already have,

00:22:54: I'm telling it okay in terms of numbers.

00:22:57: Let's say instead of going directly numerically, I would say that okay,

00:23:01: in forms of text, so for example, I would say that okay, the output is high, medium or low.

00:23:06: And then I would say that okay, let's say from this range to this range, it's low,

00:23:10: this range to this range, it's medium, and this range to this range, it's high.

00:23:13: As a result, it has a better stepping stone in terms of learning.

00:23:17: And when I did that, it kind of gave a much better prediction.

00:23:21: That's really interesting.

00:23:23: I have to confess that I don't know a lot about AI.

00:23:26: I mean, I'm like most people in this world just sort of dabbling a little bit with the large language models.

00:23:32: But I did read that they aren't particularly good at math.

00:23:38: It's interesting that you say that you sort of give them intuitions about quantity, right,

00:23:46: rather than precise number.

00:23:48: And I think that is something that we know from the research with both humans and non-human animals,

00:23:54: is that we, I think, have a sort of evolutionary grounding in some basic intuitions around quantity.

00:24:02: They're fairly imprecise, and maybe they form our basis of our more precise number abilities.

00:24:09: One question I would have for you, Abrar, is, you know, I'm a developmental psychologist.

00:24:16: So I think about development.

00:24:20: Large language models don't really develop, do they?

00:24:23: They are already developed because they already have so much knowledge,

00:24:28: "knowledge."

00:24:30: So how do you -- this is always a problem when you're trying to simulate something in a computer environment,

00:24:35: whether that be a computational model or be a large language model,

00:24:39: is how do we actually, you know, if you really want to compare the two,

00:24:44: do you see any scope for building sort of development into that?

00:24:49: I mean, of course, there's always scope for development.

00:24:52: If you compare how these models were, let's say two years back,

00:24:56: so this idea of whole large language models came about, I think, in 2021, 2022,

00:25:03: when OpenAI created its first GPT model, and it was, I think, the GPT-2, which was available.

00:25:10: So if you use that, and if you use, let's say, ChatGPT right now,

00:25:16: it's like this huge difference, and the reason for that is, like,

00:25:20: so these models are trained on massive amounts of data, right?

00:25:24: And the more data it's been trained on, the more of what we call

00:25:29: parameters are formed, and the more parameters we have,

00:25:33: the better it is in terms of conducting a task, let's say.

00:25:40: So for example, before, you could ask it certain questions,

00:25:43: and it would give an answer, but midway through, it would lose track

00:25:47: of what it was saying or trying to say; now it can literally give you a huge essay, for example.

00:25:54: I guess my conceptualization of development, I don't just mean that it gets better and larger and bigger,

00:26:01: it's more, if we're trying to understand the parallels between humans and AI,

00:26:06: humans go through a process of development that is both physical and mental in nature, right?

00:26:13: And children learn by interacting with the environment and learn by interacting with others.

00:26:19: So the way, I guess what I'm trying to say is that the way that a large language model learns

00:26:25: is maybe not quite the same as a human child.

00:26:29: So then try to understand the parallels, like you and I are trying to understand the parallels,

00:26:34: there is a fundamental mismatch in the way that knowledge is acquired and knowledge is built.

00:26:40: Yeah, so basically creativity is one of the differences, and what you already mentioned,

00:26:45: AI is a closed system, the model that it feeds is closed in itself,

00:26:52: and you spoke of interaction, which is highly important for human development, right?

00:26:58: Exactly, exactly. Interaction with the child with the physical environment

00:27:03: and the child with the social environment in the domain of math, of course,

00:27:07: interaction with the physical environment is really important, right?

00:27:11: The idea in math development is that children start out with very concrete representations

00:27:18: of quantity and numerical value, and then gradually move away from the concrete to the more representational,

00:27:26: and finally to the abstract level, which of course is a level of sort of human cognition

00:27:32: that is perhaps unique to the human species and is something that is quite recent in cultural history.

00:27:39: 8,000 years ago we didn't have fully-fledged symbolic representations or a number sequence.

00:27:46: These are inventions, and you talked about inventions, you know, and how we've gone beyond that,

00:27:51: how mathematicians have even invented more complex things.

00:27:54: So those might be some interesting differences to consider.

00:28:00: The problem with computers, or not computers, but these models in general, is that

00:28:05: when you're teaching them numbers,

00:28:09: they don't take them as numbers, but just as arbitrary symbols.

00:28:12: So when you tell them that okay, 2 plus 2 is equal to 4, for example,

00:28:17: so it's now embedded in its memory that okay, if someone provides me with data of like, what is 2 plus 2,

00:28:25: then according to my knowledge, probabilistically, this is the output that I should be giving.

00:28:31: Hence, it doesn't know what these numbers mean; rather, it gives back an idea of what it has already seen.

00:28:38: And another issue, which I'm still working on, although it has gotten to a better situation now,

00:28:46: I would say, is that when you provide certain variables, it tries to kind of reason about things.

00:28:53: Since it's learning these things in a more, how do you say, symbolic way,

00:28:58: it still doesn't know what the real meaning of these values are.

00:29:01: It's just a probabilistic idea of what it could be.

00:29:04: So that's an issue.

00:29:07: Yeah, that, you know, I mean, the idea that we can use computers to understand human cognition is not a new one.

00:29:17: It's not unique to large language models, but it has a long history.

00:29:22: And I think that we're probably not going to be able to find big similarities between humans and machines.

00:29:32: But what machines have always been powerful for is simulating the way that humans learn.

00:29:38: And through those simulations and models of human learning, we can generate testable hypotheses about the mechanisms of learning.

00:29:47: And so, new deep learning models and all that kind of stuff, it tries to approximate the neural system a bit more.

00:29:54: So I wonder how large language models play into that, whether they play into that at all. I have no idea.

00:30:00: I mean, these large language models do play around with the idea of neural networks.

00:30:05: So large language models are a more, let's say, advanced version of these neural networks.

00:30:09: So we have a basis of transformers, which is a sort of deep learning method, but a more advanced one.

00:30:19: And the idea is, instead of deep learning models where you just put in an input and it gives an output, what it does, it takes in an input.

00:30:27: It gives an output and then it goes back into another input and then gives another output.

00:30:32: As a result, it does these two things in parallel, which kind of makes it similar to how a brain works.

00:30:38: But in a much more vague sense.
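
[Aside: the "output goes back in as the next input" behaviour described above is autoregressive generation. The loop below is a toy version: the "model" is a random choice over a tiny vocabulary, standing in for a transformer forward pass that would compute the next-token distribution from the whole context with attention layers.]

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "queen", "of", "england", "was", "elizabeth"]

def toy_next_token(context):
    # Stand-in for a transformer forward pass over `context`;
    # here the choice ignores the context entirely.
    return rng.choice(vocab)

context = ["the", "queen"]
for _ in range(4):
    token = toy_next_token(context)
    context.append(token)  # the output is fed back in as part of the next input
print(" ".join(context))
```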

00:30:41: When I try to summarize what I've heard from you two: basically, we will never have an AI that learns in the same way as humans do.

00:30:55: But we can learn from AI about how humans learn, by distinguishing the differences between both outcomes and the parameters that come into play.

00:31:08: I mean, so I would say that the initial stage of how children are being taught and how AI is being taught in general is the same.

00:31:17: So when a kid is born, they literally know nothing.

00:31:20: And then you give it information.

00:31:23: That's what's, let's say, feeding the AI with lots of data.

00:31:27: Then it goes through something called reinforcement learning.

00:31:30: So if a child touches something hot, it knows not to do it the next time around.

00:31:34: And it's similar to that of an AI model.

00:31:37: So we have this concept of rewarding.

00:31:39: So if it does something right, we give it a reward, saying, okay, you get one point if the answer is correct and nothing if not.

00:31:44: And it learns in that way.

00:31:46: And of course, there's also this concept in teaching these large language models that when it makes an error, the humans can come and fix it.

00:31:56: Whereas it's kind of the same for the children as well.

00:31:58: So if they make a mistake, you say that, okay, no, this is how it's done.

00:32:01: This was wrong.

00:32:02: Yeah.

00:32:02: The one thing I would disagree with you on is that it's not true that humans don't know anything when they're born.

00:32:09: No, no.

00:32:09: We're born with quite sophisticated pieces of knowledge that we don't have to be taught.

00:32:15: You know, they range from knowing things about objects, surfaces, about depth, about height, and to some extent about quantity.

00:32:26: So I think we have a lot of priors in our neuronal systems.

00:32:31: And those priors may not exist when a large language model starts to learn.

00:32:35: A large language model is kind of a blank slate, as far as I understand it,

00:32:39: when it starts up, whereas humans have priors that have been structured through the process of evolutionary adaptation and that help us survive.

00:32:49: Right.

00:32:50: So I think there is a difference there.

00:32:52: And it's an important difference because these priors are influential in terms of what we learn and how we orient what we learn.

00:33:01: Yeah.

00:33:02: Of course.

00:33:03: It's already like a pre-trained model for the humans.

00:33:06: Like they already come built in with that.

00:33:08: But isn't this also maybe a very important difference, that the large language models come with all the data at once, whereas learning for humans is incremental?

00:33:22: Yeah, that's what I was trying to say at the beginning, right?

00:33:24: That as a developmental psychologist, I always think about the mechanism of development and developmental changes being fundamental to the human experience and to human learning and cognition.

00:33:37: And that doesn't just apply to children.

00:33:39: It applies to adults as well.

00:33:40: Unfortunately, I'm on the other curve now where things are going down, you know, where rather than sort of offering advantageous constraints now, you know, my knowledge and skills are declining and my ability to learn is declining.

00:33:53: But I'm still developing as such.

00:33:57: And that's maybe a constraint that isn't built into large language models.

00:34:01: That's where we get back to the point that large language models can help us simulate human cognition, but they do not resemble human cognition necessarily.

00:34:10: They might look like human cognition in their output, but that doesn't mean that the mechanisms that generated that output are the same.

00:34:17: Yeah.

00:34:18: One thing that I always tell people like, I mean, of course, there is some sort of fear mongering going on that AI might take over people's jobs and stuff.

00:34:26: But I say, no, I say the opposite.

00:34:28: It actually makes people's life easier.

00:34:31: So I would say in terms of reaching super human levels, of course, it's not there or even human levels, let's say it's not there.

00:34:39: But whatever it does, it does quite well.

00:34:41: We've got to embrace it to some extent, but I think embracing it critically.

00:34:45: I think a lot about it in the context of education, where a lot of people make grand claims about AI kind of replacing education and everything will be personalized to the student.

00:34:55: But returning to one of the topics we had before, you know, human interactions have certain characteristics to them that are never going to be there in a machine human interaction context.

00:35:08: Yeah. I mean, people also thought that the internet itself would replace learning and school.

00:35:15: So we've seen how that turned out.

00:35:18: Being there.

00:35:19: Yeah.

00:35:20: Thank you very much for joining us in this episode on AI.

00:35:25: I hope you enjoyed it.

00:35:27: I sure learned a lot.

00:35:29: Thank you very much.

00:35:30: Thank you as well.

00:35:31: Thank you.

00:35:32: Thank you.

00:35:33: Thank you.

00:35:34: It was a great chat.

00:35:35: Thanks for having me.

00:35:36: Thank you for listening in on Deep Dive into Applied Science.

00:35:42: I hope you enjoyed our first episode and this discussion on learning numerics from the perspectives of both cognitive and AI research.

00:35:50: If you did, tune in for the next episodes, which will be released once a month from now on.

00:35:56: This podcast is produced by Trier University of Applied Sciences within the GIRO project, Graduate International Research Opportunities, funded by the German Academic Exchange Service.

00:36:08: Editorial and production, Regina Zielecke.

00:36:11: Design, Céline Bonar, Marie Abramowicz, Bastian Franz.

00:36:16: Post production, Philipp Klin of Promo-Klin.

00:36:19: [Music]

