[Auto-generated transcript. Edits may have been applied for clarity.]

Thank you very much, all of you, for coming. I'm very pleased to be introducing our speaker tonight for the second annual Voltaire Foundation Digital Enlightenment Studies lecture. This is something we started last year; I can't remember who the speaker was, but I think it went well, and we hope to keep it going in years to come. We've coupled it with a workshop on digital Enlightenment studies, which has been really rewarding over the past few days, so there is some momentum going, and tonight's speaker is a big reason for that. I'm very pleased to introduce Mikko Tolonen, professor of digital humanities at the University of Helsinki. Mikko is a trained intellectual historian, and I've always been very impressed by the way he juggles the academic rigour of what is in many places a very analogue field, intellectual history, with very serious computational explorations. He runs the computational history group at the University of Helsinki, which does amazing experiments, and I think he's going to talk about a few of them, if not more, tonight. He's going to speak to us about books as objects, data and meaning. Please give him a warm welcome.

Thank you very much, Glenn, and thank you, Nicholas, and everybody at the Voltaire Foundation. This is really a great honour for me, and I love being part of what we, or rather you, started; we try to support it as best as we can.

Let me say a few words about the title, 'Books as objects, data and meaning'. It aims to capture the idea that when we think about books in the digital realm, we want to see them functioning on several different levels, which is sometimes forgotten. First, the way we approach this is that they are physical objects: tangible artefacts shaped by production and circulation in many different ways. At the same time, they are data, which is what the analogue side wasn't always thinking about: there is metadata that we can mould and modify, and there is textual information that can be analysed computationally.
And they are, of course, vehicles of meaning, which is what intellectual history has always been interested in, although the approach has been everything but computational: cultural and intellectual expressions that readers interpret and then reinterpret across different contexts. That is really what we are interested in.

Let me also emphasise the words 'a computational approach'. We don't want a 'let's try everything' attitude. I'll be talking about how we have had a strategy and a vision for some time now, already before all of this became a thing; hopefully that is something you take away from this talk.

So what I want to do is, first, to present this holistic approach: what we mean by studying books as physical objects, data and vehicles of meaning. Then I'll try to offer some concrete, data-driven insights into Enlightenment print culture. For those coming from the more traditional side, this is the most important part of what we are saying: we are not trying to do something different. We are expanding the context, maybe, and the ways of interpretation, but the continuum needs to be in the tradition of Enlightenment studies and intellectual history. Those who are not doing computational work are just as important as those who are experimenting with different kinds of methods. The point is that the methods need to advance the scholarship, so that we go in some ways beyond the traditional; otherwise there is no point in us doing any of this. And then I'll try to sketch a little of where computational history might be heading and how AI will shape what we call digital Enlightenment studies. Those are the three objectives.

Now, when giving a lecture in intellectual history and Enlightenment studies, usually you would have a printed handout with a set of quotes, which rather underlines that the work we do in the traditional way is somewhat anecdotal: we interpret certain passages that are important. What I want to give instead, through this QR code, is a bit of an experiment. If you go there, you will find a structure; you will find the articles that underpin my talk. There is a reference code, and you can look up which article I'm referring to every now and then.
You don't have to follow it, but it gives you a sense of where we are going, so don't feel that you're missing out if you don't want to do that. It gives an opportunity to those who want to multitask; I won't get angry if you're looking at your phone here. The other point is that if you're interested in what we do, you can go back after this lecture, look at the structure, and go through those articles; most of them are openly available. You don't have to use the QR code: you can also go to the tinyurl address, 'des' hyphen 'tolonen'. Is anything unclear about this, in case someone is still wondering what is going on? All right.

Now, this is the structure, and it is also how those articles are organised; it is what I will follow today. We will talk about what we call bibliographic data science. Then I have two different kinds of examples, or rather snapshots, because one hour is not enough for me to open up everything, which is also why you have the articles. I want to talk about canon formation in early modern Britain, a little bit about vernacularisation and what we have called the democratisation of reading, and then about reception history in different ways for Enlightenment studies. Then I turn to the new place where we are now, AI for Enlightenment studies, and a lot of the articles we are working on; for example, we do something called translation mining. That hasn't been published yet, but it is good enough that I dare to speak about it.

Let's go back. This is a little bit of a personal journey, but also the journey of our research group. So let's go back to 2001. Here I am as a student of intellectual history, asking what intellectual history is. If you remember, 2001 was the golden age of contextual reading: Skinner was very big, and he is still big, but even bigger in Helsinki especially. A lot of how to think about intellectual history was about context. And what I started to think already then, and this is me not being computational at all at this stage, was the following.
If we really want to do this contextual work, don't we then want to really expand the data we are working with? Back in the day, if you were working on the eighteenth century, you had to come here to Oxford, to the Bodleian, just to look at the books; that is how it was. Another thing that was important for me, and you can see Mandeville and Hume there: I'm a simple person coming from near Lapland, where, according to Enlightenment thinkers, not-so-bright people live. When I was reading Mandeville and reading Hume, I was struck that people at the time were reading Hume in a way that is everything but mundane. This kind of reception question, how ideas change as they are read, really stuck with me for a long time, and I haven't got rid of it yet either.

Another thing was: where did you actually get the data in the early 2000s? This was the time we got ECCO on trial in Helsinki for two weeks, and I wasn't sure whether it was going to go away. So for those two weeks I sat in the library downloading things onto CD-ROMs. There was a limit of 50 pages at a time, so I sat through the whole opening time. I should have brought the CDs; I still have a big bunch of them. The idea was that this way I would be able to look at the PDFs. As for storage, I calculated it would have taken about 25 years to download everything with the modems of the time, so that didn't work. And of course, if we think about the interfaces: an eighteenth-century book, as we all know, survives if you keep it in good condition; the CD-ROMs break; and we don't know how cloud computing and supercomputers and so forth will survive in the end. But that is where we are.

So then my steps from being a traditional intellectual historian to being involved with a great bunch of people from many different backgrounds, our computational history group. When I was doing my PhD, I kept getting these ideas: why don't we do this computationally? When I finished my PhD, I started looking for collaborators. I called one of my friends, who was a professor of medicine, and asked whether he had anybody in his group who would like to work with me, because I had these ideas.
That is what we started doing. There were two of us at the beginning, and then we had some funded projects and more people came in. From 2014 onwards came the phase where I intentionally took on the job of building a research group, which in the humanities, although some people do it, is still a somewhat different undertaking from just working in funded projects, which we were also doing. That is roughly how things went.

Now, as for me and the Voltaire Foundation: my book on Mandeville and Hume, Anatomy of Civil Society, was published in 2013, and from there it is ten years to this point, to digital Enlightenment studies, where we also had an article. This period was very formative in many ways, and I feel there is a bit of an overlap with how the Voltaire Foundation is also transforming. When I had finished that book and was serious that I was now going to start doing this computational work properly, people, my supervisors, advised me: you have a decent career, don't do it, stop now. And then they said: well, if you are successful, that means we also need to start doing this computationally, and we don't want to. But anyway, here we are.

In Helsinki, our group is formed so that we have data scientists and linguists, who look at words and concepts in a different way than intellectual historians, and then a lot of elements of book history, people interested in semantic change and different kinds of historical processes. What is not always so easy to understand is that if you have data scientists, these are not people who come to fix your computer. They are serious scientists who need to be thought of in that way, so that we do work in which everyone can retain their own identity. We are not quite like bioinformatics yet, so you have to think about this and not take advantage of anyone; that is the important part. At the same time, we need to advance together, and also think about where we are going to get our next funding, and so forth.
So there are a lot of different elements in this kind of collaboration that need to be taken really seriously.

Now, our aims. This slide is from 2017, and the reason I want to show it is that we have been quite consistent: the strategy we started with already then was understanding public communication, and looking at early modern Europe more broadly, at the movement of ideas. That means, quite explicitly, metadata work; looking at genres; looking at intellectual traditions, ancient texts and how their reception plays out in the early modern period; and looking at text reuse, coupled with conceptual change, so very much the intellectual history underpinnings. We haven't done nearly enough theoretical work along these lines, and it is something that should be encouraged more and more in Enlightenment studies in general. But there were also the data releases and building tools for others. All of this going together was what we aimed at.

All right, let me now move from this background to the conceptual and practical level of talking about bibliographic data science. This is the first part where you can look at some of the articles as well, if you like.

In 2015 it was very clear to us, and we were very enthusiastic, that we can map ideas in different ways, coming up with networks, and that bibliographic data, treated properly, lets us take these steps: to look, for example, at how the natural law tradition was published across Europe, to combine different elements and move forward, and also to think about the reception of Mandeville or Hume. For those types of questions it was really clear that this is a good way of thinking about it.

Conceptually, I started reading quite a bit about what people have been doing in bibliography and in book history. There is a debate about the potential use of library catalogues, decades of it. Of course, the debate about bibliography and science that Tanselle and McKenzie and others engaged in was a little different from what we now mean when we talk about data science.
Then there are the questions: in what way can we be sure that library catalogues can actually be used to get decent interpretations of how ideas are formed and how they change, and what can you do with respect to book history relying only on library catalogues? There are a lot of book historians who think that archival evidence is the only way to move forward. We took the other route. You can still debate this, and these questions haven't been resolved in any sense; it is obvious that you need different kinds of approaches. But our argument was that if we take what we called a bibliographic data science approach, you can actually use these catalogues in a scientific way.

For us, early on, this meant working with Leo Lahti from the University of Turku; it was him and me, and I worked plenty with him in those years. This was a kind of white paper for us: 'Bibliographic Data Science and the History of the Book', eventually published in 2018. If you want to read about the ideas behind this approach, that is the article to look at. Much of it has to do with harmonisation. If you just take any kind of metadata and think you are now going to get some relevant results, what often happens is that you end up playing with the data. Explorations can be good, but our argument was, and still is, that you need to harmonise the data, and there are many different ways of doing that, in order to get results that are robust. And, just as for traditional analogue scholars, it is mainly about the research questions: what you are asking is the relevant part, and then what data you can use in order to answer it.
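To make the harmonisation point concrete, here is a minimal sketch of the kind of cleaning involved: pulling a usable publication year and a controlled format value out of free-text catalogue fields. The field names, example records and mapping table are invented for illustration; they are not the ESTC schema or the group's actual pipeline.

```python
import re

# Hypothetical raw catalogue records; real ESTC/MARC fields are messier.
raw_records = [
    {"id": "T001", "imprint_year": "MDCCXXIV. [1724]", "format": "8vo."},
    {"id": "T002", "imprint_year": "printed in the year 1714", "format": "2"},
    {"id": "T003", "imprint_year": "[1760?]", "format": "octavo"},
]

FORMAT_MAP = {  # map free-text gatherings statements to a controlled vocabulary
    "2": "folio", "fol": "folio",
    "4": "quarto", "4to": "quarto",
    "8": "octavo", "8vo": "octavo", "octavo": "octavo",
    "12": "duodecimo", "12mo": "duodecimo",
}

def harmonise_year(value: str) -> int | None:
    """Pick the first plausible hand-press-era year (1450-1800) in the field."""
    for match in re.findall(r"\d{4}", value):
        year = int(match)
        if 1450 <= year <= 1800:
            return year
    return None

def harmonise_format(value: str) -> str | None:
    key = value.strip().lower().rstrip(".")
    # try the whole string first, then the leading token (e.g. "8vo" in "8vo.")
    return FORMAT_MAP.get(key) or FORMAT_MAP.get(key.split()[0] if key else "")

for rec in raw_records:
    print(rec["id"], harmonise_year(rec["imprint_year"]), harmonise_format(rec["format"]))
```

The real work is in deciding, per research question, which fields need this treatment and how far the cleaning has to go.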
In 2018 there was a bit of a shift, towards full data, meaning that we have the images and we have the text. So ECCO came to the centre for us, and the idea, taken very seriously, was that we are building a kind of knowledge infrastructure for Enlightenment studies. That is also the reason we are able to do different kinds of things these days: the ESTC metadata includes, for example, having the work level and the edition level sorted out, which becomes a very crucial issue when you want to do any kind of text mining. So it is the combination of the ESTC and ECCO, combined of course with other datasets: if we want to look at the influence between French and English, or British, we need the BnF, we need Gallica, we need information about the printers and publishers. And what we really want as well is the newspapers. If we want to study the public sphere, why would we only look at books? Why not also look at the overlap with newspapers when we are talking about public discourse in the eighteenth century? Not many people are actually doing that cross-catalogue work yet. So building from here onwards, starting really seriously in 2018, moving from the library catalogues to this kind of knowledge infrastructure, has been very beneficial for us throughout and will continue to be so.

One thing that bibliographic data science means, combined with this type of work, is shown in one article, 'Anatomy of Eighteenth Century Collections Online', where the idea was to analyse what is in ECCO based on the ESTC, which is the best information we have available: not perfect, but the best. A lot of people want to use ECCO, but if you don't really know what is in there, it might be a little difficult to justify your results. Just one example: if you want to study pamphlets in ECCO, there is a drop in coverage compared to the ESTC, going from fifty or sixty per cent down to thirty per cent. I would at least want to be aware of this before starting that kind of work, and this is what the combination enables. There are also big gaps; some of them are known, so people know there are gaps, but I wouldn't trust what, for example, Gale, the company, says, because they haven't done this analysis, so they don't know. I think of this as preliminary work for any type of research you want to do with this kind of material. For us, this is what bibliographic data science means: what happens before we start engaging with a particular analysis.
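A minimal sketch of the coverage check just described: linking digitised documents back to a catalogue frame and asking, per genre and decade, what share of records has a digitised counterpart. The dataframes, identifiers and column names are hypothetical stand-ins, not the published ESTC or ECCO data.

```python
import pandas as pd

# Hypothetical catalogue frame: one row per record, with a genre label and year.
estc = pd.DataFrame({
    "estc_id": ["R1", "R2", "R3", "R4", "R5", "R6"],
    "genre":   ["pamphlet", "pamphlet", "book", "book", "pamphlet", "book"],
    "year":    [1701, 1755, 1710, 1752, 1756, 1791],
})

# Hypothetical table of digitised documents linked back to catalogue identifiers.
ecco_links = pd.DataFrame({"estc_id": ["R1", "R3", "R4", "R6"]})

estc["in_ecco"] = estc["estc_id"].isin(ecco_links["estc_id"])
estc["decade"] = (estc["year"] // 10) * 10

# Share of catalogue records per genre and decade that have a digitised copy.
coverage = (
    estc.groupby(["genre", "decade"])["in_ecco"]
        .mean()
        .rename("ecco_coverage")
        .reset_index()
)
print(coverage)
```

Run over the full catalogue, this is the kind of grouping that reveals something like the pamphlet coverage drop mentioned above.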
Another part of bibliographic data science is the relationship between unstructured and structured data. Structured data means, for example, metadata from the catalogues, the classic bibliographic data approach, which is great because it is rather systematic. Unstructured data could be text, images, audio collections, anything that can also be analysed in a systematic way. What I don't understand is why people don't think about it like this: it might be good to do some analysis, but why would you not think at the same time about the creation of new structured data? There are a lot of MARC fields, for example, that are missing from the catalogue information. But if we can use the unstructured data to create, for example, genre information, or subject topics, whatever you want to call it, then that way of working together, in a shared infrastructure if possible, will take us much further.

An example of this is that we have started working more and more with the physical aspects of books and with the images, for example from ECCO. I was already doing one part of this by hand: when I was doing my PhD, I was printing and cutting out headpieces and gluing them next to the titles, looking for variants of the different ornaments. Why wouldn't you do this in an automated way? Well, because you didn't have the data; you couldn't. When I was working on Mandeville, obsessed as I always was, I wondered how people could say that Tonson wasn't his publisher when everything you look at in the book points exactly in that direction. So to me it is clear what we could actually do with these. There have of course been people looking at printers' ornaments for a long time, usually by hand, and there the idea has been that the woodblock is owned by the printer, though there might be some lending going on. But the interest in collecting the information was always to look for that one ornament tied to Bowyer or some other famous printer, not to look at the variants. There has also been discussion about whether the ECCO image quality, which is quite poor, is good enough for this, and, not to take anything away from others who work with ornaments, we have been able to take the 700,000 headpieces from ECCO and cluster them into superclasses and subclasses, where we get these variants. Some of them are very interesting: what is the intentionality of producing ornaments where the eagle looks inwards and others where it looks outwards, with the one set printed in Ireland and the others in London?
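Clustering ornaments at this scale essentially means turning each cropped headpiece image into a feature vector and grouping near-duplicates at two levels of strictness. The sketch below is a toy version of that superclass and subclass idea, using invented feature vectors and scikit-learn's agglomerative clustering; it is not the group's actual image pipeline, and the thresholds are arbitrary.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)

# Stand-ins for image descriptors of cropped headpieces (e.g. embeddings or
# perceptual hashes); in reality these would come from the scanned pages.
features = np.vstack([
    rng.normal(loc=0.0, scale=0.05, size=(40, 16)),   # one ornament family
    rng.normal(loc=1.0, scale=0.05, size=(35, 16)),   # another family
    rng.normal(loc=1.1, scale=0.05, size=(25, 16)),   # a close variant of it
])

# Coarse "superclasses": a loose distance threshold groups whole ornament families.
superclasses = AgglomerativeClustering(
    n_clusters=None, distance_threshold=5.0
).fit_predict(features)

# Fine "subclasses" inside each superclass: a tight threshold separates variants
# (e.g. the eagle looking inwards versus outwards).
for sc in np.unique(superclasses):
    members = features[superclasses == sc]
    if len(members) < 2:
        continue
    subclasses = AgglomerativeClustering(
        n_clusters=None, distance_threshold=1.0
    ).fit_predict(members)
    print(f"superclass {sc}: {len(members)} images, "
          f"{len(np.unique(subclasses))} subclasses")
```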
It becomes like a big puzzle that we can then start mapping in different ways, combined with better printer and publisher metadata. There is of course also a lot you can do by looking at the type, at the fonts, and so forth. But it is a good way forward, combining the unstructured and the structured with the intention of getting better datasets, so that the project doesn't end here: when we are able to offer it to others, the work continues and we all move forward.

Then you can really dig in. If you look at these two books here, one is said to be published by Roberts and the other by Tonson, one of the later editions of the Fable, and when you look at the details you may start thinking about the publisher attribution in cases like this. Just an observation, but the patterns are quite intriguing. Some people say that Tonson, the younger, was a publisher without in-house printing, and that it was Watts who did all the printing; but when he dies, the use of some of the headpieces drops and they are not used any more. So if they were Watts's blocks, I don't know why that happened. There are a lot of intriguing questions of this kind. We are still working on this; there is nothing final we have said about Tonson, but that is what we are working towards with this headpiece information.

So, with bibliographic data science, what I want to stress is this virtuous cycle of better data, which is a never-ending process. You combine the harmonised bibliographic data with text and image sources, and you come out with something that supplements them; the images then help you enrich the bibliographic data, and the loop moves forward. You can also use the text information in many different ways. What is really important is that you develop this together with the use cases, thinking about the research question and in what sense the data is good enough, because it will never be perfect.
We would never be doing any intellectual history if we were just trying to get a dataset into perfect condition; you would end up annotating everything by hand. So: good enough and usable, but then you also need to be able to evaluate what that means in different contexts. That is very much the core of why I want to talk about bibliographic data science.

And, okay, I'll make this public commitment: we are going to finish the cleaning of our version of the ESTC and make it available as open data. We have corrected the date from 2025 to 2026, so give us a little break here, but we think this is a bit of a milestone, because after that anyone can work with it. It will include, for example, the edition–work relationships, which will help you a lot if you are working with the ECCO data and so forth. Hopefully it will also help collaborative projects when people work with text mining and, for example, LLMs. If we are not done by 2026, it will be nobody's fault.

Okay, then a couple of snapshots of what you can do when your data is, in a sense, good enough that you can start working with it. Canon formation is something we have been very interested in, across many different subprojects, or whatever you want to call them. This is one project we worked on with Mark Hill, who is here; he was a postdoc at the time. We wanted to examine the early modern canon, and what a canon is we took here simply as: what is published most often, most frequently, and for the longest period of time. That means cookbooks and Shakespeare are on the same level, as a starting point. We used the edition field information, the work–edition linking, which was really crucial for this, and the publisher and printer information. The main interest was really whether we can see an epistemological shift in this kind of information. There is an article about this, so if you want, you can read it; I'm just going to show a couple of very simple slides of what you can do when examining this kind of data.
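The operational definition of a canon used here, works published often, repeatedly and over a long period, can be expressed as a simple filter over an edition-level table. The toy table, work identifiers and cut-offs below are purely illustrative, not the project's data or thresholds.

```python
import pandas as pd

# Hypothetical edition-level records: one row per edition, linked to a work.
editions = pd.DataFrame({
    "work_id": ["shakespeare_plays"] * 9 + ["one_off_sermon"] + ["cookbook_x"] * 7,
    "year":    [1590, 1605, 1623, 1664, 1709, 1725, 1740, 1765, 1790,
                1712,
                1650, 1662, 1671, 1685, 1699, 1710, 1724],
})

editions["decade"] = (editions["year"] // 10) * 10

stats = editions.groupby("work_id").agg(
    n_editions=("year", "size"),
    n_decades=("decade", "nunique"),
    span=("year", lambda y: y.max() - y.min()),
)

# Illustrative cut-offs: many editions, spread over many decades, long print life.
canon = stats[(stats.n_editions >= 5) & (stats.n_decades >= 5) & (stats.span >= 60)]
print(canon)
```

On these toy rows both the Shakespeare works and the cookbook qualify, which is exactly the "cookbooks and Shakespeare on the same level" starting point.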
Here on the left is the full ESTC by the most published works. Then, if we look only at first editions, it looks like this. And then our canon data is here, and it breaks down quite nicely in the end, when the works are sorted by their first year of publication; this is the number of times they were published. There is a bit of weighting going on, because the data ends at 1800, so that later works can also be included and it is not only the earlier ones.

Then you can, of course, subset this in different ways. You can start looking at works that were among the most frequently printed during at least one decade between 1500 and 1800. If you look at what you find this way, there are a lot of things you would expect, and maybe some that are less common. But that is not the point here. The point is that we can do this kind of examination and set it up around a certain interest. Ours was this type of canon, which is also justified by the so-called London average of print runs: if you wanted more copies printed, you came out with a new edition. So that is another reason why this is a justified way of doing it.

Also, of course, authors being published after their death is important. The normal thing for the non-canonical ones is that once you die, they don't print you any more, for whatever reason. Here you also find Voltaire, of course, and remember, this is the British ESTC data being used. This was in 2021, when we were doing that work.

But our interest in the canon is constant, so there are many more articles; I just want to show one recent thing we did. The article's title tells you a lot: 'Quantifying the presence of ancient Greek and Latin classics in early modern Britain'. It was a lot of work to get the ancient canon right. But what we noticed in a data-driven way, and what the article is very much about, is that while there is, as a whole, a growing popularity of printing the ancients, what happens at the same time is that diversity diminishes.
Many more ancient authors are printed in the seventeenth century, but in the eighteenth century printing becomes very concentrated, so the so-called Matthew effect becomes very prominent: the ones that are printed most get printed even more, and the others drop away. We did a lot of testing, statistical testing, of whether that holds; if you are interested in that, please look at the article.

Good. So then, vernacularisation and what we call the democratisation of reading. These are also just snapshots. This work was done long ago, but I still find it quite interesting. If we look at book printing in Latin and the vernacular in northern Europe, and there are now many more databases for this than just the British case, what you notice quickly, for example from the Heritage of the Printed Book database, is that book sizes shift considerably: we move towards much more mobile formats. So the octavo is the star of the Enlightenment, not Voltaire. That becomes quite clear. At the same time the Latin share is dropping, but one thing to note about the Latin share is that it is not a synchronised move: looking at local differences, different cities have different profiles, so we can't say there was one uniform shift or anything like that. From there you could go on; we looked quite carefully at the different Latin profiles. What you want to make of the document sizes is another question, and how directly you want to connect that to reading is another issue. But especially the drop in the Latin share might suggest that the reading community was at least getting wider: those who didn't go to college were starting to read as well, which is an important part of this movement.
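The vernacularisation and format trends come from simple aggregates over harmonised records: per decade, the share of titles in Latin and the share printed in octavo. A minimal sketch with invented rows standing in for databases such as the HPB; the column names and values are illustrative only.

```python
import pandas as pd

# Invented records standing in for harmonised catalogue data (e.g. HPB, ESTC).
records = pd.DataFrame({
    "year":     [1644, 1648, 1655, 1702, 1705, 1708, 1751, 1755, 1759],
    "language": ["lat", "lat", "eng", "lat", "eng", "eng", "eng", "eng", "fre"],
    "format":   ["folio", "quarto", "quarto", "octavo", "octavo",
                 "quarto", "octavo", "octavo", "duodecimo"],
})

records["decade"] = (records["year"] // 10) * 10

trends = records.groupby("decade").agg(
    latin_share=("language", lambda s: (s == "lat").mean()),
    octavo_share=("format", lambda s: (s == "octavo").mean()),
    n_titles=("year", "size"),
)
print(trends)  # in real data, the same aggregate per city shows the unsynchronised shift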
Connected to this is an article led by doctoral researcher Iiro Tiihonen, who is very interested in looking at book history from an economic perspective as well. Probably everybody knows the idea that there was a monopoly in London and that book prices were kept artificially high. A lot of people have been interested in the copyright ruling, in the public domain that it created, and in the claim that prices would then have gone down considerably. When we looked at this from the perspective of prices, the conclusion was that it did not have such an effect. Book ownership, rather than reading as such, remained mainly something for the wealthy, and more emphasis could and should be put on different kinds of circulation, resale and lending practices and so forth. But it is, in my opinion, a good way to show how this bibliographic data science perspective can bring a new aspect to something that has been debated for decades; the copyright question is a huge issue in book history. This is about bringing a new perspective: we estimated prices for books across the whole ESTC, and quite a lot of work went into that. If you are interested, please look at that article too.

Now, reception history, which is also something we have talked quite a bit about at the workshop over the past two days: text reuse and such things. It is something we in Helsinki have been working on quite a bit. We have run BLAST, a bioinformatics software, over ECCO, Eighteenth Century Collections Online, and also the British Library newspapers, the Burney and Nichols collections, and we have about half a billion text reuse pairs. The reason bibliographic data science is so important for doing this is that we also need the ESTC frame and the possibility of getting rid of multiple editions in certain cases, if we really want to analyse what is happening. We are doing different kinds of reception studies: patterns of borrowing, impact in some sense, and also what we might call discourses, through this kind of dissemination of texts. Again, it is that knowledge graph that we want to see, so that we go beyond one single digital archive; we have plenty of them. And we are looking to step from simple textual overlaps to mapping ideas, so that we are able to classify and understand what the text reuses are, for example what Glenn today called intellectually relevant text reuses, which are more interesting than the Bible merely being quoted many times.
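One reason the ESTC frame matters here is deduplication: a reuse hit that merely connects two editions of the same work is not evidence of borrowing. A minimal, hypothetical sketch of collapsing reuse pairs to the work level; the identifiers and mapping are invented, and the real pipeline is of course far more involved.

```python
import pandas as pd

# Hypothetical edition-to-work mapping derived from the harmonised catalogue.
edition_to_work = {
    "ecco_0001": "fable_of_the_bees", "ecco_0002": "fable_of_the_bees",
    "ecco_0101": "some_sermon", "ecco_0202": "a_newspaper_issue",
}

# Hypothetical text-reuse hits between documents (e.g. pairs from BLAST output).
reuse_pairs = pd.DataFrame({
    "source_doc": ["ecco_0001", "ecco_0001", "ecco_0101"],
    "target_doc": ["ecco_0002", "ecco_0101", "ecco_0202"],
})

reuse_pairs["source_work"] = reuse_pairs["source_doc"].map(edition_to_work)
reuse_pairs["target_work"] = reuse_pairs["target_doc"].map(edition_to_work)

# Keep only hits that connect two different works.
cross_work = reuse_pairs[reuse_pairs["source_work"] != reuse_pairs["target_work"]]
print(cross_work[["source_work", "target_work"]])
```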
For us, just to go back in time again, 2019 is still relevant; the ideas we had then, these basic questions, are what we started from when looking at text reuse. If we take Mandeville, what kinds of patterns of borrowing did he have, and what kinds of habits of reusing his own works did he have? With text reuse we often tend to think we should get rid of the author's own works because they are not interesting, but in some cases they are very interesting. And then, of course, who quoted Mandeville and why? If we are a little bit clever with simple data like this, we can do quite interesting things. It is not always about running for the next biggest model, but about using what we have, and that is where humanities people could be much more effective and thoughtful in many senses.

This is again an image from 2017, but I really like it, because a lot of things came out of it: what we called the anatomy of the impact of Mandeville's Fable of the Bees in the eighteenth century. What we have here is the beginning of the book, and here the end of the book; these are counts of how many times different chapters are reused, and the colour tells you when the reuse is happening. What we notice is that the beginning of the book, the part where 'private vices, public benefits' is stated and where Mandeville makes a lot of noise, is used in the 1720s and 30s, with people complaining quite a bit about it, but it rather vanishes after that. Then there is an interesting shift: the chapter where he discusses luxury and pride is quoted many more times in the 1760s. And of course we can also see what is not quoted at all. Mandeville was a vegetarian, and he wrote a chapter on why man's craving for flesh as food is unnatural; nobody quotes it in the eighteenth century. So we can do this kind of anatomy of a book. It is, of course, a visual way of looking at how different parts of a book are reused, but it is very useful, especially because it is scalable: you can do it for basically any eighteenth-century book, because we have the ECCO collection.
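The 'anatomy of a book' view is essentially a two-dimensional count: for every reuse hit, where in the source book the quoted span sits and in which decade the quoting document appeared. A toy sketch with invented hits; the bin size, offsets and years are arbitrary.

```python
import pandas as pd

BOOK_LENGTH = 300_000  # characters in the source edition (invented)

# Invented reuse hits: where in the source book the quoted span starts,
# and when the quoting document was published.
hits = pd.DataFrame({
    "source_offset": [1_200, 2_500, 3_100, 150_000, 152_300, 149_800, 280_000],
    "reuse_year":    [1723, 1725, 1732, 1761, 1764, 1768, 1790],
})

hits["book_section"] = pd.cut(hits["source_offset"],
                              bins=range(0, BOOK_LENGTH + 1, 50_000))
hits["decade"] = (hits["reuse_year"] // 10) * 10

anatomy = hits.pivot_table(index="book_section", columns="decade",
                           values="reuse_year", aggfunc="size", fill_value=0,
                           observed=False)
print(anatomy)  # rows: position in the book, columns: decade of reuse
```

Plotted as a heatmap, a table like this gives exactly the kind of figure described above: early chapters lighting up in the 1720s, a different chapter lighting up in the 1760s.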
This kind of shift is, to me, extremely interesting, as is the question of why quantity sometimes matters and how we can see a shift of this sort. Based mainly on this idea, we developed the Reception Reader. If you go to www.receptionreader.com, you can use the interface; we are very grateful to Gale for letting us publish it openly, so anybody can use it without a subscription. It works like this: here is the beginning of the book and here is the end of the book, there is a timeline, and all of these dots are reuse cases. You hover over one, and because it is interactive you see which book we are talking about; then you click on the dot and it opens the pages, with the original on the left and, on the right, the book that is quoting that part of Mandeville, with the passage highlighted. It is very easy and intuitive, and meant for intellectual historians who don't necessarily want to do anything statistical. This is what people used to do by hand, checking whether they could find the same passages, and here is an interface that does it very easily. David Rosson from our group is one of the front-end people, so to say, who put this together very quickly after he joined us.

Now I want to show a couple of things you could do with text reuse that you maybe wouldn't think of. It is a very ugly graph, but what you have in this matrix is all of Mandeville's own books and the textual overlaps between them. One thing Mandeville is very well known for is recycling passages from his own works, and he cannot help quoting Bayle, or rather not quoting but simply taking material from Bayle. This happens in almost all of his works except one, the Publick Stews. And if you look at the reasons why the Publick Stews is attributed to Mandeville, it becomes quite questionable whether he is actually the author. There is an article in Digital Enlightenment Studies that we wrote about Mandeville, where we talk about this. We are not saying this is conclusive evidence, or that we have done everything needed to show it, but it is evidence that should at least make us think about the attribution.
The way the attribution works is that people frequently quote it: if they want to talk about prostitution, they quote it and say that the Publick Stews is by Mandeville. But the reason it is 'by Mandeville' is simply that everybody thought it was by Mandeville, so the evidence is super slim. And if you look at the publisher information about it, there are further reasons you might want to doubt it.

One more thing. The way we did the edition-level and collection-level work was based on titles from the ESTC first, but once we had the text reuse information, we computed a coverage metric from the ECCO and TCP texts, and it is a directed one. Now we are talking about editions of the same work, or about what might be a collection containing something else, and the question of how much overlap there is between editions. We are involved in Hume's History of England for the Clarendon edition, so we need this kind of overlap question for many different reasons. What you get is a possibility based on the text reuse: if we have a work with edition one and edition two, and ninety per cent of the material is the same going one way and seventy-five per cent going the other way, you can make different kinds of assumptions from there, but they are still probably the same edition. But when ninety per cent of the text of one goes into the other, while only thirty per cent is covered in the opposite direction, there is a good reason to start thinking that we may be talking about a collection level instead of a work level. For many different reasons this is extremely important, and it is one good way of thinking about how to use this type of text reuse material, which might seem trivial to someone, but most definitely is not.

So, beyond just string matching: the newspapers as an intellectual platform are something I would really encourage people to think about, and also this way of using text reuse for critical editions. We are building a scholarly edition tool, which we talked about today, where this comes into use, and we want to make it available to others as well, so that Enlightenment studies can move forward in this sense too.
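The directed coverage metric described above can be stated very simply: of all the characters in edition A, what fraction is covered by reuse spans shared with edition B, and vice versa; a strong asymmetry suggests a work sitting inside a larger collection rather than two copies of the same edition. A small illustration with invented spans and lengths, not taken from any real edition pair.

```python
def covered_fraction(spans, doc_length):
    """Fraction of a document's characters covered by a list of (start, end) spans."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, min(end, doc_length)))
    return len(covered) / doc_length

# Invented reuse spans between edition A (a single work) and edition B
# (a collected edition that contains A plus other material).
spans_in_a = [(0, 900), (1_000, 1_950)]            # almost all of A is matched
spans_in_b = [(3_000, 3_900), (5_000, 5_950)]      # only a slice of B is matched

len_a, len_b = 2_000, 20_000
print(f"A covered by B: {covered_fraction(spans_in_a, len_a):.0%}")  # roughly 90%
print(f"B covered by A: {covered_fraction(spans_in_b, len_b):.0%}")  # roughly 9%
```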
462 00:51:11,220 --> 00:51:16,470 Okay, AI for Enlightenment studies. This is my last part. 463 00:51:16,550 --> 00:51:25,470 This now, for me, is quite an important slide. 464 00:51:25,620 --> 00:51:34,440 We're trying to conceptualise the relationship between text reuse and something we call semantic similarity. 465 00:51:35,460 --> 00:51:40,080 And this has much to do with what we are able to do with 466 00:51:41,550 --> 00:51:45,780 embeddings. So also, if you ask, 467 00:51:45,810 --> 00:51:48,870 okay, what can we do with AI and large language models, 468 00:51:49,260 --> 00:51:59,820 this has much to do with it. So if we think about parallel texts, and what is overlapping between them or 469 00:52:01,030 --> 00:52:06,519 what connects them in a way, one way is to think about lexical text reuse. 470 00:52:06,520 --> 00:52:10,180 That includes artefacts: 471 00:52:10,390 --> 00:52:18,360 if you think about newspapers and books, there are a lot of advertisements for books, 472 00:52:18,370 --> 00:52:24,310 so basically imprints, which are among the most repeated elements. 473 00:52:24,730 --> 00:52:28,030 There are also OCR errors; you could study those if you'd like. 474 00:52:28,540 --> 00:52:33,940 Then the reprints; that's an interesting question, for us at least. 475 00:52:34,480 --> 00:52:38,650 You have quotations, near verbatim and verbatim, 476 00:52:38,650 --> 00:52:44,650 and then you also have extensive reuses and secondary quotes and those kinds of things. 477 00:52:45,280 --> 00:52:53,200 But then this continuum here, when you are thinking about semantic similarity, is extremely important. 478 00:52:54,040 --> 00:52:58,540 Allusions: you could study them, though we are not doing that at the moment. 479 00:52:59,110 --> 00:53:08,020 You can think about paraphrasing, and also, what is really important for us, translations: full, partial, combined and so forth. 480 00:53:08,530 --> 00:53:11,350 And then also something that we call meaning matching. 481 00:53:11,620 --> 00:53:21,040 And this is really, really interesting, but at the same time maybe difficult, in the sense that: 482 00:53:21,040 --> 00:53:26,919 when is there a meaning match, and when is there intentionality, with 483 00:53:26,920 --> 00:53:31,570 someone commenting, and when are we just talking about topical similarity? 484 00:53:32,350 --> 00:53:40,150 But this is something that is for us, the intellectual historians and 485 00:53:40,360 --> 00:53:48,700 humanists, to sort out. The methods are there, but we need to be able to define what we really want to do. 486 00:53:50,120 --> 00:53:57,139 So this is our aim: to make sense 487 00:53:57,140 --> 00:54:03,620 of what we are talking about when we are talking about text reuse, and when we are talking about semantic similarity and meaning. 488 00:54:05,320 --> 00:54:10,870 So that is what we'd like to do, and there might also be other ways to think about it, 489 00:54:10,870 --> 00:54:16,180 this kind of taxonomy of meaning, or however you want to frame it. 490 00:54:16,480 --> 00:54:21,850 But the very important point here is that when we are thinking about meaning matches, 491 00:54:22,450 --> 00:54:30,400 the multilingual language models, the transformer models, are able to do it across languages. 492 00:54:30,910 --> 00:54:33,370 So the language barrier is not a problem anymore.
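[Editorial sketch, not part of the spoken lecture: a minimal illustration of cross-lingual meaning matching with an off-the-shelf multilingual sentence-embedding model. The model name and the example sentences are illustrative assumptions, not necessarily what the project uses.]

from sentence_transformers import SentenceTransformer, util

# A multilingual sentence-embedding model: translations end up close together
# in the vector space even with zero lexical overlap.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english = "Man is born free, and everywhere he is in chains."
french = "L'homme est né libre, et partout il est dans les fers."
unrelated = "The price of corn rose sharply in the autumn of 1788."

emb = model.encode([english, french, unrelated], convert_to_tensor=True)

# Cosine similarity across languages: the translation pair should score far
# higher than the unrelated sentence, which is what makes meaning matching
# across the language barrier possible.
print("EN vs FR translation:", util.cos_sim(emb[0], emb[1]).item())
print("EN vs unrelated:", util.cos_sim(emb[0], emb[2]).item())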
493 00:54:34,330 --> 00:54:40,120 And this is what is probably going to change quite a bit of research as we go towards the future. 494 00:54:41,730 --> 00:54:48,480 Here's an example of lexical overlap: a quote where there might be this kind of OCR noise, as you see here. 495 00:54:49,170 --> 00:54:52,730 And here is an example of semantic similarity. 496 00:54:52,740 --> 00:55:00,690 We have a small interface where you can type in something about, say, women's human rights and being equal to men, 497 00:55:00,990 --> 00:55:10,200 and then you get hits based on what it classifies as closest to that. That's an example of 498 00:55:11,160 --> 00:55:22,110 how that works. Now, maybe the most interesting thing from where we are now concerns translation mining. 499 00:55:22,800 --> 00:55:33,450 So we created embeddings: we turned the 200,000 books from ECCO into embeddings. 500 00:55:34,290 --> 00:55:37,530 Every single book is chunked, 501 00:55:37,650 --> 00:55:41,610 so that each chunk has its own 502 00:55:43,360 --> 00:55:50,350 vector. And then we turned the 300,000 books that we get from Gallica into similar ones. 503 00:55:50,980 --> 00:55:55,090 And then they are mapped against each other, and there are different kinds of filtering steps. 504 00:55:55,420 --> 00:56:01,660 Maybe I won't talk about those now, but what we get is this type of view 505 00:56:02,590 --> 00:56:09,250 as an output, where you have the English book here, the beginning and the end, 506 00:56:10,030 --> 00:56:13,570 and here's the French book, beginning and end. 507 00:56:13,990 --> 00:56:18,280 Every single book is compared to every other, and then there is different filtering of 508 00:56:18,280 --> 00:56:26,979 what is relevant and what is not. And there are these diagonals that we are looking for, to recognise 509 00:56:26,980 --> 00:56:30,040 what is a translation. Here's a very interesting case: 510 00:56:30,600 --> 00:56:34,390 we have a book that has a British introduction that is not a translation. 511 00:56:34,900 --> 00:56:38,260 Then you have a part that is translated, here in red. 512 00:56:38,590 --> 00:56:42,370 Then you have some original text that the British side has added here. 513 00:56:42,830 --> 00:56:48,729 Then you have another translated part, and then another one from a third book; 514 00:56:48,730 --> 00:56:51,430 there's more translation here at the end. 515 00:56:52,270 --> 00:57:00,280 So you can do this, and we have done this, for every single book in these collections. 516 00:57:00,820 --> 00:57:03,040 And if you think about it, there are 517 00:57:04,940 --> 00:57:13,100 cases that you could start developing from there: different kinds of intentionality in translation, and also the idea that 518 00:57:13,130 --> 00:57:20,930 we are not interested only in full translations but in different ways of doing translation. 519 00:57:21,350 --> 00:57:23,000 That's quite interesting.
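[Editorial sketch, not part of the spoken lecture: a minimal version of the chunk-and-align idea just described: chunk two books, embed the chunks with a multilingual model, and look for runs of strong matches that advance in parallel in both books, i.e. the diagonals. The file names, chunk size and threshold are illustrative assumptions; the actual ECCO and Gallica pipeline involves further filtering steps.]

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def chunk(text, size=200):
    # Split a book into fixed-size word chunks, each of which gets its own vector.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Hypothetical input files: one English and one French book.
en_chunks = chunk(open("english_book.txt", encoding="utf-8").read())
fr_chunks = chunk(open("french_book.txt", encoding="utf-8").read())

# Chunk-by-chunk cosine similarity matrix (English rows, French columns).
sim = util.cos_sim(model.encode(en_chunks, convert_to_tensor=True),
                   model.encode(fr_chunks, convert_to_tensor=True))

# For each English chunk, take its best French match above a threshold; a
# translated section shows up as consecutive matches whose positions advance
# in parallel in both books, i.e. a diagonal in the matrix.
threshold = 0.6
matches = [(i, int(sim[i].argmax())) for i in range(len(en_chunks))
           if float(sim[i].max()) > threshold]
diagonal_steps = sum(1 for (i, j), (k, l) in zip(matches, matches[1:])
                     if k == i + 1 and l == j + 1)
print(f"{len(matches)} matched chunks, {diagonal_steps} consecutive diagonal steps")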
520 00:57:23,540 --> 00:57:32,440 And of course, look here: because there are these green ones that go too low, our partner Filip Ginter from Turku 521 00:57:32,750 --> 00:57:37,040 is rather annoyed, because he wanted all of them nicely in green here at the top. 522 00:57:37,370 --> 00:57:40,160 That way we could have said that it works perfectly. 523 00:57:40,580 --> 00:57:47,360 But these different kinds of evaluation methods are just as important as the 524 00:57:47,840 --> 00:57:52,190 development of the methods that take us in this direction. 525 00:57:52,850 --> 00:57:56,030 And then also, of course, there's a lot of manual annotation, 526 00:57:56,240 --> 00:58:00,469 just to check what actually is a translation. 527 00:58:00,470 --> 00:58:07,460 A commentary on the Bible, for example, obviously isn't a translation most of the time, 528 00:58:07,970 --> 00:58:12,050 but just something on the same topic. 529 00:58:12,440 --> 00:58:16,850 These kinds of things you need to sort out. It's a probabilistic approach, 530 00:58:17,180 --> 00:58:25,220 so we're not claiming it is 100%. But if you see a clear semantic and intellectual similarity here, you check whether it's a translation. 531 00:58:26,220 --> 00:58:32,910 And then, what is also important for us is developing a kind of taxonomy of translations, 532 00:58:33,510 --> 00:58:40,590 going a little in the Linnaean direction. We are not saying that these categories are set in stone, 533 00:58:40,950 --> 00:58:47,400 but we want to organise knowledge and think about the world in a way where we are 534 00:58:48,180 --> 00:58:56,489 brave enough to say that there are some kinds of taxonomies that organise knowledge in this humanities world, 535 00:58:56,490 --> 00:59:02,550 even when this class and that class are individual and can be studied, 536 00:59:03,210 --> 00:59:07,590 you know, as objects of their own, while there may still be some kind of similarity between objects. 537 00:59:08,070 --> 00:59:11,640 So that's also something that we've been developing. 538 00:59:13,080 --> 00:59:16,090 And then we get to the meaning of meaning. So, Rousseau: 539 00:59:17,270 --> 00:59:21,860 'Man is born free, and everywhere he is in chains', as it's translated into English. 540 00:59:22,400 --> 00:59:29,390 What is interesting is the reinterpretation, use and misuse in different contexts. 541 00:59:30,310 --> 00:59:33,540 On the plus side, we did this together with Ana. 542 00:59:34,300 --> 00:59:38,410 So, really, the interest is there: 543 00:59:39,280 --> 00:59:43,959 I don't want to say that everything is reuse, or that meaning is reuse, 544 00:59:43,960 --> 00:59:47,140 but oftentimes the meaning comes from that. 545 00:59:47,800 --> 00:59:54,370 So first it's taken up as a moderate criticism, a criticism towards Mandeville: 546 00:59:54,370 --> 01:00:02,730 the vices that commercial society will lead us to. In the French Revolution it's a suggestion to take arms against monarchy. 547 01:00:02,740 --> 01:00:06,160 Then it's a claim against slavery. Some people said that, 548 01:00:06,160 --> 01:00:11,230 well, clearly you can use it in a pro-slavery way as well. 549 01:00:11,350 --> 01:00:18,400 And then, of course, there is him being criticised because the principle doesn't apply to women. 550 01:00:19,150 --> 01:00:24,550 So what can we do algorithmically? We start separating out engagement. 551 01:00:24,610 --> 01:00:29,230 Now here the meaning comes through the engagement.
552 01:00:30,280 --> 01:00:40,659 So we see that, okay, here it is not text reuse, but similar meaning, 553 01:00:40,660 --> 01:00:50,410 and there's an engagement with Rousseau, and then there's a context in which it's understood, and those contexts are shifting. 554 01:00:51,130 --> 01:00:55,450 So, a little bit back to the question where we started: what is intellectual history? 555 01:00:56,320 --> 01:01:00,610 But why aren't we, in intellectual history, 556 01:01:00,620 --> 01:01:04,730 interested in theory anymore? I mean, we really should be. 557 01:01:05,360 --> 01:01:13,970 And there are such good opportunities here to start thinking about this, also for the people who don't want to be computational. 558 01:01:14,810 --> 01:01:20,840 And it's a silly thing to think that everybody needs to be computational; 559 01:01:20,840 --> 01:01:26,180 of course not. But these tools and opportunities need to 560 01:01:27,360 --> 01:01:35,610 be embedded in what we are doing, because otherwise we are just looking at Mandeville and Hume, and, you know, 561 01:01:35,610 --> 01:01:38,910 we can still do that, but let's move forward. 562 01:01:42,290 --> 01:01:45,550 Okay, I'll skip that. All right. 563 01:01:45,560 --> 01:01:48,710 So let me just state some 564 01:01:50,390 --> 01:02:00,500 conclusions. If I'm thinking about the use of computation in the historical humanities, 565 01:02:01,400 --> 01:02:06,889 it was for a long, long time, and still is, summed up as a mixed-methods enterprise. 566 01:02:06,890 --> 01:02:16,320 You use algorithms to spot patterns, and then you go to classic close reading to make the interpretation. 567 01:02:16,500 --> 01:02:19,530 That's quite often what people have been doing. 568 01:02:20,480 --> 01:02:26,870 But I don't think that the next step forward has to be another algorithmic novelty. 569 01:02:27,290 --> 01:02:30,410 I think with this kind of stuff we have quite a bit already. 570 01:02:31,590 --> 01:02:37,650 The tools are here, but it is the theoretical growth inside the field that will take us forward. 571 01:02:38,190 --> 01:02:46,440 So instead of chasing the next quick fix, a genuinely scientific mindset, 572 01:02:47,520 --> 01:02:51,780 where we are really blending domain knowledge and rigorous data analysis, 573 01:02:52,320 --> 01:02:59,250 and where the gains from this kind of large-scale infrastructure would be much higher than what we have here, for example. 574 01:03:00,090 --> 01:03:08,160 Whereas the ad hoc fixes, remember the kind of one-off topic models that people wanted to do, and maybe still do, 575 01:03:08,550 --> 01:03:17,010 they postpone the bigger ambition. So, really sustained collaboration between historians and 576 01:03:17,020 --> 01:03:21,220 data science and so forth; I think a lot of people are pushing towards that, 577 01:03:22,030 --> 01:03:28,210 so that the AI comes to serve us, and we are not just some kind of mascots in other people's projects. 578 01:03:30,670 --> 01:03:38,500 So also, maybe this kind of computational hermeneutics should push the machine towards, 579 01:03:40,790 --> 01:03:43,790 or beyond, the preliminary pattern hunting. 580 01:03:43,850 --> 01:03:47,540 Or maybe modelling, modelling history: 581 01:03:47,570 --> 01:03:52,490 what would this mean?
And what do we want to do? This should 582 01:03:53,640 --> 01:03:58,050 lead us to reshape the interpretive act as well. 583 01:03:59,990 --> 01:04:06,070 So interpretation is no longer something where the computer is completely put aside and then, 584 01:04:06,080 --> 01:04:12,620 okay, I'll think about it and make my own biased view of things. 585 01:04:13,610 --> 01:04:18,230 Instead, we go forward iteratively and in concert. 586 01:04:19,370 --> 01:04:23,689 And I really don't think that this, the use of AI, will 587 01:04:23,690 --> 01:04:29,660 eclipse anything or take anything away from us in a more substantial sense. 588 01:04:30,750 --> 01:04:42,520 It will let us sharpen our understanding of historical change while still relying on our interpretive judgement, on a 589 01:04:42,530 --> 01:04:49,440 much bigger, grander scale, as part of this kind of integrated interdisciplinarity. 590 01:04:50,720 --> 01:04:54,170 And of course we will change as we move along. 591 01:04:54,170 --> 01:04:59,060 But that's how the world works. I've got to stop here.