Generated Shownotes
Chapters
0:00:07 Chit Chat Across the Pond Episode 783
0:02:59 The power of AI in creating chapter markers
0:04:24 Introduction and Recap of Previous Episode
0:07:11 Splitting the Show and Advanced Searching
0:11:43 Types Must Match for Proper Operations
0:13:09 String Containment: Checking if one string contains another
0:19:01 Array Containment: Checking if an array contains another array
0:22:50 Array Containment: Multiple elements must be contained for a match
0:25:18 Examining the Structure of the JSON File
0:27:36 Syntax of contains function and dictionary argument explained
0:31:00 Failure when looking for missing value in input
0:33:31 Explanation of inside function as inverse version of contains function
0:36:16 Introduction to regular expression usage with test function in JQ
0:38:26 Avoiding Backslashes and Double Escaping
0:41:39 Bart teaches regular expressions again.
0:47:29 Using SELECT statement with dictionaries as input
0:50:11 Prettier output using the alternate operator for null values
0:51:24 Unexpected Results and Copy-Pasting Mishap
0:52:22 Multiple Names Confusion
0:55:45 Vegas Memories and Craving for a Proper Old-Fashioned
Long Summary
In this episode of Chit Chat Across the Pond, Alison Sheridan announces a significant enhancement to the show’s material. They are now using a service called Auphonic that produces AI-generated transcripts for their episodes. These transcripts are unedited and can be found in the show notes on podfeet.com. Additionally, Auphonic has added auto-generated chapters to the podcast file, allowing listeners to jump to specific topics covered in the transcript. Bart Busschots, the guest, highlights the power of AI in creating these chapter markers, which work as one-sentence summaries of the text that follows. Alison expresses excitement about this feature and emphasizes that it is provided at no extra cost. Bart explains that this use of AI showcases the capabilities of modern machine learning technology.
Alison and Bart are halfway through Programming by Stealth episode 158, and Bart provides a summary for listeners to catch up on the previous episode. They mention the Christmas break between recordings and share jokes about enjoying some alcoholic beverages. They are finishing up their exploration of JQ's ability to query data, focusing on more generic criteria and how JQ handles typing.
They have learned how to convert and filter data types using functions, as well as discovered the alternate operator for finding replacements if certain data doesn't exist. They refer to data as "dirty" or "messy" but acknowledge that some people may not like that terminology. They chose to split the show after discussing more advanced searching. The select function acts as a filter or screen, passing through only the inputs that meet the criteria. Their goal is to reduce data to the desired output. JQ provides various functions to perform more powerful operations within select, including regular expressions.
Before delving into regular expressions, they explore the concept of containment, which involves checking if an input contains specific values. They discuss two types of containment: determining if the input contains something they are looking for, and determining if what they have is a subset or superset of the allowed values. To handle these checks, they introduce the contains and inside functions.
In the next part of the conversation, they discuss the importance of matching types when using the `contains` function in JQ. They explain that if the argument given to `contains` is a string, the input must also be a string, and different types cannot be compared or operated on. They give examples to illustrate these concepts, including searching for Nobel Prize winners in an array and checking if a number is in a string.
Moving on to the next part of the conversation, they explain the concept of containment in JQ. They clarify that for an element to be considered contained in an array, it has to be an element of the array and not the array itself. They emphasize that JQ requires the elements to have the same type. They demonstrate examples of checking if a string or an array is contained in another array. They also explain how containment works for dictionaries, comparing the values of the keys in the argument dictionary with the values in the input dictionary.
Continuing the discussion on containment, they use a JSON file as input and talk about checking if the menu contains specific dictionaries and arrays. They discover that the rule checks for the existence of keys and containment of values within the input.
They then move on to discussing regular expressions in JQ. They explain how the `test` function works, taking a string as input and a regular expression as an argument. They mention that JQ uses Perl compatible regular expressions (PCRE) and address the need to escape backslashes. They discuss different examples, including matching an IP address and words starting with a specific letter.
In the next part, they discuss filtering data using the `select` function and a regular expression. They encounter an error with null values in the data and explore using the alternate operator to search for names without surnames. They question the decision to use the "first name" field for organizations and highlight the silliness of having no surname.
They encounter unexpected results when copying and pasting a command, realize they don't have a terminal open, and struggle with pronunciations of certain names. They confirm the order of the list and mention that the ideal output format can't be shown yet. They introduce the next topic for episode 159, which is transforming JSON, and provide an extra challenge for listeners.
They discuss creating a regular expression that can learn patterns effectively, making it a pattern matching machine learning system. Upon concluding the episode, Alison expresses their fascination with this concept and looks forward to further conversations with Bart. They encourage listeners to support Bart on let's-talk.ie and provide options for Patreon, PayPal, and referral links. Alison emphasizes that Bart does the majority of the work, and they are there to listen and ask questions. They also encourage listeners to contact them or visit podfeet.com.
Brief Summary
In this episode of Chit Chat Across the Pond, Alison introduces the use of Auphonic for AI-generated transcripts and auto-generated chapters. They continue Programming by Stealth episode 158, exploring JQ's querying capabilities and discussing containment and regular expressions. They encounter errors and provide examples before concluding with a preview of the next topic and encouraging support for Bart.
Tags
Chit Chat Across the Pond, Alison, Auphonic, AI-generated transcripts, auto-generated chapters, Programming by Stealth, JQ's querying capabilities, containment, regular expressions, errors, examples, next topic, support for Bart
Transcript
[0:00] Music.
Chit Chat Across the Pond Episode 783
[0:07] Well, it's that time of the week again. It's time for Chit Chat Across the Pond, and this is episode number 783 for December 30th, 2023.
And I'm your host, Alison Sheridan. This week, our guest is Bart Busschots, back with Programming by Stealth 158B, because I asked too many questions on the first half.
No such thing as too many questions.
Well, it did work out for us, and I definitely needed another week to get started on the second half.
But before we dig in, I want to alert the audience to a significant enhancement to the material we're creating.
I use a service called Auphonic, which does a lot of things with the audio file when we're done recording, including leveling the audio, adding metadata to it, converting it to an MP3, FTPing it to my servers for you to be able to download.
But they recently added AI-generated transcripts. And we've had this for a while with Programming by Stealth.
If you look on the podfeet.com version of the show notes that of course point back over to Bart's show notes, and I know that's kind of confusing, but that's the way we do it.
Anyway, if you look at my version, you'll see a link to an unedited version of the transcript that was created by AI. And I'm going to keep emphasizing unedited because I'm not going to edit it. What you see is what you get.
Anyway, that's been around for a while and you've probably noticed that already.
But recently, George, the guy that created Auphonic, has added auto-generated chapters.
And it's a twofold enhancement to the show.
[1:34] First of all, when you're looking at the transcript, you'll see the chapters that it's created at the top, and you can click on them to jump to the part of the transcript where we cover a specific topic.
Now, if you think about Bart's fabulous tutorial show notes from last time, 158A, he's got the section on telling us about the challenge solutions.
And it's very, very short. But what we said about that was very, very long. We talked a very long time.
So if you go into the transcripts and jump to that part, you'll get everything Bart explained to me and all of my silly questions where he was having to explain things to me.
So there's a lot more content in text that you can go back and reread if you need to, to try to remember how Bart taught me what we learned last week.
The other thing is those same chapter marks are automatically added to the podcast file.
[2:24] So when you look in your podcatcher of choice and you want to say, I just want to listen to Bart explain that piece again, you can jump around and it's not perfect.
It's not maybe where we would put chapter marks, but it's auto-generated and we didn't have to do any work. And again, it is not going to be edited.
We're not going to fix it. But it really, I think, makes it a lot easier.
For example, this week, I wanted to go back and rework the challenges again, and I made it through two of them.
And the way I did it was I went directly to the transcript and I found those parts where Bart explained it and what he did last time. So that really, really helped me.
The power of AI in creating chapter markers
[2:59] I'm excited about it and it didn't cost any extra and you get that for free, yay.
[3:04] Yeah, and can I just say, this is a great example of the power of AI because you have two uses there of the modern machine learning technology.
First off, they have trained the machine to understand English and turn it into text.
So that's a nice bit of machine learning. But then they've taken one of the modern GPT models, and they've used its ability to summarize, to create those chapter markers.
Because they're really good one-sentence summaries of the text that comes after them.
Like, impressively good one-sentence summaries that they use as the headers. Like, that's cool.
Yeah, they almost make it, it's not just the words we said at the beginning or anything like that.
It does sound better than that. By the way, there's also a long summary and a short summary.
Those are a little weird, like of the entire episode.
The short summary is not bad. The long summary gets kind of weird, but you have to get past that. But if you use the chapter marks, you jump right past it.
So anyway, cool stuff. I thought that was really nifty.
That is very cool. And yeah, it's just as a computer scientist, I'm always fascinated to see these new technologies do cool new things.
And, you know, this is AI doing cool stuff, which is nice. Because I was promised as an undergraduate that AI was 30 years away and always would be.
And I think I was an undergraduate 30 years ago. So I think they were right.
Introduction and Recap of Previous Episode
[4:24] Anyway, so this is a slightly difficult one to start because we're halfway through PBS 158. This is part B.
So if you're listening back to back, I'm now going to give you a summary of what you heard two minutes ago.
But I should probably give us the summary anyway, because otherwise, if you're not listening back to back, you're going to say, what is he talking about?
Especially since I'm not listening back to back. Yeah, we've had, for the people listening, we've had Christmas between the last two times we recorded.
I don't know about you, Alison, but there was a substantial amount of really quite nice red wine involved for me. Some really good liqueur, actually.
Amaretto is my favorite liqueur. There might have been a few gin and tonics under that bridge.
Ah, okay. I don't do the G&T thing, but a coffee with a little bit of actual amaretto.
Anyway, yes. So like I say, we've been distracted a bit.
So we are now finishing up our part where we look at JQ's ability to query data.
And we did a lot of it in installment 157, where we met functions as a concept, and then where we met the select function for filtering down our data with a nice clear Boolean, you know, yes, no, kind of a double equals, less than or equal to, very, very simplistic criteria.
And in this installment we are expanding that out a bit to more generic criteria and to do that we first had to learn how JQ does typing, which is like JSON, which is nice of it.
[5:52] Then we learned how to convert stuff between types and then we learned how to filter stuff by type because there's basically functions to give us only the booleans and so forth.
And then we learned about the wonderful alternate operator, which allows you to say: if this thing doesn't exist, go get that instead. Because, as we have discovered ourselves a few times, by accident frankly, a lot of JSON data is quite, I would call it dirty data, although I get frowned upon for using that phrase in work. Apparently people are very precious about their data. They don't like it when I call it dirty.
But it's my data. Yeah, it's so dirty. How about messy?
Oh, it's not that they are misinterpreting the word. They're saying, my data is not dirty.
Yeah, they don't like it anyway. Inconsistent.
Inconsistent, exactly. And so it's actually a very useful operator because basically when there are two possibilities, you can just put one double slash the other.
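As a rough sketch of the "one double slash the other" idea: the record and its key names below are hypothetical, chosen only for illustration, and are not taken from the data used in the show.

```shell
# Hypothetical record with no "surname" key: .surname evaluates to null,
# so the alternate operator falls through to .firstname.
echo '{"firstname": "Marie"}' | jq '.surname // .firstname'
# prints "Marie"

# When the surname is present, the left-hand side wins.
echo '{"firstname": "Marie", "surname": "Curie"}' | jq '.surname // .firstname'
# prints "Curie"
```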
Now, the fact that they use the double slash symbol, which looks like a comment to my brain.
Let's just leave that aside.
And then, why am I not... Oh, my show notes haven't pulled.
So I don't see the chapter marker Alison very kindly put to say this is where we chose to split the show.
But I think that's the point we chose to split the show.
Yeah, I think so.
Splitting the Show and Advanced Searching
[7:11] I was hoping you were going to tell me where we were.
Yeah, we probably should have looked this up ahead of time. Yeah, at more advanced searching.
I'm almost certain. No, in fact, I am certain we drew the line at more advanced searching. Yes.
Because that was a really logical place to stop. Because it is.
So we have used the select function to apply our very specific Boolean yes, no criteria for filtering our inputs.
Right. So the select function, as a reminder, it processes whatever was piped into it and it applies some logic.
And if the logic results in true, the entire thing piped into it comes through utterly unchanged.
And if it evaluates to false, the entire thing vanishes.
[7:56] So by entire thing, you mean whatever you've selected: each time it finds one where the test is true, it squirts through.
Not that it stops the whole thing if one is false.
Right, but the way it's working is in parallel, right?
So you need to mentally imagine that it's doing the same thing over and over and over and over again.
So for each one time it does its magic, it is a yes-no.
It's a gate. Either you come, you pass, or you shall not.
[8:23] You said the whole thing, but it's each one, its own whole thing, not all of them.
If there's 10 coming in in parallel, then each one gets evaluated on its own. If it's true, it squirts through. If it's false, it gets thrown away.
Exactly. And the effect is that it behaves like a filter or a screen, I guess, where the big rocks get screened out and the little rocks go through.
In this case, the true rocks get through and everything else gets filtered out.
And so you will end up with, at most, the same amount of outputs as inputs, but usually fewer outputs than inputs because you're trying to reduce your data down to the bit you're interested in, right?
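A minimal sketch of select acting as a gate on each input in turn. The numbers are made up for illustration; the only point is that each value is tested on its own and either passes through unchanged or disappears.

```shell
# Explode a hypothetical array into four separate inputs, then keep only
# the ones for which the test is true.
echo '[3, 8, 12, 5]' | jq '.[] | select(. >= 8)'
# prints 8 and then 12, each on its own line; 3 and 5 are dropped
```

Four values go in and two come out, so there are never more outputs than inputs.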
And so select does that based on Boolean logic, which isn't a bad way to do things.
But the select function may want to do more than just double equals, or less than or equal to, or greater than or equal to, which are the things we learned about last time. What if it wants to apply more advanced concepts, the kind we would want to say in English?
And so JQ provides us a whole bunch of functions that we can use inside our select to do more powerful things than just is the surname equal to Gez.
[9:30] Okay. And so that's really where I want to go today.
And the culmination of the more advanced is obviously regular expressions, because what is more powerful for searching texts than regular expressions?
But there is some fun stuff between where we are right now and regular expressions, which is where we're going to finish the show today. The first one is very common: it's the concept of, does my input contain something? Containment is actually something you very often care about. I mean, data validation is an amazing thing, right? So there might be a set of valid values for something, and you're getting some JSON in, and you're basically asking, is this one value one of the set of allowed values I have over here?
So that's a containment question.
[10:22] So does it contain one of these things that I'm looking for?
Or its inverse. Is the set of allowed things a superset of what I was handed?
Or sometimes you want it one way, sometimes you want it the other way.
They're the two types of containment, which is why there's two functions.
They're called contains and inside, and they make my brain hurt.
So I figure lots of examples are the way to go. But one is literally the opposite of the other.
So the question is, is the thing I'm looking for inside what I have, or is what I have, is it a subset or a superset? Oh, right.
Examples. Let's get to examples before I tie myself into knots before I've even started.
So we're going to start with the more, well, the more commonly used of the two, which is contains.
And when you're talking in English, you say that a lot. The contains function is very powerful, and it works by taking whatever the current thing being processed is as its input, and the argument is what we're looking to match against.
Types Must Match for Proper Operations
[11:43] It's very important that the types match.
So if you're handing the contains function a string, the argument has to be a string as well.
That makes sense. Yeah. Because you can't really do any kind of operations if they're two different kinds of things. Right.
Different types. Yeah, it doesn't make sense any other way, exactly.
And it will throw an error if it's not happy.
Is the number seven in a string? No. Yeah.
Precisely. So you can't even ask it.
Well, you can put the seven as a... No, no, if you make the seven a string, it will happily check if it's inside a string.
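A small sketch of the seven-in-a-string question. The input string here is invented for illustration. With a number as the argument and a string as the input, jq reports an error rather than answering; with the seven written as a string, it answers normally.

```shell
# Number argument, string input: jq refuses to compare the two types.
echo '"7 waffles"' | jq 'contains(7)' || echo "jq refused: the types differ"

# Make the seven a string and the question is valid.
echo '"7 waffles"' | jq 'contains("7")'
# prints true
```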
No, but I mean, you would never run into a situation where you wanted to check, is the number seven in this thing, and you don't know that it's a string.
You shouldn't. Your data shouldn't surprise you like that. When you're getting to the stage of processing down to a specific question, you do need to have an understanding of what your data is to ask it a question.
It's like if we didn't know that our Nobel Prize is contained in an array called laureates, we'd be in trouble, right? So we know there's an array called laureates that we're searching.
So, yeah, you shouldn't run into a problem where you don't know what your data is because then you have a much bigger problem. You're not ready to write a query.
Okay. So...
Okay, now I need to make sure I say this carefully and clearly.
String Containment: Checking if one string contains another
[13:09] So we can check if one item contains another with the contains function.
Okay, I didn't say that very clearly there.
Okay, well, maybe it is. So this is a point where JQ is very powerful.
The rules are different depending on what it is you're passing as the arguments.
So we're going to start with the simplest case, which is the one that leapt to your mind.
String containment. If the argument and the input are both strings, it applies the following rule.
When the input being processed and the argument are strings, contains will return true if the input string contains the entire argument string contiguously, otherwise it will return false.
[14:00] I'm going to need an example.
Okay, so let us echo into our jq command, as our input, I love waffles. So we're going to echo I love waffles as a string. Notice that we have single quotes in the terminal sense. So echo and then a pair of single quotes says, dear terminal, I'm going to give you a terminal string. And inside the terminal string we have double quote, I love waffles, double quote, because that is a JSON string. What has to arrive at jq is JSON, not terminal stuff, right? So we pipe that to jq, and our jq filter is contains, open bracket, the string waffles, close bracket. So, if the string waffles is entirely contained within the input, I love waffles, it should return true.
And if you pop that into your terminal, lo and behold, I love waffles contains waffles.
So, that reads perfectly rationally. Hard to explain, but it reads.
I love waffles contains waffles. True.
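Written out, the command being described would look something like this. The exact text in the show notes is not part of the transcript, so treat the spelling here as a reconstruction from the spoken description.

```shell
# Single quotes are for the terminal; the double quotes inside them make
# the input a JSON string by the time it reaches jq.
echo '"I love waffles"' | jq 'contains("waffles")'
# prints true
```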
Yes, exactly. It's a filter. Okay. Now, it has to be completely contained.
So if I say, I do enjoy the odd waffle, and I pipe that to the filter, contains waffles.
[15:21] That is not correct because waffles is not entirely contained within I do enjoy the odd waffle because the S is missing.
[15:30] Well, you got to think of that as you might as well have said peanuts.
It's not the same thing. Waffle and waffles are not the same thing.
Waffles are way better than waffle.
At least twice as good, maybe three times. Right.
And the other thing, so the word contiguous is what we computer scientists like, but I don't know if it's sensible to humans.
It means all in one piece. It can't be split up. Yeah.
So if I say, did you say pan space cake, and I check if that contains pancake, that will be false because pan space cake is not contiguous pancake.
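The two failing cases can be sketched the same way. The input sentences are reconstructed from what is said aloud, so their exact wording is an assumption.

```shell
# The trailing "s" is missing from the input, so the whole argument string
# is not there.
echo '"I do enjoy the odd waffle"' | jq 'contains("waffles")'
# prints false

# All the letters are present, but the space breaks them up, so the
# argument is not contained contiguously.
echo '"Did you say pan cake"' | jq 'contains("pancake")'
# prints false
```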
Yeah, that makes perfect sense. Okay, good. Why would that match? Yeah, good, it shouldn't and it doesn't.
Right, so this is the foundation for the more complicated stuff. String containment is the foundation here. The next thing you can pass the contains function is an array. Let me read my sentence exactly so I don't mess myself up here.
When the input and the argument are arrays, contains will return true if every element in the argument array is contained in any element of the input array. Otherwise, it will return false.
The order does not matter, and it is not looking for equality.
It is looking for containment.
[17:00] Hmm. So it's recursive, right? So let's work through this with examples.
So we are going to take as our input the array waffles, pancakes, apples.
So we're going to echo the JSON for the array waffles, pancakes, apples to the jq function, our command, which is going to have the filter, contains the array pancakes.
The types have to match. So even though I'm only interested in one thing, pancakes, I have to put it in an array, because that's how contains insists you work.
[17:35] And that is true or false? It's got to be true or false, but let's look at our rule.
So if every element in the argument is contained in any element of the input, so the argument has one element, pancakes.
So does anything on the input contain pancakes?
Well, yes, the second element in the input contains pancakes.
It is, in fact, exactly the same as pancakes.
So the thing I'm going to have trouble with there is the syntax.
So you've got echo, and you've got the square brackets, waffles, comma, pancakes, comma, apples, all in double quotes. Great.
That looks like an array to me. But then it says jq contains, and in roundy brackets, because we always do roundy brackets with this sort of command, and then it has square bracket pancakes inside that.
[18:26] It's not an array. Pancakes isn't an array. Pancakes is an element in the array.
No. Well, okay, but it has to be an array because it has to be the same type.
Bing, bing, bing, bing, bing.
So if you said jq contains and just had, quote, pancakes, unquote, that would be trying to say, is this string in this array?
And while that might make sense to us, that's not what jq insists on.
It has to have the same type. Precisely.
Precisely. It looks screwy, though. It looks a little weird, but we're going to see in the later examples why it is the way it is.
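A sketch of the syntax under discussion, reconstructed from the spoken description. The second command shows what happens if the square brackets are left off the argument: jq reports an error because an array input cannot be checked against a string argument.

```shell
# Array input, array argument: the types match, so jq can answer.
echo '["waffles", "pancakes", "apples"]' | jq 'contains(["pancakes"])'
# prints true

# Array input, bare string argument: jq reports a type error instead.
echo '["waffles", "pancakes", "apples"]' | jq 'contains("pancakes")' \
  || echo "jq refused: array input, string argument"
```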
Array Containment: Checking if an array contains another array
[19:01] So in this case, I'm only interested in one thing.
And so the question is, does any element in the input contain all the elements in the argument? There's only one element in the argument, pancakes.
So yes, we have a match. True.
Now, I made a point of saying contains, not equal to.
So let us use the same input. So the input is waffles, pancakes, apples.
Does it contain the array pancake without the s?
Yes, it does, because there is one element in the input that contains pancake.
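As a sketch, the containment-not-equality point looks like this. No element of the input equals pancake, but one element contains it as a substring, and that is enough.

```shell
# "pancake" is a substring of the element "pancakes".
echo '["waffles", "pancakes", "apples"]' | jq 'contains(["pancake"])'
# prints true
```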
[19:40] So you did the opposite example when you did the string containment.
[19:45] You did, I do enjoy the odd waffle, and then did JQ contains waffles.
Had you done it the other way around, if you had done echo, I do enjoy the odd waffles, and then jq contains the string waffle, that should have returned true.
So that kind of string containment would be true, just like the array containment is in that case.
Yeah, and the array containment is true because the string containment is true. Everything in the argument has to be contained in anything in the input.
Yeah, so that makes sense. I just wanted to make sure that was still making sense in the previous one, because we kind of did the opposite one. Got it.
Okay, so that was with one element in the argument. But the reason you're allowed to have an array is because the contains function will happily check multiples for you.
Oh, wow.
So we can ask whether our input array, which I still haven't changed, it's still waffles, pancakes, apples, does it contain waffles and pancakes? So we pass the argument array waffles, comma, pancakes. So now, every element in the argument has to be in any element of the input.
So waffles is in the first element.
Pancakes is in the second element. So yeah, we have two corrects. Therefore, true.
[21:13] So those didn't have to be in the same order, right? Could it be pancakes, waffles?
Why, look at our next example. Pancakes, waffles. True. I actually didn't read ahead when I thought of that. Okay. There you go.
QED. Yes. So the order is irrelevant.
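The two multi-element checks just described, sketched as commands reconstructed from the spoken description:

```shell
# Both argument elements are found somewhere in the input.
echo '["waffles", "pancakes", "apples"]' | jq 'contains(["waffles", "pancakes"])'
# prints true

# Swapping the order of the argument makes no difference.
echo '["waffles", "pancakes", "apples"]' | jq 'contains(["pancakes", "waffles"])'
# prints true
```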
The question is, is everything in the argument contained somewhere in the input?
Okay.
And the other thing is, it doesn't matter if, just like the order doesn't matter, it also doesn't matter if there's a gap, which is kind of sensible, right?
So if the array is waffles, pancakes, apples, and I asked, does it contain waffles and apples?
Well, yes, it does. The fact that there's pancakes in between doesn't matter.
It still contains waffles and apples. Just, you know, just to be clear.
That makes sense. Right.
So if I then ask it, does waffles, pancakes, apples contain popcorn?
No, it does not contain popcorn. That is not at all surprising.
By the way, you're saying waffles, pancake, apples, and then in your examples you're talking about whether it contains waffles. Just in case anybody's hearing that, he means waffles, pancakes, apples every time.
Yes, I do. Okay. All right, so there's no popcorn in waffles, pancakes, apples. That makes sense. It's not contained in it. And then the other thing is, they all have to be contained. So if we take that same input, waffles, pancakes, apples, and we ask it for popcorn, waffles, the answer is false, because while the waffles are there, the popcorn is not.
[22:42] So it does not contain popcorn waffles. So that is array containment. It is powerful.
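The last three array cases can be sketched together. The shell variable name `menu` is only a convenience introduced here to avoid retyping the input.

```shell
menu='["waffles", "pancakes", "apples"]'

# A gap between the matching elements does not matter.
echo "$menu" | jq 'contains(["waffles", "apples"])'
# prints true

# Nothing in the input contains popcorn.
echo "$menu" | jq 'contains(["popcorn"])'
# prints false

# Every argument element must be found; one miss makes the answer false.
echo "$menu" | jq 'contains(["popcorn", "waffles"])'
# prints false
```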
Array Containment: Multiple elements must be contained for a match
[22:50] That's a lot of if-else statements in JavaScript and a for loop.
Like that's a lot of faffing about in JavaScript, but that's really quite concise in JQ.
Yeah, you called this dense at first, and I wouldn't refer to it as dense.
I think concise is a better way to describe it. It's unlike your regular expressions.
That's dense. That's just everything smashed together with no Englishy words in between.
That's true, yeah. This one, yeah, JQ does contain actual words you recognize, which may or may not be your friend, but it does contain words, you're right.
Whereas regular expressions is just symbols.
Someone used to say it looked like a noisy modem. You know, the character has gotten corrupted.
[23:30] So the last type of containment that needs a deep dive is dictionary containment.
So if the input and the argument are both dictionaries, what does it do?
So I'm going to read this verbatim again so I don't tie myself in knots.
So when the input and the argument are dictionaries, contains will return true if the input dictionary's value for every key in the argument dictionary contains the value in the argument dictionary, and false otherwise.
So there's going to be key value pairs in the argument and we've got to find all of those in the input or we're not happy.
Okay, stop. Yeah, no, because I think, let's see. When the input and argument are dictionaries, contains will return true if the input dictionary's value for every key in the argument...
[24:24] So the argument is going to contain a dictionary, so it's going to have keys. It could be far fewer keys than the input. So the input could have 500 keys and the argument could have two keys.
If both of those keys are in the input and if the values are contained within each other, then we're happy.
[24:44] The values are contained within each other. So if my dictionary has the key A, B, C, and I'm saying that it contains A1, well then, if the input dictionaries...
Let me say that more. Let's do your example. Let's do my example, because I actually did work with that. Let's not try to invent one.
Yeah. So in order to do these examples, and to stop the commands becoming impossible, we're going to use a JSON file as our input instead of echoing some stuff.
We're going to use the same JSON file for all of our examples.
Examining the Structure of the JSON File
[25:18] That JSON file contains one top-level dictionary, which contains three keys.
The keys are breakfast, lunch, and dinner.
And each key has a value that is an array. The breakfast array is bacon, eggs, toast, waffles, and pancakes.
The lunch array is sandwiches, rolls, baps, and wraps.
And the dinner array is pizza, pasta, and burgers, which gives you a slight insight into my weekly food consumption.
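From that description, the file would look roughly like the one below. The name menu.json is the one used in the episode, but the exact layout and spelling of the real file are assumptions, since only the spoken description is available here.

```shell
# Recreate the menu file as described aloud (layout is an assumption).
cat > menu.json <<'EOF'
{
  "breakfast": ["bacon", "eggs", "toast", "waffles", "pancakes"],
  "lunch": ["sandwiches", "rolls", "baps", "wraps"],
  "dinner": ["pizza", "pasta", "burgers"]
}
EOF

# List the top-level keys; jq's keys function returns them sorted.
jq -c 'keys' menu.json
# prints ["breakfast","dinner","lunch"]
```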
And it does actually bring up an interesting question. Do Americans eat baps? Do you know what a bap is?
I've never heard of a bap. Imagine a roll, but you make it a circle.
[25:57] So you take whatever material you like. So you know the way you can have like a roll, a sandwich roll. It could be like a brioche roll, a soft, whereas it could be a crunchy baguette.
Imagine you take the same dough, but instead of making it long, you just make it a circle and put it in the oven.
And then what was a roll becomes a bap because now it's round. That's it.
[26:17] I think we just call that a brioche, like, if it's round.
Oh, see, for us, a brioche roll, a brioche is for us called baps or rolls.
A roll is long and thin and a bap is round. So basically it's the shape.
Huh, okay. We apparently really care. We do it every day. We apparently really care what shape our food comes in. I have no idea why. Why?
Anyway. I thought about that a lot. When you look at, you know, enchiladas and tacos and burritos and things, you know, you start going, wait a minute, that's the same ingredients just kind of mixed around.
And you took one thing out and added it over there.
Well, this is why me as a foreigner has terrible trouble with Mexican food, because it's like, OK, I understand it's corn and you bake it, you fry it, you heat it on some sort of a hot surface without having a raising agent.
And then sometimes you call it a taco and then sometimes you call it a burrito.
Yeah. I get very confused, because I'm pretty sure I call burritos tacos.
I'm almost certain I do. But they're delicious, so who cares?
Anyway, we have our breakfast. Or sorry, we have our dictionary.
So let us look at how contains behaves. So the first thing I'm going to ask is a very simple question.
Does our menu.json contain the dictionary breakfast colon array bacon? Okay.
Syntax of contains function and dictionary argument explained
[27:36] Okay, so describe the syntax. So the syntax, so contains takes as an argument a dictionary.
So I'm giving it a dictionary with the key breakfast, one key, breakfast, and the value is the array bacon.
And so the question is, will that return true or false when I apply it to menu.json, which is our dictionary with breakfast, lunch, and dinner?
And breakfast does contain one element in the array that's bacon, so that should be true.
It should. So the rule says if every key in the argument dictionary, so how many keys are there in the argument dictionary? One key, breakfast.
[28:16] Yes. So then we have to see if whatever the value of breakfast is, is that contained in the value of breakfast that exists in the input?
So we are now looking for an array containment. Does the array bacon, eggs, toast, waffles, pancakes contain the array bacon?
Yes, it does. Therefore, we can finally say true.
I feel like you're saying something that I should really pay attention to, and I can tell it's one of those things that's just slipping right through.
I don't understand why we keep focusing on the key.
Obviously, if there's no key called breakfast, you've already failed.
So why does it have to be, it's like it has to have the key, and that key has to have the value that you can find in the array of the input?
Okay, no, you're not missing anything, but that is a very succinct way of saying what the rule says.
Yeah, okay. So if the key is missing completely, it will indeed fail. So you're already out.
Yeah, you're already there. Now, the rule is containment, right? So we can also say, does our menu.json contain the dictionary breakfast colon bacon, waffle? Well, the answer is still true, because...
[29:33] The breakfast in the menu.json contains bacon exactly as is, but it also contains waffles, but the rule is containment.
So waffle, waffles, oh, we have containment. We still get true.
[29:46] Okay. Okay. Right? Gotcha. So the rules for dictionary containment encompass the rules for array containment, encompass the rules for string containment, right? That's why I did them in this order.
Yeah. Okay. So now let's start to break some stuff. So if one of the keys in our argument is missing, it's just, no, you're not allowed, right?
So the first thing we can say is, sorry, if one of the values in our input is missing.
So if we say breakfast bacon waffle popcorn, there is no popcorn in menu.json.
So we're immediately out.
And the other way it can go wrong is if we are looking for something in the argument that doesn't exist at all in the input. So the final example here is contains the dictionary breakfast colon bacon, comma, dessert colon cake. So the argument now has a key dessert. The input has no such key. Fail.
And that's what you said earlier, right? If the key I'm looking for does not exist, it should just fail.
Yes, it does.
Okay, so the first example you gave was you asked for an element of the array to be popcorn for the key breakfast.
Failure when looking for missing value in input
[31:00] Now, while I personally have had popcorn for breakfast before, it's not on the menu.
[31:07] I'm really getting hungry, and I've got some popcorn waiting for me right now.
Okay, so that's why the first one fails.
And then the second one is you gave it breakfast colon bacon.
Sure, we got that one. Dessert colon cake. And even though you spelled it desert, I'm going to fix that.
Well, it doesn't matter. Either way, it ain't in the input or in the menu.
Yeah, which is a miss on the menu, that's for sure.
[31:30] Okay. For double reasons, yeah. Gotcha. Okay. So that's the case of where the key's just not even there. Fail.
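As a way of pinning down the dictionary, array, and string rules just walked through, here is a minimal Python sketch that models them. It is not jq's own code, and the function names `contains` and `_contained` are made up for this illustration. The `menu` value mirrors the menu.json described above; the jq filters being imitated would presumably look something like `contains({"breakfast": ["bacon"]})`.

```python
def _contained(inp, arg):
    """Model of the containment rules as described in the episode."""
    if type(inp) is not type(arg):
        return False  # nested values of different types simply do not match
    if isinstance(arg, dict):
        # every key in the argument must exist in the input, and the
        # input's value for that key must contain the argument's value
        return all(k in inp and _contained(inp[k], v) for k, v in arg.items())
    if isinstance(arg, list):
        # every element of the argument must be contained in some input element
        return all(any(_contained(i, a) for i in inp) for a in arg)
    if isinstance(arg, str):
        return arg in inp  # substring check
    return inp == arg  # everything else falls back to equality


def contains(inp, arg):
    # At the top level the two types must match, otherwise it is an error
    if type(inp) is not type(arg):
        raise TypeError("input and argument must be the same type")
    return _contained(inp, arg)


menu = {
    "breakfast": ["bacon", "eggs", "toast", "waffles", "pancakes"],
    "lunch": ["sandwiches", "rolls", "baps", "wraps"],
    "dinner": ["pizza", "pasta", "burgers"],
}

print(contains(menu, {"breakfast": ["bacon"]}))                       # True
print(contains(menu, {"breakfast": ["bacon", "waffle"]}))             # True
print(contains(menu, {"breakfast": ["bacon", "waffle", "popcorn"]}))  # False
print(contains(menu, {"breakfast": ["bacon"], "dessert": ["cake"]}))  # False
```

The second call is true only because the string rule is a substring check: "waffle" is found inside "waffles".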
Exactly. And then the final thing we have is default containment.
So we know what to do for strings, we know what to do for arrays, and we know what to do for dictionaries.
But I said if the type is the same, there won't be an error.
For all other types, it defaults to equality.
So if you give it two numbers, it will check if they're equal to each other.
Now, that may cause a little bit of confusion because if you give it the number 420 and ask if it contains the number 42, it will be false because number, number, they're not the same.
If you ask it if the number 42 contains the number 42, you will get true.
If you ask it if the Boolean false contains the Boolean false, you will get true because it falls back to an equality check.
And as perverse as it sounds, this actually works because I checked.
If you ask it if the null value contains the null value, the answer is true, because null does equal to null.
[32:39] Let me ask you a dumb question. So you had echo 420, jq contains 42, it's false.
What if it was echo 342 contains 42?
It'd still be wrong? It'd still be wrong. Yeah, because as soon as it's not a string, an array, or a dictionary, we fall back to equality.
Okay. But you just wrote, you wrote equality check on the does false contain false and null contain null.
You wrote equality check, but they're all equality checks. That is true.
[33:12] Yeah, I guess I was just sort of shorthanding it because in the first one I used numbers twice, but I didn't do a true and I didn't do anything else.
Okay. But I'm just saying, if we change that to, say, 342, it's still false.
Right. But for the equality reason.
Yes, precisely. Precisely. Okay.
Explanation of inside function as inverse version of contains function
[33:31] Now, everything we have learned now, if we swap the input and the argument, that is what inside does.
Instead of the big one being the input and the small one being the argument, the small one is the input and the big one is the argument.
That is literally the difference between inside and contains, is that you apply the same rules, but you swap where the big one and the little one go.
The entire documentation for this function in the official documents is: essentially an inversed version of contains.
That is all I had to go on when writing these show notes. Essentially, an inversed version of contains.
And the only way my brain worked is to just mentally swap the input and the argument, and then it works perfectly.
[34:21] So the subset is the input, and the superset is the argument.
So if we echo the string waffles to JQ, and we give it the filter inside, I like waffles, we will get true because the input is inside the thing in our argument.
[34:42] Waffles is inside I like waffles. It does read well.
Thankfully. So is waffles inside I like waffles? Yes, it is.
Yeah. Okay. And the arrays work the same way as well.
So we can say the array with the string waffles is that inside the array waffles, pancakes, popcorn.
Yes, it is. It also works exactly the same for objects.
So, sorry, dictionaries. Does the dictionary breakfast colon pancakes, is that inside breakfast pancakes muesli?
Yeah, oh, and snacks, popcorn, waffles. The answer is true, because it is indeed inside, right?
The simple object breakfast pancakes is inside the more complicated object breakfast and snacks.
Yada, yada, yada. Whew.
And it falls back to equality, just like the other one. So 42 is inside 42.
False is inside false. And null is inside null. And that makes no sense.
But null is a weird value. Because I brought it up, I put into the show notes my example of 342 contains 42 is false.
I'm going to add it to this one too to say 42 inside 420 would be false, right? It would, yeah, because they're not equal. Yeah.
[36:02] Okay. Right. So that is containment in both of its ways.
So contains and inside are containment, depending on whether you want the big one on the inside or the outside.
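The "swap the input and the argument" idea can be sketched in a few lines of Python. This is a model of the behaviour described, not jq's implementation: `contains` here is a compact stand-in that returns False on a type mismatch (the discussion above implies jq treats that as an error), and the dictionary values are assumed to be arrays, as in the menu example.

```python
def contains(inp, arg):
    """Compact model of containment: does inp contain arg?"""
    if type(inp) is not type(arg):
        return False  # simplification for this sketch
    if isinstance(arg, dict):
        return all(k in inp and contains(inp[k], v) for k, v in arg.items())
    if isinstance(arg, list):
        return all(any(contains(i, a) for i in inp) for a in arg)
    if isinstance(arg, str):
        return arg in inp
    return inp == arg  # numbers, booleans and null fall back to equality


def inside(inp, arg):
    # Same rules, with the small one as the input and the big one as the argument
    return contains(arg, inp)


print(inside("waffles", "I like waffles"))                     # True
print(inside(["waffles"], ["waffles", "pancakes", "popcorn"]))  # True
print(inside({"breakfast": ["pancakes"]},
             {"breakfast": ["pancakes", "muesli"],
              "snacks": ["popcorn", "waffles"]}))               # True
print(inside(42, 42), inside(False, False), inside(None, None))  # True True True
print(inside(42, 420), contains(342, 42))                       # False False
```

The last line shows the equality fallback: numbers are never treated as strings of digits, so 42 is neither inside 420 nor contained in 342.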
And that is very powerful.
Introduction to regular expression usage with test function in JQ
[36:16] But the obvious king of the castle is regular expressions.
And for us, that boils down to the test function in JQ.
So the test function takes as its input a string and its argument is a regular expression.
And if the string passes the regular expression, then the entire string goes through. Sorry, then we get a true.
And if the string fails the regular expression, we get a false.
So it gives us a true or a false.
[36:47] Now, this is where we need to have a little discussion first about how JQ does regular expressions.
And I have good news and I have bad news, and I'm going to give you the bad news first so that I can give you the good news last.
So one of the things I adore about JavaScript is that it has a primitive data type for regular expressions.
If you want to write a regular expression, you put a forward slash, the regular expression, and then a closing forward slash, and that is a regular expression.
Just like a number is digits, and a string is something that starts with a quote and ends with a quote.
So JavaScript can deal with regular expressions as, they call it, a native type.
That is not true for JQ.
It's also not true for PHP and lots of other languages, which means you have to write the regular expression as a string, which is okay a lot of the time.
But inside a string, the backslash character has a meaning. It says, I am an escape character.
Inside a regular expression, the backslash character has a meaning.
So when you need to use a backslash in your regular expression,
[38:01] you need to double backslash everything. Because the string will see the first backslash and go, oh, you're escaping something.
And it will deal with that escape. And then what's left as a regular expression is missing the backslash because it's just been taken away by the string processing.
So if you want to be left with backslash n, you need to have backslash backslash n.
Avoiding Backslashes and Double Escaping
[38:26] See where we're going here? Yeah, yeah. Okay. And I hate languages that make me do that.
And I'm sorry to say that JQ falls into that category. So I tend to avoid backslashes in my regular expressions.
And you can often sneak around them. Not always, but often.
So I will go out of my way to avoid backslashes. Instead of saying slash D.
You have a hope of reading it later?
Right. So instead of saying slash D for digit, I say open square bracket, zero to nine, close square bracket.
Because the character class zero to nine is a digit, right? It saves me the double escaping.
So I do all those kind of little tricks because I hate double escaping because I always get it wrong. How do you do it in line?
I have to double escape, unfortunately. I haven't found a trick for that one. All right. I know.
Like I say, you can't always avoid it, but I minimize them because I hate them.
Because they break my brain so bad.
So that's the bad news. The good news is that the syntax that JQ uses is Perl Compatible Regular Expressions, a.k.a. PCRE, a.k.a. the way JavaScript does it, i.e. the way we're used to.
So that is a nice bonus. At least the syntax is as we expect, even if you have to double escape some things.
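The double-escaping problem is not specific to jq. Here is a small Python sketch of the same effect, with Python's `re` module standing in for jq's regular expression engine and ordinary (non-raw) string literals standing in for jq strings. The variable names and the sample text are made up for illustration.

```python
import re

# In a string literal the backslash is an escape character, so "\n" is one
# character (a newline) while "\\n" is two: a backslash followed by n.
newline = "\n"
backslash_n = "\\n"
print(len(newline), len(backslash_n))  # 1 2

# To hand the regex engine a backslash followed by d, the literal needs "\\d".
escaped_digits = "\\d{3}"
# The character-class trick needs no backslash at all.
bracket_digits = "[0-9]{3}"

text = "order 123"
print(re.search(escaped_digits, text).group())  # 123
print(re.search(bracket_digits, text).group())  # 123
```

Both patterns find the same three digits here, which is why writing the character class out by hand is a handy way to dodge the doubled backslash.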
[39:43] So now that we know how that works, let's go have a closer look at this test function.
So the input to the test function must be a string, and it must have at least one argument, which is your regular expression as a string. But in Perl compatible regular expressions you can have these things called flags. So in JavaScript we used to put the little flag after the last slash, so if we wanted case insensitive we'd have slash, a regular expression, slash, i. So the i is actually officially called a flag.
There are no slashes around things here. So the way we give a flag is by passing a second argument that is all the flags, which is usually just I, to be perfectly honest.
But there exist more if you want to go read the PCRE documentation.
But honestly, it's just I for most of the time.
So if you want to be case insensitive, you pass a second argument that is I.
[40:40] And that's all there is to it. Sure. So let us do some examples.
So if we want to match an IP address, as an example, we first need to build a regular expression, which is going to be the argument that we pass to the test function.
And an IP address can be, it's not a perfect representation of an IP address, but it's a decent one.
It is 0 to 9, 1 to 3 times, followed by, inside brackets, 3 times, a period followed by 0 to 9, 1 to 3 times.
[41:16] Okay. That is an accurate representation? I 100% believe you.
It will allow nonsense IP addresses like 999.999.999.999, but nonetheless it's one to three digits followed by a period, followed by one to three digits, followed by a period, followed by one to three digits, followed by a period, followed by one to three digits. So it's not too bad. Okay.
Bart teaches regular expressions again.
[41:39] So anyway, we'll take that as given because this is not, you know, Bart teaches regular expressions again.
Which I would happily do because I love them. But anyway.
So if we echo the string, so again, we're doing that thing we did before, echo, single quote, and then inside it, we have the JSON string.
So double quote, 37.139.7.12.
And we pipe that to JQ and we give it the test function with that lovely big regular expression as a string.
We will get true because that is indeed an IP address.
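Written out, the pattern described above might look like the one in this Python sketch. Two things here are assumptions rather than something stated in the episode: the period is written as `[.]` to avoid a backslash, and Python's `re.search` stands in for jq's test function. The helper name `looks_like_ip` is made up.

```python
import re

# One to three digits, then three repeats of (a period plus one to three digits)
IP_PATTERN = "[0-9]{1,3}([.][0-9]{1,3}){3}"


def looks_like_ip(text):
    # True when the pattern is found anywhere in the text
    return re.search(IP_PATTERN, text) is not None


print(looks_like_ip("37.139.7.12"))      # True
print(looks_like_ip("999.999.999.999"))  # True, the nonsense address still passes
print(looks_like_ip("waffles"))          # False
```

Because the pattern has no start or end anchors, it only checks that something IP-shaped appears somewhere in the string.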
[42:13] So, if we want to match for words starting with the W, we have the much simpler, echo the string waffles to jq, where we have test, and then our string is caret or hat symbol W, because that is the regular expression starts with W, and that will give us true.
That one I follow. Yeah, nice and easy. I probably should have done those in the opposite order, shouldn't I?
I'm pretty good at starts with.
My ability to do a regular expression goes down the toilet after that.
Starts with and ends with. They're very powerful. If you just know those two, you can do a lot.
If we do waffles with a capital W and we run that to test hat W, starts with W, we get false because they are not the same case as each other.
Waffles with a capital W does not start with a lowercase w. But if we want to be case insensitive, we can repeat those same two commands.
But this time we give a second argument, which is the I flag for case insensitive.
And then we get two trues, because whether we have the waffles uppercase or lowercase, they will match against starts with W.
OK, and we used our semicolon to say we're going to give you two arguments here.
Yeah, as confusing as that is to us. And like you and I were saying offline earlier, that is going to confuse us all forever. But, you know, if we keep calling it out, we might remember next week, maybe.
[43:39] So let us let us take our wonderful new knowledge. I know.
Let's take our wonderful new knowledge and let's have another visit to our big JSON file with Nobel Prizes. And let's see if we can't ask ourselves a somewhat arbitrary question that I made up.
How many Nobel Prize winners have surnames that start with a vowel?
It seems like a very arbitrary thing to want to know, but hey, let's figure it out, right?
[44:07] So, starts with a vowel. Okay, well, the character class A-E-I-O-U will match any vowel.
And if we want to be lazy and not have to double that up to include uppercase A-E-I-O-U, we'll just use the I flag to say case insensitive.
We know that the caret symbol means starts with. So when we put all that together, our regular expression is just starts with square bracket A-E-I-O-U, close square brackets. So that's not too bad.
So how do we go from here? Well, we start off with our dot prizes, open square bracket, close square bracket to explode the top level prizes key into each of the dictionaries representing each prize.
[44:48] Then we go in there. We then take the laureates array, which may or may not exist.
So we shove a question mark in the end of it. And we explode that.
So two square brackets again. So now we're left with all. Now, our parallel lines have just been split again.
So we now have a lot of dictionaries representing each laureate in each prize.
And in there, we're going to use the select function to filter, to screen or filter those down to just the ones that meet our rule. So what is our rule?
Well, inside the select, we say dot surname, pipe, test, a regular expression, semicolon, I.
So the input to test is going to be dot surname.
And that is going to give us a true or a false. Okay, and then we pipe. So at that stage we have a dictionary left of only the laureates which have a vowel at the start of their surname, and we would like to print out both their first name and their surname. So we just pipe that one last time to dot first name comma dot surname. And that one is too fun, right? Well, it's too fun, but, but, but, we immediately hit an error.
Null (null) cannot be matched, as it is not a string.
[46:02] Now I've put this in the show notes exactly as it happened to me, because this is an example of how you find dirty data.
So I have assumed. Let me, let me, let me stop you real quick there.
Bart and I were doing some buddy programming earlier and I looked at, I got the result I wanted, even though I had this null error and it was, it was looking for Andrea.
And I said, why do I care that I got that null error because it got me what I wanted. And he says, that's only because it got to Andrea before it hit the null error.
Yeah. So you don't get to be like, I got what I wanted. I'm fine.
You don't know that you missed something that was after that null error.
Yeah. You got at least some of what you wanted, but you don't know what you don't know.
Yeah. Right, right, right. Okay. So we ask for the first name and surname for people whose name starts with A-E-I-O-U.
And we get Pierre Agostini and Annie Ernaux, but then we get the error: null cannot be matched, as it is not a string.
So this then immediately made me go, huh?
And I already know that laureates may or may not be there. So it already has a question mark. So then it's like, wait a minute.
Are there laureates without surnames?
Is that conceivable that there are laureates without surnames?
So I went, well, let us answer that question. So I wrote some JQ to answer that question for myself.
And so the JQ I wrote to answer that question was dot prizes, open square bracket, close square bracket.
Using SELECT statement with dictionaries as input
[47:29] So we now have all the dictionaries for all the laureates and all the prizes as the input to our SELECT statement.
.surname double equals null is what I put into my SELECT.
And lo and behold, I got a whole bunch of entries and there's an example in the show notes.
It says ID... Why did you use double equals? Why didn't you do contains null?
I could have done contains, though, but double equals is like, I just want to know, are they null?
And the answer is... I was just excited to know we could use that. Yeah.
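The dirty-data hunt can be replayed in Python on a tiny hand-made sample. Everything in `data` below is illustrative rather than the real Nobel Prizes file, and the key names `prizes`, `laureates`, `firstname` and `surname` are assumptions about its layout. Python's `re.search` raises a TypeError when handed None, which plays the role of jq's "cannot be matched, as it is not a string" error.

```python
import re

# Hand-made sample for illustration only
data = {"prizes": [
    {"laureates": [{"firstname": "Pierre", "surname": "Agostini"}]},
    {"laureates": [{"firstname": "Annie", "surname": "Ernaux"}]},
    {},  # a prize with no laureates key at all
    {"laureates": [{"firstname": "Intergovernmental Panel on Climate Change"}]},
    {"laureates": [{"firstname": "Albert", "surname": "Einstein"}]},
]}


def laureates(data):
    for prize in data["prizes"]:
        # a missing laureates key is skipped quietly
        for laureate in prize.get("laureates", []):
            yield laureate


def starts_with_vowel(text):
    return re.search("^[aeiou]", text, re.IGNORECASE) is not None


found, error = [], None
try:
    for person in laureates(data):
        if starts_with_vowel(person.get("surname")):
            found.append((person["firstname"], person["surname"]))
except TypeError as exc:
    error = exc  # processing stops at the first missing surname

# The follow-up question: which laureates have no surname?
missing = [p for p in laureates(data) if p.get("surname") is None]

print(found)    # [('Pierre', 'Agostini'), ('Annie', 'Ernaux')]
print(error is not None)  # True
print(missing)  # [{'firstname': 'Intergovernmental Panel on Climate Change'}]
```

Albert Einstein sits after the entry with no surname, so he never shows up in `found`. That is the "you don't know what you don't know" problem with ignoring the error.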
So we end up with a whole bunch of entries like the one I have on the show notes, which are ID818, motivation for their efforts to build up and disseminate greater knowledge about man-made climate change and to lay the foundations for the measures that are needed to counteract such change.
Share, two. So whoever this is shared it.
And it says, first name, Intergovernmental Panel on Climate Change.
End of dictionary. First name.
[48:32] Who gives a prize to an organization and then uses the first name field?
If I was doing that on an Excel sheet, I would either have a separate column say organization name, especially given that we're using JSON as our data store, or I'd use surname.
First name, intergovernmental panel on climate change.
The only way they could have made this dumber is to say first name, intergovernmental panel, surname on climate change.
That would have been even dumber, but other than that I don't see how they could have made it any messier. So okay, there really are prizes with no surname. And this gave me a fantastic opportunity to use the alternate operator, because, well, hang on a second: if there is no surname, I still want to do some searching. I just want to search on something else. I want to search the first name.
[49:23] Right, because there is no surname. So I would have thought you'd want to go, I just want to see the ones that are real people, and they would have a first name and a surname.
You could have gone that way, but that wouldn't give me the opportunity to make use of the alternate operator. So my brain immediately went to, well, the Intergovernmental Panel on Climate Change conveniently starts with a vowel, so I really should actually have it popping out as a winner that starts with a vowel, was my thinking.
So if we rewrite our thingy, our thingy, our jq command exactly as we did before, but in the select, so we were saying select dot surname pipe test.
Well we want to pipe the surname or the first name to test and the alternate operator just does it for us.
Prettier output using the alternate operator for null values
[50:11] Dot surname slash slash dot first name pipe test.
[50:16] Hmm, okay. So if we have a surname, it goes.
Well, we saw that in part A, which for us is the other side of Christmas. Therefore, ages ago.
So that allows us to easily check our vowels against the surname or the first name.
But I've also made use of the alternate operator a second time to give us prettier output.
So I am outputting the first name and the surname, but if there is no surname, that would give me null.
So it would say International Panel on Climate Change, null.
That didn't look very nice to me. So I decided to use the alternate operator to output the string none with a star each side of it.
To highlight how silly it is that these things have no surname.
So you have your last pipe goes to in parentheses dot first name comma dot surname or star none star.
You could have written waffles. I could have written waffles, or the empty string I could have written as well, which is actually the prettiest output.
But to make the point, I've made it none!
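Both uses of the alternate operator can be modelled in a short Python sketch. The helper `alt` is a made-up stand-in for jq's slash slash operator (use the right-hand side when the left-hand side is null or false), the sample list is hand-made for illustration, and the key names `firstname` and `surname` are assumptions about the data file.

```python
import re


def alt(a, b):
    """Model of the alternate operator: fall back to b when a is null or false."""
    return b if a is None or a is False else a


# Hand-made sample for illustration only
laureates = [
    {"firstname": "Pierre", "surname": "Agostini"},
    {"firstname": "Marie", "surname": "Curie"},
    {"firstname": "Intergovernmental Panel on Climate Change"},
    {"firstname": "Albert", "surname": "Einstein"},
]

results = []
for person in laureates:
    # First use: test the surname, or the first name when there is no surname
    name_to_test = alt(person.get("surname"), person["firstname"])
    if re.search("^[aeiou]", name_to_test, re.IGNORECASE):
        # Second use: print *none* instead of null for a missing surname
        results.append((person["firstname"], alt(person.get("surname"), "*none*")))

for first, last in results:
    print(first, last)
```

Run as written, this lists Pierre Agostini, the Intergovernmental Panel on Climate Change with `*none*`, and Albert Einstein. Marie Curie is filtered out because she has a surname and it does not start with a vowel.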
Like, you know, with a star around it. Look at how silly this is. Hang on.
Hang on. Hang on.
Unexpected Results and Copy-Pasting Mishap
[51:24] I'm getting a very unexpected set of results when I copy and paste that command.
Let me make sure I actually copied and pasted it.
Okay, copy. Normally I have a terminal open too, but I don't know if I do today.
I do not have a terminal open. No, I hadn't copied, so when I hit it, it just did the previous one.
Ah, okay, good. So I see Institute of International Law, none.
Svante Arrhenius.
Yeah, this is a terrible pronunciation challenge. So you know what else this does, though?
This does now bring in a bunch of people that wouldn't have been in your original intention.
Because it's bringing in someone named Eichmann Sigrid, who wouldn't have been chosen in the first one.
Because his first name starts with a vowel but his surname does not. I'm surprised he came through, because he has a surname.
Multiple Names Confusion
[52:22] Huh. Well, unless I'm... Francis W. Aston, Albert Einstein. Albert is a perfect example, Albert Einstein.
Well, but I said because of the E.
Yeah. Tobias Asser, William, also... let me make sure I'm reading this. Rudolf Eucken, Klas Pontus. Oh, he's got two first names.
Um, he's got two first names.
[52:46] Klas Pontus, two words. And then Arnoldson, Paul Ehrlich.
Okay, I may be reading the, I'm sorry, it's really hard to read this because it's just name, name, name, name, name, name, name.
It's hard to see what it is. I think that I did get those out of order.
Okay, good. Because I was pretty sure I had it right because I did exactly what you just did when I was doing this first.
And I had a panic attack that regular expressions didn't work like I thought they did.
And then I checked and they do work like I thought they did.
And this is actually, this is a good example of why I can't show you the pretty way to output this yet, because I haven't told you how to transform JSON.
Because what I would really want to do is actually to create a new string that contains the first name and the surname one after the other as a single output.
That's what I would want to do. But that's transforming JSON.
And that's the perfect setup to where we're going in episode 159, because that is the final piece of our jigsaw puzzle.
So we now know how to pretty print. We now know how to search. So transforming is the last use case. And that's where we're going to next.
Now, you have two out of three challenges done. So I'm going to give you one extra challenge.
So then you have two last time and two this time. So it all works out perfectly.
So as an extra challenge, I would like you to find all the laureates awarded their prize for something to do with quantum physics.
[54:10] I.e. I want the first name, surname and motivation for each winner where the motivation contains a word that starts with quantum.
In any case. Starts with quantum.
Contains a word that starts with quantum. So the motivation could say a whole bunch of highfalutin nonsense followed by quantum physics.
Then it will contain a word that starts with quantum.
Right? Okay. So it's word. So I have two hints to give you.
The first is PCRE has a special symbol for saying this is a word boundary.
So starts with is the start of the string. There is an equivalent for start of a word.
So you need to remind yourself or Google the word boundary symbol in PCRE.
And you need to be very aware of the need maybe to double escape things because maybe the symbol for a word boundary has a slash in it.
[55:08] That's just mean, Bart. It is just mean, yes. So basically you are going to need to use the slashes because the way to check for the start of a word does require a slash.
Okay, all I need to do is have New Year's now.
Yeah, it doesn't help really, does it? And I think I'm going to be at CES for a week before I'm going to be back to this. So I bet I'll be perfectly successful.
Yeah, because while you're at CES, you're going to have nothing better to do in your hotel room at night than do some JSON querying with JQ, right?
You're definitely not going to be going to parties. And there will be no gin and tonics there.
Vegas Memories and Craving for a Proper Old-Fashioned
[55:45] Isn't it in Vegas? It is in Las Vegas.
Yeah, okay. And when we get off the air, I'll tell you about our, the last time we were in Vegas with our friends for CES, and I got to know the old-fashioned girl.
Oh, as in the cocktail old-fashioned? Yes.
I would love to get a really good old-fashioned because I have had old-fashioned, but I don't think they're good.
I have a feeling I'm missing out on like a properly made true old-fashioned with good bitters.
[56:19] There you go. Right, anyway, sorry, I'm distracted completely now.
So we have our optional challenge. And yeah, so the final piece of the puzzle here is to build new JSON based on existing JSON.
And that is generally the final step in our pipeline, right?
So you have lots and lots of input.
You find the bit you want. So you're narrowing down, narrowing down, narrowing down to get the piece of information you want.
And the final step is you actually want to present that information in the way you want, which means you need to construct new JSON, which is transforming JSON.
So that is the final piece of this puzzle here, and that allows us to put our code in between things. Because you can get an API that fetches weather data in all the wrong formats and is all dirty and ick, and you just want a nice clean piece of data that you're going to use inside a terminal command or for something else. Then you need to build nice new JSON that contains only the keys you want, none of the other garbage, no dirty data. Everything perfect, the universe is great. And that is where we are going in two, three-ish weeks or whenever. So, yeah, there we are.
That sounds fun.
[57:29] Well, very good. I like this episode. Most of this made sense.
As soon as you start saying regular expressions, I lose my mind.
But other than that, I follow along with you perfectly. If all you ever do is starts with, you will still be able to achieve a lot.
Or if all you ever do is go to Stack Overflow and say, I want a regular expression for a valid email address and copy and paste, you're still flying.
Actually, that might be a really good use of ChatGPT. Hmm.
[57:56] I bet it would be just fine. Generate me a regular expression.
Yeah, it should be able to learn those patterns.
Ooh, that's a pattern matching machine learning how to make patterns. Ooh, ooh.
I like it. Well, when I talk to you next, Bart, I will talk to you next. Indeed.
Oh, that's circular logic too. Right, anyway, whenever that is, until then, happy computing.
If you learn as much from Bart each week as I do, I'd like you to go over to let's-talk.ie and press one of the buttons over there to help support him. He does 98% of the work here. I'm just the stooge that listens to him and asks the dumb questions. If you go over to let's-talk.ie, you can support him on Patreon, you can donate via PayPal, or you can use one of his referral links. I really hope you'll go over and help him out. In the meantime, you can contact me at Podfeet or check out all of the shows we do over at podfeet.com.
Thanks for listening and stay supportive.
[58:57] Music.