CCATP_2024_05_25

An audio podcast where Bart Busschots is teaching the audience to program. Associated tutorial shownotes are available at https://pbs.bartificer.net.

2021, Allison Sheridan
Chit Chat Across the Pond

Transcript

[0:07]
Well, it's that time of the week again. It's time for Chit Chat Across the Pond. This is episode number 794 for May 25th, 2024. And I'm your host, Allison Sheridan. This week, our guest Bart Busschots is back with Programming by Stealth 166 of X. We're closing in on the end game on JQ, aren't we, Bart? We really are, depending on how you count it. I think I make a joke in the final thoughts section that if this was a book, we're actually arriving at the bit that says the end. But like The Lord of the Rings, there's another bit that says epilogue, which will be next time. But really, we're rounding it out today. This is the end of the mainstream things I think most people will need, and then the next episode is things I know some people will need. No one's going to need it all, and everyone's going to need different bits. So I'm sort of thinking of it like a tasting menu where you get like 12 courses, and we don't go into any of them in too much detail, but you'll like some of them. I don't know which ones. All right, well, that sounds fun. So basically, the bit I've been leaving to the end is the bit where you shorten your code in such a way that it becomes more dense, and therefore it would be really hard to explain early in the series. But when you know what it's compressing, you won't have a problem with it, and you'll in fact greatly appreciate the fact that we're compressing our code. And so you'll end up being able to write the same logic
[1:35]
easier. And I've front-loaded the most difficult concept at the start of the show notes, because there's one of these that is, well, when the penny drops you'll be fine, but it's one of those pennies that may get a bit stuck, because it's obvious in the same way Linux is obvious: in hindsight.
[1:56]
Okay so anyway basically we have been exploding arrays all the time right that has been at this stage it's almost become a joke where i say right you explode the array and you catch the pieces by putting it inside square brackets you explode the array we catch the pieces or we start off with a dictionary we turn it into entries we mess with it and we then turn it back into a dictionary so from entries to entries from entries to entries from entries to entries right that's sort of been what we've been doing for the last quite a few installments, and we don't need to do that we can actually operate on the array without taking it apart We can manipulate it in one piece and we can manipulate the content of dictionaries while they're still assembled. So we can just do one piece and say, yeah, I want this changed. Just do it to everything.
[2:45]
Um, but we will start with a function that is designed to, the word I use is digest. Basically give you an answer based on every element in the array and it'll be one answer. So take the array and turn it into a single thing. And there's a function for doing that in one step as well. That one's kind of hard to explain in the abstract. So I have lots of examples there. So hopefully that will help with that one. But of course, I set you a challenge at the end of the previous installment. So we are still having a lot of fun with our Have I Been Pwned data set. Early in the series, I was very much enjoying my Nobel Prizes. But at this stage, I think I know everyone who's ever won.
[3:28]
And I got a bit tired of that data set. Whereas with my real-life hat on, I have unfortunately needed to become very familiar with the Have I Been Pwned data set, because these data breaches, they won't stop. They just keep happening, annoying things. So actually needing to figure this stuff out is a genuine problem for me, so I've been solving it anyway, so why not share? So with that in mind, we had a JQ function for searching an export from a breached domain to look for every one of our users who was caught up in the breach du jour. And that search was a little naive, and it was
[4:09]
Intentionally a little naive because I didn't want to put too much on people at once so the challenge was then to make it less naive and instead of simply telling us which person had a match tell us some useful information about those matches so what actual person was in what breach with what title and what was in the breach what was the breached data so instead of just getting back when you search for link just getting back you know bob alice and john but you know okay well there's probably 10 breaches that match the word link so which one two three or four of those was bob involved in and which ones were alice involved in and which ones were tom involved in right so this time we're actually going to know who what and where which user what was breached where it was breached so that's that's the aim of the exercise here and for bonus credit the the final piece was to not only search on the name of the breach, because the name is actually, it's sort of a unique ID. It's what is under the hood uniquely identifies it. But there are never any special characters in the name.
[5:20]
They never have spaces. They never have dots. They never have accented characters. But when you look on the website, they do have dots and spaces and accented characters and stuff. And so sometimes if you start on the website and try to paste it into the breach report, it doesn't match. I had this problem for real with a recent breach which had a period in it. And I was like, what? Oh, that's mean. You sent me an email saying 20 of our people were involved in this. And now when I search, it finds zero. What?
[5:48]
But it was because the title had the period, but the name didn't. So the bonus extra is to search both the name and the title. And if either match, we're happy. So basically, we need to add in an or clause for our bonus credit.
[6:02]
And of course, we got to practice our data enrichment. So just to remind everyone that when you sign up for domain notifications, you get to download the JSON file that literally just puts, it's a lookup table of the username part of email addresses mapping to an array of strings. And those strings are just the names of the breaches. So it doesn't tell you anything about the breach. It just says, you know, Bob, start an array, LinkedIn, Dropbox, whatever, right? Close the array and nothing more. but you can download even without signing up there's actually some of the api endpoints are free and one of the free endpoints is tell me about all the breaches you know and you get a giant big json file which goes into great detail for every one of those breaches and it's a dictionary indexed by that same name that's in those arrays and it tells you what date the breach was on and how many people were caught up in the breach and what was breached and lots and lots of information about each breach. And so we want to use that JSON file to enrich the results in our domain breach file so that we get a richer answer, which we do with the minus minus slurp file, command line flag we learned about two installments ago, I think, the stage 164, I think. At least. So, yeah. So that is our challenge.
[7:24]
And so as per my wonderfully boring naming convention, you will find the sample solution as pbs165-challenge-solution-basic.jq in the installment zip file. And for the most part, it's extremely similar to the challenge solution from the previous time, which is, I did actually say, use that as a starting point. And before we look at
[7:51]
How it works. I want to show that it works, but also just to remind us the shape of the data we're working with. So the export, as I said, is a dictionary named breaches, which is a lookup table linking email address usernames to arrays of breach names. And so I'm going to be really focusing on this poor E. Green person who was caught up in the Dropbox breach and this M. W. Kelly person who was caught up in Dropbox, LinkedIn, the LinkedIn scrape, PDL, and something called KOMO, which I find hilarious to say, even though I have no idea what it is. I also find it funny that you don't call it KOMO since it's M-O-E. That's a very fair point, but KOMO just sounds way better. It's funnier. There we go.
[8:35]
And then also just that big data dump, it contains, you know, name, title and data classes is what we really care about, which is an array of what was breached. So for Dropbox, that was email addresses and passwords. So, if we run our challenge solution, and we say, you know, minus minus slurpfile, the big list of breaches, we say minus minus args, what we're going to search for, and if we search for LinkedIn, we get back one answer, that M.W. Kelly was caught in LinkedIn, which contained email addresses and passwords. Okay. Now, if we look up, M.W. Kelly was in LinkedIn and LinkedIn Scrape. So, why did we only get one answer back? It's because LinkedIn Scrape did not contain passwords and we were only interested in breaches with passwords. So the script is working as we had hoped. So let's run it again. And this time, let's do a way more generic search. Let's look for any breach with the letter O. And you might think that would return a lot. Yeah, only on our data set, it only returns two. Well, OK, so it only returns two I'm going to tell you about. I did truncate the output, actually. M.W. Kelly was caught up in Dropbox and K.O. Moe, as we're going to properly call it this time. Oh, now I wrecked it. You were all happy. Yes.
[10:04]
And what was I going to say about it? So we found K.O. Moe by searching for the letter O. But if I had actually searched for what it looked like on the website, ko.moe, we would not have found it, which is where the bonus solution comes in, and that will find it later. Okay, so now that we've seen that it works,
[10:29]
How does it work? What do I want to draw your attention to? Well, really, the interesting part of the solution is what we do as we filter those entries in the breaches lookup table down to just the ones we're interested in. So that's the bit I'm going to focus on in the snippets of code. And so the first thing we do is we explode that list of entries. So we're going to have our list of entries, and because we're going to go ahead and explode all of the breaches for that person, we need to keep track of who it is whose breaches we're about to explode. So the very first thing I'm going to do is I'm going to save the account name. Now, we have converted breaches into a list of entries, which means that the key is now the username and the value is now the list of breaches. So if we want to save the username, we say .key as $accountName. So that's us saving the account name in a variable called $accountName. And the reason we didn't do that before was we weren't going through the entire list of all the breaches? Well, if we did, we were never able to connect it back to a person because we hadn't learned about variables. Right. But at the end of the last one, we ended up with .key was what we piped out, which gave us the names.
[11:55]
But only the name, and it didn't tie the name to a specific breach. So its connection back to the breach was lost. Got you. Okay. Yeah. Or we didn't even try. We never exploded the list of breaches to give us more information. Yeah, this person was involved in one of those breaches we mentioned. Therefore, we'll tell you about this person. But now we're going to go deeper. I'm going to explode right into the individual breaches. And so we need to take with us that username. So then, then as I, you know, signaled, we're going to explode that list of breaches. So dot value, open square bracket, close square bracket. So now that we've exploded the list of breaches, the thing in dot is the name of a breach. So Dropbox, ko.moo, whatever. Because it's the value at that point. Yes. Cause it's the value was an array. We've now exploded the array. So what's left now in dot is individual breach names. So we now need to do our searching. So we can easily search.
[12:53]
Against the name of the breach which is sitting in dot by simply saying ascii down case pipe that to contains our search string piped to ascii down case so the input to ascii down case is dot we pipe that to contains which is now the lowercase version of the breach name check that against our search which we're also passing to ascii down case so that's checking the name right that doesn't tell us whether or not there was a password so now we need to do the data enrichment dance to figure out whether or not the breach that we've just found contains passwords so we say and, and then our breach details is the file we slurped in and it always wraps everything you slurp in in an array even if that's an array of length one so that's why it's breach details open square bracket, zero, close square bracket. So that's just getting us into the data structure. Okay. Now, inside that data structure, it's a lookup table indexed by the name of the breach. The name of the breach is sitting in dot. So to dive into the right entry in that giant big lookup table, we say open square bracket dot close square bracket to go into Dropbox or LinkedIn or whatever the current breaches we're processing.
[14:16]
Right. And what we care about is the data classes. So we've gone and grabbed the relevant data classes for our breach. We piped that to contains and we're looking for passwords. So if that comes out as true, there were passwords and the name has already matched and we've said and so that will select when both of those things are true.
[14:38]
And what's finally been piped through is still going to be the ASCII downcase result of the search string to the name? Yes. Okay. Well, it's not going to be downcased. It's going to be whatever came in, because select returns whatever it got unchanged. So we've downcased it to do the comparison, but select spits it out unchanged. Select never changes anything. Select is everything or nothing. So it comes back out exactly how it came in. Okay, so it's just doing the matches, but now that it's got it, it goes back and says the thing that it started with. Okay. Yes, precisely. So it's a filter, right? It's filtering out everything that doesn't meet the criteria.
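To make the "everything or nothing" point concrete, here is a minimal sketch (not taken from the show notes, and the input strings are just illustrative): the comparison inside select works on a lowercased copy, but what comes out is the original input, or nothing at all.

```shell
# The test lowercases a copy; select passes the original string through untouched.
jq -n '"LinkedIn" | select(ascii_downcase | contains("linkedin"))'

# When the test is false, select outputs nothing at all.
jq -n '"Dropbox" | select(ascii_downcase | contains("linkedin"))'
```

The first command prints "LinkedIn" with its capital letters intact; the second prints nothing.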
[15:15]
So what that ends up doing is giving us our nice... We now have sitting just the name of a breach, just Dropbox, and we still have the name of the person in the variable. So to build our output, we just use dictionary construction, open a curly bracket account name colon dollar account name because we saved it in a variable yay, breach name well that's still sitting in dot so breach name colon dot another easy one the breach title and the breached classes we have to do the data enrichment dance again so it's exactly the same breach details open square bracket zero close square bracket dive into the appropriate breach open square bracket dot close square bracket pull out the title and then pull out the data classes and that's it okay yeah, That's, it's so clean when you do it, Bart. Build it up slowly and spend hours at it and then write some short code that looks elegant. It's the spend hours on it bit. I had a little trouble figuring out how to, how to start, uh, how to start small with it. So, but I see what you did. Yeah.
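The walkthrough above can be sketched end to end on a tiny made-up data set. This is not the sample solution from the installment zip: the key spellings (Breaches, Title, DataClasses), the output key names, the account names, and the use of minus minus argjson in place of a slurped file are all assumptions made so the example is self-contained. The one-element array in details_json mimics the wrapping array you get from slurping a file.

```shell
# Made-up miniature export; the real export is much larger.
export_json='{"Breaches":{"egreen":["Dropbox"],"mwkelly":["Dropbox","LinkedIn","LinkedInScrape"]}}'
# Made-up breach details, wrapped in a one-element array like a slurped file would be.
details_json='[{"Dropbox":{"Title":"Dropbox","DataClasses":["Email addresses","Passwords"]},"LinkedIn":{"Title":"LinkedIn","DataClasses":["Email addresses","Passwords"]},"LinkedInScrape":{"Title":"LinkedIn Scraped Data","DataClasses":["Email addresses","Names"]}}]'

echo "$export_json" | jq -c --argjson breachDetails "$details_json" --arg search 'linkedin' '
  .Breaches | to_entries[]      # explode the list of entries
  | .key as $accountName        # remember whose breaches these are
  | .value[]                    # explode the breaches; . is now one breach name
  | select(
      (ascii_downcase | contains($search | ascii_downcase))
      and ($breachDetails[0][.].DataClasses | contains(["Passwords"]))
    )
  | {
      AccountName: $accountName,
      BreachName: .,
      BreachTitle: $breachDetails[0][.].Title,
      BreachedDataClasses: $breachDetails[0][.].DataClasses
    }
'
```

On this data the only output is the LinkedIn record for mwkelly, because the scrape entry has no passwords in its data classes.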
[16:23]
So for the bonus challenge, then it was just to search on the title as well as the name. And the reason is for stuff like KOMO, where if you just search for KO.MO like it shows up on the website, it won't be found until you use a bonus solution, then it will be found. Now, there's actually very, very little needed to turn the basic solution into the bonus solution. It's literally adding in an OR statement.
[16:46]
Now, lots of brackets, because we need to check the name or the title, and all of that gets put into brackets, and it contains passwords. So that's why there is so much nesting. So the problem with it not finding that one breach was because the title had a period but the name did not, so had you searched both, you would have found it? Precisely, precisely. And so literally it is just a matter of adding in an or, but because we're grouping it together with an and, we have to put brackets around it all. So it looks a lot longer, but it really is just open an extra bracket, add in the or clause, and close the bracket. So what's in the show notes is the basic solution followed by the bonus solution, and you'll see the difference is or, we do it on the title as well. Okay. Right, so that is our penultimate challenge. That's our second last challenge. There is one more at the end of this installment, but then we are done.
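As a rough sketch of that extra bracket and or clause, here is the condition applied to a single made-up breach. The name ExampleSite, the title example.site, and the key spellings Title and DataClasses are all invented for illustration; the point is only the shape of (name or title) and passwords.

```shell
# Hypothetical breach whose name has no dot but whose title does.
details_json='[{"ExampleSite":{"Title":"example.site","DataClasses":["Email addresses","Passwords"]}}]'

jq -nc --argjson breachDetails "$details_json" --arg search 'example.site' '
  "ExampleSite"
  | select(
      (
        (ascii_downcase | contains($search | ascii_downcase))
        or ($breachDetails[0][.].Title | ascii_downcase | contains($search | ascii_downcase))
      )
      and ($breachDetails[0][.].DataClasses | contains(["Passwords"]))
    )
'
```

Searching for example.site matches only through the title, so the breach name is still output; with the name-only test it would have been dropped.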
[17:48]
So let us move on to the concept of manipulating without exploding. And we're going to look at three very useful features. So first off, we're going to start with the one that's the most mentally exercising, which is an operator. We haven't had a new operator in a while. So not a simple function, a whole operator. So there's a lot more detail on an operator. And the word I'm going to use to describe what it does is we distill an entire array into a single value based on some sort of piece of logic. And so an example of distilling would be the length function, which distills an array to a number, its length.
[18:31]
The add function, which adds every element in an array together, distills the array to a single number, the sum of all of its elements. Right? That's what I mean by distilled. You take an input of many values and you get an output of one. So like with a still, lots of stuff goes in, alcohol comes out. But it can't be five things turn into two. No, it's always one. The way you're using the word, it's into one. Okay. It's always into one. So it's an array into a value. Array to value. Array to value. Which is actually something you end up doing quite a lot when you look at it more deeply. So we're going to start there with that whole new keyword, reduce. So the keyword is reduce, right? We take an array and we reduce it to a single value. And then we're going to look at applying one piece of logic to every element in an array all at once. And the function for that is called map, because you map an operation to every element in an array. So it's quite a common name for this functionality is map. And you might notice MapReduce is actually the two functions that are used to implement stuff like Google and a lot of statistics are implemented based on MapReduce. It's a very common paradigm. It doesn't make sense to me because of statistics, but I always hear people talk about MapReduce, MapReduce, MapReduce and JQ can do it.
[19:48]
And then the last thing we're going to look at is applying one piece of logic to every value in a dictionary. So a dictionary is key value pairs. The map values function, as its name suggests, doesn't touch the keys, but it goes into every key and updates the matching value. So map does everything in an array and map values does the values in a dictionary.
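A small side-by-side sketch of the two functions, using made-up numbers: map works on an array, map_values works on the values of a dictionary and leaves the keys alone. The third command shows the older explode-and-catch form that map replaces.

```shell
# map: apply one filter to every element of an array.
jq -nc '[1, 2, 3] | map(. * 10)'

# map_values: apply one filter to every value of a dictionary, keys untouched.
jq -nc '{"a": 1, "b": 2} | map_values(. * 10)'

# The explode-and-catch version of the first command, for comparison.
jq -nc '[1, 2, 3] | [.[] | . * 10]'
```

The first and third commands give the same array; the second gives back a dictionary with the same keys and updated values.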
[20:15]
Okay, you're using two different words. Let me just make sure I understand why in the text versus your verbal explanation. Your text says applying the same edit to every element in an array is map. And you said the same function. So a function would be something that edited that. Yeah, the same operation, the same change. Make the same change to everything in an array. Yeah, the whole point is change. So map again is only to the key, and map values is only to the value? No, map is to an array, so an array only has values.
[20:51]
Gotcha. Okay. Oh, there's the difference. Okay, one of us is in an array and one of us is in a dictionary. Okay, good. Got it. Okay, so we're going to start with the heavy lifting one last time for JQ. So we're going to look at the reduce operator. And the reduce operator is so powerful. I can't remember if it was in a blog post or in the actual documentation, but the author basically said the reason reduce exists is because it's how a lot of built-in functions are created. And I was in two minds as to whether or not to even make it a public function, but it's so useful I made it public anyway, even though most of the time you're going to use reduce in disguise. Because in fact, every time you've used a length function, you have been using reduce. It's just, it's been wrapped for you and tied up in a little bow because that's how it's actually done. So reduce is like the brains behind stuff we've already been using. Okay. But there are brains we can play with too. So we can make up our own extra functions that apply the same kind of a thing to arrays. So because it's an operator, the syntax is a lot more than just reduce open round bracket, right? It's a full operator. So, the syntax is the keyword reduce, then a filter, which we're going to call the generator expression.
[22:12]
You'll see why when we describe it. So reduce some filter as some variable name, open a round bracket, one filter, semicolon, another filter. So reduce filter as variable, round bracket, filter, semicolon, filter, close the round bracket. That's a lot of moving pieces. Yeah, right? That's four pieces I need to describe to you. So the reduce function starts by saying, give me multiple values so the generator expression is there to make many values and nine times out of ten you explode an array so you say reduce okay name of array open square bracket close square bracket but in theory you could reduce anything that outputs something.
[23:01]
So basically it's saying reduce this thing but this thing has to be generated so you need a generator expression in order to create the thing that's going to get reduced bing bing bing eg uh exploding exploding array exactly okay i got one down so see if you can keep then it says as and then we have to name a variable and the name we give this variable is going to be each individual piece because this is going to be a loop right we're going to loop through all of the things we've generated and so we need to give a name to the current piece so we get to make that name so So that's what we put there. We make up the name. And then inside the roundy brackets come the two pieces of logic to do the work. So this is like an accumulator. We're going to build up an answer. We're going to start with an initial value, do the same thing once for everything in our list, and then the final value is what gets returned. So we start with something, we update it, update it, update it, and the final update is the answer.
[24:03]
A single value? A single value. So I call it an accumulator, right? Think of it like the one memory in your calculator. An accumulator is a variable that starts with one value, is updated, and then it's finished. So you start with a value, you update it, and then that's what you output. Okay. So it's like a bucket, right? We start off with a bucket, and you get to put something in the bucket as a starting point. And then every time through the loop, you get to change the bucket, and whatever's in the bucket at the end, that's your answer. It's one bucket all the way through, and you get to manipulate it at every step. Okay, I'm going to do something I don't normally do in the show, but all of a sudden your volume went way down. Whoa. Not terrible, but it just went down about, I don't know, 10 or 15 percent. I literally whacked my microphone, so I probably actually moved it. Yeah, that was it. Yeah, there she is. The microphone, and it moves by an inch. It's funny, when you go like this it gets harder to hear, and then you come back and it's easier to hear. Yeah, there we go. Okay, right. Good. Thank you, Allison. And it's not a dodgy cable. I physically knocked my microphone out of my face.
[25:08]
We were so excited about accumulators. It kind of was, yeah. Okay, so I'm hoping this is going to make more sense when we get to see an example. Oh, yeah. Right. So the first one is the starting value for your accumulator. And the second one is how do I change it? What do I do every time? So, to show how this works, we're going to redo a few basic JQ functions that really do exist. So, the first one, we're going to redo length. We're just going to make our own version of length. So, we're going to say reduce dot open square bracket, close square bracket. So, whatever array you sent to me, I'm going to explode it. That's how I'm going to get my pieces. I'm going to say as dollar item. I'm never going to use $item because I don't care what's in the array, I just care about how many, but I have to give it a name. So sure, item, whatever, dollar i, whatever, right? I name it because I have to. I start counting at zero, so my initial expression is just zero. My bucket starts at zero. For every item in the array, I take the current value of my bucket and I add one.
[26:18]
Okay, and you do that by saying? Dot plus one. So current value of the accumulator plus one. So if I... What if the value in the array was telephone? The string telephone. Right, the telephone would be in dollar item. Yes. I'm not touching dollar item. I am saying the current value of the accumulator plus one. I know, but I don't know what... Okay, let's do it with the array 2, 4, 6. It could be the array pancake waffles 2. Do it with not numbers for me. Okay, we'll pretend it says pancake waffles. Telephone book staple.
[27:00]
Horse battery staple. Horse battery staple, great. Horse battery staple, three. Okay, perfect. So the first time through the loop... So the very first thing it says, reduce our array horse battery staple by exploding it. So we're going to have three exploded pieces here. We're going to name each exploded piece dollar item and we're never going to use that name ever again the first time through the first time through is horse dollar item is horse yes okay but we don't care we don't care so before we go through at all we start our accumulator with zero, so we have horse battery staple waiting and our accumulator is zero so that's the four pieces, We are handed horse. We don't ever look at horse. We say, what is the current value of the accumulator? It is zero. Plus one. The accumulator is now one. We throw away the horse.
[27:55]
So wait a minute, wait a minute, wait a minute. How did it know that it was zero if we haven't gone through this once first? I would think horse would have been zero. Right, right, no. So that's why we have one expression, semicolon, another. The first expression is the initializer. So with a for loop, you would say i equals zero, which happens before the first loop, right? This is like that. Okay. So it starts at zero. Usually it's zero the first time through, and you're saying it's already one our first time through. No, no, it's zero before the horse arrives. Before the horse arrives, it is already zero. Then the horse arrives. Okay, the horse is still in the barn. Okay. So it is zero and now we look at the first thing in the array, which is horse. And so we say zero plus one. So the accumulator is one and the horse is finished. Then we get to battery. The accumulator is already one. We say one plus one. The accumulator is now two and we throw away the battery. Then the staple arrives. The accumulator is two. We say dot plus one. So the accumulator is now three. We throw away the staple. There are no more pieces. Three is the output. Okay.
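The horse, battery, staple walkthrough corresponds to a one-line command. This is a sketch of the home-made length described above, with the built-in length alongside as a check; the array contents are the ones from the conversation.

```shell
# reduce GENERATOR as $name (INITIAL; UPDATE)
#   generator: .[] explodes the array
#   $item:     the current piece (never used here)
#   initial:   0, set before the first piece arrives
#   update:    . + 1, where . is the accumulator
jq -n '["horse", "battery", "staple"] | reduce .[] as $item (0; . + 1)'

# The built-in gives the same answer.
jq -n '["horse", "battery", "staple"] | length'
```

Both commands print 3.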
[28:59]
Okay. So that is length. It is literally reduce dot open square bracket, close square bracket as whatever zero semicolon dot plus one. So let's implement the add function, which adds all of the numbers in an array together. It's also a standard JQ function. So the syntax is going to be similar. We say reduce dot open square bracket, close square bracket. So we're going to be handed an array, explode it, as $item, this time we care, we're adding up the elements in the array, so we definitely do need to know what each element is called, so we actually care that we call it $item. We say start the accumulator at 0, before we have added any numbers, there is nothing. If you give me an empty array, I will output 0. The update operator is the current value plus $item.
[30:00]
So this time we're not sending horses, batteries, or staples. This time we're sending 2, 4, 6. Okay. So we start with the array 2, 4, 6. We explode it into pieces. Before we process the 2, we start our accumulator at 0. Then we meet the 2. So we have 0 plus $item, which is 2. So now the accumulator is at 2. We throw away the 2. Next, we meet 4. So we have, sorry, 2 plus 4 is 6. Throw away the 4. We already have 6, plus the new 6 gives us 12. We throw away the 6. There are no more things. 12 is the output.
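Here is the same walkthrough as a command, a sketch of the home-made add on the array 2, 4, 6. The second command shows the empty-array case mentioned earlier, where the initial value is simply returned.

```shell
# Accumulator starts at 0; each step adds the current element.
jq -n '[2, 4, 6] | reduce .[] as $item (0; . + $item)'

# With no pieces to process, the initial value comes straight out.
jq -n '[] | reduce .[] as $item (0; . + $item)'
```

The first prints 12, the second prints 0.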
[30:39]
I wish everybody could see as Bart was trying to do the math in his head of six. When he got to six, he was like, okay, wait a minute. That's a different six. And I saw him like, look up. you know how you look up when you're trying to add yes i knew it was gonna be tricky right but do you see what's happening we we have yeah yeah we have one variable and we're updating it every time using the logic we add so whatever we had now add it whatever we had now add it, there is no built-in multiply function but we can make one, We can say reduce dot open square bracket close square bracket as dollar item one semicolon dot star dollar item. Now, I chose one because if you multiply by zero, it doesn't get you very far. I figure one was probably a better starting point. So if we send in two, four, six. I don't think you need to do the math for us on that. No, I'm not going to. It's 48.
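The home-made multiply, sketched as a command on the same 2, 4, 6 array. The only changes from the adding version are the starting value of 1 and the star in the update.

```shell
# Start at 1 (starting at 0 would keep the product at 0), multiply at each step.
jq -n '[2, 4, 6] | reduce .[] as $item (1; . * $item)'
```

This prints 48.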
[31:34]
You can copy and paste the JQ command into your terminal and it will tell you it's 48. But we can of course do way cooler things than just adding it up. The factorial function is a very good thing to do with reduce, because it's a very repetitive thing that you build up. So the factorial of a number is every number up to that number multiplied together. So the factorial of three is one times two times three, which is six. The factorial... Backwards, Bart. It's three times two times one. I've never thought of it as one times... Which is identical. I know, but I've never thought of it that way. You start with the number you want, you multiply it by the number lower than that, lower than that, until you get to one. You don't start at one. Nobody does that. I do, because it's easier programming. It's mathematically identical. You're right, the textbook says it the other way around. The factorial of four is four, three, two, one, or one, two, three, four. One times two times three times four. We can implement that by reusing the range function we met in the previous instalment for doing our multiplication tables. Because we don't have to explode an array, we can do anything we want inside that reduce to make some values. Now, the range function is programmer type. The first argument is the starting value, the second argument is one more than where it actually stops.
[33:01]
So if you want to go to three, you actually have to say one semicolon four. So that means that we want the range from one to dot plus one, which is so annoying. So if the number four is coming in, you're saying it's going to go from one to five. Well, we say five is the second argument, but range will never give us the five because range always stops one short. So range will give us 1 to 4, but we'd say 1 to 4 by saying 1 to dot plus 1, because range is annoying. Okay. So anyway, range, our range of numbers, as $i, because I got tired of typing item, 1, because we're doing multiplication, semicolon, dot, star, $i. Okay. That will do it, right? That takes 1 times 2 times 3 times 4, and on it goes. Which means I can tell you, without doing all the math in my head, that the factorial of 5 is 120. And factorial goes up very fast, because 6 times 120 is 720-ish. It's got to have a 4 in it somewhere, doesn't it? 740 then?
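A short sketch of the two pieces just described: first the stop-one-short behaviour of range, then the factorial built on top of it. The inputs 5 and 6 are just sample values piped in as dot.

```shell
# range(1; 4) stops one short of its second argument.
jq -nc '[range(1; 4)]'

# Factorial: the generator is range(1; . + 1), so for input 5 it yields 1 to 5.
jq -n '5 | reduce range(1; . + 1) as $i (1; . * $i)'

# Change the input and dot changes with it.
jq -n '6 | reduce range(1; . + 1) as $i (1; . * $i)'
```

The first command prints [1,2,3]; the other two print 120 and 720.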
[34:20]
I could open the terminal and find out. You could really, couldn't you? Yeah, anyway, I think it's 740, I think. My second answer, not my first answer. Oh, the blind leading the blind on this one. So wait a minute, as the input you gave it five, right? Which is, the factorial of five is 120. I thought we were doing four. We're doing whatever, right? Whatever we pass in is going to be dot, so you change that five to a six and dot will be six. Yeah, but the example you gave was one to four, wasn't it? That was before, that was in the lead-up, like the factorial of three is, the factorial of four is. But in the code, jq minus n, five, pipe, reduce, I did it with five. That was a bigger number; I hadn't told you the answer to that one. But the text right above it says range one, semicolon, four would produce one, two, three, right? Yes, that's me just describing how weird the range operator is. I didn't want to give too long of an example there. Okay.
[35:28]
It is 720, you're right. I shouldn't have changed my mind. Okay, so that is reduce, which is simple, elegant, and makes your brain hurt. Linuxy. Yes, yeah. Okay, so that's the heavy lifting. Combining the reduce thing in there too really made it fun. True. So, that's the heavy lifting. Now we get to coast out on two easier functions. Let's start with map, where you're going to process an entire array in one go. Map is a function, so it's just map, open bracket, and in this case it always has exactly one argument, which is the filter we would like to apply to every element in the array. Inside that filter that we pass as the first argument, dot represents the current value of the array. That's all there is to it: map, open bracket, what you'd like to do, close bracket. That's it. So as a simple example, let us take an array of numbers and convert them all to their absolute value. We do that by saying map abs. Abs is the function for absolute value, so we just say apply abs to all of our input. So if we give that the array 1, minus 2, 3, minus 42, we get back 1, 2, 3, 42.
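The absolute value example can be sketched as follows. One assumption to flag: in case the installed jq does not provide an abs builtin, this sketch defines a hypothetical stand-in called absolute and maps that instead. With a jq that has abs, map(abs) is the form described above.

```shell
# "absolute" is a hypothetical helper standing in for abs, so the sketch
# does not depend on which jq version is installed.
abs_out=$(jq -nc '
  def absolute: if . < 0 then 0 - . else . end;
  [1, -2, 3, -42] | map(absolute)
')
echo "$abs_out"    # [1,2,3,42]
```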
[36:53]
Hold on. Hold please while I scroll back and read what we said at the beginning. So I thought these were... okay, so map is not like reduce. It doesn't go down to one thing. That's what I was looking for. So reduce distills our array.
[37:09]
Map transforms our array, and map values transforms our dictionary. Yes, because it's like a loop. We're basically doing a whole loop in one go. So effectively we're mapping the function abs to every entry in the array. Yeah, I like it. And it really is just that simple: map abs, that's it. We can also really simplify our filtering of things by mapping to select. If you google for things, you will see map select all over the place, because it lets us shrink an array down without exploding it. So instead of exploding it and putting all the pieces through select, we just apply the select straight to the whole array. Because select either outputs absolute nothingness (it outputs the empty value, not null) or it outputs the original value, everything that fails the select disappears from the array. So when you map select, it literally wipes it out of existence if it doesn't match the filter. So that actually changes what I just finished saying, which is that map always comes up with the same number of things. No, if you've got four things and one of them isn't a match, after you go through map select there'd only be three.
[38:28]
Yeah, every input is processed once. But if the result of processing that input is that the input goes away, well, then it's gone.
[38:38]
You've just destroyed it. But that's really quick. It gives nice short code for just shrinking an array down to meet some sort of a criteria. So you can do a map select on your previous array, your 1, negative 2, 3, negative 42, and say, only give me the positive values, and you would have only had 1 and 3 left when you're done. OK, precisely, precisely. It's clean. Now, in my example, I'm falling back to our physics code or our Nobel laureates because a little under the hood here, these show notes, remember, we changed the order of things a bit and then we ended up discovering that JQ was way cooler than I thought. Well, we're now back to the show notes I wrote sitting on the tarmac in Brussels airport at the end of January in the snow on a three hour delay. That's when these show notes were written.
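The "only give me the positive values" example mentioned above can be sketched like this, assuming jq is installed. Each input is still processed once, but the two values that fail the select vanish from the output array.

```shell
# Four values go in; only the two that pass select come out.
positives=$(jq -nc '[1, -2, 3, -42] | map(select(. > 0))')
echo "$positives"   # [1,3]
```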
[39:24]
And that's how long they have been sitting in the branch called, it was called PBS 162 plus, and then 163 plus, and then 164 plus. It kept on moving, right? So that's how far back these go. And at that time, I was still fascinated by Nobel Prizes. So anyway, we're back to the Nobel Prizes data set. So we know that in the past, as we were working so often with our Nobel Prizes, if we wanted only the physics prizes, we would start by exploding and capturing dot prizes. So we open a square bracket, dot prizes, open square bracket, close square bracket, pipe that to select dot category double equals physics, close our select, close our square bracket, and then we pipe all of that to reverse so the things are in the right bloody order, because, you know, they're newest first and I want them oldest first. So that's how we've always done it before. Well, with map select that becomes way simpler logic. We say dot prizes, pipe, map select dot category double equals physics, pipe, reverse.
[40:26]
Huh. Isn't that easier to read? Isn't that easier to get your head around? No square brackets all over the place. It's just give me the ones that pass that category, and then reverse them. Yeah. Now, there was that question mark stuff we had to do before, where if it didn't exist it gave an error. Does this avoid that problem? No, because there's always a category, right? If we were working with, say, the laureates array or something, we'd still have to do the question mark, because sometimes there are laureates and sometimes there aren't laureates, but there's always a category. So you can do the question mark without exploding the array? Would it be just dot prizes question mark? Yeah. And then that whole middle filter just gets skipped if there are no prizes. It's like, okay, nothing to reverse. No error. It won't give an error. It'll just go, okay, fine. Well, wait a minute. So would the entire thing not work if we'd said dot category double equals... well, that wouldn't work, but anyway, I think I understand. Basically, it's a shortcut for what's above, right? So anything we could do before, we can do with map select, but it's just that we have a much shorter syntax for it. Instead of exploding and recapturing, we just do it all with map select. There's no actual difference in the logic. It's just we don't waste our time exploding and recapturing. We just map.
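To see that the two forms really are the same logic, here is a small sketch that runs both against a tiny stand-in for the Nobel Prizes data. The three sample records are hypothetical and only mimic the shape described in the discussion (a prizes array, newest first, with year and category keys).

```shell
# Hypothetical miniature of the data set, newest first.
sample='{"prizes":[{"year":"2023","category":"physics"},{"year":"2023","category":"peace"},{"year":"2022","category":"physics"}]}'

# The older approach: explode, select, recapture, then reverse.
long_form=$(printf '%s' "$sample" | jq -c '[.prizes[] | select(.category == "physics")] | reverse')

# The map select approach: no exploding and recapturing.
short_form=$(printf '%s' "$sample" | jq -c '.prizes | map(select(.category == "physics")) | reverse')

echo "$long_form"
echo "$short_form"
```

Both commands should print the same two physics prizes, oldest first.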
[41:41]
And really, that is all there is to the map function. But it has a useful side effect. So JQ doesn't like throwing errors. I know we sometimes think it does, but actually JQ goes out of its way not to throw errors. Not the way I write them.
[42:02]
It's trying its best, I swear. I've never written a first draft that didn't have a syntax error. Unexpected line end at line 32. Okay, syntax errors, it can't help it. But the map function is for working with arrays. What if you throw a dictionary at it? It goes, okay, I will ignore the keys. I don't know what a key is because I'm an array function. I'll ignore the keys. And it just gives you back an output of the values, processed as if it was an array with just the values. So if you want to convert a dictionary to an array, if you just map dot, it will just turn the dictionary into an array, because it will just give you back the values unchanged. But wait a minute, it doesn't have...
[42:55]
Oh, oh, so it just erases them. Okay. The keys just disappear because map has no idea what a key is. So if you start off with a dictionary of weekly sales data that's indexed, you know, Mon, one amount, Tue, Wed, Thu, Fri, and you just want the numbers. I don't... just give me an array of numbers. I can count from one to seven. If you just say map dot, you will get back the array, and the order will be the same order as the dictionary was. Wow, that is cool. That is a really cool little side effect. So that's just a really useful little bonus topic there. So last function, this is it. This is our final JQ function that is not bonus extras. We have map values, and its job is to take a dictionary and to update the values in the key value pairs. So it's just like map, but it's only going to apply to the values, and it understands, oh, dictionary, you say, right? By the way, for people listening who aren't looking right at the notes, to get it right in your head and start memorizing, it's map underscore values. Yes, it is. Okay. And as an example, let us double our weekly sales. Let's go fix the books. Yeah. Map underscore values, open bracket, dot star two, close bracket.
[44:19]
That will simply apply dot star 2 to all the values in the dictionary. So Monday suddenly becomes 4,686 instead of 1,343. No. 2,343. Yeah, whatever it said before. I'm too lazy to scroll up. So really, it's that simple. Have you found a reason to use that one in anger?
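Here is a minimal sketch of both tricks on a weekly sales dictionary. Only Monday's 2,343 comes from the discussion; the other days and amounts are made up for illustration.

```shell
# Hypothetical sales figures; only the Monday value is from the episode.
sales='{"mon":2343,"tue":1200,"wed":1500}'

# map on a dictionary drops the keys and returns just the values.
values_only=$(printf '%s' "$sales" | jq -c 'map(.)')
echo "$values_only"   # [2343,1200,1500]

# map_values keeps the keys and updates each value in place.
doubled=$(printf '%s' "$sales" | jq -c 'map_values(. * 2)')
echo "$doubled"       # {"mon":4686,"tue":2400,"wed":3000}
```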
[44:44]
Map values, yes. Yes, because if you have data that is, say, numbers that are strings, not numbers, you can map values to string, or sorry, to number. Oh, that's nice. Right. Or if you have strings that are mixed case and you really need them all lowercase, you can just apply ASCII upcase or ASCII downcase to all the values. So there are actually quite a few times when that is very useful, actually. Didn't we have trouble with the Nobel Prizes? Was it the date that was a string when we wanted it to be a number? The year. The year is a string, therefore you can't do math on it. Yeah. You can't do proper math on it because it's treating it as symbols instead of numbers. Right. So just like map won't throw an error if you throw a dictionary at it, map values won't throw an error if you throw an array at it. Because it only cares about values anyway. And an array is just values. Oh. So it'll do, okay, fine. I'll update all your values. And it will give you back an array. So map always outputs an array. Map values outputs whatever you gave it in. So map values, dictionary in, dictionary out, array in, array out. Map, array in, array out, dictionary in, array out.
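A short sketch of those uses, assuming jq is installed. The sample dictionaries are hypothetical, and tonumber and ascii_downcase are the jq builtins the discussion appears to be referring to.

```shell
# Strings that hold numbers become real numbers.
as_numbers=$(jq -nc '{"share": "2", "year": "1921"} | map_values(tonumber)')
echo "$as_numbers"    # {"share":2,"year":1921}

# Mixed-case strings become lowercase.
lowered=$(jq -nc '{"a": "Physics", "b": "PEACE"} | map_values(ascii_downcase)')
echo "$lowered"       # {"a":"physics","b":"peace"}

# map_values: array in, array out.
arr_out=$(jq -nc '["1", "2"] | map_values(tonumber)')
echo "$arr_out"       # [1,2]

# map: dictionary in, array out.
dict_to_arr=$(jq -nc '{"a": "1", "b": "2"} | map(tonumber)')
echo "$dict_to_arr"   # [1,2]
```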
[46:08]
Okay. Now, there is one more very important and really subtle difference between map and map values. And we've hinted at it already, because the select is an example showing that it doesn't have to be a one-to-one mapping, right? Each input is processed once, but if that process produces zero or a million outputs, how does the function react? Map will take them all. If you have each entry in the array mapping to 5 million entries, well, you'll get 5 million entries in the output array, and they'll just appear all next to each other. So where it was one value, they'll just all appear at that point in the array. And it's perfectly fine. Give me as many as you want. I'll just stick them into the output array.
[47:01]
Map values only ever takes the first one, and it ignores everything else. Oh, interesting. Because it's thinking in terms of dictionaries, and the concept of a single key having multiple values makes no sense. So it's in dictionary world in its head. It's as if you piped the output through the first function, so it will always give you zero or one. So you can still make it go to nothingness, but you can't have more than one. You've either made it go away or you're giving it one new value, but map values will never give you multiple. Just discards the other ones. And to show that, it's actually kind of difficult. So I came up with a really contrived example. What if we take an array of arrays, and we map it to the function to explode?
[47:48]
So that will effectively flatten our arrays, right? Because the first value is the array 1, the second value is the array 2, 2, and the third value is the array 3, 3, 3. So if we map that to dot explode, the output through map will be 1, 2, 2, 3, 3, 3, because the first array becomes 1, since that's all it is. Absolutely, I could see that being useful. Yeah, yeah. I mean, what we've done there is we have flattened a two-dimensional array by saying map dot explode. Now, Bart paused there, but it's 1 comma 2 comma 2 comma 3 comma 3 comma 3. They weren't in clusters in any way. Precisely. All the values just come out and get stuck into the output array one after the other. I could see that being useful. Yeah. Now if we try the same trick by running that through map values, which is perfectly happy to take an array...
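The flattening example can be sketched as follows, assuming jq is installed. In jq syntax, "dot explode" is written as .[] inside the map.

```shell
# Every output of .[] is kept, so the nested arrays are flattened.
flattened=$(jq -nc '[[1], [2, 2], [3, 3, 3]] | map(.[])')
echo "$flattened"   # [1,2,2,3,3,3]
```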
[48:44]
But it will still just give you the first one. So that gives us 1, 2, 3 as the output. Okay, because the first value was square bracket 1, the second value was square bracket 2, 2. So it still took the 1, it took the first 2 of the two 2s, and it took the first 3 of the three 3s. Three 3s, yeah.
[49:08]
Because it ignores everything after the first output. It's like, yeah, I just wanted 1. That's really hard to say out loud. Isn't it just? And we end up with the same thing happening when we give a dictionary to each of them. So if we give it the dictionary where A maps to the array 1, B maps to the array 2, 2, and C maps to the array 3, 3, 3, and we run that through map, we get out 1, 2, 2, 3, 3, 3, 3, because again, when you send it through map, you always get an array out. Even when you put a dictionary in, you always get an array out of map. Just for clarity, you added an extra three. So 1 comma 2 comma 2 comma 3 comma 3 comma 3. Okay, I gave us four threes. If we do that with map values, we get back A 1, B 2, C 3.
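A minimal sketch of the map versus map values comparison on the same contrived inputs, assuming jq is installed. The dictionary keys are written in lowercase here purely for illustration.

```shell
# map_values keeps only the first output per entry: array in, array out.
first_only=$(jq -nc '[[1], [2, 2], [3, 3, 3]] | map_values(.[])')
echo "$first_only"        # [1,2,3]

# map on a dictionary keeps every output and always returns an array.
dict_map=$(jq -nc '{"a": [1], "b": [2, 2], "c": [3, 3, 3]} | map(.[])')
echo "$dict_map"          # [1,2,2,3,3,3]

# map_values on a dictionary: one value per key, dictionary out.
dict_map_values=$(jq -nc '{"a": [1], "b": [2, 2], "c": [3, 3, 3]} | map_values(.[])')
echo "$dict_map_values"   # {"a":1,"b":2,"c":3}
```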
[49:58]
Yeah, yeah. So, subtle but important. That last subtlety is the end of our core syllabus on jq. Like I say, we have a little epilogue next time with some really cool bonus extras, but frankly, we now know more jq than most jq users know. There are a lot of people who use jq for very simple things, like pretty printing JSON that comes out of an API or pulling out one or two keys. Most of the people who use jq use about 10% of what we've learned. We have learned, I estimate, about two thirds, maybe three quarters, of everything that exists in JQ. Wow. But we probably know more than most of us are going to need, ever. And we have covered almost everything I've needed for doing some really quite advanced stuff with my work hat on, where I really went very, very deep down the rabbit hole.
[51:00]
There's a reason we have the epilogue episode, because I did end up in places where I needed some even more advanced features than what we've learned. But honestly, we have learned a lot of JQ here. All the show notes to now have you extremely well armed for handling JSON data from, frankly, with our web programming hat on, outputting from APIs. That is where we see JSON all the time, right? When we wanted to find out details about our IP address, we got back a bunch of JSON. When we were doing the currencies before that API became paid for, it was a bunch of JSON that came back. When we were doing the weather, it was a bunch of JSON that came back. And so JQ is perfect.
[51:39]
I think it's like when you buy a new car, you see that car everywhere. I see JSON everywhere now. I mean, it springs up. When I was interviewing the gentleman about the head tracking software that he had written to allow him to control a computer, he had a programmer with him who was working on the code, and I was like, well, there's some JSON. It's everywhere. Well, I've recently, I say recently, about two years ago at this stage, been learning YAML for the first time. YAML is everywhere, absolutely everywhere. I didn't recognize it before. YAML is everywhere. Well, yeah, you're sneaking it in on me where I don't want it. Yeah, sorry about that. So I do have one final challenge to test what we've learned today. We're going to start with our output of breaches for our fictitious domain.
[52:31]
And when we get it, it's dot breaches, which is indexed by our username, and then it's just an array of strings. Well, why don't we enrich that a bit by doing a few things? So first off, let's simplify the output by getting rid of the dot breaches and dot pastes and just going straight to the lookup table of user.
[52:57]
To what they were caught up in. So instead of having to go dot breaches dot Allison, dot breaches dot Bob, dot breaches dot Joe, it's just dot Allison, right? Just take that breaches, which is what we care about, and put it at the top level. So that's the first thing. So let me make sure I understand. Is this our fake group of people, or are you talking about the official export from Have I Been Pwned? No, the fake group of people.
[53:28]
Okay. So our good friends, KW Kelly, MW Kelly. MW Kelly, okay. Yeah. So at the moment, MW Kelly just maps... So it's Breaches, and then it's J.O. Sullivan, E. Green, MW Kelly, A. Hawkins, P. Trainer. But that Breaches is just a waste of our time. And then we have the pastes underneath it, which is completely empty. So top level becomes J. Sullivan, E. Green, and MW Kelly, et cetera, right? Right. Just that's what we care about. Stick that at the top level. Okay. All right. So that's the first transformation, which doesn't involve anything new. That's stuff we've learned before. But what we want to do then to our remaining MW Kelly, et cetera, is instead of having an array of boring strings, we should have an array of dictionaries, where we use the big export from the API to add some more detail into each of our elements in the array. So we're going to map that array of strings to become an array of dictionaries. See where we're going here? And what we want to do is have those strings be replaced with a dictionary indexed by, what do I say, what do I say? I've scrolled too far. Name, title, and data classes. Name, title, and data classes, right? Right.
[54:55]
And we're going to use the map function to do that all in one go. And then I'd like you to add an extra key called your exposure score. Right. Your exposure score is a numeric value representing how pwned you are. And you calculate that by looping through the breaches, starting with zero. For every breach you're caught up in that does not contain passwords, you get an exposure score of one more, and if it does contain passwords, ten more. Oh no. Right, so that's a reduce function, right? That is classic reduce, right? Take our array... There's an or thing going on in there. Yeah, right, it's a more powerful reduce, but we're reducing it. So that gives us a chance to map and reduce. Okay.
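As a hint rather than a full solution, the scoring rule on its own can be sketched with reduce. Everything about the data here is an assumption: the two breach records are invented, and the key names Name and DataClasses are guesses at how the enriched dictionaries might be keyed. The sketch deliberately stops short of the map step and of wiring the score into each user.

```shell
# Hypothetical enriched breaches for one user; key names are assumed.
breaches='[{"Name":"ExampleA","DataClasses":["Email addresses","Passwords"]},{"Name":"ExampleB","DataClasses":["Email addresses"]}]'

# Start at 0; add 10 for a breach that includes passwords, otherwise add 1.
score=$(printf '%s' "$breaches" | jq '
  reduce .[] as $b (0;
    . + (if any($b.DataClasses[]; . == "Passwords") then 10 else 1 end)
  )
')
echo "$score"   # 11
```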
[55:48]
That sounds fun. Yeah. And that is our final challenge. So, like I say, we really have done a lot of JQ. But we have a little epilogue next time for just some cool little bonus features, some of which everyone will like. It'll just be a different subset. I'm going to be curious. This will be like the senioritis thing where the last lecture the professor gives, nobody cares, you know, the final's already been turned in, you know, we're just going to have a party and listen and learn some fun stuff. Pretty much. And it's actually a really good analogy, because oftentimes that last lecture is the professor's pet topic. Like, technically it's not even on the syllabus and I'm not allowed to ask you in the exam, but gosh darn it, I think this is cool. So I am going to tell you all about it in the last lecture. And the only reason you're here is because you know I'm going to give you a hint about the exam paper at the very, very end of the lecture. So you're going to sit through my pet topic all the way through. There is no exam, though, so I'll spare you that.
[56:51]
Right. Well, that's all she wrote. That's all I wrote, anyway. All right. Well, this was fun, Bart. I enjoyed myself. Well, excellent. And until next time, happy computing. If you learn as much from Bart each week as I do, I'd like you to go over to lets-talk.ie and press one of the buttons over there to help support him. He does 98% of the work here. I'm just the stooge that listens to him and asks the dumb questions. If you go over to lets-talk.ie, you can support him on Patreon, you can donate via PayPal, or you can use one of his referral links. I really hope you'll go over and help him out. In the meantime, you can contact me at Podfeet or check out all of the shows we do over at podfeet.com. Thanks for listening.
[57:37]
Music.
