CCATP_2024_05_11

An audio podcast where Bart Busschots is teaching the audience to program. Associated tutorial shownotes are available at https://pbs.bartificer.net.

2021, Allison Sheridan
Chit Chat Across the Pond

Automatic Shownotes

Chapters

Introduction to Programming by Stealth
Exploring Variables in JQ
Unveiling the Challenge Solution
Analyzing Data Enrichment
The Magic of the And Clause
Understanding Variable Usage in JQ
Creating Variables in JavaScript
External Variables in JQ
Variable Binding in JQ
Setting the Stage
Exploring Code Logic
Grouping and Organizing Data
Code Efficiency and Best Practices
Transition to Advanced Concepts

Long Summary

Allison Sheridan introduces Bart Busschots in a new episode of Chit Chat Across the Pond, diving into the intricacies of JQ programming. They kick off with Programming by Stealth number 165, shedding light on the significance of variables within JQ and how they are pivotal in transforming lookup tables despite the unconventional language structure. Together, they revisit a past challenge: writing a JQ script that identifies breached users by matching a search pattern case-insensitively. Delving into the solution, they detail the process of converting dictionaries with to_entries, filtering arrays, and using ascii_downcase for case-insensitive comparisons. Bart emphasizes the importance of the contained explosion approach in JQ coding, offering valuable insights into problem-solving strategies and practical uses of JQ functions.

The conversation progresses as the main speaker delves into the complexities of using external variables and their impact on data transformation in JQ. They highlight the minimal need for variables in JQ due to its design for streamlined data processing through pipes. However, they point out a scenario where variables become essential for reshaping data, stressing the importance of knowing when to effectively employ variables. The distinction between manipulating input and variable binding using the "as" keyword is dissected, underlining the specific syntax peculiar to JQ. Through detailed explanations and examples, the speaker navigates the nuances of syntax and the correct definition of variables in JQ for efficient data transformations.

The main speaker further breaks down a complex example to elucidate the concept of variables in JQ, showcasing how to calculate averages and print multiplication tables while depicting the behavior of variables in the process. They clarify that defining variables does not impede the flow of data, elaborating on the role of the "as" operator. By exploring practical examples like transforming data structures to solve real-world issues, the speaker provides a comprehensive understanding of leveraging variables and various functions within JQ to manipulate and transform data structures effectively.

Shifting focus to transforming data through various techniques, the main speaker emphasizes creating lookup tables and manipulating arrays seamlessly in JQ to avoid overly complex code. They introduce a challenge involving returning dictionaries with detailed breach information and tease advanced topics to be discussed in the upcoming final episode of the series. Encouraging support for Bart through his website and expressing gratitude for listener engagement, the main speaker closes with a reminder of their contact information, wrapping up a rich discussion on mastering JQ programming intricacies.

Brief Summary

Allison Sheridan and Bart Busschots dissect JQ programming in Chit Chat Across the Pond, highlighting the essence of variables in transforming lookup tables and solving challenges. They delve into JQ script creation to identify breached users, emphasizing the importance of ascii_downcase for case-insensitive comparisons and the contained explosion approach. The discussion expands to using external variables strategically, distinguishing between manipulating input and variable binding in JQ syntax. By exploring examples like calculating averages and reshaping data structures, they underline the significance of variables for efficient data transformation. The conversation concludes with insights on creating lookup tables and manipulating arrays in JQ, teasing advanced topics for the next episode and encouraging support for Bart's website.

Tags

Allison Sheridan
Bart Busschots
JQ programming
lookup tables
variables
JQ script creation
breached users
ASCII down case
data transformation
external variables
data reshaping

Transcript

[0:06]
Introduction to Programming by Stealth
[0:07]
Well, it's that time of the week again. It's time for Chit Chat Across the Pond. This is episode number 793 for May 11th, 2024. And I'm your host, Allison Sheridan. This week, our guest is Bart Busschots, back with Programming by Stealth number 165. How are you doing today, Bart? I am doing good. The listeners can't see, but my coffee has ice in it because today is summer.
[0:31]
Tomorrow it will be raining, but today it is summer. Yeah, it was 23 degrees Celsius. Your one day a year. One day for now. It's nearly exam time. I'm hopeful. Usually when the students have to study, I get to bathe in the sunshine. But, you know, fingers crossed.
[0:46]
Anyway, yes, I'm good. All right. You promised us we were going to learn about variables this week, even though we're not supposed to ever need variables in JQ, right? As I believe I literally have a heading, seldom is not never. They shouldn't be the first thing you reach for because you're probably doing it wrong if that's the case. But when you have actually reached for everything else and it doesn't work, they're there for a reason. They are actually genuinely needed sometimes. It's just JQ's an odd language, as we shall discover.
[1:17]
Exploring Variables in JQ
[1:18]
But yes, we absolutely are doing variables, which will allow us to finish our exploration of how to transform lookup tables. Because if you have a lookup table that represents a many-to-many relationship, like, say, users and data breaches, if it's indexed one way and you want it the other way, you actually can't unscramble the first egg and then scramble it again the other way without variables. And that's why we're doing variables now instead of when I had initially planned, which was at the very, very, very end. My intention was to keep variables as the very, very last episode. But we actually can't do lookup tables without them. So this is the perfect point in the series to hang them because we genuinely have a need for them. This is one of those rare cases where you actually need them. There is no other way. So I thought that was nice.
[2:11]
And of course, I'm literally two weeks ahead of the class on this because I'm learning JQ along with you all. So it's not entirely surprising my best laid plans have taken a few detours, which I don't think is a problem. I think that's how we roll here.
[2:25]
Unveiling the Challenge Solution
[2:25]
Um, we're also going to blow the dust off our challenge from, oh, how long ago? We've had a lot of fun with light episodes about laser beams and vacuum cleaners and, what else have we done? A lot of fun stuff. I actually forget, but it was, it was a PBS for a while. Yeah, so much so that we had a listener actually write us to ask if we'd gone on hiatus. Like, no, no, no, we just got distracted. Real life got in the way. You've had family stuff, I had family stuff, we had things going on. But anyway, way, way, way back when, I had set a challenge at the end of PBS 164, which was to create a JQ script that finds all the users caught up in a breach that matches a given search pattern case insensitively. So the idea being, if you put in drop, it will find anything that contained a drop, whether that be Dropbox or, I'm sure there's something else with drop in it. Or if you put in link, it would be anything to do with LinkedIn or Qlink or anything else with a link. And don't worry about capitalization, because goodness only knows how Troy chose to capitalize things if they appear mid-word. I think he's quite fond of so-called Pascal case, which is like camel case, but the front one is capitalized as well.
[3:38]
So anyway, case insensitive and do a pattern match on the search string that's passed in. And that actually requires us to remember, again, blowing the dust off, five things we have learned before, but I'm calling them out very explicitly because I was rusty when I went to write my own sample solution. So if I'm rusty, everyone is probably rusty. Unless you're listening to this four years in the future and you're listening to it all back to back and you're like, hiatus? What hiatus? It was 164 and now it's 165. But in real time... And I get rusty in the two weeks. So if it's six weeks, I haven't got a chance. Yes, precisely. So the things that we have come across before that we're going to refresh in this challenge solution: first, converting a lookup style dictionary into an array of entry style dictionaries with the function to_entries.
[4:29]
We're going to make repeated use of our design pattern of exploding an array, doing some stuff and then gathering the pieces back together by wrapping the whole kit and caboodle in square brackets. So you open a square bracket, you explode the array, then you do a whole bunch of work and then you close off the opening square bracket to collect it all back together. We're also yet again going to make use of the any function within a select so that we filter a thing that contains an array. We want to keep the whole thing, but whether or not we keep it depends on any one of the elements of the array being a thing. So we don't want to explode it. We want to look inside, see if any of them match, and keep it all. So that's why you have a select any. We're also going to make heavy use of ASCII down case to do case insensitive comparisons. If you change everything to lowercase before you do your compare, hey presto, it is now case insensitive. And then we're going to use the minus minus arg command line option to pass a named variable into JQ.
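As a rough illustration of the first two refreshers (to_entries and the contained explosion), here is a minimal sketch. The sample dictionary and array are made up for this note and are not from the episode's data set.

```shell
# to_entries turns a lookup-style dictionary into an array of
# {key, value} entry dictionaries (sample data is hypothetical).
entries=$(echo '{"a":["x"],"b":["y","z"]}' | jq -c 'to_entries')
echo "$entries"

# Contained explosion: open a square bracket, explode the array with .[],
# do some work on each piece, then close the bracket to gather the pieces.
doubled=$(echo '[1,2,3]' | jq -c '[ .[] | . * 2 ]')
echo "$doubled"
```

Without the wrapping square brackets the second command would print three separate values instead of one array, which is why the explosion is described as "contained".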
[5:37]
So with all that in mind, you will find the sample solution in pbs164-challengesolution-basic.jq. And that will find all of the users caught up in the data breach. And I called my variable $breachsearch, because search string for breaches. And we immediately start because in Troy's data set, everything we care about is in a dictionary named breaches. So I immediately start by saying, give me dot breaches, pipe it to to_entries, which means we now have this array of entries. And then we explode and catch.
[6:19]
And the first thing we do when we explode is we pass it to the select any. And then we're interested in going through every single value in the array of. So the data structure is name of breach, sorry, name of person, and then an array of all the breaches that person is caught up in. So if we want to know if, yeah. This is a piece I just wanted to jump in right here. It seems to me like I often, in trying to get to the solution, I have to do the manual labor of going and figuring out what the answer is going to be in order to derive my solution to get that answer. So by that, I mean I have to go in and I have to look for, say, LinkedIn. And then I go through and I look at that breach and I see inside that now it's got these values inside for the breaches. It shows me who it is. It shows me, for example, we're going to be talking about what kind of data sets, data classes are in there. Like, is there passwords in there? And I have to sit there and I have to figure it all out. Well, I've already done all the work.
[7:27]
No, you've done the work once in a very small subset, but you haven't done the work on a giant, big, real-world data set. So you figured out the shape of things and you solved the problem, but you still then have a very useful automation. So I spend a lot of time using JQ with piping stuff to first and last, so that I can see the date, you know, say the fifth element of an array. If I have an array of 5,000 entries and I want to see what shape is each entry, I'll pipe it to first. And then I'll look at it and go, ah, okay, so these are all this shape, and then I'll start to form up my solution. Yeah, I guess what's artificial is I'm looking for one or two, and what you'd really be looking for is the 75 people who got caught up in the breach or whatever. Okay, that makes sense. And of course, your solution doesn't care about the scale of things, right? That's the power of the code. Once you figure out how to solve the problem, you figure out how to solve the problem. So you now have a solution that works for any size of data set, big or small. So once we have exploded our entries, we're then going to check each one to see if anything in the array matches our lowercase version of our search string by using the contains function.
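The habit of peeking at one element to learn the shape of a large array could look something like the sketch below. The export data and the top-level key name `Breaches` are assumptions made for illustration.

```shell
# Hypothetical export: user name -> array of breach names.
export_json='{"Breaches":{"ahawkins":["OnlinerSpambot"],"mwkelly":["Dropbox","LinkedIn"]}}'

# Convert to entries, then look at just the first one to see the shape
# that every other entry will share.
first_entry=$(printf '%s' "$export_json" | jq -c '.Breaches | to_entries | first')
echo "$first_entry"
```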
[8:43]
And contains is one of the many functions in JQ that changes its behavior depending on whether you give it two strings or two arrays or whatever. And we're giving it a string and then we're asking it to compare that to another string, so then it does the sensible thing, which is a substring. In other words, if the name of the breach has as a substring our search string, then we match. In other words, we keep the entire entry, otherwise it all vanishes into nothingness. And so at this stage what we have are only the matching entries for the people caught up in our breach, whatever it is we searched for. But we have the whole entry, right? We have J.O. Sullivan and every breach J.O. Sullivan was caught up in, and M.W. Kelly and every breach M.W. Kelly was caught up in. But all we really wanted was who the people were, right? Who were they that were caught up in any of our breaches? So the only thing we want to keep is their name, which is in the key key. I hate saying that, but you know, the key named key. And so all we need to do then is pipe that to dot key so that we only keep the key, and that's it. So while my sample solution file has 22 lines in it, they're almost all comments, because actually when you pull it apart, it's six lines of code.
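The basic solution as described can be sketched roughly as follows. This is not the published sample solution file; the sample data, the key name `Breaches`, and the variable spelling `$breachSearch` are assumptions made for this note.

```shell
# Hypothetical export in the shape described: user -> array of breach names.
export_json='{"Breaches":{"ahawkins":["Dropbox","OnlinerSpambot"],"josullivan":["OnlinerSpambot"],"mwkelly":["Dropbox","LinkedIn"]}}'

matches=$(printf '%s' "$export_json" | jq -c --arg breachSearch 'drop' '
  .Breaches
  | to_entries
  | [                                   # open the contained explosion
      .[]                               # explode the array of entries
      | select(any(.value[];            # keep an entry if ANY breach name...
          ascii_downcase
          | contains($breachSearch | ascii_downcase)))  # ...has the search string as a substring
      | .key                            # keep only the user name
    ]                                   # gather the pieces back together
')
echo "$matches"   # prints ["ahawkins","mwkelly"]
```

Lowercasing both sides before calling contains is what makes the match case insensitive, so searching for `DROP` or `Drop` gives the same result here.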
[10:01]
So I did comment very heavily because it's been so long since we've done any of this that I really did want to break it down into very small pieces and sort of move through it.
[10:15]
It does help. I really like how we've gotten into the habit of the explode it, do a bunch of stuff and assemble it back together. I'm getting comfortable with that repetitive motion.
[10:27]
Hmm. It is definitely one of the atoms of using JQ is, you know, I think of it as a contained explosion, right? I've wrapped my explosion in the square bracket, so I'm not going to lose any pieces. They're still going to all be there for me when I'm done. And then I can work away inside and know that I have everything when I'm finished. That's in my mind, that's how it is. It's a contained explosion. So that was what we needed to do for our basic credit.
[10:51]
Analyzing Data Enrichment
[10:51]
And that, you know, allows us to search, say, if we search for the phrase online, we can see that Onliner Spambot will be the matching breach, which gives us J.O. Sullivan and A.W. Hawkins. But the bonus credit was to not bother telling us about breaches that didn't include passwords. Let us assume your privacy isn't what's important. I just care about your passwords, which if I was a corporate overlord might well be what I cared about. You know, have you compromised my work email account, as opposed to is your personal privacy compromised, which me the employer doesn't care about. Anyway, leaving aside the cynicism of it all, it's a good exercise in making use of data enrichment. So to a very large extent the solution is going to be the same shape, because you're going to go through all of the entries one by one. And yes, you need to find the breaches that match your search string. But then you have to do an and. And also check if that breach contains passwords.
[11:57]
So it's the same shape, but the bit in the middle where we're doing the checking gets a little bit fatter. Because we have more checking to do. But the big picture shape is the same. And the reason for setting this bonus was to bring a sixth concept into play, which is the newest concept we had, which is this minus minus slurp file command line argument. And that's an argument for saying, go fetch a whole other piece of JSON and stick it in a variable name. So it's not the input and it's not our script. It's extras.
[12:33]
Extra information. Exactly, this I also know, and I'm going to give it this name. And for a genuinely sensible reason, that an external file can contain more than one thing, slurp file always wraps whatever it found in an array. So our external file contains one data structure that's a big dictionary describing every breach Troy Hunt knows about in the Have I Been Pwned service, but hypothetically it could have been five different small dictionaries or something. So the reason minus minus slurp file always wraps it in an array is because it can't know what it's going to find when it goes off to that external file. So it wraps it in an array for you, and if it found multiple things it will have kindly collected them into an array, and if it only found one, well, you have to remember to go name of variable, open square bracket, zero, close square bracket. So I just sort of mentally tune that out. It's like, okay, there only was one thing, so I'm always going to the zeroth entry. So everything we have is going to be in the variable $breachDetails, open square bracket, zero, close square bracket.
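To see the array wrapping for yourself, a small experiment along these lines may help. The file contents and the key name `DataClasses` are assumptions for illustration; the temporary file stands in for the real breach details file.

```shell
# Write a hypothetical one-dictionary details file to a temp location.
details_file=$(mktemp)
printf '%s' '{"Dropbox":{"DataClasses":["Email addresses","Passwords"]}}' > "$details_file"

# --slurpfile always hands the filter an array, even for a single document.
wrapped_type=$(jq -n -r --slurpfile breachDetails "$details_file" '$breachDetails | type')

# The dictionary itself sits at index zero of that array.
first_type=$(jq -n -r --slurpfile breachDetails "$details_file" '$breachDetails[0] | type')

echo "$wrapped_type $first_type"   # prints: array object
rm -f "$details_file"
```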
[13:39]
And right there is where I complete being able to follow you. I have read ahead in your solution and I'm lost in the next piece. So that's where I got stuck. So keep going. Okay. So once I did my slurp file, we now know that that external data structure is a dictionary containing the details of each breach, and the keys in those dictionaries are the names of the breaches. So Dropbox will be a key that then gives you lots of details about the Dropbox breach, and LinkedIn. So to, to... We have this bad delay. I know it sounds like I'm talking over Bart to everybody, but he's not hearing when I start, so that's why I keep talking over him. Um, so breach details square bracket zero square bracket is the first breach, whatever it is, call it... No, it's... No, it's the first person? No, it just means the entire file is in zero, because it just wrapped our file in an array. So it's actually, the dictionary is at zero.
[14:40]
Okay. All right. So if I want to get the Dropbox details, it's open square bracket zero, close square bracket dot Dropbox, or open square brackets Dropbox, close square brackets, both of them work. Okay, so what you wrote next was, so you wrote dollar breach details, open close bracket on zero, and then you wrote open close bracket on dot. What does that mean? Okay, so you're jumping ahead a little. So I will get you there. But right now I'm going to say, put a pin on that for just a second. So before we get to that, in our dictionary, it's important to know what it is that exists to describe each breach. So I ran $breachDetails, open square bracket, zero, and piped that to the keys function to get me a list of what does Troy Hunt know about the different breaches. And so we have date added, all of this stuff, but the one we care about is called data classes. So I needed to figure that out. So this comes back to your thing about how do I figure out the shape of the data? I ran it through keys. Well, I found data classes. Okay. But I don't know how to do the glue in between. That's after. You've got something before data classes. Thank you.
[16:03]
Okay, I will do later. I'll still try to hold. Yeah, because the dot means the thing I'm processing now. So dot only has a meaning within a point in the code. So the question is, what does dot mean at that point in the code? Remember, dot is the thing I'm doing now. But it's not normally in square brackets.
[16:25]
Okay, but the square brackets... Has it been that long? Well, no. No, so it's in square brackets if we want to use it as the key. So if you put something in square brackets, you're saying, give me the dictionary entry with this value. So if you want the dictionary entry for the thing I'm processing now, then you would say the dictionary entry for dot. Dot is, so dot might be Dropbox, so then you're saying square brackets Dropbox.
[16:56]
So dot is the thing I'm processing now. Right, but we, so we don't have anything being piped into here. There's no pipe into here. We're in the middle of, we've ASCII downcased and found things that have breach search, to ASCII downcase. Okay, so we are in, okay, so that line of code is inside our select any. Okay. Yeah. So the section of code starts by piping our list of entries to select any. And then we're using, we're saying for any, loop over every breach in the array of breaches. All right. We're saying dot value, open square bracket, close square bracket. We're exploding the value array, which is our list of breaches. So that means that every time that code from line one to line seven, it's happening once for every breach in the array. So if we're processing the person, let me scroll up to my data set, if we're processing the person M.W. Kelly, that is going to happen one, two, three, four, five times. The first time dot will be Dropbox, the second time dot will be KOMU, the third time dot will be LinkedIn, the fourth time dot will be LinkedIn Scrape, and the fifth time dot will be PDL.
[18:19]
So dot is different each time it tries because it's doing an any. So the first time through, it's Dropbox. So we're saying, go into that file we took from elsewhere. It's wrapped in an array, so just take the first entry in the array. And then we're saying, in that dictionary, give me the key Dropbox. The second time we're going through, we're saying in that dictionary, give me the key, whatever I said was second, KOMU. The third time, give me the key LinkedIn. Okay, so let me see if I can see where you're going. So if the first time through, it's Dropbox, then that's what dot will be. So when we get down to the and, and then we're saying breach details, square bracket zero, that's all the breaches. Dot inside square brackets for some reason is Dropbox. Not just dot, but dot inside square brackets. Right, because we're saying use the thing we have now as a key in the dictionary. Like, okay, okay, zero is how you get... I think I see what you're saying. Yeah, yeah, so we're using it to go into the dictionary. Okay, and then at that point we can say, and then we want to look at data classes within that, so it's dot data classes. Correct, and data classes is an array.
[19:40]
Of password, email address, date of birth. Okay, so it doesn't need to be in square brackets because it's already an array. Right, it's not a variable value. Data classes is an actual specific thing, whereas dot is different every time. You have to put in the square brackets because otherwise... So if we only cared about Dropbox, would it have been.
[20:03]
$breachDetails[0].Dropbox.DataClasses? That would have been perfectly correct syntax, yes. Yes. Okay. But so since we're looking at dot as it's going through, we have to put it in square brackets to tell it this is going to be that next key. Okay. I think I got it. I didn't think you'd be able to get me there. Yay. Okay. So now we have an array that contains things like passwords and email addresses. And we want to see if that array contains the string passwords. So remember I said that the contains function does something different depending on whether you give it arrays or strings. If you give it arrays on both sides, it will return true if the array contains all the elements in the array you pass as an argument. So we are saying contains square bracket, the string passwords, close square bracket. So in other words, if that array of data classes contains an entry passwords, return true. Which is exactly what we want. So that and will now successfully get true if the breach that we're checking contains passwords. So we first check to see our ASCII down case stuff, and we check to see if the enriched data has passwords in it. And only then do we return true. So we have actually just added an and clause to our existing logic.
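Putting the pieces of the bonus credit together, a sketch of the enriched filter might look like this. The sample data, the key names `Breaches` and `DataClasses`, the value `Passwords`, and the variable names are assumptions for this note rather than a copy of the published solution.

```shell
# Hypothetical details file: breach name -> details including DataClasses.
details_file=$(mktemp)
cat > "$details_file" <<'JSON'
{
  "Dropbox": {"DataClasses": ["Email addresses", "Passwords"]},
  "OnlinerSpambot": {"DataClasses": ["Email addresses"]}
}
JSON

# Hypothetical export: user -> array of breach names.
export_json='{"Breaches":{"ahawkins":["OnlinerSpambot"],"mwkelly":["Dropbox","OnlinerSpambot"]}}'

with_passwords=$(printf '%s' "$export_json" | jq -c \
  --arg breachSearch 'DROP' \
  --slurpfile breachDetails "$details_file" '
  .Breaches
  | to_entries
  | [ .[]
      | select(any(.value[];
          # inside any, dot is one breach name, e.g. "Dropbox"
          (ascii_downcase | contains($breachSearch | ascii_downcase))
          # [.] uses that breach name as the key into the details dictionary
          and ($breachDetails[0][.].DataClasses | contains(["Passwords"]))
        ))
      | .key ]
')
echo "$with_passwords"   # prints ["mwkelly"]
rm -f "$details_file"
```

The parentheses around each side of the `and` matter because the pipe binds more loosely than `and`. This sketch also assumes every breach name in the export has an entry in the details file.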
[21:27]
The Magic of the And Clause
[21:27]
And that is really the magic key is that and clause and the fact that we were able to pull the and and this right. And now, you know, there's extra information. That's also a giant big step, of course, with the minus minus slurp file. So that is our homework. And it was, I'll be honest, it was a heavy lift for me because I had not been at it for six weeks. So I can only imagine, actually, I can imagine that the part of our audience who do their homework straight after listening to the episode were fine. Because it was all fresh in their minds. And then the part of our audience who are like us and who left it to the last minute probably struggled as much as we did. That would be my guess. Right, let us now move on. Entirely new topic. How does JQ think about variables?
[22:12]
Understanding Variable Usage in JQ
[22:13]
So the documentation goes out of its way to say that JQ has been designed to minimize your need for variables. And if you're wondering, well, why? What makes it different? The key is that JQ is designed to be like the terminal. It's full of plumbing. You can make variables in the terminal, but you actually don't do it very often, right? How often do you make a variable on the.
[22:40]
Not very. Not at all. I do it from time to time because it saves me some typing sometimes, but most of the time you don't. And it's because JQ works in a very similar way. It's also why it's not needed here. So inputs and outputs automatically get connected together whenever you use your pipe symbol. So immediately, a lot of places you'd use a variable to store an intermediate state that you need to use later. But this is designed for data to flow from one step to the next. So you don't really need to store data to pass it on to future steps. It goes there automatically, magically. So that's one reason variables don't happen very often. The other thing is that within a bit of pipe, within a filter to use the correct term, every function within the filter operates on the same input. So they effectively share that. So when you ask for, say, length and you ask for add, both of those are getting the length of the entire input and adding the entire input. So they have effectively shared a value without needing a variable because they're both working on the input.
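A tiny demonstration of two functions sharing the same input inside one filter, using a made-up array and made-up output key names:

```shell
# Both length and add receive the whole input array; no variable is needed
# to hand the array to each of them.
shared=$(echo '[1, 2, 3]' | jq -c '{count: length, total: add}')
echo "$shared"   # prints {"count":3,"total":6}
```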
[23:49]
So just with those two facts, that actually mostly covers why you generally don't need variables and other things. And the documentation puts it like this: there's generally a cleaner way to solve most problems in JQ than defining variables. So they don't say always, but they do say,
[24:10]
Creating Variables in JavaScript
[24:11]
stop and think. So my general approach is just to sort of go, do I need to? Maybe I do. And then I'll go on. So I would like to illustrate the point by doing something very simple in a traditional programming language, and I thought I'd blow some dust off our JavaScript knowledge, and I would write in JavaScript, a little command line JavaScript, um, a little JavaScript file that will average an array of numbers that are piped to it via standard in in JSON format. Now you don't need to know the hows and the whys, but when you use the Node.js JavaScript engine, you can run it on the command line and it can behave like a bash script. You can pipe things to it because it understands standard in. Now, when we're working on the browser, that doesn't make any sense. So we're not used to dealing with standard in. But on Node.js on the command line, you can access standard in. And the way you do it is using something called the file system module or FS, which has a function called readFileSync.
[25:17]
To be honest, it doesn't really matter the details. I've put it in techie terms in the show notes, but honestly, const myNumArray equals json.parse the thing they piped into the script. So basically, NumArray is just the numbers they piped to the script. That's all that matters. So we're storing an array of numbers that get piped into the script. So now we want to get the average of those numbers. Well, to get an average, you add them all up and divide by how many there were. So we're going to need a variable to hold a running total. Then we're going to have to have a loop that loops over the array of variables and adds them to the total. So for const num of numArray total plus equals num. And then we can do a console.log total slash numArray.length and that will give us our average.
[26:09]
And we can see this in action by saying echo the string square bracket one comma two comma three close square bracket close string pipe node name of file dot js and that will send that string of json into that javascript function it will do the average and it will show you that it is in fact two great so we have now written in javascript the average for any array and it works but we needed two variables arguably we needed three variables because constant num of array is technically a variable definition as well. But even if we're going to be a little bit less picky about it, we had to store the array and we had to store the accumulator. So we needed two variables, absolute minimum.
[26:53]
You might say, well, Bart, it's actually possible to combine some of those steps, if you take it all and put it right into the for loop definition. So you can have the world's ugliest for loop. So you can say let total equal zero, then for const num of json.parse require fs readFileSync standard in in utf-8 please. Great, so we've now sucked the array in without ever saving it, right, the array has not been saved, it is entirely anonymous, no variable. We've looped over it, great. We have the total. How do we get the average? The array has no name. We never saved it. How many were there? Oh.
[27:37]
So our only solution is to make another variable to count the elements as we process them. So we have swapped saving the array to a variable for saving the counter to a variable. But you know something? No matter what way you try to fit this carpet into this room, it's going to pop up somewhere. You need variables to do this in JavaScript. There is no way to avoid it. We do the same thing in JQ, as I put in the show notes. Look, mom, no variables. Only we're not going to crash into anything off on our bicycle.
[28:09]
You simply say add slash length. That is the full JQ for doing an average. The add function takes all of the inputs and adds them together. The length function takes all of the inputs and tells you how many there were. Sum divided by number. That is an average. So it can remember two things at the same time. They're both processing the same input, which is the array, right? So yeah, they're effectively saving it for free. You haven't had to make a variable. It is the input. JQ has saved it for you, and you're just both accessing it. And so if you echo the same array to the JQ version, it will again say two, because of course it's two, but no variables. So that's why most of the time... Also, look how much simpler that code is, right? No loops or anything. You just say, add them all up, divide it by how many of them there were. Add slash length. It is an elegantly beautiful language for processing JSON data. You see why I like it, right?
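Typed out, the variable-free average described here is a one-liner, shown with the same 1, 2, 3 array used for the JavaScript version:

```shell
# add sums the input array, length counts it; dividing gives the average.
average=$(echo '[1, 2, 3]' | jq 'add / length')
echo "$average"   # prints 2
```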
[29:11]
So most of the time we don't need variables. But we have already seen the use of a kind of variable that I've told you is different because the variables are defined outside of the JQ. They're not in our filter. They're used by our filter. They're actually sitting out in the terminal and we pipe them in using minus minus arg or minus minus arg JSON or minus minus slurp file. So we have given them into the JQ filter from outside. And it's very useful to be able to pass something from outside to inside, right? We use it to make our scripts way more usable because instead of hard coding in one script to search for Dropbox, a different script to search for LinkedIn, you just write a variable name in your script and you use the minus minus arg to tell it a different thing to look for each time. So it's a way of making your script reusable to have external variables be able to be pushed in. And the minus minus slurp file let us know extra thing. So it allowed us to know information that wasn't our input or wasn't our script. And so that's useful. But we didn't make those variables in JQ. We gave JQ access to variables from elsewhere.
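As a small sketch of external variables being handed in from the terminal, the commands below pass the same kind of value with minus minus arg and minus minus arg JSON. The variable names are made up for this note.

```shell
# --arg always passes the value in as a string.
message=$(jq -n -r --arg breachName 'Dropbox' '"Searching for " + $breachName')
echo "$message"   # prints: Searching for Dropbox

# The same digits arrive as a string via --arg but as a number via --argjson.
arg_type=$(jq -n -r --arg limit 42 '$limit | type')
argjson_type=$(jq -n -r --argjson limit 42 '$limit | type')
echo "$arg_type $argjson_type"   # prints: string number
```

Changing the value after `--arg` is all it takes to reuse the same filter for a different search, which is the reusability point being made.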
[30:23]
External Variables in JQ
[30:23]
That's why I call them external variables, right? It's an important but a subtle difference.
[30:29]
Yeah, I thought you were using, it was kind of a technicality there for a while, because I was like, you are so using variables, but they were external variables, so we will allow it. Yeah, they're generics is sort of the way to think about it, right? They're a generic value that's been passed in from outside. It's an X. You haven't made it be something. Someone gave it to you. Whereas making your own variable, you give it a value inside your code, which is a different thing.
[30:54]
Now, rarely is not never. So as I've said, there is no way for us to take the default shape of a Have I Been Pwned export, which is every breach a user is involved in, and turn it around without a variable. So it's a dictionary where the users are the key. And so Bob is in these three breaches and Alice is in these three breaches and Bart is in these 20 breaches because he's been on the internet for a long time. Right. And that is great when the question to be answered is, what do I have to worry about, right? I have never looked at my breach report before. This is the first time I've seen it. Where am I? And the first time you use Have I Been Pwned, that is the question you want answered. But once you have a subscription, you actually have a very different question. Because what will happen is you get an email notification: five email addresses on your domain have been caught up in this breach we added yesterday. Well, now the question you have isn't, what breach is Bob in? Now the question you have is, who was caught up in that breach that happened yesterday?
[31:59]
Which means that you actually want people indexed by breach name. So you want the opposite shape. You don't want names with an array of breaches. You want breaches with arrays of names. And so we want to transform the lookup we get by default into the lookup the other way around. And that transformation cannot be done without variables. Because in order to transform it, you have to remember something after you explode it.
[32:29]
And that's what I'm always saying: don't explode it if you don't recollect it, because you've forgotten the pieces. Well, in this case we have to remember a piece and we have to explode, and the only way to do that is a variable. Remember it, blow it up, but you've still got it. Right, right. Now that makes sense. So rarely is not never, but I do always stop and think, do I really need a variable? And then I go on and use a variable. But I do actually genuinely stop and think every time, am I doing it right? And then I move on. The key word for making a variable is as. So the operator we use is the as operator. And this is where we get back to the fact that the JQ designers really didn't want you to make variables very often. Because in most languages, when I talk about the assignment operator, I mean the thing that makes variables. But JQ has assignment operators that don't make variables. They manipulate the input. So we have used equals, pipe equals, plus equals and minus equals to manipulate the input, right? So we take a dictionary which has three keys and then we say plus equals and now it has four keys, but it's the input we're manipulating. We're not making variables, we're manipulating our input. And those are the assignment operators.
[33:58]
So what does JQ call making a variable? It calls it variable binding. This is actually a deep computer science technical term. So we have the symbolic binding operator as.
[34:13]
Variable Binding in JQ
[34:13]
So that is how you make variables. So as is the keyword that makes a variable.
[34:19]
So like saying is equal to, this is just the word as? Correct. Correct. I'm drawing attention to what he just said, because in speaking it, it's difficult to hear it. It's almost like you want air quotes around the word as. And in the show notes, Bart is always very careful to use the code syntax whenever it is a code word like as. But being able to see that something is in courier font, you know, monospace font for just two letters, it's kind of hard to tell in the show notes. So you have to look closely. Is he just using the word as, or is it as? Yeah. It's an interestingly named operator. It does actually end up in code that looks quite nice, because it's something that does a calculation as dollar variable name. So expression as dollar variable name is how you write it. Yeah, I'm not complaining. It's just hard to see sometimes. It's like, I want syntax highlighting the show notes that as would be in red or blue or something. I think it depends on your markdown viewer, because when we publish the show notes, it's in a different color and in a little box. And in my editor of choice, it's actually in a gray box. So I think maybe we have a little complaint against Visual Studio Code, isn't it?
[35:36]
Uh, no, actually I wasn't using Visual Studio Code right now. I have been using, uh, Marked just to proofread the show notes. Oh, that's interesting. So Marked's theme is a bit lacking there. Well, that theme. Don't, don't complain about Marked. I'm in the Swiss theme. I'll go through and try different themes and keep jumping in and interrupting you to tell you what it looks like now. My memory of Marked is I quite like the Swiss theme because it's quite minimalist and I like minimalist. That may be too minimalist. Anyway, yeah, actually I just changed it to the GitHub theme and now I can see it in little boxes. So yes, change your Marked theme or Markdown viewer to be whatever you need to be able to see as. Yes. Anyway, the syntax is something to make a value, as, dollar sign variable name, which is a good reminder that in JQ all variable names must start with a dollar sign. We don't find that all that stressful because we've just done the Bash language, where that is also the case. But it's something to make a value as $variableName.
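To make the difference between assignment and binding concrete, here is a small sketch assuming jq is installed. The input and the variable name saved are purely illustrative:

```shell
# An assignment operator manipulates the input itself: the output gains a key.
assigned=$(echo '{"a": 1}' | jq -c '.b = 2')

# The as operator binds a variable and leaves the input untouched.
bound=$(echo '{"a": 1}' | jq -c '.a as $saved | .')

echo "$assigned"
echo "$bound"
```

The first command prints the dictionary with two keys, the second prints the original dictionary with one.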
[36:47]
Setting the Stage
[36:47]
So, as a somewhat contrived example to keep things nice and simple, let us say that we don't just want to average two numbers. We want to return a nice English message that says the average of those four numbers was 7 or whatever. Well, now we have a problem that we have to remember how many numbers we had after we've done our math, which has turned all of our input into just a single number. And we can't do that without a variable. So the other thing I need to say very clearly here is that when you do an as, it is a filter in your chain that behaves like a pass-through.
[37:34]
So the input to the as filter and the output from the as filter are the same. So a bit like when we did debug, it didn't change the input. It just had a side effect of printing to the screen whatever the current value of dot is. In this case, the side effect is we make this variable come into existence, but we don't actually change what's flowing through the pipeline. So you pop it into the pipe at any point and it doesn't clog up the pipe. Right, the input flows clean through. Right, it's an important point. You're adding a variable, but that's a whole different thing, right? The actual flow of data has been unobstructed. Whatever came in is what comes out, and it just flows neatly through. So that's an important point to say. So if we want to write this fancy pants average function, the very first thing we actually need to do is save how many numbers we got. So the very first thing in our filter chain is now length as $numNumbers.
[38:40]
So length is something that will make a value. It's going to take the input and tell us how many there were, as $numNumbers. So in this case, when we call our function with the numbers 1, 2, 3, length will be 3, so $numNumbers will be 3. The input is our full array. The output is our full array. So when we arrive at the next step, which pipes that to add slash length, it's still the full array, right? Doing the length as $numNumbers has not clogged up the pipe.
[39:13]
Can I ask you why you didn't write it as add divided by $numNumbers? I tested it and it worked. It absolutely works just as well. It is completely equivalent, yes. Okay, but you could have done add divided by length without needing a variable. You really don't need the variable until the next step. Correct, correct. Okay. Because you're absolutely right. Whether you get the average by dividing by the variable you saved or dividing by another call to length, they are the same. The output of that filter is now a number, a single number. Everything else is gone. That is what has fallen out of that part of the pipe. So, when we then come to make our string, the average of those blah numbers is blah, dot contains the average, but the only way we can write in how many numbers we had originally is to use our variable. Because otherwise we would have lost it. So therefore it works. We say the average of those $numNumbers numbers is dot. Dot being the current value. And when you pipe in the array 1, 2, 3, you get back the average of those 3 numbers is 2.
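A sketch of the average-with-a-message filter as described, assuming jq is installed. The exact wording of the message is a guess at what the show notes use:

```shell
# Save the count first, then do the math, then build the message.
# The as step passes the array through untouched, so add / length still sees it.
message=$(echo '[1, 2, 3]' | jq -r '
  length as $numNumbers
  | add / length
  | "The average of those \($numNumbers) numbers is \(.)"
')
echo "$message"
```

By the final step dot is just the single number 2, so $numNumbers is the only place the original count survives.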
[40:27]
Nice. It's a somewhat contrived example, but the point is it shows... I like that kind of example where it's real simple and you can see exactly why it works and why it wouldn't have worked without it. And it proves it doesn't block the pipe, because the very first thing we do is our variable assignment, and we have not stopped the flow of the array, right? The array that we passed in has flown perfectly through to the next step where we added it and divided it by its length. So that is the key point. Defining a variable doesn't block the pipe. Now, this leads us to a very interesting question. Length returns one value, the length. But it's perfectly valid to say explode an array as some variable name. What does jq do if the expression before the as makes many values? It effectively does the same thing as exploding an array. It loops over all the possibilities. Only it doesn't change the input. The same input gets executed once for every value of the variable, but what changes is the value of the variable.
[41:45]
So you effectively clone. So not only does it not block the pipe, it can clone the pipe. So if you make the variable have five values, the input comes out five times and the variable has five values.
[42:00]
And it's not going through the pipe five times, it's actually five parallel paths, essentially? It will become five parallel paths. Yeah, just like when you explode an array, everything else in the filter chain happens that amount of times. Well, the same thing happens after you do an as, where you have multiple variables. The whole thing just happens multiple times. It is effectively a loop.
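As a small illustration of that cloning, assuming jq is installed, binding an expression that produces three values runs the rest of the chain three times against the same input. The input string and the name n are illustrative only:

```shell
# (1, 2, 3) produces three values, so everything after the as runs three times.
# Dot is the same input each time. Only $n changes.
clones=$(echo '"same input"' | jq -c '(1, 2, 3) as $n | [., $n]')
echo "$clones"
```

This prints three arrays, each pairing the unchanged input with a different value of $n.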
[42:21]
It really is effectively a loop. Which means that we can play with it a bit to demonstrate this. Which gives me an excuse to bring up another one of those wonderful little functions JQ gives you for free. It has a function named range, which produces a range of numbers, and you can use it in one, two and three argument forms. In its one argument form, if you say range 4, it will give you 0, 1, 2, 3. Of course it will. I know, it's out by one. So it's from 0 to n minus 1. If you give it n and m, so if you give it two arguments, if you say 0 and 5, it will give you 0, 1, 2, 3, 4. It will stop at m minus 1. Of course it will. But you get to start it wherever you like.
[43:13]
Or you can do the three argument version: a start, a finish that isn't really a finish, and a step size, which means it will now go from n up towards m in steps of s, again stopping before it reaches m. So if you say range 5, 25, 5, you will get 5, 10, 15, 20, not 25. That's because computer scientists hate us? That is because computer scientists hate us, yeah. But anyway, it does work, which means we can use our range function to play with looping with the as operator. Because we can just say, give me a bunch of numbers. It's effectively a for loop. What we've got here is effectively a for loop, right?
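The three forms of range can be checked with a quick sketch like this, assuming jq is installed. The -n flag tells jq not to read any input, and wrapping each call in square brackets collects the numbers into one array:

```shell
one_arg=$(jq -nc '[range(4)]')            # 0 up to, but not including, 4
two_args=$(jq -nc '[range(0; 5)]')        # start at 0, stop before 5
three_args=$(jq -nc '[range(5; 25; 5)]')  # start at 5, step by 5, stop before 25

echo "$one_arg"
echo "$two_args"
echo "$three_args"
```

Note that jq separates function arguments with semicolons, not commas.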
[43:58]
Sorry, I accidentally hit my keyboard while I had some text selected and I jumped to goodness only knows where in my show notes and had to do a command Z or I'd have made a complete mess. I hate when that happens. The joys of look mom, no variables, keep scrolling, as operator. Here we are. So I was trying to think of an example we could use to really illustrate that it's behaving like a for loop. So I sort of fell back on first year computer programming, where the first thing you learn to do when you learn about loops is to print out the times tables. So we're going to do multiplication tables in JQ. So we're going to print out basically whatever table you pipe in. So if you pipe in five, you get the five times table. And so our code is simply, we say range 1, 11, which is 1 to 10, as $n.
[44:51]
Then we pipe that to... So $n is one the first time through, two the second time through? Yes, and so on. Okay. So our input is the number we're going to pipe into our little script for printing out the table. So it's going to be five for the five times table, six for the six times table, right? That's our input, and that input is going to go through unchanged. And then everything else after that as is going to happen 10 times, and dot is going to be the same, but $n is going to be different.
[45:30]
So the string, whatever we passed it, followed by the character x, followed by the value of $n, followed by the character equals, followed by the value of dot multiplied by $n. In other words, the actual math. And we're just going to have that as our, that is our final step in this very short chain, right? Two elements in the chain. Save me a variable, going to have 10 values, and then print this string. And so if we echo 5 into that script, we get 5x1 equals 5, 5x2 equals 10, 5x3 equals 15. So you can see that dot was 5 all the way through, right? The value that was passed in the current value was the same every time. What changed was the variable. Okay. So our input was cloned.
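A sketch of the two-step times table filter as described, assuming jq is installed. It is written inline here rather than in a script file, and the spacing inside the output string is a guess:

```shell
# range(1; 11) gives 1 to 10, so the string step runs ten times.
# Dot stays 5 throughout. Only $n changes.
# -r asks for raw output, so the strings are printed without quotation marks.
table=$(echo 5 | jq -r 'range(1; 11) as $n | "\(.) x \($n) = \(. * $n)"')
echo "$table"
```

The output is ten lines, starting with 5 x 1 = 5 and ending with 5 x 10 = 50.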
[46:25]
I know that's a silly trivial example, but it's really clear to me the way it's presented. I like that. Now, you can catch the pieces just like you can with anything else. So we could wrap all of this in square brackets so that instead of printing out a whole bunch of strings we get an array of strings. So we could just wrap the whole kit and caboodle in square brackets, and now our thing will produce an array of the table. I have a question. So the original code, and I'm not going to read it, before you wrapped it in square brackets, said pipe that to quote, and it had the code that would say 5 times 1 equals 5, unquote. But your output does not show those as strings in quotes. But when you did it going into square brackets to make it an array, it is in quotes. Why is that? Correct. That is because what I forgot to say in English but did type in my show notes is that when we called JQ we gave it the minus r flag for raw.
[47:29]
And I actually have a comment in the show notes that says minus r for raw output, i.e. strings not wrapped in quotes. So we say echo 5, pipe, jq minus f, name of the file, minus r. Okay, but why did you do one of them with minus r and one not? Okay, so minus r means I just want a list of strings, and it looks silly on the terminal for a list of strings to be wrapped in quotation marks. But when I wanted it to give me a JSON array as the output, well, of course, the strings have to be quoted or it's not valid JSON. Okay. When it's an array, it doesn't make sense for them not to be quoted strings.
[48:08]
It's nonsense for them not to be quoted strings. So I didn't use a minus r when I wanted it as an array. Okay, good. Yeah, no, good question. And I actually made a point of putting a comment in so I wouldn't forget to say that. And then I forgot to say that. So there you go.
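A sketch of the array version, assuming jq is installed. The only changes from the raw version are the square brackets around the whole filter and the absence of -r:

```shell
# Wrapping the whole filter in square brackets catches all ten strings in one array.
# No -r this time, because the output needs to stay valid JSON.
table_array=$(echo 5 | jq -c '[range(1; 11) as $n | "\(.) x \($n) = \(. * $n)"]')
echo "$table_array"
```

The -c flag is only there to print the array compactly on one line.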
[48:23]
Exploring Code Logic
[48:24]
And of course, there's no reason you can't embed this in other stuff. So you could update your script so that instead of taking a single number as the input, it could take an array of numbers and give you many tables. And then you could explode that array and then make your variable. So basically, you can nest these things as much as you like. So the final version is just to show you that you can contain the looping, just like you contain explosions, by wrapping it in square brackets. Our final version of this silly script takes as its input an array of numbers. It then explodes and catches the array of numbers. And then inside that explode and catch, it then catches the range, which is effectively exploding the variable $n, if you want to think of it that way. And so now we end up with an array of arrays that are our tables. So if we give it 1, 2, 3, we get the 1 times table as one array, the 2 times table as another array, the 3 times table as another array, and all of that is wrapped in an outer array. The only reason I put that example in is to show you that these variables behave just like exploded arrays. You can contain them. If you put them as a little child of something, it's contained. So that's the only reason for that.
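The nested version might look something like this sketch, assuming jq is installed. The exact layout of the filter in the show notes may differ:

```shell
# Outer brackets catch the exploded input array.
# Inner brackets catch the ten strings produced for each number.
tables=$(echo '[1, 2, 3]' | jq -c '
  [ .[] | [ range(1; 11) as $n | "\(.) x \($n) = \(. * $n)" ] ]
')
echo "$tables"
```

Inside the inner brackets dot is a single number from the input array, so each inner array is one complete table.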
[49:50]
Yeah, so I'm hoping that makes sense. Yeah, it does. Excellent. Right, well, that is actually all there is to know. But in order to understand why we care, I have two very practical examples. And the first lets us close the loop on our lookup tables. So we're going right back to the original problem that me with my real world hat needed to solve, that I got so frustrated with, because I knew just enough JQ not to be able to solve the problem but to know that it was the right tool. I literally knew it was the right way to fix this problem and I couldn't do it, and that's why this series existed. The whole reason we started on JQ was because in real life I ran into this. I said, no, we're going to solve it. And this is ultimately me scratching my own itch completely. So imagine you have a domain with thousands and thousands and thousands of users and you get an email from Have I Been Pwned that says there was a giant big breach discovered last week and 300 of your users were caught up in it, and it's your job to email all of those users and let them know and make them change their password. I needed to programmatically find who those 300 people were, and the export I got from Have I Been Pwned is, as we described earlier, the wrong shape. So we need to change its shape so that it's by breach, not by human being.
[51:17]
So to do that, we have quite a solid chunk of code here. And in my show notes, I show you the code and then we run it. And then I explain the code. So in reading it out, I'm not going to read out the code and then break it down. So let's jump to the breakdown. So actually, that's how I read the show notes, exactly like that. I looked and I went, uh, where does he break it down? Yeah. So the first thing we need to do is remind ourselves of the shape of the data. So it's a top level dictionary with a key named breaches, which is itself a lookup table of username to arrays of breaches. So mwkelly to the array Dropbox, KayoMoe, LinkedIn, LinkedInScrape and PDL. That's the important thing. So everything we care about is in dot breaches, and if we transform it, the desired result of all of our work is a new lookup table where the keys are the names of the breaches. So we're going to focus on Dropbox, and we know we have it right when what we get back is the key Dropbox with the array egreen and mwkelly, right? Because those are our two users caught up in the Dropbox breach.
[52:37]
So I'm hoping you can see the before and the after. Before it was J.O. Sullivan is an online spammer, MW Kelly's and all those things. After it's name of breach, who are the people? Name of breach, who are the people? Name of breach, who are the people? So that's the problem to be solved. And to understand that big chunk of code above, I'm going to zoom in. I'm going to zoom in on Dropbox, which means I'm going to zoom in on Egreen and MW Kelly. We're going to watch Egreen and MW Kelly become Dropbox.
[53:11]
Okay. So we're going to focus in on those bits of our example data to see what each line of code is doing. So the first thing in our big chain is that we take dot breaches and we pipe it to the to_entries function, which is going to transform that lookup table into an array of little dictionaries, each with the key key and the key value. And in this case, what that means is that we get open curly bracket, key, egreen, value the array with one string, Dropbox. Then key, mwkelly, value the array with the strings Dropbox, KayoMoe, LinkedIn, LinkedInScrape, and PDL.
[53:56]
So we've taken our lookup table and broken into this list of entries, as JQ calls them.
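To see what that first step produces, here is a sketch assuming jq is installed. The sample data is a hypothetical cut-down stand-in for the real export, and the capitalised key name Breaches is an assumption:

```shell
# Hypothetical sample data with just the two users being followed.
data='{"Breaches": {"egreen": ["Dropbox"], "mwkelly": ["Dropbox", "KayoMoe", "LinkedIn", "LinkedInScrape", "PDL"]}}'

# to_entries turns the lookup table into an array of {key, value} dictionaries.
entries=$(echo "$data" | jq -c '.Breaches | to_entries')
echo "$entries"
```

Each user becomes one entry, with the username under key and the array of breach names under value.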
[54:03]
Now, in order to get to the right answer, we have two problems to solve here. We have to make individual entries for every unique mapping of person to breach. So we need to go from having mwkelly represented by one entry to mwkelly represented by five entries. mwkelly Dropbox, mwkelly KayoMoe, mwkelly LinkedIn, mwkelly LinkedInScrape, mwkelly PDL. So instead of that being an array, they need to be all separate entries.
[54:40]
And we have to flip the key and the value. That flip is easy, right? Once we have it broken apart into little pieces, flipping which is the key and which is the value is easy. You just flip them around. The hard part is, how do we break it up? How do we turn one entry into five entries? That's actually where we end up needing our variable. So the before is that key mwkelly, value the array. The after is key Dropbox value mwkelly, key KayoMoe value mwkelly, key LinkedIn value mwkelly, right? You get the idea. So again, I've shown that in the notes. So that's what we're trying to achieve. So how do we do that in JQ syntax? Well, we're going to do our old trick of explode and catch. So around it all are our square brackets. And the first thing we do is we explode.
[55:33]
So we now have, as the current item being processed, one of our not yet broken up entries as dot. So the first thing we have to do is save that username. Save mwkelly. So the first thing we do is say dot key as dollar username. We have now held on to the fact that we are about to explode all the breaches mwkelly was caught up in. But we've kept the fact that it's mwkelly's explosion.
[56:03]
Held on to that. So now we take dot value and explode it. And then we pipe the exploded parts of the value and we construct a new dictionary.
[56:14]
So dot value at this point is the five breaches. Correct. And so we explode that. So now five times we're going to make a new object. So we explode it and then we pipe it to open curly, close curly. So we're making a new dictionary five times. The key is going to be the current value. In other words, the key is going to be Dropbox, then the other one, then the other one, then the other one. Okay. The value each time, well, whose breaches are these? They're mwkelly's breaches. How do we know? Because we remembered. We kept it. We exploded, so we've lost it, only we haven't lost it, because we kept it. So we just pop in mwkelly every time. So now we have five little dictionaries that say breach name mwkelly, breach name mwkelly, breach name mwkelly, which gives us the output we want. When we start at the very top, the thing we've exploded is all of the breaches, correct? So the very first explosion is... Explode and recollect the entries. Yes. Yeah, so we're exploding each individual entry, isn't it? Yes, yes, yes, yes. Okay, so what's coming through is a breach, right? And you grab the username for... Wait, have I got it backwards again? Okay, so what's coming through is...
[57:40]
What's coming through is a key with a username and a value, which is their breaches. But it's a whole bunch of those. So each time it comes through, it's got one key and however many values. And then we're going to swap them around and then it goes through it again. Or the next one comes through the pipe. Exactly. Exactly. Exactly.
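The explode-and-remember step can be sketched like this, assuming jq is installed. The sample data and the capitalised key name Breaches are hypothetical, and the filter is a reconstruction from the spoken description, not a copy of the show notes:

```shell
# Hypothetical sample data, trimmed to keep the output short.
data='{"Breaches": {"egreen": ["Dropbox"], "mwkelly": ["Dropbox", "LinkedIn"]}}'

# For each entry: remember the username, explode the breaches,
# then build one flipped entry per breach using the saved name.
flipped=$(echo "$data" | jq -c '
  .Breaches
  | to_entries
  | [ .[] | .key as $username | .value[] | {key: ., value: $username} ]
')
echo "$flipped"
```

After .value[] dot is just a breach name, so $username is the only way to still know whose breach it is.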
[58:02]
Right. Because there's two explosions here, right? Because dot value gets exploded. Right. And so when we do that, we now have our entries in the right shape for our final answer. But we now basically have key Dropbox value egreen, key Dropbox value mwkelly. So we now have to recollect those to make a new array, to collect them back the other way. And so we learned in the previous installment six weeks ago of the group_by function. And so group_by takes an array of dictionaries and then it makes an array of arrays of those dictionaries, and it puts them together based on some property you tell it of how it should put them together. And we're going to say put them together when they have the same key. So we're going to say group_by dot key. And so now we have an array that contains... And dot key now is the breach name. Exactly. Exactly. Okay. So, now we're going to have a big parent array that contains a child array for all of the entries for Dropbox and a different child array for all of the entries for LinkedIn and a different child array for all of the entries for whatever other breaches there were. So, we now have arrays of these little entries.
[59:19]
And then the last thing we do is we take all of that and we effectively collapse
[59:28]
Grouping and Organizing Data
[59:23]
that down into our little array that we want, where there's only one entry for Dropbox. So we need to take an array of entries and turn it into one entry that represents that whole array. And the key is exactly the same every time, right? It's Dropbox to one person, Dropbox to the other person, Dropbox to the other person. So we just need any one of them. So I chose to take the zeroth key as my key for my collected object. So I just basically say we explode it out and then we say the key becomes dot zero dot key. So fine, we just take the zeroth key. The value needs to become a fresh array. So what we're going to do is we're going to explode our entire array of entries, and then each time keep only the value, which is the username. So we're making an array that contains only the usernames. So our key has become the key from the first entry, and the value has become the values from all the entries. That is our collapsed answer. That is actually everything we need. We now have the right key and the right value. So we send that to from_entries, and it makes a lookup table.
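Putting the whole chain together, a sketch of the reshaping might look like this, assuming jq is installed. The sample data and the key name Breaches are hypothetical, and the filter is reconstructed from the spoken walkthrough, so the code in the show notes may be laid out differently:

```shell
# Hypothetical sample data: users as keys, arrays of breach names as values.
data='{"Breaches": {"egreen": ["Dropbox"], "mwkelly": ["Dropbox", "LinkedIn"]}}'

by_breach=$(echo "$data" | jq -c '
  .Breaches
  | to_entries
  # one flipped entry per user-breach pair, remembering the user in a variable
  | [ .[] | .key as $username | .value[] | {key: ., value: $username} ]
  # gather the entries that share a breach name
  | group_by(.key)
  # collapse each group: the zeroth key, plus every value in the group
  | [ .[] | {key: .[0].key, value: [ .[] | .value ]} ]
  # turn the entries back into a lookup table
  | from_entries
')
echo "$by_breach"
```

The result is keyed by breach name, with each value being the array of users caught up in that breach.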
[1:00:45]
I feel like I'm going to do a comma. When you said you chose the zeroth key, why? Okay, so we have an array of entries where entry one says key Dropbox, sorry, yeah, key Dropbox value mwkelly. Entry two in the array says key Dropbox value egreen. Oh, oh, so any one of those could have been the key, but you don't know how many there are, but there will be a zero. Okay. Bing, bing, bing, exactly. There's always a first one, right? Okay. So that one I can know is there. And then, yeah, so that is why we take zero dot key. Okay, okay.
[1:01:30]
We have been recording for longer than an hour because I have gone out of do not disturb. Yeah. Oh, that's what happened. Okay. Right. So that is our first practical example, which solves our problem of changing the shape of any lookup table now. But I want to take us right back to our Nobel Prizes, because you've asked me a few times, wouldn't it be very sensible to have a simple lookup where I can say Marie Curie and you can tell me the year and the category of her prizes? Or Albert Einstein, what year for what? Like, that is the most obvious lookup table to build from the Nobel Prizes, but we can't do it without variables because the laureates need to be exploded.
[1:02:16]
And if we explode the laureates, we lose access to what came before. Yes, yes, yes, yes. I wanted to do that like a year ago. Exactly, exactly. So let us actually solve that problem. I love this. This is really coming together now. So we take our whole big data set of Nobel Prizes, and we're going to wrap everything up in a nice catch and explode. So we're going to say dot prizes and explode it, but it's all safely contained in our square brackets. The very first thing we're going to do is two variables. We're going to remember dot year as dollar year, dot category as dollar category. So we have now safely known what year and what category. Now we're going to explode the laureates.
[1:03:01]
We put a question mark after it because some years there were no laureates because there was no prize awarded. So to stop our errors, we just say dot laureates, open square bracket, close square bracket, question mark. Stops the errors. So now we have all the little laureates. I copied and pasted my logic from the answer for PBS 161 to always get a nice name whether or not they have a surname. So if it's the World Food Programme, it doesn't give World Food Programme null, it just gives us World Food Programme. So I'm not going to go over that again. That is just a copied and pasted answer from last time. So the key in our entries is going to be the name in nice pretty format. The value is going to be a new dictionary I'm going to construct, which has a key named year with the value we saved in the dollar year variable and a key named category with the value we saved in the dollar category variable.
[1:03:55]
Then I'm going to say group all of those by the key which is the person's name and then I'm going to do exactly the same code we did before where we take the 0th key and then we basically make a single array out of all of the values. And now what we get is that for Marie Curie, we get back the array. Dictionary, year 1911, category chemistry. Second dictionary, year 1903, category physics. Marie Curie won the prize twice, once in chemistry, once in physics, 1911 and 1903.
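A sketch of the laureate lookup, assuming jq is installed. The sample data is a tiny hypothetical stand-in for the real Nobel Prize data set, the field names firstname and surname are assumptions, and the name-building expression is a simplified substitute for the logic copied from the earlier challenge answer:

```shell
# Hypothetical sample data. The last prize has no laureates key at all.
prizes='{"prizes": [
  {"year": 1911, "category": "chemistry", "laureates": [{"firstname": "Marie", "surname": "Curie"}]},
  {"year": 1903, "category": "physics", "laureates": [{"firstname": "Pierre", "surname": "Curie"}, {"firstname": "Marie", "surname": "Curie"}]},
  {"year": 1916, "category": "peace"}
]}'

lookup=$(echo "$prizes" | jq -c '
  [ .prizes[]
    | .year as $year
    | .category as $category
    | .laureates[]?
    | { key: ([.firstname, (.surname // empty)] | join(" ")),
        value: {year: $year, category: $category} }
  ]
  | group_by(.key)
  | [ .[] | {key: .[0].key, value: [ .[] | .value ]} ]
  | from_entries
')

curie=$(echo "$lookup" | jq -c '."Marie Curie"')
echo "$curie"
```

Once the laureates are exploded, dot is a single laureate, so the year and category are only reachable through the two saved variables.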
[1:04:31]
So, the only problem with, well, I'm glad you taught it to us in this order, because I'd have been using variables all over the place earlier. When I couldn't figure out how to do stuff, I couldn't figure out how to go in and back out, and I'd just be going, ah, just use a variable. Do you go to prison if you use a variable when you don't need one in JQ? Are there penalties? I don't think you go to prison. I mean, it would be less efficient on a big data set, but the highest price you'll pay is your code will be longer, more complicated and harder to maintain. So you're kind of putting your future self or your colleagues into extra exercise. Why is it harder to maintain? Well, because if you think back to the average example, add slash length is going to be way shorter and simpler than anything you do that involves an extra step where you have to pipe it to something to make a variable. Yeah, but all the time in your code when we were learning JavaScript there were examples where you said, okay, I'm just going to name this something so that I can use it instead of using the JavaScript that I could have used. You give it a variable name so you can reuse it and not have to write that again and again.
[1:05:50]
Even though it means something when it's written in JavaScript, I always feel like it's abstracted, and I have to go back and say, okay, what did he define this variable as? Okay, he defined it from this thing. But I could have been just using that thing.
[1:06:06]
Okay, that's always true. Okay, but if you use variables where you don't need them in JQ, the end result is going to be code that's a lot more convoluted. You're going to be swimming upstream, which means you're going to be doing things
[1:06:21]
Code Efficiency and Best Practices
[1:06:20]
in a way that JQ isn't comfortable with. So the inevitable outcome of doing things the way the language wasn't designed to is your code will get longer and messier.
[1:06:28]
Because that's just not how JQ works. And so if you try to bash it into a shape it's not good at, you will get ugly code. And ugly code is difficult code. It's like if you try to use JavaScript to do something in a Perl-like way, you'll get icky code. If you try to use Perl in a JavaScript-like way, you'll get icky code. You'll probably eventually get it to work, but you'll probably have taken 10 lines to do what you would have done in five simple, clear lines if you'd done a Perl thing the Perl way, a JavaScript thing the JavaScript way.
[1:07:00]
My general rule in any language is: don't try to beat the language into your shape. Work with the language to do things the language's way. A language has a philosophy. I mean, languages are good at different things because they have different philosophies. There were different design decisions made, and if you fight those decisions, you may end up on a road to nowhere, or you end up on a road to somewhere, but it turns out that you've gone down 20 different by-roads and through a pothole-y mess, and it will never be an elegant solution that's easy to read. It will be a hot mess. And in general, that's what happens if you're using variables when you shouldn't in JQ: you just make everything more complicated. And it won't...
Okay, fine.
Yeah. It's inevitable. It's what happens when you don't do things in harmony with the language. So I have a challenge for you.
[1:08:01]
So, you can either start with my bonus challenge solution, or your own solution to the bonus part of the previous challenge, it doesn't matter which. Either way, update the script so that it doesn't just return the usernames caught up in one of the matching breaches, which doesn't actually tell us a lot, right? If I search for drop, and it came back with five users, well, were they caught up in Dropbox, or were they caught up in drop tables or something? What actually was breached when they were caught up in Dropbox? How bad was it? So what we really want isn't just all of the users caught up in a matching breach, but we'd actually like to know something about that breach. Tell me more. You did a match. These things are important, but tell me more. And so I would like you to return not just the usernames, but dictionaries for each match that give me the username, the breach name, and the breach title, which is the human-friendly version of the breach's name. And that shouldn't say breach ID, that should say breach name. I wrote that in the show notes before I realized Troy Hunt called it a name, not an ID. Okay, I'll fix that. And also give me the list of data that was breached, i.e. the breached data classes. All of that data is, of course, in our data enrichment part.
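As a rough illustration of the output shape this challenge asks for, here is a minimal Python sketch of the logic, not a jq solution. The sample data, the function name find_breach_matches, the input field names (Name, Title, DataClasses) and the output key names are all assumptions made for illustration; only the four pieces of information per match (username, breach name, breach title, breached data classes) come from the challenge as described above.

```python
# Hypothetical, simplified sample data: each user maps to the breaches
# they were caught up in, already enriched with breach details.
BREACHES_BY_USER = {
    "alice": [
        {"Name": "Dropbox", "Title": "Dropbox",
         "DataClasses": ["Email addresses", "Passwords"]},
    ],
    "bob": [
        {"Name": "ExampleCom", "Title": "Example.com",
         "DataClasses": ["Email addresses"]},
    ],
}


def find_breach_matches(breaches_by_user, search):
    """Return one dictionary per (user, matching breach) pair."""
    needle = search.lower()  # lower-case once for a case-insensitive match
    matches = []
    for username, breaches in breaches_by_user.items():
        for breach in breaches:
            if needle in breach["Name"].lower():
                matches.append({
                    "username": username,
                    "breachName": breach["Name"],
                    "breachTitle": breach["Title"],
                    "breachedDataClasses": breach["DataClasses"],
                })
    return matches


for match in find_breach_matches(BREACHES_BY_USER, "drop"):
    print(match)
```

Searching for drop returns a single dictionary for alice that carries the breach details alongside the username, which is the "tell me more" part of the challenge.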
[1:09:31]
And then, if you'd like some even more bonus credit.
[1:09:37]
Yes. So for a little bit more bonus credit, use an or statement so that if your little search string matches either the name or the title, you'll accept them, because some breaches have special characters in their name. I ran into this recently in the real world. I couldn't find the name of the breach because the title had a dot in it and the name didn't, and it took me forever to find the breach because I didn't know what Troy had named it. So I ended up having to put this logic into my own code to search for either the name or the title, because the title will be the bit that's in the news, right? How it looks in the news articles is what's going to be in the title field, whereas the name is going to be some sort of a simplified thing with no special characters or whatever. So that's why you actually should search both. So if you're searching...
How did you know you were missing it?
Well, because I got an email saying 300 of your users were caught up in this breach. And when I tried to search for it, it got zero users found. And I was like, well, I've done something wrong.
Okay. That gets back to the, you have to go look for it in order to find it.
Yeah, and I guess as you're building up a script and you're debugging it, you are checking your work, right? That's sort of the act of developing, checking your work. And then when you're confident it works, then you use it to do the heavy lifting on its own.
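The name-or-title idea can be sketched in a few lines of Python. This is only an illustration of the logic, not the jq solution to the challenge; the function name breach_matches, the field names Name and Title, and the ExampleCom breach are hypothetical stand-ins for a breach whose title contains a dot while its name does not.

```python
def breach_matches(breach, search):
    """Case-insensitively match the search string against name OR title."""
    needle = search.lower()
    return (needle in breach["Name"].lower()
            or needle in breach["Title"].lower())


# Hypothetical breach: the title has a dot, the simplified name does not.
example = {"Name": "ExampleCom", "Title": "Example.com"}

print(breach_matches(example, "example.com"))  # True, found via the title
print(breach_matches(example, "examplecom"))   # True, found via the name
print(breach_matches(example, "dropbox"))      # False, matches neither
```

A search on the name alone would miss the first case, which is the zero-users-found situation described above.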
[1:11:00]
Right, right. Yeah. Yes. So there we are. We have a challenge. That sounds like fun. To make our search. Yes. We're going to end up with a very useful search that tells us valuable information. And we're going to be able to find things, whether we search on the English-y name or the simplified name.
[1:11:22]
Transition to Advanced Concepts
[1:11:18]
Right. So just to give us the big picture, to zoom out for a second: at this stage, we're finished with lookup tables. In fact, we're finished with almost everything. We have one big piece that I have been waiting to do, because it's really important
[1:11:33]
that we do it when we're ready, because it's really important. It is actually possible, when you have a whole array of things or a giant big dictionary, to apply some sort of a transformation, some sort of logic, to every element in the array without exploding it. So instead of exploding and catching, you can do an in-place manipulation, which shortens your code massively. That is where we're going to go next, and that is the last really important thing. Then we're going to finish up with one final episode of all of the little bits and pieces that are all useful and that I want to have in show notes where people can find them, but they all come with the caveat... forget that, I promise you, you will need one of the things we talked about in the last episode.
[1:12:34]
Sometimes. I don't know which one, and for each of you it will be a different one.
Right, right. So having, like, the canonical listing at the end.
Yeah, exactly. So I don't know which of them you're going to gel with and which of them you're going to think is silly and useless, but we're going to go through them all because they're all ones that I do actually think we should cover in the series. We should know they exist, if not now. And I have found myself rather surprised a few times where I've seen something and gone, ah yeah, that's cool, but sure, when would you ever need that? And then two weeks later, in work, for real, I go, oh, this is when. So I've surprised myself quite a few times where I've thought, right, who'd need that? And then it's like, oh, me, I need that.
Okay, so we're going to finish up with some of those advanced topics. So everything will be new, but not necessarily mainstream.
[1:13:28]
Precisely. And, you know, depending on who you, the audience, are, you may glom onto some of them and go, oh, cool. Or you may go, I don't care. Don't feel bad. Everything in the last episode is optional. It's all bonus extra. But I know it will be useful to some people.
As opposed to the rest of the class that was mandatory.
You know what I mean, right? I generally only cover things I expect to be broadly useful. But that last episode, I expect all of it to be useful, but not broadly.
Okay, okay.
Right. And that will bring us to, I think, the end of what turned out to be a very enjoyable series, but way longer than what I thought, which was, ah yeah, three or four episodes, Allison, this won't take long.
And I read ahead and didn't ask very many questions, right?
[1:14:18]
Yeah, well, I just sort of meant the whole series, right? How many episodes I ended up doing on jq. You know, I don't... Anyway, anyway, we'll find out next time.
No, in two times.
Two times, precisely. Right, well, until then, folks, lots more happy computing.
If you learn as much from Bart each week as I do, I'd like you to go over to lets-talk.ie and press one of the buttons over there to help support him. He does 98% of the work here. I'm just the stooge that listens to him and asks the dumb questions. If you go over to lets-talk.ie, you can support him on Patreon, you can donate via PayPal, or you can use one of his referral links. I really hope you'll go over and help him out. In the meantime, you can contact me at podfeet.
[1:15:03]
Music.
