Transcript

[0:00] Music.

Launching the beta version of xkpasswd.net

[0:08] Well, it's that time of the week again. It's time for Chit Chat Across the Pond.
This is episode number 786 for February 3rd, 2024.
And I'm your host, Allison Sheridan. This week, our guest is Bart Busschots with another installment of Programming by Stealth. Hello, Bart.
Well, hello there. Well, I've got a big thing to interrupt the show with before we even get a chance to get started.
Just a few hours ago, Helma van der Linden and I recorded Chit Chat Across the Pond 786.
I'm sorry, sorry, 785, the one right before this one. And we did that because we are now officially announcing the launch of the beta version of xkpasswd.net, along with the GitHub work that you can do and the help that we need in order to get this thing started.
And it's a wonderful episode where Helma explains basically how she kind of went around behind Bart's back and did most of the work to get it started.
Which I'm very grateful for. Right, right. Not in a bad way.
Extremely grateful. In an open source-y way, right?

[1:14] Yes, and even better than that, because in the open source-y way, there was no reason for Helma to ever do me the courtesy of checking with me what I would like and talking through the details.
But Helma chose of her own volition to do that and then to actually, like, you know, do what I said would be nice. Which is extremely kind of her.
And in open source land, there's no reason she couldn't have forked it and done whatever she wanted.
Right, right. She was extremely kind.
It's amazing. So it's actually technically working. And shortly there will be a URL people can go to. Don't know how soon that's going to be.
Depending on how wibbly wobbly timey wimey things are, depending on when this episode gets published and your episode gets published, sometime in the very near future, beta.xkpasswd.net will host this new code.
And because my server is being shut down in three days, actually, my server is being shut down in three days. So this is happening soon.

[2:13] The www website is going to redirect to the beta, but I'm keeping it branded as beta so that people understand that the features they love are not gone.
They're just not there yet. In between phases.
Yes, they are. They're on holiday. They'll be back, but they're on holiday.
Yeah, so as of right now, technically, you can press a button and generate some passwords, but it's the generic preset.
You can't set up your own presets. You can't choose from any other presets.
There's very little you can do beyond that, but you can press a button and get long, strong, memorable passwords that you can actually type.
And you can choose how many. Yes, you can choose how many. So anyway, we want you to listen to the episode. It's Chit Chat Across the Pond 785.
And Helma and I had a great time chatting and going through the process of how she got there.
And I tried to do it as a little more generic, I'm sorry, a little more non-programmy. We tried to keep it at as high a level as we can.
But it might stretch the definition of the word light a little bit.
But then again, we had an astrophysicist on to talk about her Nobel Prizes under the light topic. So yeah, NosillaCastaways, we're made of strong stuff.
Our light is quite strong.

[3:26] I like it. I like it. We'll quote that. So with that, enough further ado. We should get stuck in on the work that we're going to do here today.
Indeed. So we are on our JQ journey, and I had promised you last time that we would spend this episode learning about all the different cool functions that JQ has for transforming data.
So we learned how to build strings using interpolation, how to build arrays using the square bracket syntax, how to build dictionaries using the curly bracket syntax.
And we learned how to do some, you know, basically the idea that we could transform the data into different shapes.
And then we sort of, we got a look at some functions like ltrimstr and rtrimstr and stuff like that.
But I said, there's loads and loads and loads of functions. And we'd spend the next installment talking all about the functions. And then I realized that I was not doing things in the right order, so I've changed my mind.
We are going to learn how to use JQ as a true programming language today, because I am so many shades of fed up of putting all of my JQ code on one big line I can't read.
I would like to break it into separate files where we can put things on separate lines and not tie ourselves into knots.
So this episode is dedicated to saving all of our sanity by using JQ as a true scripting language instead of just as one big string.

Using JQ as a true scripting language

[4:47] Oh, that sounds wonderful, because it is a little hard to read.

[4:52] All right, and the more we're doing, so if I were to introduce you to all the cool functions, your one long string would get even worse.
Oh, okay. So I figured before I make your one long string worse, how about we stop doing it as one long string? So that is the plan for today.
And we also have a challenge solution.
And I believe that you had a very productive time of the challenge.
I had so much fun with this one. I spent a lot of time, and by a lot of time, an embarrassing amount of time compared to everybody else, but I spent probably six or eight hours almost getting to the conclusion of the first challenge, the primary challenge.
But I didn't quite get there. And so Bart said, I hadn't given up or anything, but Bart said, hey, you want to have a play date where we do a little buddy programming?
And I said, sure, that'd be fun. And he got me over the hump of the piece that I didn't get on the main part. But I was so excited about that that I kept going, and I succeeded on the extra credit as well.
You did. And not only did you succeed, because you did share your solution before we recorded, you did it better than I did. Your solution is better than the sample in the show notes.
Um, they can both be right, Bart?
Absolutely, they are both correct, and I always say there's an infinite number of correct answers, but yours has the advantage of being shorter, clearer, and easier to read. And I like pretty code.
So your code is prettier and I like that.

[6:22] Okay. Okay. I did, I did get a little assist from GitHub Copilot, which was part of it, where I was asking it some questions.
And this is, again, where I want to give AI a thumbs up: I seem to have trouble remembering terminology, but I can ask it questions without knowing the terminology.
Yeah. I can say, okay, what's that thing in JQ, you know, where you got an array and you need to look inside it. What is that called again?
You're going to like open it or whatever to go. Oh, I gotcha.
Yeah. We're going to explode the array and it'll do it for, it'll show me the syntax.
You know, that's a silly example, but that's the kind of thing I can do.
And so I was asking it questions about like, how do I, how do I say I want to, I want to only get the elements of the dictionaries that have laureates who have surnames.
If it doesn't have a surname, I don't want it to have it. And it came up with a piece of code that I ended up being able to use.

[7:16] Excellent. And I do like that Microsoft called it co-pilot to make it very clear that this is not a tool to replace the human.
This is a tool to assist the human. And as long as you use enough words that are going to show up in correct answers, the fact that it's really good at language matching means it is going to figure out the right part of the Internet universe to go send you to.
And it tells you where it found its answer. So even if it's wrong, you're still better off because you know where to go look. Yeah.
And it actually was wrong, but the piece I needed to learn was something interesting.
So I got, it's like having a co-pilot who's just about as good as me, but it's better at remembering terminology maybe.

[7:57] Anyway. Yeah. Maybe the trainee co-pilot, but yeah, anyway, that's cool.
And also you, you, you learn something independently that we are going to learn either this installment or next.
I can't remember where it is in the show notes because I reorganized them all.
So I don't remember what's in which show, but you're ahead of the class on some things as well, which you were able to learn independently.
And that is the whole point of this, right? Because that's what real coders do in the real world. We don't know everything.
We just know, we just have enough experience to be able to look it up and to understand the answer when we find it and to be able to use it.
And the fact that you independently successfully used a very useful function is another reason your code is way shorter than mine.
Because you're doing it better. So what was the challenge?
So we wanted to transform our Nobel Prize's data set, which I have many critiques of.
I think it's quite dirty data, even though it has all the facts in there.
I don't like how they chose to do it.

Transforming data into simplified JSON file

[8:54] And I would like you to transform it into a new JSON file called NobelPrizeList.json.
That's basically a simplified version of the data, which is really very focused on on just the list of who won and what they won.
So it should have just four keys for each prize: the year, which prize it was, so was it medicine or physics, how many winners there were, and just the names of the winners as strings.
So it's a much simpler data set than the data set they have, which tells you what share of the prize, who won, and all that kind of stuff.

[9:30] So to start building up the solution, the first thing I figured to do was to filter our list of prizes down to just those that were actually won.
Because for a bunch of the prizes, there are entries in the original data set where they decided not to give them to people.
And then there's an entry in the original data set that says why they decided not to give them to people.
But I don't want those in the final list. So I started by exploding the prizes array and then sending that into a select where I verified that the laureates key was of type array, making use of the type function to determine the type of dot laureates.
So where there were laureates, it would be array, and where there weren't, it would be undefined.
But obviously only array double equals array, so therefore only the prizes actually awarded would make their way through.
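A rough sketch of the filter being described, run against a cut-down stand-in for NobelPrizes.json (the sample data and the unawarded entry are illustrative assumptions, not the real file). One detail worth knowing: jq evaluates a missing key to null, so `type` reports the string "null" for prizes without laureates.

```shell
# Cut-down, hypothetical stand-in for the real data set
sample='{"prizes":[
  {"year":"1903","category":"physics","laureates":[{"firstname":"Henri","surname":"Becquerel"}]},
  {"year":"1916","category":"physics"}
]}'

# Explode the prizes array, then keep only entries whose .laureates is an array
awarded=$(printf '%s' "$sample" | jq -c '.prizes[] | select((.laureates | type) == "array") | .year')
echo "$awarded"   # "1903"

# What type does jq report when the key is missing?
missing=$(printf '%s' "$sample" | jq -c '.prizes[1].laureates | type')
echo "$missing"   # "null"
```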
So, one thing that bothers me about this is that you have to study the structure of the stuff you don't want in order to figure out, like, you looked at it and went, hey, I notice that there's no array if there's no laureates.

[10:39] You have to look for the bad parts.
I guess it's normal, though. You go through and you try to do a query and you go, hey, what's this garbage?
Oh, I don't want that garbage. What's different about it? How can I exclude it?
Right, because usually you'll stumble across the need to exclude it by trying to do something and ending up with errors saying cannot index undefined or something like that. And you go, why is it undefined here?
And then you do a select where .laureates pipe type double equals undefined.
Then you have a look at those Nobel Prizes and you go, oh, you're weird. I don't want you.
And then, you know, you write a rule to filter them out. That's certainly how I've ended up stumbling onto these oddities in the data set.
You know, I get an error and then I try to figure out why.
So the next thing then is to start assembling what we do want.
So having removed the ones we don't care about, I then pipe it into a new filter where I start to build my dictionary.
And the easy ones to build are the year, where we just basically say year colon, and then I take dot year and pipe it to the tonumber function, because I want my years as numbers, not strings. I would like to be able to do math on them properly.

Visual separation using parentheses for clarity

[11:47] You put that way into parentheses, and that's a way of, that doesn't change it programmatically in this case, right?
It's year colon, and you did parentheses dot year piped to tonumber, close parentheses, just as a way to make it visually separate.
So you could tell what you were doing?

[12:04] Yes, and also because I don't know the rules for which does and doesn't have higher precedence, and brackets always win.
Okay. So you're right that in the case of comma, comma would have won.
But I get very confused sometimes and sometimes my queries break and I just throw brackets around things.
I could go look up the order of precedence, but I'll just throw brackets around it to be really explicit. Well, it's like when you do a multiplication, right?
Three plus two times five. Well, two times five is going to win.
But if you put parentheses around it, then your brain can see it.

[12:33] Yes, that's a perfect example. You haven't... Yes, yes, exactly.
You've just been clear about what's real.
The prize then is just me renaming the category. So prize colon dot category, that's easy.
And the number of winners is just the laureates array piped to the length function.
So that gets us three out of four straight away, no problemo.
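The three easy keys could be sketched like this. The output key names `year`, `prize`, and `numWinners` are assumptions based on the description, and the input is a single hand-typed prize entry rather than the real file.

```shell
# One prize entry, typed by hand for illustration
prize='{"year":"1903","category":"physics","laureates":[
  {"firstname":"Henri","surname":"Becquerel"},
  {"firstname":"Pierre","surname":"Curie"},
  {"firstname":"Marie","surname":"Curie"}
]}'

# Build a new dictionary: year as a number, category renamed, laureates counted
summary=$(printf '%s' "$prize" | jq -c '{
    year: (.year | tonumber),
    prize: .category,
    numWinners: (.laureates | length)
}')
echo "$summary"   # {"year":1903,"prize":"physics","numWinners":3}
```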
The last thing then is we need to build the array of winners, which is a little bit more complicated because A, we have to dive into the laureates array this time and then we have to build their names because the laureates array does not contain their names as one piece of data.
It contains a first name field and or, well, no, always, yeah, and, not and or, and maybe also a surname field.
There's always a first name, but there isn't always a surname.
So the easiest, as a way to get close to the final answer, we can say that the winner's array is going to be formed by wrapping inside square brackets to reassemble an array.
The explosion of the laureates piped into the string interpolation.

[13:45] Backslash roundy bracket dot first name, close that one, space backslash roundy bracket dot surname, close that one, close the string, close the array.
And that will give us a lot of the right answer.
So if we look at the prizes where only human beings won, that is in fact the right answer. So the 1903 physics prize has three winners, Henri Becquerel, Pierre Curie and Marie Curie, and they look perfect. But unfortunately, in 1904 the peace prize was awarded to the Institute of International Law null. Wait, no, where's that null coming from? Well, that's our friend surname not being populated.
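A small sketch of where that stray null comes from, assuming the laureate fields are spelled `firstname` and `surname`: string interpolation renders a missing key as the text null.

```shell
# A laureate entry with no surname key
org='{"firstname":"Institute of International Law"}'

# Interpolate both fields; the missing surname shows up as the word null
name=$(printf '%s' "$org" | jq -r '"\(.firstname) \(.surname)"')
echo "$name"   # Institute of International Law null
```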
And this is where your results and my results differed.
So you checked whether or not the surname exists and then use the alternate operator to go and do a different thing.
So in your case, on one side of your alternate operator, you were just doing what I just did there. First name followed by surname.
And on the other side of the alternate operator, you were just taking the surname.
Sorry, just taking the first name.
Right. That was in the extra credit version.
That wasn't the extra credit version. Because I did use the other method to get rid of it on the first, before going into the extra credit.

[15:08] Okay, perfect. Which gives me an opportunity to say that the way I sort of guessed people would have a go based on what we've already done in the series is to use the alternate operator to say either stick on the surname or empty string, which is at least nicer than null.

Improving the solution by removing extra spaces

[15:25] Oh, I didn't get that. Okay. So now we have Institute of International Law space, because the space between first name and surname is still there, but that's better, better.
So the last thing I did then for my simplest solution was to use the rtrimstr function, which we learned about for removing the quotation marks in an example in the previous installment, to just pull that space off the end.
And then I had a working solution to get us to Institute of International Law.
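The two steps just described might look like the following sketch (field names `firstname` and `surname` are assumed): first swap a missing surname for an empty string with the alternate operator, then trim the leftover trailing space with `rtrimstr`.

```shell
org='{"firstname":"Institute of International Law"}'

# Step 1: alternate operator gives "" instead of null, but a trailing space remains
spaced=$(printf '%s' "$org" | jq -c '"\(.firstname) \(.surname // "")"')
echo "$spaced"    # "Institute of International Law "

# Step 2: rtrimstr removes that trailing space
trimmed=$(printf '%s' "$org" | jq -c '"\(.firstname) \(.surname // "")" | rtrimstr(" ")')
echo "$trimmed"   # "Institute of International Law"
```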
Oh, okay. And that's full marks.
So I actually did half of each of those sort of in my solutions.
In mine, I just took the original solution you had that didn't work, and I said, trim off space null.
Yeah, that's actually clever.

[16:12] I was thinking about why you were teaching us about that, putting an empty string in there.
I was like, I wonder why he's telling us about that. I don't know where to use that information.
I was looking for an excuse to use the alternate operator. Oh, okay.
Well, so I did the rtrimstr to get rid of space null in the first one, but in the second one, and that's where ChatGPT came in, I found that there's a select parentheses has, and that means, so I was able to say select has, quote, surname.
And then I said, in other words, if it has a surname, give me first name and surname, and then the alternate operator, and then just give it first name if that's all it had. So I sort of did half of each.
And that is a really nice solution. So what I said in order to get bonus credit, so this was already full credit, my way of getting bonus credit was to never add the space to then have to go and remove it.
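One way the approach described here could be written; this is a reconstruction from the spoken description, not Allison's actual code, and the field names are assumed. When `select(has("surname"))` produces nothing, the left side of `//` has no output, so the right side is used.

```shell
laureates='[
  {"firstname":"Marie","surname":"Curie"},
  {"firstname":"Institute of International Law"}
]'

# Full name when a surname key exists, otherwise just the first name
names=$(printf '%s' "$laureates" | jq -c '[
    .[]
    | (select(has("surname")) | "\(.firstname) \(.surname)") // .firstname
]')
echo "$names"   # ["Marie Curie","Institute of International Law"]
```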
And the reason your solution works is because of the power of the JQ only data type empty.

[17:24] So in JavaScript, things are a value or they are null.
And because the JSON language has a concept of nullness, there is N-U-L-L is a keyword in JSON.
And so null is supported in jq. So jq needed to have, like, an emptier version of empty, or a more nothing version of nothing than null. So jq as a language invented a new data type called empty, which means genuinely absolutely nothing. Like, total absolute absence of anything in jq is empty. And so the select function, when you look into the documentation, it says that it either returns the thing you gave it, or empty.
So what you were doing with your select was evaluating it down to either the value for the surname or empty.

[18:23] And you can actually make empty yourself.
So there is a function in JQ named empty, which returns empty.

Using the "empty" function for absolute nothingness

[18:32] No matter what you do with it, it always returns nothingness.
So with the alternate operator, you can instead of saying an empty string or whatever, you can just say or empty.
So slash slash empty is something else you can do.
And I love the documentation for the empty function. So, this is the full documentation from the JQ official docs for the empty function.
Empty returns no results. None at all. Not even null. It's useful on occasion.
You'll know if you need it. Smiley.
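A quick way to see the difference between null and empty is to put each inside an array; this sketch uses `jq -n` so no input file is needed.

```shell
# null is a real value and takes up a slot in the array
with_null=$(jq -nc '[1, null, 2]')
echo "$with_null"    # [1,null,2]

# empty produces no output at all, so nothing is added
with_empty=$(jq -nc '[1, empty, 2]')
echo "$with_empty"   # [1,2]

# The alternate operator can turn a null into nothing
fallback=$(jq -nc '[null // empty]')
echo "$fallback"     # []
```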

[19:04] Okay, that's awesome. Well, you referred to it in your hint, but I couldn't tell.
I thought empty was a verb.
I thought we were going to be using it to empty something. And so I looked at that and I couldn't figure out how to use it.
And I spent so much time, probably of the six hours I spent working on this before I talked to you, I probably spent four of it trying to figure out, show me the ones that aren't null.
And what I should have been looking for was the ones that aren't empty.
That is also true. Yeah. It turns out, I don't think you can search for is null.
I don't think that's a thing.

[19:46] Maybe you could get it as a return of a query? Like if it returns the query null, then...
Well, and also the type function will tell you if something is null, because if you sent something null into type, it will give you back the string N-U-double-L, which you could then do double equals against.
I couldn't double equals off that dang null. That saved my life.
I'll show you the... I wrote down every command I wrote that didn't work, and my notes are 358 lines long.
So I try with comments.
It's a type function though. So you would say type double equals null, not null double equals null. Anyway.
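A minimal sketch of a null check built on `type`, with explicit parentheses so the comparison applies to the result of `type`. The sample laureates and field names are assumptions for illustration.

```shell
laureates='[
  {"firstname":"Marie","surname":"Curie"},
  {"firstname":"Institute of International Law"}
]'

# Keep only laureates whose surname is not null, then print the surname
surnames=$(printf '%s' "$laureates" | jq -r '.[] | select((.surname | type) != "null") | .surname')
echo "$surnames"   # Curie
```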
I tried type not equal to null and that didn't work.
Select .laureates .surname, pipe to type, not equal to null, and it didn't work.

[20:36] Possibly a bracketing issue; potentially the pipe was happening in the wrong place. So my approach to never having to take away the space was to use the fact that if you join an array that has two elements, it will put the space in, and if it has one element, it won't. So my approach was to make an array of the names and to use the alternate operator to replace the surname with empty if there is no surname, which means that you either have a two-element array or a one-element array, which when you pipe them to join will give you first name space surname or first name.

[21:17] Oh, because you also talked about the join thing. I was like, what is he joining?
So that was how I ended up fixing it, without using anything we hadn't mentioned before in the series.
So you said dot first name, dot surname, slash slash empty. Oh, you put them right up against each other, no spaces there. Is that on purpose?

[21:37] Inconsistency on my part. Both are valid. I usually do space them.
Okay, but slash slash empty. Okay, dot surname or empty.
So in other words, my array will either have one or two elements.
So when I join it with a space, it will only put the space in if there's two elements.
And it won't join anything if it doesn't find a surname. Right, because then it'll be a one-element array.
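The join approach can be sketched as follows (field names `firstname` and `surname` are assumed): the inner array has two elements when a surname exists and one when it does not, so `join(" ")` only inserts a space when there is something to separate.

```shell
laureates='[
  {"firstname":"Marie","surname":"Curie"},
  {"firstname":"Institute of International Law"}
]'

# Build a one- or two-element array per laureate, then join with a space
names=$(printf '%s' "$laureates" | jq -c '[
    .[]
    | [.firstname, (.surname // empty)]
    | join(" ")
]')
echo "$names"   # ["Marie Curie","Institute of International Law"]
```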

[22:04] Okay. So, you know, it is a solution. There are infinitely many of them.

JQ as a True Blue Scripting Language

[22:10] Yours is nicer than mine.
But there we are. But I learned more learning yours. So that's good.
That is true. And yeah, so, and everyone got to learn twice as much because they got both of ours. So there we go.
Right. So moving on to today's topic, then we are going to start treating JQ as a true blue scripting or programming language.
And so the first step in doing that is to, instead of saying JQ space, a filter space, some files to go read, we would like to say JQ, go fetch that file over there, which contains your filter and then apply it to that file over there.
So the first step in this process is the minus minus from minus file flag, or its much shorter and friendlier friend minus F, which says get your filter from a file.
So jq space minus F space some file name dot jq space NobelPrizes.json will apply the content of something dot jq to NobelPrizes.json.
Okay, so the name.jq that's after the minus F flag is a file that is just going to contain the query.
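A self-contained sketch of the idea, using throwaway files in a temporary directory; the file names `count.jq` and `sample.json` are made up for the example.

```shell
workdir=$(mktemp -d)

# The filter lives in its own file
printf '%s\n' '.prizes | length' > "$workdir/count.jq"

# A tiny stand-in for the data file
printf '%s\n' '{"prizes":[{"year":"1903"},{"year":"1904"}]}' > "$workdir/sample.json"

# Short flag and long flag do the same thing
short=$(jq -f "$workdir/count.jq" "$workdir/sample.json")
long=$(jq --from-file "$workdir/count.jq" "$workdir/sample.json")
echo "$short $long"   # 2 2

rm -r "$workdir"
```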

[23:24] It's going to contain the filter, which could be as long as you like, because now you're free to have giant big filters all in that nice file.
Okay. So there's a couple of rules in what goes into that file.
Now that you have the luxury of having a file instead of a single string, you can do things like add comments into your JQ code, which is very useful.
And it uses the shell script style of comment. So once you get to an Octothorpe symbol or a pound symbol or a hash symbol or that thing with the two lines and the other two lines, whatever we're calling it today, from there to the end of the line is a comment.
So if you put them at the start of the line, the whole line is gone.
If you stick them at the end of the line, it's just that bit from the end is gone.
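Both comment positions in a small, hypothetical filter file (the file name and data are invented for the example):

```shell
workdir=$(mktemp -d)

# Write a filter file containing a whole-line and a trailing comment
cat > "$workdir/commented.jq" <<'EOF'
# Whole-line comment: everything on this line is ignored
.prizes          # trailing comment: ignored from the hash onwards
| length
EOF

count=$(printf '%s' '{"prizes":[{},{},{}]}' | jq -f "$workdir/commented.jq")
echo "$count"   # 3

rm -r "$workdir"
```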

[24:06] So just like we've gotten used to in our shell scripting. So that's nice and easy. So I like to add comments to the top of my files.
So you'll notice there is a file in the show notes called pbs160a-1.jq, which is actually my challenge solution for the homework just sitting in a file.
So it's still all on one line, but it's just exactly the same string as in the sample solution, but in the file instead.
And so when you run jq minus f pbs160a-1.jq NobelPrizes.json, it gives you the same answer you would have got if you'd done the giant big whole thing from the sample solution.
Okay. So the first thing I did to take us to the next step was to add a comment at the top of that file. So there is a dash 2 version of the file, which is prettier. The comment at the top says this JQ script refactors the Nobel Prizes data set as published by the Nobel Prize Committee into a simpler form.
The input it would like is JSON as published by the Nobel Committee and the output will be simplified JSON.
And so that's the structure I like to use for all of my .jq files.
What does this file expect to be given and what is this file offering you out? So input and output.

Code Layout and Styling in JQ Filters

[25:26] And yeah, there are other things that I put in comments which we'll learn about later.
The next thing then is code layout. Now that we have a separate file, can we have stuff on more than one line, please?
So instead of it being this giant big mess. Yeah.
Can you break it any old place you want?
Yes, you can, because the way it works is the pipe and the comma separate your filters.
So the pipe takes the output of one filter as the input to the next filter and the comma says do this filter and also do this filter.
So those two are effectively your end of statement.
So it doesn't care if you put new line characters in. It doesn't care if you put lots of extra tabs and spaces in.
So you can lay this out any which way you like because it will know that one filter ends when you meet a comma or a pipe and so it's perfectly happy.
This then brings us to the question of, okay, so we have infinity possibilities. What should we do?
I checked the documentation to see if there was an official style guide, because in JavaScript there's an official style guide, which is why you can have stuff like JSLint.
Or ESLint. ESLint, some amount of letters.

[26:48] E-S-Lint can apply the rules because someone wrote the rules.
So I thought maybe there's rules for J-Q and therefore I can do exactly the right thing and we won't have a big argument about this. There are no rules.

[26:59] So I looked at the examples in the documentation and I noticed the pattern followed a bunch of similar languages. So there are a bunch of languages out there for querying large data sets. The term of art is data lakes. So when you have a giant big amount of data that's unstructured, it's often called a data lake. And so if you live in open source land, the app that does most of the data lake stuff is called Splunk. Cool name for an app where you go looking for stuff: you go spelunking around.

[27:34] And Splunk has a querying language called SPL, which uses pipes to separate the different parts of your query.
If you live in Microsoft land, the language for exploring your data lakes is called KQL, the Kusto Query Language.
And I don't know why it's called KQL, but it is.
And KQL also uses pipes.
And both KQL and SPL always put pipes on their own new line.
So when you have one filter going into the next filter, they put the pipe at the start of the line, and then you can see your filters one after the other.
So I sort of think of it like a waterfall where one filter waterfalls into the next one.
Well, they just start the next one on a new line with the pipe at the front of it to show you, and I'm the next thing, and I'm the next thing.
That seems kind of clean, because like in Markdown, when you're lining up things in tables, it's a pipe, you know, it's just sort of a clean, this is a place to break. That makes sense.
I've been doing, I kind of played around with it just when I was trying to look at my code, to see what the heck it was, and I was doing it after the pipe, but it looks better at the pipe, like the pipe starts the new line.

Using KQL and SPL in my work.

[28:38] Yeah, that seems to work best. And that's what the documentation tends to do.
And like I say, that's what you tend to get in KQL and SPL.
So that's what I'm used to. So with my work hat on, I live in both open source land and Microsoft land.
So I actually speak fluent KQL and SPL, which apparently makes me quite unique.
Most people pick one. I've just ended up doing both.
And now I have JQ on top of it. So I'm just using the same style I'm using in KQL and SPL in my JQ.
And thankfully so does the documentation for jq mostly so my three rules that i'm going to follow for the show notes and they are an invitation for others to do the same or feel free to do whatever you like because there are no rules but my three rules which are sort of guidelines, is all non-trivial filters by which i mean like technically speaking dot year pipe to number is two filters i am not going to put that on two lines because that will not make things clearer that will make things less clear.
So for anything where I want to break it up, I will break it up where the start of the filter starts a new line.

[29:43] If I am starting or ending a large array or a large dictionary, I will put the opening square bracket or the opening curly bracket by itself, go onto a new line and tab in.
And I will stay tabbed in until the end of that dictionary or array and then close it back out on a line by itself, like a code block. Now that's not exactly what you did in the example.
You did the pipe and then open squirrelly bracket for a large dictionary.
So it's not on its own line, but conceptually that's part of the pipe, right?

[30:14] Okay. So that's, you have then inside the filter you are then doing, but if you look at the very, very, very top, so the very first line of the example is open square bracket.

[30:24] Yeah. Which is exactly doing it.
So if you take rule one and rule two, you end up with the hybrid situation you see on, say, pipe space open curly.
That's sort of joining together rules one and two.
I'm going to ask a question because there's a... We'll go ahead and finish and then I'll ask my question.
Yeah, and so basically the filter separators I'm going to put at the start of my lines, so that's pipe space select, not leaving the pipe on the line above it, you know, dot prizes pipe and then a new line select. I've just chosen to put my separators at the start. Right, as I said, they're my three guidelines.
So my question: there's a difference between what was in the zip file, and I can correct the zip file, and what's in the show notes. You've got backslashes at the end of the lines in the text?
Are those just left over? I do, they are left overs of something else that the documentation says would work but doesn't actually work in real life and so I meant to take them out. I can take them out.
Looks like we're both taking them out. Yeah, which would be fine because Git will say that we've done the same edit and then we wouldn't complain.
Right, so they're not in the zip file so then that's fine.
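Pulling the three layout guidelines together, a filter file in this style might look like the sketch below. It is an illustration assembled from the pieces discussed in this episode, not a copy of the show-notes file; the file name, output key names, and sample data (including the unawarded entry) are assumptions.

```shell
workdir=$(mktemp -d)

cat > "$workdir/prize-list.jq" <<'EOF'
# Input:  JSON shaped like the Nobel Prize data set
# Output: an array of simplified prize dictionaries
[
    .prizes[]
    | select((.laureates | type) == "array")
    | {
        year: (.year | tonumber),
        prize: .category,
        numWinners: (.laureates | length),
        winners: [
            .laureates[]
            | [.firstname, (.surname // empty)]
            | join(" ")
        ]
    }
]
EOF

# Cut-down stand-in for the real data file
cat > "$workdir/sample.json" <<'EOF'
{"prizes":[
  {"year":"1904","category":"peace","laureates":[{"firstname":"Institute of International Law"}]},
  {"year":"1916","category":"physics"}
]}
EOF

list=$(jq -c -f "$workdir/prize-list.jq" "$workdir/sample.json")
echo "$list"
# [{"year":1904,"prize":"peace","numWinners":1,"winners":["Institute of International Law"]}]

rm -r "$workdir"
```

Each pipe starts its own line, large arrays and dictionaries open and close like code blocks, and trivial pairs such as `.year | tonumber` stay on one line.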

[31:40] Precisely. As a little bonus extra. So now that I'm putting my JQ code in a separate file, I'm obviously going to be viewing that file in a code editor, which in my case, and I think your case, we have both settled on VS Code as the editor we love, which is an open source editor by, of all people, Microsoft.
And they named it after a product that used to charge two arms, two legs, and a house.
VS Code used to be stupendously expensive closed source poop.
And now they took the name and applied it to an amazing open source project.
It's free Microsoft. No, we take the name. We take the name.
I like it. So it's a completely different product. It has nothing to do with the old VS Code.
Sorry, it used to be called Visual Studio and this is Visual Studio Code. That's the difference.
Anyway, it's an open source project. It's gorgeous. And it has a plugin architecture.
So there are JQ related VS Code plugins.
There are two I recommend. The first one is a simple syntax highlighter.
It is the wonderfully named JQ Syntax Highlighting.

[32:46] You know, I've actually been writing my JQ just in CotEditor, just as plain text files, and it would have been a lot easier doing it.
I use VS Code for all my JavaScript and HTML and everything else.
I would have done it over there if I'd known this existed. Very cool.
So there we go. That's the first one. The second one I have also installed, because what it lets you do is type JQ into... I don't know how much you make use of VS Code's command palette.
But if you hit Shift-Command-P on the Mac, you get like a terminal window for VS Code.
It's where you can type commands to VS Code.
And if you install the plugin vscode-jq, you can type jq commands into that VS Code command palette while you have JSON files open.

[33:35] So if you have a JSON file open, you can just hit that key code, type JQ, and then type a filter.
And it will show you the output of the filter in the output pane in VS Code.
So you can use your JQ knowledge to search JSON files that you just have open.
So you could be writing code in any language, and you just quickly want to check for something in a JSON file.
You can just type some JQ straight into VS Code, and it will know what you mean.
So like you can search with regular expressions, you can now search with JQ if you install this plugin. So that's really cool.
I've had trouble figuring out what to do with that command palette.
Maybe that'll force me to try using it.
I do know you can open a plain old regular terminal right at the bottom, and that's kind of fun.
That is very fun. You can have multiple of them, and you can split them as well.
Yeah, I got real confused when I told it to do that. I didn't know that's what it was going to do. It's like, oh, no, there's too many of them.

[34:33] You can have a lot. So I used to have a Bash one and a ZSH one.
Whenever I was checking the differences between Bash and ZSH, I'd have them both open at the bottom of VS Code.
So I could check while I was writing those show notes. Yeah.
One thing that I don't think you mentioned, or I missed it, is at the end of both of your JQ files that you're using as the input here, it says pipe @json.
Did you explain that? So the, I didn't really. So the challenge said that it needed to output the results as a one-line JSON data structure.
So in the format, you would expect it to come from an API.

[35:12] So that's what the @json is, it's the formatting string. So last time we did @csv to format as CSV. So @json formats as JSON.
So it's a JQ, it's a JQ filter that's formatted as JSON? Yeah.
Yes. And when I say formatted as JSON, you will notice that the Nobel laureates file, when you open it in a plain text editor, is all on one line with no spacing.
That is the most efficient way to send JSON data across the internet. Oh yeah. Yeah.
Yeah. So JQ by default gives it to you pretty.
If you pipe it through the @json formatter, you will get it in that compacted, efficient format which is used to store and transmit JSON in the real world.
Okay. Uh-huh. Yeah, that makes sense. That's all that's doing. Yeah, now I recognize it: a bunch of glop I can't read because it's all on the same line.
There you go. Yes. Which is why we use stuff like VS Code to pretty-print JSON as well, because if you open a JSON file in VS Code, you can ask it to format it sensibly for you.
Oh, that's another reason to play with it. Okay.
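
A small sketch of the difference between jq's default pretty output and the one-line form from @json. The input object is made up for illustration, and the -r flag is used so the compact text is printed as-is rather than as a quoted, escaped JSON string:

```shell
# Hypothetical input; any JSON document would do.
input='{"name":"waffles","tags":["sweet","hot"]}'

pretty=$(echo "$input" | jq '.')              # default: indented over several lines
compact=$(echo "$input" | jq -r '. | @json')  # @json: everything on one line

echo "$pretty"
echo "$compact"
```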
Yeah. So that is another, so that is the first thing we want to do with programming in JQ is to have our stuff in separate files. That is already powerful.
The second programmer's trick I would like to bring you is debugging.

[36:38] So I always tell you to visually imagine the shape of the data that goes into one filter and then what you would like it to be like when it comes out of the next filter.
And that will be the input to the filter after. And so in your mind, you're watching the data transform.
But if your assumption about how your filter works does not match reality, your mental model is getting wronger and wronger and wronger as you go from filter to filter to filter, right?
So I have told you in the past to build them up slowly by deleting everything after the point in time in the filter and seeing what it looks like.
Wouldn't it be nice to be able to put a probe into that point in the filter and just see what you have?
Yeah. So you're an engineer, so you remember those electrical probes you could stick on various bits of a circuit board to see what's going on there.
There exists a JQ function which takes as its input anything.
Its output is always identically the same as what you gave it as an input.
But what it does is write whatever you gave it to standard error.
So it makes no change to the data. It just outputs it to stderr.
So you can stick that filter anywhere in your chain and it will have no effect except to show you what the data looks like at that point in your chain.
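
A minimal sketch of that probe, using jq's debug function on a made-up array. Standard error is sent to a temporary file here only so both streams can be shown side by side:

```shell
err_file=$(mktemp)

# debug passes the array through unchanged, so map still sees [1,2,3].
result=$(echo '[1, 2, 3]' | jq -c 'debug | map(. * 2)' 2>"$err_file")
probe=$(cat "$err_file")
rm -f "$err_file"

echo "stdout: $result"   # [2,4,6]
echo "stderr: $probe"    # ["DEBUG:",[1,2,3]]
```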

Introduction to Terminal Plumbing and Streams

[38:04] So it's a probe you stick into your query.
And it will appear on standard error as square bracket debug, and then some sort of value close the square bracket.
And if you send debug more than one thing, I think you'll get more than one thing here in the comma.
So to see what's going on, let's say, for example, we're going back to our...
We mentioned real quick that Taming the Terminal covered this, if people don't remember.
My apologies, because that's in my show notes. So I call this kind of a thing where you're messing around with standard input and standard error and all that, they're called streams on the terminal.
And you can mess with the streams, which I call terminal plumbing.
That is my own creation rather than official documentation.
But taming the terminal... Exactly. Taming the terminal 15 and 16 are called plumbing and crossing the streams. I may have been watching too much Ghostbusters.

[39:00] And they are episodes 15 and 16 and they talk all about the existence of standard in, standard out and standard error and how you manipulate them.
And so that is over on Taming the Terminal.
So we should say that the reason it's good that it's written to standard error, is because JQ's normal output is on standard out.
So your debugging isn't going to muck up your data.
So if you're writing some JQ to take one piece of JSON and turn it into another piece of JSON, it's coming from standard in to standard out, but your debug is on standard error.
Now by default they're both on your terminal but you could pipe standard out to a file or to another terminal command and standard error will still go to your screen.

[39:43] Or you can pipe standard error to a different file, your error file or whatever.
So you can poke and prod without mucking up your output.
So that's why it's nice that it's writing it to the other stream.
They're not on the same stream. That's the important thing. You're going to remind us in using this debug tool how to go look at standard error, I hope.
Well, by default, you're just going to see it, right? Because if you do nothing, standard error comes to the terminal window.
So by default, you're just going to see it. Oh, good. Which is nice.
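
To illustrate the two streams going to different places, here is a sketch with made-up input and a temporary output file. In everyday use a plain `> somefile` is enough and the debug line simply shows up on screen; the extra `2>&1` is only there so this example can capture what the screen would show:

```shell
out_file=$(mktemp)

# Redirections are applied left to right: stderr is pointed at the capture
# first, then stdout is sent to the file.
on_screen=$(echo '{"dessert":"waffles"}' | jq -r 'debug | .dessert' 2>&1 >"$out_file")
in_file=$(cat "$out_file")
rm -f "$out_file"

echo "screen: $on_screen"   # ["DEBUG:",{"dessert":"waffles"}]
echo "file:   $in_file"     # waffles
```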
So we're going to play around with an old JQ example where we are going to look at the command we built in the previous installment to render the details of the Nobel Prize for a friend of the show and Nobel Prize winner, Dr. Andrea Ghez.
So the filter basically starts with all of the prizes, explodes the laureates array, finds the one with the surname Ghez, and then it built a new string using string interpolation to basically say first name, surname, was awarded her prize for motivation.

[40:45] But I'm just going to stick a debug into the middle of the stream here.
So after the select for .surname double equals Ghez, I'm just saying pipe debug pipe.
So I just literally stuck it into, you know, we would have piped straight to the next filter, which does the string interpolation, but instead I'm just going to interject with a debug.
And when we do that, we see debug, ID 990, first name Andrea, surname Ghez, motivation for the discovery of supermassive blah, blah, blah, blah, blah, share 4.

[41:17] So that is the content of Andrea's dictionary as it exists just before we do the string interpolation.
So that tells us all the keys that are available and what their values are, which helps us to do the string interpolation more cleverly if we were to do it again.
So I think it's actually, oh, yeah, yeah, sorry. It's a dictionary, but it's put inside of an array called debug.
Right, because that's the format debug uses. Yeah. Yes.
So that shows you you've gotten into the right one. You're where you think you are. And what the keys are called.
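
A sketch of the probe in the middle of that chain. The data below is a tiny made-up stand-in for the Nobel Prizes file (the motivation text and the second laureate are placeholders), so only the shape matches what is described:

```shell
# Hypothetical, trimmed-down sample data.
data='{"prizes":[{"year":"2020","category":"physics","laureates":[
  {"id":"990","firstname":"Andrea","surname":"Ghez","motivation":"sample motivation","share":"4"},
  {"id":"0","firstname":"Some","surname":"Other","motivation":"placeholder","share":"2"}]}]}'

err_file=$(mktemp)
citation=$(echo "$data" | jq -r '
  .prizes[]
  | .laureates[]
  | select(.surname == "Ghez")
  | debug
  | "\(.firstname) \(.surname) was awarded her prize for \(.motivation)"
' 2>"$err_file")
probe=$(cat "$err_file")
rm -f "$err_file"

echo "$probe"      # the whole Ghez dictionary, wrapped as ["DEBUG:",{...}]
echo "$citation"   # the interpolated sentence
```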

[41:49] Ah, yeah, yeah. Can you end it with a debug?
Oh, sure, you can put them anywhere. It will show you whatever is currently there.
So you just keep sliding that along. I could have used this last week, Bart.
I know, that's one of the reasons I, that's actually, our play date was when I decided to rearrange all of my plans.
I was like, no, we need to move this forward. Okay, I like it.

Separating Standard Output and Standard Error in JQ

[42:13] So because both standard out and standard error go through the terminal by default, the default output there shows us the debug followed by, Andrea Ghez was awarded her prize for the discovery of whatever, right?
So it shows us the result of the JQ filters and the debug all together on the terminal because both standard out and standard error connect to the terminal.
But if we redirect standard out to a file called, say, citation.txt, then what we will see is that the output on screen is only the debug statement.

[42:46] And citation.txt contains only the output from the jq. So they are on separate streams, so they don't have to be mixed, which is the key point.
Right, right.
So that is, you know, just highlighting the importance of the terminal bit there. So there are lots of cool things you can do. So, you know, it is great that we can just put debug with no arguments and it will show us the value of dot, effectively.
But we can actually do anything we like.
We can pass the debug command an argument that is a string and it will print us out that string instead of the value of dot.
So we can build our own debug messages, which can use string interpolation to include other useful things.
And one of the most useful things is the function named keys, which when you give it a dictionary, gives you an array of all of the keys that exist in that dictionary.

[43:51] So we can say debug, open bracket, and then the string, we have the following keys, colon, then backslash, roundy bracket, dot, pipe, keys, close that roundy bracket. And that will pipe the current value of whatever it is, you know, wherever we are in the chain, to the keys function and stick all of that into our string. So now when you run the command, it will say debug, we have the following keys: firstname, id, motivation, share, surname.
A lot shorter to read. It's not filling it up with all of the English and stuff.
It's only telling us the keys.
So you can imagine if your dictionary was really complicated, which had like maybe one of the keys had like an essay in it.
You don't want it to show you the full dictionary, right? Just show me the keys, please. So the keys function is fantastically useful.
Another one I like to do, particularly when I'm doing a lot of...
Yeah. The keys, as it gives it to us, looks like it's almost in string interpolation format. It's backslash quote, first name, backslash quote.
Why does it look like that? Because that's how JQ decided to implement it.
So the backslash is.

[45:06] Escaping the quotes. Yeah, so technically speaking, it's debug, and its value is a string.
And that string contains an array of strings. So this is a valid string: we have the following keys, first name, surname, whatever.
But because it's a string that contains quotes, it's escaped them to make it a valid string.
Oh, I see what you're saying. And then this is back to why did this...
Okay, yeah, good. No, I like it. Gotcha. Yeah.
And actually, before I jump on to what I thought I was going to say next, I'm reminded that there is another function that the documentation mentions that I think is good to mention, is that you, sometimes you take one file and shove it into JQ, but you can take as many files as you like, right?
Because JQ says my first argument is my filter, unless you're using minus F, in which case it's a file name.
And then I can have a second argument is my first data file.
My third argument is my second data file. I can have infinitely many data files.
So if you're working with multiple data files in a single jq command, it might be nice to know which file the piece of data you're debugging is in.

[46:18] Am I looking at a piece of data from file number one or file number 50?
Where is this piece of data that's causing me an error? Where is my problem, basically?
So there is a function called input underscore file name, which will tell you the name of the file JQ is currently processing.
So if you include that in your debug statement, then you know where you are.
So as an example, there are two JSON files in the zip file, ip-bartb.json and ip-podfeet.json.
We used them a few installments ago.
They just contain a little bit of information about the IP addresses for bartb.ie and podfeet.com.
And if we give both of those to the jq command, we can see what input underscore file name does.
So my jq command is debug, processing the file, backslash, open bracket, input underscore file name, close bracket, comma, IP is, backslash, dot IP address. So basically it's a string interpolation that shows the input file name and the value of the IP address key. And the input to that jq command is ip dash star dot json, so the terminal will expand that out to be both of those files. Okay.

Using terminal plumbing to hide non-debug output

[47:33] Now, because I'm only interested in seeing the debug statement and I don't want my screen cluttered with the actual answer to the JQ, I am using terminal plumbing to send the standard out to dev null, which is the computer's black hole.
So the only output we're going to see is the debug statements, which is just for our convenience here, because that's what we're interested in.
And what you will see on debug is processing file ip-bartb.json, IP is 37.139 blah blah blah, processing file ip-podfeet.json, IP is 104.21.34. So it's successfully telling us which file we're currently working our way through, which could be very useful if you're working with a big folder of data.
I can barely keep track of where my problems are in one file, so if it's multiple files, and if they're formatted, you know, one's formatted incorrectly or something.
Yeah. Right. Yeah. I mean, you may have a piece of dirty data breaking everything, being able to debug it out and see what are you and where are you is very powerful.
So the, you know, the input underscore file name is definitely very useful.

[48:42] So there are actually quite a few functions that are useful for exploring data structures.
So I love using the length function in my debug statements, because if you give the length of something, then you shove it through a select to filter it down and then you do the length again, you would hope that you have a reasonable difference.
So if I know that there are about 50 valid things in my data set and I start off with 500 and then I have 500 left, it's like, oh, that select statement didn't work.
Whereas if I'm down to 50, it's like, yay, that select statement is reasonable. That's what I hoped, you know, or there should be half of these or whatever.
So just think, when I was doing the is not equal to null.

[49:26] And then if it kept being the same number, it's like, well, I haven't found it yet.
I would know that it was still not. I mean, I knew it wasn't working, but I didn't know where it was working all the time.
Yeah, so I do that a lot. Give me the length of the array, do something, give me the length of the array again, and then see if what I'm hoping to happen has happened.

[49:44] Another useful one, if you want to just sample a piece of data.
So you might have a data set with 5,000 records.
And if you debug the whole data set, you're lost.
You're just in a sea of data. So the functions first and last take an array and give you the first or the last element, depending on which function you call.
So if you have 500 laureates and you just want to see what a laureate looks like, just, you know, laureates pipe first.
So didn't we learn that we could do it with zero and minus one?
Or minus, was it minus one?
Minus one as an array index, you absolutely can do it with an array index, but sometimes it's nicer just to have something really Englishy. Yeah.
You just have debug first, debug last.
Yeah. The other one is limit, which takes two arguments, which is a number of answers you want, a maximum number of answers, and a filter to go and make you some answers.
And so limit, so you could say limit five and then explode the array, and then it will give you the first five elements of the array.
But it could be limit five, do a select, and then you could see, okay, well, this select is only giving me things that look this shape.

[50:59] You may not want a thousand of them, but you might want ten.
Well, I would have liked that a lot in this. I started writing it out to a file because I couldn't see, you know, just this sea of data coming out of the Nobel Prize one. That would have been nice, yeah.
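
A sketch of the sampling functions just mentioned, run against a made-up array of years rather than the real data set:

```shell
# Hypothetical array standing in for a much longer list.
years='[1901, 1935, 1950, 1999, 2020]'

first_year=$(echo "$years" | jq 'first')    # same as .[0]
last_year=$(echo "$years" | jq 'last')      # same as .[-1]

# limit takes a count, a semicolon, then the filter that produces the values.
sample=$(echo "$years" | jq -c '[limit(3; .[])]')
recent=$(echo "$years" | jq -c '[limit(2; .[] | select(. >= 1950))]')

echo "$first_year $last_year"   # 1901 2020
echo "$sample"                  # [1901,1935,1950]
echo "$recent"                  # [1950,1999]
```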

Checking for specific keys with the has function

[51:14] Yeah. Keys we've already mentioned, and has, which is the one you found with the help of Autopilot.
So has, and then takes as an argument a key name, will return true if the dictionary does have that key, or false if the dictionary doesn't have that key, which is very useful in select statements, but also in a debug statement.
Because if you're saying, well, I'm pretty darn sure all of these things should have a citation or whatever, and then you just pipe it, you know, has whatever you're looking for.
And if you see true, true, true, false, true, true, true, it's like, oh, my data set has a mess in it. Why is there a false in here?
And then you can do a select where whatever equals, you know, is false or whatever and have a look at it. But just being able to see a sea of trues versus a sea of falses can be very useful. So it has key name. I use a lot as well.
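
A sketch of hunting for the odd one out with has. The three prizes are made up, with one deliberately missing its laureates key. The last two lines are my own additions: all (met in an earlier installment) to collapse the checks into one answer, and a select with not to pull out the offending entry:

```shell
# Hypothetical prizes; the middle one has no laureates key.
prizes='[{"year":"2020","laureates":[]},{"year":"1916"},{"year":"2019","laureates":[]}]'

flags=$(echo "$prizes" | jq -c '[.[] | has("laureates")]')
every=$(echo "$prizes" | jq 'all(.[]; has("laureates"))')
odd=$(echo "$prizes" | jq -c '[.[] | select(has("laureates") | not)]')

echo "$flags"   # [true,false,true]
echo "$every"   # false
echo "$odd"     # [{"year":"1916"}]
```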

[52:02] A structure I will often do for myself, as I was saying, is to do a length before and after a select statement.
Now, that does mean that I need to turn it into an array before and after because the length function expects to be handed a whole array.
So if we say prizes, pipe it to debug length, that tells us how many prizes there were to start with. We haven't exploded them yet, right? It's just dot prizes to length. Then we pipe it into another filter which has open square bracket as its very first thing. Then inside that array we explode, so dot and then explode it, we pipe the exploded content to select, and then we close our array again.
So what we have now done is we have built a new array which only contains the things that match the select.

[52:58] But it's a whole array. We haven't, we've exploded it, but contained the explosion.
So we started with an array and we've ended with an array.
We don't have individual pieces. We actually have an array again, which is something we weren't able to do until the previous installment.
And up until the last time, when we learned about the square bracket syntax, once we exploded an array, there was never any way to put Humpty Dumpty back together again, which meant we went out of our way to not explode things we needed to keep in one piece.
Well, this is how you reassemble Humpty Dumpty. You just put it all in square brackets.
So if we do that to debug length, explode, select, recollect, debug length, what you will see is that the amount of Nobel Prizes that happened before 1950, we go from 670 total Nobel Prizes to 245, that were before 1950.
Okay. Yeah, yeah, yeah. So, very useful. As I say, if I'm.

[53:55] If I'm working with a big data set, I like to use first and last, just to see.
First and last are useful because if you're dealing with edge conditions, if the first one is outside of what you wanted, that's actually going to be very obvious, right?
So if you want stuff less than or equal to 2000 and the first one is 1999, it's like, oh, that's an edge problem.
Or if you wanted, you know, less than but not equal to 2000 and the last one is 2000, well, that's an edge problem. So first and last are great for looking for edge cases.
Just to verify for yourself that you really are bounding things correctly.
Like it should have been a less than or equal to, but you used a less than or something. Yeah.
And also you can debug. So if you have, if the filter that you pass contains an and also operator, then the function runs.
The comma. Yeah. Yeah. Yeah, so it'll do it twice, basically, inside your debug.
And we can use the limit to give us a bigger sample.
So we can say debug limit five semicolon to separate our arguments and then explode the prizes, and then we will be debugging just five prizes.
Okay, let me think. So you were talking about the and also operator, the comma, but you didn't use a comma in that one.

[55:23] Yes, I did. Debug first comma last.

Debugging with "first" and "last" together

[55:30] So in my example of using first and last, I actually use them together.
So I say debug first comma last.
Okay, sorry. And then I show the output. That one was long. It was scrolled over. I couldn't see it. Okay, so...
Okay, okay, got you.
And as I say, that gives us the bounds, right? The first and the last.
Okay, that's good. So we see those twice.

Using keys function with dictionaries

[55:57] When working with dictionaries, I like to use the keys function.
So if you want to get the keys for the prizes dictionary, we can say debug first pipe keys.
So we take our prizes array, we take the first one, and then give me the keys of it. So they often go together as well.
I could have used that a whole lot sooner. Right.
This whole installment, I was thinking, gosh, we should, we could have used this a whole lot sooner. Yeah. But, but I wouldn't have known to care.
Right. Right. I wouldn't be as excited about this because I wouldn't have been going, oh, I got to open it up again.
Let me look up, you know, open the whole file and look through it and read it. And this is much better.

[56:35] Yeah. And as you've already discovered, we can use has to tell us whether or not something has a key. But a very powerful thing: so we've already met the all function, which will return true if all of its arguments evaluate true. So you can use all in conjunction with has to make sure that every dictionary has a laureates array.
Interesting. Yeah. So instead of getting a whole list of true, false, true, false, true, false, it'll either be all true, right, hopefully, or if it's all false, like, oh, there's dirty data here. At least one of these things in my data set is not like the others.
I got some work to do. Sing it, sing it, Bart. One of these things is not like the other, not like the other one.
So at this stage we're doing pretty well for ourselves. So we now have our jq filter as multiple lines, nicely laid out, so the more complicated it gets, we don't get lost. We have the ability to add a little probe into the various points in our ever more complicated jq filter, right, because they're growing here and they're not going to get smaller when we learn more. So we can lay them out so we don't confuse ourselves, and we can probe them with debug. If you put it in a file...

[57:51] Wouldn't it be great to be able to give it some arguments into the file?
It's like, if I'm searching for Andrea Ghez's Nobel Prize, how different is the logic to find Marie Curie's Nobel Prize?
It's identical logic, just somewhere in my query is going to be the string G-H-E-Z instead of C-U-R-I-E, or whatever way it's spelled.
Okay. Okay, so wouldn't it be great to be able to pass in some parameters into our JQ filters that are now nicely separated out as separate files?
And the nice thing is you can.

[58:31] So this is going to be our first encounter with a much bigger topic we will come back to in a few installments, which is variables in JQ.
So this is a use of variables in JQ.
There are many other uses of variables in jq, but jq is an interesting language, because the jq documentation makes it very clear that, unlike in other programming languages where the first thing you learn is variables and the most fundamental thing is variables, in jq variables are considered an advanced feature and discouraged unless you're doing something suitably advanced.
And I'm going to quote you from the documentation because I have discovered something while writing these show notes.
The documentation for specific functions is quite, I would say, not beginner friendly is me being polite, but the documentation explaining the philosophy of JQ is actually very good.
So, this is what the documentation tells you about JQ's approach to variables.
Variables are an absolute necessity in most programming languages, but they're relegated to an advanced feature in JQ.

[59:49] In most languages, variables are the only means of passing around data.
If you calculate a value and you want to use it more than once, you'll need to store it in a variable.
In JQ, all filters have an input and an output, so manual plumbing is not necessary to pass a value from one part of a program to the next.
Many expressions, for instance, A plus B, pass their input to two distinct sub-expressions.
Here, A and B are both passed the same input, so variables aren't usually necessary to use a value twice.
For instance, calculating the average of an array of numbers requires a few variables in most languages.
At least one to hold the array, perhaps one for each element or for a loop counter.
In JQ, it's simply add slash length.
Now, add is a function that takes as its input an array and adds all the elements.
Length is a function that takes as its input an array and tells you its length.
If you add all the elements and divide it by the length, you get an average.
So that is the full JQ filter for averaging an array. Add slash length. Wow.
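
The averaging filter from the documentation, tried on a made-up array:

```shell
# add sums the array (20), length counts it (4), and both see the same input.
avg=$(echo '[2, 4, 6, 8]' | jq 'add / length')
echo "$avg"   # 5
```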

[1:01:10] No variables, right? Absolutely no need for any sort of variables or loops.
And that is, the JQ documentation on loops also says, don't use these most of the time. You won't need it.
Do you know who wrote this? This was the team that finally, after a year and a half of arguing together, two camps, you know, Sally wanted variables and Joe didn't. And they went back and forth, back and forth.
And finally, Joe was like, okay, fine, but we're going to make them feel stupid if they break down and use them. Make them feel lesser. Like, you're not understanding the spiritual philosophy if you break down and use it. That means there's something you didn't understand. It's a sign.
That's funny you say that, because the closing sentence is definitely that sentiment. So, there is generally a cleaner way to solve most problems in jq than defining variables. In other words, if you're defining a variable, double check that you're not doing things the hard way, because you might well be doing things the hard way.

Variables in JQ for passing values to filters

[1:02:10] Still, sometimes they do make things easier.

[1:02:14] Fine. So you can build variables in the body of your filter.
But that's what the documentation just told us is usually not necessary.
We're not going to do that yet.
There is a way to do it. There's a whole operator for it. It's called the as operator. We will meet it in our very, very last JQ installment.
I have saved it to the end in keeping with the spirit of the documentation.
The last installment will be called Advanced JQ and we will cover variables in Advanced JQ.
But that's good because we won't grow to lean on them.
Precisely. And I don't want people to because there's usually a better way, add slash length, right?
Every time you're thinking, I need a variable, think add slash length.
And you go, well, maybe you don't.
But a really useful place for them is to take arguments from the jq terminal command to pass a value to a jq filter in the jq language.
And there are flags on the terminal that allow you to specify variable names and values that you can use in the filter.

Understanding Variables and Functions in JQ Language

[1:03:23] So I'm going to give you a practical example because that sounds like a word salad.
What I do also need to tell you is that inside JQ, so when you're in the JQ language, so inside your .jq text file, all variables, their names are prefixed with the dollar symbol. That's how jq knows it's a variable.
Function names are just bare names.
Variables are dollar something.
So it's always dollar. So it's not x, it's dollar x.
And now I'm going to confuse you. Because in the terminal, the dollar symbol has meaning.
So it would be very awkward if the jq terminal command made you put the $ in.
Because then the terminal would think the $ was meant for it, and it would just break everything.
So the jq terminal command does not use the $, but when the variable appears inside your jq file, it will have its $.
And it's a case of, don't break things in the terminal to fix things in jq.
Do the right thing in both places.
Okay. It's confusing, but sensible.
So there are actually four ways of passing arguments into the JQ.
So from the JQ command into a .jq file, and we're going to intentionally ignore two of them.

[1:04:53] So there are flags called minus minus args and minus minus jsonargs, and they will produce positional arguments, they're called.
And they're really awkward to get to, because they don't go into your code as like $1 or something.
They go into your code as $ARGS, in all caps, .positional, open square bracket, their index.
So the first positional arg appears in your code as $ARGS.positional, open square bracket, 0.
The second one appears as $ARGS.positional, square bracket, 1. That is horrible.
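
For reference, a sketch of what those positional arguments look like in practice, with made-up values. The filter is given before the flag because everything after --args or --jsonargs is treated as a positional value:

```shell
# --args: every remaining argument becomes a string.
first_arg=$(jq -n -r '$ARGS.positional[0]' --args waffles pancakes)
second_arg=$(jq -n -r '$ARGS.positional[1]' --args waffles pancakes)

# --jsonargs: every remaining argument is parsed as JSON.
parsed=$(jq -n -c '$ARGS.positional' --jsonargs 42 '[1,2]')

echo "$first_arg $second_arg"   # waffles pancakes
echo "$parsed"                  # [42,[1,2]]
```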
So we are going to use named arguments for our own sanity.
So we're going to do them using minus minus arg and minus minus argjson, and they're going to let us make a named variable. These guys are a little weird. We're used to terminal commands having options that are minus minus something, space, one value. This is minus minus something, space, one value, space, another value. The first value is the name, the second value is the value. So, to pass an argument named x, you say minus minus arg, space, x, space, the value for x.

[1:06:10] Okay, we're getting real abstract here on what we're talking about.
I know, I'm seconds away from pulling this in, but I've got to tell you it, and then I've got to show you it.
I'm a little closer. Okay, keep going. You're a little closer.
So the two arguments are minus minus arg, space, name, space, a string value, which means it will appear in your JQ code as a string. So of type string.
The other thing you can do is you can pass it in a data structure as JSON.
So minus minus argjson, space, name, space, needs to be followed by some JSON.
So if you want to pass an array, you would use minus minus argjson.
Because that way you can specify an array, or a dictionary, or whatever you'd like. So that's the difference in minus minus arg and minus minus argjson.
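
A sketch of the difference, passing the same made-up text both ways. The variable name `list` is just for illustration, and the type function is used to show what jq thinks it received:

```shell
# --arg: the value arrives as a string, even if it looks like JSON.
as_string=$(jq -n -r --arg list '[1,2,3]' '$list | type')

# --argjson: the value is parsed, so it arrives as a real array.
as_json=$(jq -n -r --argjson list '[1,2,3]' '$list | type')
total=$(jq -n --argjson list '[1,2,3]' '$list | add')

echo "$as_string"   # string
echo "$as_json"     # array
echo "$total"       # 6
```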

Using Minus Minus Arg and Minus Minus Arg JSON

[1:06:59] So, what if we wanted to pass a variable named dessert with the value waffles?
Because that's the kind of thing we do here.
So we can say jq, now I'm using minus n to say don't wait for any input, right?
So jq minus n, and then I'm going to give it minus minus arg, space dessert, space waffles.
So whatever is in my filter, there will exist a variable named dollar dessert that will have the value waffles.
I'm going to then make use of the fact that I now know about the debug function.
And I'm going to debug the interpolated string, I like, backslash, round bracket, dollar, dessert, close round bracket.
And that will debug out nicely, I like waffles.
So we didn't give JQ a file, and we didn't give it, all we gave it was arguments.
There's nothing for it to be querying. Right.
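As a rough sketch, the dessert example can be typed like this. One simplification is assumed here: the interpolated string is printed as ordinary output rather than wrapped in the debug function, so the result lands on standard output.

```shell
# -n: no input at all; the filter only uses the named argument.
# -r: print the string without surrounding quotes.
# On the terminal side the name is dessert; inside the filter it is $dessert.
message=$(jq -n -r --arg dessert waffles '"I like \($dessert)"')
echo "$message"    # I like waffles
```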

[1:08:00] Right, I said no input. Yeah, I know. I know you said that, but what is it?
If you're using JQ, you're not querying a JSON file.
You wouldn't query. I'm just doing a debug to show you... Nothing, I'm just doing a debug to show you the variable working.
I tried to simplify it, and I think I've confused you by trying to make the example simple.
Well, good, I just didn't know you could use JQ with no input.
But we've done that repeatedly. When we were demoing the math, did I reorganize that into next week? I may have reorganized it into next week.
I don't remember the order of my own show notes anymore. Do you know about the post operator?

[1:08:35] I don't think so. I don't think you do. But it's possible. Remember, now we're working with my memory, so that's not a good test, Bart.
Okay. So anyway, so we're able to just not pass it.
JQ, we're not telling it to query anything.
And we haven't written a filter. We have.
The first argument is the filter, debug, open round bracket, quote. That's the filter. So debug is a filter. Okay.
Well, debug is a function, and this filter consists of a call to the function debug, if we're going to be really persnickety about it. Okay. All right.
So this shows us we're able to use $dessert inside the JQ, but you use dessert when you're talking to terminal, without a dollar.

[1:09:22] There we go. There are the two takeaways. We have made a variable named dessert with the value waffles.
On the terminal side, no dollar. On the jq side, it has a dollar. All right. So minus minus arg always makes a string. So if I say minus minus arg n 42, and then I build a string that asks is it greater than 100, and in it I do a string interpolation of $n greater than 100, the output will be true.
42 is greater than 100.
What? Because it's the string 42. String.
Precisely. And numbers come after letters in alphabetical order, is that right? No, 4 comes after 1 in alphabetical order.

Comparing Strings and Numbers: Alphabetical vs Numerical Order

[1:10:22] So the string ba comes after the string a, a, a, a.
The string 42 comes after the string 1, 0, 0, 0. Oh, okay, if you're comparing it to numbers. Okay.
If you alphabetically compare numbers, 42 is earlier in the alphabet than 100.

[1:10:43] In the same way that a, a, a, a. Then why did it come out true?
Well, because 42 is after 100. So greater than, yes, it is.
You just said it was before 100, alphabetically.
I, okay. I'm sorry. I am dyslexic. I made a mess of that. Anyway, the point is it's a string comparison.
Okay. I know you've tried that before, but it still hurts my head.
Right. Alphabetically, it's the wrong way around. Yeah. Because you shouldn't compare numbers alphabetically.
That's why I'm so cranky that the Nobel Prize people put the years in as strings instead of numbers. Well, we learned tonumber.
Exactly. So how do we get numbers in as arguments?
And the answer is we tell JQ that we should treat 42 as a piece of JSON, not as a string, by using minus minus arg JSON.
Then 42 will go in as 42, which means is 42 greater than 100 becomes false.
As it bloody well should.
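Side by side, the two behaviours can be sketched as follows. Writing the 100 as the string "100" in the first filter is an assumption made here so that both sides of the comparison are clearly strings; the variable name n follows the example as spoken.

```shell
# --arg makes $n the string "42", so this is a string comparison.
as_string=$(jq -n --arg n 42 '$n > "100"')
echo "$as_string"    # true

# --argjson makes $n the number 42, so this is a numeric comparison.
as_number=$(jq -n --argjson n 42 '$n > 100')
echo "$as_number"    # false
```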

[1:11:53] In your show notes, you say alphabetically quote 42 unquote sorts after quote 100 unquote.
Should that 100 be without quotes around it? No.
Well, no, because it's doing a string comparison, so they're both strings.
That's what's actually happening there.
And quote 42 unquote sorts after quote 100 unquote. Yes.
42 as a string sorts after 100 as a string, even though numerically it's before.
Okay. I don't see why. C is after A. A, 42 is after 100.

Understanding the Order of Words in a Dictionary

[1:12:42] Okay. I have to think about why. I can't see it. Okay. I thought I understood that one before.
In your dictionary, does the three-letter word bad come before or after the one-letter word do?
Are you saying... Sorry, the two-letter word do.
So bad would be before do? 100 is before 42? No, but bad is before do because it starts with a b, not a d, not because of the number of characters. 100 is before 42 because it starts with a one. Even though it's got three digits, that doesn't matter. It starts with a one. Okay, okay, there, I saw it. I bet that wasn't interesting to anybody but me. Okay. All right.
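The dictionary analogy can be checked with jq's sort function. This is only an illustrative sketch; the word list is taken from the conversation and the array layout is an assumption.

```shell
# Strings sort character by character: "100" starts with 1, "42" starts with 4,
# and "bad" starts with b, which comes before the d of "do".
strings_sorted=$(jq -n -c '["do", "42", "bad", "100"] | sort')
echo "$strings_sorted"    # ["100","42","bad","do"]

# Numbers sort by value, so 42 comes before 100.
numbers_sorted=$(jq -n -c '[100, 42] | sort')
echo "$numbers_sorted"    # [42,100]
```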
So now when we, if you give it arg json n 42, you say arg json allows it to be a number?
Correct. So the variable named n is now coming in as the json 42, which is the number 42.
Because in json syntax, 42 is a number, not a string.
So then when we do a comparison, it is a numeric comparison.

Optional Variables and the Annoyance of Full Names

[1:14:01] Now one very last thing. What if you want to have a variable that you don't always want to have? So what if you want to be able to pass a variable sometimes? And you'll see why in the worked example. If you just try to do it, you can't. Basically, try as you might, you can't say if dollar my variable name is null.
It will just break. If you ever, ever, ever use a variable name and it doesn't exist, it's happening at compile time, it explodes your script.
So you can't have an optional variable name using the variable's short name. It's really annoying.
I tried every which way from Sunday. It doesn't work.
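A minimal sketch of that failure, assuming a hypothetical variable named flavor that is never passed on the command line. jq rejects the filter before it runs anything, so even a null check never gets a chance to execute.

```shell
# $flavor is never passed with --arg, so jq refuses to compile the filter.
if jq -n '$flavor == null' 2>/dev/null; then
  outcome="ran"
else
  outcome="failed to compile"
fi
echo "$outcome"    # failed to compile
```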
The only way to have optional variable names is to use the variable's long name.
So when I make a variable named x, it appears as $x, which is nice and short.
It also appears as $ARGS.namedx.

[1:15:06] So $ARGS.named is a dictionary. Wait, space? No, .x. Okay, you didn't say the dot.
That's what I was talking for. $ARGS.named.x. Exactly.
So $ARGS.named is a dictionary where all of your named arguments go.
And so $ARGS is the variable which always exists. So then the question becomes does or does it not have a particular key, which the has function will tell us.
So an optional variable is going to be $ARGS.named.nameofkey.
So you can say has on $ARGS.named and then the name of your key.
So that does allow you to have optional variables. It's just long syntax, but it works, and you can use the alternative operator to get a default value.

[1:15:49] So if you say $ARGS.named.somearg slash slash some default, that will always evaluate to either the argument the user passed or your default.
And so then you can use optional variables inside your JQ.
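Put together, the long-name approach might look like the sketch below. The argument name minYear and the default of 1901 are hypothetical, picked only for illustration. The variable is written in capitals as $ARGS because jq variable names are case-sensitive.

```shell
# minYear is not passed here, but $ARGS always exists, so this compiles.
has_min=$(jq -n --arg search Curie '$ARGS.named | has("minYear")')
echo "$has_min"        # false

# The alternative operator // supplies a default when the key is missing.
default_min=$(jq -n --arg search Curie '$ARGS.named.minYear // 1901')
echo "$default_min"    # 1901

# When the argument is passed, its value wins over the default.
given_min=$(jq -n --arg search Curie --argjson minYear 1911 '$ARGS.named.minYear // 1901')
echo "$given_min"      # 1911
```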
So I've now been very, very abstract.
The whole thing I've been getting you to is searching for laureates by name.
So let's write a JQ script that we can search not just for friend of the show, Dr. Andrea Ghez, but anyone's Nobel Prize.
So we already have our code for searching for a particular person.
So as a starting point, we have that code, which is in the file pbs160b-0, which searches for Marie Curie in this case, by saying, take the prizes, explode them, select where any of the laureates have either a first name or a surname that contains Curie.
So we basically say first name, surname, stick them into one string and then check if it contains Curie.
Does that make sense? Yes.
So that code allows us to search for Curie.

[1:16:56] We can run that script. So we can say jq minus f pbs160b-0.jq NobelPrizes.json.
And that will tell us that the Curies won three prizes: 1903 for physics, and 1911 and 1935 for chemistry.
So let us replace the string Curie with our variable, which I'm going to name $search.

[1:17:22] So pbs160b-1 is identical to the other file, but the contains, instead of saying contains the string curie, now says contains search, $search.
So to use this, we would now say jq minus f pbs160b-1.jq minus arg search Curie.
Thank you. That is very important. It won't work otherwise.
Minus minus arg space search space Curie space NobelPrizes.json.
So that will now run our script. No dollar in front of that search.
Jeez, that's going to drive me crazy. I know. I know.
I did warn you. So that will find exactly the same answers. Yay! Now, what happens if we try to use our script with the search curie with a lowercase c?
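For anyone without the data file to hand, here is a self-contained sketch of the same idea. The sample JSON is a tiny, abridged hand-made stand-in, and its field names (prizes, laureates, firstname, surname, year) are assumptions about the shape of the real file. The filter is a hypothetical reconstruction of the approach described, not the contents of the actual script.

```shell
# Abridged sample data in the assumed shape (co-laureates omitted, years as strings).
sample='{"prizes":[
  {"year":"2020","category":"physics","laureates":[{"firstname":"Andrea","surname":"Ghez"}]},
  {"year":"1911","category":"chemistry","laureates":[{"firstname":"Marie","surname":"Curie"}]}
]}'

# Hypothetical filter: keep prizes where any laureate's joined name contains $search.
filter='.prizes[] | select(any(.laureates[]; "\(.firstname) \(.surname)" | contains($search))) | .year'

# Terminal side: search, no dollar. Filter side: $search.
upper=$(printf '%s' "$sample" | jq -r --arg search Curie "$filter")
echo "$upper"    # 1911

# A lowercase c matches nothing because contains is case-sensitive.
lower=$(printf '%s' "$sample" | jq -r --arg search curie "$filter")
echo "$lower"    # prints an empty line
```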

[1:18:16] You get no results. Boo! This gives me the opportunity to highlight a very important programming trick and to teach you a new function.
The function ascii_downcase will convert a string to lowercase.
If you ascii_downcase both sides of a contains operator, then it will do a case-insensitive search.
Both sides of it? You have to pipe? Right. ascii_downcase pipe to your contains pipe to ascii_downcase?
So ascii_downcase pipe to contains, and inside the contains, $search pipe ascii_downcase.
Oh, you're saying you're ascii_downcasing the input and you're ascii_downcasing what you're searching.
Bing, bing, bing. Okay. Much better said than I did. Okay.
Yeah, so if you do, if you convert everything to lowercase and then do your contains, it will find it regardless of case, which you can prove to yourself by running the dash two version of the file and searching for Curie with a lowercase C and it works.
Okay. So we have successfully made a script that takes an argument named search so we can search for any Nobel laureate once we have figured out how to search for one Nobel laureate.
Which is an example of why you want variables.
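The lowercase-both-sides trick can be sketched like this. As before, the sample data is an abridged hand-made stand-in with assumed field names, and the filter is a hypothetical reconstruction rather than the real dash two file.

```shell
# Abridged sample data in the assumed shape (co-laureates omitted, years as strings).
sample='{"prizes":[
  {"year":"2020","category":"physics","laureates":[{"firstname":"Andrea","surname":"Ghez"}]},
  {"year":"1911","category":"chemistry","laureates":[{"firstname":"Marie","surname":"Curie"}]}
]}'

# Lowercase the joined name AND the search term before calling contains.
filter='.prizes[]
  | select(any(.laureates[];
      "\(.firstname) \(.surname)" | ascii_downcase | contains($search | ascii_downcase)))
  | .year'

lower=$(printf '%s' "$sample" | jq -r --arg search curie "$filter")
echo "$lower"    # 1911

shouty=$(printf '%s' "$sample" | jq -r --arg search CURIE "$filter")
echo "$shouty"   # 1911
```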

Challenge: Adding Minimum and Maximum Year Arguments

[1:19:36] So, your challenge now. Yes. Sally was right when Joe didn't want to have variables.
Exactly. So when you break your code out, you do want variables.
So your optional challenge, should you choose to accept it, is to take my example of searching for Nobel laureates by name, and make it a bit cleverer by optionally having a minimum year and a maximum year.
And if they don't provide a minimum year, then it should be all prizes from the start until perhaps a maximum year.
And if they don't provide a maximum year, it should be all prizes from either the minimum year or the beginning.
So you should be able to find all of the curies before 1911 or all of the curies between 1901 and 1912, right?
So basically a minimum and or a maximum year, both optional.
But you always have to give it a search string. Okay.
I think I understand that.

[1:20:41] And actually, I've just realized I didn't make it. Okay, so actually, for full credit, you don't have to make them optional. For full credit, you just have to make it always work with search min and max.
And for bonus credit, make min and max optional. Sorry, I forgot to say that.
Oh, wait. Isn't the first...
Just don't do min and max that would make them optional.
No, no, sorry, I misspoke. So the actual challenge for full marks is just to make it work where there are always three arguments.

[1:21:13] So don't worry about making them optional. Just solve the problem of there will always be a search string and a minimum year and a maximum year, and make it work that way. And then, if you can, make it more flexible and make the last two optional. So for full credit, you don't have to do the difficult part of making them optional. I misspoke there. Okay, got you. Boy, this was a lot. This was. But on the one hand, it's only three things. But it's three things that enable us to write arbitrarily complex JQ without killing ourselves, without driving ourselves insane.
So we can now put our JQ in a separate file, lay it out however we like, stick in little probes to tell us what's going on where, and pass in arguments.
They're the three things we've learned today, but they allow us to script with JQ.

Explanation of Syntax for Multiple Variables

[1:22:11] Okay. Okay.
Did you explain what the syntax should be for more than one variable?
So minus minus arg space name of variable space value can be used as often as you like.
You just do it again? Just minus minus arg? Just do it again. Just minus, okay.
Yeah. It's weird. Sorry, I know I skipped saying it because I was going to say it's weird for two reasons and I told you one reason and then I carried on.
So the first thing that makes it weird is that it's minus minus value value.
Normally it's minus minus one thing.
But in this case it's minus minus one thing and then another thing.
And also normally you have one minus F or one minus whatever.
But here you can have as many minus minus args as you like.
Which is weird. So it is weird.
Yeah, that is a very strange way to do it.
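As a small sketch of repeating the flag, with the hypothetical variable names first, last, and year chosen only for illustration:

```shell
# Each --arg or --argjson takes exactly two values: a name, then a value.
# The flags can be repeated and mixed as often as needed.
summary=$(jq -n -r --arg first Marie --arg last Curie --argjson year 1911 \
  '"\($first) \($last), \($year)"')
echo "$summary"    # Marie Curie, 1911
```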

[1:23:05] Okay. I think I understood all of this.
Got a little stuck in the alphabetical order of numbers, but other than that, I think I pulled it off.
Which is not technically JQ related. That is, the lexical sorting holds true in all programming languages and confuses all programmers forever.
It is a fun way to make first-year programmers in CS100's heads explode.
So I think I finally got it. It's because the four comes after the one.
It didn't have anything to do with how many of them it was. Okay.
Yeah, it just looks so wrong. Your brain is just like, no! Oh, how, how, how?
But it's because you're doing it because you're treating them as letters, not as numbers.
Great, great. Lexically. All right.

[1:23:49] That is going to do it for now. And then next time, we're going to get to learn about all the cool functions, which is going to make our JQ filters even more powerful, where we get to do really fun stuff like deduplicating arrays and sorting arrays and lots of fun stuff that JQ can do.
But that is all coming up next time. And until then, happy computing.
If you learn as much from Bart each week as I do, I'd like you to go over to lets-talk.ie and press one of the buttons over there to help support him.
He does 98% of the work here. I'm just the stooge that listens to him and asks the dumb questions.
If you go over to lets-talk.ie, you can support him on Patreon, you can donate via PayPal, or you can use one of his referral links.
I really hope you'll go over and help him out.
In the meantime, you can contact me at.

[1:24:39] Music.

CCATP_2024_02_04

An audio podcast where Bart Busschots is teaching the audience to program. Associated tutorial shownotes are available at https://pbs.bartificer.net.
