Generated Shownotes
Chapters
0:00:07 Introduction to Chit Chat Across the Pond
0:00:32 Diving into JQ Programming with Bart Busschots
0:01:11 Manipulating Data with JQ in Programming
0:01:41 Exploring Mathematical Operations in JQ
0:01:48 Understanding Assignment Operators in JQ
0:02:06 Enhancing JQ Filters with Additional Variables
0:04:33 Reviewing PBS 160 Challenge Solutions
0:04:38 Different Approaches to PBS 160 Challenges
0:07:33 Dealing with Optional Parameters in JQ
0:14:07 Introducing Mathematical Operators in JQ
0:22:36 Leveraging Min and Max Functions in JQ
0:23:51 Power of MinBy and MaxBy in JQ
0:24:23 Expanding Functionality with C Math Functions
0:28:01 Exploring Additional Math Functions in JQ
0:44:47 Assignment Operators and Values in JQ
0:53:29 Overloading Operators in JQ
0:59:23 Transforming Nobel Prizes Data with JQ
1:05:26 Manipulating Arrays and Dictionaries in JQ
Long Summary
In Episode 787 of Chit Chat Across the Pond, hosted by Allison Sheridan, Bart Busschots joins as a guest for Programming by Stealth. In installment 161 they explore the use of jq as a programming language. Topics include manipulating numbers, strings, arrays, and dictionaries, as well as math functions like abs, floor, and more. Borrowing mathematical functions from the C library is also covered, with a focus on rounding functions like ceil and round. Bart and Allison explain how these functions behave in JQ, especially when dealing with negative numbers.
The discussion shifts to utilizing JQ for cleaning messy data in JSON files to streamline manipulation processes. Starting with assigning values to variables, they progress to using assignment operators within the processed items. Emphasis is placed on data cleaning to avoid logic issues, with examples provided such as adding keys, ensuring prizes have laureates arrays, and standardizing display names. String manipulation functions like ASCII downcase and upcase are introduced, alongside challenges to sanitize Nobel Prizes data. The importance of addressing data problems early on to prevent complications later is stressed, with a nod to the future relevance of JQ skills in projects involving JSON APIs. The speaker underscores the critical role of data understanding and organization for effective data manipulation through JQ.
Brief Summary
In Episode 787 of Chit Chat Across the Pond, Bart Busschots joins Allison Sheridan to explore programming with jq. They discuss manipulating data, using math functions, and cleaning JSON files with jq. Emphasis is placed on data cleaning and early problem-solving to prevent complications later on. The importance of understanding data for effective manipulation with jq is highlighted.
Tags
Episode 787, Chit Chat Across the Pond, Bart Busschots, Allison Sheridan, programming jq, manipulating data, math functions, cleaning JSON files, data cleaning, problem-solving, understanding data
Transcript
[0:00] Music.
Introduction to Chit Chat Across the Pond
[0:08] Well, it's that time of the week again. It's time for Chit Chat Across the Pond.
This is episode number 787 for February 17th, 2024.
And I'm your host, Allison Sheridan. This week, our guest Bart Busschots is back with Programming by Stealth.
And this time we're on 160 of X. We are learning a lot, Bart.
161, actually. We're one further than you think. Oh, no. 161.
All right. Well, I will fix the show notes now.
Diving into JQ Programming with Bart Busschots
[0:32] Okay. Your show notes, because my show notes say 161. Okay, I'm sorry. I had 160 up because I was working on the homework right up to the last minute. So I will show notes now too; that'll help even more, right? Well, I mean, if we do last week's, I'll be way better. I'll have practiced. But I read all of this week's show notes, so I'm raring to go. This is gonna be fun. Okay. Well, last time we learned how to use jq as a more traditional programming language, which means that we are now at liberty to write more complicated filters, because we can spread them over multiple lines, we can put them in a separate file, and we can throw in debug statements.
Manipulating Data with JQ in Programming
[1:12] And that sets us up nicely to learn about the many, many, many ways in which JQ can manipulate numbers, strings, arrays and dictionaries.
And it's so many things it can do, it's going to take us more than one installment.
So today we're going to get started by looking at how to do math, which is not fun, but it is important.
What are you talking about? Math is the best.
I never enjoyed it. I did manage to do well enough, but I never enjoyed it.
Exploring Mathematical Operations in JQ
[1:41] I know lots of mathematicians shouting at me, I didn't like biology either.
And I know lots of you don't like my beloved physics. It's all good.
Understanding Assignment Operators in JQ
[1:48] Then we are moving on to assignment operators, which is allowing us to set something equal to something, which we have not actually done yet.
We've created dictionaries from scratch, but we haven't actually said, here's a dictionary, now make the year be 2022 or whatever. We've only made stuff from scratch.
Enhancing JQ Filters with Additional Variables
[2:07] Well, there is a halfway house between taking it as it is and making it from scratch: it's manipulating what's there already. And then we're going to finish with some functions for messing with strings, which should get us through today. But of course I set you a challenge at the end of the previous installment, which was to some extent to practice this idea of having your code in a separate file and laying it out, and also to get used to the idea that you can pass variables into the code once it's in a separate file, so that your separate file is generic, because you can use it to search for Marie Curie or, did you say there was someone with Bart in their name who won a Nobel Prize? Yeah, there's, what's this character, Derek Barton. So I did a search for Bart. I was looking for a name that maybe might be in there twice but not a million times. Like, I tried Robert, but that was too many. I wanted to just look for a few, you know. Right, yeah, and I think Feynman only has one prize. So yeah, the second one I did was Feynman. That's why I thought, okay, Feynman's got to have a bunch. I was like, why does Feynman only have one? And Einstein only has one, so he's in good company.
[3:17] Anyway, in PBS 160, the final example of the day was where we had a separate file that would search the Nobel Prizes by name, but it would search all the prizes.
And so the challenge was to add two additional variables so that you could say, well, actually, I want all the prizes that match this name between a minimum year and a maximum year.
So the new part was two extra variables and adding extra logic into your select.
And there are, of course, an infinite number of correct ways to do that.
I chose to work within one select.
And so I took the code that was working in the example and inside my select, I just said and, and then one piece of logic, and, and another piece of logic.
So I just basically turned it into a three condition select with the and operator.
And what I added in was a check for dot year piped to number being greater than or equal to min year piped to number, and dot year piped to number being less than or equal to max year piped to number.
And that applies the appropriate filter.
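The three-condition select described here might look something like the sketch below. The sample data is a hypothetical, flattened stand-in for the real Nobel Prizes file (whose structure is not spelled out in this conversation), and the argument names search, minYear, and maxYear are assumptions based on how they are spoken.

```shell
# Hypothetical, simplified sample data: one record per laureate, years stored as strings
prizes='[{"year":"1903","surname":"Curie"},{"year":"1911","surname":"Curie"},{"year":"1965","surname":"Feynman"}]'

# One select with three conditions joined by `and`: name match, minimum year, maximum year.
# The parentheses around each (.year | tonumber) matter because `|` binds very loosely.
filter='map(select(
  (.surname | ascii_downcase | contains($search | ascii_downcase))
  and ((.year | tonumber) >= ($minYear | tonumber))
  and ((.year | tonumber) <= ($maxYear | tonumber))
)) | map(.year)'

matches=$(printf '%s' "$prizes" | jq -c --arg search curie --arg minYear 1900 --arg maxYear 1910 "$filter")
echo "$matches"   # ["1903"]
```

Only the 1903 record survives, because the 1911 record fails the maximum-year test and the 1965 record fails the name test.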
Reviewing PBS 160 Challenge Solutions
[4:33] Which is enough for full marks.
Different Approaches to PBS 160 Challenges
[4:38] Very good. I took mine just a little bit differently. I did a select the year to number greater than or equal to the min year.
[4:49] And then and so I did the and inside of a single select, if that makes any sense.
And I did that first. And then I looked for the laureates within whatever data it found at that point.
Yes. So I actually did one select, whereas you did two. So I have my select for checking for my laureates and the year and the year, whereas you have a select for the years piped to another select for the laureate.
Right, right. It's the same logic, right? Yours uses far fewer parentheses, though. I kind of like that. I'm not going to lie.
There's a lot of parentheses when you do a select and an and inside of a select, where the things, the year to number has to be, all those things need to be in parentheses to be able to always be right. I noticed those parentheses got important later.
Yeah. Yeah, they do. They definitely do. So that was enough for full credit.
But for bonus credit, the question was, could you make it so the parameters are optional? You always have to give the name you're searching for, but could you give only a minimum year and say, I don't care when after 1942, but I want everyone named Bub after 1942? Or could you give only a maximum year, say a max year of 50 or whatever, or everyone in the last century, say a maximum of 2000 or 1999?
[6:12] Or maybe both or neither of those conditions at the same time.
So allow both conditions to be left off or one to be specified or both to be specified.
And the mildly annoying thing with how JQ handles variables is that it replaces the variables with their value at the very, very, very start of compiling your script.
So if you mention a variable that doesn't exist, whether or not the logic will ever cause that line to execute, it doesn't matter.
If the variable is in there, JQ gets very, very, very confused and goes, and stops.
Just error, go away. Don't know what that variable is. Never heard of it. There is no max year.
And so the way around that is to use the longer version of our argument names.
So if we pass in minus minus arg and then give it a name and a value, or minus minus argjson and give it a name and a value, we get a shortcut, which is the name with the dollar prefixed.
But it also exists in a long form, which is $ARGS in all caps, dot named, dot whatever we called it.
So if we're going to make it optional, we have to talk to it with its full name.
We can't use the $ shortcut because that will exist sometimes, won't exist other times.
Dealing with Optional Parameters in JQ
[7:33] And if it enters our code anywhere, JQ will be very cranky if it's missing.
[7:38] And then there are now an infinity of infinity of ways of dealing with the optionality.
So you found a fairly elegant way, actually, that I think reads easier than mine.
But I did it my way and wrote my show notes. So I'm going to go with my show notes.
Well, I like yours because it gives us the fun of explaining how to run everything backwards because JQ doesn't have the command that you want. I like this.
We're going to let Bart explain it first, and then I'm going to see if I can replicate the explanation.
Right, so my approach for the first part, the full credit part, was to have three conditions inside one select: check my name, and check my min year, and check my max year. And so in order for that to pass, I need to have true and true and true. So I figured what I want to happen is that if there is no minimum year, I want that second and to always be true, and if there's no maximum year, I want the third and to always be true.
[8:44] So to make that happen, I was like, OK, so how can I detect the presence of minYear?
Well, args.nameD, if it has minYear, then it was passed.
So OK, args.nameD piped to the has function with the argument minYear will give me true if it's present.
[9:05] Which is the opposite. I don't want to short circuit things when it is there.
When it is there, I actually want to check it.
So I needed to make it be false when it was there. So I piped the has to not because there is no has not.
So I figured you take a has and you pipe it to a not. Hey, presto, we have a has not.
So that makes the args.named has was true when you say not. Now you've turned it into a false.
I've turned it into a false. And I have that on the left-hand side of the alternate operator, which means that it will evaluate to false if the argument was passed. And the rule for the alternate operator is that if the left-hand side evaluates to empty, null, or false, then it will do whatever's on the right and return whatever's on the right. So what do I have on the right? I have dot year to number greater than or equal to args named min year to number. In other words, on the right is the logic.
So if the argument is present, then the right happens and the answer will be either true or false depending on the logic.
If the argument is not present, the answer is true, so the thing on the right never ever happens and the AND is auto-satisfied, and the same logic is then duplicated for the max year.
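As a rough sketch of the approach Bart describes (not his actual solution file), the has-piped-to-not test can sit on the left of the alternate operator `//`, with the real comparison on the right. The sample data is hypothetical and the name search is left out to keep the focus on the optional years. This assumes a jq version that provides `$ARGS.named` (1.6 or later, as far as I know).

```shell
# Hypothetical sample data; only the year matters for this sketch
prizes='[{"year":"1903"},{"year":"1911"},{"year":"1965"}]'

# $ARGS.named holds every --arg/--argjson value as an object, so a missing
# argument is just a missing key instead of an undefined $variable.
# Left of // : true when the argument was NOT passed (condition auto-satisfied).
# Right of //: the real comparison, only reached when the argument WAS passed.
filter='map(select(
  (($ARGS.named | has("minYear") | not) // ((.year | tonumber) >= ($ARGS.named.minYear | tonumber)))
  and
  (($ARGS.named | has("maxYear") | not) // ((.year | tonumber) <= ($ARGS.named.maxYear | tonumber)))
)) | map(.year)'

none=$(printf '%s' "$prizes" | jq -c "$filter")
min_only=$(printf '%s' "$prizes" | jq -c --arg minYear 1910 "$filter")
both=$(printf '%s' "$prizes" | jq -c --arg minYear 1910 --arg maxYear 1950 "$filter")
echo "$none"       # ["1903","1911","1965"]
echo "$min_only"   # ["1911","1965"]
echo "$both"       # ["1911"]
```

The extra parentheses around each has-not test are deliberate: `//` binds more tightly than `|`, so without them the right-hand side would run against the boolean rather than the prize record.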
[10:27] Okay, stop, stop. One more. So say that last thing again. So if args dot named... I don't know why you keep saying name-d. I think of it as named. I'll tell you why I keep saying name-d: because the DNS server that is the most popular DNS server in the world is called named, pronounced name-d, for name daemon, and I am so used to seeing name-d that I just say name-d. It's named, you're right. You know, for an audio podcast, it's actually probably more clear. People might hear args.name if we're slow, if we're late on that.
[11:01] So I'll call it name-d just for consistency here. So anyway, you've got args.name-d. You said if it's true that it has a min year, that's going to be true; the not turns it into a false, and if it's false or null, that means it does what the alternate operator says to do over on the right-hand side. Bing, bing, bing. Okay. But if it doesn't have a min year, no min year has been provided... Passed, yeah. Passed. That makes that false, the not makes it true, which means the stuff on the right-hand side doesn't happen. Correct. So what is the min year then? Okay, so the alternate operator returns the thing on its left or the thing on its right. So if it's true, the alternate operator returns true, so you get AND true.
[11:58] There is no min year, right? But if there's no min year, what does it use for its search criteria? It does the whole file.
No, true. No, true. So what does true mean? It means that it will not apply a minimum.
So that whole and is as if I deleted the and.
[12:20] Okay, that's what I said. I said, so does that just mean it doesn't have a minimum year so it uses the whole, uses everything up to the max year.
[12:29] Yeah, okay, sorry, now it also turns into true. Yes, sorry, that's, yes, yes, because it only goes in one direction.
So it goes to infinity in one direction, and the other one goes to infinity in the other direction.
And the reason I spent so much time doing that is because I didn't want to have to hard-code in the year.
For the minimum, that's not a problem, because unless someone invents a time machine, there will never be a Nobel Prize before the first Nobel Prize.
But there will hopefully be more in the future, so I didn't want to type in 2023. So I went out of my way to avoid putting in a maximum year. You know, it's funny, when I was writing mine I wasn't thinking about putting in the minimum and maximum years in the alternate operator. I was thinking about whether I wanted to narrow my search within. In fact, I could start doing other funny things with the way I did it, but I did hard code in those numbers because I was thinking that I wanted to manipulate them, as opposed to wanting them to be the default two ends of the current data file.
You're right, it won't be good next year, but the way I did it, I was able to mess with it and prove that it was fine, whether it would find our little friend Derek Barton, because I was searching for Bart.
I was able to manipulate both of those numbers and make it find it or not find it in order to test my logic to make sure it was working, and I think it works.
[13:57] So basically, I said select year to number greater than or equal to the args.named.minYear or 1970.
Introducing Mathematical Operators in JQ
[14:07] And then on the other one, args.named.maxYear or 2023.
Yeah, and that 1970 to make it match all possible back to infinity would need to be 1901, I think.
Oh, 1901, sorry. You were saying something about 1970. You're right. That would be 1901.
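Allison's alternative, falling back to hard-coded years when an argument is missing, could be sketched as follows. The sample data is hypothetical, and 1901 and 2023 are the defaults mentioned in the conversation; as noted above, the upper default would need updating as new prizes are awarded.

```shell
# Hypothetical sample data; only the year matters for this sketch
prizes='[{"year":"1903"},{"year":"1911"},{"year":"1965"}]'

# A named argument that was not passed evaluates to null,
# so `//` swaps in the hard-coded default year instead.
filter='map(select(
  ((.year | tonumber) >= (($ARGS.named.minYear // 1901) | tonumber))
  and
  ((.year | tonumber) <= (($ARGS.named.maxYear // 2023) | tonumber))
)) | map(.year)'

defaults=$(printf '%s' "$prizes" | jq -c "$filter")
narrowed=$(printf '%s' "$prizes" | jq -c --arg maxYear 1910 "$filter")
echo "$defaults"   # ["1903","1911","1965"]
echo "$narrowed"   # ["1903"]
```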
Yeah, okay, you showed me the sample you were still testing, and so you had a 1970 in there, and I got very confused, and we all got very confused. Okay, and I think I find three instances of Curie, which you said is correct. Maybe I'm only finding two. Oh no, sorry, three. Found three instances. Yeah, so I think it's working. Excellent.
That was a fun one. It wasn't as hard as the previous weeks.
[14:58] Yeah, the bonus credit was where the tricky bit was.
And I'll tell you what I did learn from that. Is jq pretty new?
Because the internet's not full of responses about this.
I typed in $args.named and didn't get any results.
Like, or I, I said, okay, all, uh, you know, um, uh, optional arguments, JQ.
[15:27] No, nobody's talking about it. Nobody on Stack Overflow. What is it, OpenAI, whatever, one of the ChatGPT kind of things, I asked them. No, they didn't know anything about it.
They just kept repeating how to do regular arguments.
I had very, I did not find much about it. I eventually figured it out, but.
[15:48] I think the majority of people use JQ for short little snippets on the command line.
So putting it into a separate file and making your jq complex enough to need arguments is rare. That's power user stuff. Look at us go, we're power users! We are power users. So I'm running around the house asking Steve if he wants me to analyze any JSON for him, and he said, what's JSON? Okay, how many installments back is that? Anyway, let us power on to some mathematical operators and functions. So, JQ gives us some operators that do math, it gives us some free out-of-the-box functions that do math, and it lets us steal some math functions from elsewhere.
So we actually get our math from three places with JQ. So, we will start with the most obvious, which is the operators.
So, like every other programming language we have met, plus means addition, minus means subtraction, the star symbol means multiplication, the forward slash means division, and the percentage sign means the modulo, which is the integer remainder when you divide.
So 5 modulo 3 produces 2.
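For reference, here is a minimal sketch of the five operators run from the command line. The `-n` flag tells jq not to read any input, which keeps one-line experiments like these short; the shell variable names are only for illustration.

```shell
sum=$(jq -n '2 + 3')          # addition
difference=$(jq -n '2 - 3')   # subtraction
product=$(jq -n '2 * 3')      # multiplication
quotient=$(jq -n '7 / 2')     # division (not integer division)
remainder=$(jq -n '5 % 3')    # modulo: the integer remainder
echo "$sum $difference $product $quotient $remainder"   # 5 -1 6 3.5 2
```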
[17:03] Right. That one I had a lot of trouble with the first time you taught it to us in the first language you taught us.
Yes. Something I'm going to draw your attention to is that there is an operator missing that you may have expected to see.
Plus plus and minus minus are not on the list. They are not supported in JQ.
[17:24] So that's got to make looping trickier. Oh, well you don't loop that way.
You've been looping all along, right? Every time you exploded an array you've been looping.
You've just never had to be explicit about it. This is not the loop you're looking for.
Precisely. Okay, precisely. So in the examples below, I'm going to make use of the minus N flag a lot of the time.
Oh, actually, that sentence has been left as a little island all by itself.
That sentence that says note the use of the minus N flag in the examples.
I don't know what example is that belong to. I don't know when I typed it, but it should be right there. I noticed that was in an odd spot, but I had gone off looking.
In fact, I asked my little AI, what's the minus n mean again?
And then later on, I came up to it. And then I was, I'll see if I can find the right place to put it. It should be just above the table.
Okay. So basically, instead of writing, having to put all the other stuff around it, you're just using minus n so you can make quick little examples, right? Correct. But I won't be doing that for some time.
[18:27] So anyway, the other thing I'm going to make use of in the examples to save us a lot of typing is I have two files sitting in the installment zip.
One is called numbers.json, which contains an array of numbers: minus 42, 0, 3.1415, and 11. And I thought you had said you were going to add an extra number into that file.
I did, but you said you didn't want me to change it that way, so I changed it someplace else.
Oh, okay. When there's negative numbers, things get weird. Negative decimals, things get real weird in some of these commands, but I built a little table.
You have pulled my changes, right? I have pulled your changes, so there will be a surprise when I get to them.
[19:05] I'll explain it. The other thing we'll be using is a file called menu.json, which I think we used in previous installments, which is an array containing dictionaries for three food items: the all-important hot dog, with a price of 5.99, and there are 143 of them in stock; there are pancakes, they're cheaper, and there's less of them in stock; and there are waffles, they're the most expensive item in this restaurant, and they're the rarest, at 14 waffles in stock. Something else to note, it's a little gotcha that caught me out.
I had sort of assumed that if you give these math functions a string they would give you an error to tell you you've done something silly. Some of them do.
Some of them don't give an error and don't do anything.
The classic example is the abs function for absolute value, which if you give it the number minus 9.999 will give you 9.999, which is the absolute value, but if you do it as a string... No, no, no. Oh, sorry.
Then you get back the string, minus 9.999, which is definitely not the absolute value you thought you were getting. I would rather it gave an error.
I would rather it get really cranky than silently do nothing.
[20:19] So my advice is always to number everything you're going to do math with, because if you're wrong about the type, you may be wrong in your math.
Worst case scenario, you've used up a couple of characters you've typed for your carpal tunnel syndrome allotment.
Precisely. And depending on what you're using JQ for, if you're piping it into some sort of command to send off to a Mars rover or something, you may do some serious damage if you get these things wrong.
Say feet instead of meters or something.
Anyway. So the functions we have at our disposal from JQ are abs for the absolute value, which means that our table, our array, our file of numbers becomes 42, 0, 3.1415, and 11, because minus 42 becomes 42, because that's what absolute value means.
The other one we have is floor, which will take a decimal number and it will give you a rounded down version of it.
So if we give it the number 3.1415 and we pipe that into floor, oh look, we use the minus n flag in our example. We get back three.
If we floor 9.99, which obviously if you rounded it would be 10, but by flooring it you get 9.
[21:37] We also have our square root, which outputs the square root of our input, and I don't seem to have an example. That's weird. Then you haven't pulled my changes, because I put in an example. I think you didn't get it. Maybe you haven't pushed, because I have both. I'll bar does that.
I just did one more change. Okay.
I definitely... Well, the show notes will have an example of square root, because ultimately there's no point in us talking about it, because it's just your pipe.
It's a squirt. S-Q-R-T. I love saying it a squirt.
But if you don't have that change, you might not have the rest of my changes.
I have a commit called add table comparing round, ceil, and floor.
[22:23] So I definitely have that change. Okay.
Well, we'll find out when we get there. So anyway, yeah, I just put in an example of take the square root of 16.
And now putting that in. That'll do. That will do.
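The floor and square root examples talked through above can be sketched like this, using `-n` so no input file is needed. The array form is just a compact way of flooring both numbers in one call.

```shell
# floor always rounds down, so 9.99 becomes 9 rather than 10
floored=$(jq -nc '[3.1415, 9.99] | map(floor)')
# sqrt takes its number as the piped input
root=$(jq -n '16 | sqrt')
echo "$floored"   # [3,9]
echo "$root"      # 4
```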
Leveraging Min and Max Functions in JQ
[22:36] We have a min and a max. They work on arrays, which is useful.
And if we had known about this, this would have been an alternative approach to trying to deal with not having to hard code in our years for our previous example, because we could have used the min function, or the max function, more importantly, to, you know, do some clever stuff there.
So they both take an array as their input and will give you back the minimum or maximum.
So if we take our array of numbers, the minimum is minus 42 and the maximum is 11.
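A small sketch of min and max, using the numbers read out earlier as inline data in place of the numbers.json file:

```shell
# Same values as described for numbers.json, supplied inline
numbers='[-42, 0, 3.1415, 11]'
smallest=$(printf '%s' "$numbers" | jq 'min')
largest=$(printf '%s' "$numbers" | jq 'max')
echo "$smallest $largest"   # -42 11
```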
The other very powerful one is min_by.
So we have an array of menu items.
So how do you do a minimum of a menu item? Well, with min_by, you tell it what key you care about.
So we can get the minimum priced, say.
Actually, no, it's the minimum stocked is what I have in my example.
So we can find out what we have the least of in stock by saying jq, we take dot, which is our array of dictionary menu items, we pipe it to min_by dot stock, and it will tell us that waffles are what we have the least of.
Power of MinBy and MaxBy in JQ
[23:51] That is very powerful, the min_by name of key.
And there is also a max_by name of key. So we can tell that our most expensive item is also the waffle.
By, again, piping dot to max_by dot price.
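Here is a hedged sketch of min_by and max_by against a stand-in for the menu file. The hot dog price and stock and the waffle stock come from the description above; the pancake figures and the waffle price are made-up placeholders chosen only to respect "cheaper, fewer in stock" and "most expensive", and the key names are assumptions.

```shell
# Stand-in for menu.json; pancake numbers and the waffle price are hypothetical
menu='[
  {"name":"hotdogs","price":5.99,"stock":143},
  {"name":"pancakes","price":3.10,"stock":43},
  {"name":"waffles","price":7.50,"stock":14}
]'

# min_by/max_by return the whole dictionary; .name then picks out one field
least_stocked=$(printf '%s' "$menu" | jq -r 'min_by(.stock) | .name')
most_expensive=$(printf '%s' "$menu" | jq -r 'max_by(.price) | .name')
echo "$least_stocked $most_expensive"   # waffles waffles
```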
Yeah, that seems really useful. Yeah, it lets us do very powerful things with arrays of dictionaries.
We can basically treat them as if they're arrays of numbers.
We've just got to tell it which key in the dictionary is the one to treat as the number and then keep doing your min and your max. Very powerful.
Expanding Functionality with C Math Functions
[24:24] The other thing, so I said that we can borrow some extra stuff from elsewhere.
Well, the thing we can borrow, this, yeah. So the thing we can borrow is the mathematical functions in the standard C library.
So if your computer is Linux, Mac, or Unix, you will have the standard C library installed, and if you have the Linux runtime for Windows, you will also have the standard C library installed. And the standard C library contains a whole bunch of math functions I don't even understand. I don't know what an atan is. I used to know, but I don't know anymore.
All of this mad array of functions all exist.
And you will find that they're in math, they're in, yeah, they're in the standard C library, so depending on your version of standard C library.
Anyway, there's way, way, way more of them than I know about. And.
[25:24] The documentation for JQ doesn't list all of the C math functions, because they're going to be slightly different depending on what version of Linux you have and what version of Windows you have and yada, yada, yada. So what it does is it tells you that there's a rule.
So you go read the documentation for your particular version of the C library.
And then there's a rule that says this is how the C function behaves in JQ.
[25:47] And the rule is, for reasons I cannot grok, different depending on how many arguments there are in the C side of things. And this is very, very annoying.
[25:59] So when there's one argument on the C function, like, say, ceil or round, then that one argument is expected on the JQ side as the input. So you have a number, you pipe it to ceil, and then that gets translated... By the way, before you keep going, he's saying ceil, C-E-I-L, as in ceiling. Yeah. Now start over, say it again, because I fear they're going S-E-A-L, what function is that? Okay, so yeah, the C people don't like long things. So floor is four letters, so what's the opposite? Well, it's ceil, C-E-I-L. Floor is five letters, but go ahead. That's a good point. Yeah, well, anyway, logic. Anyway, start it again. So you take a number, you pipe it to ceil. Yeah. And what JQ is doing behind your back, or for free, whatever way you want to look at it, is it's taking that JQ function and saying, oh, whatever you gave me as input, I'm going to put that as argument one, and I'm going to call C for you. And so the C documentation says, I provide a function called ceil, and I expect one argument, the thing you want me to round up. And so that's how it works for all one-argument C functions. So you might imagine that for a two-argument C function, the input would be the first argument, and then you'd provide one argument on JQ to become the second argument in C. Oh no, no, no, no!
[27:27] It just decides to ignore the input and it expects you to give both the C arguments as JQ arguments.
[27:36] So you're completely losing me because I know nothing about C.
So I don't know whether that's a requirement for this part of the class, because none of this sounds weird.
It sounds like you take a number, you pipe it to ceil.
There you go. Got that rounded up number from ceiling.
Okay. So then look at how POW works, and it's not the bloody same. Okay. What's POW?
Okay. So that's okay. We'll get to POW in a sec. So ceil.
Exploring Additional Math Functions in JQ
[28:02] So JQ gives us floor, which is a round down.
Sometimes you want to round up, so we need to steal that from the C library, so we take ceil.
And then the third thing you might want to do is just round like a human being.
So if it's less than 0.5, we round it down, and if it's more than 0.5, we round it up. And I can't remember where 0.5 goes.
It's in the documentation somewhere. Probably up. I believe it goes up.
So that is round, which we can also steal from the C libraries.
And again, we just pipe a number to round and it works just like you would expect.
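On the question of where exactly 0.5 goes: as far as I know, C's round sends halfway cases away from zero, so a positive half goes up and a negative half goes down. A quick check, assuming jq passes the number straight to the C function:

```shell
# Halfway cases round away from zero: 2.5 goes up to 3, -2.5 goes down to -3
halves=$(jq -nc '[2.5, -2.5] | map(round)')
echo "$halves"   # [3,-3]
```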
The last thing that you will likely want to steal from C libraries is a function to raise one number to the power of another.
[28:44] So if you want to square a number or cube a number or quadruple, not quadruple, that's the wrong word because that's multiplied by four.
Whatever the thing is for raising to the power of four. Raised to the power of four.
Okay. Okay. You need a function that can do that. And the function in the C library is called POW, P-O-W, short for power.
And in C universe, POW takes two arguments. What would you like to raise?
And how many times should I raise it?
In JQ land, the input to the POW function is ignored.
You have to specify as the first jq argument what you want to raise, semicolon as the second jq argument how high, which means you end up writing pow dot semicolon three.
[29:35] Because you had two piped to pow. Right.
So two is the input to the power, but now you have to tell it, okay, take that thing I just got, semicolon three, for the second argument. Yeah.
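To make the difference concrete, here is a minimal sketch contrasting a one-argument C function with the two-argument pow. Note that pow takes both values as jq arguments separated by a semicolon, so piped input only gets used if you refer to it explicitly with dot.

```shell
# One-argument C function: the number arrives as piped input
rounded_up=$(jq -n '3.1415 | ceil')
# Two-argument C function: both values are jq arguments
cubed=$(jq -n 'pow(2; 3)')
# Same thing with 2 piped in; `.` passes the input along as the first argument
piped=$(jq -n '2 | pow(.; 3)')
echo "$rounded_up $cubed $piped"   # 4 8 8
```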
Why? That makes no sense to me. Surely to goodness it should just put the two straight in as the first thing to C, and then we should only give it one argument to pow, which is how high we'd like to raise it.
But I don't know. Anyway, it broke my head.
It broke my heart. I thought the universe made no sense because all my math was wrong and then I reread the documentation.
It's like, oh, it's just weird.
So it bothers me that I have to read C documentation in order to write JQ.
[30:17] That's scary. I'll leave the ones you told me how to write.
To be honest, if you have a look, the JQ documentation has a table of all the most common C functions, and of all of that table, the only three that made any sense to me were ceil, round, and pow. The other ones were all scary. So, like, there is log to base two and natural log, they're all there, but obviously, yeah, have at it if you'd like. Okay, I'm gonna jump in here next, because I got caught out in writing my Time Adder app with it, and it's just a fact of the way these math functions work, that ceil, round, and floor do very logical things when you read them, but when you go to use them, they sound weird.
So ceil, as Bart said, is rounding up.
So if you take 9.999, you pipe it to ceil, it's going to go from that up to 10. Great.
What do you think happens if you start with negative 9.999?
Well, the next... Actually, should that be a negative.
[31:25] Yeah, I'm sorry. I've got the mistake in there.
It goes to negative nine because negative nine is a bigger number than negative nine point nine nine nine.
It just hits me all the time. I listen to a history podcast.
And when they're back in the B.C. times and they're talking about one thing being after another, it's all backwards.
And it makes my head explode every time.
Yeah. So I'm going to double check what I just said while we are live here, but now round does it differently. So if you round 9.9999, it's going to round up to 10. If you round negative 9.9999, it's going to round down to negative 10. So it kind of went the opposite way. And then floor does it a third way.
So if you floor 9.9999, that goes to 9.
If you floor negative 9.999, it goes down to negative 10.
So there's a table in, yes, it is.
[32:30] But holy cow, it's a mess.
Yeah, that first one, ceil of negative 9.9999 goes to negative nine, not positive nine.
So there's a little table in the show notes. You could just, you know, take a little screenshot of that, stick that where you save things.
And when you're ready to use any one of them, look at the table before you do it.
Because I have a hot mess in my math on my time adder app because I didn't get it. I was like, how could that be true?
Yeah. It does make sense when you draw it on a number line, but it does make my head hurt.
So thank you. Thank you for putting that in. Right.
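The rounding behaviour discussed here can be checked with a short sketch, assuming jq is installed. For each of the two sample numbers it builds a little row of ceil, round, and floor, in that order. The variable name table is only for illustration.

```shell
# ceil always moves towards positive infinity, floor always moves towards
# negative infinity, and round goes to the nearest whole number.
table=$(jq -n -c '[9.999, -9.999] | map([ceil, round, floor])')
echo "$table"   # [[10,10,9],[-9,-10,-10]]
```

The first row is for 9.999 and the second is for negative 9.999, which is where ceil and floor surprise people.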
So that is our math. That is all of our math. So the next thing we want to be able to do is we want to be able to assign values into things.
Now, in most programming languages, the place you start is by assigning values to variables, right?
That is like step one of our JavaScript journey all those years ago.
It was pretty close to step one of our Bash journey more recently.
[33:34] Whenever we move on to PHP, it's going to be step one of our PHP journey.
Whatever programming language you learn first in CS101, it's going to be assigning values to variables.
In JQ, that's not how we use assignment at all. In JQ, we use assignment to set values within the item currently being processed, i.e. for setting values within dot.
Not for setting variables, for setting values within dot.
Within dot, okay. So you pipe, or you have some input into a JQ filter, whether it came straight from a file or something.
It comes into a jq filter, and then inside the jq filter dot has a value, and you can mess around with whatever's inside dot using the assignment operator. So if you have, say, a dictionary representing a Nobel Prize, you could say dot year becomes equal to 42.
[34:36] But you're working within dot. Yeah, yeah, okay.
Now, most programming languages have one real assignment operator and a couple of helpful shortcuts to save you some typing, like plus equals and minus equals.
But there's only one actual assignment operator in most programming languages.
In the case of JavaScript, it was the equals symbol, which we learned to say becomes equal to, because otherwise we confuse it with the double equals, which means is equal to. So in JQ, do we get to call it becomes equal to here?
[35:13] No, I'm going to keep saying simple assignment, because it has a friend in JQ. Plain assignment. Plain assignment, thank you. But if I'm reading it out loud, can I say becomes equal to? Sure, yes. Thank you.
It has a friend. So our normal equals, i.e. becomes equal to, has a friend, which is called update assignment. So the normal equals we call plain assignment, and the other one is called update assignment, and its symbol is pipe equals. Regardless of which of those two operators you use, and we are going to look at them both in detail in a moment, the thing on the left must be a path within dot. So if dot is a dictionary, it could be a key. It could be a deep key, so it could be dot laureates zero dot surname. That's a valid path. But it has to be dot something that goes somewhere. So dot year is a nice simple one. Dot laureates zero would be the first laureate, and that's also perfectly valid.
But it has to be dot something.
[36:30] So, becomes equal to or update equals, and then whatever's on the right is the new value, which again is very sensible. Something gets a new value. And now let us look at plain assignment before we look at anything weird. In plain assignment, we simply say whatever we want to update becomes equal to some new value. So if we look at menu.json, it's an array, and the first element of the array is the hot dog.
So if we want to change the price of the hot dog, we could use dot open square bracket zero close square bracket, which gets us the first item in the array, dot price becomes equal to 4.20.
[37:08] So if we run that as a jq command, we say jq dot open square bracket zero close square bracket dot price becomes equal to 4.20, and we give it menu dot json.
The output will be hot dog, price 4.20, stock 143. The price was 5.99, I think. Yeah, I was excited about the hot dogs getting cheaper at that point. Yeah, I do a lot of price reduction, actually, in these examples. So we have said what we want to update, and then we give it a new price. That is how our plain assignment works. We can also use dot on the right-hand side of a plain assignment. So if we want to think like an airline, and we want to make our seats cheaper when there's more of them, then we could decide to update the price of our hot dog to become one-tenth of how many of them we have.
[38:06] So we could say dot open square bracket zero close square bracket dot price becomes equal to dot open square bracket zero close square bracket dot stock divided by 10.
And if you run that through, we now see that our hot dogs have gone up in price. Oopsie-daisies.
We're now 14.30, but once I get a new stock of many, many hot dogs, they'll go right down in price.
[38:30] Anyway, so you can see that we can use dot on both sides of the plain assignment operator.
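Here is a rough sketch of both plain assignments, assuming jq is installed. The menu variable is a hypothetical one-item stand-in for menu.json, and its key names and values are assumptions based on what is read out in the discussion.

```shell
# Hypothetical stand-in for menu.json: just the hot dog entry.
menu='[{"name":"hotdogs","price":5.99,"stock":143}]'

# Plain assignment with a literal value on the right-hand side.
cheaper=$(echo "$menu" | jq -c '.[0].price = 4.2')
echo "$cheaper"

# Plain assignment with dot on the right: dot is still the whole input array,
# so the full path to the stock has to be spelled out.
by_stock=$(echo "$menu" | jq -c '.[0].price = .[0].stock / 10')
echo "$by_stock"
```

In both cases the output is the whole array with just the one price changed, which is why the result can be redirected to a new file.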
Okay. Which means we can also use the plain assignment operator to reduce our price by a dollar.
But that becomes quite messy.
Dot open square bracket zero dot price becomes equal to dot open square bracket zero dot price minus one. That's very repetitive.
There must be a better way.
Of course there is because this is JQ and it's a data manipulating language.
So if you want to update a value based on its current value, so in other words I want to make the price less than it is now by one, that's the new value is directly based on the current value.
That's what the update assignment operator is for. That's why it's called update assignment. I want to change the value based on what it is now.
[39:29] So, and the difference is that when you use the update assignment, so pipe equals, on the left-hand side, nothing changes.
On the right-hand side, dot is no longer the item currently being processed.
It is the current value being updated.
Oh, oh, I'm going to forget that.
Oh, yeah. Yeah, this is why I'm going to, yeah, this is very, very important, or it will make your brain explode.
So if we want to reduce our price by a dollar, our code becomes a lot shorter because we can just say pipe equals dot minus one.
In fact, we can be really powerful if we explode our menu and then un-explode it, and inside our second pipe we can say dot price pipe equals dot minus one.
We can reduce all of our prices by a dollar.
[40:32] Okay, so let me say it out loud one more time. When we say dot price pipe equals dot minus one, I'm saying go look at price and now what's coming through with the pipe equals is the value in price not price.
Not the thing currently being processed. Not the dictionary or whatever.
So it's the value that's in price is what dot is, and I'm going to subtract one from that value. Oof. Yes.
Yeah. That's not nice. It's not nice, but it's very, very important.
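To make the difference concrete, here is a small sketch, assuming jq is installed. The two-item menu and its prices are invented for illustration and are not the real menu.json.

```shell
# Hypothetical two-item menu; names and prices are assumptions.
menu='[{"name":"hotdogs","price":6.5},{"name":"soda","price":1.5}]'

# Long way with plain assignment: the path is repeated on both sides.
long_way=$(echo "$menu" | jq -c '.[0].price = .[0].price - 1')

# Update assignment: on the right, dot is the current price, not the array.
short_way=$(echo "$menu" | jq -c '.[0].price |= . - 1')

# Explode the array, update every price, and re-assemble with square brackets.
all_cheaper=$(echo "$menu" | jq -c '[.[] | .price |= . - 1]')

echo "$long_way"
echo "$short_way"
echo "$all_cheaper"
```

The first two commands print the same thing. The third takes a dollar off every item because the update runs once per exploded dictionary.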
So I'm going to give you another example to help sort of bring the point home.
But I don't think we can emphasize this enough. So we're going to take a very simple dictionary that just has three keys, breakfast, lunch, and dinner, and I'm being rather unhealthy.
I'm starting my day with some pancakes and I'm having a delicious BLT for lunch and I'm finishing with some pizza.
This is not a diet I recommend, but it is tasty.
So if that dictionary were an input to a plain assignment, then the value of dot on the left and on the right is the full dictionary.
[41:40] So we could set the value of dinner to be the same as the value of breakfast by saying .dinner becomes equal to .breakfast.
And so that assignment would be to change the input dictionary to become breakfast pancakes, lunch BLT, dinner pancakes.
Also not a bad tasting meal.
And so if we run that... A little less red meat, probably. Yeah.
Cheese. If we run that, if we take that dictionary and we shove it through inside a JQ command, we can see this happening.
I'm not going to read out the code, but basically it's just the same filter I gave above.
Run through the dictionary I mentioned above. But if we do the same thing with a pipe equals, in other words, an update assignment, we get ourselves an error.
Because now on the right hand side, dot breakfast is meaningless nonsense.
Because the value of dot is pizza, the string.
And a string pizza does not have a .breakfast.
Hence you get the error.
[42:48] Well, yeah, .breakfast just doesn't mean anything at all to it.
Well, it actually says dot is a string, and you're asking me to get the value for the key breakfast inside the string, which is an error. So it will give you an error.
Because it says there is no property dinner, or sorry, there's no property breakfast of the string, whatever, pizza.
Okay.
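The meals example can be reproduced with a sketch like this, assuming jq is installed. The variable names are only for illustration, and the error text is sent to /dev/null because only success or failure matters here.

```shell
meals='{"breakfast":"pancakes","lunch":"BLT","dinner":"pizza"}'

# Plain assignment: dot on the right is the whole dictionary, so .breakfast exists.
plain=$(echo "$meals" | jq -c '.dinner = .breakfast')
echo "$plain"

# Update assignment: dot on the right is the string "pizza",
# and a string cannot be indexed with a key, so jq exits with an error.
if echo "$meals" | jq -c '.dinner |= .breakfast' >/dev/null 2>&1; then
  update_status="ok"
else
  update_status="error"
fi
echo "$update_status"
```

Dropping the redirect on the second command shows jq's own complaint about indexing a string.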
[43:14] Okay, that's taken me a minute, but I think I follow you now. All right.
So if we wanted to append to our dinner, we could do it the long way. We could say dot dinner becomes equal to, and then we give it a new string where we do string interpolation: backslash, roundy bracket, dot dinner, close roundy bracket, and nachos. Or we could avoid that by using an update, wait, update assignment. Update assignment, yeah. Only the pipe is missing from the show notes, which makes those particular show notes really confusing. So if we say dot dinner pipe equals, then we could just say backslash, roundy bracket, dot, close roundy bracket, and nachos, which is a lot nicer.
So why did we switch over to adding pizza and nachos instead of solving the original problem? We wanted to make it be pancakes for dinner.
No, I just wanted to show you the error. I just needed to create an error.
Okay, but could we have used the update operator correctly?
[44:30] Oh no, you can't use it at all. No, because if you're using the update operator, then the only thing you have available on the right-hand side is the current value of dinner.
So how can you get from there to breakfast? The answer is you can't.
So it's the update operator... It's just not made to do that.
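A short sketch of the nachos example, assuming jq is installed, showing the long way and the update assignment side by side. The variable names are only for illustration.

```shell
meals='{"breakfast":"pancakes","lunch":"BLT","dinner":"pizza"}'

# Long way: plain assignment, so the interpolation has to name .dinner again.
long_way=$(echo "$meals" | jq -c '.dinner = "\(.dinner) and nachos"')

# Short way: update assignment, so dot is already the current value of .dinner.
short_way=$(echo "$meals" | jq -c '.dinner |= "\(.) and nachos"')

echo "$long_way"
echo "$short_way"
```

Both commands produce the same dictionary. The update form only has the old dinner to work with, which is exactly why it cannot reach breakfast.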
Assignment Operators and Values in JQ
[44:47] That's it exactly, right? That is not the problem it solves.
Your seatbelt is useless at protecting you from an electric shock.
It's just not what it's for.
[44:59] Okay, so we also get a bunch of free shortcut operators.
So the update operator is very often used to add something to the current value, or to subtract something from the current value, or to multiply the current value by something.
So we get shortcuts which are plus equals, minus equals, star equals, slash equals, percent equals for modulus, and also alternate assignment, slash slash equals, which is very powerful.
And they are just shortcuts for pipe equals dot plus, pipe equals dot minus, pipe equals dot star, pipe equals dot slash, pipe equals dot percent, and pipe equals dot slash slash. They're just handy shortcuts. What we do not get from our list of handy shortcuts is plus plus or minus minus. So I'm afraid to say the best we can do is plus equals one and minus equals one. That could be worse. Which we can prove by using jq on the dictionary where a has the value of 2 and b has the value of 2.
Then we can pipe that to dot a plus equals 1 and pipe that to dot b minus equals 1 and then we get a3 b1.
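As a quick check of the shortcut operators, here is a sketch assuming jq is installed. The second command spells the same thing out with pipe equals to show the shortcuts really are just shortcuts.

```shell
# Shortcut form: add one to a, subtract one from b.
shortcut=$(jq -n -c '{"a": 2, "b": 2} | .a += 1 | .b -= 1')

# Spelled out with update assignment, where dot is the current value.
spelled_out=$(jq -n -c '{"a": 2, "b": 2} | .a |= . + 1 | .b |= . - 1')

echo "$shortcut"
echo "$spelled_out"
```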
[46:15] Okay, that could work. Yeah, a much more useful one is slash slash equals, because this allows us to put default values for dictionary keys that may or may not be present.
[46:30] So if you, say, had a big data set of Nobel Prizes, and they don't always have a laureates array, and that keeps breaking your code, you could say.
[46:44] Dot prizes exploded, piped to dot laureates slash slash equals the empty array, and then reassemble all that back into an array by wrapping everything in square brackets. Now what you get is your Nobel Prizes where, if there was a laureates array, it's left completely intact.
If there wasn't a laureates array, you now have an empty array.
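A minimal sketch of that default, assuming jq is installed. The nobel variable holds a tiny made-up data set with one prize that has a laureates array and one that does not; the real file is much larger and has more keys.

```shell
# Hypothetical two-prize data set, trimmed to the keys used here.
nobel='{"prizes":[{"year":"1921","laureates":[{"surname":"Einstein"}]},{"year":"1940"}]}'

# Default a missing laureates key to an empty array; existing arrays are untouched.
cleaned=$(echo "$nobel" | jq -c '[.prizes[] | .laureates //= []]')
echo "$cleaned"

# With the default in place, length is safe on every prize.
counts=$(echo "$cleaned" | jq -c 'map(.laureates | length)')
echo "$counts"
```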
And so you can do things like the length to figure out how many winners there were, and it won't throw an error whenever there were no winners. It will give you zero. That is nice. That is nice.
So you can use slash slash equals to tidy up dirty data. If you do that early in your jq script, then for all the rest of your code you don't have to worry about, do I need a question mark here, do I need to use the alternate operator here. You basically just default in whatever the heck you wish was there all along. If there is a value, it'll be left completely unchanged, but if there isn't, your default goes in, and hey presto.
Now, this next section is one for people with experience in programming in JavaScript and Java and other languages. If you don't have much programming experience, you would never have expected any programming language to be weird, so you can mentally tune out for five minutes.
But if you have come from another programming language, it's really important that we underline this point.
[48:09] In JQ, values are always passed, sorry, arrays and dictionaries are always passed by value, not by reference.
[48:23] In JavaScript, I spent an entire installment explaining how weird it is that when you say one array becomes equal to another array, that's not what you're doing.
You're copying the reference, so now you have two names for one array, and if you change one of them, you're actually changing both of them, and you get spooky action at a distance, and if you're not aware of pass by reference, the whole universe is nonsense and it's really confusing.
So they're not identical twins with two different names. They're the same kid being called two different names. Exactly. So that is passed by reference.
And JQ does never, ever, ever, ever, ever, ever do this, which means it behaves like a normal human being would expect things to behave.
And like a programmer wouldn't.
[49:11] A JavaScript programmer. Well, yeah, and not just JavaScript.
Most languages will use references for complex data structures like arrays and dictionaries.
So if you take some JavaScript from our distant past, say let a become equal to the array 1, 2, 3, let b become equal to a, and then you push 4 into a, when you console.log b, what you get is 1, 2, 3, 4.
[49:37] Which is passed by reference doing its magic.
JQ is not like that. So, if we make ourselves a little jq file called pbs161a.jq, we can start with a dictionary that has one key named a, that is an array.
So we say, open a dictionary, a colon, the array 1, 2, 3.
And now we pipe that to a filter that updates the dictionary to add a new key named b, which is a copy of a. We say dot b becomes equal to dot a.
Then we update dot a by adding an extra value to it.
We say dot a open square brackets three close square brackets becomes equal to four.
The output of that script will be a is one, two, three, four, because we added an extra value to A.
B is nicely 1, 2, 3. No spooky action at a distance. It's exactly how we left it.
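The same filter can be run as a one-liner rather than from a file. This is a sketch assuming jq is installed; it is equivalent in spirit to the script described, not a copy of it.

```shell
# Copy a into b, then append to a. Because jq copies by value,
# b keeps the three original elements.
result=$(jq -n -c '{"a": [1, 2, 3]} | .b = .a | .a[3] = 4')
echo "$result"   # {"a":[1,2,3,4],"b":[1,2,3]}
```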
[50:42] So as I say, programmers are going, oh my God, I'm going to have to remember that. And humans are going, why are you telling us it did the blindingly obvious?
Good for you to do the blindingly obvious.
Right. Let's wrap up for today. When you started these lessons, you would often go back through the history of the way things used to be.
And I never wanted to, I was just, I would zone out because I don't need to know what they used to be.
I only need to know what they are now. but now I need to know because you've taught us so much stuff that now we have to know what you used to know.
Yeah. And the thing is, we're still going to be using JavaScript, so we need to live in both universes.
We need to live in pass-by-value universes and pass-by-reference universes.
And that's what it is to be a programmer. You need to know which hat I have on. What kind of a universe am I in today?
Am I in one of these universes or one of those universes?
[51:32] Right. Let us wrap up today by looking at some nice JQ functions for transforming strings.
And this is another place where I get to talk basic computer science at you.
So when we were in JavaScript land, I may or may not have used the programming jargon overloading.
So when you overload something in a programming language, what it means is the same operator does different things in different situations.
So very, as we would expect, when you give the plus operator two numbers, it adds those numbers together.
JQ has overloaded those operators. So you can use the plus operator on strings.
You can also multiply strings together and, very weirdly, you can divide strings.
All three of those operators have been overloaded. I know, I know. So...
[52:34] The thing to bear in mind in JQ world, the thing has to be the same on both sides.
So when you use plus, you either need to give it two numbers or two strings.
If you give it one of each, it will be cranky.
So that means you use two numbers or two strings, so that you know what you're doing.
But assuming you keep a nice symmetry, jq will do different things depending on whether you give it numbers or strings. So, okay, adding two numbers: let's say we add the numbers 22 and 20, we get 42. If we add strings, what does it do? It concatenates them, which is very, very sensible.
So if we take a string plus another string, we get a string, another string.
Overloading Operators in JQ
[53:30] And if we mismatch our types, we get an error. So there's an example in the show notes of a string plus 42, which JQ promptly throws an error about.
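A brief sketch of the overloaded plus operator, assuming jq is installed. The -r flag prints the string without its surrounding quotes, and the variable names are only for illustration.

```shell
# Two numbers: ordinary addition.
sum=$(jq -n '22 + 20')

# Two strings: concatenation.
joined=$(jq -n -r '"a string" + " another string"')

# One of each: jq refuses and exits with an error.
if jq -n '"a string" + 42' >/dev/null 2>&1; then
  mixed="ok"
else
  mixed="error"
fi

echo "$sum"
echo "$joined"
echo "$mixed"
```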
[53:40] So what does it mean to multiply a string?
Now, yeah, okay, so now I've just kind of contradicted myself in a very annoying way. So for addition they have to match.
For multiplication, different rules.
The thing on the right has to be a number. Yeah, so the thing on the right must be a number. So you can multiply a number or you can multiply a string.
So if you have two numbers, math happens. Great.
If the thing on the left is a string and the thing on the right is a number, what you do is you concatenate that many times with itself.
So if you take the string ho and you multiply it by three, you get ho, ho, ho.
[54:22] You were proud of yourself when you thought of that idea, weren't you? Yes, I was.
I just picture Bart giggling by himself writing these show notes. Yeah.
So what does it mean to divide a string?
Well, in JQ, it means to split a string. Turn a string into an array.
So if you take the string 10, colon, 20, colon, 30, and you divide it by a colon, You get an array 10, 20, 30.
So you can take the string of a time, say, 30 seconds past 20 minutes past 10, and divide it by the colon, you get the array 10, 20, 30.
That is actually strangely useful. So dividing means splitting.
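Multiplying and dividing strings can be tried with a sketch like this, assuming jq is installed. Note that the repeated string comes out with no separators, and the pieces of the split are still strings, not numbers.

```shell
# String times number: the string repeated that many times.
repeated=$(jq -n -r '"ho" * 3')

# String divided by string: split on the separator into an array of strings.
parts=$(jq -n -c '"10:20:30" / ":"')

echo "$repeated"   # hohoho
echo "$parts"      # ["10","20","30"]
```

Piping the array through map(tonumber) would be one way to get actual numbers for time arithmetic.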
[55:09] Okay. Right. We then have some free string functions that we get.
So we have already met ASCII downcase, which is for converting strings of any case to being all lowercase.
It has a friend, ASCII upcase which will convert an entire string to uppercase.
We've already met ltrimstr for left trim string and rtrimstr for right trim string. So actually, goodness me, we've already accidentally met all of these functions apart from the last one, which is sub, which performs a string substitution.
So the input has to be a string and the output will be a string. You pass a regular expression as the first argument and a replacement as the second argument, and optionally you can even pass a set of flags as a third argument. So if you take as an input, say, the reversed version of the date, so 2023, 11, 12, colon, something, and we substitute 0 to 9, 4 times that, minus, 0 to 9, 2 times that, minus, 0 to 9, 2 times that, colon, and then we replace that with the semicolon, the output is going to, sorry, we replace that with the empty string, and.
[56:35] Bloody JQ with its semicolons as separators.
If we replace that with the empty string, we're going to suck out the thing that matched the regular expression, which was 2023, 11, 12.
So we end up with just something.
So in other words, we've replaced a timestamp with an empty string.
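Here is a sketch of all five string functions together, assuming jq is installed and built with regular expression support. The sample strings are invented, and the exact timestamp format (a date, a colon, then a space) is an assumption, since it is only read aloud in the discussion.

```shell
# sub: replace the first match of the regular expression with the empty string.
stripped=$(jq -n -r '"2023-11-12: something" | sub("^[0-9]{4}-[0-9]{2}-[0-9]{2}: "; "")')

# ascii_upcase and ascii_downcase: change the case of the ASCII letters.
upper=$(jq -n -r '"Hot Dog" | ascii_upcase')
lower=$(jq -n -r '"Hot Dog" | ascii_downcase')

# ltrimstr and rtrimstr: remove an exact prefix or suffix if it is present.
no_prefix=$(jq -n -r '"PBS161" | ltrimstr("PBS")')
no_suffix=$(jq -n -r '"menu.json" | rtrimstr(".json")')

echo "$stripped"
echo "$upper"
echo "$lower"
echo "$no_prefix"
echo "$no_suffix"
```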
[56:54] Okay. Which makes a lot more sense if you're reading than I'm betting than listening, but I can see what it's doing. Yeah.
Yeah. Yeah, and the reason I gave that example is because oh so often you need JQ to run through some timestamped data.
And the timestamp is useful to you as a human being for looking at stuff, but if you actually want to get the data, you need to make the timestamp go away.
And so this is actually quite a common substitution to want to do.
[57:23] Right, I don't know if listeners can hear, but there is a nasty cold doing the rounds in the Busschots household, and I thought I'd gotten away scot-free, and then my voice is going away, which makes me... Oh, you sound pretty good to me. So I know you said, oh goodness me, I've already talked about ASCII downcase and ltrimstr and rtrimstr, but having all five of those together I think is still valuable. Again, copy and paste this little table, put it where you save things that you need to remember.
[57:53] Precisely. And also the fact that the installment is named string functions or string manipulation will help you find them, you know, coming back in time. So yeah, the fact that we've seen sneak peeks is not a problem. So I have a challenge for you. Now that we know how to set new values inside dictionaries, I finally can give a nice solution to the fact that I don't like some of the choices the Nobel Prizes people made in their data set. So we are going to sanitize the Nobel Prizes using what we have learned today.
The first thing we are going to do is add a Boolean key with the name Awarded to every single prize in the dataset.
It will be true if the prize was awarded, and it will be false if the prize wasn't awarded.
Which means it's very easy then in future to filter out prizes.
[58:43] I only care about the years where they actually awarded the darn thing, so we can just say pipe awarded double equals true, and then we're good.
So we need to make a new key named awarded, which is going to be true or false, depending on some logic, to tell you whether or not the prize was or wasn't awarded.
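To show why the awarded key pays off later, here is a sketch of the filtering step only, assuming jq is installed. It deliberately starts from already-cleaned, made-up data, so it does not give away the challenge, and it assumes the select function is used for the filtering.

```shell
# Hypothetical cleaned data: each prize already carries the Boolean awarded key.
cleaned='[{"year":"1939","awarded":true},{"year":"1940","awarded":false}]'

# Keep only the prizes that were actually awarded, then pull out their years.
awarded_years=$(echo "$cleaned" | jq -c '[.[] | select(.awarded == true) | .year]')
echo "$awarded_years"
```

Because awarded is a real Boolean, select(.awarded) on its own would behave the same way here.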
You want us to change the NobelPrize.json file or create a new version of it?
Create a new version. So the input will be the current data set and the output will be a new data set, which you can send to another file or to the screen.
I don't care where you send it, but I guess the first thing is to make it right.
And then, yeah, if you use terminal redirection to pipe that to, I think I called my nobelprizes.clean.json.
Transforming Nobel Prizes Data with JQ
[59:23] Okay. Okay, but it's literally going to say awarded colon true, awarded colon false, based on what our automatic tools or tooling will do to find it. Yes, which means we're going to need to have a filter.
We're going to need to use an assignment operator, and on the right-hand side of that is going to have to be some logic that gives true when the prize was awarded and false when it wasn't. Oh, that sounds fun. The next thing we need to do is ensure that all of our prizes have a laureates array. Whether or not there was one there to start with, they should all have one. Okay, then empty if it's not awarded, but yeah, okay. Then the next set of cleaning up is a little step deeper inside our prizes. We have a laureates array which contains lots of dictionaries, which also contain what I consider to be dirty data.
So sometimes they give the prize to organisations instead of human beings.
So let's add a key named Organisation which would be either true or false depending on whether they awarded the prize to an organisation or to a human being.
[1:00:29] And a very much related issue is that when they give it to an organization, they give the organization a first name, but not a surname.
So that's how we can tell them apart. That's how you can tell them apart.
But also when it comes to printing out names, it's really messy because when the winner's a human, we print first name followed by surname.
But if they're an organization, that doesn't work. And then we have to mess around with the alternate operator.
So instead, we're going to add an extra key into every laureate's dictionary called display name.
If the laureate is a human, it will contain, pre-joined together, the first name and the surname.
If the laureate was an organization, it will just contain the organization's name.
So from now on, whenever we want to print out a name, we just use the display name and no more faffing about.
[1:01:17] So all of your faffing about is one big challenge, and then we'll have a much cleaner data set.
I like this. So if we had learned, if we had, once we've done this, we don't have to do anything we did in any of the previous challenges.
Yeah, pretty much. But we did have to learn a lot to be able to do that. Yes.
But that is, I'm going to slightly step out of sequence here.
So these episodes are pretty timeless, but they are recorded in a specific time.
And when we started this recording, I had begun to make a lot more use of JQ in my professional career because an awful lot of real world APIs give their data in JSON format.
And with my new role and work, I'm spending a lot more time dealing with outputs from various APIs and things.
And I'm finding myself doing a lot of JQ for real.
A good example being the data set or the exports whenever your organization gets, Yeah, the exports from Have I Been Pwned are available in JSON.
And I disagree with some of Troy Hunt's choices as well.
Oh, the way the data's drawn? The way the data's drawn, yeah. Constructed? Okay.
[1:02:27] I have discovered in the real world that writing JQ filters is a game of two halves. The first half is cleaning the data.
The second half is doing whatever it is you actually wanted to do.
And if you jump to the second half first, you will do a lot of swearing and end up writing really messy JQ with question marks and all sorts of things all over the place.
But if you spend a few, you know, a few pipes cleaning things up, then you can just apply your logic to your heart's content and you will be so much less stressed.
So that is kind of why we've ended up with this particular challenge because I've discovered that in the real world, this is what you want to do.
Make the data nice and then you can just have at it.
Then you can query it for whatever it is you need.
[1:03:10] I think part of the hard part though is you come across the messy data when you're trying to do the thing you're trying to do.
Yes. Like you don't realize it's messy until you do that.
[1:03:22] So you have to kind of run into the wall. Oh, okay, well, let me clean that data. Okay, now I'm going to run into the next one, let me clean the data before writing goopy... Yeah. So what I used to do when I ran into the wall, because you're right, you will run into the wall, is I would fix it at the point in my pipe of filters where I hit the wall. But now I fix it at the front. So whenever I run into a wall, I front-load the fixes, and then I don't have to mess up my logic. That's it. Okay. I just pre-fix it. I've discovered this works way better, and I do a lot less swearing now, because I didn't mention at the start that I knew just enough JQ to be dangerous. I knew just enough JQ to almost do what I want and then be very frustrated. I am having a much better time. I think I sent you a very happy message with a nice big JQ script from work. I was like, this is the real world, and I just used all the stuff I learned while teaching Programming by Stealth, which was very pleasing.
I like it. Hey, could you try to, let's say, do you want to get a million dollars? Why don't you learn how to get a million dollars and teach us? Yeah, if I figure it out, I will definitely do a series, Taming the Million Dollars. There you go.
All right, this is great. This is a lot of fun. I'm surprised at how much I like this. I'm not sure I'll ever end up using it, but I'm really enjoying learning how to do it, and this is a fun data set.
[1:04:48] I'm going to make a wee prediction. So we're having a lot of fun with our other, other hat on. The community is having great fun with XKPasswd.
But one of the things that's going to be happening in the future is there's going to be an API version of XKPasswd as well as a web version.
And I think that might be talking some JSON.
[1:05:05] Oh, okay. All right. I may be looking at a bigger picture and I may be setting us up for some more cool, fun stuff after we learn PHP.
So anyway, I think you will be using JQ. There we go.
[1:05:17] There we go. Right. Oh, I should probably set the stage.
So we are now, we have learned math, we have learned assignment, and we have learned to mess with strings.
Manipulating Arrays and Dictionaries in JQ
[1:05:26] So the next two obvious things to mess with after numbers and strings is, of course, arrays and dictionaries.
So in the next installment, like we did for strings here, we're going to discover that JQ has provided us with a whole bunch of functions for doing fun stuff with arrays and with dictionaries.
And the plus operator and some of its friends have also been overloaded for arrays and dictionaries.
In JQ, you can indeed add and subtract arrays and dictionaries.
Well, stay tuned till next time for that weird, weird universe.
Right. Until then, folks, happy computing.
If you learn as much from Bart each week as I do, I'd like you to go over to lets-talk.ie and press one of the buttons over there to help support him.
He does 98% of the work here. I'm just the stooge that listens to him and asks the dumb questions.
If you go over to lets-talk.ie, you can support him on Patreon, you can donate via PayPal, or you can use one of his referral links.
I really hope you'll go over and help him out. In the meantime, you can contact me.
[1:06:36] Music.