PBS_2024_08_06

An audio podcast where Bart Busschots is teaching the audience to program. Associated tutorial shownotes are available at https://pbs.bartificer.net.

2024, Allison Sheridan
Chit Chat Across the Pond

Automatic Shownotes

Chapters

Introduction
Decision-making Process
Addressing Changes and Fixes
JQ as a Command Line Tool
Supporting JQ on Windows
Introducing FQ
FQ Functionality
FQ as a Pretty Printer
FQ and Binary Formats
Enhancements in FQ
Exploring XQ Project
Evolution of JQ Usage
FQ Development Inspiration
Benefits of JSON Piping
Terminology and Jargon Challenges
Exploring FQ and CLI Interface
Defining Language Semantics
Acknowledging JQ Maintainers
Embracing the Open Source Philosophy

Long Summary

The recent interview on Programming by Stealth featured Matthias Wadman, a key figure behind JQ, who shared insights into the development and maintenance of the tool. Matthias revealed that while there are around eight administrators associated with JQ, the core team of active contributors consists of about five individuals, including himself, Nico, Emanuel, and itchyny. Contributors come and go as they identify areas for enhancement, with about five long-term contributors driving ongoing development and fixes.

Matthias discussed the decision-making process within the JQ project, emphasizing the importance of consensus among maintainers when considering changes to the language. The team adopts a conservative approach, prioritizing bug fixes and addressing behavioral issues to ensure stability and script integrity. Despite challenges in maintaining compatibility with existing scripts, the team remains dedicated to responsibly enhancing the tool's functionality.

Further insights included JQ's composition, with a substantial portion written in JQ itself and some aspects relying on external frameworks or languages for specific functionalities. Matthias highlighted the historical transition of JQ's initial version from Haskell to C, emphasizing the project's technical evolution. He also discussed JQ's compatibility with Windows systems, detailing the compilation process using MinGW for Windows support to ensure consistency in the command line interface across platforms.

The conversation expanded to introduce FQ, a complementary tool to JQ that enables parsing and transforming various binary file formats into JSON-like structures. FQ's decoder-centric approach allows for in-depth analysis of binary structures like image files, PDFs, and network capture files, showcasing its versatility in decoding complex data structures. Matthias emphasized FQ's role in simplifying the debugging process by providing detailed data representations and contextual information to enhance the user experience.

The podcast discussion centered on using JQ and FQ for debugging and filtering data, particularly in cyber security applications. The speakers highlighted the importance of quick error identification and efficient data filtering, emphasizing JQ's versatility in processing JSON data and extending capabilities through FQ. Personal experiences with leveraging JQ's features for data processing tasks were shared, underscoring the importance of regular expressions and UTF-8 support in handling binary data effectively.

Overall, the interview provided a deep dive into the intricacies of JQ and FQ, showcasing their collaborative development efforts and technical nuances. The dialogue highlighted the tools' adaptability and versatility in modern data analysis and manipulation contexts, emphasizing the seamless integration of JQ filters and functions within FQ for diverse data operations. The conversation also touched on the symbiotic relationship between CLI tools and the JQ language, illustrating how their integration streamlines data transformations and enhances coding efficiency in various projects.

Brief Summary

Matthias Wadman, a key figure in the development of JQ, discussed insights into the project on Programming by Stealth. The core team consists of five active contributors, focusing on bug fixes and stability while enhancing functionality responsibly. JQ's composition involves a transition from Haskell to C and supports Windows systems via MinGW. FQ, a complementary tool, is utilized for parsing binary file formats, offering detailed data representations for debugging. The podcast explored JQ and FQ's applications in cyber security and data filtering, highlighting their adaptability and effectiveness in modern data analysis. The conversation emphasized the seamless integration of JQ filters and functions within FQ for efficient data transformations.

Tags

Matthias Wadman
JQ
Programming by Stealth
core team
bug fixes
stability
functionality
Haskell
C
Windows systems
MinGW
FQ
binary file formats
cyber security
data filtering
adaptability
integration
data transformations

Transcript

[0:06]
Introduction
[0:00]Music.
[0:07]Hi, folks. This is something I don't think we've ever done before on Programming by Stealth. I am interviewing someone who has written parts of the tool that we have just talked about in the series. So I am joined by one of the brains behind JQ, Matthias Wadman. So welcome, and thank you. Hey. Yes, I'm Matthias. I'm one of the maintainers of JQ. I didn't start the project. It was started, I don't know, ten years ago. Oh yes, Stefan, sounds about right. Yeah, that's right. I've only been a maintainer for, I don't know, three years, so I hopped on late. So how many of you are there making JQ go? How many maintainers are there? Yeah, I think there are like eight people who are administrators or can do things in the project, but I would say it's only five or so who are active, or maybe three people are very active. It's me and Nico and Emanuel and itchyny; I think we are the most active ones.
[1:28]People come and go, I would say. So as people discover JQ, they find a pain point. They say, oh, I wish it did blah, blah, blah. And if they have the skill set, I guess they contribute for a while. They add the feature. I would say there are maybe five now that stay around for a longer while. And then some people come and just do some small change and then they disappear. Yeah, but I would say if we were to do a new release, it would be like five people that are involved in fixing things and writing changelogs and helping out with things. And how do you guys as a project decide what's on the roadmap? Or is it very much led by the community?
[2:13]
Decision-making Process
[2:14]I don't think there is any official policy on how things work, so it's going to be my interpretation of how we work. What I've seen is that if we are like three maintainers who agree on something, we usually merge it.
[2:36]For bug fixes and kind of obvious weird behaviors, I would say, maybe we have three people who approve, then we merge it. But if it's a bigger, more complicated change, I think we usually wait for Nico or someone who is more... Or we ping some old JQ maintainer about the problem, to see what they think about it. So we are fairly conservative when it comes to changing the language itself. Adding new built-ins or changing the syntax nearly never happens anymore, I would say. So most changes are crashes and, yeah, things that are obviously wrong. I guess there's a huge difference between taking a built-in that already exists and making it do something a little bit different. That's already quite scary. That's probably the most scary, actually, to take something that's in the language and make it different. But I guess the least scary is a bug fix, and then somewhere in between would be something new, a new built-in or something?
[3:56]
Addressing Changes and Fixes
[3:56]I mean, the least scary is, of course, crashes. Something that shouldn't crash; that's not how it should behave. And those we fix quite fast, I would say. But it's tricky with the smaller fixes. That's why it's good to have many maintainers, because usually someone figures out that, oh, this will break some existing JQ scripts. So there are behavior changes that we probably would have liked to have done, but we won't do them because they would change things. And we have a couple of changes now that probably will change things in existing scripts, but it's syntax, like precedence between some operators that is very confusing. The existing scripts that will change how they behave were most likely written by someone who didn't mean them to behave the way they do right now, I would say. So it will probably fix more scripts than it will break.
[5:08]Yeah, but nonetheless, if someone's written it intending the side effect, and then you change the side effect... It's the as operator, when you bind the new... Well, that one has some weird
[5:23]precedence, like how it binds the rest of the query. That is very confusing when you look at the source.
[5:33]
JQ as a Command Line Tool
[5:33]So that sort of brings me to another interesting question. So JQ is a language, and it is a command line tool. So how much of JQ is written in JQ? Because I have this feeling from the documentation that some of the built-ins are actually the reduce built-in wrapped in a function. How much of JQ is JQ, and how much of JQ is something else? If people are interested, there is this builtin.jq file in the jq repository. That has all the ones that are written in jq itself. And quite a lot of the built-ins are written in jq. But then there are a few that are not, either because they can't really be, or because we don't want to write, like, a regex engine in jq. So there is a built-in that binds to the regex library that we use. Or there are performance reasons: it would be too slow to do it in JQ.
[6:39]And reduce is one that is used as the back end of many things. So obviously, reduce couldn't be written in JQ. It must be written in a more fundamental language. Yeah, reduce even has its own syntax. I don't think it can even be a built-in, because you do reduce, I don't know, I don't even remember the syntax, reduce and generate. Yeah, the syntax is quite different. What are they called? Special forms, I think they are called. They are like functions, but they have their own syntax. But for some of them, how the compiler works is that they just get translated into some built-in function. How the user writes it looks special, but it just gets translated. Assignment, for example, in jq, or update, when you do equals or pipe equals: that actually gets translated into a setpath, or an update, which itself is a setpath. You can see in the builtin.jq file there is quite a big implementation of the thing that takes the left-hand side and the right-hand side and then
[7:59]figures out the paths and runs the generators on the right side and assigns into the left side. Yeah. So that code is kind of complicated, and some of the built-in jq functions use some special things that you shouldn't use in normal JQ queries, because they have some special knowledge of how they behave. They might know about the inner workings of JQ that aren't in the documentation.
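The idea that an update assignment boils down to "work out the paths, then set each path" can be sketched in Python. This is a simplified illustration, not jq's actual implementation: the function names `getpath`, `setpath` and `update` are borrowed for readability, the list of paths is passed in by hand, and missing paths are not created.

```python
from copy import deepcopy


def getpath(value, path):
    """Follow a list of keys and indexes into nested dicts and lists."""
    for key in path:
        value = value[key]
    return value


def setpath(value, path, new):
    """Return a copy of value with the element at path replaced by new."""
    if not path:
        return new
    result = deepcopy(value)
    target = result
    for key in path[:-1]:
        target = target[key]
    target[path[-1]] = new
    return result


def update(value, paths, fn):
    """Sketch of an update assignment: apply fn at every path via setpath."""
    for path in paths:
        value = setpath(value, path, fn(getpath(value, path)))
    return value


doc = {"user": {"name": "bart", "logins": [1, 2]}}
# Hypothetical equivalent of updating every element of .user.logins
paths = [["user", "logins", 0], ["user", "logins", 1]]
updated = update(doc, paths, lambda n: n * 10)
```

The input document is left untouched and a new value is returned, which mirrors the way filters produce new outputs rather than mutating their input.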
[8:30]Some of them, I don't know the history, but they are kind of hacks, I would say, just to fix some reference-counting thing, to be able to release the reference on one of the sides and things. You could probably implement most of them in normal jq code if you wanted, but it would be slower or use more memory if you didn't have these hacks. Yeah, as long as you have them for people, I guess it's okay. Well, yeah. And so that brings the nerd in me to the other obvious question. So a whole bunch of jq is written in jq, which is cool, bootstrapping, and I love that kind of thing. But obviously there must be something lower down, because it can't all be built in itself. So what is the really native language? Is it C, or what is it? It's written in C. Or actually, if you want some history trivia, the first version was actually written in Haskell.
[9:29]If you check the git repository, the first commit is jq written in Haskell, like a proto-jq, a very simple version of jq. And then you can see that a couple of months later the whole implementation got changed to C. So someone had the idea, they were like, I wonder, could this work? And they implemented it in Haskell, which is obviously their language of choice. I mean, I've done a lot of first attempts at code in Perl, and I would never intend to keep it in Perl. But it's just to prove to myself that this is actually a good idea. And then obviously, you move to another language, I guess. Yeah, my guess is that it was kind of a prototype just to see how it would work. But yeah, nowadays it's C, all of it is C.
[10:20]And it's very similar to other languages like Python. It's a parser that compiles down to some kind of bytecode that runs in a virtual machine. So there is a very JQ-specific virtual machine inside JQ. I'm thinking back to my compiler classes at university many years ago. That makes sense. And so it's all written in C, so the command line interface and the core code are obviously very closely related to each other then. Now, I'm a Mac person, so I just install Homebrew and I go brew install jq and all the magic happens. And on Linux, I go yum install jq and all the magic happens. Does the C code work on Windows? Yeah, it does.
[11:09]I think none of the maintainers currently is a Windows user, which is kind of a problem because we still want to support Windows,
[11:22]
Supporting JQ on Windows
[11:20]or it has supported Windows before, and we have to support Windows somehow. But what we do is that we compile it with something called MinGW, which is like a POSIX, UNIX-ish environment for Windows. So you can kind of take UNIX programs and... MinGW is the one that kind of emulates things to make it work like a Unix environment. So that's what we do for Windows. So the Windows people would have jq.exe or whatever, and then it would have the same command line flags. But would they be backslashes instead of minus minuses? Or is it even still minus minuses? It's minus minus. Good, good. Teach them proper. Teach them proper.
[12:10]But maybe we can talk later about that, how much of JQ is JQ, because later we're maybe going to talk about FQ, and FQ goes a bit further than JQ. Well, that's a good opportunity. So we talked a lot about JQ.
[12:31]
Introducing FQ
[12:26]I don't think on the show we talked about FQ. So let's assume we don't know what FQ is. Can you explain what it is and how it fits into this universe?
[12:37]CLI-wise, FQ tries to behave exactly like JQ, more or less, in how it behaves with input files and output. It's just that instead of giving it JSON files, you can give it various binary formats as input, like an MP4 file or a JPEG or a PNG. And then FQ has decoders that take the binary file and massage it into a JSON-ish data structure. So you'd be representing, like, image bit information in a JSON-like structure? Yeah, so the structure has the same JSON-ish types, like objects, arrays, numbers and strings. But what it does is that FQ has, like, a
[13:40]bitstream decoder. It's like a bit reader: it reads bits from binary files, and then it has a decoder DSL that can keep track of where things are in the file. So you get a JSON structure where every field, like an object or a number, also knows what bit range from the original file corresponds to this thing in the structure.
[14:11]So you can get a hex dump, and next to it you see a tree, like a JSON structure. But it's not really JSON. It behaves like JSON, but you can kind of decode things into...
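As a rough illustration of a decoder that remembers where each value came from, here is a small Python sketch. Everything in it is made up for the example: the 10-byte header layout, the field names, and the `decode_header` function are hypothetical, and it tracks byte ranges where FQ is described above as tracking bit ranges.

```python
import struct


def decode_header(data):
    """Decode a made-up 10-byte header, recording each field's byte range."""
    # Hypothetical layout: 4-byte ASCII magic, 2-byte version, 4-byte length,
    # with the two numbers stored big-endian.
    layout = [("magic", "4s"), ("version", ">H"), ("length", ">I")]
    fields = {}
    offset = 0
    for name, fmt in layout:
        size = struct.calcsize(fmt)
        (value,) = struct.unpack_from(fmt, data, offset)
        if isinstance(value, bytes):
            value = value.decode("ascii")
        # Keep the decoded value together with where it came from in the file.
        fields[name] = {"value": value, "range": (offset, offset + size)}
        offset += size
    return fields


header = decode_header(b"DEMO" + bytes([0, 2]) + bytes([0, 0, 1, 0]))
```

Because every field carries its range, a viewer could print the matching slice of a hex dump next to each value, which is the kind of side-by-side display described above.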
[14:26]
FQ Functionality
[14:27]So then you can use JQ queries on that data structure to find things or whatever you want to do. And change it, obviously, then? You can't really just assign things into it. It's kind of a read-only data structure you get. If we want to go into details of how FQ works, FQ is also very decoder-centric. It's about presenting a file, kind of like how something works.
[14:58]
FQ as a Pretty Printer
[14:58]If you give it an MP4 file, it will show you all the boxes in the MP4 file, if you know how MP4 files work, and it will show you the tracks and all the samples. And it will also map things like number values into string representations of the numbers. So it will kind of decorate the data for you, so you don't have to keep track of the mappings of the numbers. And maybe it even calculates or derives things while it's decoding. So it's like a pretty printer for binary files, you can say. Yeah, so I guess just for the listeners, one of the things you may not realize is that a movie, in an MPEG format, actually starts off with a keyframe, and then you tell it how to change the pixels until the next keyframe, when it's a fresh start, and then you tell it how to change the pixels, and so on. So those are data structures, you know, I want to move this to here. And so that's then represented in a JQ or JSON-like format of lists, dictionaries, etc. Oh, wow.
[16:10]How I got the idea was that when I played around a lot with JQ, I realized that JQ, the language, doesn't really care that its inputs and outputs are JSON. It just needs to be something that can be numbers, strings, booleans, arrays or objects. The JSON format itself is not so important. It just happens that it was mapped from JSON. The JQ interpreter, or the language, doesn't need it. As long as the data structure behaves as JSON, then you can do whatever you want.
[17:01]I think the reason JSON is so popular is because, actually, if you have as your three primitives a single piece of data, a list of pieces of data, and a name-value pair of data, you can represent almost anything with those three primitives. It is very powerful. Very powerful. And I've noticed, when I have written a lot of binary decoders for FQ, that you can usually come up with a very good modeling. Of course, there are formats that have references and things, but then you can usually represent this as an array of objects, with another array that just has IDs into that. So you can represent graphs or whatever you want. If your value can be a path into another array, then you can probably represent almost anything.
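The point about representing references with plain arrays and objects can be sketched as follows. The node names and the `reachable` helper are invented for the illustration; the only idea taken from the conversation is that an object can refer to another object by its index in an array.

```python
# A graph stored with only JSON-style types: objects, arrays, strings, numbers.
# Each node refers to other nodes by their index in the "nodes" array.
graph = {
    "nodes": [
        {"name": "a", "children": [1, 2]},
        {"name": "b", "children": []},
        {"name": "c", "children": [1]},
    ]
}


def reachable(graph, start):
    """Walk the index references depth-first and return the node names seen."""
    nodes = graph["nodes"]
    seen = []
    stack = [start]
    while stack:
        index = stack.pop()
        if index in seen:
            continue  # node "b" is shared by "a" and "c"; visit it only once
        seen.append(index)
        stack.extend(reversed(nodes[index]["children"]))
    return [nodes[i]["name"] for i in seen]


names = reachable(graph, 0)
```

Node "b" is referenced twice without being duplicated, which a purely nested tree of objects could not express.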
[17:50]
FQ and Binary Formats
[17:51]And the obvious thing that, as a photographer, leaps to my mind is that a lot of our binary files have metadata in them, so EXIF and ID3. So that seems like a really good match for something like JQ. Yeah, exactly, it works very well. And also, a lot of these binary formats, like MP4 or JPEGs or PNGs, usually have a common format inside of them. If you have TIFF or EXIF, for example, nearly all image containers can contain EXIF, right? Which is the idea. Yeah, there is an EXIF decoder in FQ, and then the other PNG and JPEG decoders use that EXIF decoder. So you get a tree, and then the tree from the EXIF decoder gets appended into that tree. So you can get nested decoding with FQ, which is also very, very nice. And then my brain has jumped straight away to: and then you have another decoder for zip and RAR files, and then you can nest all of those, and then you really can go to town.
[19:09]Because, I mean, I'm a backend engineer, but I work as a media engineer, you could say. So I work with streaming media and transcoding.
[19:23]So then MP4 is the most common format, but there are others, and all the codecs also have their own bitstream formats. So then you can write, like, an H.264 decoder. Not the full decoder, because the specification is like thousands of pages, but with FQ you can kind of decode the
[19:55]overall, biggest parts of it, and then you can skip the actual pixels or macroblocks or whatever. But you can get a very good overview of the sample format, and then you can use that H.264 sample decoder together with different container decoders. So you just reuse it, more or less, so you don't have to write many decoders. Right, because so many things are common. Yeah. So what sort of file formats do you currently support in FQ? I would say it supports most media containers, the most commonly used ones: MP4, Matroska, and PNG, JPEG. And what about PDF?
[20:47]It has EXIF and TIFF. And then it has some support for network capture files like PCAP. Oh, sorry, with my cybersecurity hat on, you've just pinged me here. That's very interesting. And it uses something called GoPacket; FQ is written in Go.
[21:16]
Enhancements in FQ
[21:12]So it actually uses some already existing libraries. So it can also do TCP reassembly. So you can kind of take a PCAP and then continue decoding the TCP stream.
[21:27]Currently, there is no HTTP decoder, but I have been working on it. But it has TLS support, for example, though not the modern TLS. It has support for TLS 1.2 and 1.1. So then you can decode that further down. That's the whole point of FQ: to decode as much as possible. So you, as someone who's debugging some problem with a file, or something weird is happening, should be able to just run FQ, query it, and then see all the... The data should be as decorated as possible, explaining what everything is. And if there is a timestamp somewhere, in FQ you will see the number in seconds, but then there will be a description afterwards saying this is the date in Unix time, this means 19 blah blah blah. So it tries to say what things are. The whole point of it, for me when I'm using it, is to lessen the cognitive burden when you are debugging something, to
[22:48]show you as much as possible, translate as many things as possible for you. I've noticed it helps a lot when you're debugging things that you can just see that. Right. Yeah. The color space on this thing is actually wrong compared to this color space down here. So then you can just see they map to the wrong graph, or whatever it is you're doing. Right. Usually when you're debugging, the thing you're desperate for is information, right? Give me the facts. What's actually gone wrong?
[23:26]And then you go back to your code and you find the logic and you go, oh, silly me, I did blah, blah, blah. So with my cyber security hat on my head, the thought of being able to take a giant big PCAP file... Because of course the problem with trying to find problems on a network is that networks are very noisy places, especially in the real world. It's one thing to be in a lab environment with one PC, but in the real world it's just all noise. And the thought of being able to use all of the functionality of JQ to filter down packets in a syntax I'm really comfortable with, instead of trying to write Unix tcpdump filters, which are horrible... Being able to do that in JQ syntax sounds a lot nicer. Yeah, I would say the only problem is that if you have huge files, FQ is not very good, because it's built in a way,
[24:22]because of technical limitations or things that I wanted, that kind of gets in the way of making it fast for big files. There are some features that require having the whole file in memory at the same time. To some extent you have to decode the whole thing to do the queries. But I'm just sort of thinking again, it could be a first pass where you end up with some JSON data that you could then output from FQ, I presume. So you could start off with a big PCAP that has life, the universe and everything that you don't want, and you use JQ to filter it down to a big JSON file of the stuff you care about, and then you save that big JSON file, and from that point on, you now work in plain old JQ, and you process that JSON file, which JQ is fantastic at. Yes, that's what I've done several times, to kind of minimize down to the thing you're actually looking for.
[25:23]On a related note, so JQ comes about because we want to process JSON, and it turns out that JSON represents something really generic.
[25:34]
Exploring XQ Project
[25:30]Therefore, with FQ, you have found another use for that same language. And something I discovered on a cyber security blog recently is XQ, which is a project that... they haven't quite got feature parity yet with the JQ language, but their aim is to support every built-in in JQ. And they have input filters and output filters. So you basically say, I want to take this TOML file, I want to query it using all of the JQ language primitives, and I want to output, say, a YAML file or a JSON file or even CSV. Although for CSV, they then say, well, now, if you want to write CSV, I'm afraid to say you cannot have any dictionaries at the last part. You need to be down to just arrays, whatever. But it's actually the fact that that language can be reused. It's kind of fascinating to me and, uh.
[26:24]So I presume you guys aren't upset to hear that people are taking your language and using it for other projects. No, I'm doing everything to make JQ, the language, more used or more known. I don't really care what people use as long as they write JQ. I don't really care if they use FQ or whatever. Use whatever you need. Well, I'm really happy to hear you say that. So that's what I'm trying to help people understand, to evangelize what it actually is. I feel like a lot of times people, even myself... I remember like maybe five, six years ago, I didn't know much about JQ at all. I thought it was like a JSON indenter, more or less. A pretty printer. A pretty printer for JSON. I remember one day I was doing something, and I knew that you could write filters in this weird syntax that I didn't understand. And I was reading the... I guess I did man jq and then was like, what the hell is this?
[27:38]
Evolution of JQ Usage
[27:36]I mean, I think that's how a lot of people find it, right? You go to Stack Overflow and they give you one little piece of jq that probably has one map function, maybe, or map values or something, and it outputs something useful. And then you go, oh, that's interesting. Yeah. And then you start to dig a little deeper and go, what if I could multiply it together while I was doing that? Oh, I can do that too. And oh, I can do this. And then before you know it you have this giant big thing. And then for me the light bulb moment was the minus f flag, where instead of me thinking I have to fit all of my logic onto one line that goes on the command line, I suddenly end up with an indented file that is nicely structured, where I can have comments and I can even define my own functions at the top. And once I realized I could put my JQ into a file, that was game-changing. That utterly changed things. Because I now have JQ files for problem domains. Like, for example, the Have I Been Pwned database: when you query its API, you get back giant big JSON.
[28:36]And you kind of end up needing to merge pieces together. So the minus minus slurpfile flag is fantastic for that. So you basically download the data dump once, and it tells you these are all the breaches that have ever happened that we know about. And then you use the API to say, and now tell me where my users are. And then you need to marry those two data files together, right? So that you actually get enriched information. And I just have one big JQ file that has all of the functions I need to marry it up. And instead of getting back a whole bunch of garbage, I just run one terminal command and it tells me these five users, those people need to be told to change their passwords yesterday.
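The "marry two data files together" step is essentially a join between a list of known breaches and a per-user list of hits. Here is a minimal Python sketch of that idea. The data shapes, field names, breach names and the `enrich` function are all assumptions made up for the example and are not the real Have I Been Pwned formats.

```python
def enrich(breaches, hits):
    """Join per-user breach names against full breach records."""
    # Index the breach records by name so each lookup is a dictionary access.
    by_name = {breach["name"]: breach for breach in breaches}
    report = []
    for user, names in sorted(hits.items()):
        exposed = [by_name[name] for name in names if name in by_name]
        # Only report users caught in a breach that leaked passwords.
        if any("Passwords" in breach["data_classes"] for breach in exposed):
            report.append({"user": user, "breaches": [b["name"] for b in exposed]})
    return report


# Hypothetical stand-ins for the downloaded breach list and the per-user results.
breaches = [
    {"name": "ExampleForum", "data_classes": ["Email addresses", "Passwords"]},
    {"name": "ExampleNewsletter", "data_classes": ["Email addresses"]},
]
hits = {
    "alice@example.com": ["ExampleForum", "ExampleNewsletter"],
    "bob@example.com": ["ExampleNewsletter"],
}
report = enrich(breaches, hits)
```

Building the lookup table once and then enriching each user record against it is the same shape of work as loading one file as a reference and streaming the other through it.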
[29:13]
FQ Development Inspiration
[29:14]Yeah, that's kind of how FQ started. I was using the normal JQ, because there are a lot of tools for working with media files
[29:30]that dump them into various XML and their own weird custom formats. So you can run them on an MP3 file or a FLAC file and you get some very detailed information about it. But they are all different formats. So what I did was to always massage this into JSON and then get it into JQ so I could do queries on it. Then I got tired of that after a while. Can I go straight to it? I also wanted to have, because when you do that, you lose the information about where, like, the thousandth MP3 frame is in the file. When you use these other tools and turn the output into JSON, you will lose that information, unless you, of course, encode the start byte or something, but by default in those tools you usually didn't. So then you...
[30:27]That's also why I ended up writing FQ. That makes so much sense to me. I can see exactly how the problem to be solved leads you straight to that solution.
[30:42]
Benefits of JSON Piping
[30:39]The other thing is I am forever looking for a minus minus JSON flag. If you tell me about something in the terminal, I am just looking for something that goes minus minus JSON as an option because then I can pipe it to JQ and then rearrange the key so they all make sense. And what the listeners have heard as the episode before this one, is a case study where another one of our listener contributors, Helma is someone who contributes a lot to the programming by self, but not usually on air.
[31:09]But she and Allison did an episode together. The problem to be solved was that she was getting a new Mac and she wanted to know: what apps have I got, which apps are already M-series apps, which apps are universal, and which apps are x86? She discovered that the terminal command to tell you what's installed has a minus minus JSON flag, which was then the start of a rabbit hole that ended up with this .jq file that did all of the transformations, renaming all the weird things Apple does, like, if it came from the iOS App Store but you got it on a Mac, it calls it an iPad app. It just translates all of that weirdness and, with the @csv
[31:53]output format, gives you back a perfect CSV of exactly the information that you wanted. And that just sounds like the kind of thing a lot of people would end up doing with JQ. It's like, oh, this command gives minus minus JSON, and we're off. I would say, now that I'm such a big JQ user, I use it for more or less everything instead of awk and things, and I'm so used to it now. But one thing I've noticed is that JQ is very good at taking text formats, like semi-structured, non-JSON, as input, and then you use this dash R, the big R.
[32:41]So it takes each of the lines, or even the whole file, as text. Or you can slurp, raw slurp, things, and then you get one big string of text. And then the JQ language itself is very good for splitting and using the regex things to write your own ad hoc parsers, kind of. Good point, yeah.
[33:11]What's it called? Capture, I think the built-in is called. Yes. That is very, very powerful, where you can even give names to the groups and it even spits out objects for you. It's super, super powerful. Yeah, I do that quite a lot because I love regular expressions. I'm a big fan of regular expressions. They don't scare me. But when I've done a regular expression, I want the output to have sensible names for each of the capture groups. And that's where a built-in like capture is great, because once you've captured them to named fields, well, from then on, you can use the normal JQ syntax in a very sane way. So all of your insanity is captured in one line, and then sanity for the rest of your file. Exactly. So you use JQ to make sense of things.
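The named-capture idea from this exchange can be sketched outside of JQ. What follows is a Python analogue, not JQ code: the log lines, the pattern, and the names `RAW_TEXT`, `LINE_PATTERN`, and `parse_lines` are all invented for illustration. In JQ itself the corresponding built-in is `capture`, with groups named as `(?<name>...)`.

```python
import re

# Hypothetical semi-structured, non-JSON input, invented for this sketch.
RAW_TEXT = """\
2024-08-06 ERROR disk full
2024-08-06 INFO backup done
"""

# Each named group becomes a key in the resulting dictionary,
# so all the regex "insanity" lives in this one line.
LINE_PATTERN = re.compile(r"(?P<date>\S+) (?P<level>[A-Z]+) (?P<message>.*)")


def parse_lines(text):
    """Yield one dictionary per line that matches the pattern."""
    for line in text.splitlines():
        match = LINE_PATTERN.fullmatch(line)
        if match:
            yield match.groupdict()


records = list(parse_lines(RAW_TEXT))

# From here on, the data is addressed by sensible names only.
errors = [record["message"] for record in records if record["level"] == "ERROR"]
print(errors)
```

Once the groups have names, nothing downstream needs to know a regular expression was ever involved, which is the point being made in the conversation.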
[33:56]So I am really happy to hear you say that you see the language as being a much bigger thing than the command line tool, because I spent a lot of time in the series always differentiating between the language and the command. So I would say "in the JQ language", and I would not use a code-style fixed-width formatting. And then whenever I talked about, you know, the minus F flag, I would write JQ with the code-style fixed-width formatting, to basically differentiate: I am talking about a command line tool versus I am talking about a generic language. And for a long time, my co-host Allison would be quite confused by the difference, until we encountered XQ, where all of the bits where I wasn't saying the command, all of those bits you can bring with you to this other tool, and the only thing that is different is the command line tool, the minus F or whatever. And so again, FQ is exactly the same concept: FQ will have different flags to JQ, but the language and the filters are your familiar friend. FQ even tries to copy how the CLI tool works. Because then you're even more at home, yeah.
[35:09]XQ has intentionally not copied it, which is probably better on the whole because it does such different things, and so it's not confusing enough to trick you. So it will be like minus minus JSON and, you know, minus minus TOML and stuff, so it's kind of intentionally different. But the fact is that once I open that single quote and I start to write the filter, the filter is my good familiar JSON, I'm sorry, JQ language. And there can be no JSON involved: it could be TOML going to YAML, and I'm filtering it along the way with the JQ language, and at no point was there any JSON created or harmed in the process.
[35:48]I was going to say, since we got into FQ: FQ itself is not written totally in JQ, of course. It's a lot of Go code. But you can say that the CLI part of FQ, the CLI interface, is actually written in JQ, in FQ.
[36:09]You can say that the main function of FQ is written in JQ. So the CLI, all the command line parsing, and everything like that... You can say that JQ is the controlling language, and it then just calls out to different Go functions that are built-ins. So FQ has some special built-ins that are not part of the normal JQ standard library. There is a readline built-in, for example, to implement an interactive shell, and it has a function for opening a file, for example, which JQ doesn't have. So it has one function to open a file, and then you get a special binary value. It behaves like a string, but internally FQ keeps track that this is actually not a string; it is a reference to an open file. So then you can pipe it into a decode function.
[37:24]So then the decode function, how it works is that... I didn't want to add a new type into JQ or into JSON, because I tried that in the beginning and it did not work well. You don't want to have a new binary type in JQ; all the built-ins would break. So some of the special FQ functions can take the input value, and there are special things that can interrogate the value and ask: this thing, is it actually binary? So then it can say, okay, it is binary. And if it was a string, it would actually turn it into a binary, like the UTF-8 bytes, or code points. That's another thing I love, that it's UTF-8 all the way down in JQ; you don't have to think about it.
[38:26]That's also one of the problems with JQ, that it has UTF-8 as input and output, because you can't have raw binary output: there are some byte values that are not valid in UTF-8. So then it breaks down. It's funny, because in my universe, the biggest problem is UTF-8 not being supported, because I'm thinking in terms of strings and characters and stuff. And so for me, the fact that JQ is UTF-8 all the way down is this amazing luxury.
[38:59]In the binary world, you need every possible byte combination. Yeah, for example, with FQ, maybe you want to open some binary file, and then you want to output some frames, or maybe part of a pcap file, and you want to select part of it and then output that into a new binary file. If it was UTF-8 out, it wouldn't work; it would end up encoding this as replacement characters or something. So FQ has support for knowing that, okay, the thing that I'm outputting now is actually binary, so I will just output it as it is. Gotcha. That's how it works, yeah. So what I meant with how FQ is written in JQ is that the code is quite complicated nowadays, but in the beginning the main function in FQ was more or less like open, pipe, decode, pipe, dump. That was kind of the main.c, the main.jq.
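The replacement-character problem mentioned above can be shown in a few lines. This is a Python sketch of the general UTF-8 issue, not of how JQ or FQ handle it internally; the byte values and variable names are made up for the example.

```python
# Four arbitrary bytes: "fq" followed by two values that can never
# appear in valid UTF-8 (0xFF and 0xFE).
raw = bytes([0x66, 0x71, 0xFF, 0xFE])

# Forcing the bytes through a text layer replaces each invalid byte
# with U+FFFD, the Unicode replacement character.
as_text = raw.decode("utf-8", errors="replace")

# Writing that text back out does not restore the original bytes.
round_tripped = as_text.encode("utf-8")

print(raw)
print(round_tripped)
print(raw == round_tripped)
```

The two byte strings differ, so any tool that insists on UTF-8 for output cannot copy arbitrary binary data through unchanged. That is why a binary-aware output path is needed.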
[40:17]Cool. I do like this bootstrapping thing, because I think a lot of people don't realize how much code under the hood is actually written in the language itself. A lot of JavaScript is JavaScript, a lot of PHP is PHP, a lot of JQ is JQ, which is something that, unless you've written a compiler, you would never think. At least for FQ, in the beginning it was actually written in Go; most of it, like the CLI, was also written in Go. But the more features I added, the more I realized that this gets more and more complicated: the Go code has to call JQ code and JQ code has to call Go code. So it was easier to just let the JQ code be in control and then just call out to Go code, with the Go code never... well, that's not really true; there are cases where the Go code actually calls JQ code, but that's very internal things. But then if you do it that way, if everything is in JQ.
[41:18]All these generators and things just work; it behaves as the CLI, it just feels natural. Right. There were a lot of things that just solved themselves automatically, because now it just works the way you want: you want the CLI tool to behave as the JQ language. One of the things that took a while, an aha moment, was when you realize that the CLI tool is kind of like this: you have an input, and it outputs several JQ values. That's like a generator. So the CLI tool itself is kind of like a JQ function also. If you don't give it a lot of weird filter arguments.
[42:07]Piping two JQ command processes together is kind of like nearly as... So it all fits very well together. No, you're right. And before I discovered the joys of using minus F and having my JQ in a separate file, I was often resorting to piping JQ to JQ to break my logic into pieces.
[42:32]I mean, it works. And it works really well. Like a generator. The whole CLI tool works like a generator. And because it's JSON, you can just pipe it to the next JQ. And implementing that behavior was very easy when the tool itself is written in JQ. Cool. So one of the things I definitely want to touch on in this conversation with you: you spend a lot of time in JQ, JQ is obviously very much something you understand, and you think of the universe in a JQ-like way now that you've discovered its power. But when I was writing the show notes, I was doing one of those really scary things: I was only about two or three lessons ahead of the virtual class. I was learning JQ a little bit ahead of what I was writing in the show notes, but not very far ahead. And I don't like to use the wrong jargon when I'm teaching something, because then people can't go to Google and get useful help, because I've used the wrong word. So I was always using the word filter and stuff like that. But I did find it difficult to get my brain to think the way JQ sees the world, because it's so different to other programming languages I've used, because it's not a programming language. It's a data manipulation language.
[43:55]I am very curious how you found listening to us try to explain JQ. What did you think? How was that?
[44:03]I think you understood it similarly to how I do, but maybe you just used different terminology for some things. I think the biggest thing you will have noticed is that, because of history on our series, we didn't refer to objects as objects, because we started off in a JavaScript world where there was a difference between an object that had functions and an object that only had data. And so we started to call the ones that were only data dictionaries, because they mapped a name to a value. When we moved into JQ, I argued with myself a lot: do I call them objects, even though everyone listening to the show thinks an object can contain functions, or do I continue to use the Programming By Stealth word dictionary, which is neither JavaScript jargon nor JQ jargon, right? It's Programming By Stealth jargon. And I argued with myself a lot, but I decided to bring the dictionary jargon with me into my description of JQ. So I imagine straight away when you heard me talking about sequences and dictionaries, you were like, oh! It's okay.
[45:12]
Terminology and Jargon Challenges
[45:12]I'm trying to remember what it was that they, there was something.
[45:21]I can't remember exactly what it was; it was some terminology. But I think what people have the hardest time understanding in JQ is maybe the generators, I would say. That everything is a generator. Yeah, sometimes it's a generator that makes one value and sometimes it's a generator that makes more than one value. But yeah, that took me a long time to understand. Yeah, that took a while. When you get used to it and you realize that everything is just generators, it helps to even think about string literals or something as generators also, that just output themselves, kind of. Yeah, one string comes out. Yeah. And then it becomes very, very consistent all the way down, and very logical. And sometimes when JQ does things in a way that's unusual, when you stop and think about it as a list of generators, it's like, oh, okay, well, of course it would be that way around. And that's actually something I wanted to commend you guys on: the documentation.
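The "everything is a generator" view can be modelled in a few lines. The sketch below is a toy Python model made up for illustration, not how JQ is implemented: the helper names `literal`, `iterate`, `comma`, and `pipe` are hypothetical. Each filter is a function that takes one input value and yields zero or more output values.

```python
def literal(value):
    """A literal ignores its input and outputs itself exactly once."""
    def run(_input_value):
        yield value
    return run


def iterate():
    """Outputs each element of the input, so it may yield many values."""
    def run(input_value):
        yield from input_value
    return run


def comma(left, right):
    """Outputs everything from the left filter, then everything from the right."""
    def run(input_value):
        yield from left(input_value)
        yield from right(input_value)
    return run


def pipe(left, right):
    """Runs the right filter once for every value the left filter outputs."""
    def run(input_value):
        for intermediate in left(input_value):
            yield from right(intermediate)
    return run


# One string comes out of a string literal.
print(list(literal("hi")(None)))

# Three inputs flow through the pipe, so the literal runs three times.
print(list(pipe(iterate(), literal("x"))([1, 2, 3])))
```

In this model a pipe is just a nested loop over generators, which is one way to see why a filter on the right of a pipe runs once per value produced on the left.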
[46:36]A lot of times it does a very good job of explaining the thinking, and it really struck me when you put the section on defining your own variables way, way, way down at the bottom. I'm used to going to a new programming language and the very first thing you learn is: this is how you make a variable, this is an if statement, this is a for loop. And when you come to JQ, the first thing you're told is: don't make variables, don't use conditionals, and definitely don't use loops, because that's not how this language works. I haven't written that part of the manual; most of the documentation hasn't changed much in recent years. So it's in a section called Advanced Features.
[47:19]Defining a variable is an advanced feature. Yeah, maybe it should be. I guess most people maybe don't need these bindings. But yeah, they should be described much better, because they are very useful. I mean, if you don't know about them, I think there are some things you can't do. If you don't know that they exist, you will struggle. Yes, and I can give you an example. The documentation says most of the time you don't need variables. For example, if you want to get the average, it was add and length: add divided by length, and hey presto, you have your average. You don't need to actually make a variable and count it up and loop over it; you just do it in one step. And I was like, oh, wow, okay. And so I made a mental note: don't teach variables in the series until you find a problem where there is no other answer.
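The contrast being described, an average as one expression rather than a loop with counters, looks like this in Python. It is only an analogue of the JQ filter `add / length`; the function names and sample numbers are invented for the example.

```python
def average_with_loop(numbers):
    """The habit from other languages: variables, a loop, and a running total."""
    total = 0
    count = 0
    for number in numbers:
        total += number
        count += 1
    return total / count


def average_in_one_step(numbers):
    """The shape of add divided by length: no bookkeeping variables at all."""
    return sum(numbers) / len(numbers)


sample = [2, 4, 9]
print(average_with_loop(sample), average_in_one_step(sample))
```

Both functions return the same result; the second simply has nothing to keep track of.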
[48:21]And the problem where there was no other answer was when you have a dictionary that is indexed on one key, and you want to reorder it so that it's indexed on a different key. And there's no way to do that without keeping track of one of the keys as a variable. At that point in the series, I introduced the as keyword. But until then, I didn't. And I'm kind of glad I went that way around, because it means that for me, as much as anyone else, the question is always: do I really need a variable, or am I doing this wrong?
[48:52]And a lot of the time, the answer is no, Bart. Simplify. This is way easier than you think. You're overcomplicating this code; don't make it a variable.
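The re-indexing problem described above, where one key has to be remembered while the structure is rebuilt around another, can be sketched as follows. This is a Python analogue with invented data and field names (`by_name`, `id`, `team`); in JQ the remembered key is what would be bound with the `as` keyword.

```python
# Hypothetical records indexed by name, made up for this sketch.
by_name = {
    "alice": {"id": 7, "team": "ops"},
    "bob": {"id": 3, "team": "dev"},
}

# Re-index on "id". The loop variable `name` is the one value that has to be
# kept in a variable: once we are looking at the inner record, the outer key
# is no longer reachable from it.
by_id = {
    record["id"]: {"name": name, "team": record["team"]}
    for name, record in by_name.items()
}

print(by_id)
```

Without holding on to the outer key, the old index would be lost as soon as the inner record became the current value, which is the situation where a variable is genuinely needed.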
[49:02]I mean, if you can do it without a variable, you should probably not use a variable. It depends a bit, of course. I have had situations where I have very complicated JQ code, and I could nearly code golf it into not using variables, but it would just be a mess. So sometimes I even write very verbose
[49:34]JQ code, to name things and make it clear. I guess it's all programming: it depends on what you are maximizing. Maintainability is something I sometimes see developers forget about. They say, yeah, but by making this variable we're wasting memory and this isn't efficient. And I'm like, yes, but six months from now, when you're gone and I'm gone, someone else has to fix this code. Are they going to figure it out? Or if I name this variable and put a comment above it, is that going to make this code last longer? I would say in JQ, my philosophy is usually that if you write some function or a query or something, and it's only three or four filters piped together and you can do it without naming things, I usually just let it be without naming. But when it starts to become some kind of nesting, then maybe I start to break things out by naming things somehow.
[50:42]Yeah, which you can do with the update operator, and just start to basically say: well, I really don't like this key being named this silly thing because it came from a terminal command, so I'll just rename them. And sometimes that's enough. But then there are times when you just want to capture the variable and say: five filters deep here in this nesting, I'm really going to want to have a quick and easy name for this thing I now know, so I'm just going to shove it into a variable, give it a nice human-friendly name, and then later in the file I can call it by its nice human-friendly name.
[51:14]Cool. Well, look, thank you. Firstly, I should say to the listeners, a big thank you to you because you actually reached out to us. You saw that we were doing a series on JQ and you reached out to us on, I think it was Mastodon; I think you're one of those modern Mastodon people, Twitter thing. And you basically offered to come and have a conversation with us, which no one has ever done before. No one has ever... mind you, the JavaScript developers are probably too busy to come talk to us.
[51:44]
Exploring FQ and CLI Interface
[51:45]But it was really nice of you to reach out. And so I just really want to thank you. We had arranged this interview and then I had a bit of a health crisis and then we had to rearrange everything again, and you were really good about that. And so I just want to say a big thank you for being so forbearing and for giving up a not insubstantial chunk of your afternoon here. So thank you. No problem. I've tried to spread the word of JQ, so I've been trying to write JQ programs, and JQ tools for making it easier to write JQ programs, to show what you can use it for. Cool. The other obvious thing is: is there anything you want to plug, basically? So obviously FQ; I will put a link to FQ in the show notes. You can actually work with text formats also: it has support for XML and HTML and a lot of others, and it actually has functions to turn it back into some other format also, so you can use that. Well, that's giving me a whole bunch of other ideas, because I have some code at the moment that's using a really heavy JavaScript library that basically builds an entire document object model, and all I really want is to pull the title out of an HTML file.
[53:02]So I can try it. Yeah, it's probably more efficient. I mean, maybe to go a bit deeper: the problem with turning XML or HTML into JQ is that it's very hard to model XML and HTML in JQ. You can of course model it in a way that is not lossy, so that you preserve all the orderings and white space, but if you do that you will end up with a JSON structure that you don't want to query. It's going to end up as arrays of arrays; you don't want to write JQ queries for this. So what FQ does is it actually lossily turns it into something more JSON-ish. So then when you turn it back into XML again, some things might get lost. It tries to keep ordering and things, if you want.
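The trade-off described above, an order-preserving but awkward structure versus a lossy but query-friendly one, can be illustrated with a small Python sketch. Neither shape is claimed to be what FQ actually produces; the document, the two converter functions, and their output layouts are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# A tiny made-up document.
DOC = "<html><head><title>Hello</title></head><body><p>one</p><p>two</p></body></html>"


def to_ordered(element):
    """Order-preserving shape: [tag, attributes, text, [children...]]."""
    return [
        element.tag,
        dict(element.attrib),
        element.text or "",
        [to_ordered(child) for child in element],
    ]


def to_lossy(element):
    """Query-friendly shape: child tags become keys, leaf elements become strings.

    Attributes, text mixed in between children, and the relative order of
    differently named siblings are all dropped.
    """
    children = list(element)
    if not children:
        return element.text or ""
    result = {}
    for child in children:
        value = to_lossy(child)
        if child.tag in result:
            # Repeated tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


root = ET.fromstring(DOC)
ordered = to_ordered(root)
lossy = to_lossy(root)

# Pulling out the title from each shape:
title_from_ordered = ordered[3][0][3][0][2]
title_from_lossy = lossy["head"]["title"]
print(title_from_ordered, title_from_lossy)
```

The first lookup is a chain of positional indexes into arrays of arrays, while the second reads like an ordinary path, at the cost of information that cannot be recovered when converting back.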
[54:00]I'm curious now. It's something to play with, and that's always fun. Anyway, sorry, I was about to say, so JQ will be in the show notes. Sorry, FQ. Any other links you want to give people? Say, your Mastodon handle would be a good start, given that's how we met each other. In the show notes or something. Yes, we'll do that. We will do that. Otherwise, I think JQJQ might be interesting, which is a JQ interpreter written in JQ.
[54:32]I love it. I love recursion. It's just a fun project. With any language, it's interesting to see if you can write the language in itself, and nobody had tried that with JQ, as far as I know. It turned out to not be that hard, but maybe that was just me having spent too much time with JQ, so I didn't think it was so hard. It had some parts that were very hard. I had to go back and read some more academic papers about parsing to remember how you do LL parsers with recursive descent when you have binary operators. And you learn a lot about parsing, at least, on the way. As I say, this will all be in the show notes, so this is quick.
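The parsing difficulty mentioned above, handling binary operators in a recursive descent parser, has several textbook solutions. The sketch below uses one of them, precedence climbing, for a toy arithmetic grammar in Python. It is not taken from JQJQ and makes no claim about how JQJQ solves the problem; the grammar, the `PRECEDENCE` table, and the function names are all chosen for illustration.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}  # all left-associative


def tokenize(source):
    """Split the source into number, operator, and parenthesis tokens."""
    source = source.rstrip()
    tokens, position = [], 0
    while position < len(source):
        match = TOKEN.match(source, position)
        if not match:
            raise SyntaxError(f"unexpected input at offset {position}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def parse(tokens):
    """Return a nested tuple tree such as ('+', 1, ('*', 2, 3))."""
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def advance():
        nonlocal position
        position += 1
        return tokens[position - 1]

    def parse_primary():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if token == "(":
            node = parse_expression(1)
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    def parse_expression(min_precedence):
        left = parse_primary()
        # Keep absorbing operators that bind at least as tightly as required.
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_precedence:
            operator = advance()
            # Requiring a strictly higher level on the right makes the
            # operator left-associative.
            right = parse_expression(PRECEDENCE[operator] + 1)
            left = (operator, left, right)
        return left

    tree = parse_expression(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(tokenize("1 + 2 * 3")))
print(parse(tokenize("8 - 3 - 2")))
```

The single `min_precedence` parameter replaces the one-function-per-precedence-level layout of a plain recursive descent grammar, which is what keeps the parser short as more operators are added to the table.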
[55:36]JQLSP may be interesting for people, which is a language server for JQ. Language server? Give me a hand, what does that mean? So an LSP is... Most modern text editors have LSP support, like VS Code or Neovim, or Emacs, I think, also has it. So I have an LSP for JQ, so it does syntax checking on the code while you're writing. And at least in VS Code you can go to definition, so it will actually jump to where something is defined. It also keeps track of the lexical scope, so it will tell you that this function doesn't exist or this binding doesn't exist. It's very useful if you're going to write a lot of JQ. Oh, it sounds it. And I happen to be a VS Code fan, so you're singing my song here. Yeah, I don't think I could have written JQJQ without JQLSP. I would have gone insane.
[56:49]That sounds fair. That must be one heck of a lambda jq file. jqjq. It's like 2009, I think. It doesn't support everything JQ does, but it supports most things. But it was also a way for me to learn JQ, to understand the really nitty-gritty parts of JQ. Right. I found a lot of bugs also. There are quite a few open bugs now on JQ, because you end up with these very semantic questions, like: what does it mean to write this generator plus a generator? What does it mean if the assignment outputs several values on this side; is it only the first that gets used? The average JQ user will maybe not hit this. But if you're writing a JQ interpreter, then you will. When you try to re-implement these things, you have to know what to do. In some parts you get forced to decide what it means when this happens.
[58:07]
Defining Language Semantics
[58:07]That's actually a really interesting way to check if your language is fully defined. If you try to write it in another language and it forces you to answer questions that the documentation doesn't answer, then you've kind of done a really good sanity check on JQ by redoing JQ. And one of the reasons why I really like that there are a couple of different JQ implementations is that we have found a lot of things in the original JQ that don't make sense. Some of them will probably get changed, and some things maybe will stay weird because we can't change them.
[58:48]And then maybe some other implementations have to implement the weird way to stay compatible. Or maybe there is something like jaq, for example, which is a Rust implementation of JQ. That one is making a lot of progress and becoming compatible. But I think the author of jaq is aiming more for some kind of correctness: trying to make it how JQ probably should have been.
[59:24]So there are some ways that are probably going to be better.
[59:29]For example, I think if you index into null, it would actually throw an error, and JQ, I think, just returns null if you do that. I don't remember exactly. There are some things like that: JQ is very liberal with some things, and jaq is going to be more strict. JQ is very forgiving, and jaq, I think, is not going to be as forgiving. And then some part, I think, is also because of performance: if you can change the order of things in the reduce, you can skip doing some other things. The jaq README file has a lot of explanations of why he chose to do some things. He even has two academic papers about JQ. I think he sent in one of the papers to present at some conference. I don't know if he has done it yet, but I can send you the links to it. One of the papers is about just defining the semantics of the language.
[1:00:51]And then the other one, I think, is about the implementation of it, like how jaq implements it with all the different...
[1:01:01]Lowering into a different virtual machine, kind of, and how that works. If you're into JQ, it's interesting.
[1:01:11]Even just with my computer science hat on, because I studied formal computer science as an undergraduate, so we did compiler theory. So a lot of this stuff is kind of fascinating. I blow the dust off my brain and I try to remember back to my compiler course.
[1:01:30]
Acknowledging JQ Maintainers
[1:01:28]But no, that's fascinating stuff. So, look, again, thank you very much. All of these links will be in the show notes for everyone. And obviously, the most important thing to say is thank you for the work you do maintaining JQ, because without JQ, my work life and my fun life would both be less good than they are. I use JQ for hobbies, because I'm the kind of nerd who writes hobby code in JQ, but even just with my work hat on, it makes my life easier and it saves me a lot of time. So genuinely, thank you for the work you do maintaining JQ. And I should thank you. Thanks. I will forward the thanks to the other people. I will tell Nico about it.
[1:02:10]I will see if I can get hold of him. He could probably tell you a lot more about the history; he has been around for a long while. It would be kind of interesting, because the origin stories of things tend to be quite fascinating. I have asked him about it a couple of times, because it's a bit blurry for him also, I think. It seems like he doesn't know some of the details either about what the ideas were for Stefan, the guy who started it. But we'll see; maybe there will be a follow-up episode with Nico. We'll see. Hopefully. We shall see. And of course, that's evergreen content, so it can happen anytime, right? It's not time-constrained, which is the great thing about origin stories. They stay true forever. Yeah. Anyway... Maybe I should thank some people. I would like to thank Nico and Stefan and Ichny. I don't know his real name.
[1:03:14]That's a very nerd thing. Emanuel is also one. And now I only remember the active ones that are working on it, but there have been a few maintainers that worked on it for many years and are not active anymore. I think it's William Langford, maybe. And Mikael Suu and Daniel Tornoli, I think, have also been active, so they are probably the ones that have done the most work. I feel more like,
[1:03:50]what do you call it, like I'm just cutting the grass a bit and taking care of things, maintaining. You're carrying the torch forward: someone else started the race and you're carrying the torch at the moment, and then at some stage you probably hand it off to someone else. And that's the great thing about open source: because it's open, it's free for people who have the time and the energy and the vision to move things forward. And we all get to benefit from other people's work. I mean, these podcasts and stuff that we do, they're all open. They're all Creative Commons and stuff, because the whole idea is that we're
[1:04:26]
Embracing the Open Source Philosophy
[1:04:25]trying to pay it forward. We've benefited from lots of things other people have done, and hopefully we get to help others do other cool things. And that's sort of the open source idea. Right, that's the reason why I'm here, talking about this: to let people know what's out there. Well, as I say, thank you very much for your time. It was really good fun chatting to you, actually. We've only just met and I feel like, they're your people, right, your fellow nerds. We could probably have a lot of good fun over a long beer. But anyway, thank you for your time today, thank you for contributing, thank you for being on and for all this stuff you do, and enjoy the rest of your Monday. Thanks for.
[1:05:10]Music.
