CCATP_2024_06_22

An audio podcast where Bart Busschots is teaching the audience to program. Associated tutorial shownotes are available at https://pbs.bartificer.net.

2021, Allison Sheridan
Chit Chat Across the Pond

Automatic Shownotes

Chapters

Introduction
Evolution of Programming Skills
Transition to Shell Scripting and YAML
Popularity of YAML
YAML in Show Notes
Using YAML in Note Organization
YAML's Key Priorities
Personal Knowledge Base Journey
Evolution of Data Storage Formats
Human Readable YAML Example
YAML Terminology and Design Goals
Types of Collections in YAML
YAML Versions
Unicode Strings in YAML
Quoting in YAML
Nested Arrays and Dictionaries in YAML
Teaser for Next Episode

Long Summary

Allison Sheridan hosts episode 796 of Chit Chat Across the Pond on June 22nd, 2024, welcoming guest Bart Busschots for Programming by Stealth 168. Bart jovially shares personal anecdotes before delving into the surprise topic of YAML, highlighting its role as a data representation language akin to JSON. He explains the evolution from focusing on practical programming to developing broader skills like shell scripting and understanding JSON for web APIs. Bart discusses the prevalence of YAML in configurations and front matter, describing how YAML enhances organization in tools like Obsidian for managing personal knowledge bases. He illustrates YAML's simplicity through example config file snippets, emphasizing its user-friendliness for both humans and computers.

The evolution of data formats is explored, starting with fixed-width text files, Tab-Separated Values (TSV), and Comma-Separated Values (CSV), progressing to XML, and finally discussing JSON and YAML. The simplicity of YAML is emphasized, highlighting its human-readable and embeddable nature. The conversation touches on tools like pbpaste and pbcopy for efficient command-line operations. YAML's elegance in data representation and structure is explained, along with its notion of a document, introducing the concept of YAML front matter and its implications in structuring data within documents.

The complexities of YAML syntax are further unraveled in the episode, exploring various elements and nuances within the format. Details such as the use of three dashes to mark document boundaries and the structure of comments in YAML are dissected. The significance of indentation in defining relationships between elements within the document is elucidated. Scalar values, including null, boolean, and numerical data representations, are clarified with distinctions between accepted formats in YAML versions.

The Unicode nature of strings in YAML is emphasized, along with discussions on single-line and multi-line strings. Collections in YAML, such as sequences and mappings, are explored, likening their structures to markdown bulleted lists and key-value pairs. Core syntaxes for sequences and mappings, nesting techniques through indentation, and practical examples are shared to enhance comprehension. Challenges of handling arrays of arrays in YAML are teased towards future exploration and learning. The episode concludes with gratitude for listener support and teases upcoming episodes focused on delving deeper into YAML syntax and tools like YQ.

Brief Summary

Join me in this episode of Chit Chat Across the Pond where I host Bart Busschots for Programming by Stealth 168. Bart shares personal anecdotes and explores the surprise topic of YAML, a data representation language similar to JSON. We discuss the evolution of data formats, from CSV to XML, highlighting the simplicity and human-readability of YAML. The importance of tools like pbpaste and pbcopy for command-line efficiency is emphasized, along with YAML's elegance in data representation and structure. We delve into the complexities of YAML syntax, clarifying elements like indentation, scalar values, and collections. Tune in for insights on strings, sequences, mappings, and upcoming episodes dedicated to exploring YAML syntax and tools like YQ.

Tags

Chit Chat Across the Pond
Bart Busschots
Programming by Stealth 168
YAML
JSON
data formats
pbpaste
pbcopy
syntax
YQ

Transcript

[0:06]
Introduction
[0:00]Music.
[0:07]Well, it's that time of the week again. It's time for Chit Chat Across the Pond. This is episode number 796 for June 22nd, 2024, and I'm your host, Allison Sheridan. This week, our guest is Bart Busschots with Programming by Stealth 168. How are you doing today, Bart? I'm just going to say grand because that's just easier for the listeners. I'm not really, but I'll be fine in the long run, I'm sure. And when it's all over and I can laugh about it, I will tell everyone what's going on. But right now, I'm pretending it's not happening. There you go. Nothing life-threatening. There we go. We don't think. We don't think. Yeah. That made everybody feel a lot better, right? Yeah, yeah. I shouldn't mention the two ambulance trips either. Look, here, I have a sense of humor. I'm laughing. I'll tell you about it later.
[0:52]
Evolution of Programming Skills
[0:53]So we are at another one of those crossroads, right? Because I thought we were doing a two-episode show on JQ. I think I thought it was two. Maybe I thought it was three. I didn't think it was like 10 or whatever we ended up doing, but it was a lot. And then we sort of were coming into our summer hiatus and stuff, but I sort of decided I'm going to sneak in another topic before we go on hiatus. So while trying to decide how to introduce this topic, I sort of spent some time on the PBS front page looking at everything we've been doing to try figure out
[1:27]
Transition to Shell Scripting and YAML
[1:26]where we sit in the big picture. And so we started off many moons ago very, very focused on being programmers, right? Coders. It was all about, you know, very practical HTML, CSS, JavaScript. It was coding, coding, coding. And then when we went to phase two, where it was all about helping the community rebuild XKPasswd, we zoomed out. We switched from a narrow-focus lens to a wide-focus lens, and it became about developer tools. And so we learned a lot, and I'm kind of surprised how much we've learned, because I went: Node Package Manager, JavaScript modules, linters, JSDoc, test suites with Jest, bundlers with Webpack, Mermaid diagrams, Bash scripting, and JQ. Wow. You know? A lot more skills.
[2:14]Apart from Bash and JQ, they were very, very laser focused on delivering XKPasswd. But in order to be an efficient software developer, you actually need to have some more generic skills. And shell scripting is really obvious as a generic skill to have because all of the build tools and stuff, you're in the shell all the time for that kind of stuff. And given how much JSON is used for web APIs, it seemed like a skill I think we should all have as well. And now today I'm taking us into another language, which is, I think I may have wrongly described it as a markup language last installment, because when it was first invented, it was jokingly called yet another markup language. It is YAML, but it's actually a data representation language. It's equivalent to JSON more than it is to HTML, if that makes sense. It's for writing up data, not for writing up text.
[3:10]And our connection to YAML is arguably a little bit more tenuous, but actually it's only a little bit. So YAML is becoming really popular.
[3:20]
Popularity of YAML
[3:21]It's used all over the place. And I still remember the first time I saw JSON, a little light bulb went off in my head and I was like, this is a thing, this is going to be all over the place. And it was. And when I recently, I say recently, about two years ago, discovered YAML for the first time, the same light bulb fired in my head, and I've been watching it since. And I'm seeing more and more open source projects prefer to use YAML for their config files. A lot of them will do things like, you can use YAML or JSON, but we prefer you to use YAML. And then all of their documentation is in YAML, and they say, by the way, you can use JSON if you like. And so...
[4:02]I had a poke around at the amazing work the community are doing on XKPasswd, and I honestly expected to find a YAML file. And I tried really hard, but I didn't. So I was slightly disappointed. You were hoping to say, here, we're already doing it. I was kind of, but then I realized, actually, yeah, we are.
[4:21]
YAML in Show Notes
[4:21]These show notes that you may or may not be reading right now, they are held together with YAML. They are on GitHub Pages using what is effectively the Jekyll open source static site generator. And Jekyll's config is YAML. For this website, our GitHub Pages doesn't look like everyone else's GitHub Pages because we have a custom theme, which means I had to make a configuration to tell GitHub Pages to use a custom theme. So back up, I think you skipped a little step in there. So the show notes are all written and hosted in GitHub, or hosted, the repo exists with all the show notes. But then there's a GitHub page that's created from those, and that GitHub page is what is pbs.bartificer.net.
[5:11]Correct, yes, exactly. And that is delivered through Jekyll with a YAML config file. And the theme is written, well, it's a Jekyll theme, it's full of YAML. The theme is chock-a-block full of YAML. I wrote the theme, so that's also more YAML. So actually, these very show notes are YAML. So this is a very, very meta episode, because the notes you're reading are held together by the thing we're talking about, which I think is kind of cool.
[5:37]
Using YAML in Note Organization
[5:38]Okay, but to make it even more meta, and I'm going to be talking about this, let's see, timey-wimey, wibbly-wobbly going on, I think it's going to be next week, I'm going to be talking about how I was forced to use a new tool because you're actually using YAML to organize the show notes as you're writing them on your desktop in Obsidian. Yes. So there's, there's another, another piece. It's just a tiny little bit of code at the top, but it was enough to cause me grief with my tool. So I've changed tools in order to just edit what you're doing.
[6:12]And that gives you something new to review, I'm guessing. So I'm hoping you're enjoying. Well, physics nerd Graham has actually stolen the review, but I'm going to write an addendum review. He's doing his review, and then I'm going to follow it up with mine. So it's meta all the way down. Excellent. Well, as I say, so there's three reasons I think we should do YAML. So first off, I am almost certain that at some stage before the XKPasswd project is out, there is going to be some YAML in there. And also, anyone who's listening along, you're going to meet YAML. I don't know where, I don't know when, but you're going to meet YAML.
[6:46]I also, it's a bit like JQ. I've sort of fallen in love with it and I want to share. And it's kind of my show, so it's only half my show, but still. I write the show notes and I don't know. If I were to ever tell you what to be passionate about, would this show work at all? Nope. No, not very well, no. I mean, you know, we can veto things that you're dispassionate about and that's fair enough. But yeah, you're right. If I'm not excited, it won't be much fun. But this isn't going to be a 10-part episode, or 10 parts on YAML that we know of.
[7:20]Ah, but I know, because both parts are written. So it's two parts. Okay. It's two. I don't trust you, Bart, because next week you'll learn something new it can do. I've also written episode 170, which is on a different topic. Okay. So there are three episodes in the bag, and the third of them is not YAML. Therefore, we have to finish in these two episodes. So it is going to be a two-parter. Good. So part one is today is going to be putting it in a bit of context, learning the basic syntax and sort of how YAML thinks about the world. And then the second installment, we're going to pick up on the more advanced aspects of YAML syntax and a tool I came across as if the universe just wanted us to do YAML. So we've just finished JQ.
[8:05]And the day after we finished JQ, within 24 hours, I turned on my daily security podcast from the SANS Internet Storm Center. And they had a special article on a tool called YQ, which takes the JQ query language and applies it to any data format that it can read, including YAML, CSV, and XML. So you can use everything we've learned in JQ to query stuff written in other markup languages or other data languages, and you can read it in one language, process it with the JQ syntax, and output it in a different one. So you could take some YAML... YQ, and you can... Yeah, so you said JQ? Correct, the JQ language. So that syntax we learned, those JQ filters. Okay. YQ uses the exact syntax. Oh, oh, cool. But it can read anything and write anything. So you could take a CSV file, use all of the JQ syntax we've learned, and output a YAML file, or read a YAML file and output a JSON file, or read a JSON file and output a CSV. This is the magic of open source, right? Because these languages can get embedded inside each other.
[9:16]Yeah, exactly. So we're going to spend some time with YQ because, man, how could the universe drop that in our lap and how could we pass it by? So that's part two. And as I say, that's all written. So let us start with the big picture. So YAML is a language for storing data and its primary objective, when you read its own documentation,
[9:39]
YAML's Key Priorities
[9:37]it says up front it has a certain number of priorities. In order, priority one, be human readable. That is their driving goal here, which is very like JSON, or sorry, very like Markdown. It's very Markdown in that aspect, but it also needs to be utterly unambiguous so computers can interpret it too. So we humans should be able to look at it and go, ah yeah, and computers should be able to look at it and go, I know what you mean. And that, of course, gives it great power. So I think of it as the Markdown of data. So it wants to be to data what Markdown is to text. So as an example of how simple this is, this is a seven line snippet from the actual config file holding up this actual web page that we are now reading from and you may or may not click to after you listen to the show. It is, like most config files, a bunch of key value pairs. It is just a key, colon, a value. Remote theme, colon, bartificer slash bartificer Jekyll theme. Title, Programming by Stealth. Email, podcasting at bartificer.net. Description, a blog and podcast series by Bart Busschots and Allison Sheridan.
[10:46]It's pretty obvious what that is, right? Right, right. And that is a dictionary in YAML, believe it or not. But we'll get to more of that in detail. But pretty straightforward. So this is why it's used so much in configuration files.
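To make the "key, colon, value" idea concrete, here is a minimal Python sketch. The YAML text is reconstructed from the four keys read aloud above, so the exact key names and values are assumptions rather than a copy of the real config file, and `parse_flat_mapping` is a hypothetical toy helper that only understands one-level `key: value` lines. It is not a YAML parser.

```python
import json

# Reconstructed from the spoken description; treat the exact keys and
# values as assumptions, not a copy of the real Jekyll config file.
CONFIG_YAML = """\
remote_theme: bartificer/bartificer-jekyll-theme
title: Programming by Stealth
email: podcasting@bartificer.net
description: A blog and podcast series by Bart Busschots and Allison Sheridan
"""


def parse_flat_mapping(text):
    """Toy reader for one-level `key: value` lines only (not real YAML)."""
    result = {}
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # Split at the first colon: left side is the key, right side the value.
        key, _, value = line.partition(":")
        result[key.strip()] = value.strip()
    return result


config = parse_flat_mapping(CONFIG_YAML)
# The same data as JSON, which is what "a dictionary" means here.
print(json.dumps(config, indent=2))
```

Printing the result as JSON shows the same four pairs wrapped in braces, quotes, and commas, which is the extra punctuation the YAML form leaves out.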
[11:07]
Personal Knowledge Base Journey
[10:59]And its other big use is in front matter, which is metadata inside text in Markdown format. And again, metadata is key value pairs, right? What tags are there? It's going to be, you know, tags will be the key and the value will be an array of strings. Um, you know, custom title for the page and stuff, all that stuff is in the front matter, and that is all just key value pairs. So that's why it's so at home. And now we've got to the bit that you mentioned earlier. So for many, many, many years, I have maintained what I now call a personal knowledge base. I didn't used to call it something so grandiose, but Obsidian, which is where I've landed, uses that terminology. And it's actually quite appropriate. So I've been keeping, you do this too, right? You use, oh, what is the one you use? Bear? Is it Bear? Anyway. You're talking about my Markdown editor? No, not your Markdown. So when you find something and you keep a note of it so that you, you'll remember that you did it once and you go searching.
[12:04]Google? I don't know what you mean. Oh, well, I use Keep It. Keep It, thank you. Yes, your version of Keep It is my personal knowledge base. So whenever I discover something, it goes into my personal knowledge base. And it started off as Evernote, and then Evernote decided they didn't want to be a note-taking app. They wanted to run my life, and their app became awful. And then I switched to Joplin because it was open source, but it turns out it's a by-nerds-for-nerds UI, and it made me pretty cranky. And then I ended up on Obsidian, back-ended in Git. And I feel really very, very, very at home, because it's got a nice shiny UI, but when you peel it back, it's Markdown files with YAML front matter in a Git repo that I own. So if the whole thing explodes, I have a Git repo with a folder of perfectly readable Markdown files, and my metadata is still there too, because it's perfectly readable YAML sitting at the top of those Markdown files. So I can't lose my stuff, and it's yet more YAML in my life. Yeah, I don't yet get what the YAML in your notes does for you. Having them in Obsidian requires you to use it, right?
[13:07]Well, you see, if you just use the Obsidian GUI, you'd have no idea that there was front matter and that it was Markdown. You would see a nice GUI editor with text editing tools and a little section at the top called metadata where you could type in your keywords and all that kind of stuff. But under the hood, it's YAML plus Markdown. Okay. So that's how it's storing all of the stuff it's using to organize your notes. I don't know what the... It's managing the front matter. But I still don't know what the front matter is. What does it do? Why is it there? The most, well, that's where I put my tags, right? So it says tags, colon, and then a list of the tags. I don't know that. You just told us. Okay. Right, yeah. So basically it's all of the data about your data. So if you're writing a note and you have some sort of a system for organizing it, so as I say, tags are what I'm using at the moment as my primary piece of organization. But for the show notes, I actually use other fields that I made up myself, like what state this is in in the workflow. So you can just make up a field like, I think I call it status, which is early draft, ready to record, published. So you can make them up yourself. They're just key value pairs. But anyway, the point I guess I'm trying to make is that YAML is very human friendly data and you can embed it in stuff.
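As an illustration of front matter, the sketch below separates a YAML header from the Markdown body of a note. It assumes the common convention of fencing the front matter between two lines of three dashes at the top of the file. The note text, the tag values, and the `split_front_matter` helper are all hypothetical, and the `status` field only mirrors the made-up field mentioned above.

```python
# A hypothetical note: YAML front matter fenced by '---' lines, then Markdown.
NOTE = """\
---
tags:
  - yaml
  - pbs
status: early draft
---
# My note

Body text in Markdown.
"""


def split_front_matter(text):
    """Return (front_matter, body); front_matter is '' when there is none."""
    lines = text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return "", text
    # Look for the closing fence.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    # No closing fence found: treat the whole text as body.
    return "", text


front, body = split_front_matter(NOTE)
print("FRONT MATTER:")
print(front)
print("BODY:")
print(body)
```

The point of the sketch is only that the metadata and the text live in the same plain file, so both remain readable without any particular app.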
[14:25]
Evolution of Data Storage Formats
[14:25]So I want to give us a little bit of history of how we got to here, because since the very first days we were using computers, back in the days when they had green text and a green screen, we have needed to keep information in those computers and we have been putting it in text files. And so I believe the very earliest data files were something called fixed-width text files, where basically you all got together as a team and you went, our data files will have 20 characters for the name, then a space, and then six characters for their title, and then a space, and then 40 characters for their surname. And every single line of code had to count all of the offsets. Where's the name? Well, the name is at character 21, and, you know, so the title is at character 21, and where's the surname? It's at 21 plus six plus... And if any piece of your code ever got it wrong, your data was all garbage, because if someone put 20 characters in a 19 character field, then every field afterwards was a mess. It was truncated or messed up. But it was very easy to read, because you open it in a text editor and it lines up. Right, it's literally sort of like trying to write tables in Markdown, and you put in all the spaces so all the little lines line up and it looks nice so you can read it, but then all of a sudden you insert something and you've got to go back and put them all in. It's annoying as all get out. Exactly, but that is how we started off, and it soon became clear that that was a little bit problematic.
[15:50]Brittle. In my actual real-world life, I came across one system old enough to still be written like this. We had to change our mind on how we structure the data.
[16:02]Years, years later, subtle bugs were falling out of that code base because, well, all of the offsets were wrong. Every single line of code, anywhere in the data that referenced that data file we changed, we had to redo all the math to work out all the offsets.
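The brittleness described here can be sketched in a few lines of Python. The layout follows the description above (20 characters for the name, a space, 6 for the title, a space, 40 for the surname); the sample values and the helper names `make_record` and `read_record` are hypothetical.

```python
# Column offsets for the layout described above:
# 20 chars name, 1 space, 6 chars title, 1 space, 40 chars surname.
NAME = slice(0, 20)
TITLE = slice(21, 27)
SURNAME = slice(28, 68)


def make_record(name, title, surname):
    """Pad each field to its width; note that nothing stops an over-long value."""
    return f"{name:<20} {title:<6} {surname:<40}"


def read_record(line):
    """Cut the line at the agreed offsets and trim the padding."""
    return {
        "name": line[NAME].strip(),
        "title": line[TITLE].strip(),
        "surname": line[SURNAME].strip(),
    }


good = read_record(make_record("Bob", "Dr", "Example"))
print(good)

# A 25-character name overflows its 20-character field, so every later
# field shifts right and the reader slices in the wrong places.
bad = read_record(make_record("A" * 25, "Dr", "Example"))
print(bad)
```

In the second record the name is silently truncated and the title field picks up the tail of the name, which is the kind of quiet corruption the anecdote describes. Changing any width means updating every offset everywhere it is used.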
[16:18]Yeah, like I say, years of pain. It's a terrible idea. So needless to say, we humans had a better idea. And the better idea was, why count characters? Why don't we take a character that's not going to be in your text and use it to say, this is the end of the first field, let us now move on to the next field. And the character initially chosen was the tab. So you had some data, tab, some more data, tab, some more data. And the files were often called tab files, although technically they're called TSV for tab separated value. But if you find an old file with a .tab extension, that's what it is. It's just a tab file. This has the advantage of being much less brittle, but now if your data is a bit jaggedy, it won't line up anymore. If your data is mostly symmetrical, it still lines up, but if your data is more than eight characters worth of jagged, the tabs won't line up anymore and it looks janky again. But it is at least, you know, not so brittle. But that didn't last long, because then we realized that if we've given up on it being easy to read, well, why don't we actually just go all in? Well, CSV is what came next. Get rid of that tab, because the tab looks like a space. It's actually quite human hostile, because is that a space or is that a tab? Well, if it's a space, then actually we haven't separated the field. If it's a tab, we have.
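The space-versus-tab problem is easy to demonstrate with Python's standard `csv` module, which can read tab-separated data when given `delimiter="\t"`. The sample lines and the `read_tsv` helper are made up for illustration.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


# A real tab between the two fields: two fields come back.
with_tab = read_tsv("INFO\tCookies placed\n")

# Spaces that merely look like a tab: the whole line is one field.
with_spaces = read_tsv("INFO    Cookies placed\n")

print(with_tab)
print(with_spaces)
```

On screen the two input lines can look identical, yet only the first one actually separates the fields.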
[17:36]And also you can't have tabs in your data in a TSV. CSV, then, is comma separated value, and that introduced the concept of you may optionally quote the fields. So now you can use backslash quote inside your field, so you can have quotes inside your data as well as surrounding your data, and you can have any special character you like in there, just put a backslash in front of it. And to this day, the most generic data format is the CSV. If you want to send data between two people and you have absolutely no idea what software they have, if you pop it in a CSV, whether they're Pages or Google Docs or Word, or sorry, Excel, or whatever you're having yourself, if you send them a CSV, they'll ingest it. Right, every cloud service we ever use, we integrate with 20 million things and CSV.
[18:24]Upload your data in CSV. So the next language to really take off was the Extensible Markup Language, or XML. And there was a time when I was an undergraduate where XML was the bee's knees. It was the future of all things, because unlike a CSV and a TSV, you couldn't just have plain text data, you could have nested data structures. You could have, like, you know, arrays and things like that inside an XML file. You didn't only have plain old text fields. And it can be human readable. If you choose your tag names well, and if you indent things properly, it looks quite like HTML because it's basically tags surrounding data. And so you can definitely do nice XML, but it's very verbose. It's an awful lot of markup for a small amount of data. So if you've been looking along in the show notes, I've had the same data in fixed width, TSV, CSV, and it's all been pretty similar looking in terms of length. And then we come to the XML. There's a giant big blob. It's exactly the same information. It's the dummy log file from our previous JQ installment about, you know, putting cookies on the mantelpiece and then Santy arriving.
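To put a rough number on "an awful lot of markup", this sketch holds one log entry as CSV and as XML and parses both with the Python standard library. The timestamp, the message wording, and the tag names are assumptions loosely modelled on the dummy log described above, not the real show-notes data. The CSV message is quoted so that it can contain a comma.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical log entry; the quoted CSV field safely contains a comma.
CSV_TEXT = (
    "timestamp,severity,message\n"
    '2023-12-24T20:00:00,INFO,"Carrot, cookie placed on the mantelpiece"\n'
)

XML_TEXT = """<log>
  <entry>
    <timestamp>2023-12-24T20:00:00</timestamp>
    <severity>INFO</severity>
    <message>Carrot, cookie placed on the mantelpiece</message>
  </entry>
</log>"""

# Each CSV row becomes a dict keyed by the header line.
csv_rows = [dict(row) for row in csv.DictReader(io.StringIO(CSV_TEXT))]

# Each <entry> becomes a dict keyed by its child tag names.
xml_rows = [
    {child.tag: child.text for child in entry}
    for entry in ET.fromstring(XML_TEXT).findall("entry")
]

print(csv_rows == xml_rows)            # same information either way
print(len(CSV_TEXT), len(XML_TEXT))    # character counts of the two encodings
```

Both parses yield the same dictionary, while the XML text is noticeably longer because every value is wrapped in an opening and a closing tag.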
[19:46]Oh, no, I've lost your audio, Allison. You're on mute. I am on mute. Luckily, Bart can see me talking. Yeah, we've had audio issues, so that is a scary thing. So the file that you're showing there, it looks very much like HTML, where you have an opening and closing tag, and it's in angle brackets, and then it's slash and the same thing. So it says timestamp and then slash timestamp. You know what that also looks exactly like? It looks like the text in an RSS feed. I wonder if RSS is written in XML. Is it? Bing, bing, bing. It is, it absolutely is. I wish I'd thought of that, because that would have been a great link in the show notes. Yes, absolutely, that is one of the many places XML still lives. RSS is XML. It is absolutely XML.
[20:31]It's also really popular in the enterprise, because Sun Microsystems, who invented Java, and Oracle, who bought Sun Microsystems, and Microsoft all really got into XML when it was popular, and those systems haven't changed. So you'll hear phrases like SOAP and XML-RPC all over the enterprise. It's XML all the way down. There is XML keeping the lights on, keeping our money flowing through our banks. XML is still all over the place, but I don't like it. Amidst the open source community, XML became a bit of a thing you turned your nose up at, because JSON came along, which is way more developer friendly than that icky XML. And of course I don't need to explain to you what JSON looks like. I think we may have seen more than enough JSON after the JQ installment. But it's, you know, it's nice and pretty human readable. It's actually, it's human readable, but not all that human writable. Because if you forget a single comma, it borks the whole file, and you have to have your opening and your closing brackets. So while it's human readable, it's a bit less human writable. And it doesn't support comments.
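Both complaints about JSON can be checked with Python's standard `json` module. The sample strings and the `try_parse` helper are hypothetical; the sketch only shows that a strict JSON parser rejects a missing comma and a `//` comment.

```python
import json

VALID = '{"severity": "INFO", "message": "Cookies placed"}'
# The comma between the two pairs has been left out.
MISSING_COMMA = '{"severity": "INFO" "message": "Cookies placed"}'
# A comment, which standard JSON has no syntax for.
WITH_COMMENT = '{"severity": "INFO" // why this entry exists\n}'


def try_parse(text):
    """Return the parsed data, or a short error string if parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        return f"parse error: {err.msg}"


for sample in (VALID, MISSING_COMMA, WITH_COMMENT):
    print(try_parse(sample))
```

Only the first string parses. One missing character is enough to make the whole document unreadable to the parser, which is what "human readable but not all that human writable" comes down to in practice.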
[21:33]Thank you, thank you, thank you. That is one of the things I find most cranky-inducing about JSON, is no comments. That is absolutely a flaw in the format. So all of that then brings us to our friend YAML, which takes the JSON idea and says, well, let's not have these opening and closing brackets. Let's just simplify that right back. And what you end up with is something that is human readable and human writable.
[21:58]
Human Readable YAML Example
[21:59]And that is what I hope you will agree with in one and a half episodes' time. And in the show notes you can see it is pretty darn short. There is very little fluff. You could arguably say that there are some superfluous spaces and some dashes and some colons, but really, for the most part, it is purely our information. This is the timestamp, this is the severity, this is the message. Timestamp, severity, message. For the most part, pretty straightforward. Okay, so since this is an audio podcast, let's just say what that said. The way it was written was just dash, space, timestamp, colon, and then he's got a timestamp. And then on the next line, it's left-aligned to the line above it, which I'm guessing is probably important, it's severity, colon, info. And then the third line, message, colon, carrot and cookie placed on a mantelpiece. That's the third piece of data that goes together.
[22:56]Yes, precisely. Very human readable, not a lot of extra characters. You have intuited that those timestamp, severity, message belong together, and that the next timestamp, severity, message are part of another thing. You intuited that, I didn't tell you that. In fact, I sort of went out of my way not to tell you that. Oh. You did intuit that, right? Well, or there's a vague chance I remembered it from three days ago when I proofread the show notes.
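The grouping that was intuited here, where a dash starts a new entry and the indented lines beneath it belong to that entry, can be sketched as follows. The YAML text is a reconstruction of the example as read aloud, with made-up timestamps and a made-up second entry, and `parse_sequence_of_mappings` is a hypothetical toy that handles only this exact shape. It keeps every value as a plain string and is not a YAML parser.

```python
import json

# Reconstruction of the spoken example; timestamps and the second entry
# are assumptions for illustration.
LOG_YAML = """\
- timestamp: 2023-12-24T20:00:00
  severity: INFO
  message: Carrot and cookie placed on the mantelpiece
- timestamp: 2023-12-24T23:59:00
  severity: INFO
  message: Santy arrived
"""


def parse_sequence_of_mappings(text):
    """Toy reader: '- ' opens a new entry, indented lines add to the last one."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("- "):
            entries.append({})  # a dash starts a new item in the list
        # Drop the two-character prefix ('- ' or two spaces), then split
        # at the first colon-space into key and value.
        key, _, value = line[2:].partition(": ")
        entries[-1][key.strip()] = value.strip()
    return entries


log = parse_sequence_of_mappings(LOG_YAML)
print(json.dumps(log, indent=2))
```

The JSON printed at the end carries the same two entries, but with the brackets, braces, quotes, and commas that the YAML form expresses through dashes and alignment alone.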
[23:24]Well, I know it's so rare that I should get credit for being brilliant, but I don't want to take it when it's not true. My memory is so bad, I bet I did intuit it this time. Three days ago then. Yeah, I'm sure you did.
[23:39]
YAML Terminology and Design Goals
[23:40]Okay, so now we get to the bit where I'm going to do the thing I always do, where every language has their own word for things. And YAML has some words that YAML uses to describe concepts. So this is how YAML sees the universe, and I'm going to be very careful from here on out to try and use the right words.
[24:01]Let's start with an introduction to say that YAML was intentionally designed to solve many problems, not just those we care about here. And I'm actually going to quote from the specification, which I don't do very often because usually specifications are dry and not all that pleasant. But I thought the opening to the YAML spec was actually quite informative. So I'm just going to do what I rarely do and read a quotation out loud. I'm not very good at this, but I'll have a go. So it starts off by saying what it is. YAML, a recursive acronym for YAML Ain't Markup Language, is a data serialization language designed to be human friendly and work well with modern programming languages for common everyday tasks. Then there's a few more paragraphs of fluff, and then the bit I really want to quote. There are hundreds of different languages for programming, but only a handful of languages just for storing and transferring data. Even though its potential is virtually boundless, YAML was specifically created to work well for common use cases such as configuration files, log files, interprocess messaging, cross-language data sharing, which is a big deal in the enterprise, object persistence, and debugging of complex data structures. And these people are my people because this is their closing sentence. When data is easy to view and understand, programming becomes a simpler task.
[25:25]Yeah. Well, I would certainly think so. Yeah. So that last sentence just spoke to me so much, I figured I would quote them. I thought they were worthy of a spec being quoted in these show notes. And in the very early days, it was humorously called yet another markup language. And it isn't a markup language. So then they decided to officially name themselves the YAML Ain't Markup. So, yeah, when you first introduced it, you said yet another markup language. The show notes say, yes, another markup language. So has it had three names? Oops. Or should it be yet? That should be a T. Okay. That should be yet. Okay. I got it. I thought I'd spotted most typos. Okay, so terminology then. So to make YAML flexible, it was actually designed from day one to be embeddable.
[26:11]So YAML expects to be a passenger in something else. It could be a stream of network data. It could be a file with other stuff. So YAML is perfectly happy to be an embedded piece, a bit like a Mermaid diagram. It expects to be found embedded inside a Markdown file or something. YAML is like, well, you might have me on my own, but I'm probably going to be embedded in something. Okay. And it also, it doesn't feel there's a need for a one-to-one. So you could embed 20 YAML documents in one file. It's perfectly happy to have 20 chunks of data in one file. Hmm. Okay. It doesn't, you know, it's perfectly happy with that. Or particularly, imagine you have a stream of log data, right? It's just a stream of logs. Log entry one, that's one piece of YAML. Log entry 2, another piece of YAML. Log entry 3. So it's perfectly happy to be chunked like that. And so it calls some information, right? Just a unit of information. It calls that a document. So that's sort of its biggest picture of itself. A document is one or more pieces of information in YAML. And you can embed the YAML documents anywhere you like. So a document contains data. Well, data comes in two flavors in YAML's mind. Single value things, which YAML calls scalars. So examples would be a boolean, numbers, strings, they're all scalars.
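As a rough picture of several documents travelling in one stream, the sketch below splits a block of text wherever a line of three dashes appears, which is the marker YAML uses to start a new document. The log entries and the `split_documents` helper are hypothetical, and a real YAML library would do this splitting for you.

```python
# A hypothetical stream holding two YAML documents, each opened by '---'.
STREAM = """\
---
entry: 1
message: first log entry
---
entry: 2
message: second log entry
"""


def split_documents(stream):
    """Split a stream into document texts at each '---' marker line."""
    docs, current = [], []
    for line in stream.splitlines():
        if line.rstrip() == "---":
            # A marker closes whatever came before and opens a new document.
            if current:
                docs.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:
        docs.append("\n".join(current))
    return docs


documents = split_documents(STREAM)
for number, doc in enumerate(documents, start=1):
    print(f"document {number}:")
    print(doc)
```

Each chunk can then be handed to a parser on its own, which matches the "log entry one, log entry two" picture of a stream.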
[27:31]
Types of Collections in YAML
[27:31]A collection is the other thing you can have, which is very well named. It is multiple values. A compound structure of some sort is a collection. And in YAML, there are just two types of collection, sequences and mappings. And a sequence is the most generic word for a list, an array, a linked list, a vector. They're all in all different programming languages. In Java, what we think of as an array is actually called a vector in a lot of cases. Lists, linked lists, it's basically a sequence of data in a row. So they couldn't call it an array and make our life easier. No, because they didn't want to pick any particular programming language's favorite term. They were trying to be as clear as possible. So they added one to the lingo. Thanks. Appreciate it.
[28:21]I kind of like mapping, though, because what better way to describe key-value pair in one word than a mapping? Not bad, yeah. A name mapped to, yeah, a name mapped to values. So dictionaries, hash tables, lookup tables, key-value pairs, that is what a mapping is. Yeah. So a document contains data, which is either single pieces, scalars, or multiple pieces together, collections, which can be sequences or mappings. And that is the sum total of the jargon. That is all the jargon there is in YAML. Good. Which isn't too bad.
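As a minimal sketch of the jargon just described, here is one small YAML document that uses all three kinds of data. Every key and value in it is made up for illustration; none of it comes from the show notes.

```yaml
# One YAML document. The top level is itself a mapping.
# All names and values below are invented for illustration.
title: Programming by Stealth   # scalar (a string)
installment: 168                # scalar (an integer)
topics:                         # sequence (an ordered list of values)
  - scalars
  - sequences
  - mappings
hosts:                          # mapping (key-value pairs)
  teacher: Bart
  student: Allison
```

A document contains data, the data is either a scalar or a collection, and a collection is either a sequence or a mapping.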
[28:52]A very small note on YAML versions. The first version dates back to 2005, but the version we are using now is 1.2, which dates to 2009 with a little update in 2021. We are now at YAML 1.2.2, and it is the 1.2.2 spec linked in the show notes that I have used for all of these show notes. So that's kind of a way of saying that it's such a simple little thing that it's not like it's going to need a bunch of enhancements. It's not a giant complex programming language that has to develop, you know, for AI to show up or something like that. It's just, here's a way to have your data structured simply. Yeah. And ironically, the only difference I have found between stuff I found on the internet and the 1.2.2 spec is that they have simplified how Booleans work. They used to have more options for Booleans and then they changed their mind. We'll talk about that a few scrolls down. But the only thing I found is them making the language simpler, not more complicated, between 1.1 and 1.2. I'm not going to complain. Simpler seems good to me. Now, we now come to a very interesting problem. YAML's job is to visualize data, but you don't know YAML yet. So how do I show you what YAML means...
[30:11]In some other language. Well, we've just spent so long doing JSON, so problem solved. I am going to translate, using yq, all of the YAML snippets into JSON. So I'm going to show you, this is the YAML syntax, and this is what it means. And "this is what it means" is going to be the self-same data in JSON. Okay. Okay. So it's like, we know French, and you're going to teach us German by saying, here's the German, here's the French that you already know. Okay. That's what they are. Okay. Yeah, exactly. So I figured how else can we visualize this, right? And this gives me a nice thing to say that, so next installment, we're going to dive nice and deep into yq. It's an open source project. If you're on a Mac, brew install yq, and then you can play along. Now, I've done all the translations for this installment, so you don't actually need to do anything to play along, but you'll find that on GitHub and it's on Homebrew. And if you're curious, I did a very fun thing I love to do. There's a terminal command on the Mac called pbpaste, and its friend is pbcopy. pb is pasteboard. So what I did was I went to the show notes, I typed out my YAML, I went control A, control C, and I went to my terminal and I typed pbpaste pipe yq minus o j, or minus output JSON, and then it barfed out the JSON I wanted. And then actually I could have piped that to pbcopy and then I would have had it straight back to my clipboard. So you couldn't have hit command V on the command line?
[31:35]But then, but no, but then you have a whole bunch of barf, right? It's on multiple lines and stuff, right? Whereas if you do pbpaste pipe, it never shows up on the screen. It actually gets shoved into the app as if you were piping it from a file. Ah, I like that, okay. It basically treats your clipboard as a file, so pbpaste is amazing. Okay, so pbpaste... So yq minus o j... Hang on. pbpaste pipe yq minus o equals j. You keep saying minus o j. So that says yq this, and I want the output, and by j it means JSON? Yeah. It does. You can also type minus o JSON, but I'm lazy. Minus o equals JSON.
[32:13]Equals JSON. Okay. And then the other one is if you want it in really shrunk down JSON, minus capital I equals zero means indent with zero, which is the equivalent of, in jq, we had a minus c for compact. Oh, okay. So minus capital I equals zero does the same thing. It gives it to us nice and short. Okay. And you may get, Google may lead you astray. Because I heard about yq on a security podcast and it described this amazing tool. And I went to Googling and I landed on a GitHub page for a project called yq and it didn't match what I'd heard on the podcast. And I was very confused for about five minutes until I went back to Google and searched a bit more and realized there are two of them. One of them is for people who like Python and all it does is it takes some YAML, it runs it through Python's converter to JSON and then it actually sends that to the genuine jq command. So it's a wrapper around jq that uses Python's translation from YAML to JSON. If you have Python installed, why not? Why didn't they call it pq, though? They should have called it pq. Where'd they get the Y from?
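The clipboard trick described above can be sketched as follows. The YAML keys and values are invented, and the commands in the comments assume macOS (for pbpaste and pbcopy) plus a recent release of the Go-based yq installed from Homebrew, not the Python wrapper of the same name.

```yaml
# Hypothetical snippet to select and copy to the clipboard.
greeting: Hello World!
answer: 42

# Then, in a terminal (commands as described in the episode):
#   pbpaste | yq -o=json           # YAML in from the clipboard, JSON out
#   pbpaste | yq -o=j -I=0         # same data, compact (indent of zero)
#   pbpaste | yq -o=json | pbcopy  # put the JSON straight back on the clipboard
```

Because the YAML arrives through a pipe, it never appears on screen; yq reads it as if it were a file.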
[33:19]YAML? By the way, this is a perfect example of where I'm actually finding ChatGPT to be really useful. Because when you Google, you just get whatever you get. But in ChatGPT, you can say, I'm looking for the yq that lets you work on YAML files the way jq works on JSON. Yeah. And it would give you the right one, or it would be wrong. Or it'd make one up to be a third one. Yeah. But in general, if you wrap it around, like I say things like, OK, I'm in macOS Sonoma, meaning don't give me answers that worked back in Mavericks. How am I going to do this? And if there is an answer, it will. If there isn't, it'll give you the one from Mavericks.
[34:04]Yeah, I think, you know, the way Googling is a skill, I think prompting is a skill of the future. But you can just be so much more human friendly when you write it. You know, you can say, what was that thing that Bart told me about? Not quite that yet, but it will be soon. Well, if Apple Intelligence is what it promises, that's exactly what it'll be like. Because it's supposed to tune in to us. Anyway, so with all of that done, let us finish up by being a little bit practical. Let's start with the basics of the syntax. So remember, we have documents that contain our data. So the very, very first thing that YAML has is what it calls a structure, which is, it's start me some YAML and end me some YAML. And so you start YAML with three dashes, and you end the YAML with three dots, on a line by themselves. So if you run something through a YAML parser, you say, dear YAML parser, there is some YAML in this file. It will go, where's the three dashes? Now I'm going to start listening. I'm going to keep listening until I see the three dots or until I see three more dashes, in which case I know it's a second YAML document and I'm perfectly happy to start over and do a second document for you. Okay, your YAML front matter on these show notes has three dashes, some text and then three dashes. And there's never another one.
[35:25]So three dashes doesn't mean to start another one? It does, but it's very subtle. Why? Because what's actually happening is that a string on multiple lines with no markup whatsoever is a YAML document describing a string,
[35:44]with no markup or anything. So do you have three dashes, three dots at the end of the whole document then? No. No, you can leave them off. If it runs out of documents, it just stops because you've run out of documents. So actually, the Markdown file, from YAML's point of view, is two YAML documents, a dictionary and a giant string. But why not just do it with three dots and end the YAML with three dots so it's clear that this is a start and end of a YAML document? Because under the hood what Obsidian and so forth are doing is they're saying, if the document starts with three dashes, run the whole thing through YAML and give me the two YAML pieces of data. The second YAML piece of data I will then run through a Markdown processor, and the first YAML piece of data is the metadata. So it actually wants two YAML documents, because it's treating the file as YAML. It's a strange subtlety that made my head explode briefly. And then I was like, oh, yeah, no, it's saying start a YAML document, start another YAML document. But it bothers me that you never end it. Why not put the three dots at the end? Well, you could and it wouldn't break anything, but it's basically considered wasted effort. Huh. So the spec says you don't have to end a document? It does. Yes. OK. Basically, if the file ends, so does the document. OK. All right. Interesting. So it's a little strange. But the three dashes you're going to see all over the place. So, next up, comments, because we can.
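To make the start and end markers concrete, here is a minimal sketch of a stream holding two YAML documents, in the spirit of the log-entry example mentioned earlier. The keys and values are invented for illustration.

```yaml
# Two documents in one stream. Three dashes start a document;
# three dots on a line by themselves end one.
---
level: info
message: first log entry
...
---
level: warning
message: second log entry
...
```

The closing three dots are optional: a new line of three dashes, or the end of the file, also ends the current document, which is why front matter is usually written with dashes only.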
[37:13]Now, a lot of programming languages give us choices for comments, right? You can have two slashes, you can have an octothorpe slash pound sign, whatever we call it, or you can have slash star star slash. YAML simplifies all of this completely. There is one type of comment, the good old shebang or octothorpe or pound sign or hash symbol or whatever we call it. It's not called a shebang.
[37:37]Shebang has the exclamation point. Yeah, you're, no, you're right. You're dead right. Just the hashtag. Octothorpe is correct. Octothorpe is correct. Hash symbol is correct. Pound sign is correct. Right. Okay. Mayada Woods. Maybe that's all the names it has. All of those names. The ordinal symbol I believe it's also called. Anyway, basically a hash symbol and then everything after the hash symbol is a comment. It can be at the start of the line. It can be near the end of the line. It's a comment in the style we're used to from Bash. I love it already. I never want to see JSON again.
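A short sketch of the two comment positions just described, using a made-up key and value:

```yaml
# A comment on a line of its own.
name: Example  # A comment after a value; the # needs whitespace before it.
```

A hash character with no whitespace in front of it, as in a value like `a#b`, is part of the value, not the start of a comment.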
[38:08]Just the comments. It's amazing. Um, indentation is not optional. YAML uses indentation to represent that things belong together. In the language of the spec, the scope. Are these things part of the same dictionary? Then they're indented the same amount. Okay. If you want to nest them, you just keep indenting more. And you might think, oh my God, that could get confusing. You know, spaces look like tabs or whatever, how can I tell if it's indented the same amount? The answer is tabs are not allowed. You will indent with spaces. Okay. Problem solved. Right? If it lines up and it's all spaces, because it has to be because you're not allowed to use tabs, then hey presto. So if you put a tab in, does it barf on you? The YAML parser will barf on you, yeah. And if you're using an IDE with YAML, it will just replace it with spaces. So if you're using VS Code and you hit the tab key, what you actually get is spaces. Okay, right, right, right. Which is fine.
[39:07]In terms then of our scalars, so our single value things, right? The simplest single value item is nothingness. The concept of null is a scalar value. It is the value to represent no value. And in YAML, you can write it in a few different ways. So you can write null with the letters lowercase n-u-double-l, the letters uppercase N, lowercase u-double-l, or all caps NULL.
[39:43]Or the character tilde by itself. If you want to have something in your little dictionary or whatever, so you have colon and then something, you can put tilde or absolutely nothing. If you just have name of key, colon, name of next key, colon on the next line, YAML will go, okay, that's null. So literally nothing, like absolute nothingness, is interpreted as null. Okay. Which is harder to say than to, you know, when you look at it, it's like, yeah, there's nothing there. Nothing means nothing. Okay. So the next simplest is the boolean, which has two possible values, true and false. So you can have true, capital True, all caps TRUE. They are the three valid values in YAML 1.2. In YAML 1.1 you could also have yes and on, but you can't anymore. Can't anymore? They are not, they have been deprecated. They are not in the spec anymore. Okay. And it's probably for the best. Doesn't that mean a bunch of old code is going to break? Well, the YAML parser will fall back, but you shouldn't write it in any new YAML. Okay.
[40:58]Again, false, the same variants of upper and lower case, and no and off, if you're going back to YAML 1.1.
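A minimal sketch of indentation as scope, with invented keys and values. The two-space step is a common convention rather than a requirement; what matters is that siblings line up and that only spaces are used.

```yaml
# Keys indented by the same number of spaces belong to the same mapping.
server:
  host: example.com   # host, port and tls are siblings inside server
  port: 8080
  tls:
    enabled: true     # one level deeper: enabled lives inside tls
```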
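The spellings of null and of the booleans listed above can be sketched like this. The key names are made up; the last two lines show the YAML 1.1 forms only as a caution, since how a given parser treats them depends on which version of the spec it follows.

```yaml
# Five ways to write null
null_lower: null
null_capital: Null
null_caps: NULL
null_tilde: ~
null_empty:

# The YAML 1.2 booleans
yes_lower: true
yes_capital: True
yes_caps: TRUE
no_lower: false
no_capital: False
no_caps: FALSE

# YAML 1.1 leftovers: a 1.1 parser reads these as booleans,
# while a strict 1.2 parser reads them as the strings "yes" and "off".
legacy_yes: yes
legacy_off: off
```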
[41:10]
YAML Versions
[41:07]When we get to numbers then, it starts off pretty straightforward. If you would like an integer, type yourself some digits, 1234 or minus 1234. Nothing weird there. Perhaps slightly weird, octal numbers, which a lot of languages need for various things, you start them with a zero character. That's not an O, that's a zero. So 042 is octal 42, which is 34.
[41:38]So in YAML, and it's not just YAML, it's the same in C and a bunch of other languages. And JavaScript, I think, may have this subtlety too, but I may not have mentioned it to you. 042 and 42 are not the same thing. 0x is how you do hexadecimal. So you're used to that, I think, from website HTML colors and stuff. You might see like 0x, whatever. So that's hexadecimal. So 0x4f is 79. Decimal numbers, nothing exciting there. 123.45, minus 123.45. But you can do exponentials if you're a science-y person. So you can say 123.e plus 5, which is 123 by 10 to the power of 5. Or you can even do e minus 5 and have it by 10 to the power of minus 5. But that's kind of you. Yeah. Cool. Right. And you have infinity, which is dot inf. And if you're a mathematician and you can explain it to me, you can also have negative infinity, which is minus dot inf. We can't use it if we can't explain it to Bart, though, is what he just told us. Well, you know, if you can explain it to yourself, you can use it in your code. But unless you can explain it to me, don't use it in code I have to deal with. And then our friend, not a number, the numeric value to represent the fact that you can't represent the value numerically, which we know in JavaScript as NaN. NaN is here as dot nan in all caps. That is it. They are our scalars.
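The number forms just listed can be sketched as below, with made-up key names. One caveat that the discussion does not cover: the leading-zero octal form is the YAML 1.1 spelling, and the YAML 1.2 core schema writes octal as 0o42, so what 042 means depends on the parser.

```yaml
integer: 1234
negative: -1234
octal_1_2: 0o42        # octal in the YAML 1.2 core schema: decimal 34
octal_1_1: 042         # octal 34 to a YAML 1.1 parser; may be read as decimal 42 elsewhere
hexadecimal: 0x4f      # decimal 79
decimal: 123.45
negative_decimal: -123.45
exponent: 1.23e+5      # 1.23 times 10 to the power of 5, i.e. 123000
small_exponent: 1.23e-5
infinity: .inf
negative_infinity: -.inf
not_a_number: .nan     # .NaN and .NAN are accepted spellings too
```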
[43:02]Then we come to strings. And I'm going to put a little pin in here and say, YAML absolutely supports multi-line strings. But it does so with a lot of nuance and power. Therefore, about half of next installment is multi-line strings. It's not difficult, but it's important. So I want to give it space. So for today, we're going to deal with strings that fit on one line. And when we do so, the first thing is that string quoting, oh, sorry, actually the very first thing, in YAML, every string is Unicode. That is part of the YAML spec. There is none of this Latin 9, none of this Latin 1. If it is in YAML, it is in Unicode. Therefore, accented characters, no problem. Emoji, no problem. If you can type it, it can go into your YAML file. They're all Unicode. Yay.
[43:59]
Unicode Strings in YAML
[43:59]And quoting is entirely optional. So if not putting quotes around it would not cause any confusion, then you don't need to put quotes around it. So if you're writing a dictionary where the colon means this is the separator between the key and the value, and your string has colons, well, then that is ambiguous in that context. So then you would have to quote it so as not to be ambiguous. But if your string is, you know, hello world with a space and an exclamation point, no problem. You don't have to quote it. A lot of YAML written by code will quote it so that the code doesn't have to worry about checking for ambiguity. The code would just always quote it because that's easier to write in code. But when you're writing as a human and you're not using symbols, type away. Don't worry about it. You'll be fine. Just typey, typey, typey. It'll all be good.
[44:53]When you do need to quote, you have the choice of single and double quotes. In JavaScript, that choice was aesthetic, right? You basically picked whichever ones you liked the look of. In YAML, they mean different things. A single quote is like a really strict quote. There is no escaping in a single quote. If you type single quote hello backslash n world, that literally means H-E-L-L-O backslash N W-O-R-L-D. Okay. It is not the new line character, it is backslash N. Double quotes, you have to escape every special character, and backslash N is the new line character.
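A minimal sketch of the quoting rules above, with invented keys. The one detail added here is that a literal single quote inside a single-quoted string is written by doubling it, which is the only special sequence that style recognizes.

```yaml
unquoted: Hello World!         # no ambiguity, so no quotes needed
needs_quotes: "time: 12:30"    # contains colon-space, so it has to be quoted
single: 'hello\nworld'         # backslash and n stay as two literal characters
double: "hello\nworld"         # \n is an escape: hello, a newline, then world
apostrophe: 'it''s'            # doubled quote gives the string it's
```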
[45:36]
Quoting in YAML
[45:37]So if you need to add a new line, you would do it in double quotes? Double, yeah. Okay. All right, that makes sense. Whether I'll keep track of it or not, that's a whole different opera, but...
[45:51]Next up, we have our friends, the collection. So that was our single value. So now we have our collection. So let us start with sequences. A sequence is basically a markdown bulleted list. So you define an array in what looks for all the world like a markdown bulleted list. Minus space first value, next line, minus space second value, next line, minus space third value, minus space fourth value. Right, we run that through our yq thingy and we get out the array true 42 blah blah blah, right. It is, it is that simple, it just looks like a bulleted list.
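The bulleted-list syntax for a sequence can be sketched with the four values mentioned later in the discussion. In JSON terms this is the array `[true, 42, 3, "Hello World!"]`.

```yaml
# A sequence of four scalars: a boolean, two integers and a string.
- true
- 42
- 3
- Hello World!
```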
[46:28]Mappings or dictionaries just look like key value pairs, name colon value, next line, name, colon, value. Next line, name, colon, value. That's it, right? They are the core syntaxes for our two structures. They sure look simple. And to nest them, they are. Nest them? To nest them, you just indent. So now we have the actual complete config file for this website you are reading right now. This is it. I haven't in any way doctored it. I just went to control A, control V, command A, actually. I was on a Mac. Um, I just copied and pasted it straight in and this is it all. So I'm going to draw your attention to a few little bits and bobs in this big file. So the first thing is the entire config file is what we would call a dictionary when we were using JSON, right? It's, uh, it is, the whole thing is, in YAML speak, a mapping, name value pairs. First name, remote_theme, the value, the path to my theme. Plugins then is the next key in our key value pairs. Its value is an array. So we have two spaces to say whatever this is, I am inside plugins.
[47:47]Minus, ah, I'm an array. And then it's an array of one value. So there is one plugin installed on this website, which is the plugin to allow Jekyll to use a theme that is not on this website. So jekyll-remote-theme is how Jekyll is loading from a different Git repo, which is in the Bartificer organization. Hang on, hang on. That doesn't matter. Hang on, we've got a mapping. Plugins colon, and I would expect there to be a value to go with that key value pair, but instead of there being a value there's an array, which in this case is called a, hang on, sequence. I was trying to remember it myself. Okay, starting over. I gotta be able to say it, get it in my head once. Okay, so it says plugins colon, so that's going to be a mapping, and I've got a key. The value in this case has a, I've got a minus there, so that's an array, which we're calling a sequence.
[48:42]
Nested Arrays and Dictionaries in YAML
[48:42]But there's no... I'm not used to seeing... Yeah, I guess that makes sense. I guess that makes sense. It just happens to be weird because it's only got one value in the array. That's why it looks funny. Yeah. You're right. It is a bit weird because there's only one plugin. If I was more plugin happy, that would be more obvious because the next line would be indented the same to say I'm still part of plugins. Minus sign, next value. Okay. So the indentation says we're part of plugins. What else do I want to draw your attention to? So we have dictionaries deep down here, right? Or mappings, right? So if you look at the snippet down near the bottom, this is community, right? Community is a key and its value is another mapping. And that second mapping has one, two, three, four keys, URL, description, icon, and labels. And labels is another mapping, which has one key, button. So that's three deep. So if you look at that in JSON format, you have open curly, community, open curly, URL, description, icon, labels, open curly, button, close curly, close curly, close curly. Okay.
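The shape of the config file being described can be sketched as follows. The key names follow the discussion (remote_theme, plugins, community, url, description, icon, labels, button) and the plugin name is the one mentioned above, but every other value is a placeholder, not the real content of the site's config.

```yaml
remote_theme: example-org/example-theme   # placeholder path to a theme
plugins:                                  # key whose value is a sequence
  - jekyll-remote-theme                   # a sequence with a single item
community:                                # key whose value is a mapping
  url: https://example.com/community      # placeholder
  description: Placeholder description
  icon: placeholder-icon
  labels:                                 # a mapping nested three deep
    button: Placeholder label
```

Each extra level of indentation plays the role that another pair of curly brackets would play in JSON.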
[49:51]But instead of it being wrapped with curlies, just space it in. Indent, indent, indent. Right. Now, the one that confused me was you've got a section that says nav underscore items. And then it starts to scare me because it's all lined up nice and pretty, but I don't, I don't, I can't intuit it looking at it, what it means. Okay. So nav items is obviously the name and then its value is going to be whatever is indented under it. So when we look under it, what's indented under it? Minus sign, space, and then more key value pairs. So the value of nav items is an array of mappings. Okay, so it says minus URL colon, it gives a URL. Then below that, lined up with the word URL without a minus is icon. So is that an array with three values in it? URL, icon, and text, those three key value pairs?
[50:47]So those three key-value pairs together are a mapping. So the array, it's an array of mappings. It's an array of dictionaries in JSON speak. Didn't get a yes or no out of that, so I don't know. I just described it. So it says minus URL colon, icon colon, text colon. So those are three mappings that are in an array. So nav items has an array, well, actually it has two arrays in it, but one of the arrays is this URL, icon, text set of mappings. No? Okay, you're saying that slightly strangely. It doesn't have two arrays. It has an array with two elements. Yes.
[51:27]NavItems, the value of NavItems is an array with two elements. The first element starts at the first minus. The second element starts at the second minus. Oh, yeah. No, I didn't say it weirdly. I said it wrong. I thought those were two separate arrays. Those are two values. Oof. Yeah, this one doesn't stick in. I can't see it. Okay, so scroll down to the JSON. So we're starting a curly that says NavItems colon open square brackets. Start an array. The first element in the array is open curly bracket, start a dictionary. URL icon text, close the first dictionary, comma, second element in the array, open a fresh curly, URL icon text, close the curly. So it's an array with two dictionaries, all of which are the value for nav items. Okay, so I've lost back at the beginning of the episode where you explained, does the minus mean here comes an array? Or does minus mean here comes a dictionary? Item. An array item. Minus means here comes an array item, right? So the syntax is like a bulleted list, so it's a bullet. So a bullet isn't a new bulleted list, it's an item on the bulleted list. So minus true, minus 42, minus three, minus hello world, that's an array with four things.
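As a sketch of the structure that caused the confusion, here is a sequence of two mappings under one key. The key names follow the discussion; the values are placeholders.

```yaml
nav_items:            # the value of nav_items is one sequence
  - url: /first       # first minus: first item, a mapping with three keys
    icon: placeholder-one
    text: First
  - url: /second      # second minus: second item, another mapping
    icon: placeholder-two
    text: Second
```

Each minus sign starts one item of the same sequence, and the keys lined up beneath it without a minus belong to that item's mapping. In JSON terms this is `{"nav_items": [{...}, {...}]}`, an array holding two dictionaries.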
[52:47]I think what's messing me up is normally if you're going to do an array, you put a square bracket and then you close the square bracket on the other end. You don't have two square brackets starting the two elements in one array. But that's just different. It's just different. It'll take me a little bit to get used to it.
[53:05]Yeah, I think thinking of it as a bulleted list is kind of easy, right? So a bulleted list, every item in a bulleted list starts with a bullet. But the one thing, right? If I give you a bulleted list, that is the syntax. An array is a bulleted list. Okay. Okay. What else do I want to draw your attention to?
[53:30]
Teaser for Next Episode
[53:31]Top level, we've done that. Now, I'm going to put a pin in something I'm going to solve for you next time. So an array of arrays in YAML is clunky, if you do it this way. So if we want to have an array that contains two elements. A number, and then the second element is a whole other array that contains two numbers. So if we were to put that in JSON, it's square bracket, one comma, square bracket, two comma three, close square bracket, close square bracket. Nice and simple in JSON. In plain old YAML, that is minus space one. Start an array with one item in it. The second item in the array is a whole other array. Minus. Start a new line, space space minus two, space, space, minus three. That's not nice. It's logical, it follows the spec, but it's not nice. So there must be a better way. And that's your teaser for next time. Ah, come on. I actually, I don't hate that as much as I hate the earlier example, but it'll be interesting to see what you're going to do to us next. This is fun. Yeah.
[54:49]Excellent. Right. Well, next time we're going to do our final pieces of YAML syntax and we're going to play with yq and then we're finished with YAML. Two. Two. I don't know. Who's got a dollar bet he'd learn something else new in the meantime and he just changes the number for the other one? I'll call it an A and a B. Yeah, he will. He's tricked us before. We're not new here. All right, Bart. I won't promise it won't happen, but I don't think it will. Anyway, right, folks, until next time, lots and lots of happy computing. If you learn as much from Bart each week as I do, I'd like you to go over to lets-talk.ie and press one of the buttons over there to help support him. He does 98% of the work here. I'm just the stooge that listens to him and asks the dumb questions. If you go over to lets-talk.ie, you can support him on Patreon. You can donate via PayPal or you can use one of his referral links. I really hope you'll go over and help him out. In the meantime, you can contact me at Podfeet or check out all of the shows we do over there over at podfeet.com. Thanks for listening.
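The clunky array of arrays described above, written out exactly as spoken. It is the same data as the JSON `[1, [2, 3]]`.

```yaml
- 1       # first item of the outer sequence: the number 1
-         # second item of the outer sequence: a whole other sequence
  - 2
  - 3
```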
[56:00]Music.
