Full Stack Journey 079: Infrastructure Management With GitOps & Flux With Frank Wiles

This Full Stack Journey podcast episode features host Scott Lowe and guest Frank Wiles of REVSYS discussing GitOps and Flux. They explain how GitOps involves making changes to infrastructure by committing code to a Git repository, which is then automatically implemented. They also discuss their preferences for programming languages and operating systems, with Python and Linux being their top choices. The speakers delve into the role of Flux in the GitOps process, explaining how it automates the process of making changes to Kubernetes state based on Git repository instructions. They also provide practical tips for using Flux and GitOps effectively.

Links

Frank Wiles on LinkedIn

Flux Project

REVSYS

Transcription

Transcriptions are provided via an automated service, so they aren’t perfect. You’ve been warned.

Scott Lowe (00:00:01) – Good morning, good afternoon, good evening, everyone. Welcome back to the Full Stack Journey podcast, where we talk about the ongoing evolution of the IT professional and the technologies and trends and techniques that are influencing our careers as IT pros. Thank you so much for listening. I’m your host, Scott Lowe, and my goal today, as usual, is to help equip and prepare you, my listeners, for your journey of learning across the full stack of technologies that we find ourselves working with in our data centers and cloud environments. So, joining me today, I have Frank Wiles. Frank is gonna be talking with me about, um, GitOps and Flux. So Frank, good morning, and thanks for joining.

Frank Wiles (00:00:41) – Morning. Thanks for having me.

Scott Lowe (00:00:43) – Uh, why don’t you take a moment and introduce yourself to the audience?

Frank Wiles (00:00:49) – Sure. So I’m an extremely full stack developer. I, uh, started off my career in ops and then jumped back and forth between ops and development as I got bored with one role or the other. So I’ve been, uh, deeply involved in open source software since about 1996. And yeah, so we quickly moved all of our customers. Uh, so I own a, uh, an agency, uh, named REVSYS that specializes in Python and Django, uh, web applications, uh, typically at a, at a large scale and with a, with a fairly large performance need. And so we quickly started moving our customers to Kubernetes, maybe even a little earlier than we should have, based on how, uh, how well it was working at the time. But, uh, we recently, in the last, uh, year or so, have moved over almost entirely to GitOps and Flux and, and have seen a lot of great benefits from it.

Scott Lowe (00:01:45) – Awesome. Cool. Well, I’m excited for the show. Um, we’ve had guests on to talk about related projects, uh, that are similar to Flux, but this is our first time actually talking about Flux. Um, and so I’m, I’m looking forward to, uh, our discussion. Uh, before we get started, are there any sort of online, you know, handles you wanna share with listeners in case they’re interested in connecting with you?

Frank Wiles (00:02:10) – Sure. Uh, yeah. So, so the company’s website is revsys.com. That’s r e v s y s.com. Uh, my personal website is frankwiles.com, so frank w i l e s.com. And then I’m, I’m fwiles on Twitter.

Scott Lowe (00:02:25) – Awesome. Thank you, sir. All right. So, um, before we actually jump into our content, uh, I wanna play a little game that I like to do with guests and it’s, it’s like, uh, getting to know the guest. And we do that by, um, just throwing out some, you know, very simple little questions about, you know, like, what are your preferences, that kind of thing. Um, there are no right or wrong answers here, obviously, we’re just, you know, it’s just a little way to kind of get a feel for the guests, right. Um, and we don’t, we don’t do anything too terribly controversial unless you feel like the Emacs versus, uh, vi, uh, discussion is

Frank Wiles (00:02:59) – So, oh, so we’re gonna get into religion, huh?

Scott Lowe (00:03:02) – , uh, well, yes. Some people might feel like it’s religion , that, that’s, that, that’s a fair comment. Um, alright, so, so, um, well, I would say, what is your favorite programming language? But I’m gonna guess that’s Python.

Frank Wiles (00:03:13) – Yes, Python would definitely be my favorite programming language. Yes. Fair. But also we do, we do a lot of React and, and, uh, I’ve started learning Rust recently and I like it quite a bit as well.

Scott Lowe (00:03:24) – Okay. All right. We might have to bring you back to talk a little bit about Rust. I hear a lot of people talking about it, and I am a budding programmer myself. I spend most of my time with TypeScript and Golang, so, all right. Uh, Linux, Mac, or Windows?

Frank Wiles (00:03:38) – Uh, if I had to pick, Linux. Um, I was Linux on the desktop for about 20 years, and just recently in the last, I don’t know, five or six years have, have moved to Mac because I kind of get the best of both worlds, right. I have a Unix development environment, but I have, you know, I don’t have to mess with how-do-I-get-this-video-to-play kinds of Linux issues and sound cards. Everything just kind of works and

Scott Lowe (00:04:02) – Absolutely. I guess I should, I should change that question to be like, what’s your preferred desktop and what’s your preferred, like, development slash deployment environment, right? Because I find a lot of folks that we talk to are like, yeah, I use Mac on the desktop, but you know, I deploy everything onto Linux, right? Which is myself. Yeah. Like, I will use a Mac, uh, just for the same reasons you’re talking about, because it works. And to be honest, with the new Apple silicon chips, like, battery life is insane. Um, yeah. But when I deploy, uh, it’s, it’s always on Linux.

Frank Wiles (00:04:31) – Same. Same. Yeah. I can’t remember the last time I deployed something that wasn’t Linux.

Scott Lowe (00:04:36) – I had somebody ask me a question about Windows the other day, and I was like, I haven’t touched Windows in like two decades, so I really can’t help you. I’m sorry.

Frank Wiles (00:04:43) – My, my, my last Windows, uh, system was 3.11, so yeah. Okay. 90, 93. Yep.

Scott Lowe (00:04:51) – Yep, yep. Okay. All right. I have used it more recently than that, but, uh, it’s been a while, so anyway. Okay. So there, there you go, listeners. All about, uh, Frank. Not really, but you know, it’s fun. Okay. So we’re talking about Flux and we’re talking about GitOps. Um, so let’s, let’s break those two things apart. And first let’s talk about GitOps, and it’s a term that a lot of people have heard. Listeners probably have some idea of what it means, but I’d love to have you sort of provide your definition for that. And then we’ll go into talking a little bit about Flux, and then from there we can kind of dig into it.

Frank Wiles (00:05:28) – So I see, and this may not be the dictionary definition, I see GitOps as effecting the change in your infrastructure by making a commit, pushing to Git, and having something do the work for you. The, the benefit there is that you have the revision history of who did what, when, or at least who intended to do what, when. It may not have worked, but you can see and learn, um, from your colleagues. You also can have easy PR-type approval processes of, this is what I think we should do, and your teammates can kind of double check you and make sure you’re not doing something crazy. Um, and that, that is, um, that is how we see GitOps, is that it’s, we’re defining what we want to have happen, um, and then there is something out there that is listening to that repository and attempting to make it happen.

Scott Lowe (00:06:28) – Gotcha. Okay. That makes sense. And we, uh, you know, for listeners, I’ll put a link in the show notes. We have talked about GitOps, um, at a high level before when we talked about, um, uh, we talked about Argo and, uh, we’ve also talked about GitOps, um, in other contexts as well. But it’s, it’s always great to get sort of a practitioner’s viewpoint, right? You’re, you’re running, uh, you know, uh, a firm, you’re providing services to the clients, you’re moving clients over, so you’re, uh, you know, actually, you know, not working for a project that’s providing this or anything, but actually putting this to work, you know, where the rubber meets the road, you know, in, in the real life trenches. And it’s, uh, I always find it’s useful to get that context from someone who’s actually using the technology. So we’re really talking about Git being a source of truth, um, for configuration information, deployment information, whatever, and then having some sort of mechanism that works off that source of truth to then make changes, right? And

Frank Wiles (00:07:26) – Absolutely.

Scott Lowe (00:07:28) – Go ahead.

Frank Wiles (00:07:28) – I think, I think one of the key pieces that, uh, ops people don’t necessarily take to heart immediately is the, uh, essentially the, the safety of it, right? You can have a protected master branch and only your two senior ops people can merge to it, but you can have your developers make changes to the YAML with your approval, and that reduces a lot of coordination. You don’t have the, well, what version of this do we need to deploy? The developer made a pull request that changes the version from version 18 to version 21, and you don’t necessarily need to know why they’re skipping three versions, right? You just need to say, okay, well, that seems like a valid version, and merge, and it happens.
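
Editor’s sketch (not from the episode): the "that seems like a valid version" check Frank describes could be written down roughly as below. The function name `review_promotion`, the integer version numbers, and the idea of comparing against a set of already built versions are all assumptions made for illustration; the episode only describes a human reviewer eyeballing the pull request.

```python
def review_promotion(current, proposed, built):
    """Return reasons to question a version bump; an empty list means it looks sane.

    `current` and `proposed` are integer version numbers (hypothetical scheme),
    and `built` is the set of versions known to exist as images.
    """
    problems = []
    if proposed not in built:
        # A version nobody built cannot be deployed, so this is a hard stop.
        problems.append(f"version {proposed} was never built")
    if proposed < current:
        # Rollbacks are legitimate, so this only asks for a second look.
        problems.append(f"version {proposed} is older than {current}")
    return problems


# The 18 -> 21 bump from the conversation: skipping versions is fine.
print(review_promotion(18, 21, {18, 19, 20, 21}))
```

Skipping versions passes without comment, which matches the point in the conversation that the reviewer does not need to know why three versions were skipped.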

Scott Lowe (00:08:18) – Yeah, that’s a, that’s a great point. I, there’s, there’s so much value in my mind, and I’m, I don’t come from a, a super in-depth or rigorous software development background, but the more that I observe the way that a well-run software engineering organization operates, the more that I see there’s just so much value there for operations and, you know, quote unquote infrastructure folks, right? Um, and I’ve been talking about, you know, getting folks to like, look, just start using Git, start using version control, right? Just even if you don’t do anything else, at least then you’ve got this, this is the, this is the approved version of whatever it is that I’ve stored in here, right? And then from there, you could go into all kinds of other things like infrastructure as code, and GitOps, which we’re talking about now. And then of course, Kubernetes is, you know, a whole nother sort of thing. But so much of that as well, being driven by YAML and, and definitions as code, because it is a declarative system. So, so much, so much to, to be able to pull outta that. So with that context in mind, then, where does Flux fit into that definition?

Frank Wiles (00:09:26) – So, Flux is the thing that is attempting to do what you’ve told it to do, and it runs inside your cluster. So in, um, in early days of using Kubernetes, what we would do would be, as the last step of the CI pipeline, everything’s passed, we’ve built Docker images, everything’s happy. The very last step typically was either directly applying a Kubernetes manifest to the cluster, or doing like a helm upgrade to upgrade a Helm chart in, in the, in the cluster. But that means that our CI system has to have admin-level cluster credentials, which makes our CI system a point of attack, a pretty serious point of attack. The nice thing about Flux is it removes that security concern because Flux runs in your cluster and just has credentials to be able to read and write from this Git repository. And that’s the only point of, of synchronization there.

Frank Wiles (00:10:31) – So, worst case, if somebody breaks in, gets into your GitHub organization, the worst they can do is install things into your cluster, you know, in, in a GitOps way with, with a history tracking of what it is they did. Or if they manage to penetrate into your cluster, the worst they can do is mess with that pile of YAML you have in your GitOps repository. Uh, and that’s, that’s the only, that’s the only point of, of communication there. Um, so Flux is a small set of microservices that work in coordination, depending on which features you want to turn on, which features you want to use, that listen for YAML instructions that look, you know, that are Kubernetes, uh, custom resource definitions, CRDs. They’re listening for those, and then taking action based on what you’ve said: install this Helm chart, install this manifest. And it’s watching for changes. Um, the easiest way to set it up is to have it poll that repository like every two minutes or something like that. But that’s a little slow, uh, for me. It has the option to set up a webhook from GitHub straight to Flux so that it pulls a new version right as changes happen. And that kind of smooths out kind of the, the timing of things. Um, do we maybe want to go into like what the various, uh, pieces of Flux are or should, uh, do you think that’s enough of a, an overview?

Scott Lowe (00:12:04) – I, I think that’s a good overview, but I, it probably would be helpful, you know, for folks that, um, may already be familiar with Kubernetes, you know, Kubernetes already has this idea of a reconciliation loop where we’re taking, you know, desired state, and we’re comparing that against, um, actual state and then, and then having that loop reconcile things. And so you have all of these, all of these controllers that are responsible for doing that. Um, you know, it, it sounds to me, and correct me if I’m wrong, you know, it sounds like Flux is yet another controller because there is a custom resource definition that’s backing it. Um, uh, or multiple custom resource definitions that are backing it to define the custom resources, and then it’s gonna take action on the lifecycle of those custom resources. But the, I guess the, the interesting piece here, those custom resources are the, are the things that are providing that connectivity to, uh, a Git repository, GitHub or GitLab or something like that, and understanding, you know, the idea of receiving an incoming webhook because you got notified of, uh, you know, a PR being merged, something like that.

Scott Lowe (00:13:05) – Is that, you know, kind of a reasonable description?

Frank Wiles (00:13:09) – Yeah, I mean, the way I like to think about it is, Flux is the, the operator that is making changes to Kubernetes state that an operator would normally be doing manually with kubectl or Helm or Kustomize. It’s doing those things for you based on the instructions you’ve given it in the Git repository. So it is a, a second reconciliation loop, if you will. It’s reconciling itself with the Git repository, and then if changes need to happen, it’s applying those to Kubernetes, which is then doing its reconciliation loop.
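
Editor’s sketch (not from the episode): the reconciliation idea described above, comparing what the Git repository says against what the cluster has and applying the difference, can be illustrated with a toy loop. Everything here is hypothetical: the `Cluster` class, the `reconcile` function, and the name-to-version dictionaries stand in for real manifests and the Kubernetes API, and removing resources that are missing from the repository is an assumption about how one might configure such a tool.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a cluster: maps a resource name to its desired version."""
    state: dict = field(default_factory=dict)


def reconcile(repo_state, cluster):
    """Make the cluster's desired state match the repository; report the actions."""
    actions = []
    # Apply anything that is new or different in the repository.
    for name, version in sorted(repo_state.items()):
        if cluster.state.get(name) != version:
            cluster.state[name] = version
            actions.append(f"apply {name}={version}")
    # Assumption: resources no longer in the repository are removed.
    for name in sorted(set(cluster.state) - set(repo_state)):
        del cluster.state[name]
        actions.append(f"delete {name}")
    return actions


cluster = Cluster({"web": "18", "old-job": "1"})
print(reconcile({"web": "21", "worker": "3"}, cluster))
print(reconcile({"web": "21", "worker": "3"}, cluster))  # nothing left to do
```

A second call with the same repository state produces no actions, which is the property that makes running the loop on a timer or on every webhook safe. What Kubernetes then does with the new desired state is the other loop the speakers mention and is not modeled here.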

Scott Lowe (00:13:44) – Yeah, that’s a, that’s, that’s a great explanation. Um, I, I was, I was kind of trying to stretch to describe how those two merge, but it makes sense that this is a, you know, sort of a first reconciliation loop of reconciling that, that state between the, the source of truth, the Git repository, and the cluster or clusters, and then triggering changes, which then trigger the second reconciliation loop, which of course may be all kinds of loops from all kinds of controllers changing all kinds of things. So, okay, cool. Well,

Frank Wiles (00:14:16) – Yeah, for example, like if in Flux you tell it, install this Helm chart, it sees that you want to do that and it’ll issue a helm install. But if the chart’s broken in some way, Kubernetes is going to be failing on the fact that it can’t find the pod, or it can’t find the image, or it, you don’t have permission to, to, to install in that namespace or, or whatever may be happening there. But Flux’s reconciliation is done. It has done the helm install, but you’re, you’re on your own from there, right?

Scott Lowe (00:14:44) – Yeah, yeah, yeah, sure. That makes perfect sense. It’s, it’s, again, it’s a matter of like recognizing which loop we’re talking about here, right? If you’re talking about pulling the state from the Git repository into Kubernetes as the new Kubernetes desired state, that’s Flux’s responsibility, right? So it’s got ownership over that connection and understanding that connection and pulling data across, all that kinda stuff. But once it hands that off in place of the operator using a command line utility or an API call or whatever, then it’s all Kubernetes and standard, standard Kubernetes functionality that you need to, uh, you know, observe or troubleshoot in the event that something, um, something goes wrong.

Frank Wiles (00:15:24) – Yes.

Scott Lowe (00:15:25) – In terms of the components of Flux, like, uh, you know, what kind of pieces are we talking about here? And

Frank Wiles (00:15:30) – Yeah, so the, the, the main most important piece, that you really don’t have Flux unless you have this, is called the source controller. And that is what watches Git repositories, uh, for, for state. There is typically the, the, what we tend to refer to as the Flux repo, which is your GitOps repo, but it also can watch other repositories for things like, for example, we keep most of our Helm charts in a Git repository, um, oftentimes in the same repository as the code itself, just in a directory. So we end up defining several Git repositories, one of which is the Flux repo that it is watching for changes on, but it is also watching those other repositories for, for example, changes to the Helm charts. So the source controller is, is dealing with all of that, and that includes things like, um, setting up read/write deploy tokens so that it can access those repos.

Frank Wiles (00:16:32) – Um, it can obviously do it in a read-only fashion if, if, if you want, uh, for, for most operations. Then there is what’s called the notification controller. And this is what takes in webhooks. It also kind of coordinates between the other systems, uh, the other, uh, components a bit. It also can do things like, um, send a webhook or a Slack message or Teams, and they, they support a dozen or more different sorts, Telegram and things like that, that can let you know Flux sees a new version of this, Flux is doing this. You can get those kinds of automated updates in, in, in those kinds of ways you’re expecting. Then there is the Kustomize controller. So for, uh, users, Kustomize is a kind of mind-bending way of, uh, patching YAML, uh, that you can use in place of systems like Helm for templating the mountain of YAML that you’d end up generating in a system like Kubernetes.

Frank Wiles (00:17:37) – Um, so Helm is another kind of packaging system like that. Helm’s main purpose really is to just take YAML templates and apply values into them. So you have kind of what you would expect in a template. You can loop over things and do variable replacements, but, um, it’s, it uses the Go template language and, and just builds big piles of YAML for you and then applies them to the Kubernetes API. Um, Kustomize is similar, except it is not a templating system. It is kind of an overlay and replacement and patching type system that is a little harder for new users to kind of get their, their head around. Um, but it also is supported by Flux. And typically in most of our clusters, we have a combination of just bare YAML Kubernetes manifests, Kustomize, and Helm charts all in the mix at the same time.

Frank Wiles (00:18:38) – Uh, it kind of depends on the client, depends on the project, but, um, those are the pieces. Then you have what’s called, um, the, uh, I’m gonna get the name right here. Um, the image, there’s an image automation controller and the image reflector controller. So the image reflector controller watches Docker repositories for images that have new tags, or tags in general. Um, and then the image automation controller receives notifications from the reflector controller and maybe takes some action because of there being a new Docker image. So, to give you a real world use case on this, we typically set things up such that in our dev environments, when the new Docker image hits the repository and is seen, it is automatically deployed into the dev environment. Um, now staging and production and other environments typically are pinned to a particular version based on the needs of the project, the wills of the developers.

Frank Wiles (00:19:51) – But in your development environment, you typically want whatever’s latest. And so this is, this is achieved by having the reflector controller watch a, a Docker registry and the automation controller knowing that you want all new versions to, to be deployed in this particular namespace. One of the nice kind of benefits of that is, for your app to be deployed in your Kubernetes cluster, the, the Kubernetes cluster has to already have credentials to access that Docker registry, that private Docker registry. So Flux isn’t getting any additional capabilities here. It’s the same capabilities as the rest of the cluster. It’s just watching on a schedule to see if there is, if, if there’s new images. So it, it doesn’t like add any, like, security vulnerability aspects to it. Um, and then, like I said, you can, it’s not forced on you. You have to set that up very specifically, and you can, can set it up using regular expressions. Like, I want to deploy any new image that starts with the word Bob. Uh, I want to deploy, uh, only patch versions in a semver-type context, uh, or I just always want to deploy the latest image, whatever that is.
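
Editor’s sketch (not from the episode): the three kinds of image selection rules mentioned above (a regular expression match, patch-only updates in a semver-style scheme, and "always the latest") can be illustrated in plain Python. This is not Flux’s configuration format or API; the function names and the sample tags are hypothetical, and "latest" is interpreted here as the highest semantic version, which is only one possible reading.

```python
import re


def parse_semver(tag):
    """Return (major, minor, patch) for tags like '1.4.2', otherwise None."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", tag)
    return tuple(int(part) for part in match.groups()) if match else None


def select_by_pattern(tags, pattern):
    """Regex rule: pick among tags matching at the start, by plain string order."""
    matching = [tag for tag in tags if re.match(pattern, tag)]
    return max(matching) if matching else None


def select_patch_update(tags, current):
    """Patch-only rule: highest tag sharing major.minor with `current`."""
    base = parse_semver(current)
    if base is None:
        return current  # not a semver tag, so leave it alone
    candidates = [(base, current)]  # never move backwards from the current tag
    for tag in tags:
        version = parse_semver(tag)
        if version is not None and version[:2] == base[:2]:
            candidates.append((version, tag))
    return max(candidates)[1]


def select_latest(tags):
    """Latest rule (one interpretation): highest semantic version, or None."""
    versions = [(parse_semver(tag), tag) for tag in tags if parse_semver(tag)]
    return max(versions)[1] if versions else None


tags = ["1.4.1", "1.4.10", "1.5.0", "2.0.0", "bob-1", "bob-2"]
print(select_by_pattern(tags, r"bob"))
print(select_patch_update(tags, "1.4.0"))
print(select_latest(tags))
```

Comparing parsed tuples rather than raw strings matters for the patch rule: as strings, "1.4.10" sorts before "1.4.2", while as numbers it is the newer release.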

Scott Lowe (00:21:06) – I, I have to admit that I had not considered the idea of having a controller like Flux watch, uh, a container repository. Um,

Frank Wiles (00:21:17) – It’s, it’s great because like most of the time I set things up and the developers, I never end up needing to talk to again, right? Until there’s like a real problem, not a daily, hey, could you deploy 182 to the dev site? That kind of stuff. And again, then we typically set it up so that there is a specific YAML file for staging, and they decide, okay, the version that’s in dev right now is good to go to staging. So they just copy the version number over to the staging file, make a pull request. I look it over, basically for, does it seem logical what version they’re putting in, and is it valid YAML? Yes. Merge, and then it goes into staging.

Scott Lowe (00:21:57) – Yeah, that’s a, that’s a really nice, a really nice workflow to help automate that, that process of saying, okay, as a developer, I wanna, I wanna deploy the latest build of this container. I just made some changes. I wanna just go ahead and, and deploy this. Right? So Flux picks that up, redeploys it in, in dev, right? Um, and then they can look at it and they can manually decide through PRs what that promotion process looks like as they move from dev into staging. And then they can have, you know, obviously some user acceptance testing and validation and performance testing and regression, et cetera, et cetera. And then they’re like, oh, yeah, okay, now we’re ready to move to prod. And then it’s another PR, and along the way GitHub, you know, Flux is watching and it says, oh, I see this PR merged.

Scott Lowe (00:22:36) – Okay, now let me go take action. Um, so that’s, that’s, uh, that’s really cool. I like that. Um, so I guess the question I have is, this sounds awesome and I totally get sort of how everything works, but as a user who maybe their organization perhaps is not as mature with cloud native technologies, are there things that, like what’s the most effective audience? Like, if you had to describe, you know, in your experience in working with this and working with your customers, you know, this is the kind of, of user who will be successful implementing this workflow based on organizational tendencies and processes and that kind of thing. And this is a user, maybe they need to, you know, make some changes, mature a little bit in their use of the technology before they’re ready to try something like this. Because, you know, so often we talk about automation, and really GitOps is a form of automation, but sometimes if we’re not careful, we end up automating bad things, you know? Like, you have to have good structure in place in order to then turn around and automate it. Right. And so in your experience, what, what helps, what sort of traits do you see in organizations and users that are successful with Flux?

Frank Wiles (00:23:57) – Well, I think that, uh, you, you need a, a good solid understanding of Kubernetes before you try this or something like Argo, because it is, it’s another level of abstraction on top of it, right? And so I think that if it’s your third day with Kubernetes, don’t, don’t be attempting this yet. If it’s your sixth month with Kubernetes and it’s been going well, and you seem to understand everything, this is worth giving a try, right? And if you’ve been using Kubernetes for a long time, I kind of question why you haven’t tried this or Argo, or something similar. Um, because it does, it removes those distractions from your day as an ops person. There’s not a lot of value in you editing that YAML file to the new version twice a day as developers put out builds, right? It’s, it’s a, it’s a distraction, right?

Frank Wiles (00:24:47) – It’s, you’re holding them up, you are getting out of your flow and your zone and whatever it is you’re working on, because this needs to happen real quick before the two o’clock meeting, right? Um, and, and this gives them that power and lets you focus on higher value stuff. Um, I think that the only way you would get into trouble with this would be if you let your developers have complete access to everything to where they could take down production, for example, right? By deploying a bad version or making a, a change that doesn’t make any sense, right? Or, you know, in a lot of cases in your Helm charts, you’ll have things like, you know, do-we-deploy-the-cron-jobs-or-not sorts of flags. And they might toggle that and push it and turn off the cron jobs in production. And if you are not a part of that approval process, you won’t notice that they’ve made that mistake, right?

Frank Wiles (00:25:42) – Um, so, so I think that yeah, you, you need to, you know, at least keep the reins around production, but I think that any organization that has a decent level of testing, preferably automated, but even if it’s through a QA system, I don’t think there’s any harm in letting the developers drive what is running in staging, right? As long as everyone knows the developers are driving what’s running in staging, if it’s broken, it’s a developer problem, right? And put, put it back to the version that worked yesterday so that QA can continue their work, right, until you sort this out. And they can do all that without your involvement if you want, right? Um, it does also encourage you to be able to have multiples of those kinds of environments. So I don’t, you know, it used to be a, a huge pain for ops to have multiple environments and, you know, a multi-day setup sort of process. And now it’s like, you know what, um, every now and again, broken staging gets in the way of QA’s job. So let’s have a QA environment where QA controls the version of what’s deployed there, and then all you have to do is make sure that the person who’s putting in that PR is part of the QA team before you approve it.

Scott Lowe (00:27:01) – Yeah. And it seems like if you were to couple that with some, some good infrastructure as code capabilities, that it would be super easy, you know, to, to create those sort of environments. Um, my wheels are turning now in terms of like, uh, you know, how we’d automatically hook up Flux to a dynamically provisioned environment, but that’s a, that’s a different thing anyway. Um, okay, so, uh, a user out there is listening to the show and they’re like, you know, Frank’s right. I do need to focus on higher level value, right? I, I should just automate this. How do they get started?

Frank Wiles (00:27:38) – So, um, luckily Flux’s, uh, there’s Flux v1, which you may find the occasional blog post about. Um, Flux v2 is, is about to be released, and they’ve essentially moved and changed lots of things in the last dozen or so, uh, minor versions of, of Flux. So make sure you’re running the latest things. The installation and setup process has gotten extremely good in the last few versions. They have a CLI tool that you download that is version specific. Um, so you download version 42 of the Flux CLI or version 43. They have a bootstrap command and you tell it, I’m wanting to bootstrap in GitHub, GitHub Enterprise, GitLab, Bitbucket, or something else a little more manual. And each of those has a few options, like, do you want to have it watch Docker registries? Do you want to do the image automation, uh, controller? Because if you’re not going to use those, there’s no point in having them running, uh, and taking up resources.

Frank Wiles (00:28:46) – And then it uses your personal access token from GitHub to automatically create the repository if it doesn’t exist in the organization that you want. If it does exist, it, uh, creates a read/write deploy key for that repository and installs it in the flux-system namespace in your cluster, and sets everything up for you. So it’s pretty much one command. As long as locally you have access to your Kubernetes cluster and a GitHub token, in, in the GitHub example, it automatically sets everything up for you and you are effectively set up and ready to start using it. Um, it has a pre-check step that’ll tell you like, hey, the version of Flux you’re using doesn’t work with the version of Kubernetes that you’re talking to. Here’s what you need to do. You either need to downgrade Flux two versions, or you need to upgrade your Kubernetes to at least 1.24 or whatever. And it basically tells you all that, uh, or it tells you, I, I can’t access the Git repository. It really is an installer-like, uh, kind of setup process, which is really great.

Scott Lowe (00:29:57) – That’s, that’s cool. I like it when, uh, projects like, like Flux, when they take the time to make sure that the experience for a user to begin to start to use it is as straightforward as possible. I mean, obviously there are certain levels of complexity that you cannot abstract away sometimes, and there are certain things that you can’t, you know, automate for a user, but if you at least provide good docs and a relatively intuitive experience, then most users will stick through that. Um, I did have a question pop in my head as you were talking though. You mentioned a read/write deploy key. Um, what are the sort of the use cases where, uh, Flux would need write capacity into a repository? Cuz it seems like intuitively I am reading state from that Git repository, cuz it’s the source of truth, to pull it across and then initiate my actions, and I’m not writing, except perhaps in the case of it creating a repository for you, uh, if it didn’t exist. So,

Frank Wiles (00:30:54) – So that’s a really great question. The, the write capability is only needed for the image automation controller. So consider the scenario: you have a YAML file that says version 17 of an app, and it sees in the Docker registry version 18 land, and it wants to deploy 18. Well, some older systems would just do the 18 deploy into the cluster and leave the 17 in the YAML and just know that it’s out of date. Well, what Flux will do is actually rewrite that for you to 18 and do a commit and push back to the repository. So you see not only your people’s changes to what has happened, but also what Flux itself has changed about your infrastructure. So that’s why it needs write access.
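
Editor’s sketch (not from the episode): the write-back step described above, where the version in the YAML is rewritten from 17 to 18 before being committed, could look roughly like this. It is a plain-text illustration, not how Flux itself locates the field to update; the function names, the registry hostname, and the commit message format are hypothetical, and the actual `git commit` and `git push` are left out.

```python
import re


def bump_image_tag(manifest, image, new_tag):
    """Rewrite the tag of `image` in YAML text; return (new_text, changed)."""
    # Match "image: <repo>:<tag>" for this repository only.
    pattern = re.compile(rf"(image:\s*{re.escape(image)}):(\S+)")
    updated = pattern.sub(lambda m: f"{m.group(1)}:{new_tag}", manifest)
    return updated, updated != manifest


def commit_message(image, new_tag):
    """Hypothetical message recording that automation made the change."""
    return f"automation: update {image} to {new_tag}"


manifest = (
    "containers:\n"
    "  - name: app\n"
    "    image: registry.example.com/app:17\n"
)
updated, changed = bump_image_tag(manifest, "registry.example.com/app", "18")
if changed:
    # A real tool would write the file, commit, and push at this point.
    print(commit_message("registry.example.com/app", "18"))
    print(updated)
```

Returning a `changed` flag lets the caller skip the commit when the file already names the new tag, so repeated runs do not produce empty commits.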

Scott Lowe (00:31:47) – Gotcha. Okay. That makes sense. And I do think that writing it back to the state is the right thing to do from my perspective, rather than leaving it out of sync and just kind of knowing that it’s outta sync because you want ideally, uh, you know, that state to reflect what is actually happening on the cluster. Um, cuz invariably somebody’s gonna run some command somewhere and end up, you know, reinstalling version 17 instead of version 18 because there was, you know, whatever. Right.

Frank Wiles (00:32:17) – I have done that more times than I can count. Oh wow, I installed a version from six months ago by accident, because the automation has been running just fine for six months and I’ve accidentally not updated as I do a manual Helm upgrade or something. Yeah,

Scott Lowe (00:32:32) – Right. Absolutely. Okay, cool. Um, so for the most part, with new projects and technologies, things are pretty straightforward, especially if you take your time and you do your homework and you kind of, you know, make sure you’re comfortable with the underlying technologies ahead of time. But every project, every technology has its, you know, potholes. In your experience in working with customers and deploying this for customers, what are some of the most common potholes that you see people stepping into, and what would be your advice to help people who are trying this out avoid those?

Frank Wiles (00:33:07) – So I think one of the hardest things, from our experience, is using Flux with a new Helm chart. Either that being a Helm chart that you’ve created or one that you are not particularly familiar with, it seems to get in the way. So what ends up happening is Flux attempts to do the install or the upgrade, but if the Helm process fails, it retries and it keeps retrying and eventually it gives up, but then it sort of gets wedged and you have to manually go delete Flux’s knowledge of this Helm chart and put it back in. And that’s just fairly cumbersome to do. So my advice is, if you’re playing with a new chart, you’re playing with some new tech, do that manually until it’s working and you’re comfortable with it, and then do it again in Flux and have it manage it from there.

Frank Wiles (00:34:03) – Um, that’s probably, in my opinion, my biggest frustration: when I’m like, this should work first try, and then it doesn’t, and I’m like, well, it’ll work second try. And so I keep deleting it out of Flux and I realize that I’m spending like half my morning deleting and redoing, and I should just stop with the Flux bit until I get it working and then just do it once at the end. So that’s probably the biggest issue. It also can be a little opaque in finding what’s going wrong, right? Like it’ll tell you that the Helm release has failed but doesn’t necessarily bubble up the best error message to lead you to, oh, you have a YAML error on this line of your values file or something. So that sort of stuff is just better.

Frank Wiles (00:34:48) – You get better error messaging, better control, and kind of visibility if you just use Helm directly for that moment until it’s happy, and then you can let Flux take it over from there. Um, that I think is by far the biggest pothole that you’ll run into. The other one is very small, but it took me more than a day to figure out, so I definitely want to mention it. When you’re doing the image automation, they don’t know where you’re gonna need to plug in either the full Docker name, you know, the repository and tag, or just the tag; they don’t know where you need to plug that in to your YAML. And so the way they accomplish that is by using a specifically formatted comment on the line that says, replace this value either with the full Docker name and tag, or just the tag, so that you can plug it into Helm charts or your Kustomize or whatever.

Frank Wiles (00:35:51) – Um, the format of that is, you know, a pound sign, a space, and one little line of JSON that says, this is the image policy I want to use, and I want the tag, or I want the full Docker image. It turns out that that space between the pound sign and the JSON is particularly important. I did not put the space, didn’t think it would matter, and spent an entire day trying to figure out why Flux was not seeing my automation, and then I added the space and it works just fine. So I actually meant to file that as an issue with them, because it should be a very easy thing to solve and not let somebody trip over, but save yourself the day that I had and be sure to include the space.
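To make the pothole concrete, here is a minimal Python sketch of a marker matcher that behaves the way Frank describes. It is an illustration only and does not reproduce Flux’s actual parser: the `apply_policy` function, the regular expression, the registry name, and the `latest` lookup table are all assumptions. The marker shape itself, a comment holding one line of JSON with a `$imagepolicy` key and an optional `:tag` or `:name` suffix, follows the format used in the Flux image automation documentation.

```python
import json
import re

# Hypothetical matcher: it insists on "# " (pound sign, then one space) before
# the JSON, mirroring the behavior described in the episode.
MARKER = re.compile(
    r'^(?P<key>\s*[\w-]+:\s*)(?P<value>\S+)(?P<gap>\s+)# (?P<json>\{.*\})\s*$'
)


def apply_policy(line, latest):
    """Return the line with its value replaced if it carries a usable marker.

    `latest` maps a policy reference such as "flux-system:myapp" to an
    (image_name, tag) pair; the data is made up for illustration.
    """
    match = MARKER.match(line)
    if match is None:
        return line  # no marker, or no space after '#': line is left untouched
    try:
        marker = json.loads(match.group("json"))
    except json.JSONDecodeError:
        return line
    ref = marker.get("$imagepolicy")
    if ref is None:
        return line
    parts = ref.split(":")
    policy = ":".join(parts[:2])                 # namespace:policy-name
    wanted = parts[2] if len(parts) > 2 else None  # optional "tag" or "name"
    if policy not in latest:
        return line
    name, tag = latest[policy]
    if wanted == "tag":
        new_value = tag
    elif wanted == "name":
        new_value = name
    else:
        new_value = f"{name}:{tag}"
    return f'{match.group("key")}{new_value}{match.group("gap")}# {match.group("json")}'


latest = {"flux-system:myapp": ("registry.example.com/myapp", "18")}

with_space = '  image: registry.example.com/myapp:17 # {"$imagepolicy": "flux-system:myapp"}'
without_space = '  image: registry.example.com/myapp:17 #{"$imagepolicy": "flux-system:myapp"}'
tag_only = '  tag: 17 # {"$imagepolicy": "flux-system:myapp:tag"}'

print(apply_policy(with_space, latest))
print(apply_policy(without_space, latest))
print(apply_policy(tag_only, latest))
```

In this sketch the first and third lines are rewritten to version 18, while the second, which lacks the space after the pound sign, is silently left at 17. That silent skip is what makes the mistake so hard to spot.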

Scott Lowe (00:36:45) – Got it. All right. See, that’s the kind of real-world, practical information I love for guests to share on the podcast, so thanks for mentioning that. Um, and to your comment about trying something new with Flux, I think to me that makes a lot of sense because it goes back to the comment earlier about automating things. You wanna make sure whatever it is you’re automating is working, otherwise you’re just automating a broken process. Automation is not some magical, you know, pixie dust that’s gonna fix everything for you. You gotta make sure stuff works manually and then you can go and add some automation to it, right? So it definitely makes a lot of sense to say, hey, you just wrote this Helm chart, or you’re trying it out for the first time or whatever. Deploy it somewhere manually, make sure you understand how all the values interact and how everything gets deployed, and then you can bring it into your ecosystem where Flux is controlling it. So that’s good advice. All right, well, we’re about to wrap up here. Frank, any last-minute or final thoughts you wanna share with listeners?

Frank Wiles (00:37:47) – Um, so one thing that isn’t immediately clear sometimes to people is you can run multiple instances of Flux in your system. So if you have a large cluster with maybe three or four different development teams, you can break those into three or four different Flux GitOps repositories by team. So maybe there’s a team of senior developers that you trust a little more and you can let them maybe merge directly to master, or it’s an internal project. You can have those separated by multiple repositories very easily. Conversely, you can have one GitOps repository that is used by multiple Flux installations. So in our case, a lot of times we will have the non-prod cluster and the prod cluster working in the same GitHub repository with just a dev directory and a prod directory with all the configuration underneath there, because we don’t need that separation on our teams, right?

Frank Wiles (00:38:50) – So you can do it kind of however you want to slice and dice your Git repositories. Um, you can also use branches. So one of the techniques one of my ops people has started using is all the stuff that’s not developer focused is in an ops branch and everything that’s developer focused is in the main branch. So they don’t even see the clutter of all the other tools and stuff. Not that they couldn’t switch over to the branch, they just don’t get confused by all these directories of things they’ve never heard of, like NGINX ingress controller and Prometheus, and they don’t get lost in those things. There’s their app, and that’s it.
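The slicing options Frank lists (separate repositories per team, one repository with per-cluster directories, and an ops branch) can be sketched as a small lookup in Python. Everything here is hypothetical: the `FluxInstall` record, the repository URLs, and the instance names are invented for illustration and are not Flux configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FluxInstall:
    """One Flux instance: which repo, branch, and directory it watches."""
    name: str
    repo: str
    branch: str
    path: str


installs = [
    # Non-prod and prod clusters share one repository, split by directory.
    FluxInstall("nonprod", "git@example.com:acme/gitops.git", "main", "dev"),
    FluxInstall("prod", "git@example.com:acme/gitops.git", "main", "prod"),
    # Ops tooling lives on its own branch of the same repository.
    FluxInstall("prod-ops", "git@example.com:acme/gitops.git", "ops", "tools"),
    # A separate team keeps its own repository.
    FluxInstall("team-a", "git@example.com:acme/team-a-gitops.git", "main", "apps"),
]


def affected(repo, branch, changed_file):
    """Return the names of the Flux instances that would react to a changed file."""
    changed_parts = changed_file.split("/")
    hits = []
    for item in installs:
        prefix = item.path.split("/")
        if (
            item.repo == repo
            and item.branch == branch
            and changed_parts[: len(prefix)] == prefix
        ):
            hits.append(item.name)
    return hits


shared = "git@example.com:acme/gitops.git"
print(affected(shared, "main", "dev/myapp/deployment.yaml"))
print(affected(shared, "ops", "tools/prometheus/values.yaml"))
```

A commit under `dev/` on `main` only concerns the non-prod instance, and a commit on the `ops` branch only concerns the ops instance, which is why developers working on `main` never have to look at the tooling directories.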

Scott Lowe (00:39:28) – Yeah, yeah. That’s a neat technique. Um, okay, cool. Very good. Well, thank you for that. As we wrap up then, Frank, you wanna just remind listeners where they can find you online?

Frank Wiles (00:39:40) – Yeah, so you can find my company, REVSYS, at revsys.com, and you can find me on Twitter as fwiles or on my personal website, frankwiles.com.

Scott Lowe (00:39:51) – Awesome. Thank you so much, Frank. I appreciate the discussion. This was very, very helpful, and I think the listeners who may be thinking about deploying a GitOps solution are gonna find a lot of value in what you’ve been able to share. So thank you for that. And that’s it for this episode. Listeners, I wanna thank you again for joining me for another episode of the Full Stack Journey podcast. As always, I invite you to share your feedback on this episode or any episode of the podcast, so don’t hesitate to reach out to me. You can find me on Twitter at @scott_lowe. You can also reach the podcast at @fsjpodcast on Twitter. Either way is fine. It’s me behind the scenes either way, so it doesn’t really matter. So, again, I welcome your feedback. Would love to hear from you, and I would love to hear ideas for new shows or maybe ways that we can improve the show. This has been the Full Stack Journey podcast, where too much learning is never enough.


Full Stack Journey > Ep. 79 | June 13, 2023
