Cloud Native Testing Podcast

Modernizing CI/CD: Integrating Automated Testing with Kargo, Tilt, and Playwright

Testkube Season 1 Episode 15

On this episode of the Cloud Native Testing podcast, Ole Lensmar welcomes Matt McLane from Doc Network to discuss how they completely transformed their development and testing lifecycle. Matt details their journey away from a slow, manual deployment process that involved risky ad-hoc testing on a staging environment connected to production, toward a modern, automated workflow built for quality and speed. By implementing tools like GitHub Actions, Argo CD, and Kargo, they built a pipeline where testing is a core component, not an afterthought.

The conversation dives deep into their cloud-native testing strategy. Matt explains how integrating Playwright for end-to-end tests created automated quality gates, giving them confidence with every deployment. He also shares their approach to non-functional testing with k6 to ensure their application can handle high-traffic events. A key highlight is their use of Tilt to spin up ephemeral environments for every pull request, enabling developers and QA to test features in isolation early in the process. This episode is a practical guide for anyone looking to build a robust testing culture supported by a modern CI/CD pipeline.

Ole Lensmar:

Hello and welcome to another hopefully fabulous episode of the Cloud Native Testing podcast. I'm your host, Ole Lensmar, CTO at Testkube, and I am super delighted to be joined by Matt McLane from Doc Network. Matt, how are you? Great to have you. Please tell us about yourself. What is Doc Network? Not everyone will know. I know you've been working on a very special project that I'm eager to talk about.

Matt McLane:

I'm good today. My name is Matt McLane. I'm the DevOps Engineer Lead for Doc Network. We're a small company with just under 50 people and a dev team of around 16. In that dev team, I'm the one and only DevOps guy. I handle the cloud infrastructure, networking, observability, and all the tools that provide that. More importantly, I make it all accessible to the developers so they can do their jobs more efficiently. Doc Network is a small company, but we do a lot of great stuff. We provide registration, health forms, and that kind of stuff for schools and camps. We help kids go to camp safely by providing a platform that these organizations can use to get their kids there and to make sure that they're safe while they're away from home. That's what we do.

Ole Lensmar:

Great, thanks for that. It's super interesting that you're the only DevOps engineer, and that spurs some questions. Just before we started, you shared that last year you were rebuilding your CI/CD pipelines. That sounds like something most people will want to do sooner or later. Tell us about that.

Matt McLane:

Let me start with where we were. We had a Jenkins-based pipeline. Jenkins would build the artifact, but the rest of it, the CD portion of the pipeline, was a very manual process. Developers would get into a room two or three times a week, and they would put that new container in a dev environment and do some testing. But the dev environment wasn't really a dev environment; it was more of a staging environment because it was connected to the production database. They were putting the code on, testing it to make sure that it met their acceptance criteria, and then off the top of their head, they would work through whatever regression tests they could think of. Then they would manually repeat the build and deploy that to the production environment.

Ole Lensmar:

So CD was basically manual, so it was MD.

Matt McLane:

Exactly. The CI portion was largely automated, but even there, there were pieces of it that were not necessarily automated. If they had to do configuration changes or database updates, that was all done by hand.

Ole Lensmar:

What's the scope or the complexity of the application?

Matt McLane:

We got away with it for a long time because we do have a fairly simple application. It's a web-based application written in JavaScript, so we're largely just building a container that gets deployed on a Kubernetes cluster. That wasn't so bad. They were just updating some values here and there, and then our tooling would do some of that rollout. But the bigger issue was validation and having the confidence that the code we were deploying was not going to mess something up. We just didn't have the infrastructure or the processes to deploy it in a reasonable way. More importantly, it took a lot of development hours, a lot of hours from the dev team. We calculated that we were spending about 30 man-hours per week just sitting in those deployment meetings, because we would have four or five people in the room who had written the code and then had to review it and walk through it. It was just a very labor-intensive process.

So we wanted to rebuild it. We wanted to get rid of Jenkins and move to GitHub Actions. We wanted to improve our process and improve the tooling. We use Argo CD for the management of our Kubernetes clusters, but we wanted to make that process easier. As it happened, the folks at Akuity had just released this new tool called Kargo, and it really is a beautiful experience. I don't know if you've seen it, but it's really beautiful.

Ole Lensmar:

I'm familiar with it, and we've looked at integrating it with Testkube. So that's super interesting to hear. How are you using Kargo?

Matt McLane:

We put Kargo in the mix and we redid the whole pipeline process and the whole flow. We got rid of those meetings completely, and instead of having our developers deploy code through the four environments we have, our QA people do that now. Our QA people are not developers. They needed a UI that made sense, where you can see what's going on. Kargo provides that. New containers come up, it handles an automatic deployment to dev, and then as those versions get updated, QA can evaluate when a build is ready to go to our QA environment. They can do that manually by just saying "promote," and Kargo will handle it. Then once they're happy on QA, they can promote it to staging and finally production, all on their own, without having to touch any code or the backend systems or anything like that. Kargo does all of that, and that's been great. It's literally making our dev teams so much more productive that our bottleneck is now the QA team.

Now that being said, what we need to do is make our QA more efficient, and that's where testing has to come in. As you alluded to, we were able to integrate Kargo with our test environment. So whenever we deploy to QA, it automatically runs a suite of Playwright tests that we're writing right now. Playwright acts as a virtual user: it clicks through the UI in a browser, traverses our application, and runs the suite of tests that the QA person would normally run. If it finds any issues, it takes a snapshot, collects those as artifacts, and stores them so that we can go review what was wrong or what didn't work and go from there. The idea is that it's going to make their job a lot quicker, so we can process more code and get those deployments out there.
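To make that concrete, here is a minimal sketch of the kind of Playwright end-to-end test described above. It is illustrative only: the route, labels, and credentials are hypothetical placeholders, not Doc Network's actual UI or suite.

```typescript
// sign-in.spec.ts (hypothetical regression test)
import { test, expect } from '@playwright/test';

test('a returning user can sign in and reach their registrations', async ({ page }) => {
  // Relative paths resolve against baseURL from playwright.config.ts,
  // so the same test can run against dev, QA, or an ephemeral environment.
  await page.goto('/login');

  await page.getByLabel('Email').fill('qa-user@example.com');
  await page.getByLabel('Password').fill(process.env.QA_PASSWORD ?? 'changeme');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // If this assertion fails, Playwright can capture a screenshot and trace as
  // artifacts for QA to review (when configured with screenshot: 'only-on-failure').
  await expect(page.getByRole('heading', { name: 'Registrations' })).toBeVisible();
});
```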

Ole Lensmar:

Do you see yourselves automating the provisioning or the promotion across environments if those tests pass, or will there always be a manual step involved?

Matt McLane:

That's a good question. I can see a world in which we promote from our QA environment to a staging environment if the tests pass, but we're always going to have a manual step in there. I don't know if we would ever get to the place where we could completely eliminate all manual testing, because with automated testing there's a level of confidence you have to have, a level of trust, and automated tests can notoriously be flaky. We don't have an engineer dedicated to writing or shepherding those tests at this time; that's so far out in the future that I don't know if we ever will. But also, I think you want somebody there to manage when it gets deployed to the next environment and the timing of it, so having a manual step there does that.

Ole Lensmar:

I've seen others use Kargo or another tool to build promotion where they automate it; they define quality gates using some tool like Testkube or whatever. That helps them automate promotions, maybe not into a production environment, but into a staging environment, with QA doing more exploratory testing to really kick the tires on the edge cases and the unusual behavior. But to your point, you do have to have a reasonably complete end-to-end testing suite, with Playwright or some other tool, that you'd trust to provide a reasonable level of confidence before promoting a release across environments. I think that's what a lot of people are aiming for, but getting there... you only know what you test. So you don't know what you're not testing, and calculating coverage on UIs can be very tricky.

Matt McLane:

Where we're at right now, we started with a whole lot of unit tests, and that was the level of testing we had. We're working on writing those Playwright tests to build up coverage of what we already have. Where I'd like to see us get to is redefining what we consider our level of "done" when it comes to a project or a feature. That level of "done" should include observability and testing. If we could have that, then writing the tests for a new feature gets put into the project timeline. We just haven't quite gotten there yet, but we're talking about it. We're working toward it.

Ole Lensmar:

It takes time, obviously, because you're also building a product at the same time. Have you been looking into creating ephemeral environments as part of your pull requests? How are you doing that? Are you running tests in those, both manual and automated? I'm just curious how that ties into the process.

Matt McLane:

We are. That's one of the more exciting projects we have going on right now. We're using a tool called Tilt, which lets us define how to build and then deploy an ephemeral environment. That's how we use it. A developer can just run "tilt up" from the command line and it will build a container that isn't shipped to our standard repository; it's shipped to a repository that is just their own, so it doesn't fall into the regular CI/CD pipeline process. Then it will stand the container up on our dev Kubernetes cluster and provide them with a unique URL. That's super great. A developer can "tilt up" code that isn't even committed and be able to look at it, or more importantly, have somebody else look at it and say, "Am I in the right place or not?" without shipping it to the dev environment. Then they can easily "tilt down."

Once we have that working, we can use a GitHub Action to do the same thing. We have set this up as a proof of concept; we haven't fully integrated it into our work process yet. But the idea there is that whenever you do a PR, it's going to "tilt up" and provide an ephemeral environment for that pull request so that somebody can go look at what that looks like as part of their code review. Then once you merge, it'll "tilt down" and bring that environment down. Once we have that working, we could easily run tests against it, and that would be really great. That's in our long-term plan. We just haven't quite gotten all there yet.

Ole Lensmar:

That's super interesting. I've heard of Tilt used for local ephemeral environments, for devs to spin up whatever they need to deploy the version of the code they have locally and then run tests against it. But what you're saying is you would use it in your CI pipeline to generate an environment as part of the pull request. Then you could both manually validate and run automated tests against it. The only thing that changes is the endpoint, and I'm sure you could make your Playwright tests configurable on that. You just say, "Run these Playwright tests against this endpoint instead."
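A minimal sketch of how that could look in practice, assuming the suite reads a BASE_URL environment variable (a hypothetical name) that the pipeline sets to either the QA URL or the ephemeral PR URL:

```typescript
// playwright.config.ts (a sketch, not Doc Network's actual configuration)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 1 : 0,
  use: {
    // BASE_URL is a hypothetical variable; the CI job injects the QA or
    // ephemeral-environment URL before invoking the suite.
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000',
    // Collect artifacts for failed runs so QA can review what went wrong.
    screenshot: 'only-on-failure',
    trace: 'retain-on-failure',
  },
});
```

A pipeline step would then run something like `BASE_URL=https://pr-123.dev.example.internal npx playwright test` against the hypothetical per-PR URL, with no changes to the tests themselves.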

Matt McLane:

Exactly. Then we can also send people to a standing URL. We could ask our client success team, "Hey, is this the feature that you want? Is this what you're thinking of?" But it's not done developing, it's not in dev or staging. It's just in that PR stage, so we can change it before we've committed it and merged it in with everything else. I think that's going to change everything for us once we get it fully implemented.

Ole Lensmar:

That sounds super powerful. In the PR process, I've seen people using tools like vCluster, and I think Argo has some native capabilities to create temporary namespaces where it deploys things. So that's what I've seen, but I'm super interested to see Tilt in that context as well.

Matt McLane:

It's a simple YAML file that you've added to the repo and then you're off to the races. It's gonna be great.

Ole Lensmar:

Going back to testing, where my head's at, you mentioned that you have a QA team. As the single DevOps engineer, how do you interact with them and how do you divide the responsibilities for not just functional testing, but non-functional load testing or other aspects of quality? Where do those fall between you and them?

Matt McLane:

Our QA team is one person, and she started as a client success person. The reason we have her in that role is that she has a very deep knowledge of our application and how it works. As the developers build new features or change features, she can spot how that all plays out in the application itself, but she doesn't write code. A lot of her testing is done manually because that's what she knows and she's good at it. She shepherds releases through the deployment process.

When it comes to the Playwright end-to-end tests, those are all written by the developers. When it comes to load tests, we do have a set of load tests that we run, and I'm largely responsible for that. We'll stand up a one-time environment that is essentially an ephemeral environment, but it'll be around for a month or two. Then we run a series of tests against it. They're written in k6 and they run through our registration process, and then we scale that up so that we have 2,200 or 4,400 people registering all at once. They're k6 tests, so we don't really use the browser, but it puts a whole lot of load all at once on our application pods. That's one of the things we are very sensitive to. Our organizations have what they call "competitive launches," where a camp will say, "We're going to open up our camp registrations at 12 p.m. on Tuesday," and you'll have thousands of people just waiting to click the button. That kills our infrastructure if we're not ready for it. So we have tests that simulate that; that's our hardest-hitting scenario.
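As an illustration, here is a minimal k6 sketch of that kind of "competitive launch" scenario. The endpoints, payload, and thresholds are hypothetical, and it assumes either a recent k6 release that can run TypeScript directly or a transpile step to JavaScript first.

```typescript
// registration-launch.ts (hypothetical k6 load test for a registration spike)
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 500 },   // families arriving ahead of launch
    { duration: '2m', target: 2200 },  // the spike when registration opens
    { duration: '5m', target: 2200 },  // sustained load
    { duration: '1m', target: 0 },     // ramp down
  ],
  thresholds: {
    http_req_failed: ['rate<0.01'],    // fail the run if more than 1% of requests error
    http_req_duration: ['p(95)<1500'], // 95th-percentile latency under 1.5s
  },
};

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000'; // hypothetical target env

export default function () {
  // Load the registration form, then submit it: placeholder paths and fields.
  const form = http.get(`${BASE_URL}/registration/new`);
  check(form, { 'form loaded': (r) => r.status === 200 });

  const res = http.post(
    `${BASE_URL}/registration`,
    JSON.stringify({ camper: 'Test Camper', session: 'summer-week-1' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(res, { 'registration accepted': (r) => r.status === 200 || r.status === 201 });

  sleep(1);
}
```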

Ole Lensmar:

How do you correlate that? You generate X amount of virtual users in an ephemeral environment, and I guess you have some capacity or resources allocated to that ephemeral environment. Then do you just multiply? "Okay, with these resources, we achieve this throughput, and we need to achieve 10 times that, so we'll just multiply the resources in production by 10," or how do you go from the numbers you see there to actually deciding on your production capacity?

Matt McLane:

Largely we do math, for sure. We've put together a dashboard where we can correlate things. We can say, "Here's the virtual users, here's our throughput at this point, this is where the database started having issues," and try to identify that kind of stuff. Ultimately, though, we try to model our load testing target environment after our production environment. Then we run it again and we try to scale it up and we do it until the load test passes cleanly and doesn't fall down. Then we have an idea, and we add a little bit when we go to production. We'll make our databases big and beefy and have extra read replicas, and then we'll run it again. It's a long and boring process as you just run the test, reset, run the test. It takes about a week, week and a half of just running tests, but it's something you have to do once a year.

Ole Lensmar:

So the process is you run the test, you look at the results, then you tweak the configuration of your infrastructure and run it again. Interesting. The QA person you mentioned, it sounds like she's doing more exploratory testing, like an integration tester, because she has the big picture. Is she involved with the dev teams when they build new features, to help them think through the ripple effects of what they're doing?

Matt McLane:

They're very integrated in how they work, and she ensures that the acceptance criteria are all met on the tickets. When something gets merged into our application repository, it builds a new container, and that container automatically gets deployed to dev. Our QA person will go in right away and check whether all the acceptance criteria are met, and then she starts looking at what we're going to need to test and works that out with the devs. Once we've accumulated enough of these deployments, that's when she'll promote to QA and start through her script of regression tests.

Ole Lensmar:

I have to ask, I've been talking to a bunch of testers over the last months, and after some initial skepticism, people are going more in on AI for helping them with testing, translating requirements to what they should test, and generating Playwright tests. Is that something you're interested in looking into, or to what extent is AI being used across your team?

Matt McLane:

At Doc Network, AI is not heavily leaned on. It's used by some of our teams in various ways, but it's very disjointed. Our dev team doesn't use it too much right now, other than GitHub Copilot to help us code and to answer questions. Having said that, just last week we started implementing what we call "guilds." A guild is an ad hoc group of people from across the dev team that focuses on a topic. One of those guilds is the AI guild, and they're just starting to look at what recommendations we can make to bring our organization into a space where we use AI in a better way, and also at other options and opportunities for AI. I know our director was on your webcast just last week about how we can use AI in testing, so I know there's at least some interest. We just haven't quite gotten there yet.

Ole Lensmar:

To your point, I think a lot of people are interested and intrigued. There's a lot of very positively biased information if you go online. But many people I've talked to, when they try it out, are often surprised by how helpful it actually is. It doesn't replace them; it's a tool like any other tool, and it's an assistant, but used the right way it can definitely help in many workflows, including testing and QA.

Matt McLane:

100%. I wouldn't trust AI on its own because it's a computer. It doesn't know that it's lying, but it does, all the time. It presents wrong information as if it were true. So you have to have somebody there to validate it. But I can absolutely see how it can be useful. We were just talking about how the AI in our GitHub repos can occasionally provide really useful summaries of what a pull request does. If we could add in there, "You should check this, this, and this," I could absolutely see that being beneficial.

Ole Lensmar:

I definitely agree that you can go far beyond just "generate this code for me." At a higher level, like translating requirements to test cases, translating verbal test cases to Playwright scripts, or helping you think of edge cases. You could say, "Hey, what are some edge cases that I might not be thinking of when I consider this functionality?" Some things it'll come up with will be a bad idea, but others will actually be a good idea or a good thing to try out.

Matt McLane:

Having them write Playwright tests would be really useful. I will say that there is some nervousness about having AI read our code, and I don't think we'll ever have AI read our database because we have a lot of PHI in there. We maintain kids' health records; we don't want to make that available to an engine. But I can see if we can have AI help us write tests, that would be pretty sweet.

Ole Lensmar:

In DevOps-related tasks, do you see AI writing configs like Kubernetes manifests or do you see any use cases where it could help you?

Matt McLane:

I use it. Just yesterday I was using AI to try to help solve some problems around running a cron job, and it was hit or miss whether it was even right. As I said, sometimes it just lies to you. It's like, "No, it doesn't work that way," or "No, I'm not trying to create a cron job, I'm just trying to create a job." But I do use it all the time, in part because I find that in some ways it's a really good search engine for "how does this work?" It's not the final answer, though. I have to keep in mind that it's not always right, and so I have to validate it myself. It doesn't know what it's doing; it's just pretending it knows what it's doing. But I do use it all the time. Getting Copilot in VS Code changed everything for me.

Ole Lensmar:

Awesome. Matt, thank you so much. It's been a pleasure to have you here and to learn about your journey there. I wish you good luck with everything going forward with Doc Network. Thank you. Bye-bye.

Matt McLane:

Thank you. Thank you very much. Bye.
