
Cloud Native Testing Podcast
The Cloud Native Testing Podcast, sponsored by Testkube, brings you insights from engineers navigating testing in cloud-native environments.
Hosted by Ole Lensmar, it explores test automation, CI/CD, Kubernetes, shifting left, scaling right, and reliability at scale through conversations with testing and cloud native experts.
Learn more about Testkube at http://testkube.io
The Shift in Software Testing: What Cloud-Native Means for QA with Mario-Leander Reimer
In this episode, Ole speaks with Mario-Leander Reimer, Managing Director of QAware, about the evolution of testing in cloud-native environments. They discuss the challenges and opportunities presented by shifting left in testing, the responsibilities of developers and QA teams, and the importance of continuous testing. The conversation also touches on the need for holistic quality insights, the role of specialists in quality assurance, and the significance of exploratory testing in maintaining software quality.
---
This podcast is proudly sponsored by Testkube, the cloud-native, vendor-agnostic test execution and orchestration platform that enables teams to run any type of test automation directly within their Kubernetes infrastructure. Learn more at www.testkube.io
Ole Lensmar (00:48)
Hello and welcome to today's episode of the Cloud Native Testing Podcast. I'm super excited today to be joined by Leander Reimer, a fantastic advocate for Testkube over the years, although we're not here to talk specifically about Testkube. Leander, it's been a pleasure knowing you all this time. Please give us a short introduction of who you are and what you do.
Mario-Leander Reimer (01:13)
Thanks Ole. Yes, so my name is Leander. I'm nowadays Managing Director and CTO of QAware. That's a Munich-based software consultancy. We do a lot of cloud-native development and cloud-native modernization projects at the heart of many, let's say, German-speaking, DACH-region-based enterprises and SMEs. Background-wise, I studied computer science.
25 years ago, I would say, I received my master's degree in distributed systems. I've been in industry ever since, doing projects of different sizes, obviously. Also, and that's maybe a coincidence for this talk today, I'm a lecturer for software quality assurance here at the university in my hometown.
Ole Lensmar (02:04)
OK, fantastic. Thank you. So QAware, or for English-speaking people, QAware. That obviously, you know, leads us directly into what I want to talk about, which is quality and testing in cloud native, coincidentally the topic of the podcast at large. You know, with other guests we've talked about...
Mario-Leander Reimer (02:12)
Probably QAware, even though the QA is not about quality assurance; it's about quality and agility.
Ole Lensmar (02:34)
What is cloud-native testing and how is it different from non-cloud-native or regular or whatever you might want to call it, testing? I'd love to hear your thoughts on that, both what you think yourself, but also what you see with your customers or the people that you interact with.
Mario-Leander Reimer (02:53)
Yep. Well, well then maybe we should go back a bit in time, in software development time. I mean, as I said, I've been in the industry for quite some time, right? Testing used to be pretty easy, I guess, when there were monolithic systems, you know, these big systems. Agile wasn't a thing back then, I guess, ten or so years ago. So things were kind of easy, right? You had one system, maybe one API, you had long release cycles.
You had one huge team, maybe even a dedicated testing team. So from a testing perspective, these times were actually pretty easy, right? I don't know if you agree, Ole, but I would say so.
Ole Lensmar (03:32)
Well,
I definitely agree with the picture that,
going backwards in time, a more monolithic approach to building, deploying, and delivering software also had a clearer place for testing in that pipeline or workflow or lifecycle, exactly. So I definitely agree with that. I think that, to me, that's really the big shift now: the lifecycle of software delivery has changed a lot, thanks to cloud native, and...
Mario-Leander Reimer (03:52)
and in that life cycle, in the software delivery life cycle.
Absolutely, I mean, cycles
got shorter. And obviously, with the systems we have to test, with, let's say, the advent of cloud and, you know, more microservice-based, smaller deployment units, testing has become more challenging. And a lot more has shifted towards the developers also. And like it always is, if there is some pressure,
either money-wise or time-wise, or some product owner pushing too hard, only looking at functionality as a quality attribute, then it's just too easy to forget about or just leave the other quality attributes behind and leave the testing behind for those. And this is what I've seen multiple times when I do consultancy and go into, let's say, vintage systems.
Usually what we find is an absolute lack of proper quality assurance. And if you ask the people, you hear different reasons, right? Either it's the infrastructure complexity, or it's the lack of time to set up proper pipelines. Maybe it's even the lack of knowledge sometimes, right? So there are different reasons I find in reality. And I think...
And that's why we're here, right? I think if you want to do your work right, then, well, you have to invest the time up front. But I think it's a myth that it takes a lot of effort to do that, right? There is good tooling out there, you know, GitLab, GitHub Actions. So you can do a lot of testing in the CI, and you can do a lot on your local machine as a developer. And obviously, then there are...
Kubernetes-based tools like Testkube, as an example, that make life even easier by taking these tests out of the pipeline and putting them straight into the cluster when stuff gets changed, when the workloads change. I think that's the future. That's super flexible.
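To make the event-driven idea concrete, here is a minimal sketch of reacting to in-cluster changes by launching a containerized test run. It is not Testkube's actual mechanism, just the general pattern, assuming the official Kubernetes Python client; the namespace, test image, and Job shape are hypothetical placeholders.

```python
# Rough sketch: watch Deployments and launch a one-off test Job when one changes.
# Uses the official Kubernetes Python client; image and names are placeholders.
from kubernetes import client, config, watch

def launch_test_job(batch_api: client.BatchV1Api, target: str) -> None:
    """Create a Job that runs a containerized test suite against `target`."""
    job = client.V1Job(
        metadata=client.V1ObjectMeta(generate_name=f"e2e-{target}-"),
        spec=client.V1JobSpec(
            backoff_limit=0,
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(
                        name="tests",
                        image="registry.example.com/e2e-tests:latest",  # hypothetical
                        env=[client.V1EnvVar(name="TARGET_DEPLOYMENT", value=target)],
                    )],
                )
            ),
        ),
    )
    batch_api.create_namespaced_job(namespace="default", body=job)

def main() -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    apps_api = client.AppsV1Api()
    batch_api = client.BatchV1Api()
    w = watch.Watch()
    # Kubernetes as a "huge state machine": every change surfaces as an event.
    for event in w.stream(apps_api.list_namespaced_deployment, namespace="default"):
        if event["type"] == "MODIFIED":
            name = event["object"].metadata.name
            print(f"Deployment {name} changed, triggering tests")
            launch_test_job(batch_api, name)

if __name__ == "__main__":
    main()
```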
Ole Lensmar (06:20)
Yeah,
but, like you say, I think one thing that I find interesting is, you're describing that testing, you know, the shift left we've talked about in the industry for a long time, is about shifting testing earlier. But to me there's also a challenge there, which is that even though technically it's maybe easier to do testing earlier, since we have a lot more code-driven approaches to writing all types of tests,
the fact that you add testing to the workload of the engineers or the developers also sometimes results in there not being as much testing done, right? So shift left has sometimes resulted in less testing actually being done in the end. So I like the idea of shift left, but in practice, I think sometimes it has diluted
the actual testing, because maybe as an engineer or as a developer, and I hope I'm not stepping on anyone's toes here, maybe that's not your top priority initially, unless you're doing a TDD or test-driven approach, which maybe works more for functional but not as well for non-functional. Do you see that as well, that shifting left has sometimes resulted in less testing being done, or is it the other way around? Has it actually raised the awareness or the focus on testing?
Mario-Leander Reimer (07:46)
Yeah, I mean, shifting left is, in my opinion, pretty easy, but limited in what you can test, because shifting left means you usually don't have a running system. So what you can do best is mostly static analysis, static dependency checking, if you like. So these things are easy to shift left. I guess most teams do that nowadays because they have the awareness, and because many
tools already bring that baked in. There's nothing to set up, okay, so you kind of get that for free. But it's more when you go to the right, so to speak, right? You have something deployed, you have something running, you're close to production; then only dynamic testing can be done, right? Acceptance testing, end-to-end testing, performance testing, proper penetration and security testing. Right? Testing these
always unfortunately called non-functional quality attributes. And they're, well, they're not as easy. You don't get them for free. You have to do the work. You have to write the test. You maybe have to containerize them to let them run within your environment next to the workloads you want to test. So there is still some work to be done. And again here,
You need time for this, right? And if you're...
Ole Lensmar (09:15)
Yeah,
whose responsibility is it ultimately? Maybe historically you'd say, let's shift left, let's get rid of the QA team, let the engineers, the developers, do their testing. But now, to your point, we need to insert testing into our continuous delivery pipelines, right? Maybe you're doing GitOps or you're doing progressive delivery, or you have tools like Keptn or
Kargo where you have quality gates, and both functional and non-functional tests need to be written and automated and executed and used. Does that suddenly fall on the DevOps persona to own? Or is there a new QA role to be found in a cloud-native world? I don't know, I'm just thinking out loud here. From what we see
with test teams specifically, it is very often DevOps people who are the ones crafting load tests, crafting security tests, and then injecting those into their pipelines, while the QA or traditional QA testing is more on the functional side of things. Does that align with what you're seeing, or how are people doing it on your end?
Mario-Leander Reimer (10:21)
Okay.
Partially, yes. I would argue that it's... Well, coming back to the responsibility you've been talking about, right? I guess the responsibility highly depends on the team topology you have. If you have mixed teams that are responsible for everything, these ominous DevOps teams, whatever that is, right? I'm not a big fan of that term.
Ole Lensmar (10:44)
Mm-hmm.
Mario-Leander Reimer (10:58)
then yes, seemingly everyone's kind of responsible for doing that stuff. I've seen many organizations abolish test teams, which I think is a failure, is wrong. What they wanted to abolish is a dedicated testing phase after the development. That is what we wanted to get rid of.
There are experts in testing. I still think that there should be specialists and experts in testing, but they should be integrated into that bi-weekly agile development cycle, or whatever sprint length you have, right? They should be integrated there and do the testing alongside the development. So I guess it's a matter of getting your team topology right, getting your responsibilities right, getting your team setup right. If you say...
If you are a small team, maybe a small company, then yes, it's probably like that, that everyone's responsible for development, for testing, for deployment.
The bigger you are... I mean, the biggest of our teams is a project with roughly 60 engineers. And in there, for sure, not each and every developer in that 60-person engineering team is responsible for everything. That doesn't really work out. That is not efficient anymore. So there, let's say, a separation of concerns, a separation of roles, is kind of what makes you productive again. That's
what I think should be the ideal, right? Get your team setup straight, get your responsibilities straight, and then it's pretty clear who's responsible for what and who's responsible for testing.
Ole Lensmar (12:34)
I totally agree that someone needs to own it. I mean, ultimately you would have both non-functional and functional requirements on the infrastructure and application as a whole, and then someone needs to own making sure that those are enforced or validated throughout your entire pipeline. And I think what's changed maybe is just the plumbing and the mechanics of how that gets done. And I think another thing we're seeing is,
like you described earlier, we've broken down our systems into microservices, into containers, and things are continuously being built and deployed. So there's this continuous flow of things going on. And what we've also seen is that people would often do testing, or automated testing, very late in the lifecycle, even in production, to not slow down
Mario-Leander Reimer (13:27)
Mm-hmm.
Ole Lensmar (13:32)
those continuous GitOps processes, or whatever approach they're taking to building and deploying; they want that to be super smooth. They do very little sanity-check testing through that workflow. And then when things hit production, that's when, every hour, they run their end-to-end tests or whatever, both functional maybe and non-functional tests, as like
a check at the end. I'm always like, I get it, because you want to speed up the delivery cycle, but I'm also like, wouldn't that risk something bad going into production, and how would you set the balance there? Do you see that on your end?
Mario-Leander Reimer (14:20)
I guess that...
When I talk to people at conferences, maybe, or even on the customer side sometimes with the internal developer teams, yes, especially when you talk to maybe not-so-experienced developers, junior ones freshly coming from university, there sometimes is that, yeah, well, we all know, maybe short-sighted opinion that it will slow you down, the development or the release cycle.
I don't think that's true. That's only true if you have a sequential lifecycle: first the development, then the testing, then the deployment. Then obviously, the more you test, the longer that phase becomes, so the longer the actual release or the actual deployment has to wait. But if you see testing as a continuous, asynchronous
swim lane, a stream right next to development, continuously testing as soon as something changes, where you let exactly the right tests run, then that does not slow you down. That's the one thing. And then the other one, and that is, I would say, hopefully common sense in the software engineering community: the term technical debt should be familiar to everyone.
Ole Lensmar (15:32)
Mm. Mm.
Mario-Leander Reimer (15:45)
The longer you wait with the proper testing, or the longer you wait in finding potential issues, performance issues, security issues, whatever issues, your technical debt will accumulate. And once you hit a certain threshold, then you will be really slow, okay? Because then you have that huge pile of debt, which you then have to, you know, work down again. And this is actually what you want to avoid. Unfortunately, there, you know, that...
Ole Lensmar (16:01)
and
Yes. Go back.
Mario-Leander Reimer (16:14)
hill can become quite big. But when you realize it, then unfortunately it's often too late. And that's why I say it's short-sighted.
Ole Lensmar (16:21)
Yeah, no.
Definitely, and I'm pretty sure... I mean, I'm sure I've been guilty of this myself, right? Let's just get stuff out and let's fix it later. And that's a very, very short-sighted position to take. Guilty as charged.
Mario-Leander Reimer (16:42)
I'm not saying I don't do it myself from time to time, but
it takes some discipline to then say, okay, now I put in a break, now I clean up after myself, and then I continue again. It's a matter of discipline in the end, I guess.
Ole Lensmar (16:51)
Mm.
Mm. Mm. Mm.
Yes, and culture maybe also in the team, right? So it's really important to work as a team, both in taking those steps to maybe sometimes accelerate, and in understanding the value of that, maybe for the business or for a specific customer, but then also taking a step back, to your point, and saying let's not get too far ahead of ourselves, we do have to clean up a little bit before we take the next
leap forward. I wanted to get back to something else you said. You talked about asynchronous, continuous testing going on while you build, and it feels to me like that's really the heart of what cloud-native testing is versus not. And just from a technology point of view, how do you see that?
How would that be implemented? Would you be using more event-driven pipelines? I'm just throwing things out here: CDEvents or CloudEvents, or is it GitOps, or is it Tekton, or is it Keptn? At what point does this become a tooling thing? Or could you actually do this with GitHub Actions and Jenkins if you just, you know, shifted your mindset and utilized those tools in that way?
Mario-Leander Reimer (18:24)
I guess everything you said is possible, I would say. My early experiments, that's, God, I don't know, three, almost four years ago now, my initial idea was to just leverage Kubernetes-based events, right? In the end, Kubernetes is a huge state machine, right? Something gets created, something gets deleted, something has changed.
And that was then, by coincidence, you know, when I stumbled across Testkube in a very, very early version, version zero-something, it was really early. So I kind of realized, okay, the idea of leveraging Kubernetes-based events is not bad. But I guess all the other tools you mentioned, like Keptn, in a way are kind of an abstraction on top of Kubernetes-based events, which makes it even easier.
Ole Lensmar (18:57)
Yeah, very early.
Mm-hmm.
Mario-Leander Reimer (19:17)
Then there is the option with all these CI and CD tools, like Argo or like Flux...
We have both in different projects, some of our teams think Argo is better, some think Flux is better, and I'm not saying which I favor. But they also, obviously, when something has happened, a deployment has finished or maybe a Helm chart has been deployed, there is the option to call webhooks out of these tools. So that's an option where you could hook in and then trigger certain tests.
And yes, for simple stuff, why not use GitHub Actions to trigger something once you know, hey, I have deployed a Docker image to that registry, please run some tests on the security of those images. Or, I have deployed a certain workload to my test environment, and now you trigger something from GitHub
towards the cluster and then stuff runs. So it's all viable. I guess it depends on the setup you have for your software development environment and infrastructure. But you should have everything in your toolbox, and then you can decide, depending on the project, what tool to use and which trigger you implement.
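As a rough illustration of the webhook-driven triggers mentioned here, this is a minimal sketch of a receiver that a CD tool or container registry could call. The endpoint paths and payload fields are assumptions, not any tool's real schema, and Trivy is used only as one example of an image scanner.

```python
# Sketch of a webhook receiver that triggers tests when a CD tool (Argo CD, Flux)
# or a container registry notifies it. Endpoint paths and payload fields are
# illustrative assumptions, not any real tool's schema.
import subprocess
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/hooks/image-pushed", methods=["POST"])
def image_pushed():
    payload = request.get_json(force=True)
    image = payload.get("image")  # hypothetical field, e.g. "registry.example.com/shop/cart:1.4.2"
    if not image:
        return jsonify({"error": "missing image"}), 400
    # Scan the freshly pushed image for vulnerabilities (Trivy as one example scanner).
    result = subprocess.run(["trivy", "image", "--exit-code", "1", image],
                            capture_output=True, text=True)
    return jsonify({"image": image, "scan_passed": result.returncode == 0})

@app.route("/hooks/deployment-finished", methods=["POST"])
def deployment_finished():
    payload = request.get_json(force=True)
    app_name = payload.get("app")  # hypothetical field name
    # Kick off the end-to-end suite for the workload that just changed,
    # e.g. by creating a Kubernetes Job or calling your test orchestrator.
    subprocess.Popen(["python", "run_e2e.py", "--target", app_name])  # hypothetical script
    return jsonify({"triggered": True, "target": app_name})

if __name__ == "__main__":
    app.run(port=8080)
```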
Ole Lensmar (20:23)
Mm-hmm.
Yeah, right.
I mean, I definitely like, conceptually, this idea of it being a continuous flow of things happening. To your point, I build something, I push it to Docker Hub, that kicks off an event that triggers the deployment, with Argo maybe, and then that itself kicks off an event that runs tests. And if those tests pass, then that kicks off an event that promotes my
thing to another phase or whatever. I really like that. It feels like if all those things can work smoothly, you could have a really well-working machine at your disposal. Maybe the challenge, one challenge I see today, and I don't know how to solve this, maybe you do, is that I'd like to know what's going on, right? What events are currently being triggered? Because you might have
Mario-Leander Reimer (21:16)
Yeah.
Ole Lensmar (21:32)
many different things being built and tested and deployed at the same time, and having some kind of view or insight into where in this lifecycle my different components are. The control freak within me, which isn't very large but is still there, would like to be able to go in and look: okay, this is currently running, things are stuck here, or that's where we're putting most of our resources.
Mario-Leander Reimer (21:59)
Absolutely,
I totally agree, right? I mean, having everything in a pipeline, the pipeline only runs when something changes, right? And when that actual workload changes, I think in a distributed system the testing also becomes distributed, because if a downstream dependency, you know, has been changed, you need to run the tests of the upstream dependency as well. With a pipeline approach, you can't really do that. And that then has the challenge that, you know, you have different tests running at
random times, whenever something changes. So getting a holistic view on the quality of the overall system, and accumulating all these different results of all these different tests, all the different logs, all the different failures or success rates and other metrics that might be relevant... Like, if you have a performance test, obviously you want the test to be successful, but obviously
Ole Lensmar (22:28)
Mm-mm. Mm.
Mario-Leander Reimer (22:54)
you want to have a look at maybe some performance
characteristic. And that data is then located in some Prometheus instance, maybe, because that one captured all the, you know, response rates and the HTTP success codes and everything. So what you definitely need is that single pane of glass, I guess you call it, that single source of quality truth, so that you can then decide as a team or product owner and say, okay, are we safe to release? Can we release? Is everything in order, is everything
as it should be? And then you can promote not only one microservice, but maybe the system as a whole, or a functional group of services that are tightly related, for example. So that gives you the flexibility. And that is something I see out there that still needs to be worked on a little.
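A minimal sketch of such a quality gate, assuming test results and runtime metrics have been scraped into Prometheus: it combines a few signals into one go/no-go decision via Prometheus' standard instant-query endpoint. The metric names, thresholds, and the in-cluster address are hypothetical.

```python
# Sketch of a tiny "quality gate": combine test outcomes and performance
# characteristics stored in Prometheus into a single release decision.
import requests

PROM_URL = "http://prometheus.monitoring.svc:9090"  # assumed in-cluster address

def query(promql: str) -> float:
    """Run an instant query and return the first sample value (0.0 if no data)."""
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": promql}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

def release_gate(service: str) -> bool:
    checks = {
        # Hypothetical metric exposed by the test runner: 1 = last suite green.
        "e2e green": query(f'last_e2e_success{{service="{service}"}}') == 1.0,
        # Error rate under 1% over the last 15 minutes.
        "error rate": query(
            f'sum(rate(http_requests_total{{service="{service}",code=~"5.."}}[15m]))'
            f' / sum(rate(http_requests_total{{service="{service}"}}[15m]))'
        ) < 0.01,
        # 95th percentile latency under 300 ms (histogram metric assumed).
        "p95 latency": query(
            f'histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket'
            f'{{service="{service}"}}[15m])) by (le))'
        ) < 0.3,
    }
    for name, ok in checks.items():
        print(f"{name}: {'OK' if ok else 'FAIL'}")
    return all(checks.values())

if __name__ == "__main__":
    print("safe to promote" if release_gate("checkout") else "hold the release")
```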
Ole Lensmar (23:44)
I agree. I think having that holistic view of not just testing but, you know, everything going on in an asynchronous lifecycle is something that will become more important. Another kind of interesting tangent on this is that sometimes there's still, I don't know what the word is, one bottleneck for these lifecycles. For example, we have
customers who say, at a certain time slot, only these load tests are allowed to run, right? We're not allowed to run anything else, because they want to make sure that when these tests run, nothing else is consuming resources, both for the system under test but also for the testing tool itself. And that doesn't always work well with this kind of continuous flow of things always going on, because then you're basically locking down your
infrastructure and saying, for the next two hours, the performance team is going to run their tests and everything else is going to have to stop. And then when they're done, we can open the gates again and everything else can pick up again. And that seems to me like a bit of a collision between the old way of doing things, where it's more, you know, waterfall-y, not necessarily in a bad way, and this kind of more continuous, fluent way. Have you seen anything like that on your end?
Mario-Leander Reimer (25:15)
Yeah, definitely, not that often in recent times, but let's say, because we are an independent consultancy, you know, I've seen many different clients in my time, and I think you have to take these concerns seriously.
Ole Lensmar (25:37)
Hmm?
Mario-Leander Reimer (25:38)
If they are raised, like, no, only at two at night are we allowed to do the performance test, because we have the fear of, you know, crippling some other workloads maybe, or deteriorating their quality of service. But if you dig a little deeper, and I guess that is our job as engineers, to dig deeper, to find the root cause, maybe there is a valid reason, let's say because they didn't implement cluster scaling and cluster sizing properly.
And of course, then your resources are limited. And then of course, that can happen. So that is then a valid reason. Then you have to go into a discussion and say, like, hey, listen, this is all about continuous improvement here. Why not implement, I don't know, cluster autoscaler or...
a similar solution depending on which cloud provider you're on, right? Why not implement an HPA for the workloads, then, if this is the bottleneck? So I guess our job is to find exactly that bottleneck and then remedy it, so we can also improve on the system stability, which is why we do tests, and on the whole test procedure and the whole software development and delivery lifecycle.
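For illustration, a minimal sketch of adding a CPU-based HorizontalPodAutoscaler with the official Kubernetes Python client, so a workload scales out under load tests instead of needing an exclusive time slot. The namespace, deployment name, and thresholds are assumptions.

```python
# Sketch: attach a HorizontalPodAutoscaler (autoscaling/v1, CPU-based) to the
# workload that buckles under load tests. Names and thresholds are assumptions.
from kubernetes import client, config

def create_hpa(namespace: str, deployment: str) -> None:
    config.load_kube_config()
    hpa = client.V1HorizontalPodAutoscaler(
        metadata=client.V1ObjectMeta(name=f"{deployment}-hpa"),
        spec=client.V1HorizontalPodAutoscalerSpec(
            scale_target_ref=client.V1CrossVersionObjectReference(
                api_version="apps/v1", kind="Deployment", name=deployment
            ),
            min_replicas=2,
            max_replicas=10,
            target_cpu_utilization_percentage=70,  # scale out when average CPU > 70%
        ),
    )
    client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
        namespace=namespace, body=hpa
    )

if __name__ == "__main__":
    create_hpa("shop", "checkout")  # hypothetical namespace and deployment
```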
Ole Lensmar (26:49)
Yeah, I think also
that really highlights what you were saying earlier. I think QA is not just reactive, and these tests are not just reactive; they're part of the entire lifecycle, the entire process. To your point, QA can maybe help in those situations, especially if the people running the tests are
low-level experts at maybe Kubernetes and autoscaling and other things. They can help not just by being constrained by the test, but also maybe by helping figure out the reason for this bottleneck and helping find solutions. So it's, as always, collaboration, all that kind of stuff.
Mario-Leander Reimer (27:32)
Exactly.
because it's so much more complex, right? This is another reason, when we come back to the responsibilities, right? I would say for performance tests, security tests, everything that is tightly correlated with the reliability of the system, you should definitely have an SRE in your team, a site reliability engineer, whose sole responsibility is to make sure that the system is running reliably, that the infrastructure is reliable,
and connected to that reliability are certain categories of tests. Or you have a security specialist, and his only purpose in regards to testing is to proactively do penetration tests, proactively have a look at the dependencies, monitor different CVEs, and then maybe trigger a new set of scans and whatever.
Ole Lensmar (28:07)
and
Mario-Leander Reimer (28:28)
And software engineers also, right? I mean, they are closely related to the code, so for them it's the acceptance tests, usually. Maybe I have a usability expert whose responsibility is to run usability tests, even though automating usability testing is not super easy; there aren't many tools out there yet, right? But I guess usability will become a thing with that new EU regulation on accessibility and usability, right? We do have to do our homework there also, otherwise we violate
Ole Lensmar (28:45)
and
Okay.
Mario-Leander Reimer (28:59)
that law, that regulation.
Yeah.
Ole Lensmar (29:03)
Super interesting, because that kind of brings us, maybe not full circle, but back to... I know we talk a lot about automated testing, but ultimately exploratory testing or manual testing is something that's always, you know, very important on the functional side. Somebody should be provoking, you know,
the UI of your application to do weird things and see how it behaves. But this, I think, applies, just like you're saying, to security testing, to any kind of testing. Someone should be spending time probing and trying to find bottlenecks or whatever, more as an exploratory testing practice, which I think is super important to keep alive and maintain in an organization, and not just rely on automated tests throughout the process.
Mario-Leander Reimer (29:50)
Yeah.
And as you say, you can combine both, right? So you need the proactive testing activity by whoever is responsible for it: probe, play around with the system. Once you've found a regression, maybe, automate it, put it into your automation and test runner system. And then that's kind of done for the future, and you are, you know, free of that mental load again,
and you can focus on other issues you may find. So you definitely need both. You come from exploratory testing, you then automate stuff, and then you go back to exploratory testing again.
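As a small illustration of that loop, here is a sketch of a regression found during an exploratory session and then pinned as an automated test (pytest-style); the endpoint, address, and expected behaviour are hypothetical.

```python
# Sketch: a regression found exploratively, pinned as an automated test so the
# behaviour stays covered "for the future". Endpoint and address are placeholders.
import requests

BASE_URL = "http://checkout.test.svc:8080"  # assumed test-environment address

def test_empty_cart_checkout_returns_400():
    """An exploratory session showed that checking out an empty cart crashed the
    service; pin the expected behaviour (a clean 400, not a 500) as a regression test."""
    resp = requests.post(f"{BASE_URL}/api/checkout", json={"items": []}, timeout=5)
    assert resp.status_code == 400
```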
Ole Lensmar (30:25)
Great.
Awesome. So on that note, thank you so much, Leander. It's been a pleasure talking to you and hearing your insights. Thank you so much. And I hope to see you at a conference or another event at some time soon. And thanks to everyone listening. Thank you.
Mario-Leander Reimer (30:50)
Thanks Ole and thanks to the audience. Bye.