Stuck between a great PhD offer and a solid DevOps career any advice?
I’m currently working as a DevOps Engineer with a good salary, and I’m 27 years old.
Recently, I received an offer to pursue a PhD at a top 100 university in the world. The topic aligns perfectly with my passion — information security, WebAssembly, Rust, and cloud computing.
The salary is much lower than my current salary, and it will take around 5 years to finish the program, but I see this as a rare opportunity at my age to gain strong research experience and deepen my technical skills.
I’m struggling to decide: is this truly a strong opportunity worth taking, or should I stay in industry and keep building my professional experience?
Has anyone here gone through a similar situation? How did it impact your career afterward, whether you stayed in academia or returned to industry?
  
After earning a PhD in information security, what are the opportunities to return to industry?
https://redd.it/1ojv4rf
@r_devops
  
  Offloading SQL queries to read-only replica
What's the best strategy? One approach is to redirect all reads to the replica and all writes to the master. This is too crude, so I chose to do things manually; think:
Database.on_replica do
# code here
end
However this has hidden footguns. For one thing the code should make no writes to the database. This is easy to verify if it's just a few lines of code, but becomes much more difficult if there are calls to procedures defined in another file, which call other files, which call something in a library. How can a developer even know that the procedure they're modifying is used within a read-only scope somewhere high up in the call chain?
Another problem is "mostly reads". This is find_or_create method semantics. It does a SELECT most of the time, but for some subset of data it issues an INSERT.
And yet another problem is automated testing. How do you make sure that a bunch of queries are always executed on a replica? Well, you have to have a replica in the test environment. OK, that's no big deal; I managed to set it up. However, how do you get the data in there? It is read-only, so naturally you have to write to the master. This means you have to commit the transaction, otherwise the replica won't see anything. Committing transactions is slow when you have to create and delete thousands of records per test suite run.
There has to be a better way. I want my replica to ease the burden of master database because currently it is mostly idle.
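One way to blunt the "hidden writes" footgun is a single query choke point that tracks the read-only scope and rejects writes at runtime, so a write buried deep in a library call chain fails loudly instead of silently hitting the replica. A minimal Python sketch of that idea (the `on_replica` and `execute` names are hypothetical, not from any particular ORM, and `execute` only returns the chosen target rather than running a query):

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Tracks whether the current execution context is inside a read-only scope.
_read_only = ContextVar("read_only", default=False)

@contextmanager
def on_replica():
    """Route queries in this scope to the replica; any write raises."""
    token = _read_only.set(True)
    try:
        yield
    finally:
        _read_only.reset(token)

WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP"}

def execute(sql):
    """Single entry point for all queries: pick a target and guard writes."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if _read_only.get():
        if verb in WRITE_VERBS:
            # Fails even when the write is many calls deep in the chain.
            raise RuntimeError(f"write inside on_replica scope: {sql!r}")
        return "replica"
    return "primary"
```

This also makes the find_or_create "mostly reads" case explicit: do the SELECT inside `on_replica`, and fall back to an INSERT outside the scope when the row is missing.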
https://redd.it/1ojv8gv
@r_devops
  
  payment processing went down for 2 minutes. engineering said p3. finance said p1
we had a payment gateway timeout friday that lasted barely 2 minutes. during that time customers couldn't complete checkouts.
engineering immediately called it p3. it's a known issue with the third-party provider. happens occasionally. self-resolved. no code changes needed.
finance lost their minds. called it p1. ran the numbers and we lost significant revenue because it's black friday weekend. customers who hit errors abandoned carts and didn't come back.
support sided with finance because they got slammed with tickets and customers were threatening chargebacks on social media.
product sided with engineering because technically the system worked as designed. timeout and retry logic did exactly what it should.
we spent the entire postmortem arguing about severity instead of talking about improvements. finance wants anything touching payments to be p1 automatically. engineering says that makes severity meaningless.
the problem is both are right. from a technical standpoint it was minor. from a business standpoint we literally lost money during peak shopping weekend.
calling on fintech and ecommerce people: how do you handle this kind of scenario? looking for some advice.
https://redd.it/1ojz89l
@r_devops
  
  What do you do when Audit wants tickets and there are none?
For those in large public companies, do you ever work with Audit? What do you do when Audit comes around asking for tickets on work that was done using systems outside of Jira/ADO? Audit is breathing down our necks.
https://redd.it/1ojzvun
@r_devops
  
  The problem I see with AI is if the person asking AI to do something doesn’t understand scale, they could end up with infrastructure issues at the foundation.
How many times have we had to talk our own people off a ledge for considering Kubernetes when we just need ECS, or vice versa? How many times has management come back from a conference with a new shiny thing that then becomes the biggest maintenance headache for everyone involved?
I think we may not see it immediately, but poorly architected systems at middling companies trying (and failing) to execute AI agents will keep us busy for quite some time. The bubble isn't a sudden pop. It's a slow realization that you screwed yourself over two years ago by blindly taking the recommendations of an advanced autocomplete program.
https://redd.it/1ok2g4q
@r_devops
  
  What’s everyone using for application monitoring these days?
Trying to get a feel for what folks are actually using in the wild for application monitoring.
We’ve got a mix of services running across Kubernetes and a few random VMs that never got migrated (you know the ones). I’m mostly trying to figure out how people are tracking performance and errors without drowning in dashboards and alerts that no one reads.
Right now we’re using a couple of open-source tools stitched together, but it feels like I spend more time maintaining the monitoring than the actual app.
What’s been working for you? Do you prefer to piece stuff together or go with one platform that does it all? Curious what the tradeoffs have been.
https://redd.it/1ok21tz
@r_devops
  
  Datadog suddenly increasing charges
Hi there 👋🏻
Just wanted to check if anyone else got this news. Basically, they informed us that they have decided to introduce a new SKU for Fargate APM and that we are now going to be billed 3 times more for this product: a Fargate APM task that currently costs 1 USD will cost 4 USD after the change.
Has anyone else heard this? I even wondered whether they want to ditch us and this is their way of doing it.
https://redd.it/1ok48jx
@r_devops
  
Final interview flipped into a surprise technical test, and I froze
Went through a multi-stage interview process at a cybersecurity company: two technical interviews, one half-technical intro chat, and an HR round. Everything went well, the vibes were strong, and I genuinely felt aligned with the company culture and team; they seemed to feel the same.
I was told the final call with the VP would be a “casual intro and culture fit conversation.”
Except… it wasn’t.
The VP immediately turned it into a high-pressure technical interview. No warm-up, no small talk, straight into deep technical questions and drilling down to very specific wording. I tried to keep up, but I wasn’t mentally prepared for a surprise test. The pressure hit, I got flustered, and couldn’t articulate things I normally handle well.
After that call, I was told they think I have “knowledge gaps” and it’s not the right fit right now.
And honestly… it stung. Not because I think I deserved anything, but because I felt like I didn’t get judged on the abilities I showed throughout the whole process, but on a single unexpected stress moment.
I know interviews can be unpredictable, but being evaluated on an exam you didn’t know you were about to take feels off. Still processing whether I should reach out and ask for reconsideration or just move forward?
Just needed to get it out.
https://redd.it/1ok74pn
@r_devops
  
  Should incident.io be my alert router, or only for critical incidents?
  
So our observability stack consists of Grafana and Prometheus for monitoring and alerting, and incident.io for incidents and on-call.
  
Should I send all alerts to incident.io and decide there which channels each alert should go to (Slack, email, etc.)? Or make that decision in Grafana and only send critical incidents to incident.io?
https://redd.it/1ok3iuw
@r_devops
  
  Do your teams skip retros on busy weeks?
Hi everyone, I’m looking for a bit of feedback on something.
I’ve been talking with a bunch of teams lately, and a lot of them mentioned they skip retros when things get busy, or have stopped running them altogether.
This makes sense to me, since I've definitely had Fridays with too much to get done and didn't want to take the time for a retro.
But I wanted to check with everyone here - is that true for your teams too?
I wondered if a lighter weight way to run a retro would be of interest, so I put together a small experiment to test that idea (not ready yet, just testing the concept).
The concept is a quick Slackbot that runs a 2-minute async retro to keep a pulse on how the team’s doing: https://retroflow.io/slackbot
Would this be valuable to anyone here?
(Not promoting anything — just exploring the idea and genuinely interested in feedback.)
https://redd.it/1okadjz
@r_devops
How do you get engineering teams to standardize on secure base images without constant pushback?
We're scaling our containerized apps and need to standardize base images for security and compliance, but every team has their own preferences. Policy as code feels heavy, and blocking PRs kills velocity.
What’s worked for you? Thinking about automated scanning that flags non-approved images but doesn't block initially, then gradually tightening. Or maybe image registries with approved-only pulls?
Any tools or workflows that let you roll this out incrementally? Don't want to be the team that breaks everyone's deploys.
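As a sketch of the "flag but don't block" first step: a small CI check that parses Dockerfile `FROM` lines and reports base images outside an allow-list, with an enforce switch you can flip once teams have had time to migrate. The registry prefix is a made-up placeholder; real policy engines (OPA/Conftest, registry pull-through rules) do this more robustly.

```python
# Hypothetical internal "golden image" registry prefix.
APPROVED_PREFIXES = ("registry.example.com/base/",)

def check_dockerfile(text, enforce=False):
    """Return base images not on the allow-list; optionally fail the build."""
    stages, findings = set(), []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].upper() != "FROM":
            continue
        # Drop flags like --platform=..., keeping the image reference itself.
        args = [t for t in tokens[1:] if not t.startswith("--")]
        image = args[0]
        # Remember stage aliases (FROM x AS build) so a later `FROM build`
        # that references an earlier stage isn't flagged.
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2].lower())
        if image.lower() in stages or image.lower() == "scratch":
            continue
        if not image.startswith(APPROVED_PREFIXES):
            findings.append(image)
    if enforce and findings:
        raise SystemExit(f"unapproved base images: {findings}")
    return findings
```

Run in warn-only mode first (collect `findings` into a report), then set `enforce=True` per team as they adopt the approved images.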
https://redd.it/1okcmjx
@r_devops
  
  Have you ever discovered a vulnerability way too late? What happened?
AI coding tools are great at writing code fast, but not so great at keeping it secure.
Most developers spend nights fixing bugs, chasing down vulnerabilities and doing manual reviews just to make sure nothing risky slips into production.
So I started asking myself, what if AI could actually help you ship safer code, not just more of it?
That’s why I built Gammacode. It’s an AI code intelligence platform that scans your repos for vulnerabilities, bugs, and tech debt, then automatically fixes them in secure sandboxes or through GitHub Actions.
You can use it from the web or your terminal to generate, audit and ship production-ready code faster, without trading off security.
I built it for developers, startups and small teams who want to move quickly but still sleep at night knowing their code is clean.
Unlike most AI coding tools, Gammacode doesn’t store or train on your code, and everything runs locally. You can even plug in whatever model you prefer like Gemini, Claude or DeepSeek.
I am looking for feedback and feature suggestions. What’s the most frustrating or time-consuming part of keeping your code secure these days?
https://redd.it/1ok5z0a
@r_devops
  
  Database design in CS capstone project - Is AWS RDS overkill over something like Supabase? Or will I learn more useful stuff in AWS?
Hello all! If this is the wrong place, or there's a better place to ask it, please let me know.
So I'm working on a Computer Science capstone project. We're building a chess.com competitor application for iOS and Android using React Native as the frontend.
I'm in charge of Database design and management, and I'm trying to figure out what tool architecture we should use. I'm relatively new to this world so I'm trying to figure it out, but it's hard to find good info and I'd rather ask specifically.
Right now I'm between AWS RDS and Supabase for managing my Postgres database. Are these both good options for our prototype? Are both relatively simple to integrate with a React Native app, potentially through an API built in Go? It won't be handling too much data; it's just a small prototype.
But, the reason I may want to go with RDS is specifically to learn more about cloud-based database management, APIs, firewalls, network security, etc... Will I learn more about all of this working in AWS RDS over Supabase, and is knowing AWS useful for the industry?
  
Thank you for any help!
https://redd.it/1okfoz3
@r_devops
  
  Understanding Terraform usage (w/Gitlab CI/CD)
I'll preface by saying I work as an SDET who has been learning Terraform for the past couple of days. We are also moving our CI/CD pipeline to GitLab, with AWS as our provider (from Azure/Azure DevOps; don't worry about the whys, it was a business decision whether I agree with it or not).
With that said, when it comes to DevOps, GitLab, and AWS I have very little knowledge. I understand DevOps basics and have created gitlab-ci.yml files for automated testing, but I know very little about DevOps best practices, and AWS especially.
Terraform is what we are going to use to manage infrastructure. It took me a little while to understand how it should be used, but I want to make sure my plan makes sense at a base level. FWIW, our team used Pulumi before, but we are switching to Terraform to align with what everyone else here uses.
This is for a .NET/Blazor app (for now, as a demo), but most of the projects we are converting will be .NET based. For now we are hosting it on Elastic Beanstalk.
Anyway, here's how I have it set up and what I see as the pipeline (which works so far):
GitLab CI/CD (build/deploy) handles building the app and publishing it as a deploy-<version>.zip file.
The deploy job copies the .zip to the S3 bucket (via the aws-cli Docker image) as well as updating the Elastic Beanstalk environment.
The Terraform plan job runs every time and copies the tfplan to an artifact.
Terraform apply makes the changes based on the tfplan (but is a manual job).
The terraform.tfstate is stored in S3 (with DynamoDB locking) as the source of truth.
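For reference, the plan/apply split described above is roughly this shape in gitlab-ci.yml. This is a sketch, not a drop-in config: the job names are illustrative, it assumes the S3/DynamoDB backend is declared in the Terraform backend block, and AWS credentials come from CI variables.

```yaml
stages: [plan, apply]

terraform:plan:
  stage: plan
  image:
    name: hashicorp/terraform:1.9
    entrypoint: [""]   # the image's default entrypoint is `terraform`, which breaks CI scripts
  script:
    - terraform init -input=false
    - terraform plan -out=tfplan
  artifacts:
    paths: [tfplan]

terraform:apply:
  stage: apply
  image:
    name: hashicorp/terraform:1.9
    entrypoint: [""]
  script:
    - terraform init -input=false
    - terraform apply -input=false tfplan
  needs: ["terraform:plan"]
  when: manual   # apply stays a human decision, as described above
```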
So far this is working as a base level. but I still have a few questions in general:
Is there any reason Terraform should handle the app deploy (to Beanstalk) and copying deploy.zip to S3? I know it "can", but it sounds like it shouldn't (sort of a separation-of-concerns problem).
It seems like, once things are set up, terraform apply really shouldn't be running that often, right?
It seems like for first-time setup it makes more sense to create things manually in AWS and then import them into the state file. Others suggested writing the .tf resource files first, but that seems like a headache with all the configurations.
It seems like Terraform should mainly be used to keep resources consistent, without drift.
This is probably irrelevant, but much of the team is used to Azure DevOps pipeline.yml files and thinks it'll be easy to copy-paste. I told them that because of how GitLab works, a lot will need to be rewritten. Is this accurate?
I know other teams use Helm charts, but that's for K8s, right? For ECS, it's been said that ECS is faster/cheaper but Beanstalk is "simpler" for apps that don't need rapid scale-out.
Anyways sorry for the wall of text. I'm also open for hearing any advice too.
https://redd.it/1okfopx
@r_devops
  
  Is “EnvSecOps” a thing?
Been a while folks... long-time lurker — also engineer / architect / DevOps / whatever we’re calling ourselves this week.
I’ve racked physical servers, written plenty of code, automated all the things, and (like everyone else lately) built a few LLM agents on the side — because that’s the modern-day “todo app,” isn’t it? I’ve collected dotfiles, custom zsh prompts, fzf scripts, shell aliases, and eventually moved most of that mess into devcontainers.
They’ve become one of my favorite building blocks, and honestly they’re wildly undersold in the ops world. (Don’t get me started on Jupyter notebooks... squirrel!) They make a great foundation for standardized stacks and keep all those wriggly little ops scripts from sprawling into fifteen different versions across a team. Remember when Terraform wasn’t backwards compatible with state? Joy.
Recently I was brushing up for the AWS Security cert (which, honestly, barely scratches real-world security... SASL what? Sigstore who?), and during one of the practice tests something clicked out of nowhere. Something I’ve been trying to scratch for years suddenly felt reachable.
I don’t want zero trust — I want zero drift. From laptop to prod.
Everything we do depends on where it runs. Same tooling, same policies, same runtime assumptions. If your laptop can deploy to prod, that laptop is prod.
So I’m here asking for guidance or abuse... actually both, from the infinite wisdom of the r/devops trenches. I’m calling it “EnvSecOps.” Change my mind.
But in all seriousness, I can’t unsee it now. We scan containers, lock down pipelines, version our infrastructure... but the developer environment itself is still treated like a disposable snowflake. Why? Why can’t the same container that’s used to develop a service also build it, deploy it, run it, and support it in production? Wouldn’t that also make a perfect sandbox for automation or agents — without giving them free rein over your laptop or prod?
Feels like we’ve got all the tooling in the world, just nothing tying it all together. But I think we actually can. A few hashes here, a little provenance there, a sprinkle of attestations… some layered, composable, declarative, and verified tooling. Now I’ve got a verified, maybe even signed environment.
No signature? No soup for you.
(No creds, either.)
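That "no signature, no creds" gate is small in spirit. A purely illustrative sketch — the image name, the idea of pinned digests, and the credential-broker framing are all assumptions, and a real version would verify a Sigstore/cosign signature rather than a bare hash:

```python
import hashlib
import json

def env_digest(devcontainer: dict) -> str:
    """Canonical SHA-256 over a devcontainer definition.

    Sorting keys makes the hash stable across formatting changes,
    so only *meaningful* drift changes the digest.
    """
    canonical = json.dumps(devcontainer, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def allow_creds(devcontainer: dict, pinned_digests: set[str]) -> bool:
    """Gate: hand out credentials only to environments matching a pinned digest."""
    return env_digest(devcontainer) in pinned_digests

# A toy "approved" environment definition (hypothetical image).
approved = {"image": "ghcr.io/acme/devbox@sha256:abc123", "features": {}}
pinned = {env_digest(approved)}

# Same env, but pulled by floating tag instead of digest: that's drift.
drifted = dict(approved, image="ghcr.io/acme/devbox:latest")

assert allow_creds(approved, pinned)      # pinned env: soup for you
assert not allow_creds(drifted, pinned)   # drifted env: no soup, no creds
```

Swap the hash check for a cosign/attestation verification and the same shape works from laptop to prod.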
Yes, I know it’s not that simple. But all elegant solutions seem simple in hindsight.
Lots of thoughts here. Rein me in. Roast me. Work with me. But I feel naked and exposed now that I’ve seen the light.
And yeah, I ran this past GPT.
It agreed a little too quickly — which makes me even more suspicious. But it fixed all my punctuation and typos, so here we are.
Am I off, or did I just invent the next buzzword we’re all gonna hate?
https://redd.it/1okm1ih
@r_devops
  
  "terraform template" similar to "helm template"
I use helm template to pre-render all my manifests, and it works beautifully for PR reviews.
I wish there were a similar tool for Terraform modules, so that I could run something like terraform template and have it output the raw HCL resources, instead of the one-line git diff that could potentially touch hundreds of resources during terraform plan.
I tried building it myself, but my skills aren't enough for the task.
Does anyone else think this would be a great idea?
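There isn't a terraform template command today, but `terraform plan -out=tf.plan` followed by `terraform show -json tf.plan` gets close: it emits the fully resolved plan as JSON, which a small script can render into a reviewable summary for PRs. A sketch — the plan document below is a handmade stub in the real `terraform show -json` schema, not output from an actual run:

```python
import json

def summarize_plan(plan_json: str) -> list[str]:
    """Return 'ACTION address' lines from a `terraform show -json` plan."""
    plan = json.loads(plan_json)
    lines = []
    for rc in plan.get("resource_changes", []):
        actions = "/".join(rc["change"]["actions"])
        if actions != "no-op":  # skip resources the plan wouldn't touch
            lines.append(f"{actions} {rc['address']}")
    return lines

# Minimal fake plan document, shaped like the real JSON output format.
sample = json.dumps({
    "resource_changes": [
        {"address": "aws_s3_bucket.app", "change": {"actions": ["create"]}},
        {"address": "aws_iam_role.app", "change": {"actions": ["no-op"]}},
    ]
})
print(summarize_plan(sample))  # ['create aws_s3_bucket.app']
```

Posting that summary as a PR comment from CI gives you most of the "helm template for Terraform" workflow, even though it renders a plan rather than raw HCL.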
https://redd.it/1okn28g
@r_devops
  
  I use
helm template to pre-render all my manifests, and it works beautifully for PR reviews.I wish there were a similar tool for Terraform modules so that I could run like
terraform template, and it would output the raw HCL resources instead of the one-line git diff that could potentially trigger hundreds of resources during terraform plan.I tried building it myself, but my skills aren't enough for the task.
Does anyone else think this would be a great idea?
https://redd.it/1okn28g
@r_devops
Reddit
  
  From the devops community on Reddit
  Explore this post and more from the devops community
  DoubleClickjacking: Modern UI Redressing Attacks Explained
https://instatunnel.my/blog/doubleclickjacking-modern-ui-redressing-attacks-explained
https://redd.it/1okleal
@r_devops
  
InstaTunnel
  
  Clickjacking 2.0: UI Redressing Attacks in SPAs (2025)
  Discover advanced clickjacking techniques in 2025: DoubleClickjacking bypasses all defenses, drag-and-drop file theft, and SPA vulnerabilities. Learn protection
  What’s that one cloud mistake that still haunts your budget? [Halloween spl]
A while back, I asked the Reddit community to share some of their worst cloud cost horror stories, and you guys did not disappoint.
For Halloween, I thought I’d bring back a few of the most haunting ones:
* There was one where a DDoS attack quietly racked up $450K in egress charges overnight.
* Another where a BigQuery script was left running in dev on a Friday night, and by Saturday morning €1M was gone.
* And one where a Lambda retry loop spiraled out of control, turning $0.12/day into $400/day before anyone noticed.
The scary part, obviously, is that these aren’t rare at all. They happen all the time, hidden behind dashboards, forgotten tags, or that one “testing” account nobody checks.
Check out the full list here: [https://amnic.com/blogs/cloud-cost-horror-stories](https://amnic.com/blogs/cloud-cost-horror-stories)
And if you’ve got a story of your own, drop it below. I’m so gonna make a part 2 of these!!
https://redd.it/1okps2j
@r_devops
  
Amnic
  
  24 Cloud Cost Horror Stories Redditors Shared That’ll Keep You Up at Night | Amnic
  Explore 24 real cloud cost horror stories shared by Redditors that includes runaway autoscaling, forgotten logs that racked up six-figure bills. A must-read for anyone managing cloud costs.
  GlueKube: Kubernetes integration test with ansible and molecule
https://medium.com/@GlueOps/gluekube-kubernetes-integration-test-with-molecule-f88da7c41a34
https://redd.it/1okp4bi
@r_devops
  
Medium
  
  GlueKube: Kubernetes integration test with molecule
  At GlueOps, we have been working on an internal tool to deploy and manage Kubernetes clusters across cloud providers and datacenters…
  