Senior Site Reliability Engineer

28k - 31k PLN/month, B2B
Senior · Full-time · B2B
#371594 · Added today
Source: justjoin.it

Tech Stack / Keywords

Grafana, Docker, Terraform, Kubernetes, Ansible, DevOps, Azure

Company and position

Hard Rock Digital is a team focused on becoming the best online sportsbook, casino, and social gaming company in the world. Rooted in the Seminole Tribe of Florida, it brings a globally recognized brand in gaming, entertainment, and hospitality to the digital space.

Requirements

Core SRE & Infrastructure:

  • Degree in Computer Science or related field, or equivalent experience.
  • 5+ years in SRE, DevOps, or similar roles managing large-scale production systems.
  • 3+ years managing production Kubernetes clusters with deep architecture, networking, storage, and security knowledge.
  • Experience with cluster autoscaling (Karpenter), upgrades, and multi-cluster management.
  • Proficiency with kubectl, Helm, Kubernetes operators, and container orchestration troubleshooting.
  • Advanced expertise with Grafana observability stack and PromQL; experience with Loki.
  • Hands-on experience managing Java applications including JVM tuning.
  • Cloud platform expertise (AWS preferred; GCP or Azure valued).
  • Familiarity with Infrastructure as Code tools like Terraform/Terragrunt or Ansible.
  • ArgoCD proficiency for GitOps and continuous deployment.
  • Strong scripting skills in Python, Bash, or Go; experience with CI/CD pipelines.
  • Proven on-call, incident response, and root cause analysis experience.

AI, Automation & Agentic Systems:

  • 1+ years practical experience building or operating AI/LLM-powered tools or workflows.
  • Ability to design agentic systems using tool calling, RAG, or multi-step reasoning.
  • Experience integrating LLM APIs (Anthropic Claude, OpenAI, or open-source).
  • Familiarity with agentic orchestration frameworks (LangChain, LangGraph, CrewAI, n8n, Temporal).
  • Understanding of prompt engineering best practices.
  • Familiarity with AI-assisted coding tools and MCP servers.
  • Awareness of AI safety, hallucination mitigation, and human-in-the-loop design.

Preferred / Bonus:

  • Experience with vector databases for RAG knowledge retrieval.
  • Experience with LLM evaluation frameworks for monitoring agent quality.
  • Contributions to open-source AI/ML or SRE tooling.
  • Background in data engineering or ML pipelines.

Soft Skills:

  • Strong communication skills, including the ability to explain complex AI and infrastructure concepts clearly.
  • Proactive problem-solving with automation focus.
  • Ability to mentor junior team members.
  • Positive attitude and openness to feedback.

Responsibilities

Application Reliability & Performance:

  • Ensure availability, reliability, and performance of high-traffic Java-based applications.
  • Troubleshoot and resolve complex issues in production and non-production environments.
  • Participate in performance testing and monitoring.
  • Optimize Java application performance focusing on JVM tuning and scaling.

Monitoring, Observability & AIOps:

  • Deploy and manage Grafana stack for monitoring, logging, and alerting.
  • Implement observability strategies to enhance system health visibility.
  • Create and maintain dashboards, alerts, and log queries.
  • Integrate AI/ML models for anomaly detection and alert correlation.

AI & Agentic Workflow Engineering:

  • Design and operate AI workflows automating alert triage, root cause analysis, and incident summarization.
  • Develop LLM agents interacting with infrastructure APIs for autonomous or human-in-the-loop actions.
  • Build and maintain MCP servers exposing internal systems to AI agents.
  • Evaluate and operationalize LLM frameworks for production agentic systems.
  • Implement guardrails and feedback loops for AI agent accuracy and safety.
  • Promote AI-assisted development and operations practices.

Incident Management & Root Cause Analysis:

  • Support incident response, conduct post-mortems, and identify root causes.
  • Use AI tools to accelerate incident timelines and generate post-mortem drafts.
  • Document and share lessons learned.

Automation & Toil Reduction:

  • Identify repetitive workflows and engineer AI-augmented or automated replacements.
  • Build self-service tools and chatbot interfaces for system queries and procedures.
  • Measure and report toil reduction metrics.

Collaboration & Cross-functional Support:

  • Work with developers, architects, and data/ML engineers to improve reliability and AI capabilities.
  • Collaborate with DevOps and NOC teams.
  • Communicate SRE practices and AI capabilities to stakeholders.
  • Provide feedback on application performance and observability.

Other information

Please be informed that the data controller is Hard Rock Digital (hereinafter the "controller"). You have the right to request access to your personal data and its rectification, erasure, or restriction of processing, the right to object to processing, the right to data portability, and the right to lodge a complaint with the supervisory authority. Personal data will be processed for the purpose of the recruitment process. Provision of data to the extent required by the Act of 26 June 1974, the Labour Code, is mandatory; provision of any other data is voluntary. Refusal to provide mandatory data may make it impossible to carry out the recruitment process. The controller processes mandatory data on the basis of a legal obligation incumbent upon it, while the basis for processing additional data is consent. Personal data will be processed until the recruitment procedure is completed and for the period in which potential claims may be asserted and, where consent has been given to participate in future recruitment procedures, until that consent is withdrawn. Consent to the processing of personal data can be withdrawn at any time. The recipients of the data are the Just Join IT service and other entities to which we have entrusted the processing of data in connection with recruitment.
