Manage Temporary Access to Kafka Resources

This article introduces kPow for Apache Kafka®’s new Temporary Policies feature.

Introducing Temporary Policies

Version 79 of the kPow Kafka Management and Monitoring toolkit adds the ability to stage mutations and create temporary Role-Based Access Control policies (temporary policies), along with a suite of new admin features that give Admin Users greater control over kPow.

This blog post introduces temporary policies through the lens of a common real-world scenario.

Temporary policies allow Admins to assign access control policies for a fixed duration. A common use case is giving a user TOPIC_INSPECT access to read data from a topic for an hour while resolving an issue in a Production environment.
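To make the idea of a fixed-duration policy concrete, here is a minimal Python sketch. It is an illustration only, not kPow's implementation: the TemporaryPolicy class, its fields, the is_active method, and the on_call role name are all hypothetical and chosen purely for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TemporaryPolicy:
    """Hypothetical model of a time-boxed access policy (not kPow's API)."""
    role: str
    action: str
    resource: str
    created_at: datetime
    duration: timedelta

    @property
    def expires_at(self) -> datetime:
        return self.created_at + self.duration

    def is_active(self, now: datetime) -> bool:
        # The policy applies from creation until (but not including) expiry.
        return self.created_at <= now < self.expires_at


created = datetime(2021, 1, 1, 9, 0, tzinfo=timezone.utc)
policy = TemporaryPolicy(
    role="on_call",            # hypothetical role name
    action="TOPIC_INSPECT",
    resource="*",
    created_at=created,
    duration=timedelta(hours=1),
)

print(policy.is_active(created + timedelta(minutes=30)))  # True
print(policy.is_active(created + timedelta(hours=2)))     # False
```

Once the hour has passed, the policy simply stops matching, so nobody has to remember to revoke it.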

Scenario

You wake up one morning to a dreaded sight: a poison message has taken down one of your services.  

Your team decides the simplest solution is to skip the message by incrementing your consumer group’s offset for the topic.

Now here’s the problem. Access to production is limited, and for such a simple action (incrementing the offset), a team member generally must jump through the hoops of configuring the VPN, connecting to the jumpbox, and making sure they execute the right combination of bash commands against the Kafka cluster.

Often these operations are unnecessarily time-consuming, brittle, and frustrating at a time-critical moment when you need to restore production. Furthermore, the jumpbox generally has full access to the Kafka cluster, and there is no audit log recording the actions taken.

In combination with kPow’s existing Role-Based Access Controls and powerful mutation actions, Temporary Policies improve this experience by giving teams the tools they need to easily effect change in a secured environment, like production, when things go wrong.

Configuring Role-Based Access Control

In this example, two roles are coming from our Identity provider: devs and owners.

We will assign anyone with the role owners admin access, and give them GROUP_EDIT access to the production cluster.

The devs role will be implicitly denied from undertaking any action against the cluster, but is authorized for read-only access to view the production cluster in kPow.

Our example RBAC YAML file might look something like:

admin_roles:
  - "owners"

authorized_roles:
  - "owners"
  - "devs"

policies:
  -
    actions:
      - GROUP_EDIT
    effect: Allow
    resource:
      - "*"
    role: "owners"

This configuration prevents regular developers from making changes against the production cluster.
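The implicit-deny behaviour of this configuration can be sketched in a few lines of Python. This is a toy model rather than kPow's actual evaluation logic: the Policy class and is_allowed function are hypothetical, and the rule that a matching Deny overrides an Allow is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical in-memory form of one entry under `policies:`."""
    role: str
    actions: frozenset
    effect: str    # "Allow" or "Deny"
    resource: str  # "*" matches every resource in this sketch


# Mirrors the example configuration above.
AUTHORIZED_ROLES = {"owners", "devs"}
POLICIES = [
    Policy(role="owners", actions=frozenset({"GROUP_EDIT"}),
           effect="Allow", resource="*"),
]


def is_allowed(roles, action, resource, policies=POLICIES):
    """Return True only if an Allow policy matches and no Deny policy does."""
    if not AUTHORIZED_ROLES.intersection(roles):
        return False  # the user may not use the tool at all
    matching = [
        p for p in policies
        if p.role in roles
        and action in p.actions
        and p.resource in ("*", resource)
    ]
    # Assumption: an explicit Deny wins over any Allow.
    if any(p.effect == "Deny" for p in matching):
        return False
    # With no matching Allow, the action is implicitly denied.
    return any(p.effect == "Allow" for p in matching)


print(is_allowed({"owners"}, "GROUP_EDIT", "trade_b2"))  # True
print(is_allowed({"devs"}, "GROUP_EDIT", "trade_b2"))    # False
```

The devs role appears in no policy, so every action it attempts falls through to the implicit deny.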

The Poison Pill

Today is the day when your team has to fix the consumer group on the production cluster.

Everyone has been briefed on the plan, and it has been decided that the team lead will temporarily grant the devs role Allow access for GROUP_EDIT. This will enable one of the developers on the team to make the required change to the production cluster.

This has been done through the Temporary Policies section of kPow’s settings UI:

Temporary policies UI in kPow

Once a temporary policy has been created, all team members will be notified via Slack:

Slack notifications

Incrementing the offset

A team member has been tasked with the job of incrementing the offset of the consumer group for the problematic topic.

The developer looks to the application logs and notices that it is partition 3 of topic tx_trade1 that contains the poison message.

The erroring consumer group is named trade_b2.

The developer then opens kPow, navigates to the “Workflows” tab, and selects the consumer group.

From within the consumer group view, the dev clicks on the partition and selects “Skip Offset”.

This action will schedule the mutation, and once someone on the team scales down the trade_b2 service, the offset will be incremented.
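The scheduling behaviour described here can be pictured with a small Python sketch. It is a toy model under stated assumptions, not kPow's implementation: GroupState, StagedSkip, and apply_if_ready are hypothetical names, and the starting offset of 41 is made up for illustration. The model reflects the Kafka constraint that a consumer group's committed offsets can only be changed externally while the group has no active members.

```python
from dataclasses import dataclass, field


@dataclass
class GroupState:
    """Toy stand-in for a consumer group's state (hypothetical)."""
    active_members: int
    committed: dict = field(default_factory=dict)  # (topic, partition) -> offset


@dataclass
class StagedSkip:
    """A staged 'skip offset' mutation waiting for the group to go idle."""
    topic: str
    partition: int
    applied: bool = False

    def apply_if_ready(self, group: GroupState) -> bool:
        # Apply at most once, and only when no consumers are running.
        if self.applied or group.active_members > 0:
            return False
        key = (self.topic, self.partition)
        group.committed[key] += 1  # skip exactly one message
        self.applied = True
        return True


# Offset 41 is an invented value for the poison message's position.
group = GroupState(active_members=2, committed={("tx_trade1", 3): 41})
skip = StagedSkip(topic="tx_trade1", partition=3)

print(skip.apply_if_ready(group))         # False: service still running
group.active_members = 0                  # trade_b2 is scaled down
print(skip.apply_if_ready(group))         # True: mutation applied
print(group.committed[("tx_trade1", 3)])  # 42
```

Staging the change first means the developer can prepare it in advance, and the offset moves as soon as the service is scaled down.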

Skipping group offsets in kPow

Post-Mortem

kPow also provides valuable information and insights for teams completing a post-mortem after a production incident.

kPow has an Audit Log for Data Governance, and all the actions undertaken to resolve any production incident are persisted in kPow’s audit log topic. This means you can use the Audit Log to see the recorded history of all actions taken to restore the production service.

kPow's audit log

Inspecting the audit log message reveals the offset that was skipped.

Audit log message

You can use kPow’s data inspect functionality to view the poison message to help investigate why that message took down the consumer group.

kPow's data inspect functionality

You can find further information on setting up, viewing and managing temporary policies here
