
Ways to Steer Network Traffic to a Proxy

The idea of traffic steering is simple. Rather than sending traffic directly to its destination, we want to first send it to a proxy that can inspect and modify it.

Brief Intro

When we started Aembit not so long ago, we were still figuring out what Aembit was about and what we needed to build. By the way, we are still doing that, but now with more certainty.

Something that became apparent quite early was that we needed a Layer 7 (Application level) proxy as part of our solution. There are many fascinating details on building a multi-protocol L7 proxy, but I will leave them for another article.

In this article, I want to concentrate on my research back then. I was trying to figure out how to send (“steer”) network traffic toward our future proxy.

Terminology


There are only two hard things in Computer Science: cache invalidation and naming things.

– Phil Karlton

I stumbled on several terms for redirecting traffic somewhere other than its original destination. The ones I hear most are traffic steering, intercepting traffic, traffic hijacking, redirecting traffic, transparent proxying, and probably a few others (which I can’t remember).

Steering

The basic idea of steering is simple. Traffic, by default, goes to some destination, but we (using some machinery) force it to go to a proxy where we can inspect, modify and forward it.

I tried to identify various ways to steer traffic. I wanted to see what methods were available out “in the wild,” how they worked, and how they compared on these questions:

  • How many other organizations or teams adopted this method?
  • Is the method transparent (apps don’t need to be aware of it) or explicit (apps do need to be aware)?
  • What permissions are required?
  • How complex is it to use and support it?
  • How selective could you be about what traffic you would steer?

There were a couple of guardrails on my research. First, I was looking for something that works in Kubernetes-land but could preferably apply beyond it, for example, in virtual machine (VM) scenarios. Second, I understand networking since I spent several years in this field, but I am not an expert, which most likely ruled out any esoteric options.

Steering methods

(Linux) Networking is a vast and incredibly complex ecosystem. I will go through the list of methods I discovered, but I am sure there are many other ways I didn’t identify. However, as I researched this topic and added options to the list, I quickly reached the point of diminishing returns.

IPTables

I think you have to talk about it first: IPTables is the king of packet filtering and NAT. Its most familiar feature is firewall functionality, but it has many others, including network address translation (NAT). The beauty of NAT is that you can modify the destination address of IP packets, steering them to the destination you want.

I won’t try to cover everything IPTables does or give an intro. That could easily take 100 pages, so if you want to dive into it, google it, and you’ll find lots of material. However, I will share some notes on IPTables that are relevant to steering.

How can you use it for traffic steering?

In short, IPTables lets you manipulate IP packets, changing their destination IP address and port from the original values to your proxy’s IP address and port, so the traffic is delivered to the proxy instead.

Permissions

You will need the Linux capabilities CAP_NET_ADMIN and CAP_NET_RAW to configure IPTables. That is the minimum set; if you have root privileges, you already have both.

Note that in the container world (for example, in a Kubernetes securityContext), they are written without the prefix: NET_ADMIN and NET_RAW.
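A quick way to check whether a process (for example, an init container) actually holds these capabilities is to read the CapEff bitmask from /proc/self/status. Below is a minimal Python sketch; the bit numbers come from linux/capability.h, and the helper names are illustrative, not from any library.

```python
# Capability bit numbers from <linux/capability.h>.
CAP_NET_ADMIN = 12
CAP_NET_RAW = 13


def effective_caps(status_text: str) -> int:
    """Return the effective capability bitmask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)  # the mask is printed in hex
    raise ValueError("no CapEff line found")


def has_caps(mask: int, *caps: int) -> bool:
    """True only if every capability in `caps` is set in `mask`."""
    return all(mask & (1 << cap) for cap in caps)


def can_configure_iptables(status_path: str = "/proc/self/status") -> bool:
    """Check the current process for the capabilities IPTables needs."""
    with open(status_path) as f:
        return has_caps(effective_caps(f.read()), CAP_NET_ADMIN, CAP_NET_RAW)
```

Inside a container started without NET_ADMIN, can_configure_iptables() should return False, which is a cheap way to fail early with a clear message.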

Who uses it?

The list could be incredibly long. Considering that some parts of our product were similar to service-mesh products, I looked at who uses IPTables across this segment and found Istio, Consul, Kuma, and Linkerd.

Other info

This method is transparent, so the client application doesn’t need to be aware of steering. Understanding IPTables may take a bit of time. However, the machinery for the steering is reasonably straightforward, and you may end up just doing 3–4 calls to set up IPTables in a specific way.
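To make those few calls concrete, here is a minimal Python sketch that builds iptables commands redirecting outbound IPv4 TCP traffic for selected ports to a local proxy. It only runs them if you call apply_rules(). The chain name, proxy port, and proxy UID are hypothetical placeholders. Excluding the proxy’s own UID keeps the proxy’s upstream connections from being redirected back into itself; service meshes such as Istio use a similar owner-match exclusion.

```python
import subprocess

CHAIN = "PROXY_REDIRECT"  # hypothetical chain name


def redirect_rules(proxy_port: int, dest_ports: list[int], proxy_uid: int) -> list[list[str]]:
    """Build iptables commands that redirect outbound IPv4 TCP traffic to a local proxy."""
    rules = [
        # A dedicated chain whose only rule rewrites the destination to the proxy port.
        ["iptables", "-t", "nat", "-N", CHAIN],
        ["iptables", "-t", "nat", "-A", CHAIN, "-p", "tcp",
         "-j", "REDIRECT", "--to-ports", str(proxy_port)],
    ]
    for port in dest_ports:
        # Send locally generated traffic for this port into the chain, except
        # packets from the proxy's own user, which would otherwise loop back.
        rules.append(["iptables", "-t", "nat", "-A", "OUTPUT", "-p", "tcp",
                      "--dport", str(port),
                      "-m", "owner", "!", "--uid-owner", str(proxy_uid),
                      "-j", CHAIN])
    return rules


def apply_rules(rules: list[list[str]]) -> None:
    """Run the commands in order; requires CAP_NET_ADMIN and CAP_NET_RAW."""
    for argv in rules:
        subprocess.run(argv, check=True)
```

For example, apply_rules(redirect_rules(15001, [80, 443], 1337)) would steer HTTP and HTTPS connections to a proxy listening on port 15001 and running as UID 1337 (both values are placeholders).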

There is a newer and cleaner implementation of the same functionality called NFTables. And when I say “newer,” I mean it’s only eight years old. However, people have been slow to migrate from IPTables to NFTables. Interestingly, the iptables tool you use may already be running on NFTables under the hood (the iptables-nft variant; again, that’s outside this article’s scope).

One last bit of helpful information: one of the super cool features of IPTables is that when it redirects packets, your proxy can still get the original destination by reading the SO_ORIGINAL_DST option on the accepted socket.
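A minimal Python sketch of reading SO_ORIGINAL_DST in the proxy, assuming IPv4 and a connection that was redirected by an IPTables NAT rule. Python’s socket module doesn’t export the constant, so the value 80 from linux/netfilter_ipv4.h is hard-coded; the helper names are illustrative.

```python
import socket
import struct

# From <linux/netfilter_ipv4.h>; Python's socket module does not export it.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw: bytes) -> tuple[str, int]:
    """Decode a 16-byte struct sockaddr_in into (ip, port)."""
    # Layout: 2-byte family (skipped), 2-byte port and 4-byte address in
    # network byte order, then 8 bytes of padding (ignored).
    port, addr = struct.unpack_from("!2xH4s", raw)
    return socket.inet_ntoa(addr), port


def original_destination(conn: socket.socket) -> tuple[str, int]:
    """Return the pre-NAT (ip, port) of an IPv4 connection redirected by IPTables."""
    # IPPROTO_IP is the same level as the C constant SOL_IP on Linux.
    raw = conn.getsockopt(socket.IPPROTO_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)
```

In the proxy’s accept loop, original_destination(conn) tells you where the client was actually trying to connect, so the proxy knows where to forward the traffic.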

IPTables also lets you be quite selective about the traffic you steer toward your proxy, for example, by protocol, destination port, or the user ID of the process that sent it.

TUN/TAP

TUN/TAP is another heavy hitter in this area. Think about VPNs: most of them are based on TUN/TAP kernel virtual devices.

How can you use it for traffic steering?

TUN/TAP lets your application read traffic from a virtual network device: Layer 3 (IP packets) with TUN, or Layer 2 (Ethernet frames) with TAP. Your application can then do whatever it wants with this traffic: look inside it, NAT it, encrypt it, send it anywhere, and so on.

Let me concentrate on TUN since I know a bit more about it. A TUN device has its own network, and your application receives only the traffic destined for that network. As a result, creating a TUN device alone won’t get you all traffic; you have to direct traffic toward that network somehow. There are a couple of standard approaches: for example, you can set the default gateway to an address on the TUN network, or fiddle with DNS so that hosts resolve to addresses on that network. Either approach directs the traffic toward TUN, where your application can access it.

As a result, when a client application sends traffic, it is routed to the default gateway, which is on the TUN network. Your application reads it from the TUN virtual device, and now you can do whatever you want with it (for example, put it through a proxy).
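For a sense of the moving parts, here is a minimal Python sketch of opening a TUN device on Linux. The ioctl number and flags come from linux/if_tun.h; the device name tun0 is an arbitrary example. After opening it, you would still need to give the interface an address, bring it up, and route traffic to it (for example, with ip addr add, ip link set ... up, and ip route) before os.read(fd, 65535) starts returning packets.

```python
import fcntl
import os
import struct

# Values from <linux/if_tun.h>.
TUNSETIFF = 0x400454CA
IFF_TUN = 0x0001
IFF_NO_PI = 0x1000  # don't prefix packets with the 4-byte packet-info header
IFNAMSIZ = 16


def make_ifreq(name: str, flags: int) -> bytes:
    """Pack a struct ifreq (40 bytes, the 64-bit Linux size): name, then ifr_flags."""
    encoded = name.encode()
    if len(encoded) >= IFNAMSIZ:
        raise ValueError("interface name too long")
    return struct.pack("16sH22x", encoded, flags)


def open_tun(name: str = "tun0") -> int:
    """Create (or attach to) a TUN device and return its file descriptor.

    Requires CAP_NET_ADMIN. Each os.read() on the descriptor returns one IP packet.
    """
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, make_ifreq(name, IFF_TUN | IFF_NO_PI))
    except OSError:
        os.close(fd)  # don't leak the descriptor if the ioctl is refused
        raise
    return fd
```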

The main complication is that your app sees the traffic at L3 or L2: with TUN (L3), the data includes the IP and TCP/UDP headers; with TAP (L2), you get whole Ethernet frames.
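Here is what handling L3 data looks like in practice: a small Python sketch that pulls addresses and ports out of a raw IPv4 packet, as read from a TUN device opened with the IFF_NO_PI flag (without it, each packet is prefixed with a 4-byte packet-info header). It skips IPv6 and reads only the first 4 bytes of the TCP/UDP header; the FlowInfo type is illustrative.

```python
import socket
import struct
from typing import NamedTuple, Optional


class FlowInfo(NamedTuple):
    src: str
    dst: str
    protocol: int
    src_port: Optional[int]
    dst_port: Optional[int]


def parse_ipv4(packet: bytes) -> FlowInfo:
    """Extract addresses and ports from a raw IPv4 packet (e.g. read from TUN)."""
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4  # IHL counts 32-bit words, options included
    protocol = packet[9]
    src = socket.inet_ntoa(packet[12:16])
    dst = socket.inet_ntoa(packet[16:20])
    fragment_offset = struct.unpack_from("!H", packet, 6)[0] & 0x1FFF
    src_port = dst_port = None
    # Only the first fragment carries the transport header; TCP and UDP
    # headers both begin with 16-bit source and destination ports.
    if protocol in (socket.IPPROTO_TCP, socket.IPPROTO_UDP) and fragment_offset == 0:
        src_port, dst_port = struct.unpack_from("!HH", packet, header_len)
    return FlowInfo(src, dst, protocol, src_port, dst_port)
```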

Permissions

You will need the Linux capability CAP_NET_ADMIN (as with IPTables).

Other info

As I mentioned above, many end-user VPNs (and enterprise zero-trust products) are TUN based (to name a few: OpenVPN, WireGuard, Netskope Private Access). Frankly, I am too unfamiliar with TAP to give you good examples.

Both TUN and TAP are well documented. However, in both cases, your application has to receive the traffic and handle it at Layer 3 or Layer 2 to do whatever manipulation you want. A bunch of open-source projects help with that, but the amount of code and knowledge needed to use them is non-trivial.

How you route traffic to TUN depends on your approach. With the default gateway, however, you cannot be selective: all traffic goes through your app.

eBPF

eBPF is a newcomer to the traffic steering game. On the one hand, it was introduced quite a long time ago (in 2015). On the other hand, it still feels new compared to IPTables and TUN/TAP, which have been around for more than 20 years.

In short, eBPF lets you execute your own code at kernel hooks, including many socket-related ones that let you inspect, modify, or redirect traffic. That said, eBPF goes beyond networking; you can run your code on many hooks unrelated to it.

How can you use it for traffic steering?

You will need an eBPF application that does some equivalent of what IPTables does: rewrite IP and TCP headers to redirect packets toward your proxy.
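To make “rewrite IP and TCP headers” concrete, here is what such a rewrite involves, written as plain userspace Python rather than as an eBPF program. A real eBPF version would be C compiled to BPF and would typically rely on kernel helpers such as bpf_l3_csum_replace and bpf_l4_csum_replace. The sketch handles only IPv4/TCP, and the target address is a placeholder. The key point is that changing the destination invalidates two checksums: the IP header checksum and the TCP checksum, whose pseudo-header includes the IP addresses.

```python
import socket
import struct


def checksum16(data: bytes) -> int:
    """RFC 1071 Internet checksum (one's-complement sum of 16-bit words)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def rewrite_destination(packet: bytes, new_ip: str, new_port: int) -> bytes:
    """Return a copy of an IPv4/TCP packet with its destination rewritten."""
    pkt = bytearray(packet)
    if pkt[0] >> 4 != 4 or pkt[9] != socket.IPPROTO_TCP:
        raise ValueError("only IPv4/TCP is handled in this sketch")
    ihl = (pkt[0] & 0x0F) * 4
    total_len = struct.unpack_from("!H", pkt, 2)[0]
    # 1. Destination address (IP header bytes 16-19) and port (TCP bytes 2-3).
    pkt[16:20] = socket.inet_aton(new_ip)
    struct.pack_into("!H", pkt, ihl + 2, new_port)
    # 2. IP header checksum, computed with the checksum field zeroed.
    struct.pack_into("!H", pkt, 10, 0)
    struct.pack_into("!H", pkt, 10, checksum16(bytes(pkt[:ihl])))
    # 3. TCP checksum over a pseudo-header (src IP, dst IP, zero, protocol,
    #    TCP length) followed by the whole TCP segment.
    tcp_len = total_len - ihl
    struct.pack_into("!H", pkt, ihl + 16, 0)
    pseudo = bytes(pkt[12:20]) + struct.pack("!BBH", 0, socket.IPPROTO_TCP, tcp_len)
    struct.pack_into("!H", pkt, ihl + 16,
                     checksum16(pseudo + bytes(pkt[ihl:total_len])))
    return bytes(pkt)
```

A correct checksum makes the checksum of the covered bytes (including the checksum field itself) come out to zero, which is an easy way to verify the result.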

Permissions

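Unless unprivileged eBPF is allowed (per my understanding, a terrible idea), your process will need root privileges to load eBPF programs (on kernels 5.8 and later, a combination of capabilities such as CAP_BPF and CAP_NET_ADMIN can stand in for full root).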

Other info

There are quite a lot of newer companies and projects (Cilium, Calico, Weave) that are leveraging eBPF for networking.

Unfortunately, I have only read about eBPF rather than experimented with it, so my knowledge in this area is limited.

I saw some quite big debates on the limits of eBPF, for example, on the pros and cons of applying it to service mesh. Based on what I read, using eBPF for simple steering could be severe overkill, considering that simpler technologies already solve that problem. On the other hand, eBPF is limited enough that it may not work for your use case; for example, you probably can’t implement a full-blown L7 proxy with it. So its sweet spot is somewhere in between: when you want to do something more complex than NAT’ing but less complex than an L7 proxy.

I believe eBPF shines in performance. Because your code runs inside the kernel, it avoids jumps between kernel and user space. So, if you need to squeeze out the last bit of performance, it may be useful.

TProxy + IPTables

TProxy with IPTables is a permutation of the IPTables approach. In summary, you mark the packets so the kernel treats them as local, and your proxy application creates a socket with a special option that receives all such packets, even though their destination IP and port don’t belong to that socket. It’s a bit complicated for my taste; it took me several tries to understand what was going on. Here is a good overview if you’re interested.

Permissions

You will need the Linux capability CAP_NET_ADMIN (as with IPTables).

Other info

Per my understanding, the beauty of it is that it doesn’t need to rewrite IP and TCP headers. As a result, it may be faster than other IPTables-based approaches.

I read that HAProxy and NGINX both support this option. However, I haven’t found references showing that the newer wave of networking software uses it, and I am not sure why. Most references I stumbled upon were people geeking out on this topic rather than using the approach in production. Again, though, this is based on relatively lightweight research.

The socket option I mentioned is IP_TRANSPARENT. I included these bits of info as breadcrumbs: googling these key flags is a great way to find serious references or examples.

Other approaches

It makes sense to mention several other approaches, but fewer products rely on these.

DNS

On the one hand, if you control DNS resolution, you can direct the application to send traffic toward your proxy rather than the original destination. On the other hand, it comes with caveats: DNS can steer traffic toward an IP address but can’t do anything about the port, so your proxy must listen on whatever port the application already uses.

Usually, DNS by itself isn’t a suitable steering method. However, it’s incredible how often it pops up as a helper to other steering methods to achieve additional benefits.
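For static cases, the simplest form of DNS-based steering is a hosts-file entry (Kubernetes offers the same thing per pod through hostAliases). A minimal Python sketch that renders such entries; the hostnames and proxy IP are placeholders.

```python
import ipaddress


def hosts_entries(proxy_ip: str, hostnames: list[str]) -> str:
    """Render /etc/hosts lines that make each hostname resolve to the proxy.

    Only the address changes: a client that connects to api.example.com:443
    will reach proxy_ip:443, so the proxy must listen on every port clients use.
    """
    ipaddress.ip_address(proxy_ip)  # raises ValueError if the address is malformed
    return "".join(f"{proxy_ip}\t{name}\n" for name in hostnames)
```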

http_proxy, https_proxy, ftp_proxy environment variables

Several environment variables are almost standard and are supported by numerous HTTP libraries, and therefore also by products that depend on these libraries.

The nice part about using them is that they’re trivial to set and don’t require special permissions. However, the approach is pretty limited: it works only for HTTP/HTTPS/FTP, and only for applications that honor these variables.
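Because these variables are read by the client libraries themselves, steering with them is just a matter of launching the client with a modified environment. A minimal Python sketch; the proxy URL and no_proxy list are placeholders, and which spelling a given client honors varies by tool (curl, for example, accepts http_proxy only in lowercase).

```python
import os


def proxy_env(proxy_url: str, no_proxy: str = "localhost,127.0.0.1") -> dict[str, str]:
    """Return a copy of the current environment with proxy variables set.

    Tools differ on lower- vs uppercase names, so both spellings are set.
    """
    env = dict(os.environ)
    for name in ("http_proxy", "https_proxy", "no_proxy"):
        value = no_proxy if name == "no_proxy" else proxy_url
        env[name] = env[name.upper()] = value
    return env
```

For example, subprocess.run(["curl", "https://example.com"], env=proxy_env("http://127.0.0.1:8080")) would send curl’s request through a proxy on port 8080.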

Intercepting API calls

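There are ways to intercept API calls (including socket calls). One well-known method is LD_PRELOAD, which makes the dynamic linker load your shared library first, so your versions of functions such as connect() take precedence over the ones in libc. You can call this a predecessor of eBPF.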

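This method has tons of hair, including the security and risk implications of running your code inside a third-party process. It also does nothing for statically linked programs or programs that bypass libc (many Go binaries, for example).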

Changes to the code or configuration

Last but not least, there is a very low-tech approach to steering: you can (nicely) ask the client application to send you traffic by changing its configuration or code. This works for client applications under your control, where you can change configuration or code, but not in other cases.

Summary

As you can see, there are many different approaches to steering network traffic, each with its own attributes. There is no “perfect” way to solve this problem, and your requirements may lead you to one solution or another.

You will probably be a bit puzzled by our initial choice. We initially went with http_proxy and with changing the client application’s configuration. The main reason was that it was so technologically trivial: it worked immediately and didn’t require any additional time investment. As a result, it allowed us to concentrate all our effort on building the functionality we needed instead of investing time in developing a deeper understanding of, implementing, and packaging more complicated approaches.

That being said, once our proxy was on a decent footing, we switched gears to using IPTables for steering (and we will post another article later with more details about that).

***

Aembit is the Identity Platform that lets DevOps and Security manage, enforce, and audit access between federated workloads. To learn more or schedule a demo, visit our website.
