How Selenium Actually Works

Steve Pryde
5 min read · Apr 25, 2022

Selenium is a suite of tools for automating web browsers and it has been around for a long time. Perhaps you have even used it yourself. But do you really understand how Selenium actually works under the hood?

Understanding how Selenium works will improve your Web Browser automation skills and give you greater insights into how you could improve test performance or better leverage your test automation system resources.

Selenium Basics

In order for Selenium to talk to a Web Browser it requires another application to be running in the background: the WebDriver server. This is a small HTTP server that talks directly to the Web Browser, and it is typically developed and maintained by the same organisation that develops the Web Browser itself (Google provides chromedriver, Mozilla provides geckodriver, and so on).

As long as the WebDriver server is running it can start a new Web Browser instance and automate it directly. The WebDriver server also provides a REST API that a Selenium client (your test application) can issue commands to, in order to manage a Web Browser “session”.

The version of the WebDriver server must match the version of the Web Browser it controls. When you download chromedriver for example, it will tell you which Chrome version(s) a particular version of chromedriver will support.

Minimal Setup: WebDriver + Browser

Something many people don’t realise is that you don’t actually need to download a Selenium server at all in order to do local development. You can just use the WebDriver server directly. For example, if you start geckodriver with no arguments, it listens on localhost at port 4444 by default. Just point your Selenium setup code at http://localhost:4444 and you can run your tests or scripts directly against Firefox.

So in the simplest scenario, you have your application, which uses a Selenium client library (or a compatible WebDriver client such as WebdriverIO). That library issues HTTP requests to the WebDriver server, which in turn controls a Web Browser session.
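To make the "it is just HTTP" point concrete, here is a minimal sketch that uses only the Python standard library instead of a Selenium client. It builds the W3C WebDriver "New Session" and "Delete Session" requests. The helper names (`new_session_request`, `delete_session_request`, `send`, `demo`) are made up for this illustration, and the base URL assumes geckodriver is running on its default port.

```python
import json
import urllib.request

# Assumption: geckodriver was started with no arguments (localhost:4444).
BASE_URL = "http://localhost:4444"


def new_session_request(base_url: str, browser_name: str) -> urllib.request.Request:
    """Build the W3C WebDriver 'New Session' request (POST /session)."""
    payload = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
    return urllib.request.Request(
        url=f"{base_url}/session",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def delete_session_request(base_url: str, session_id: str) -> urllib.request.Request:
    """Build the 'Delete Session' request (DELETE /session/{id})."""
    return urllib.request.Request(
        url=f"{base_url}/session/{session_id}", method="DELETE"
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request to the WebDriver server and decode its JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def demo() -> None:
    """Open and close a Firefox session. Needs a running geckodriver."""
    reply = send(new_session_request(BASE_URL, "firefox"))
    session_id = reply["value"]["sessionId"]
    print("started session", session_id)
    send(delete_session_request(BASE_URL, session_id))
```

Calling `demo()` with geckodriver running should open a Firefox window and close it again. A real client library does the same thing, plus error handling and a friendlier API.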

But there’s more. Since the WebDriver server is just a web server with an API, it doesn’t need to run on the same machine as your application. It will spawn a new Web Browser instance on its local machine but your application could run elsewhere, as long as it has network access to the WebDriver server. This is essentially how cloud services like BrowserStack and Sauce Labs work.

What Does Selenium Server Do? Do I Need It?

If you run geckodriver directly, it can only control a single Firefox session. If you attempt to start another Firefox session in your application, it will return an error saying that a session has already been started.

NOTE: chromedriver does allow multiple sessions on one chromedriver instance, so if you only need Chrome, you may be able to get away with using chromedriver directly.

If you wanted, you could start a second instance of geckodriver, running on, say, port 4445. And then in your application you could create another Firefox session pointing at that port, and so on.
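As a rough sketch of that manual approach, the snippet below starts several geckodriver processes on consecutive ports using geckodriver's `--port` option. The function names are invented for this example, and it assumes the `geckodriver` binary is on your PATH.

```python
import subprocess
from typing import Dict, List


def geckodriver_command(port: int) -> List[str]:
    """Command line for one geckodriver instance listening on `port`."""
    return ["geckodriver", "--port", str(port)]


def base_url(port: int) -> str:
    """URL a client would use to reach the instance on `port`."""
    return f"http://localhost:{port}"


def start_instances(first_port: int, count: int) -> Dict[str, subprocess.Popen]:
    """Start `count` geckodriver processes on consecutive ports.

    Returns a mapping of base URL -> process so the caller can stop them later.
    Assumes the geckodriver binary is on PATH.
    """
    processes = {}
    for port in range(first_port, first_port + count):
        processes[base_url(port)] = subprocess.Popen(geckodriver_command(port))
    return processes
```

Each Firefox session in your application would then be created against one of the returned URLs, and you would call `terminate()` on each process when you are done.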

Or, if you’re clever, you could create a proxy server that starts a new geckodriver instance for each session and then routes each session's requests to the geckodriver instance that owns it.

And that is precisely what Selenium Server does (with some additional features on top).

Selenium Server can do four primary things:

  1. Start and manage WebDriver server instances (it will choose a different port for each one)
  2. Forward requests for a WebDriver session to the appropriate WebDriver server instance
  3. Forward requests to another Selenium Server (which might reside on another machine)
  4. Provide information about current sessions and available Web Browsers
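The first two items boil down to a routing table. The sketch below is not how Selenium Server is implemented; it is a hypothetical `SessionRouter` class that only illustrates the idea of picking a port per WebDriver instance and mapping a session id in the request path to the instance that owns it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SessionRouter:
    """Maps WebDriver session ids to the driver instance that owns them."""

    next_port: int = 4445
    routes: Dict[str, str] = field(default_factory=dict)

    def allocate_driver_url(self) -> str:
        """Pick a fresh port for a new WebDriver server instance."""
        url = f"http://localhost:{self.next_port}"
        self.next_port += 1
        return url

    def register(self, session_id: str, driver_url: str) -> None:
        """Remember which driver instance created this session."""
        self.routes[session_id] = driver_url

    def resolve(self, path: str) -> Optional[str]:
        """Return the upstream URL for a path like /session/<id>/element.

        Returns None when the path carries no known session id.
        """
        parts = path.strip("/").split("/")
        if len(parts) < 2 or parts[0] != "session":
            return None
        driver_url = self.routes.get(parts[1])
        if driver_url is None:
            return None
        return driver_url + "/" + "/".join(parts)
```

A proxy built on this would start a driver at the allocated URL, forward the "new session" request to it, call `register()` with the session id from the response, and use `resolve()` for every later request.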

Selenium Server Variants

Selenium 3.x required you to specify whether a server was running as a hub or as a node.

  • A hub could accept incoming HTTP requests and forward requests to nodes but could not start its own WebDriver instances
  • A node could start WebDriver instances but could not forward to other nodes and could not accept HTTP requests directly
  • You could also run a “standalone” server which acted as both a hub and a node but could not forward requests to other nodes

In Selenium 4, all servers can act as either a hub or a node or both, which simplifies everything and makes it more composable and thus more powerful. A Selenium 4.x server can accept HTTP requests and either handle them directly or forward them to another Selenium 4.x server. Easy!

You can think of Selenium server as a proxy server that simply manages a pool of WebDriver server instances and forwards requests for a particular session to the corresponding WebDriver server that owns that session. Selenium server can also act as a proxy for other Selenium servers, which means it knows which WebDriver servers are provided by that other Selenium server, and can forward requests there if appropriate. This gives you far greater flexibility in managing load across your test machines as the number of WebDriver server instances increases.

Selenium server can also support multiple different Web Browser versions, each one with its own WebDriver server instance. It all comes down to the configuration you provide. You may need to specify the full path to each WebDriver server in the configuration, along with the Web Browser version it corresponds to.

What Happens When You Do A findElement() Call?

When your application uses a WebDriver client library (e.g. the Selenium bindings or WebdriverIO) to perform some command in the Web Browser (find an element, click an element, run JavaScript, etc.), the library will issue an HTTP request to the WebDriver REST server and wait for the response. For an element click the response body might be discarded, but for something like finding an element, the response will contain details about the element that has been located, or an appropriate error code if none was found. The library will parse the response for you and return a new WebElement object, allowing you to interact with that element. The WebElement object typically just contains a unique element identifier and a reference to the WebDriver session it belongs to.
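Here is a simplified sketch of that round trip, without any network code. The endpoint paths, the `css selector` strategy and the element key come from the W3C WebDriver specification; the `WebElement` and `WebDriverError` classes and the function names are simplified stand-ins invented for this example, not the classes of any real client library.

```python
import json
from dataclasses import dataclass
from typing import Tuple

# Key the W3C WebDriver specification uses to mark an element reference.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


class WebDriverError(Exception):
    """Raised when the server replies with a WebDriver error object."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


@dataclass(frozen=True)
class WebElement:
    """Just two ids: the owning session and the element itself."""

    session_id: str
    element_id: str

    def click_path(self) -> str:
        """Path a client would POST to in order to click this element."""
        return f"/session/{self.session_id}/element/{self.element_id}/click"


def find_element_request(session_id: str, css_selector: str) -> Tuple[str, str, bytes]:
    """Return (method, path, body) for the 'Find Element' command."""
    body = {"using": "css selector", "value": css_selector}
    return "POST", f"/session/{session_id}/element", json.dumps(body).encode("utf-8")


def parse_find_element(session_id: str, response_body: bytes) -> WebElement:
    """Turn the server's JSON reply into a WebElement or raise an error."""
    value = json.loads(response_body)["value"]
    if "error" in value:
        raise WebDriverError(value["error"], value.get("message", ""))
    return WebElement(session_id, value[ELEMENT_KEY])
```

Clicking the element is then just another request: a POST with an empty JSON object (`{}`) as the body to the path returned by `click_path()`.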

The WebDriver Specification

If you’re looking for documentation for the WebDriver server’s REST API, see the W3C WebDriver Specification.

You can find it at https://www.w3.org/TR/webdriver1/

Meanwhile, chromedriver (and recently geckodriver) also support a newer API called the Chrome DevTools Protocol (CDP), which is what provides some of the additional features of Selenium 4.x and also powers other frameworks like Cypress and Puppeteer. Unlike the WebDriver REST API, CDP exchanges JSON messages over a WebSocket connection rather than individual HTTP requests. Yes, those other frameworks are sending the same kind of protocol messages under the hood that you can send with Selenium! They just add some layers of abstraction to make it easier to use, often hiding the details from you.

You can find the Chrome DevTools Protocol documentation here: https://chromedevtools.github.io/devtools-protocol/
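For a feel of what a CDP command looks like on the wire, here is a small sketch that only builds the JSON message. Each command is a JSON object with an `id`, a `method` and `params`, normally sent over a WebSocket connection to the browser. The `cdp_command` helper is hypothetical; `Page.navigate` is a real method from the protocol documentation.

```python
import itertools
import json

# Each command carries a unique id so the reply can be matched to it.
_command_ids = itertools.count(1)


def cdp_command(method: str, **params) -> str:
    """Serialise one CDP command as a JSON text message."""
    message = {"id": next(_command_ids), "method": method, "params": params}
    return json.dumps(message)


# Example: ask the page to navigate somewhere.
navigate = cdp_command("Page.navigate", url="https://example.com")
```

Sending the message requires a WebSocket client and the browser's debugger URL, which is the part that Selenium 4, Puppeteer and similar tools take care of for you.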

Selenium Provides The Tools

The Selenium project provides tools for automating Web Browsers. What you do with those tools is up to you. The most common use case is Test Automation for web applications, but you can use it to automate virtually any task that can be performed using a Web Browser. You can also use it to automate apps built using Electron, and also any WebView in mobile applications.

The primary tool you use will be the Selenium client library for your programming language of choice.

If you only need a single Browser instance running locally you can run an instance of chromedriver or geckodriver locally.

If you need a more complicated setup, for example for production Test Automation use cases, take a look at Selenium Server (also called Selenium Grid).

If you’re familiar with Docker, you can also easily run a Selenium server complete with bundled browsers using docker-compose.

I hope you found this article helpful. Thanks for reading :)

Written by Steve Pryde

I’m a Software Engineer and the creator of the Rust crate “thirtyfour”, a batteries-included Selenium client for Rust.