What? Why?
It may sound crazy at first, since most of the Rails applications we tend to work with connect to just one database. But, it actually isn't that unusual to have a single app communicate with more than one database. Consider the following use cases:
- You have a legacy system connecting to a legacy database. You spin up a shiny new Rails 5 app and incorporate all the new hotness, but still need to query and incorporate data from your legacy system. Enter...the second database.
- Your enterprise system relies on a workhorse of a database with hundreds of thousands of intergalactic prisoners I mean records. Certain queries that your end-user makes are extremely slow and negatively impact user experience. You decide to create a data warehouse––a brand new DB in which you aggregate and index data from various database tables in your original database in order to increase query times. Your brand new single-purpose data warehouse app needs to be able to connect to and grab records from the original DB in order to populate them in your shiny new warehouse db.
Now that we see it is not at all unusual to design a Rails application that can manage connections to multiple databases, let's take a look at how we can implement such a design in an efficient manner.
How?
As a first-time multiple database user, my basic understanding of how Rails + Active Record manages the interactions with a single database went something like this:
- Rails gives us a
database.yml
file where we write some mysterious configurations. - Active Record does the things (technical term) and magically connects the app to a database.
- Active Record maps each of my
ActiveRecord::Base
inherited models to a database table. - More magic ensues.
As you can see from the number of times I just used the word "magic", Rails and Active Record abstract away a lot of the work of managing database connections. In order to configure our app to connect to more than one database, we're going to have to dive a little deeper.
Let's get started!
Domain Model
Let's say we're helping Rick Sanchez build an app to catalogue his many inventions. Our inventory app catalogues inventions in an inventions
table. Some inventions, however, deal with inter-dimensional travel and can't be stored in our regular database--they are stored in the 4th dimension. These inventions need to be stored in their very own database all the way over in the 4th dimension.
Our domain model is pretty simple. We have an Invention
model and a InterdimensionalInvention
model. The Invention
model will correspond to an
inventions
table in the main database for our application, and the InterdimensionalInvention
model will correspond to a interdimensional_inventions
table in our fourth dimension database.
Let's say that Rick's fourth dimension database also keeps track of the invention's permits, since the time cops of the fourth dimension are really big sticklers for that kind of thing. So we'll also have a Permit
model and a permits
table in the fourth dimension db as well. An interdimensional invention has many permits and a permit belongs to an interdimensional invention.
First things first, we'll create a new YAML file to hold our second database's configuration.
Configuring the Second Database
Our app will be configured to connect to our second database with the help of a YAML file, fourth_dimension_database.yml
.
# config/fourth_dimension_database.yml
default: &default
encoding: utf8
adapter: mysql2
port: 5500
development:
<<: *default
database: fourth_dimension_db_prod
host:
username:
password:
test:
<<: *default
host: localhost
username:
password:
database: fourth_dimension_db_test
Here, we assume that the fourth dimension DB has already been created, and that it can be found under a name of fourth_dimension_db_prod
, running on port 5500
.
The purpose of this configuration file is to have a nice clean place to store our database connection configuration options to be passed to the Active Record method we will call to establish the connection.
Connecting the the Second Database
Active Record provides us with an API for connecting to databases and managing those connections.
The ActiveRecord::Base.establish_connection
method creates an instance of ActiveRecord::ConnectionAdapters::ConnectionPool
using the configuration hash that we pass as an argument:
ActiveRecord::Base.establish_connection(config_options)
We stored our configuration options in a YAML file, so let's create an initializer to read the YAML file into a Ruby hash and store it in an environment variable.
# config/initializers/fourth_dimension_database.rb
FOURTH_DIMENSION_DB = YAML.load_file(File.join(Rails.root, "config", "fourth_dimension_database.yml"))[Rails.env.to_s]
Now we can call .establish_connection
with an argument of FOURTH_DIMENSION_DB
.
But where should we invoke this method?
We have to tell any and all classes that need to map to database tables in our separate fourth dimension database to establish and use a connection to that database. We could call .establish_connection
in both of the models that need to use this connection:
class InterdimensionInvention < ActiveRecord::Base
establish_connection FOURTH_DIMENSION_DB
end
class Permit < ActiveRecord::Base
establish_connection FOURTH_DIMENSION_DB
end
This isn't terribly efficient though, and will result in more database connections than we bargained for.
To understand why, we need to take a closer look at the .establish_connection
method and learn how Active Record manages database connections in the context of our Rails app.
Active Record and the Database Connection Pool
Establishing the Connection Pool
The .establish_connection
method returns an instance of ActiveRecord::ConnectionAdapters::ConnectionPool
def establish_connection(spec) owner_to_pool[spec.name] = ConnectionAdapters::ConnectionPool.new(spec) end
The connection pool instance manages the connections that your application opens up to the database. The default pool size is 5, although we can specify some other pool size via the optional pool:
attribute in our YAML configuration file.
"Pool size" refers to the number of connections that your application will allow to be opened to the database. Active Record will only open a new connection to the database when a new thread or process attempts to communicate to the database by executing a SQL query.
Connecting to the Database
When this happens, Active Record will grab the connection pool instance and invoke another method on it, the #connection
method.
def connection @thread_cached_conns[connection_cache_key(Thread.current)] ||= checkout end
The #connection
method looks up up a stored map, @thread_cached_conns
, in which the keys represent thread ids and the values represent database connections.
This map is created, empty, when the connection pool instance initializes, and gets populated with database connections when the connections are created.
Let's take a look at a portion of the connection pool instance's #initialize
method:
def initialize(spec) super() @spec = spec ... @thread_cached_conns = Concurrent::Map.new(:initial_capacity => @size) @connections = [] end
So, the #connection_method
introspects on the @threaded_cached_conns
map. If the map already contains a key representing the current thread, #connection
will return the connection, otherwise it will call another method, #checkout
to open and store such a connection.
The #checkout
method will in turn, eventually, invoke the #new_connection
method, which does the actual work of creating our database connection:
def new_connection Base.send(spec.adapter_method, spec.config).tap do |conn| conn.schema_cache = schema_cache.dup if schema_cache end end
Managing Connections via the Connection Pool
Active Record will use the connection pool instance that we created when we called establish_connection
to manage the creation and use of each database connection.
When we initialized our connection pool instance via .establish_connection
, it set an instance variable, @size
, equal to the pool size defined in the configuration that we passed as an argument to .establish_connection
, defaulting it to 5 if no such option is included in the configuration.
Here's a look at the relevant portion of the #initialize
method:
def initialize(spec) super() @spec = spec ... # default max pool size to 5 @size = (spec.config[:pool] && spec.config[:pool].to_i) || 5 end
So, the pool is aware of the max connection size, which defaults to 5, and will only allow 5 connections to be "checked out", i.e. created and/or used, at a given time. The connection pool's #checkout
method implements some conditional logic, more or less saying: "if the number of open connections is less than the number of allowed connections, call #new_connection
, else, throw an error."
The error that is raised if your app tries to use more connections than your pool has made available will look something like this:
ActiveRecord::ConnectionTimeoutError - could not obtain a database connection
So, if our app is trying to connect to our fourth dimension database by enacting a SQL query with any fourth_dimension_database_prod
-connected tables, Active Record will look for available connections and use them if they are found. Or it will open a new connection if no open connections are available. The app determines whether or not it can open a new connection to the database by asking the connection pool how many connections are already opened and how many connections are allowed to be open.
But, if we call .establish_connection
twice, once in each of the InterdimensionalInvention
and Permit
models, then we are creating two connection pools for our one fourth dimension databse.
Both connection pools will do their work at the same time, effectively allowing twice as many database connections to be opened between our app and our fourth dimension database than we have anticipated or intended. If our fourth dimension database isn't configured to allow for that many concurrent connections, our database will error out and we won't be able to interact with it at all! Oh no!
The fourth dimension time cops would not stand for this--look what they did to Albert Einstein because he overloaded their database:
We need to ensure that we only establish one connection pool instance to manage connections between our application and our fourth dimension db.
To do this, we'll implement a base class to handle the connection pool and we'll inherit the appropriate models from this base class.
Handling the Connection Pool in a Base Class
# app/models/fourth_dimension_db_base.rb class FourthDimensionDbBase < ActiveRecord::Base self.abstract_class = true establish_connection FOURTH_DIMENSION_DB end
# app/models/fourth_dimension_db_base.rb class InterdimensionalInvention < FourthDimensionDbBase has_many :permits end
# app/models/fourth_dimension_db_base.rb class Permit < FourthDimensionDbBase belongs_to :interdimensional_invention end
Note that we've made our parent class an abstract class. This leverages an ActiveRecord::Base
class method, .abstract_class
which tells Active Record that we do not want this class to map to a database table. Active Record will not look for and query a table with the name of this class and it will understand that this class does not have any attributes.
That should do it for our refactor. Our app is now all set up to efficiently manage connections to two different databases!