Defining relationships in Entity Framework¶
We store both entities and the relationships that connect them in relational databases. This allows us to query related entities through joining tables, expressed with the join
SQL command. Entity Framework, which is an ORM framework, provides us with built-in support for conveniently managing these relationships.
Defining relationships¶
Convention-based mapping
Entity Framework has conventions that enable mapping relationships automatically without explicit configuration. We will not rely on this feature here; instead will define the relationships explicitly.
Let us look at our example classes and the relationship among them:
public class Product
{
public int ID;
public string Name;
public int Price;
public int VATID;
public VAT VAT { get; set; }
}
public class VAT
{
public int ID;
public int Percentage;
public ICollection<Product> Product { get; set; }
}
We can set up the configuration in the OnModelCreating
function inherited from the DBContext
base class. We can use the following functions for configuring an entity:
modelBuilder.Entity<Example>
.HasOne()/.HasMany()
.WithOne()/.WithMany()
In the example above, we can see a one-to-many connections, which can be described as follows:
modelBuilder.Entity<Product>()
.HasOne(d => d.VAT)
.WithMany(p => p.Product)
.HasForeignKey(d => d.VatId);
The connection between the two entities is provided by the foreign key of the Product
table pointing to VAT
table (which foreign key, naturally, also appears in the database as a column). In the C# code, a VAT
object reference appears in class Product
. Similarly, VAT
objects have a list of connected Product
instances. These C# properties are called navigation properties.
Explicit joining¶
The DBContext
offers the tables as DBSets
, on which we can perform LINQ operations. One such operation is the join
function. Two DBSets
can be joined via the appropriate foreign key. Similar to it's SQL equivalent, the following LINQ expression declaratively describes what we want to get.
var query =
from p in dbContext.Product
join v in dbContext.Vat on p.VatId equals v.Id
where p.Name.Contains("test")
select v.Percentage;
// Displays the generated SQL query
Console.WriteLine(query.ToQueryString());
The generated SQL query will look similar to this:
SELECT [v].[Percentage]
FROM [Product] AS [p]
INNER JOIN [VAT] AS [v] ON [p].[VatId] = [v].[ID]
WHERE [p].[Name] LIKE N'%test%'
We rarely need to use explicit joins. As a matter of fact, we should avoid using them when navigation properties are available.
Navigation property¶
Since we have configured the relationship between the Product
and VAT
EF entities in our DbContext
, we can use the VAT
property in the Product
class: this is the navigation property. The joining "behind" the navigation property is handled automatically by EF without us having to define it in the query. This simplifies our previous query to:
var query =
from p in dbContext.Product
where p.Name.Contains("test")
select p.VAT.Percentage;
// Displays the generated SQL query
Console.WriteLine(query.ToQueryString());
Below we can see the generated SQL query, which differs from the previous one only in the type of join, but otherwise, we get to the same solution.
SELECT [v].[Percentage]
FROM [Product] AS [p]
LEFT JOIN [VAT] AS [v] ON [p].[VatId] = [v].[ID]
WHERE [p].[Name] LIKE N'%test%'
Prefer navigation properties
In EF, we should always strive to use the navigation properties when possible. We should avoid performing explicit joins.
Include¶
In the previous example, only one scalar result was queried. But what happens to the navigation properties when we query an entire entity? For example:
var prod = dbContext.Product.Where(p => p.Name.Contains("test")).First();
Console.WriteLine(prod.Name); // this works, it will print the name
Console.WriteLine(prod.VAT.Percentage); // accessing the referenced entity via the navigation property
In this example, we would get a runtime error in the last line. Why is that? Despite the navigation property being configured, EF does not load referenced entities by default. We can work with them in queries (as we wrote p.VAT.Percentage
in a previous query), but if we query a Product
entity, it does not include the referenced VAT
entity. The referenced record(s) could be fetched. But it is up to the developer to decide if they really need them. Just consider, if all the referenced entities were fetched automatically (even transitively), the database would have to look up hundreds or thousands or records to get a single entity and all of it's referenced data via navigation properties. This is unnecessary in most cases.
If we really need the referenced entities, then we need to specify this in the code using Include
as follows:
var query =
from p in dbContext.Products.Include(p => p.VAT)
where p.Name.Contains("test")
select p;
// or an alternative syntax for the same:
// var query = products
// .Include(p => p.VAT)
// .Where(p => p.Name.Contains("test"));
Console.WriteLine(query.ToQueryString());
If we look at the generated SQL statement, it shows both the appropriate join
and the required data appearing within the select
statement.
SELECT [p].[Id], [p].[CategoryId], [p].[Description], [p].[Name], [p].[Price], [p].[Stock], [p].[VatId], [v].[ID], [v].[Percentage]
FROM [Product] AS [p]
LEFT JOIN [VAT] AS [v] ON [p].[VatId] = [v].[ID]
WHERE [p].[Name] LIKE N'%test%'
Automatic lazy loading of referenced entities
In Entity Framework, it is possible to turn on lazy loading, which causes entities to be loaded through navigation properties on demand. The loading is performed in a lazy way (that is, only when needed) without an explicit Include
. While this solution is convenient for the developer, it comes at a price: loading data when needed (when the code reaches a statement referencing the property) will typically result in several separate database queries. In the Include
solution, you can see above that a single query loads both the Product
and VAT
data. If we used lazy loading, there would be a query for the Product
data and another one for the referenced VAT
properties at a later time. Thus, lazy loading is usually worse in terms of performance.