This is a simple N:M (many-to-many) relationship, and the approach you've laid out will likely spell disaster in terms of efficiency and management.
Here is your situation:
- You have two entities: recipes and ingredients.
- One ingredient may be a part of many recipes.
- One recipe may be made up of many ingredients.
Whenever you have this relationship between any two entities, you are going to want to have not two, but three tables:
+-----------+    +-------------------------+    +---------------+
| recipes   |    | recipes_has_ingredients |    | ingredients   |
+-----------+    +-------------------------+    +---------------+
| recipe_id |    | recipe_id               |    | ingredient_id |
| name      |    | ingredient_id           |    | name          |
| ...       |    +-------------------------+    | calories      |
+-----------+                                   +---------------+
recipes and ingredients are what's known as base tables: each one stores intrinsic information about that particular entity.
The recipes_has_ingredients table is what's known as a cross-reference table (or "XREF"), which stores the associations between the two entities. The fields in this table, recipe_id and ingredient_id, both link to their respective base tables, and the combination of the two in each row of the XREF table is unique. It basically maps the many associations that each recipe_id may have to different ingredient_ids, and vice versa.
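As a rough sketch of how those three tables could be declared, here is one version using Python's built-in sqlite3 module so it runs anywhere. The column types, the in-memory database, and the ON DELETE CASCADE clauses are my assumptions; your actual DDL (MySQL or otherwise) will differ in the details.

```python
import sqlite3

# In-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
-- Base tables: intrinsic information about each entity.
CREATE TABLE recipes (
    recipe_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);

CREATE TABLE ingredients (
    ingredient_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    calories      INTEGER NOT NULL   -- assumed type
);

-- XREF table: one row per (recipe, ingredient) association.
-- The composite primary key makes each pair unique.
CREATE TABLE recipes_has_ingredients (
    recipe_id     INTEGER NOT NULL
                  REFERENCES recipes (recipe_id) ON DELETE CASCADE,
    ingredient_id INTEGER NOT NULL
                  REFERENCES ingredients (ingredient_id) ON DELETE CASCADE,
    PRIMARY KEY (recipe_id, ingredient_id)
);
""")

# List the tables that now exist.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Columns that take part in the XREF table's primary key.
xref_pk = [row[1] for row in conn.execute(
    "PRAGMA table_info(recipes_has_ingredients)") if row[5] > 0]

print(tables)
print(xref_pk)
```

The composite primary key on the XREF table is what enforces "each pair appears at most once"; a separate surrogate id column on that table is optional.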
Why does this design facilitate many-to-many relationships? Because data in that XREF table is allowed to look like this:
+-----------+---------------+
| recipe_id | ingredient_id |
+-----------+---------------+
| 1         | 1             |
| 1         | 2             |
| 1         | 3             |
| 2         | 1             |
| 2         | 2             |
| 2         | 3             |
| 3         | 1             |
| 3         | 2             |
| 3         | 3             |
+-----------+---------------+
As you can clearly see, one recipe is associated with many (3) ingredients, and one ingredient is associated with many (3) recipes. Also notice that values in either column are allowed to repeat, but the combination of the two columns is unique. This is really the key aspect of the design that makes the N:M relationship work.
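To make the "values repeat, pairs don't" point concrete, here is a small sketch using Python's sqlite3 module. It loads the nine rows shown above into a stripped-down XREF table (no foreign keys, which is a simplification of mine) and then tries to insert a pair that already exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE recipes_has_ingredients (
        recipe_id     INTEGER NOT NULL,
        ingredient_id INTEGER NOT NULL,
        PRIMARY KEY (recipe_id, ingredient_id)  -- uniqueness of the pair
    )
""")

# The same nine rows as in the table above: recipes 1-3 x ingredients 1-3.
pairs = [(r, i) for r in (1, 2, 3) for i in (1, 2, 3)]
conn.executemany("INSERT INTO recipes_has_ingredients VALUES (?, ?)", pairs)

# Individual values repeat freely, but an already-present pair is refused.
try:
    conn.execute("INSERT INTO recipes_has_ingredients VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute(
    "SELECT COUNT(*) FROM recipes_has_ingredients").fetchone()[0]
print(duplicate_rejected, row_count)
```

This guarantee is also what lets a query count rows per recipe and trust that each ingredient was counted only once.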
So here are some simple examples of how you can easily retrieve and manage data using this design:
-- Given a particular recipe_id, retrieve all ingredients used in that recipe:
SELECT name
FROM recipes_has_ingredients
INNER JOIN ingredients USING (ingredient_id)
WHERE recipe_id = <id>
-- Retrieve the name of a recipe (of id 4), and the total amount of calories it has:
SELECT a.name,
       SUM(c.calories) AS calorie_count
FROM recipes a
INNER JOIN recipes_has_ingredients b ON a.recipe_id = b.recipe_id
INNER JOIN ingredients c ON b.ingredient_id = c.ingredient_id
WHERE a.recipe_id = 4
GROUP BY a.recipe_id,
         a.name
-- Given a list of ingredient_ids, retrieve all recipes that contain
-- ALL of the listed ingredients (3 is the number of ids in the list)
SELECT name
FROM recipes
INNER JOIN recipes_has_ingredients USING (recipe_id)
WHERE ingredient_id IN (1,2,3)
GROUP BY recipe_id,
         name
HAVING COUNT(*) = 3
-- Given a particular recipe_id (id 6), add two more ingredients
-- that it has (ids 4 & 9):
INSERT INTO recipes_has_ingredients VALUES (6,4), (6,9);
-- Delete a particular recipe:
DELETE FROM recipes WHERE recipe_id = 4
^ The above DELETE operation also deletes all of that recipe's associations if you've properly defined the CASCADE rules between your relationships.
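Here is a minimal sketch of that cascading behaviour, again using Python's sqlite3 module. The ON DELETE CASCADE clauses, the recipe and ingredient names, and the calorie values are all made up for the demonstration; in MySQL you would declare the same rule on InnoDB foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs

conn.executescript("""
CREATE TABLE recipes (
    recipe_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE ingredients (
    ingredient_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    calories      INTEGER NOT NULL
);
CREATE TABLE recipes_has_ingredients (
    recipe_id     INTEGER NOT NULL
                  REFERENCES recipes (recipe_id) ON DELETE CASCADE,
    ingredient_id INTEGER NOT NULL
                  REFERENCES ingredients (ingredient_id) ON DELETE CASCADE,
    PRIMARY KEY (recipe_id, ingredient_id)
);

-- Invented sample data.
INSERT INTO recipes VALUES (4, 'Pancakes'), (5, 'Omelette');
INSERT INTO ingredients VALUES (1, 'Egg', 70), (2, 'Flour', 100);
INSERT INTO recipes_has_ingredients VALUES (4, 1), (4, 2), (5, 1);
""")

# Deleting the recipe row also removes its rows in the XREF table.
conn.execute("DELETE FROM recipes WHERE recipe_id = 4")

remaining_links = conn.execute(
    "SELECT recipe_id, ingredient_id FROM recipes_has_ingredients "
    "ORDER BY recipe_id, ingredient_id").fetchall()
ingredient_count = conn.execute(
    "SELECT COUNT(*) FROM ingredients").fetchone()[0]
print(remaining_links, ingredient_count)
```

Note that only the associations disappear; the ingredients themselves stay in their base table, ready to be used by other recipes.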
Looking back at your original design, what if you wanted to update or delete certain ingredients that a recipe has, or change the name of an ingredient? You would need hacky procedural code to modify the right positions in CSV strings, or you would need to update every row in the table to reflect even the slightest change to a single ingredient.
This design also lets you answer many more interesting questions that your original design couldn't really handle, such as:
- Recipe with highest/lowest calorie count?
- Ingredient that is contained by the most recipes?
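As a sketch of how the two questions above might be answered, here is a runnable example with Python's sqlite3 module. The sample recipes, ingredients, and calorie values are invented for illustration, and the foreign keys are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipes (recipe_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE ingredients (
    ingredient_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    calories      INTEGER NOT NULL
);
CREATE TABLE recipes_has_ingredients (
    recipe_id     INTEGER NOT NULL,
    ingredient_id INTEGER NOT NULL,
    PRIMARY KEY (recipe_id, ingredient_id)
);

-- Invented sample data.
INSERT INTO recipes VALUES (1, 'Pancakes'), (2, 'Omelette'), (3, 'Fruit salad');
INSERT INTO ingredients VALUES
    (1, 'Egg', 70), (2, 'Flour', 100), (3, 'Butter', 100), (4, 'Apple', 50);
INSERT INTO recipes_has_ingredients VALUES
    (1, 1), (1, 2), (1, 3),   -- Pancakes: egg, flour, butter
    (2, 1),                   -- Omelette: egg
    (3, 4);                   -- Fruit salad: apple
""")

# Recipe with the highest calorie count (ties are broken arbitrarily).
highest = conn.execute("""
    SELECT r.name, SUM(i.calories) AS calorie_count
    FROM recipes r
    INNER JOIN recipes_has_ingredients x ON r.recipe_id = x.recipe_id
    INNER JOIN ingredients i ON x.ingredient_id = i.ingredient_id
    GROUP BY r.recipe_id, r.name
    ORDER BY calorie_count DESC
    LIMIT 1
""").fetchone()

# Ingredient contained in the most recipes (ties are broken arbitrarily).
most_used = conn.execute("""
    SELECT i.name, COUNT(*) AS recipe_count
    FROM ingredients i
    INNER JOIN recipes_has_ingredients x ON i.ingredient_id = x.ingredient_id
    GROUP BY i.ingredient_id, i.name
    ORDER BY recipe_count DESC
    LIMIT 1
""").fetchone()

print(highest)    # ('Pancakes', 270)
print(most_used)  # ('Egg', 2)
```

Switching ORDER BY ... DESC to ASC gives the lowest-calorie recipe instead.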
...The list goes on, and the benefits of implementing this design will serve you well. You will save yourself a great deal of hardship and suffering by doing things the right way. =)