# lower\_bound, upper\_bound in c++, visually explained

2024-04-02

One of the most common problems programmers have to solve is retrieving specific data: finding the needle in the haystack. To make this simpler, languages provide standard tools for the task. This post focuses on two of them in C++: `lower_bound` and `upper_bound`.

Personally, I find the documentation for these functions unclear and verbose. For example, [cppreference.com](https://en.cppreference.com/w/cpp/algorithm/lower_bound) describes `lower_bound` like this:

> Searches for the first element in the partitioned range [first, last) which is **not** ordered before value.

That sounds like gibberish, and it's too technical to understand quickly. For that reason, I'm writing this post to explain my own mental model of these functions.

## refresher on binary search

First, it's important to understand how `lower_bound` and `upper_bound` work under the hood.

As you know, finding a word in a dictionary is relatively fast, because the words are in alphabetical order. If they weren't ordered, you'd have to look through every single word in the dictionary, one by one: an excruciating and much slower process. Thanks to the ordering, you can rapidly narrow down to the word you want.

Computers can do the same with ordered data: this is called *binary search*, and it's what powers `lower_bound` and `upper_bound`. Binary search is like looking up a word in the dictionary, but more structured. For example, say our dictionary has 1000 pages, and the computer wants to look up the word "rabbit". These are the steps it takes:

1. Start at exactly page 500.
2. See the word "murmur", which comes before "rabbit", so go forwards to page 750.
3. See the word "sunny", which comes after "rabbit", so go backwards to page 625.
4. Keep halving the remaining range like this until "rabbit" is found.

This is called "binary search" because we halve the region we're looking in every time (we pick either the left half or the right half). In step 1, the computer is halving the range `1-1000`; in step 2, `500-1000`; and in step 3, `500-750`. With 1000 pages, about 10 halvings are enough to narrow it down to a single page, since 2^10 = 1024.

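As a rough illustration of the steps above, here's a minimal C++ sketch of binary search over a sorted list of words. It assumes the words are stored in a sorted `std::vector<std::string>`, and `find_word` is a hypothetical name chosen for this example. It's the classic "find an exact match" variant, not how `lower_bound` and `upper_bound` are specified; it only shows the halving idea.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: classic binary search over a sorted list of words.
// Returns the index of `target`, or -1 if it isn't present.
int find_word(const std::vector<std::string>& words, const std::string& target) {
    int lo = 0;                                   // first index still in the range
    int hi = static_cast<int>(words.size()) - 1;  // last index still in the range
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;             // "open the dictionary in the middle"
        if (words[mid] == target) {
            return mid;
        } else if (words[mid] < target) {
            lo = mid + 1;                         // target comes later: keep the right half
        } else {
            hi = mid - 1;                         // target comes earlier: keep the left half
        }
    }
    return -1;                                    // range became empty: not found
}
```

For instance, with the words `apple, banana, murmur, rabbit, sunny, zebra`, it checks `murmur`, then `sunny`, then finds `rabbit` at index 3.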
Anyway, this isn't intended to be a full explanation of binary search: refer to [Tom Scott's video](https://youtube.com/watch?v=KXJSjte_OAI) about it for more information.

## lower bound and upper bound

Back to the real subject of this post: `lower_bound` and `upper_bound` in C++. What I used to understand about these functions was that they use binary search to find elements in a sorted container. However, I didn't get what differentiated them, and if you only read the documentation, it's not easy to figure out.

First of all, say we want to search for the integer `k` (k for key) in a sorted vector (array) of integers `v`. We can find the lower and upper bounds with these function calls:

```cpp
#include <algorithm>  // std::lower_bound, std::upper_bound
#include <vector>

// (you could use auto here instead of the verbose type)
std::vector<int>::iterator lb = std::lower_bound(v.begin(), v.end(), k);
std::vector<int>::iterator ub = std::upper_bound(v.begin(), v.end(), k);
```

From the documentation, we know that the first two arguments specify the region of `v` we're searching; here, it's the entire vector (from the beginning to the end). Put simply, with the default comparison (`<`), the functions return an iterator to:

- `lower_bound`: the first element `e` where `k <= e`;
- `upper_bound`: the first element `e` where `k < e`.

> Note: Both functions return `v.end()` (more generally, the second argument, `last`) if no such element exists.
> This iterator points just **after** the last element of `v`.

This is the technical definition, and it doesn't mean much by itself. However, it clicked for me once I tried a concrete example with real numbers. Let `k = 3`. Here is an example sorted vector `v`, with the lower and upper bounds marked:

```
      lower   upper
      ↓       ↓
1 2 2 3 3 3 3 4 5 6
      ───────
         ↑
 matching interval
```

The first `3` is the lower bound: it's the first element greater than or equal to our key. The `4` is the upper bound: it's the first element strictly greater than our key.

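To double-check the diagram, here's a small sketch that turns the two iterators into positions with `std::distance`. The helper name `bound_indices` is hypothetical, and the input is assumed to be sorted, like the example `v` above.

```cpp
#include <algorithm>  // std::lower_bound, std::upper_bound
#include <cstddef>    // std::ptrdiff_t
#include <iterator>   // std::distance
#include <utility>    // std::pair
#include <vector>

// Hypothetical helper: returns the positions of lower_bound and upper_bound
// for key k in the sorted vector v. A position equal to v.size() means v.end().
std::pair<std::ptrdiff_t, std::ptrdiff_t> bound_indices(const std::vector<int>& v, int k) {
    auto lb = std::lower_bound(v.begin(), v.end(), k);
    auto ub = std::upper_bound(v.begin(), v.end(), k);
    return {std::distance(v.begin(), lb), std::distance(v.begin(), ub)};
}
```

With `v = {1, 2, 2, 3, 3, 3, 3, 4, 5, 6}` and `k = 3`, this gives positions 3 and 7, matching the arrows. With `k = 7`, both positions are 10 (`v.size()`), meaning both iterators are `v.end()`.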
Laid out visually, it's clear what the lower and upper bounds mean: they are the *bounds of the interval* that matches our search key. This mostly matters when the array has duplicate elements; without duplicates, the interval contains at most one element.

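One practical use of this interval when there are duplicates: its length is the number of times `k` appears. Here's a sketch; `count_sorted` is a hypothetical name, and the input must already be sorted. Unlike `std::count`, which checks every element, this only does two binary searches.

```cpp
#include <algorithm>  // std::lower_bound, std::upper_bound
#include <cstddef>    // std::size_t
#include <iterator>   // std::distance
#include <vector>

// Hypothetical helper: how many times k appears in the sorted vector v.
// The matching interval is [lower_bound, upper_bound), so its length is the count.
// If k is absent, both bounds land on the same position and the interval is empty.
std::size_t count_sorted(const std::vector<int>& v, int k) {
    auto lb = std::lower_bound(v.begin(), v.end(), k);
    auto ub = std::upper_bound(v.begin(), v.end(), k);
    return static_cast<std::size_t>(std::distance(lb, ub));
}
```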
Notice how the upper bound is one past the end of the interval, just like `v.end()` is one past the last element of the vector. This is how C++ iterator ranges usually work, and it makes some tasks more convenient. Take this regular for loop:

```cpp
for (int i = 0; i < 10; i++) { ... }
```

This loop iterates over the numbers `0` to `9`, excluding the upper bound `10`. The same logic applies to C++ iterators. To iterate over all elements of a vector, we'd use:

```cpp
for (auto it = v.begin(); it != v.end(); it++) { ... }
```

Here, we use `!=` instead of `<` for iterators, but it does practically the same thing (`!=` works with every kind of iterator, while `<` only works with random-access ones such as `std::vector`'s). Once the iterator moves past the last element, it becomes equal to `v.end()` (which is one past the last element), so the loop stops.

> Note: Usually, you'd write `for (auto number : v)` to iterate over the entire vector.

So, having the upper bound sit right past the end of the interval makes this possible:

```cpp
#include <iostream>

// print every element in the matching interval [lb, ub)
for (auto it = lb; it != ub; it++) {
    // *it is like a pointer dereference:
    // it gets the number the iterator points to
    std::cout << *it << std::endl;
}
```

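The same half-open convention lets the loop collect the matching values instead of printing them. Below is a sketch that copies the interval into a new vector; `matching_interval` is a hypothetical helper, and `std::vector`'s iterator-pair constructor would do the same thing in one line.

```cpp
#include <algorithm>  // std::lower_bound, std::upper_bound
#include <vector>

// Hypothetical helper: copies every element equal to k (the interval
// [lower_bound, upper_bound)) out of the sorted vector v.
std::vector<int> matching_interval(const std::vector<int>& v, int k) {
    auto lb = std::lower_bound(v.begin(), v.end(), k);
    auto ub = std::upper_bound(v.begin(), v.end(), k);

    std::vector<int> out;
    // Same loop shape as the printing loop: stop as soon as it reaches ub.
    for (auto it = lb; it != ub; it++) {
        out.push_back(*it);
    }
    // Equivalent one-liner: return std::vector<int>(lb, ub);
    return out;
}
```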
Anyway, I'll repeat it once more: `lower_bound` and `upper_bound` mark the *interval* that matches what you're looking for.

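If you're curious how binary search produces these two bounds, here's a hand-rolled sketch on indices. It's an illustration, not the standard library's actual code (the real functions work on iterators and also accept a custom comparison); `my_lower_bound` and `my_upper_bound` are hypothetical names. Notice that the two functions differ in a single comparison: whether an element equal to `k` still counts as "too small".

```cpp
#include <cstddef>  // std::size_t
#include <vector>

// Hypothetical helpers (not the standard library's implementation).
// Both search the half-open index range [lo, hi) of a sorted vector and
// return the first index satisfying the condition, or v.size() if none does.

// First index i with k <= v[i].
std::size_t my_lower_bound(const std::vector<int>& v, int k) {
    std::size_t lo = 0, hi = v.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (v[mid] < k) {   // v[mid] is too small: the answer is to the right
            lo = mid + 1;
        } else {            // k <= v[mid]: mid might be the answer
            hi = mid;
        }
    }
    return lo;
}

// First index i with k < v[i]. Only the comparison differs.
std::size_t my_upper_bound(const std::vector<int>& v, int k) {
    std::size_t lo = 0, hi = v.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (v[mid] <= k) {  // v[mid] is not strictly bigger: go right
            lo = mid + 1;
        } else {            // k < v[mid]: mid might be the answer
            hi = mid;
        }
    }
    return lo;
}
```

On the example `v = {1, 2, 2, 3, 3, 3, 3, 4, 5, 6}` with `k = 3`, these return 3 and 7, the same positions as the arrows in the diagram.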
## conclusion

So, that's my "visual" explanation of how lower and upper bound work in C++. In hindsight it seems obvious, but back when I was first told about these functions, I couldn't understand them because of the confusing descriptions. Having this kind of intuition is really helpful for truly understanding concepts: you don't want to be stuck memorizing things that don't make sense.