Home sql Why SELECT… FOR UPDATE?

Why SELECT… FOR UPDATE?

Author

Date

Category

V
documentation
PostgreSQL says:

Read Committed is the default isolation level in PostgreSQL. When
a transaction uses this isolation level, a SELECTquery
(without a FOR UPDATE/ SHAREclause) sees only
data committed before the query began; it never sees either uncommitted
data or changes committed during query execution by concurrent
transactions. In effect, a SELECTquery sees a snapshot of
the database as of the instant the query begins to run. (…)

My understanding is that if I am in transaction without explicitly specifying
isolation level
will do:

SELECT "id" FROM "table" WHERE /* … */

Then I will process the results in the application, and then:

UPDATE "table" SET "flag" = TRUE WHERE "id" IN (/* … */)

Then I will get the desired result, namely, updating all rows with the necessary
ID , regardless of what others delete or change
transactions. But judging by the fact that there is such a thing as
SELECT /* … */ FOR UPDATE, isn’t it?

When to use SELECT /* … */ FOR UPDATE
necessary?

How does the transaction isolation level affect this? Do I need
SELECT /* … */ FOR UPDATEat the
SERIALIZABLE?


Answer 1, authority 100%

As long as your transactions follow strictly one after the other, everything is fine with you and there are no problems. Except for performance. And to improve performance, it is necessary to allow transactions to be executed in parallel. And this is where the rich and wondrous world of concurrently controlbegins. Moreover, not only in databases, but everywhere where at least something is executed in parallel.

On money, people usually understand better, so let’s talk about money. Let’s say there is a user, he has 100 money in his account. The user can spend them, you check the balance select balance ..., then update the balance when buying update ... set balance = ? where .... And on a happy day, it somehow happened that two purchase requests for this user come at once. One for 50 money, the second for 70. One of them must be rejected, because not enough money. But as a result, it turns out that both purchases went through and you have a problem, you sold what you didn’t need. And this is not even visible from the user’s balance. How?

This is a typical race condition, both transactions first read data, then do something locally, then write something.

  • no one bothers them to read, because both transactions read that the user has 100 money
  • both transactions naturally decided that there is enough money
  • both transactions updated the user’s balance

On concurrent access to the resource, they fought only at the last step, the transaction that started updating the data later first waited for the completion of the first transaction. And then she rewrote the balance into the one she considered correct herself. The so-called lost updateanomaly.

That is, as they stand, these two transactions were not serialized correctly. For the logic to execute correctly, the second incoming transaction had to wait for the result of the first transaction before reading the user’s balance. But the database was not warned about this, and quite naturally this first select was regarded as not interfering with others.

Just to warn the DBMS that we are planning to do something with the data, and therefore we need to serialize transactions differently, and there is FOR SHARE, FOR UPDATEadd-ons. Therefore, by the way, they are documented in the section Explicit Locking

  • If nothing is specified in select, then due to the essence of MVCC, postgresql implementations will be able to read data a little less than always. Even if right now another transaction is already updating this row, we will get the last known committed value of this row.
  • if we explicitly request FOR SHARE, then we will be able to read in many threads with this FOR SHARElock without blocking each other. But if someone wants to update this line, then he will stand in the waiting queue until all transactions holding the reading lock are completed and at the same time delay all subsequent FOR SHAREtransactions.
  • if we request FOR UPDATE, then we can be sure that no other transaction will be able to update this row until the end of our transaction.

That is, they are needed in those places where, without this, a concurrent transaction can be serialized logically from the point of view of the DBMS, but not correct for the business logic of the application.

Serializable

Serializable transactions are just Repeatable Read transactions which
add nonblocking monitoring for dangerous patterns of read / write
conflicts

The postgresql implementation of Serializable monitors changes that would violate transaction isolation and all affected transactions will receive a serialization error. The application itself must be ready to run at this isolation level and expect to receive a serialization error at any given time, and be ready to retry the transaction.

Do I need FOR UPDATEhere – I’ll just quote documentation for tips on how to reduce performance drawdown from Serializable isolation levels:

Eliminate explicit locks, SELECT FOR UPDATE, and SELECT FOR SHARE where no longer needed due to the protections automatically provided by Serializable transactions.

Explicit locks are not required at this isolation level.


in a transaction without an explicit isolation level

The level specified in the default_transaction_isolationsetting will be used. That is, it can be Serializable instead of Read Committed.

I will get the desired result, namely, updating all rows with the necessary IDs, regardless of what other transactions are deleted or changed there.

You will receive an update of all lines matching the condition. Whether this particular result is needed is the key question.

Programmers, Start Your Engines!

Why spend time searching for the correct question and then entering your answer when you can find it in a second? That's what CompuTicket is all about! Here you'll find thousands of questions and answers from hundreds of computer languages.

Recent questions