Summary
SolidCache::Store::Failsafe::TRANSIENT_ACTIVE_RECORD_ERRORS rescues ActiveRecord::ConnectionNotEstablished but not ActiveRecord::ConnectionFailed. When the cache database connection is terminated mid-transaction, Solid Cache raises ConnectionFailed instead of degrading to a cache miss, so the error propagates to the caller (a 500 on a web request, for example).
How we hit it
We run Solid Cache on a dedicated PostgreSQL database and set idle_in_transaction_session_timeout on that connection to bound row-lock contention on solid_cache_entries. Entry.lock_and_write holds the row lock for a hot key_hash while the Ruby block runs, and under load the holding thread can stay idle-in-transaction long enough for PostgreSQL to terminate the backend:
PG::ConnectionBad: PQconsumeInput() FATAL: terminating connection due to idle-in-transaction timeout
The write that follows the SELECT ... FOR UPDATE inside lock_and_write then raises ActiveRecord::ConnectionFailed. That class, added in Rails 7.1, is a subclass of ActiveRecord::StatementInvalid, not of ActiveRecord::ConnectionNotEstablished, so the current failsafe list does not catch it.
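To make the class relationship concrete, here is a minimal sketch that runs without Rails. The FakeAR module and the transient? helper are hypothetical stand-ins, not ActiveRecord or Solid Cache code; they mirror only the parent/child relationship described above.

```ruby
# Stand-in classes (hypothetical, NOT the real ActiveRecord constants).
# They mirror only the relationship described in this issue.
module FakeAR
  class StatementInvalid < StandardError; end
  class ConnectionNotEstablished < StandardError; end
  class ConnectionFailed < StatementInvalid; end
end

# Hypothetical helper: true when the error is an instance of any listed class.
def transient?(error, list)
  list.any? { |klass| error.is_a?(klass) }
end

# A list containing only the sibling class, as in the current failsafe list.
CURRENT_LIST = [FakeAR::ConnectionNotEstablished].freeze

error = FakeAR::ConnectionFailed.new("terminating connection")

puts transient?(error, CURRENT_LIST)               # sibling class: no match
puts transient?(error, [FakeAR::StatementInvalid]) # parent class: match
```

Listing a class only covers that class and its descendants, so an entry for ConnectionNotEstablished says nothing about ConnectionFailed.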
Reproduction
conn = SolidCache::Record.connection
conn.execute("SET idle_in_transaction_session_timeout = '300ms'")

begin
  SolidCache::Entry.lock_and_write("probe") do |_value|
    sleep 1 # transaction stays idle > 300ms -> PostgreSQL kills the session
    "v"
  end
rescue ActiveRecord::ConnectionFailed => error
  # error => ActiveRecord::ConnectionFailed: ... terminating connection due to idle-in-transaction timeout
  SolidCache::Store::Failsafe::TRANSIENT_ACTIVE_RECORD_ERRORS.any? { |k| error.is_a?(k) }
  # => false
end
Proposal
Add ActiveRecord::ConnectionFailed to TRANSIENT_ACTIVE_RECORD_ERRORS. A terminated connection is the kind of transient failure the cache should degrade on, consistent with the existing ConnectionNotEstablished entry.
TRANSIENT_ACTIVE_RECORD_ERRORS = [
  ActiveRecord::AdapterTimeout,
  ActiveRecord::ConnectionFailed, # added
  ActiveRecord::ConnectionNotEstablished,
  ActiveRecord::Deadlocked,
  ActiveRecord::LockWaitTimeout,
  ActiveRecord::QueryCanceled,
  ActiveRecord::StatementTimeout
]
ActiveRecord::ConnectionFailed exists in Rails 7.1+, which is within Solid Cache's supported range, so referencing it directly is safe.
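As a rough illustration of the intended effect, the sketch below shows a rescue list turning a terminated connection into a cache miss. It is not Solid Cache's actual Failsafe implementation: with_failsafe and the Stub* error classes are hypothetical names chosen so the example runs without Rails.

```ruby
# Stand-in error classes (hypothetical; real code would use ActiveRecord's).
class StubConnectionNotEstablished < StandardError; end
class StubConnectionFailed < StandardError; end

# Hypothetical helper: runs the block and returns `fallback` (a cache miss)
# when the raised error is in `transient_errors`; anything else propagates.
def with_failsafe(transient_errors, fallback: nil)
  yield
rescue *transient_errors
  fallback
end

current_list  = [StubConnectionNotEstablished]
proposed_list = [StubConnectionFailed, StubConnectionNotEstablished]

# With the proposed entry, the terminated connection degrades to a miss (nil).
miss = with_failsafe(proposed_list) do
  raise StubConnectionFailed, "terminating connection"
end

# Without it, the same error reaches the caller.
propagated =
  begin
    with_failsafe(current_list) do
      raise StubConnectionFailed, "terminating connection"
    end
    false
  rescue StubConnectionFailed
    true
  end

puts "proposed list returns miss: #{miss.nil?}"
puts "current list propagates:    #{propagated}"
```

A PR test could follow the same shape against the real store: force the error inside a cache read or write and assert that the call returns a miss instead of raising.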
Happy to send a PR with a test if this looks right.
Environment: solid_cache 1.0.10, Rails 8.1, PostgreSQL 17.