Is preloading/caching data before the actual method call an (anti)pattern?

Cyno@programming.dev · 1 year ago


deceitfulsteve@programming.dev · 1 year ago

I work with a code base that is perhaps going through a similar transition. Performance hasn’t really been a consideration, so when new functionality is tacked on, we frequently make a new API call even though we might already have the data somewhere else.

I don’t have a name for the pattern or anti-pattern, but people’s responses seem to indicate that the change you’re making is largely a good thing. I’m reminded of a Martin Fowler-esque or TDD idea that a function should either retrieve data or process it. However, I wasn’t able to find a blog post about that with a quick search.

Cyno@programming.dev · 1 year ago

If you find or run into that article later please share it, I’d definitely like to read it!

SwitchUp@programming.dev · 1 year ago

Sounds like Command-Query Separation (CQS).

It states that every method should either be a command that performs an action, or a query that returns data to the caller, but not both.
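A minimal Python sketch of the principle (the thread is about C#, but the idea is language-agnostic). `ScoreBoard` and its methods are hypothetical names: `record_score` is a command that changes state and returns nothing, while `total_for` is a query that returns data and changes nothing.

```python
from typing import Dict


class ScoreBoard:
    """Hypothetical example class illustrating Command-Query Separation."""

    def __init__(self) -> None:
        self._totals: Dict[str, int] = {}

    def record_score(self, athlete: str, points: int) -> None:
        # Command: mutates state and deliberately returns nothing.
        self._totals[athlete] = self._totals.get(athlete, 0) + points

    def total_for(self, athlete: str) -> int:
        # Query: reads state without modifying it.
        return self._totals.get(athlete, 0)
```

A method such as a hypothetical `record_and_get_total` that did both would break the rule.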

Ledivin@lemmy.world · 1 year ago

It sounds like you’re more or less describing memoization? A method caches its inputs and outputs, often in some sort of dictionary, and uses that cache to return the stored result when it receives the same input again.
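For reference, a minimal hand-rolled memoization sketch in Python, assuming a one-argument function with hashable input; `memoize` and `slow_double` are hypothetical names. The key property is that the wrapped function's result is reused across calls with the same input. Python's standard library already provides this as `functools.lru_cache` and `functools.cache`.

```python
from typing import Callable, Dict, Hashable, List, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def memoize(fn: Callable[[K], V]) -> Callable[[K], V]:
    """Wrap a one-argument function so each distinct input is computed only once."""
    cache: Dict[K, V] = {}

    def wrapper(arg: K) -> V:
        if arg not in cache:
            cache[arg] = fn(arg)  # first call with this input: compute and store
        return cache[arg]  # later calls: return the stored result

    return wrapper


calls: List[int] = []


@memoize
def slow_double(n: int) -> int:
    calls.append(n)  # records every time the real computation runs
    return 2 * n
```

Calling `slow_double(4)` twice runs the body only once; that reuse of *outputs* is what distinguishes memoization from simply preparing inputs ahead of time.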

Cyno@programming.dev · 1 year ago

I’m not caching or reusing method results, however, and even the inputs are not necessarily cached for multiple uses. I’m just preparing all potentially required input data before the method is called, so I don’t have to do any loads within the method itself. That way the method is pure code logic with no DB interaction.

For example, imagine you have a method that scores the performance of an athlete. The common “pattern” in this legacy code base is to go through the logic and make a database load whenever you need something: maybe at the beginning you load the athlete, then his tournament records, then a few dozen lines later his medical records, then his amateur league matches, etc.
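To make the contrast concrete, here is a rough Python sketch of that legacy style. `FakeRepository`, the table names, and the scoring rules are all invented for illustration; the point is only that database reads are interleaved with the calculation, so the function cannot run without a repository.

```python
class FakeRepository:
    """Hypothetical stand-in for a database repository; counts every load."""

    def __init__(self, data: dict) -> None:
        self._data = data
        self.queries = 0

    def load(self, table: str, athlete_id: int):
        self.queries += 1  # in real code each call would be a database round trip
        return self._data[table][athlete_id]


def score_athlete_legacy(repo: FakeRepository, athlete_id: int) -> int:
    athlete = repo.load("athletes", athlete_id)
    score = athlete["base_rating"]

    tournaments = repo.load("tournaments", athlete_id)  # load in the middle of the logic
    score += 10 * sum(t["wins"] for t in tournaments)

    medical = repo.load("medical", athlete_id)  # ...and another one further down
    if medical["injured"]:
        score -= 20

    amateur = repo.load("amateur_matches", athlete_id)
    score += len(amateur)
    return score
```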

What I do is load all of this into a cache before the actual method call, and then pass it into the method as a data source. The method uses only the cache and does all the calculations in memory, and when it’s done, the result is in the cache as well. Then, outside of the method, I can either trigger a save to persist the result or abandon it. If I want to unit test it, I can simply fill a cache manually with my data and use it as the data source. (Usually you’d have to mock custom responses from the repository or something like that, inject an in-memory repository with the same data anyway, or just resign yourself to an integration test.)
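A minimal Python sketch of that approach, assuming a hypothetical `AthleteContext` as the "cache" and a hypothetical `load(table, athlete_id)` callable standing in for the data access layer (the scoring rules are invented). All loading happens in `load_context`; `score_athlete` only touches memory and writes its result back into the context.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

# Hypothetical loader signature: (table name, athlete id) -> row(s).
Loader = Callable[[str, int], Any]


@dataclass
class AthleteContext:
    """In-memory bundle of everything the scoring logic may need, plus its result."""

    base_rating: int
    tournament_wins: List[int]
    injured: bool
    amateur_matches: List[str]
    score: Optional[int] = None  # filled in by the calculation, not saved automatically


def load_context(load: Loader, athlete_id: int) -> AthleteContext:
    # All data access happens here, before the calculation starts.
    return AthleteContext(
        base_rating=load("athletes", athlete_id)["base_rating"],
        tournament_wins=[t["wins"] for t in load("tournaments", athlete_id)],
        injured=load("medical", athlete_id)["injured"],
        amateur_matches=list(load("amateur_matches", athlete_id)),
    )


def score_athlete(ctx: AthleteContext) -> None:
    # Pure in-memory logic: no repository, no database.
    score = ctx.base_rating + 10 * sum(ctx.tournament_wins)
    if ctx.injured:
        score -= 20
    score += len(ctx.amateur_matches)
    ctx.score = score
```

The caller then decides whether to persist `ctx.score` or simply drop the context. A unit test can skip `load_context` entirely and build the context by hand, e.g. `AthleteContext(base_rating=50, tournament_wins=[2, 1], injured=True, amateur_matches=["m1", "m2"])`, which scores 62 under these made-up rules.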

It’s like I’m “containerizing” the method, in a way? It’s a pretty simple concept, but I’m having trouble googling for it since I don’t know what to call it.

pohart@programming.dev · 1 year ago

What language are you using? It’s a good idea to limit DB calls, but maybe we can help with specific techniques idiomatic to your language.

Cyno@programming.dev · 1 year ago

Ah, sorry, I forgot to mention it here because I originally posted it on csharp and then crossposted. I’m specifically thinking about C#, EF, and .NET Core for web dev.

pohart@programming.dev · 1 year ago

I don’t know .NET, and I sometimes write quite janky code, but in this case I think I would preload everything I definitely need, locking the records I’m modifying. Then I’d use ConcurrentDictionary.GetOrAdd(TKey, Func<…>) to load values I might need only when they’re actually needed.
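A rough Python analogue of that suggestion, using a hypothetical `LazyCache` class since Python has no `ConcurrentDictionary`: data you definitely need is seeded up front, and optional data is loaded by a factory only on first access. One design difference worth knowing: .NET's `ConcurrentDictionary.GetOrAdd` runs the factory outside its internal locks, so under concurrency the factory can be invoked more than once for the same key (only one result is stored). This sketch instead holds a lock while the factory runs, which guarantees a single call per key but serializes loads.

```python
import threading
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LazyCache(Generic[K, V]):
    """Eagerly seeded values plus get-or-add loading for anything optional."""

    def __init__(self) -> None:
        self._values: Dict[K, V] = {}
        self._lock = threading.Lock()

    def seed(self, key: K, value: V) -> None:
        # Preload data you know you will definitely need.
        with self._lock:
            self._values[key] = value

    def get_or_add(self, key: K, factory: Callable[[K], V]) -> V:
        # Run the factory only if the key is missing; holding the lock means
        # it runs at most once per key, at the cost of serializing loads.
        with self._lock:
            if key not in self._values:
                self._values[key] = factory(key)
            return self._values[key]
```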

companero [he/him]@hexbear.net · 1 year ago

Sounds like the command pattern to me.
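In the command pattern, a request and the data it needs are wrapped in an object exposing a single `execute` method, so an invoker can create, queue, or run it later without knowing its internals. A minimal Python sketch, with `ScoreAthleteCommand` and `run_all` as hypothetical names and an invented scoring rule, shows how a command could carry preloaded inputs:

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


class Command(Protocol):
    def execute(self) -> None: ...


@dataclass
class ScoreAthleteCommand:
    """Hypothetical command: carries its preloaded inputs and runs on demand."""

    base_rating: int
    tournament_wins: int
    result: Optional[int] = None

    def execute(self) -> None:
        self.result = self.base_rating + 10 * self.tournament_wins


def run_all(commands: List[Command]) -> None:
    # The invoker only knows about execute(), not what each command does.
    for command in commands:
        command.execute()
```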

pohart@programming.dev · 1 year ago

Regardless of what pattern it is, you have a clear performance need and a testable implementation. That’s a win.

Beyond looking for a pattern, I’d look at what you’re doing to make sure you’re not loading a ton of extra dependencies if you know you won’t use them.

Also, you generally want a database transaction to be one logical unit of work that either all commits or all rolls back together. If you’re combining multiple transactions into one, that’s likely what you want, but be aware that you might be holding locks for longer, so you might be introducing contention.
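As an illustration of a single unit of work, here is a small sketch using Python's built-in `sqlite3` module rather than EF (the `scores` table and `save_results` function are hypothetical). Using the connection as a context manager commits if the block finishes and rolls back if an exception escapes, so either every row is saved or none is. For comparison, EF Core applies all changes from a single `SaveChanges` call inside a transaction by default when the provider supports transactions.

```python
import sqlite3
from typing import List, Tuple


def save_results(conn: sqlite3.Connection, results: List[Tuple[int, int]]) -> None:
    """Persist all scores as one unit of work: all rows commit, or none do."""
    with conn:  # commits on normal exit, rolls back if an exception propagates
        for athlete_id, score in results:
            if score < 0:
                # Simulated failure partway through the batch.
                raise ValueError(f"invalid score for athlete {athlete_id}")
            conn.execute(
                "INSERT INTO scores (athlete_id, score) VALUES (?, ?)",
                (athlete_id, score),
            )
```

If the second batch fails on its last row, the rows inserted earlier in that same batch are rolled back too.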

By the same token, make sure you’ve got records locked if you need them locked. If you had atomic updates before, or your first update locked the records you needed, you may need to lock records explicitly to keep your database consistent.
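A sketch of taking a lock explicitly before reading, again using Python's `sqlite3` for a self-contained demo (`rescore_locked`, `writer_is_blocked`, `demo`, and the `athletes` table are hypothetical). SQLite locks the whole database rather than individual records, so this is only a coarse analogue; server databases offer row-level options such as PostgreSQL's `SELECT ... FOR UPDATE` or SQL Server's `UPDLOCK` table hint. Here `BEGIN IMMEDIATE` acquires the write lock at the start of the transaction, so no other connection can begin writing between the read and the update.

```python
import os
import sqlite3
import tempfile
from typing import Callable, List, Tuple


def open_conn(path: str) -> sqlite3.Connection:
    # isolation_level=None: no implicit BEGINs; we issue transaction statements ourselves.
    # timeout=0: report "database is locked" immediately instead of waiting.
    return sqlite3.connect(path, isolation_level=None, timeout=0)


def writer_is_blocked(path: str) -> bool:
    """Return True if another connection currently holds the write lock."""
    probe = open_conn(path)
    try:
        probe.execute("BEGIN IMMEDIATE")
    except sqlite3.OperationalError:
        return True
    else:
        probe.execute("ROLLBACK")
        return False
    finally:
        probe.close()


def rescore_locked(conn: sqlite3.Connection, athlete_id: int,
                   compute: Callable[[int], int]) -> int:
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        rows = conn.execute("SELECT score FROM athletes WHERE id = ?", (athlete_id,)).fetchall()
        new = compute(rows[0][0])
        conn.execute("UPDATE athletes SET score = ? WHERE id = ?", (new, athlete_id))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return new


def demo() -> Tuple[int, bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = open_conn(path)
        try:
            conn.execute("CREATE TABLE athletes (id INTEGER PRIMARY KEY, score INTEGER)")
            conn.execute("INSERT INTO athletes VALUES (1, 10)")
            seen: List[bool] = []

            def compute(old: int) -> int:
                seen.append(writer_is_blocked(path))  # checked while our lock is held
                return old + 5

            new = rescore_locked(conn, 1, compute)
            blocked_after = writer_is_blocked(path)  # lock released after COMMIT
        finally:
            conn.close()
    return new, seen[0], blocked_after
```

The trade-off is the one mentioned above: the earlier the lock is taken, the longer other writers wait.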