Roland's Friendica Network (profile)

JPA causes tons of SQL queries

I'm currently developing (as a free-time project) a JavaServer Faces application which uses #JPA (very common to do so) over EJBs (Enterprise JavaBeans). When I turned on logging of SQL queries, I could hardly believe what I saw: for each entity (a "row" in a database table) a separate query was issued. That means when I have 1,000 rows, roughly the same number of queries is issued.

This can become a nightmare when "only" 100 users are on the server. Does anyone know how to configure JPA (I use #Payara) to issue only one SQL query per table?

Of course I have used JPQL (Java Persistence Query Language), and without any WHERE clause, so normally the whole table should be fetched in one go. But this is not the case for linked tables (e.g. with @OneToOne), where each referenced entity is fetched separately.

So is there a way to prevent this from happening? RAM is not an issue here: 16 GB are installed and the data will not grow that much.
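
To make the arithmetic concrete, here is a minimal sketch in plain Java (no JPA involved) that counts queries against a toy in-memory "database". FakeDb, User, Address and FetchDemo are hypothetical names made up for illustration; they are not part of the application described above. The sketch only contrasts the two access patterns: one lookup per referenced entity versus one query per table with the relation resolved in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a database that counts how many "queries" it receives.
// All names in this sketch are hypothetical.
class FakeDb {
    static class Address {
        final long id;
        final String city;
        Address(long id, String city) { this.id = id; this.city = city; }
    }

    static class User {
        final long id;
        final String name;
        final long addressId; // plays the role of the @OneToOne foreign key
        User(long id, String name, long addressId) {
            this.id = id;
            this.name = name;
            this.addressId = addressId;
        }
    }

    private final Map<Long, Address> addresses = new HashMap<>();
    private final List<User> users = new ArrayList<>();
    int queryCount = 0;

    FakeDb(int rows) {
        for (long i = 1; i <= rows; i++) {
            addresses.put(i, new Address(i, "City " + i));
            users.add(new User(i, "User " + i, i));
        }
    }

    // Corresponds to: SELECT * FROM users
    List<User> findAllUsers() {
        queryCount++;
        return new ArrayList<>(users);
    }

    // Corresponds to: SELECT * FROM addresses WHERE id = ?
    Address findAddressById(long id) {
        queryCount++;
        return addresses.get(id);
    }

    // Corresponds to: SELECT * FROM addresses
    Map<Long, Address> findAllAddresses() {
        queryCount++;
        return new HashMap<>(addresses);
    }
}

class FetchDemo {
    // One query for the users, then one more per referenced address (N + 1).
    static int loadOneByOne(FakeDb db) {
        db.queryCount = 0;
        for (FakeDb.User user : db.findAllUsers()) {
            db.findAddressById(user.addressId);
        }
        return db.queryCount;
    }

    // One query per table; the relation is resolved in memory (2 queries).
    static int loadPerTable(FakeDb db) {
        db.queryCount = 0;
        Map<Long, FakeDb.Address> byId = db.findAllAddresses();
        for (FakeDb.User user : db.findAllUsers()) {
            byId.get(user.addressId);
        }
        return db.queryCount;
    }

    public static void main(String[] args) {
        FakeDb db = new FakeDb(1000);
        System.out.println("one by one: " + loadOneByOne(db));
        System.out.println("per table:  " + loadPerTable(db));
    }
}
```

With 1,000 rows the first pattern issues 1,001 queries and the second issues 2, which is the gap the question is about.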
#JPA #Payara jpa performance sql tweaks
Limit the amount of fetched rows with pagination for example?
Maybe I need to explain more about my program. On initialization of the backing bean (that is, when an instance is created and the web container creates a wrapper class around it) I load all rows of the table from the EJB into the controller to cache them there (see #JCache #JS107 ).

This gives a great performance improvement because data is not loaded from the database with every request (POST/GET) but taken from the cache. This is what I call asynchronous loading of data. Most PHP applications, however, are synchronous, meaning that with every click data is fetched from the database and scripts are loaded and parsed (OpCache may improve this a bit) and are "forgotten" after the request has finished.

But with a JSF application, the "controller" (backing bean) remains instanced (and wrapped) in the web container's heap until you redeploy the application or restart the web container.

So of course I want to use that advantage of having everything loaded at "all" times (not right from the beginning, although you can implement the ServletContextListener interface to "hook" into the initialization phase). Still, the issue remains.

As you can see, these SQL queries are not performed on every request (which is very good for overall performance), but there are still really a lot of them. Just imagine 1,000 users on your web server (you may need a cluster then) and several tens of thousands of records.

Sure, when you cache them all in RAM, you need a lot of RAM. The #Payara application server uses #Hazelcast here to provide a distributed heap (cluster members contribute their heap to the cluster) and JCache (JSR107).

So what now? I still find this a bit too much. Currently I use eager fetch mode. I could switch to lazy, but that only delays the problem, as foreign entities are then loaded lazily (on getter invocation).
#JSR107 is the right one ...
So limiting the number of fetched rows by pagination only works with synchronous applications, where the database is queried on each request. This is not the case with asynchronous applications, where the whole data set is already loaded (and cached) and SELECT statements are (normally) reduced to a minimum.
I think you have the terminology backwards. In PHP-Javascript web applications asynchronous means the data is fetched when it is needed, after the main page is loaded. This gave the term AJAX ("Asynchronous JavaScript + XML"). On the contrary, synchronous would mean the data is loaded on page load. For example, on Friendica, the network page is loaded synchronously, but then the post refresh or the infinite scroll are loaded asynchronously. I don't think it matters for your problem, but it may confuse other people.

Back to your issue. Initial data load takes more RAM. This actually was the subject of a recent performance improvement in Friendica: should we load all the configuration values at once at the start of each script, or should we query each individual value? It turns out RAM consumption is rarely the bottleneck in a PHP application because of the "fire and forget" architecture you described: the script is loaded, executed, forgotten. So an increase in memory consumption wouldn't be significant, especially if it implies a reduction in script execution time, because the memory consumption spike will be shorter. However, for a long-running server application, using more RAM probably is a bigger deal.

No matter what, here are the general axes of improvement to reduce the number of requests:
  • Write your own SQL query: I don't know if it is possible, especially in Java where everything seems to be broken down into atomic parts.
  • Limit the initial data load, use lazy loading beyond that: Like you said, the initial data load doesn't scale. If you can cap the size of the initial data load, you probably can use a dynamic cache that checks whether a row is already cached before querying the DB directly.
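
The second point could look roughly like the following sketch: a size-capped cache that checks for the row first and only falls back to a loader (standing in for the DB lookup) on a miss. BoundedRowCache and its loader function are hypothetical names for illustration, and least-recently-used eviction is just one possible policy, picked here because java.util.LinkedHashMap supports it out of the box.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Size-capped cache: returns a cached row when present, otherwise asks the
// loader (a stand-in for a database lookup) and evicts the least recently
// used row once the cap is exceeded. Hypothetical class for illustration.
class BoundedRowCache<K, V> {
    private final Map<K, V> rows;
    private final Function<K, V> loader;
    private int loads = 0;

    BoundedRowCache(final int maxRows, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.rows = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxRows;
            }
        };
    }

    synchronized V get(K key) {
        V row = rows.get(key);
        if (row == null) {
            // Cache miss: load the row once and remember it.
            // (A loader that returns null is asked again on the next call.)
            row = loader.apply(key);
            loads++;
            rows.put(key, row);
        }
        return row;
    }

    // Number of times the loader was actually invoked.
    synchronized int loadCount() {
        return loads;
    }

    synchronized int size() {
        return rows.size();
    }
}
```

A call such as new BoundedRowCache<Integer, String>(1000, id -> lookUpRow(id)) would keep at most 1,000 rows in memory, where lookUpRow is whatever single-row query the application already has (again an assumed name).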
I understand and know what AJAX means and I'm not targeting PHP or labeling it as bad and "look how cool this or that is". No, that is not my style.

Back to my issue: #JPA (Java Persistence API) fetches records from the database, creates an instance of your entity class for each record and wraps it into a proxy class. That proxy class is then compiled, and most JPA implementations cache them (the entity manager does this). This has nothing to do with the application or with whether it uses AJAX. My main goal is to reduce invocations of EJB business methods, as these are "expensive" (a lot of code needs to be executed).

Just imagine, you can deploy your model classes on another server or even in another data center, and from your backing bean's perspective you have to change nothing at all, as everything is encapsulated away for you (most #JavaEE application servers use #CORBA for serializing and deserializing data).

This means that on one server (or cluster, it doesn't make a difference in Java code) you have your web "controllers" (they are not called controllers, backing beans is the right term for them) and on the other your model classes are running. This means one thing: distributed application load. :-) BTW: Amazon is running on Java Server Faces, when you hit the "order" button, EJBs are being invoked.

Okay, I'm going too far off-topic. With "synchronous" I mean that with every request (even AJAX requests, as they are basically HTTP requests, too) data is fetched from the database. With "asynchronous" this is not the case. By that I mean that data is not fetched from the database on each request, but maybe from a cache (that needs to stay updated, of course).
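
As a rough illustration of what I mean, here is a minimal sketch in plain Java: everything is read once, later reads come from memory, and writes go to the store first and then refresh the cached copy so it stays updated. RowStore, CountingStore and PreloadedCache are hypothetical names for this sketch only. In the real application the store would be the EJB layer, and in a cluster the other nodes would also have to learn about the update, which this sketch does not cover.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface standing in for the EJB / database layer.
interface RowStore {
    Map<Long, String> loadAll();       // one bulk read
    void save(long id, String value);  // one write
}

// In-memory store that counts its reads and writes, for demonstration only.
class CountingStore implements RowStore {
    final Map<Long, String> table = new ConcurrentHashMap<>();
    int reads = 0;
    int writes = 0;

    @Override
    public Map<Long, String> loadAll() {
        reads++;
        return new ConcurrentHashMap<>(table);
    }

    @Override
    public void save(long id, String value) {
        writes++;
        table.put(id, value);
    }
}

// Loads everything once, then serves reads from memory. Writes go to the
// store first and then update the cached copy so it stays current.
class PreloadedCache {
    private final RowStore store;
    private final Map<Long, String> rows = new ConcurrentHashMap<>();

    PreloadedCache(RowStore store) {
        this.store = store;
        rows.putAll(store.loadAll()); // the only bulk read
    }

    String get(long id) {
        return rows.get(id); // never touches the store
    }

    void put(long id, String value) {
        store.save(id, value);
        rows.put(id, value);
    }
}
```

However many times get() is called, the store sees a single read; only put() reaches it again.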
I didn't think you looked down on PHP, I was just reiterating the structure I personally know inside and out, since I don't know Java much.

Anyway, I probably would go the caching way, it seems more in the spirit of Java (adding a middle block between two atomic blocks) than trying to coerce an atomic block into doing a non-atomic task.
Okay. :-) I have found something programmatic (with annotations) for this, but it looks like a bit of overhead when you have 20+ entities. What I would prefer is a third fetch type ALL next to the existing EAGER and LAZY. That would have to go into the JPA specification, of course, which unified all persistence providers such as #eclipselink, #hibernate, #datanucleus and so on. Before it, every provider did things its own way.

This is why I like the JPA, because it is dbms-independent and provider-independent at the same time. No need to worry if your data is stored in a SQLite, MsSQL, Oracle DB or good-old MySQL or even "exotic" database systems like MongoDB.
I’ve always felt that “things”-independent software often ends up being the lowest common denominator between the abstracted products, and that very few actual migrations occur to justify the loss in features. Sure, on paper not having to care about which DBMS is used is great, but this often erases any advantage of choosing a specific DBMS over another, and migration between DBMS is so expensive that the DBMS-independent software will often be used with a single DBMS for the project lifetime.

Have you seen such a pattern professionally?
Remember which era #JPA came from: in 2006 there was not much #BigData or the like. And yes, there are annotations now that will help model your entities. Maybe I'll take that approach.
Ah, @NamedEntityGraph(s) were the annotations I was looking for.
For PHP, Doctrine is using annotations as well, but I still don’t like it. I believe the foreign relationships should be defined once, in the database schema, and the application should automatically take them into account, in a long-cached manner if it is too expensive to query the schema on each load.
Yes, that is what I mean with synchronous loading: on each load/request all over again, no persisted caching (except when you use memcache/redis, but that is not the solution for all scenarios). One thing that could be interesting for #PHP people who don't use Java is the PHP Application server, which aims to be a #JavaEE application server written in PHP. There, too, the application is loaded at all times and entities are cached.
It's foreign to me because I don't see PHP developers suddenly take the Java infrastructure approach without switching to Java altogether, and conversely, don't see Java developers suddenly learn PHP and reproduce the same infrastructure as well.
I have learned both languages. :-) So I can live in both worlds.
If you get the freedom of choosing a technology for a personal project, which one would you naturally lean towards?
Hard to say as you cannot compare PHP and Java as they are fundamentally different languages. PHP is type-lazy (allows no type at all) and Java is type-safe (unless you do "unsafe casts").

To answer your question, I like both, if that satisfies your question. :-)
My question was more about ease of use and programming fun versus professional obligation.

No performance improvement by memcache or redis?

I have now tried both #memcache and #redis to set single keys in a cache. It seems both perform very poorly compared to #in-progress caching.

So really no improvement if they are being used? And pipelining in redis won't help here much as I really have to "atomically" set/test/get key-value pairs.

So my in-progress cache, as follows, seems to be the fastest:

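function someCachedFooValue ($someValue) {
	// Assumption: the snippet did not show where $cache lives, so a
	// function-local static is used to keep it between calls.
	static $cache = [];
	if (!isset($cache[__FUNCTION__][$someValue])) {
		$cache[__FUNCTION__][$someValue] = doSomethingFooExpensive($someValue);
	}

	return $cache[__FUNCTION__][$someValue];
}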

Here I want to cache the value from the expensive (long-running) function doSomethingFooExpensive() if it is not already cached.

This way seems to be the fastest way, sadly.
#memcache #redis #in-progress caching in-progress memcache performance redis