Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Welcome to the CollectiveAccess support forum! Here the developers and community answer questions related to use of the software. Please include the following information in every new issue posted here:

  1. Version of the software that is used, along with browser and version

  2. If the issue pertains to Providence, Pawtucket or both

  3. What steps you’ve taken to try to resolve the issue

  4. Screenshots demonstrating the issue

  5. The relevant sections of your installation profile or configuration including the codes and settings defined for your local elements.


If your question pertains to data import or export, please also include:

  1. Data sample

  2. Your mapping


Answers may be delayed for posts that do not include sufficient information.

Optimizing search indexing

I'm having a hard time optimizing my search_indexing.conf file. The API search I'm performing makes CA Providence execute SQL queries in a loop. I'm up to 37s response time and according to blackfire, the mysqli_query() function is called approx. 2200 times. (Which is ridiculous). These queries originate from the getRelatedItems method 

I've debugged around the class SearchResult#L1440 and the related components boil down to this list :
["ca_collections","idno"]
["ca_collections","preferred_labels"]
["ca_objects","idno"]"
["ca_objects_x_collections","type_id"]"
["ca_objects","preferred_labels"]"
["ca_objects_x_objects","type_id"]"
["ca_objects_x_occurrences","type_id"]"
["ca_objects_x_places","type_id"]"
["ca_objects_x_entities","type_id"]
["ca_objects_x_storage_locations","type_id"]
["ca_entities","idno"]
["ca_entities","preferred_labels"]
["ca_places","idno"]
["ca_places","preferred_labels"]
["ca_occurrences","idno"]
["ca_occurrences","preferred_labels"]
["ca_storage_locations","idno"]
["ca_storage_locations","preferred_labels"]

My search_indexing.conf is still the default one. What do I need to do to get the indexing optimized for these kind of search requests? 

Thank you in advance.

Comments

  • Can you provide details about the API call you're performing, the search and the approximate size of the data set?
  • I have about 163388 records in my ca_objects table to give you an idea. 

    My API call looks like this 

    curl -XPOST -d {  "bundles": {
        "ca_objects.preferred_labels": {},
        "ca_objects.ugent_collection": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.documentation.doc_type": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.documentation.doc_title": {},
        "ca_objects.documentation.doc_url": {},
        "ca_objects.documentation.doc_lastvisit": {},
        "ca_objects.documentation.doc_MediamosaID": {},
        "ca_objects.documentation.doc_publish": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.hierarchy.idno": {},
        "ca_objects.hierarchy.preferred_labels.name": {},
        "ca_collections.idno": {},
        "ca_collections.preferred_labels": {},
        "ca_objects_x_collections.type_id": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.related.idno": {},
        "ca_objects.related.preferred_labels": {},
        "ca_objects_x_objects.type_id": {
          "convertCodesToDisplayText": true
        },
        "ca_entities.idno": {},
        "ca_entities.preferred_labels": {},
        "ca_objects_x_entities.type_id": {
          "convertCodesToDisplayText": true
        },
        "ca_places.idno": {},
        "ca_places.preferred_labels": {},
        "ca_objects_x_places.type_id": {
          "convertCodesToDisplayText": true
        },
        "ca_occurrences.idno": {},
        "ca_occurrences.preferred_labels": {},
        "ca_objects_x_occurrences.type_id": {
          "convertCodesToDisplayText": true
        },
        "ca_storage_locations.idno": {},
        "ca_storage_locations.preferred_labels": {},
        "ca_objects_x_storage_locations.type_id": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.object_name": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.superfamily": {},
        "ca_objects.family": {},
        "ca_objects.prot": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.creator.creator_role": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.creator.creator_name": {},
        "ca_objects.creator.creator_extent": {},
        "ca_objects.classificationTerm": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.subject_terms": {},
        "ca_objects.usage": {},
        "ca_objects.ownership_history": {},
        "ca_objects.components": {},
        "ca_objects.technique": {},
        "ca_objects.creation_style": {},
        "ca_objects.skills": {},
        "ca_objects.institution": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.concordance": {},
        "ca_objects.genus": {},
        "ca_objects.species": {},
        "ca_objects.scientificNameAuthorship": {},
        "ca_objects.specimentype": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.specimendetails": {},
        "ca_objects.discovery_date": {},
        "ca_objects.discovery_finder": {},
        "ca_objects.kingdom": {},
        "ca_objects.phylum": {},
        "ca_objects.subphylum": {},
        "ca_objects.superclass": {},
        "ca_objects.class": {},
        "ca_objects.subclass": {},
        "ca_objects.infraclass": {},
        "ca_objects.superorder": {},
        "ca_objects.order": {},
        "ca_objects.suborder": {},
        "ca_objects.infraorder": {},
        "ca_objects.cites": {},
        "ca_objects.accessrestrict": {},
        "ca_objects.reproduction": {},
        "ca_objects.acqinfo_note": {},
        "ca_objects.accsta": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.acct": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.idql": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.pid": {},
        "ca_objects.don": {},
        "ca_objects.ipen": {},
        "ca_objects.alt": {},
        "ca_objects.altx": {},
        "ca_objects.cou": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.sru": {},
        "ca_objects.scopecontent": {},
        "ca_objects.extent_medium": {},
        "ca_objects.arrangement": {},
        "ca_objects.genreform": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.physdesc": {},
        "ca_objects.langmaterial": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.entityOrigin": {},
        "ca_objects.entityType": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.custodhist": {},
        "ca_objects.originalsloc": {},
        "ca_objects.altformavail": {},
        "ca_objects.techaccessrestrict": {},
        "ca_objects.searchtips": {},
        "ca_objects.content.content_type": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.content.content_description": {},
        "ca_objects.content.content_author": {},
        "ca_objects.height.height_value": {},
        "ca_objects.height.height_part": {},
        "ca_objects.width.width_value": {},
        "ca_objects.width.width_part": {},
        "ca_objects.depth.depth_value": {},
        "ca_objects.depth.depth_part": {},
        "ca_objects.diameter.diameter_value": {},
        "ca_objects.diameter.diameter_part": {},
        "ca_objects.weight.weight_value": {},
        "ca_objects.weight.weight_part": {},
        "ca_objects.volume.volume_value": {},
        "ca_objects.volume.volume_part": {},
        "ca_objects.scale.scale_value": {},
        "ca_objects.scale.scale_part": {},
        "ca_objects.inscription.inscription_type": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.inscription.inscription_transcription": {},
        "ca_objects.inscription.inscription_location": {},
        "ca_objects.inscription.inscription_notes": {},
        "ca_objects.model_series.model_series_type": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.model_series.model_series_nr": {},
        "ca_objects.color.color_extent": {},
        "ca_objects.color.color_name": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.material.material_name": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.material.material_extent": {},
        "ca_objects.target_audience.target_educationLevel": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.target_audience.target_notes": {},
        "ca_objects.specimens.specimens_nr": {},
        "ca_objects.specimens.specimens_sex": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.specimens.specimens_lifeStage": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.specimens.specimens_typeStatus": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.discovery_place.discovery_place_name": {},
        "ca_objects.discovery_place.discovery_place_coordin": {},
        "ca_objects.discovery_place.discovery_place_description": {},
        "ca_objects.discovery_place.discovery_place_type": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.range.range_description": {},
        "ca_objects.range.range_faunaBelgica": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.alternative_labels.alternative_labels_title": {},
        "ca_objects.alternative_labels.alternative_labels_type": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.otherSpecimens.otherSpecimens_scientificName": {},
        "ca_objects.otherSpecimens.otherSpecimens_nr": {},
        "ca_objects.otherSpecimens.otherSpecimens_sex": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.otherSpecimens.otherSpecimens_lifeStage": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.otherSpecimens.otherSpecimens_typeStatus": {
          "convertCodesToDisplayText": true
        },
        "ca_objects.unitdate": {},
        "ca_objects.brief_physcesc": {},
        "ca_objects.discovery_context": {},
        "ca_objects.discovery_method": {},
        "ca_objects.synonyms": {},
        "ca_objects.subfamily": {},
        "ca_objects.individualCount_mainSpecies": {},
        "ca_objects.individualCount_otherSpecies": {}
      }
    }
    ' http://localhost:8070/providence/service.php/find/ca_objects?q={keyword}&lang=nl_NL

    and it matches about 338 objects.
  • Thanks. Would it be possible to send a database dump, at least with partial data? I'm wondering if this is being triggered by something in the structure of your profile, and having your setup will make a much easier see what's going on  That many queries being triggered doesn't make sense. (We have Blackfire here too).
  • Thanks for asking, but I'm afraid that won't be possible. And basically, there's nothing special going on with the setup. I think the post body is just amazingly big. I think the problem is that not all requested details are indexed in ElasticSearch, that's why there's a whole lot of additional querying going on, a couple per result row. 

    So I was wondering what I can do to optimize the search_indexing.conf to prevent the SQL querying from happening. If I eliminate all the _x_ bundles from the request body, the load time decreases by 65%.
  • I don’t believe it’s the indexing. It’s the something in the way the service pulls related records once it has established the initial result set from the index. At least that’s my current guess. I’ll try to simulate your set up using a natural history data set I have that is not unlike what yours appears to be from the api call.
  • Thanks Seth. It all kinda happens in the SearchResult class , the get method on line 1440. 
  • Well obviously it originates from the SearchJSONService on line 162, because that's where the loop from the index results is located.
  • Yes and it goes down into the basement from there.
  • Hi Seth,

    Still on this issue. Do you think I can change my search_indexing.conf to store the related data, so that SearchResult::prefetchRelated isn't gonna get called anymore?

    These are the fields that the method is called for.

    ["ca_collections","idno"]
    ["ca_collections","preferred_labels"]
    ["ca_objects","idno"]"
    ["ca_objects_x_collections","type_id"]"
    ["ca_objects","preferred_labels"]"
    ["ca_objects_x_objects","type_id"]"
    ["ca_objects_x_occurrences","type_id"]"
    ["ca_objects_x_places","type_id"]"
    ["ca_objects_x_entities","type_id"]
    ["ca_objects_x_storage_locations","type_id"]
    ["ca_entities","idno"]
    ["ca_entities","preferred_labels"]
    ["ca_places","idno"]
    ["ca_places","preferred_labels"]
    ["ca_occurrences","idno"]
    ["ca_occurrences","preferred_labels"]
    ["ca_storage_locations","idno"]
    ["ca_storage_locations","preferred_labels"]
  • That's probably not going to make a difference but you can try. I'm looking into this as time allows, but time hasn't allowed as much as I would have liked to now.
Sign In or Register to comment.