Advanced Search - Where do I begin?


MuchDev

Hey friends! Well, I am finally at the part of my development where I need to begin thinking hard about implementing some form of advanced search. I am a bit of a newbie, so please forgive me for asking such a general question. Up until now I have been able to see quite clearly where this project is going. I have all of the items defined that I will be using, and I also have all of my main sections mostly fleshed out. What I do not have is anything other than ajax searching and the basic search.php template (with a couple of extra fields defined). I have read several forum posts about specific advanced search features, but I was hoping for something a bit more general, and I would just code from there.

What I would like some insight on is: where should I start in the development of an advanced search interface, what methodologies do people use when building a site-wide search, and finally, are there any really good tools that would make my life easier in this process? Any info is extremely appreciated.


A highly categorised site is a different beast than a textual 'animal' site. So what 'advanced' means depends a lot on the context of your site.

For a big news portal (mainly textual), I used the Google search API to search the site. This way all textual content is searched, even in PDFs. The results I get back from Google (JSON) I store for 1 hour with MarkupCache to reduce requests to Google. I use the URLs from the results to get back to the Page objects. This way I let Google do the search and ProcessWire the presentation of the results (showing thumbnails, headlines, descriptions, etc.).
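
The "store the results for 1 hour" idea can be sketched independently of any CMS. This is a minimal Python sketch and not the poster's code (that runs in PHP with ProcessWire's MarkupCache). The `SearchCache` class, the `fetch` callable standing in for the remote search request, and the injectable `clock` are all hypothetical names used for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple

CACHE_SECONDS = 60 * 60  # the post keeps results for 1 hour


class SearchCache:
    """Tiny in-memory cache keyed by query string (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],
        ttl: float = CACHE_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch  # stand-in for the remote search request
        self._ttl = ttl
        self._clock = clock  # injectable so the expiry logic can be tested
        self._store: Dict[str, Tuple[float, List[str]]] = {}

    def search(self, query: str) -> List[str]:
        now = self._clock()
        hit = self._store.get(query)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no remote request is made
        urls = self._fetch(query)  # missing or expired: ask the remote API
        self._store[query] = (now, urls)
        return urls
```

The returned URLs would then be mapped back to pages on the site, so the external service only does the matching and the site keeps control of how results are displayed.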

But if you have a highly structured site with a lot of categories, I could imagine you building a chain of select boxes. That kind of search is a perfect fit for ProcessWire.


We have used Apache Solr as a search index server. Extremely fast and extensive, but not very easy to implement.

But first of all, it depends on what sort of search interface you want to offer your visitors.

Just one search field, or more choices (like on Ryan's skyscraper site)?


@BernhardB, look at the following structure:

vehicles
    |
    +-- Cars
    |     |    
    |     +-- Coupe
    |     |       
    |     +-- SUV
    |     |    
    |     +-- Crossover
    |
    +-- Bikes
    |     |    
    |     +-- City bike
    |     |    
    |     +-- Comfort bike
    |     |    
    |     +-- Trekking bike
    |
    +-- Boats
          |    
          +-- Canoes
          |    
          +-- Rafts
          |    
          +-- Yachts
    

Say someone wants to search for an SUV.

The first <select> is Vehicles, which lists all vehicles: Cars, Bikes and Boats.

When Cars is selected, query all car types and present the user with a second select listing them.

When the user selects a type (here SUV), present the user with all the SUVs found.
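
The chain of selects above can be sketched as plain data plus three lookups. This is only an illustration in Python, not ProcessWire code: the `VEHICLES` mapping mirrors the tree in this post, while `ITEMS` and the function names are made up for the example.

```python
from typing import Dict, List

# Category tree from the example above: vehicle class -> types.
VEHICLES: Dict[str, List[str]] = {
    "Cars": ["Coupe", "SUV", "Crossover"],
    "Bikes": ["City bike", "Comfort bike", "Trekking bike"],
    "Boats": ["Canoes", "Rafts", "Yachts"],
}

# Hypothetical inventory; each item records the type it belongs to.
ITEMS = [
    {"title": "Item A", "type": "SUV"},
    {"title": "Item B", "type": "Coupe"},
    {"title": "Item C", "type": "SUV"},
    {"title": "Item D", "type": "Rafts"},
]


def first_select() -> List[str]:
    """Options for the first <select>: all vehicle classes."""
    return list(VEHICLES)


def second_select(vehicle: str) -> List[str]:
    """Options for the second <select>, depending on the first choice."""
    return VEHICLES.get(vehicle, [])


def results(vehicle: str, vehicle_type: str) -> List[dict]:
    """Items matching the final choice; empty if the pair is inconsistent."""
    if vehicle_type not in second_select(vehicle):
        return []
    return [item for item in ITEMS if item["type"] == vehicle_type]
```

Each select only ever asks for the children of the previous choice, which is why this maps so naturally onto a page tree.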


Thanks for all the responses! All of this is really helping me wrap my head around this. So the site that I am building is for an art gallery that my partner works for. On the site will be a combination of blogging for events, PDFs for catalogs (which are just downloadable and not necessarily searchable), exhibitions which hold artworks and text write-ups, and all of their current inventory for browsing. The site is laid out very hierarchically, with all the sections nested like so:

(I'll follow your formatting example)

Exhibitions
   |
   +---Antique
   |      |
   |   exhibition---+
   |                +--artwork  (same item type as in exhibitions)
   |                +--artwork
   |
   +---Modern
   |      |
   |   exhibition---+
   |                +--artwork
   |                +--artwork
   |
   +---Contemporary
          |
       exhibition---+
                    +--artwork
                    +--artwork

Artists
   |
   +---Antique
   |      |
   |   section------+
   |                +--artwork
   |                +--artwork
   |
   +---Modern
   |      |
   |   artist-------+
   |                +--artwork
   |                +--artwork
   |
   +---Contemporary
          |
       artist-------+
                    +--artwork
                    +--artwork

News
   |
   +---Antique
   |      |
   |   article------+
   |                +--page1
   |
   +---Modern
   |      |
   |   article------+
   |                +--page1 - html from ckedit
   |                +--page2 - html from ckedit
   |
   +---Contemporary
          |
       article------+
                    +--page1 - html from ckedit
                    +--page2 - html from ckedit


So what really needs to be filterable is all of the special fields within the artwork, like price, medium, dimensions, origin, etc., as well as the text fields that hold specific information about events. Artists contain basic information and a thumbnail for a preview, as well as biographical information (yet more text).

Would you recommend google over a processwire solution for my example?

martijn, could you please explain what you mean by that?

@uprightbass360

maybe the elastic search module is interesting for you?

Thanks BernhardB, elastic does look very powerful, but I don't know if I have the chops to pull it off. After a two-week hair-pulling session trying to get Solr to work on another project, I am a little averse to setting this up. Do you happen to know much about elastic?

Hey, thanks reems. I originally thought about setting up Solr, as it seems to have exactly what I wanted, but got completely confused. I applaud you for having the brains to get it working. I really wish there was a more beginner-friendly way to implement it, but everything I've found was exactly as you said: not easy to implement. What I am hoping to present the user with is a simple form with one input box, exactly like he has on the skyscraper site, but combined with the ability to search text without overloading the user.


Would you recommend google over a processwire solution for my example?

I would not recommend one over the other. There are a lot of factors that will affect the decision: time, money, experience with a given search technique, amount of traffic, etc.

Build what makes sense for your situation. And you don't need to choose; maybe you're better off with 2 types of search on your site.

But in your page tree, could the artwork under an exhibition and the artwork under an artist be the same in certain situations?

Do you then use two pages for the same artwork, or do you use the strength of ProcessWire to present it the way you show it now?


Hey Martijn - this sounds very interesting - when you have time, it would be cool to see your code - this sounds like a module waiting to happen :)


I appreciate your straightforwardness. It's really nice to see quality direct answers. As far as doing 2 search engines, I fear that if I were to attempt this I would end up with something that would not be very usable, unless I could find an example of someone who had already done it successfully. Most everything that I am using or building is based on others' examples and books. Now... if this were to be a module, I think I could work some magic ;)

This is a fantastic point, reems! I should have thought of this. Right now, if you wanted an item in both categories you would need to make a duplicate page. To get around this, I suppose I will add a page field to exhibitions so that artwork can be added from other areas of the site. How does that sound?

Hey Martijn - this sounds very interesting - when you have time, it would be cool to see your code - this sounds like a module waiting to happen :)

Yes! What he said! Module, yes :)


...This is a fantastic point, reems! I should have thought of this. Right now, if you wanted an item in both categories you would need to make a duplicate page. To get around this, I suppose I will add a page field to exhibitions so that artwork can be added from other areas of the site. How does that sound?...

Doesn't sound right  ;):) . No need to duplicate pages. Where an item can belong to more than 1 category, the 'normal' way is to create categories as separate pages, make them selectable in the items pages using a page reference field....This guy explains it better, have a read here instead: https://processwire.com/talk/topic/3579-tutorial-approaches-to-categorising-site-content/
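
To make the "reference, don't duplicate" idea concrete, here is a small data-model sketch in Python. It is not ProcessWire code and says nothing about how page reference fields are implemented; `Category`, `Artwork`, `in_category` and the sample names are hypothetical and only show one item belonging to several categories.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Category:
    """Stands in for a category page."""
    name: str


@dataclass
class Artwork:
    """Stands in for an item page holding references to category pages."""
    title: str
    categories: List[Category] = field(default_factory=list)


def in_category(artworks: List[Artwork], category: Category) -> List[Artwork]:
    """All artworks that reference the given category."""
    return [a for a in artworks if category in a.categories]


# Hypothetical sample data: one artwork, two categories, no duplicate record.
modern = Category("Modern")
spring_show = Category("Spring exhibition")
works = [
    Artwork("Sample print", [modern, spring_show]),
    Artwork("Sample drawing", [modern]),
]
```

Both `in_category(works, modern)` and `in_category(works, spring_show)` return the very same "Sample print" object, so editing it once updates it everywhere it is listed.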


What an awesome piece of reading; for this I will kill a tree and sip my coffee. Thank you for your hard work. :) At the moment the artworks can be categorized under an optional generic template called section. This is pretty much just to break up the display of items and allow for subheadings within the display of an artist's works. This works, but none of the items would then have any category other than the department that they live in and the artist that they are made by (from a page tree perspective). I have just been determining this with an if/else check on the parent. I really would like to make the site more categorized, and I think you sent me all of the answers right there.


Well, it looks as if you nailed the heart of my problem: I'm not using enough page fields. As a side note, how would I go about batch importing and relating a large amount of data using the page field method? Is this something that Batcher is well suited to?


Not quite following. Where are you importing from? Some external database? From within PW? If the former, Batcher doesn't have such capabilities. If from within PW, meaning you want to add page fields to some template and then populate those page fields with existing PW pages, that can be done with a custom script using the PW API. How large is the data?


So what I will be importing is all of the current data from their website, which before parsing looks like this:

ackroyd-minster.jpg,Norman Ackroyd,Minster Lovell - Oxfordshire, 1992,Etching, aquatint. Edition of 90,23-1/2 x 29-1/2 inches,$1,500

This will be around 10,000 artworks, possibly more if I can ever get their in-house database to give me any sort of usable data set. All of the data is roughly comma-separated, so I will be using the CSV importer to bring the records into ProcessWire.
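
One thing worth checking before any CSV import: the sample line above has commas inside its values (the title, the medium and the price), so splitting on commas gives more columns than there are fields. A quick Python sketch of the problem and of the usual fix (quoting each field) follows. The six-field split in `FIELDS` is my assumption about what the line is meant to contain, not something confirmed in the thread.

```python
import csv
import io
from typing import List

RAW = (
    "ackroyd-minster.jpg,Norman Ackroyd,Minster Lovell - Oxfordshire, 1992,"
    "Etching, aquatint. Edition of 90,23-1/2 x 29-1/2 inches,$1,500"
)

# Splitting on every comma breaks the title, the medium and the price apart.
naive = RAW.split(",")

# Assumed intended fields: image, artist, title, medium, dimensions, price.
FIELDS = [
    "ackroyd-minster.jpg",
    "Norman Ackroyd",
    "Minster Lovell - Oxfordshire, 1992",
    "Etching, aquatint. Edition of 90",
    "23-1/2 x 29-1/2 inches",
    "$1,500",
]


def to_csv_line(fields: List[str]) -> str:
    """Write one record with every field quoted, so inner commas survive."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(fields)
    return buf.getvalue().rstrip("\r\n")


def from_csv_line(line: str) -> List[str]:
    """Read one quoted record back into its fields."""
    return next(csv.reader([line]))
```

Here `len(naive)` is 9 rather than 6, while a quoted line survives the round trip through `from_csv_line(to_csv_line(FIELDS))` unchanged.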


OK, so that's external data, but what format is it in? CSV? SQL? Other? I can't really tell from your sample :-) But yes, it will have to be a custom script. Before that, you will have to map out the relations, create the receiving fields in PW, etc. There are some examples in the forums (can't find them now), but others will chime in :-)


Sorry it wasn't clear. I think I am unintentionally being vague because this part of the project kind of scares me :) . So this data is coming in the form of text that I have parsed out of manually entered PHP files. Their website is set up in such a way that an item is little more than a string echoed from a simple PHP function. So one artwork looks like this:

artinfo('sato-plum.jpg','Takahiro Sato','Plum','Mezzotint, 2006. 4/50','3 x 5 inches','$195','sato');

Now imagine 500 or so pages with that function. That is where I am getting the data. For this reason I am choosing to go with more fields, as there will be an unfortunate amount of non-normalized data. In some cases you will see a function call like the one above. Other times fields will contain different formatting, e.g. dimensions can be written as 3 x 5 inches or 3'' x 5'' or 3in by 5in.
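
Pulling the arguments out of calls like the one above is easier to get right by matching the quoted strings than by splitting on commas. Below is a rough Python sketch under some assumptions: every argument is a single-quoted PHP string, and the seven names in `FIELD_NAMES` are my guess at what the positions mean. Calls with a different number of arguments are left for manual review rather than forced into the mapping.

```python
import re
from typing import Dict, List, Optional

# One single-quoted PHP string, allowing \' and \\ inside it.
ARG = r"'(?:\\.|[^'\\])*'"
ARG_RE = re.compile(r"'((?:\\.|[^'\\])*)'")
CALL_RE = re.compile(r"artinfo\(\s*(" + ARG + r"(?:\s*,\s*" + ARG + r")*)\s*\)\s*;")
UNESCAPE_RE = re.compile(r"\\(['\\])")

# Assumed meaning of the seven positions (not confirmed in the thread).
FIELD_NAMES = ["image", "artist", "title", "medium", "dimensions", "price", "slug"]


def extract_args(line: str) -> Optional[List[str]]:
    """Return the string arguments of the first artinfo(...) call, or None."""
    m = CALL_RE.search(line)
    if m is None:
        return None
    return [UNESCAPE_RE.sub(r"\1", raw) for raw in ARG_RE.findall(m.group(1))]


def to_record(args: List[str]) -> Optional[Dict[str, str]]:
    """Map arguments to field names; None if the count does not fit."""
    if len(args) != len(FIELD_NAMES):
        return None  # irregular call: keep it aside for manual review
    return dict(zip(FIELD_NAMES, args))
```

Because the commas inside 'Mezzotint, 2006. 4/50' sit inside a quoted string, they stay part of that one argument.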


That will be a difficult one.

I see that the two examples you show are already different. In the first one the year is the 4th item, in the second the 5th. Simply exporting to CSV and expecting to be able to import it into ProcessWire is not going to work.

Hope the differences are not too big over all the data..... ???

So, the data will have to be cleaned up first.

The different ways of showing size can be handled at first by just using a text field.
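
If numeric sizes are wanted later (for filtering, say), the raw text can stay in the text field while a parser fills in numbers wherever it recognises the format. A minimal Python sketch, assuming only the variants mentioned in this thread (3 x 5 inches, 3'' x 5'', 3in by 5in, plus fractions written like 23-1/2); `parse_dimensions` is a hypothetical helper and returns None for anything it does not understand.

```python
import re
from fractions import Fraction
from typing import Optional, Tuple

NUM = r"(\d+)(?:-(\d+)/(\d+))?"  # e.g. 3 or 23-1/2
UNIT = r"""(?:\s*(?:inches|inch|in\.?|''|"))?"""  # optional inch marker
DIM_RE = re.compile(
    rf"^\s*{NUM}{UNIT}\s*(?:x|by)\s*{NUM}{UNIT}\s*$", re.IGNORECASE
)


def _to_fraction(whole: str, num: Optional[str], den: Optional[str]) -> Optional[Fraction]:
    value = Fraction(int(whole))
    if num is not None and den is not None:
        if int(den) == 0:
            return None  # malformed fraction such as 3-1/0
        value += Fraction(int(num), int(den))
    return value


def parse_dimensions(text: str) -> Optional[Tuple[Fraction, Fraction]]:
    """Return the two measurements in inches, or None if unrecognised."""
    m = DIM_RE.match(text)
    if m is None:
        return None
    first = _to_fraction(*m.group(1, 2, 3))
    second = _to_fraction(*m.group(4, 5, 6))
    if first is None or second is None:
        return None
    return first, second
```

Values that come back as None (a format name instead of a size, or a typo) simply remain as text until someone has looked at them.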

How to deal with the data that has to go into page fields I don't know yet. Maybe others can explain that.

Btw... maybe the cleaning up is faster to do in Excel. I've done it before in database migrations; having a good view over the columns and rows helps, and so do a few formulas and some Visual Basic, of course.


Thanks reems. You got it right away. The formatting is indeed this bad throughout the entire data set. Using some regex and excel I have managed to get things better, but they are still a ways off. I have a friend that is also in the computer science program that I am attending who is a bit better with db / vb. I hope to get him on board, but when labor is free you can't be in too much of a hurry. So at this point I am just chugging through the framework of the site while I wait on the data itself, which I think is one reason why I am messing some things up. Don't worry I wasn't planning on pointing pw to that junk quite yet :).

Yeah, it's still a mystery whether I should use some more page fields, but I am thinking I can use the skyscraper example to build out a pretty good search with some help from another buddy. Hopefully I get a little more feedback, but we'll see, I guess.


OK, so I managed to wrangle the first set of about 7000 records. Now I feel as if I am staring into the abyss. I am not expecting anyone to chime in, but given the expertise already displayed I figured posting here couldn't hurt. I now have about 1700 more records left, but I fear these will be the worst of it. Below is a chunk that I rough-parsed just for importing into Excel (using pipe delimiters). The data below fits a pattern, but throughout the data set there is a whole pile of incorrectly mapped fields; around 3000 or so is my guess. Does anyone have any experience with tools that would be able to analyze this crud?

 hasui-32319.jpg | Kawase Hasui (Japanese, 1883-1957) | Snow at Hie Shrine (Shatō no yuki (Hie jinja)) | Color woodblock, 1931. Hasui signature in the plate. Artist seal. Watanabe seal. Ref: Hotei 242 | Oban tate-e | 
    hasui-32320.jpg | Kawase Hasui (Japanese, 1883-1957) | Tamon temple, Hamahagi, Bōshū (Bōshū Hamagahi Tamonji) | Color woodblock, 1934. Hasui signature in the plate. Artist seal. Watanabe seal. Ref: Hotei 349 | Oban tate-e | 
    hasui-32321.jpg | Kawase Hasui (Japanese, 1883-1957) | Zensetsu temple, Sanshu (Sanshū Zensetsūji) | From COLLECTION OF SCENIC VIEWS OF JAPAN II, Kansai edition. Color woodblock, 1937. Hasui signature in the plate. Ref.: Hotei 334. Tape residue top and bottom margins | Ōban tate-e | [32321c] | ');
    hasui-32322.jpg | Kawase Hasui (Japanese, 1883-1957) | Cherry Blossoms at Yasukuni Shrine, Tokyo | Color woodblock, 1936. Hasui seal. Watanabe seal. Ref: Hotei Hb-s1 | 6-3/4 x 4-12 inches | 
    hasui-32323.jpg | Kawase Hasui (Japanese, 1883-1957) | Hachiman Shrine, Kamakura | Color woodblock, 1936. Hasui seal. Ref: Hotei Hb-s10 | 6-3/4 x 4-12 inches | 
    hasui-32324.jpg | Kawase Hasui (Japanese, 1883-1957) | Snow at Miyajima shrine | Color woodblock, ca. 1930s. Hasui seal. Ref: Hotei Hp-55 | 5-7/8 x 3-3/4 inches | 
    hasui-32325.jpg | Kawase Hasui (Japanese, 1883-1957) | Chūzen Temple, Utagahama (Chūzenji Utagahama) | Color woodblock, ca. 1930s. Hasui seal. Ref: Hotei Hp-50 | 5-7/8 x 3-3/4 inches | 
    hasui-42501.jpg | Kawase Hasui (Japanese, 1883-1957) | Sakurada Gate (Sakuradamon) | From <u>Twenty Views of Tokyo</u>. Color woodblock, 1928. Ref.: Hotei 156. A postwar impression. Watanabe seal in lower left | Oban yoko-e | 
    hasui-42565.jpg | Kawase Hasui (Japanese, 1883-1957) | Evening snow at Terajima village (Yuki ni fururu Terajima mura) | From <u>Twelve Scenes of Tokyo</u>. Color woodblock, 1920. Hasui signature in the plate. Artist seal. Watanabe seal. Ref: Hotei 35 | Oban tate-e. Trimmed margins | 
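
Short of a dedicated tool, a small script that flags rows breaking the expected pattern can narrow down what needs a human look. The Python sketch below makes assumptions drawn only from the sample rows above: five pipe-separated fields (image, artist, title, description, size), an image filename in the first field, and no leftover PHP or HTML. `audit_row` and the checks it applies are illustrative, not a complete validator.

```python
import re
from typing import List

EXPECTED_FIELDS = 5  # assumed: image | artist | title | description | size
IMAGE_RE = re.compile(r"^[\w-]+\.(?:jpg|jpeg|png|gif)$", re.IGNORECASE)
DEBRIS_RE = re.compile(r"'\)\s*;|<[^>]+>")  # leftover call endings or HTML tags


def split_row(line: str) -> List[str]:
    """Split on pipes, trim cells, and drop empty cells after a trailing pipe."""
    fields = [f.strip() for f in line.split("|")]
    while fields and fields[-1] == "":
        fields.pop()
    return fields


def audit_row(line: str) -> List[str]:
    """Return a list of problems found in one row (empty list means it fits)."""
    fields = split_row(line)
    issues = []
    if len(fields) != EXPECTED_FIELDS:
        issues.append(f"expected {EXPECTED_FIELDS} fields, found {len(fields)}")
    if not fields or not IMAGE_RE.match(fields[0]):
        issues.append("first field is not an image filename")
    if any(DEBRIS_RE.search(f) for f in fields):
        issues.append("leftover code or markup")
    return issues
```

Running every row through `audit_row` and sorting by the kind of issue gives a worklist; rows with an empty result can be imported, and the rest can be fixed in batches by problem type.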

Thank you martijn, now that I have found out about this cool feature I know what you mean! Here is an explanation of how to do selects based on the values of previous selects: https://processwire.com/talk/topic/5941-state-and-cities/?p=58161

sorry uprightbass360, little late and offtopic for your current problem...

