How Did We Make It: DirectHERS Search Engine

DirectHERS is a project that I was part of for the course DH Methodology and Practice. As a team, we built a text encoding project to represent women directors. I was in charge of building the search engine and was part of the Dev team, together with Gemma. When looking at possibilities for search engine creation, we considered three main routes:

1)  Building a search engine with vanilla JavaScript and Ajax, running within the limited capacity of GitHub Pages. Although this option seemed feasible, the downside was latency: without a dedicated virtual machine (VM) in the cloud, the crawler would be slow at indexing and at showing results. This solution would also have required optimization and code refactoring to improve its performance.

2)  Incorporating a basic search within GitHub Pages, knowing that this would be limited to keyword search only but would still give us a functional engine that produces the desired output.

3) Creating a search engine within Tableau, leveraging Tableau Public’s resources, and later embedding it into our GitHub page. This solution seemed very appealing as it would meet our requirement for minimal, but efficient, computing.

After several attempts, the team settled on a hybrid option, which also turned out to be the easiest solution to integrate into our website. We did use JavaScript, but instead of letting a crawler dynamically crawl through the XML files, we decided to build a common structure for our directors. This is basically one long, structured XML file with tags that are relevant to all the directors. We then ingest the file directly with JavaScript to make it searchable. This uniformity speeds up queries significantly. Since our sources do not change dynamically, we do not need a dynamic crawler like a traditional search engine (for instance, Google). To fine-tune the runtime further, we used a dictionary-based approach (key, value), where the key is an XML tag for a director and the value is the information contained in that tag. The solution works well for cross-director search and can be used as a pedagogical tool to research the directors.
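
To make the dictionary idea more concrete, here is a minimal sketch in JavaScript. It is not our production code: the tag names (`director`, `name`, `country`, `films`), the sample entries, and the helper names `buildIndex` and `search` are all assumptions made up for illustration. It also uses a small regular expression to read the tags, which only works because the structure is flat and uniform; in the browser, `DOMParser` would likely be the more robust choice.

```javascript
// Hypothetical sample data: tag names and values are placeholders,
// not the actual DirectHERS schema or content.
const sampleXml = `
<directors>
  <director>
    <name>Example Director One</name>
    <country>France</country>
    <films>Sample Film A; Sample Film B</films>
  </director>
  <director>
    <name>Example Director Two</name>
    <country>United States</country>
    <films>Sample Film C</films>
  </director>
</directors>`;

// Turn each <director> block into a dictionary: tag name -> text content.
// Assumes flat tags with no attributes and no nesting inside a director.
function buildIndex(xml) {
  const index = [];
  for (const [, block] of xml.matchAll(/<director>([\s\S]*?)<\/director>/g)) {
    const entry = {};
    for (const [, tag, text] of block.matchAll(/<(\w+)>([^<]*)<\/\1>/g)) {
      entry[tag] = text.trim();
    }
    index.push(entry);
  }
  return index;
}

// Case-insensitive keyword search across every director and every tag.
function search(index, query) {
  const needle = query.toLowerCase();
  const hits = [];
  for (const entry of index) {
    for (const [tag, value] of Object.entries(entry)) {
      if (value.toLowerCase().includes(needle)) {
        hits.push({ director: entry.name, tag, value });
      }
    }
  }
  return hits;
}

// The index is built once when the file is loaded; each query only scans it.
const index = buildIndex(sampleXml);
console.log(search(index, "france"));
```

Because every director shares the same tags, a single loop over the dictionaries is enough to search across all of them, and each hit reports which tag matched, which is what makes the cross-director comparison possible.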