Road Map for an Open Source Search Engine
Here are some details of what I have done and what I am currently working on. I have several other projects that take up a substantial amount of my time, and I am doing a Maths degree, so this project does not get as much work as I would like, but it's getting there. I am always looking for help, so if you want to get involved, let me know.
I have just started building the lexicon. It is a simple parser written in Perl that stores its data in a PostgreSQL database. I have been quite strict about what goes into the lexicon, so I am not expecting it to become too large. That is just as well, because I don't have the processing power or the room to cope with something massive. Unfortunately, I need more SCSI hardware because the I/O involved is really slowing the parser down: Perl is finding words quicker than I can store them. I suppose I should look at Berkeley DB or some other storage method.
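One way to ease the I/O bottleneck described above, whatever storage ends up behind it, is to keep the lexicon in memory and write new words out in batches rather than one row at a time. What follows is only a minimal sketch of that idea, written in C++ rather than the Perl the parser actually uses. The `Lexicon` class, its method names, and the rule that a word is a run of ASCII letters are all assumptions made for illustration, not the project's real code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory lexicon: maps each distinct word to a numeric ID.
// New words are queued so they can be written to the database in one batch
// instead of one INSERT per word.
class Lexicon {
public:
    // Returns the ID for `word`, assigning the next free ID if it is new.
    std::size_t id_for(const std::string& word) {
        assert(!word.empty());  // an empty string is never a lexicon entry
        auto it = ids_.find(word);
        if (it != ids_.end()) return it->second;
        const std::size_t id = ids_.size();
        ids_.emplace(word, id);
        pending_.push_back(word);
        return id;
    }

    // Splits `text` into lowercase words (runs of ASCII letters, an
    // assumption for this sketch) and registers each one.
    void add_text(const std::string& text) {
        std::string word;
        for (unsigned char c : text) {
            if (std::isalpha(c)) {
                word.push_back(static_cast<char>(std::tolower(c)));
            } else if (!word.empty()) {
                id_for(word);
                word.clear();
            }
        }
        if (!word.empty()) id_for(word);
    }

    // Hands back the words first seen since the last call and clears the
    // queue. The caller would store these in a single transaction.
    std::vector<std::string> take_pending() {
        std::vector<std::string> batch;
        batch.swap(pending_);
        return batch;
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::unordered_map<std::string, std::size_t> ids_;
    std::vector<std::string> pending_;
};
```

A caller would feed each collected page to `add_text` and, every few thousand pages, pass the result of `take_pending` to the database in one go, so the disk sees a few large writes instead of many small ones.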
| Task | State | Skills required |
|---|---|---|
| Write Polite Spiders | Done | Perl, Postgres, HTTP protocol |
| Collect 1 million Test Pages | 600,000 Collected | Postgres, Perl, Linux |
| Build Lexicon | Current Work: 90,000 entries found | C/C++ |
| Build Reverse Index for 1 million pages | Current Work: | C/C++ |
| Write C++ for handling the reverse indexes | TODO | C/C++, Linux |
| Research Ranking Algorithm | TODO | C/C++, Maths, Comp Sci |
| Build Front end to the search engine | TODO | Web design, HTML |
| Buy or scrounge an online test machine for the search engine | TODO | Sales |
| Get it hosted | TODO | Sales, Marketing, Money Money Money |
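
For anyone wondering what the reverse index tasks in the table involve, here is a rough sketch of the general technique, usually called an inverted index: for each word, store the sorted list of pages that contain it, then answer a two-word query by intersecting two lists. Nothing here is the project's actual design. The `ReverseIndex` class, its method names, the 32-bit page IDs, and the ASCII-letters tokenizer are all assumptions, and a real index for a million pages would live on disk rather than in a `std::map`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical reverse (inverted) index: word -> sorted list of page IDs.
class ReverseIndex {
public:
    using PageList = std::vector<std::uint32_t>;

    // Indexes one page. Pages must be added in increasing ID order so that
    // every list stays sorted without a separate sort step.
    void add_page(std::uint32_t page_id, const std::string& text) {
        std::string word;
        for (unsigned char c : text) {
            if (std::isalpha(c)) {
                word.push_back(static_cast<char>(std::tolower(c)));
            } else if (!word.empty()) {
                post(word, page_id);
                word.clear();
            }
        }
        if (!word.empty()) post(word, page_id);
    }

    // Returns the pages containing `word` (already lowercase), or an
    // empty list if the word was never seen.
    PageList lookup(const std::string& word) const {
        auto it = postings_.find(word);
        if (it == postings_.end()) return {};
        return it->second;
    }

    // Returns the pages containing both words by intersecting the two
    // sorted lists.
    PageList lookup_both(const std::string& a, const std::string& b) const {
        const PageList pa = lookup(a);
        const PageList pb = lookup(b);
        PageList both;
        std::set_intersection(pa.begin(), pa.end(), pb.begin(), pb.end(),
                              std::back_inserter(both));
        return both;
    }

private:
    // Appends `page_id` to the list for `word`, skipping repeats of the
    // same word within one page.
    void post(const std::string& word, std::uint32_t page_id) {
        PageList& list = postings_[word];
        assert(list.empty() || list.back() <= page_id);
        if (list.empty() || list.back() != page_id) list.push_back(page_id);
    }

    std::map<std::string, PageList> postings_;
};
```

Keeping each list sorted is the design choice that matters here: it lets a multi-word query be answered with a single linear merge, and it leaves room for whatever ranking algorithm is chosen later to reorder the matching pages.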