Interactive Question Answering Chatbot (IQABOT)

For my final year project at uni I decided to write an interactive question answering chatbot, which I lovingly named IQABOT. Though the project ultimately received a First, I think I could have planned better had I known a little more about what writing a question answering chatbot involved.

I’ll start by talking about my ideas and how insane they first were, then move on to the resources I used and pitfalls to avoid. You can also find attached the Final Report that I submitted for the project. I’ll include some BibTeX in case anyone decides to reference the paper, but for the most part you should go through the bibliography and read the papers I used. So let’s begin.

Planning to Fail

We are all familiar with the saying “failing to plan is planning to fail”. Never did it ring truer than when I was planning my final year project at King’s College London under the excellent supervision of Dr. Jonathan Ginzburg.

As a natural-born geek (read: fan of Star Trek) I had always been a little excited about human-computer interaction using natural language. In fact I wrote my first programme on a child’s laptop (I was that child) that came with some flavour of BASIC. The programme would return messages depending on what the user typed and, all in all, was a bit pants.

Anyway, a few years and many Star Trek movies later I decided to undertake the Interactive Question Answering project as posted by Dr. Ginzburg. With all the excitement of a dog at Crufts (they like it, right? Otherwise why would they keep coming back?) I set about reading papers on question answering, chatbots and natural language processing in general. Having read enough papers to light a decent-sized bonfire, I began drawing up sketches for a complete interactive question answering programme.

After a swift kick in the head from reality I quickly realised that I had nowhere near enough time to do the whole system myself. Worse still, I’d already wasted almost 2 months.

Don’t get too excited and try to do it all. Plan your project around the most important aspect of your project’s title. In my case, that was ‘interactive’.

With this in mind I recycled a truckload of papers and set about looking at what made my system interactive. I had now reduced my workload to something feasible and, luckily, had done enough research on the interactive element to not have completely wasted those first few months.

Resources

Natural language processing (NLP) has a lot of “hot topics” including sentiment analysis, speech recognition, automatic translation, (interactive) question answering, explanation generation and I’m sure a lot more.

And precisely because NLP has so much interesting research going on you will find a wealth of resources available to you. I will make mention of those I used but you can find alternatives in the included report and by searching the web of course.

The Natural Language Toolkit
First up is the Natural Language Toolkit (NLTK). NLTK is a Swiss Army knife of a programming suite for those who wish to dive into natural language processing. The libraries are written in Python (more on that later) and everything is open source. As if this wasn’t enough, there is also an NLTK book, which is available for free online, though you can buy a copy (or donate to the project) to help fund further efforts.

However, ne’er let it be said that my voice sings out praise alone. NLTK is not a commercial product and so lacks the rounded edges that might help novices feel safer when using it. This isn’t too much of a problem though and shouldn’t put off anyone who is eager to learn.
Another problem stems from the fact that some code hasn’t been included in the book or online. For example, I struggled to find out precisely which corpora and features had been used to train the default named-entity taggers. Posts in the NLTK users group suggest that this may be remedied when the contributors have the time.
The third and final issue with NLTK is solely to do with the book and the noticeable lack of answers for the exercises contained within. Posts on the user group suggest that “official” answers may never be provided. But chances are that if you join the user group and search previous posts you will find the answer you seek.

TrueKnowledge & START
Although I would be writing the code for the interactive parts of IQABOT, I needed existing programmes to provide the answers. My designs for IQABOT allowed for answers to be provided by multiple question-answering services. Originally I had intended to implement some redundancy, so that when one service didn’t know the answer another could be queried without user intervention. I was again short on time, though, and the XML service for START was still in its experimental phase, so I only got to use TrueKnowledge. It was very good but still had some way to go before catching up with START’s vast knowledge.
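
The redundancy idea can be sketched as a simple fallback chain. This is a minimal sketch in current Python 3, not IQABOT's actual code: the `QAService` type, the stub functions and the convention that `None` means "don't know" are all assumptions made for illustration, and no real TrueKnowledge or START API is called.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical interface: a service takes a question and returns an answer,
# or None when it does not know. A real service would wrap a web API call.
QAService = Callable[[str], Optional[str]]


def ask_with_fallback(
    question: str, services: Iterable[Tuple[str, QAService]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each service in order; return (service_name, answer) for the first hit."""
    for name, service in services:
        try:
            answer = service(question)
        except Exception:
            # Treat a failing service like one that has no answer.
            answer = None
        if answer:
            return name, answer
    return None, None


# Stub services standing in for real back ends (illustrative only).
def stub_primary(question: str) -> Optional[str]:
    known = {"what is the capital of france?": "Paris"}
    return known.get(question.lower())


def stub_secondary(question: str) -> Optional[str]:
    known = {"who wrote hamlet?": "William Shakespeare"}
    return known.get(question.lower())


SERVICES = [("primary", stub_primary), ("secondary", stub_secondary)]
```

Calling `ask_with_fallback("Who wrote Hamlet?", SERVICES)` falls through the first stub and is answered by the second, with no user intervention. Returning the service name alongside the answer makes it easy to tell the user where an answer came from.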

Python
Python has ruined other programming languages for me. You cannot know the joy of programming in Python until you have tried it yourself.

Choosing the right language for your project is incredibly important. I whittled my choices down to Python and Java, and after reading about Python’s strengths in text processing my choice was made. There is an example somewhere on the net that demonstrates some text processing functionality written in both Python and Java. The Python code took up 1 line whereas the equivalent Java code was somewhere in the region of 5 or 6 lines. Not a huge difference, but as your code begins to grow to thousands of lines you will be glad to be writing in Python.
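
As an illustration of that brevity (this is not the example mentioned above, which isn't identified, and the sample sentence is made up), counting word frequencies takes a single expression with the standard library:

```python
from collections import Counter

text = "the cat sat on the mat and the dog sat too"

# One expression: lower-case, split on whitespace, count each word.
counts = Counter(text.lower().split())

print(counts.most_common(2))  # [('the', 3), ('sat', 2)]
```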

I used Python 2.6 for the project as NLTK had not yet (and at the time of writing still hasn’t) migrated to Python 3.0. NLTK’s authors have said that NLTK will be migrated at some point in the future, though, so we should eventually be able to take advantage of some of version 3.0’s niceties.

PyAIML
Some of you may be familiar with AIML (the Artificial Intelligence Markup Language). For those who aren’t: it’s an XML-compliant markup language specifically designed for use with A.L.I.C.E., a very advanced chatterbot that won several prizes. Unfortunately I wasn’t able to get the Python implementation of AIML (PyAIML) to work before the project deadline, so I didn’t have a chance to play with it, but Dr. Suresh Manandhar published results that suggested his system, YourQA, made good use of AIML.
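
For a feel of what AIML expresses: each AIML category pairs a pattern with a template, and `*` in a pattern matches one or more words. The toy matcher below imitates only that idea in plain Python. It is not PyAIML and does not parse AIML files; the `CATEGORIES` table, the `{star}` placeholder and the `respond` function are all made up for illustration.

```python
import re

# Hypothetical pattern/template pairs in the spirit of AIML categories.
# "*" stands for one or more words; "{star}" echoes what it matched.
CATEGORIES = [
    ("HELLO", "Hi there!"),
    ("MY NAME IS *", "Nice to meet you, {star}."),
    ("*", "I am not sure how to answer that."),
]


def _compile(pattern):
    # Escape literal words and turn "*" into a capturing group.
    parts = ["(.+)" if tok == "*" else re.escape(tok) for tok in pattern.split()]
    return re.compile(r"^" + r"\s+".join(parts) + r"$")


RULES = [(_compile(p), t) for p, t in CATEGORIES]


def respond(user_input):
    # Rough normalisation: strip punctuation and upper-case the input.
    text = re.sub(r"[^\w\s]", "", user_input).upper().strip()
    for regex, template in RULES:
        match = regex.match(text)
        if match:
            star = match.group(1).title() if match.groups() else ""
            return template.format(star=star)
    return ""
```

Rules are tried in order, so the catch-all `*` pattern goes last. For example, `respond("My name is james")` picks the second rule and fills in the matched name.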

That’s all I have for now. I’m attaching – quite what that means I’m not entirely sure – my report for the IQABOT project to this post in the hope that it might help someone out there who is getting started with interactive question answering. It is a fun and challenging area and I hope you have as much fun working on a programme as I did.

Here is the report

James.

This entry was posted in NLP, Programming, Python.

13 Responses to Interactive Question Answering Chatbot (IQABOT)

  1. Timothy says:

    Interesting project.

    Since I could not find your email address to correspond please email me to discuss the feasibility of a project to leverage this chatbot experience if you are interested. We also are using NLTK but have only scratched the surface.

    Tim

  2. praveen seela says:

    Hi, your article is so helpful. I am also trying to do a project with the same tools, so if you can share your project’s code it would be a great help to me. Please share your code.

    • James says:

      Hi Praveen,
      The code is still a bit of a mess after I got halfway through refactoring it. I don’t anticipate I’ll have the time to sort it out in the near future but please feel free to ask me any questions or you can always use the nltk forums.

      Best,
      James.

  3. tako says:

    Hi,

    Very nice article. I’m new to NLP and really like it. I’ve downloaded the NLTK toolkit but am not sure where to start.

    Do you ever plan on releasing your code for educational purposes? Can you send me an email?

    Thanks,
    Tako

  4. uday says:

    Hi,

    I’m trying to create a FAQ website using NLTK.
    Basically I have a text file that has all the information needed. When the user asks a question, the website will process it and return the probable answer.

    So far I have been able to get the keywords from the question and search for those keywords in the text file.
    But I feel like I’m lost. I’m not getting the appropriate answer. Not even close to what I’m expecting.

    Another approach I thought of is to use this example:

    from nltk import load_parser
    # NLTK 2.x API; in NLTK 3, nbest_parse() was replaced by parse()
    # (which returns an iterator) and .node by .label().
    cp = load_parser('grammars/book_grammars/sql0.fcfg')
    query = 'What cities are located in China'
    trees = cp.nbest_parse(query.split())
    answer = trees[0].node['SEM']
    q = ' '.join(answer)
    print q
    # Output: SELECT City FROM city_table WHERE Country="china"

    This will generate a query, and based on that I can retrieve the answer, but then I’ll have to convert the text file into a table, push it into a MySQL database and get the answers from there. That is not an optimal solution, though: it will be a hassle to convert the text file to a table.

    Can anyone please guide me on what I should be doing?
    Any sample code or tutorial would be really helpful.

    • James says:

      Can’t you just read through the text file one word at a time?

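      keywords = set(["cities", "china"])  # example values; use your own keywords
      keywords_matched = 0
      with open("faq.txt") as f:  # hypothetical file name
          for line in f:
              # Iterating over a file yields lines, so split each into words.
              for word in line.lower().split():
                  if word in keywords:
                      keywords_matched += 1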

      Then place the file in a list of results if it matches some criteria. Might want to do some ranking too.

      This approach is quite simple and naive but it should do a basic job.
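
A rough sketch of the ranking idea mentioned above, assuming the text file can be split into passages on blank lines. The sample text, the stop-word list and the `rank_passages` helper are all made up for illustration; the score is plain keyword overlap, nothing cleverer.

```python
import re

STOP_WORDS = {"what", "is", "the", "a", "an", "of", "in", "are", "how", "do", "i"}


def tokenize(text):
    # Lower-case and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_passages(question, document):
    """Return (score, passage) pairs, best first, dropping zero scores."""
    keywords = set(tokenize(question)) - STOP_WORDS
    passages = [p.strip() for p in document.split("\n\n") if p.strip()]
    scored = []
    for passage in passages:
        words = set(tokenize(passage))
        score = len(keywords & words)
        if score:
            scored.append((score, passage))
    # Sort by score only, so equal scores keep document order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


FAQ_TEXT = """Opening hours are 9am to 5pm on weekdays.

Refunds are issued within 14 days of a return.

To reset your password, use the link on the login page."""
```

With this sample text, `rank_passages("How do I reset my password?", FAQ_TEXT)` returns only the password passage, since it is the only one sharing keywords with the question. Stemming or weighting rarer words more heavily would be natural next steps.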

      James.

  5. Sebastien says:

    Hi,

    I saw via Scholar that you cite the “Automatic categorization of questions for user-interactive question answering” paper, but the link to your report is broken. Can you email me your report or repair the link?

    Sebastien.

    • James says:

      Hi Sebastien,

      My files are in a state of disarray at the moment so I’m having trouble finding the paper. I imagine I have a copy somewhere so if I do find it I’ll send it on.

      As the paper was only published in 2008 I imagine that it shouldn’t be too difficult to track down from a university network if you or anyone you know is at one.
      I sometimes found it useful to specify the file type in Google when I was searching for papers too, if that helps at all.

      As I said though, I’ll let you know if I come across it.

      Regards,
      James.

  6. Kevin says:

    Hey James,
    Good post. The link to the final report seems broken, any chance you can fix it?
    Cheers
