Wednesday, February 25, 2009

DeepWeb exploration (blogpost#4)

The New York Times recently published an article called "Exploring a 'Deep Web' That Google Can't Grasp," dealing with new technologies that are attempting to break into the "Web of hidden data." Last year, Google added its one trillionth Web page, but as the NYT points out, it still can't satisfactorily answer questions like "What's the best fare from New York to London next Thursday?" The Web does contain this kind of data, but search engines often have trouble locating the answers in an efficient way.

Most search engines use spider/crawler-type programs to find information, following trails of hyperlinks that connect the never-ending Web. But this leaves an enormous amount of data lying unexplored below the surface, much of it in databases that only respond to a typed query. Deep Web search start-ups are trying to develop programs that analyze search terms and then broker the query to relevant databases.
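To make the link-following idea concrete, here is a minimal sketch of a crawler walking a made-up, in-memory "Web." The page names and the `LINKS` table are assumptions invented for illustration (no real site or search engine works from a table like this), and `fares-db` stands in for a database that nothing links to.

```python
from collections import deque

# Hypothetical toy "Web": each page maps to the pages it links to.
# "fares-db" stands in for a database reachable only through a search form.
LINKS = {
    "home": ["news", "about"],
    "news": ["home", "archive"],
    "about": [],
    "archive": ["news"],
    "fares-db": [],  # no page links here, so a crawler never arrives
}

def crawl(start, links):
    """Breadth-first walk over hyperlinks; returns pages in the order found."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen

found = crawl("home", LINKS)
hidden = sorted(set(LINKS) - set(found))  # pages the crawler never reached
```

In this toy setup the crawler reaches every linked page, but `fares-db` ends up in `hidden`. That is the gap the Deep Web projects are trying to close.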

Google's strategy is to send a program to analyze each database's content, "define" that content, then hit the database with related search terms to develop a predictive model of what it contains. Another company is attempting to index every public database, hitting each one with automated search terms to "dislodge" the information. The goal is interconnected data... a cross-referencing of pre-analyzed information to best answer a specific query. It's almost as if these Deep Web programs are alive, reasoning, and thinking for themselves.
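As a rough illustration of "hitting" a database with search terms, the sketch below probes a pretend search form and counts the words that come back. Everything here is an assumption for illustration: `RECORDS`, `search_form`, and `probe` are invented names, and a word count is only a crude stand-in for whatever predictive model Google actually builds.

```python
from collections import Counter

# Hypothetical records sitting behind a site's search form.
RECORDS = [
    "cheap flight new york to london",
    "flight boston to paris fare sale",
    "hotel deals in london",
]

def search_form(term):
    """Stand-in for submitting one term to a site's search box."""
    return [record for record in RECORDS if term in record.split()]

def probe(terms):
    """Send each probe term and tally the words in every new record returned."""
    seen = set()
    profile = Counter()
    for term in terms:
        for record in search_form(term):
            if record not in seen:  # count each record only once
                seen.add(record)
                profile.update(record.split())
    return profile

profile = probe(["flight", "london", "recipe"])
```

The resulting `profile` suggests the database is about travel ("flight" and "london" both turn up) and not about cooking ("recipe" returns nothing), which hints at which queries would be worth brokering to it.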

The article mentions that in the future, Google may have problems implementing a "change." There is a fear of overcomplicating things and driving away faithful users. Also, I wonder if all of this will actually end up making things more efficient. When described on paper, it seems very promising. But how does a program sift through a seemingly infinite amount of information to find, link, and cross-reference reliable and accurate information? How does it link seemingly unrelated information? How does it stay up-to-date? If we are going to be able to answer questions about flight fares and the like, those answers are changing minute by minute.

Finally, the article says that the long-term implications of something like this are directed more towards businesses and less towards individual web-surfers. But I think that it is also very important for libraries. The article mentions health sites cross-referencing pharmaceutical companies and medical research, or news sites cross-referencing public records on government databases. This could have a large impact on the types of information that libraries could provide.

Monday, February 16, 2009

i swear i don't hate technology, but here's my two cents about chacha (blogpost#3)

Besides searching online catalogs, my most common interaction with technology at the library occurs with ChaCha. Through some sort of contract, IU has a deal with ChaCha in which IU-related questions are fielded through the libraries here and at Bloomington. We at the reference desk have essentially become ChaCha guides. I work both for University Library and for Herron Art Library. I've only received one question during my shift at UL... about an IU basketball game, I believe. However, at Herron, it is a whole different story. Within the past 2 weeks, I've seen questions about who would win in a fight between Superman and Spider-Man, how to make paper claws for your fingers, how much weight the world's strongest toothpick bridge could hold, how to tattoo yourself with a toothbrush, how to time travel, etc. I'm not sure why I've received those questions... they definitely aren't about IU.

I've been disappointed with the results of this partnership between IU and ChaCha. Maybe others have, too. Recently, they've come up with a way to test the system by having students sign up and send questions to ChaCha, putting "IU" before their question so that an IU library would definitely be the guide. The test is still ongoing, but I've noticed no difference in the type or number of questions received.

I'm curious what others think about ChaCha and its place in a library. When libraries already offer a chat service, phone service, face-to-face service, and email service, is ChaCha really necessary? Is it opening the market to a new audience? I doubt it. Especially when we are bound by a 160-character answer (which btw, disappears quicker than you would think), how can we offer the level of service that we need to provide?

Sunday, February 1, 2009

bob loblaw's law blog (blogpost#2)

OK, probably anyone who has ever watched Arrested Development will recognize the title of this blog entry. (sidenote: am I supposed to cite that?!) I really only chose it because I'm still coming to terms with blogging. I feel like I'm talking for the sake of talking, and to my ears, it comes across as "blah blah blah blah blah." That being said... I've only posted once so far, so my insight may not be fully justified.

I see why blogging can be useful and important.  Obviously, it is a form of communication.  My feeling is that blogging is part of, or a result of, the general public's access to continually advancing technology... it is a place to express oneself, debate with others, or inform the masses about any particular thing.   How does one discern which information is important or useful and which is excessive nonsense?  What if I blogged: "elephant toothpick roller skate truck" 477 times?  Some search engine would inevitably log that entry.  And if by chance, someone were to be looking for a toothpick holder shaped like an elephant, undoubtedly, somewhere within the results, my useless blog would turn up.  
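Here is a small sketch of why the nonsense post would turn up: a toy keyword index matches on words, not meaning. The documents, `build_index`, and `search` are hypothetical and much simpler than any real search engine, which would also weigh things like links and word frequency.

```python
from collections import defaultdict

# Hypothetical pages; the first repeats the nonsense phrase 477 times.
DOCS = {
    "useless-blog": "elephant toothpick roller skate truck " * 477,
    "gift-shop": "toothpick holder shaped like an elephant",
    "zoo-news": "baby elephant born at the zoo",
}

def build_index(docs):
    """Map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return documents matching any query word, most matched words first."""
    hits = defaultdict(int)
    for word in query.lower().split():
        for name in index.get(word, ()):
            hits[name] += 1
    # Ties are broken alphabetically so the order is predictable.
    return sorted(hits, key=lambda name: (-hits[name], name))

index = build_index(DOCS)
results = search(index, "elephant toothpick holder")
```

With these made-up pages, the gift shop ranks first, but the useless blog still lands in second place, ahead of the zoo story, simply because it shares two words with the query.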

I think that is relevant to today's libraries. People are in search of information. I work a reference desk right now... I try to find information. Excessive nonsense affects the efficiency of my work. Obviously, technology is not solely to blame for this, but how do we control the use of technology to keep our work accurate and efficient? Technology has the reputation of making our jobs easier, and it does... to some extent. However, can something be easier and also be less efficient?

Maybe this will be my last skeptical look at technology... I might be giving myself a bad reputation.  Another sidenote: I found it funny that my computer's automatic spellcheck did not underline "blogged."  I know that it has become a part of everyday vocabulary, but I wonder how long it took.  I checked the word "texting."  Still underlined in red... I'll check back.