What is Google doing to our language?

The other day, I attended a course in the lab on concordances and how they can be used for linguistic research and in language education. Jon, who was giving the course, pointed out that the most commonly used concordance tool today is probably Google, and we had an interesting discussion about the consequences of this.

I am sure many of you use Google to check different constructions and compare frequencies; at least this behavior seems to be common among those of us who do not have English as our native language. But how reliable are these results? In some cases the frequency numbers differ so greatly that there is no question that one construction is incorrect. However, sometimes the differences are not as great, and a more thorough analysis of the results is needed. For example, one has to evaluate some of the sources for the hits found (for instance, if you only get a small sample and the majority are from .se addresses, it is likely that we are only dealing with a common Swenglish expression).
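
As a rough illustration of the source check described above, here is a minimal Python sketch. It assumes you have already noted down the hit counts and copied the result URLs by hand; the function names `tld_shares` and `frequency_ratio` are made up for this example, and nothing here queries Google itself.

```python
from collections import Counter
from urllib.parse import urlparse


def tld_shares(urls):
    """Return the share of hits per top-level domain, e.g. {'se': 0.75}."""
    hosts = [urlparse(u).hostname for u in urls]
    # Keep only the last label of each host name ("www.example.se" -> "se").
    tlds = [h.rsplit(".", 1)[-1] for h in hosts if h]
    counts = Counter(tlds)
    total = sum(counts.values())
    return {tld: n / total for tld, n in counts.items()}


def frequency_ratio(hits_a, hits_b):
    """Ratio of the larger hit count to the smaller (inf if the smaller is 0)."""
    low, high = sorted((hits_a, hits_b))
    return float("inf") if low == 0 else high / low
```

If most of a small sample comes from .se addresses, `tld_shares` will show it at a glance, and a ratio close to 1 from `frequency_ratio` is a hint that the hit counts alone will not settle the question.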

One advantage I see in using Google instead of traditional corpora for concordance is that Google captures language as it is used today. We may well find examples of constructions previously considered incorrect which now appear to be common, and this might lead us to accept that language is constantly changing. Maybe to sometimes split infinitives is not that terrible a crime after all?


5 Responses to What is Google doing to our language?

  1. Stephanie says:

    How funny. I blogged about the same thing a couple of years ago. http://www.sumofmyparts.org/blog/?p=393 It *is* an interesting question.

  2. therese says:

    Thanks for pointing it out, Steph :-) . I didn’t remember that, but now that I read it I remember at least taking part in the class you’re referring to. (oh oh… preposition stranding… but I’m sure Google would approve).

  3. Patrik says:

    There are also concordance-like tools that draw on the web as material, e.g. http://www.webcorp.org.uk/. There are a number of methodological concerns with using “the web” as a corpus, of course, but it is quite useful (and as you point out, Therese, pervasive). There is a big difference, though, between checking a construction or comparing two alternatives and doing a more full-scale linguistic analysis, both in terms of methodological issues such as representativity and chronology, and in terms of the complexity of the tools (ability to handle complex queries, tags, frequency analysis, collocations, etc.).

  4. therese says:

    Interesting reference, Patrik. You are absolutely right about the different functions of different tools… For linguistic analysis, it would also be interesting to compare results from Google with those generated in more advanced concordance tools.

  5. Patrik says:

    Another note: It is also interesting to see how companies like Amazon and various web 2.0 services are pushing the envelope. I think some of their concordance/search/visualization functions (not least the interface and accessibility), in the way they have implemented them, are much more interesting than many traditional concordance implementations.
