Futurium – European Commission

27 Nov

See on Scoop.it - Computational Music Analysis

“… 

Art practice will gain a whole new status and role in future societies. Creativity will be key to harness the new possibilities offered by science and technology, and by the hyper-connected environments that will surround us, in useful directions. Art, science and humanities will connect to help boost this wave of change and creativity in Europe.

…”

Olivier Lartillot‘s insight:

First, here is a bit of background on this Futurium project from the European Commission:

“If you are interested in policy-making, this is the right place to be! Have a say on eleven compelling themes that will likely shape policy debates in the coming few decades!

They are a synthesis of more than 200 futures co-created by hundreds of "futurizens", including young thinkers as well as renowned scientists from different disciplines, in brainstorming sessions, both online and at actual events all around Europe.

The themes include many insights on how policy-making could evolve in the near future. They can potentially help to guide future policy choices or to steer the direction of research funding; for instance, because they cast new light on the sweeping changes that could occur in areas like jobs and welfare; also by furthering our understanding of new routes to the greater empowerment of human beings; and by exploring the societal impacts of the emergence of super-centenarians.

Everyone can now provide feedback and rate the relevance and timing of the themes.

Which one has the greatest impact? When will these themes become relevant?

Vote and help shape the most compelling options for future policies!”

Below is the theme “Art, sciences, humanities”. All these ideas seem to have important repercussions for music research. It would be splendid to see such ideals having an impact on future European research policies. So if you support these ideas, please vote for this theme in the poll, which closes at the end of the week.

 

“The challenges facing humanity are revealing themselves as increasingly global and highly interconnected. The next few decades will give us the tools to start mastering this complexity in terms of a deeper understanding, but also in terms of policy and action with more predictability of impacts.

This will result from a combination of thus far unseen Big Data from various sources of evidence (smart grids, mobility data, sensor data, socio-economic data) along with the rise of dynamical modelling and new visualisation, analysis, and synthesis techniques (like narrative). It will also rely on a new alliance between science and society.

The virtualisation of the scientific process and the advent of social networks will allow every scientist to join forces with others in the open global virtual laboratory.  Human performance enhancement and embeddable sensors will enable scientists to perceive and observe processes in the real world in new ways. New ICT tools will allow better understanding of the social processes underlying all societal actions.

Digital games will increasingly be used as training grounds for developing worlds that work – from testing new systems of governance, to new systems of economy, medical and healing applications, industrial applications, educational systems and models – across every aspect of life, work, and culture.

Digital technologies will also empower people to co-create their environments, the products they buy, the science they learn, and the art they enjoy.  Digital media will break apart traditional models of art practice, production, and creativity, making production of previously expensive art forms like films affordable to anyone.

The blurring boundaries between artist and audience will completely disappear as audiences increasingly ‘applaud’ a great work by replying with works of their own, which the originating artist will in turn build upon for new pieces.  Digital media creates a fertile space for a virtuous circle of society-wide creativity and art production.

Art practice will gain a whole new status and role in future societies. Creativity will be key to harness the new possibilities offered by science and technology, and by the hyper-connected environments that will surround us, in useful directions. Art, science and humanities will connect to help boost this wave of change and creativity in Europe.

Key Issues

•How do we engage policy makers and civic society throughout the process of gathering data and analysing evidence on global systems? How do we cross-fertilise sciences, humanities and art?

•How do we ensure reward and recognition in a world of co-creation where everyone can be a scientist or an artist from his/her own desktop? How do we deal with ownership, responsibility and liability?

•How do we keep scientific standards alive as peer-reviewed research and quality standards are challenged by the proliferation of open-access publication? How do we assure the quality and credibility of data and models?

•How do we channel the force of creativity into areas of society that are critical but often slow to change, like healthcare, education, etc.?

•How do we ensure universal access and competency with emerging digital and creative technologies? Greater engagement of citizens in science and the arts? How do we disseminate learning about creativity and the arts to currently underserved populations?

•Equitable benefit distribution: how do we ensure that the benefits of scientific discoveries and innovations are distributed evenly in society?

•Clear, effective communication, across multiple languages: how do we communicate insights from complex systems analyses to people who were not participants in the process in ways that create value shifts and behavioural changes to achieve solutions to global issues?

•Can the development of new narratives and metaphors make scientific results accessible to all humanity to reframe global challenges?

•Can the virtualisation of research and innovation lifecycles, the multidisciplinary collaboration and the cross fertilisation with arts and humanities help improve the impact of research?

•Transformation of education: how might the roles of schools and professional educators evolve in the light of the science and art revolution? What might be the impact on jobs and productivity?

•How do we respond to the increasing demand for data scientists and data analysts?

•How do we cope with unintended and undesirable effects of pervasive digitization of society such as media addictions, IPR and authenticity, counterfeiting, plagiarism, life history theft? How do we build trust in both artists and audiences?

•How do we ensure that supercomputing, simulation and big data are not invasive to privacy and support free will and personal aspirations?

•Can crowd-financing platforms for art initiatives balance the roles in current artistic economies (e.g. arts granting agencies, wealthy patrons)?

•How do we harness digital gaming technologies, and developments in live gaming, to allow users to create imagined worlds that empower them and the communities they live within?”

See on ec.europa.eu


Shazam-Like Dolphin System ID’s Their Whistles: Scientific American Podcast

6 Nov

See on Scoop.it - Computational Music Analysis

Olivier Lartillot‘s insight:

I am glad to see such popularization of research on “melodic” pattern identification that generalizes beyond the music context and beyond the human species, and also the interesting link to music identification technologies (like Shazam). Before discussing this further, here is how the Scientific American podcast explains, in simple terms, this computational attempt to mimic dolphins’ melodic pattern identification abilities:

 

“Shazam-Like Dolphin System ID’s Their Whistles: A program uses an algorithm to identify dolphin whistles similar to that of the Shazam app, which identifies music from databases by changes in pitch over time.

Used to be, if you happened on a great tune on the radio, you might miss hearing what it was. Of course, now you can just Shazam it—let your smartphone listen, and a few seconds later, the song and performer pop up. Now scientists have developed a similar tool—for identifying dolphins.

Every dolphin has a unique whistle.  They use their signature whistles like names: to introduce themselves, or keep track of each other. Mothers, for example, call a stray offspring by whistling the calf’s ID.

To tease apart who’s saying what, researchers devised an algorithm based on the Parsons code, the software that mammals, I mean that fishes songs from music databases, by tracking changes in pitch over time.

They tested the program on 400 whistles from 20 dolphins. Once a database of dolphin sounds was created, the program identified subsequent dolphins by their sounds nearly as well as humans who eyeballed the whistles’ spectrograms.

Seems that in noisy waters, just small bits of key frequency change information may be enough to help Flipper find a friend.”

 

More precisely, the computer program generates a compact description of each dolphin whistle, indicating how the pitch curve progressively ascends and descends. This yields a description that is characteristic of each dolphin, so that the whistle curves can be compared to determine which curve belongs to which dolphin.
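
To make this concrete, here is a minimal sketch of this kind of contour description (the Parsons code mentioned in the podcast), assuming we already have a sequence of pitch estimates for a whistle; the tolerance value and the example pitches are invented:

```python
def parsons_code(pitches, tolerance=1.0):
    """Describe a pitch curve by its contour: U(p), D(own), R(epeat).

    `pitches` is a list of pitch estimates (e.g. in Hz) taken at successive
    time frames; `tolerance` is the smallest change treated as a real
    movement rather than a repeat. Both are illustrative choices.
    """
    code = []
    for previous, current in zip(pitches, pitches[1:]):
        if current - previous > tolerance:
            code.append("U")
        elif previous - current > tolerance:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)

# Two whistle contours can then be compared (e.g. with a simple edit distance)
# to decide whether they are likely to come from the same dolphin.
print(parsons_code([400, 450, 520, 510, 510, 460]))  # -> "UUDRD" (hypothetical values)
```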

 

But to be more precise, Shazam does not use this kind of approach to identify music. It does not try to detect melodic lines in the recording captured by the user; instead, it takes a series of several-second snapshots of each song, such that each snapshot contains the whole complex sound at that particular moment (with the full polyphony of instruments). A compact description (a “fingerprint”) of each snapshot is produced, indicating the most important spectral peaks (roughly, the most prominent pitches in the polyphony). This fingerprint is then compared with those of each song in the music database. Finally, the identified song is the one in the database whose series of fingerprints best matches the series of fingerprints of the user’s query. Here is a simple explanation of how Shazam works: http://laplacian.wordpress.com/2009/01/10/how-shazam-works/
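
To illustrate the general idea (this is not Shazam’s actual algorithm; the window length, number of peaks and matching rule below are rough assumptions), here is a minimal sketch of fingerprinting by spectral peaks and matching against a database:

```python
# Minimal sketch of the "fingerprint the spectral peaks and match them" idea.
import numpy as np
from scipy.signal import spectrogram

def fingerprints(signal, sample_rate, snapshot_seconds=2.0, peaks_per_snapshot=5):
    """Return one fingerprint (a tuple of dominant frequency bins) per snapshot."""
    freqs, times, sxx = spectrogram(signal, fs=sample_rate, nperseg=2048)
    frames_per_snapshot = max(1, int(round(snapshot_seconds / (times[1] - times[0]))))
    prints = []
    for start in range(0, sxx.shape[1], frames_per_snapshot):
        chunk = sxx[:, start:start + frames_per_snapshot]
        energy = chunk.sum(axis=1)                               # energy per frequency bin
        strongest = tuple(sorted(np.argsort(energy)[-peaks_per_snapshot:]))
        prints.append(strongest)
    return prints

def best_match(query_prints, database):
    """database maps song title -> list of fingerprints; score by shared fingerprints."""
    scores = {title: len(set(prints) & set(query_prints))
              for title, prints in database.items()}
    return max(scores, key=scores.get)
```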

 

Shazam does not model *how* humans identify music. The dolphin whistle comparison program does not model *how* dolphins identify each other. And Shazam and the dolphin whistle ID program do not use similar approaches. But on the other hand, might we assume that dolphins’ and humans’ abilities to identify auditory patterns (in whistles for dolphins, in music for humans) rely on the same core cognitive processes?

See on www.scientificamerican.com

Scientific Data Has Become So Complex, We Have to Invent New Math to Deal With It – Wired Science

12 Oct

See on Scoop.it - Computational Music Analysis

Olivier Lartillot‘s insight:

[ Note from curator: Wired already wrote an article about Carlsson and his topological data analysis method.

There are interesting critical comments about this article in Slashdot: http://science.slashdot.org/comments.pl?sid=4328305&cid=45105969

Olivier ]

 

“It is not sufficient to simply collect and store massive amounts of data; they must be intelligently curated, and that requires a global framework. “We have all the pieces of the puzzle — now how do we actually assemble them so we can see the big picture? You may have a very simplistic model at the tiny local scale, but calculus lets you take a lot of simple models and integrate them into one big picture.” Similarly, modern mathematics — notably geometry — could help identify the underlying global structure of big datasets.

 

Gunnar Carlsson, a mathematician at Stanford University, is representing cumbersome, complex big data sets as a network of nodes and edges, creating an intuitive map of data based solely on the similarity of the data points; this uses distance as an input that translates into a topological shape or network. The more similar the data points are, the closer they will be to each other on the resulting map; the more different they are, the further apart they will be on the map. This is the essence of topological data analysis (TDA).

 

TDA is an outgrowth of machine learning, a set of techniques that serves as a standard workhorse of big data analysis. Many of the methods in machine learning are most effective when working with data matrices, like an Excel spreadsheet, but what if your data set doesn’t fit that framework? “Topological data analysis is a way of getting structured data out of unstructured data so that machine-learning algorithms can act more directly on it.”

 

As with Euler’s bridges, it’s all about the connections. Social networks map out the relationships between people, with clusters of names (nodes) and connections (edges) illustrating how we’re all connected. There will be clusters relating to family, college buddies, workplace acquaintances, and so forth. Carlsson thinks it is possible to extend this approach to other kinds of data sets as well, such as genomic sequences.”

[… and music?!]

 

 “One can lay the sequences out next to each other and count the number of places where they differ,” he explained. “That number becomes a measure of how similar or dissimilar they are, and you can encode that as a distance function.”
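
[Curator’s note: that distance function can be written down directly. A minimal sketch, assuming two already-aligned sequences of equal length; the example sequences are invented:]

```python
def mismatch_distance(seq_a, seq_b):
    """Count the positions where two aligned, equal-length sequences differ
    (the distance function described above)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

print(mismatch_distance("GATTACA", "GACTATA"))  # -> 2
```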

 

The idea behind topological data analysis is to reduce large, raw, high-dimensional data sets to compressed representations in lower dimensions without sacrificing the most relevant topological properties. Ideally, this will reveal the underlying shape of the data. For example, a sphere technically exists in every dimension, but we can perceive only the three spatial dimensions. However, there are mathematical glasses through which one can glean information about these higher-dimensional shapes, Carlsson said. “A shape is an infinite number of points and an infinite amount of distances between those points. But if you’re willing to sacrifice a little roundness, you can represent [a circle] by a hexagon with six nodes and six edges, and it’s still recognizable as a circular shape.”
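
[Curator’s note: the hexagon-for-a-circle idea can be illustrated in a few lines: sample six points on a circle, connect the points that are close to each other, and the resulting graph still contains a single loop. The sampling density and the distance threshold are arbitrary choices:]

```python
# Toy illustration: a circle approximated by a graph that keeps its one loop.
import numpy as np
import networkx as nx

angles = np.linspace(0, 2 * np.pi, 6, endpoint=False)        # six sample points
points = np.column_stack([np.cos(angles), np.sin(angles)])   # on the unit circle

graph = nx.Graph()
for i in range(len(points)):
    for j in range(i + 1, len(points)):
        if np.linalg.norm(points[i] - points[j]) < 1.1:       # "similar enough" points
            graph.add_edge(i, j)

print(graph.number_of_nodes(), graph.number_of_edges())       # 6 nodes, 6 edges (a hexagon)
print(len(nx.cycle_basis(graph)))                             # 1 independent loop remains
```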

 

That is the basis of the proprietary technology Carlsson offers through his start-up venture, Ayasdi, which produces a compressed representation of high dimensional data in smaller bits, similar to a map of London’s tube system. Such a map might not accurately represent the city’s every last defining feature, but it does highlight the primary regions and how those regions are connected. In the case of Ayasdi’s software, the resulting map is not just an eye-catching visualization of the data; it also enables users to interact directly with the data set the same way they would use Photoshop or Illustrator. “It means we won’t be entirely faithful to the data, but if that set at lower representations has topological features in it, that’s a good indication that there are features in the original data also.”

 

Topological methods are a lot like casting a two-dimensional shadow of a three-dimensional object on the wall: they enable us to visualize a large, high-dimensional data set by projecting it down into a lower dimension. The danger is that, as with the illusions created by shadow puppets, one might be seeing patterns and images that aren’t really there.

 

It is so far unclear when TDA works and when it might not. The technique rests on the assumption that a high-dimensional big data set has an intrinsic low-dimensional structure, and that it is possible to discover that structure mathematically. Recht believes that some data sets are intrinsically high in dimension and cannot be reduced by topological analysis. “If it turns out there is a spherical cow lurking underneath all your data, then TDA would be the way to go,” he said. “But if it’s not there, what can you do?” And if your dataset is corrupted or incomplete, topological methods will yield similarly flawed results.

 

Emmanuel Candes, a mathematician at Stanford University, and his then-postdoc, Justin Romberg, were fiddling with a badly mangled image on his computer, the sort typically used by computer scientists to test imaging algorithms. They were trying to find a method for improving fuzzy images, such as the ones generated by MRIs when there is insufficient time to complete a scan. On a hunch, Candes applied an algorithm designed to clean up fuzzy images, expecting to see a slight improvement. What appeared on his computer screen instead was a perfectly rendered image. Candes compares the unlikeliness of the result to being given just the first three digits of a 10-digit bank account number, and correctly guessing the remaining seven digits. But it wasn’t a fluke. The same thing happened when he applied the same technique to other incomplete images.

 

The key to the technique’s success is a concept known as sparsity, which usually denotes an image’s complexity, or lack thereof. It’s a mathematical version of Occam’s razor: While there may be millions of possible reconstructions for a fuzzy, ill-defined image, the simplest (sparsest) version is probably the best fit. Out of this serendipitous discovery, compressed sensing was born. With compressed sensing, one can determine which bits are significant without first having to collect and store them all.
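
[Curator’s note: a toy illustration of the sparsity idea, not the method used in the research described here: recover a sparse vector from far fewer linear measurements than unknowns with a basic iterative soft-thresholding loop. All sizes and parameter values are arbitrary:]

```python
# Recover a k-sparse vector x from m << n measurements b = A @ x (noiseless),
# by gradient steps on 0.5*||Ax - b||^2 followed by soft-thresholding (ISTA).
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 200, 80, 5                          # unknowns, measurements, non-zeros
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)
b = A @ x_true

x = np.zeros(n)
step = 1.0 / np.linalg.norm(A, 2) ** 2        # safe gradient step size
lam = 0.02                                    # sparsity penalty (arbitrary)
for _ in range(2000):
    z = x - step * (A.T @ (A @ x - b))        # gradient step
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold

print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))     # relative error, ideally small
```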

 

This approach can even be useful for applications that are not, strictly speaking, compressed sensing problems, such as the Netflix prize. In October 2006, Netflix announced a competition offering a $1 million grand prize to whoever could improve the filtering algorithm for their in-house movie recommendation engine, Cinematch. An international team of statisticians, machine learning experts and computer engineers claimed the grand prize in 2009, but the academic community in general also benefited, since they gained access to Netflix’s very large, high quality data set. Recht was among those who tinkered with it. His work confirmed the viability of applying the compressed sensing approach to the challenge of filling in the missing ratings in the dataset.

 

Cinematch operates by using customer feedback: Users are encouraged to rate the films they watch, and based on those ratings, the engine must determine how much a given user will like similar films. The dataset is enormous, but it is incomplete: on average, users only rate about 200 movies, out of nearly 18,000 titles. Given the enormous popularity of Netflix, even an incremental improvement in the predictive algorithm results in a substantial boost to the company’s bottom line. Recht found that he could accurately predict which movies customers might be interested in purchasing, provided he saw enough products per person. Between 25 and 100 products were sufficient to complete the matrix.
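
[Curator’s note: a toy sketch of the matrix completion idea, not Recht’s nuclear-norm method: assume the ratings matrix is low rank, then alternate between keeping the observed entries and projecting back onto low-rank matrices with a truncated SVD. The sizes and the observation rate are invented:]

```python
# Fill in the missing entries of a low-rank matrix from a subset of observed entries.
import numpy as np

rng = np.random.default_rng(1)
users, items, rank = 100, 80, 3
truth = rng.normal(size=(users, rank)) @ rng.normal(size=(rank, items))
observed = rng.random((users, items)) < 0.3            # ~30% of entries are "rated"

estimate = np.where(observed, truth, 0.0)
for _ in range(100):
    u, s, vt = np.linalg.svd(estimate, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]     # keep only `rank` components
    estimate = np.where(observed, truth, low_rank)      # keep known entries, impute the rest

print(np.abs(estimate - truth)[~observed].mean())       # mean error on the missing entries
```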

 

“We have shown mathematically that you can do this very accurately under certain conditions by tractable computational techniques,” Candes said, and the lessons learned from this proof of principle are now feeding back into the research community.

 

Recht and Candes may champion approaches like compressed sensing, while Carlsson and Coifman align themselves more with the topological approach, but fundamentally, these two methods are complementary rather than competitive. There are several other promising mathematical tools being developed to handle this brave new world of big, complicated data. Vespignani uses everything from network analysis — creating networks of relations between people, objects, documents, and so forth in order to uncover the structure within the data — to machine learning, and good old-fashioned statistics.

 

Coifman asserts the need for an underlying global theory on a par with calculus to enable researchers to become better curators of big data. In the same way, the various techniques and tools being developed need to be integrated under the umbrella of such a broader theoretical model. “In the end, data science is more than the sum of its methodological parts,” Vespignani insists, and the same is true for its analytical tools. “When you combine many things you create something greater that is new and different.”

See on www.wired.com

The Man Behind the Google Brain: Andrew Ng and the Quest for the New AI | Wired Enterprise | Wired.com

20 May

See on Scoop.it - Computational Music Analysis

There’s a theory that human intelligence stems from a single algorithm. The idea arises from experiments suggesting that the portion of your brain dedicated to processing sound from your ears could also handle sight for your eyes.

Olivier Lartillot‘s insight:

My digest:

 

"

There’s a theory that human intelligence stems from a single algorithm. The idea arises from experiments suggesting that the portion of your brain dedicated to processing sound from your ears could also handle sight for your eyes. This is possible only while your brain is in the earliest stages of development, but it implies that the brain is — at its core — a general-purpose machine that can be tuned to specific tasks.

 

In the early days of artificial intelligence, the prevailing opinion was that human intelligence derived from thousands of simple agents working in concert, what MIT’s Marvin Minsky called “The Society of Mind.” To achieve AI, engineers believed, they would have to build and combine thousands of individual computing modules. One agent, or algorithm, would mimic language. Another would handle speech. And so on. It seemed an insurmountable feat.

 

A new field of computer science research known as Deep Learning seeks to build machines that can process data in much the same way the brain does, and this movement has extended well beyond academia, into big-name corporations like Google and Apple. Google is building one of the most ambitious artificial-intelligence systems to date, the so-called Google Brain.

 

This movement seeks to meld computer science with neuroscience — something that never quite happened in the world of artificial intelligence. “I’ve seen a surprisingly large gulf between the engineers and the scientists.” Engineers wanted to build AI systems that just worked, but scientists were still struggling to understand the intricacies of the brain. For a long time, neuroscience just didn’t have the information needed to help improve the intelligent machines engineers wanted to build.

 

What’s more, scientists often felt they “owned” the brain, so there was little collaboration with researchers in other fields. The end result is that engineers started building AI systems that didn’t necessarily mimic the way the brain operated. They focused on building pseudo-smart systems that turned out to be more like a Roomba vacuum cleaner than Rosie the robot maid from the Jetsons.

 

Deep Learning is a first step in this new direction. Basically, it involves building neural networks — networks that mimic the behavior of the human brain. Much like the brain, these multi-layered computer networks can gather information and react to it. They can build up an understanding of what objects look or sound like.

 

In an effort to recreate human vision, for example, you might build a basic layer of artificial neurons that can detect simple things like the edges of a particular shape. The next layer could then piece together these edges to identify the larger shape, and then the shapes could be strung together to understand an object. The key here is that the software does all this on its own — a big advantage over older AI models, which required engineers to massage the visual or auditory data so that it could be digested by the machine-learning algorithm.
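
[Curator’s note: a minimal sketch of this layering, written with PyTorch purely for illustration (the article names no framework, and all layer sizes are invented): each layer builds on the features of the previous one, from edge-like detectors to shapes to object classes. Training is omitted:]

```python
import torch
import torch.nn as nn

vision_model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # layer 1: simple local patterns (edges)
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # layer 2: combinations of edges (shapes)
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # layer 3: shapes -> object classes
)

images = torch.randn(4, 1, 28, 28)               # a batch of fake 28x28 images
print(vision_model(images).shape)                # -> torch.Size([4, 10])
```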

 

With Deep Learning, you just give the system a lot of data “so it can discover by itself what some of the concepts in the world are.” Last year, one algorithm taught itself to recognize cats after scanning millions of images on the internet. The algorithm didn’t know the word “cat,” but over time, it learned to identify the furry creatures we know as cats, all on its own.

 

This approach is inspired by how scientists believe that humans learn. As babies, we watch our environments and start to understand the structure of objects we encounter, but until a parent tells us what it is, we can’t put a name to it.

 

No, deep learning algorithms aren’t yet as accurate — or as versatile — as the human brain. But he says this will come.

 

In 2011, the Deep Learning project was launched at Google, and in recent months, the search giant has significantly expanded this effort, acquiring the artificial intelligence outfit founded by University of Toronto professor Geoffrey Hinton, widely known as the godfather of neural networks.

 

Chinese search giant Baidu has opened its own research lab dedicated to deep learning, vowing to invest heavy resources in this area. And big tech companies like Microsoft and Qualcomm are looking to hire more computer scientists with expertise in neuroscience-inspired algorithms.

 

Meanwhile, engineers in Japan are building artificial neural nets to control robots. And together with scientists from the European Union and Israel, neuroscientist Henry Markram is hoping to recreate a human brain inside a supercomputer, using data from thousands of real experiments.

 

The rub is that we still don’t completely understand how the brain works, but scientists are pushing forward in this as well. The Chinese are working on what they call the Brainnetome, described as a new atlas of the brain, and in the U.S., the Era of Big Neuroscience is unfolding with ambitious, multidisciplinary projects like President Obama’s newly announced (and much criticized) Brain Research Through Advancing Innovative Neurotechnologies Initiative — BRAIN for short.

 

If we map out how thousands of neurons are interconnected and “how information is stored and processed in neural networks,” engineers will have a better idea of what their artificial brains should look like. The data could ultimately feed and improve Deep Learning algorithms underlying technologies like computer vision, language analysis, and the voice recognition tools offered on smartphones from the likes of Apple and Google.

 

“That’s where we’re going to start to learn about the tricks that biology uses. I think the key is that biology is hiding secrets well. We just don’t have the right tools to grasp the complexity of what’s going on.”

 

Right now, engineers design around these issues, so they skimp on speed, size, or energy efficiency to make their systems work. But AI may provide a better answer. “Instead of dodging the problem, what I think biology could tell us is just how to deal with it….The switches that biology is using are also inherently noisy, but biology has found a good way to adapt and live with that noise and exploit it. If we could figure out how biology naturally deals with noisy computing elements, it would lead to a completely different model of computation.”

 

But scientists aren’t just aiming for smaller. They’re trying to build machines that do things computers have never done before. No matter how sophisticated algorithms are, today’s machines can’t fetch your groceries or pick out a purse or a dress you might like. That requires a more advanced breed of image intelligence and an ability to store and recall pertinent information in a way that’s reminiscent of human attention and memory. If you can do that, the possibilities are almost endless.

 

“Everybody recognizes that if you could solve these problems, it’s going to open up a vast, vast potential of commercial value."

See on www.wired.com

LilyPond – About – Essay

28 Apr

See on Scoop.it - Informatique musicale

What’s wrong with computer music notation?

http://t.co/V81ofhA6nV

What is behind LilyPond? http://t.co/5LvLiZjyAw

See on lilypond.org

Music Information Retrieval, a tutorial

16 Mar

See on Scoop.it - Computational Music Analysis

George Tzanetakis provides an overview of the techniques, applications and capabilities of music information retrieval systems.

Olivier Lartillot‘s insight:

Great tutorial by George Tzanetakis on the research field of computational music analysis (a discipline known as Music Information Retrieval). The tutorial includes an introduction to the engineering techniques commonly used in this research.

 

Here are the discussion topics that you will find (a small illustrative sketch of one of these techniques, autocorrelation-based pitch detection, follows the list):

Music Information Retrieval

Connections

Music Today

Industry

Music Collections

Overview

Audio Feature Extraction

Linear Systems and Sinusoids

Fourier Transform

Short Time Fourier Transform

Spectrum and Shape Descriptors

Mel Frequency Cepstral Coefficients

Audio Feature Extraction

Pitch Content

Pitch Detection

Time Domain

AutoCorrelation

Frequency Domain

Chroma – Pitch Perception

Automatic Rhythm Description

Beat Histograms

Analysis Overview

Content-based Similarity Retrieval (or query-by-example)

Classification

Classification

Multi-tag Annotation

Stacking

Polyphonic Audio-Score Alignment

Dynamic Time Warping

Query-by-humming

The MUSART system

Conclusions
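
As mentioned above, here is a minimal sketch of one of the listed techniques, time-domain pitch detection by autocorrelation; the frame length and the allowed pitch range are illustrative choices, not values from the tutorial:

```python
import numpy as np

def detect_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one audio frame by autocorrelation."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]   # autocorrelation, lags >= 0
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag = min_lag + np.argmax(corr[min_lag:max_lag])             # strongest periodicity
    return sample_rate / best_lag

sr = 44100
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 440.0 * t)            # a 440 Hz test tone
print(detect_pitch(frame, sr))                   # close to 440
```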

See on www.brainshark.com

Literary History, Seen Through Big Data’s Lens

27 Jan

See on Scoop.it - Computational Music Analysis

Big Data is pushing into the humanities, as evidenced by new, illuminating computer analyses of literary history.

Olivier Lartillot‘s insight:

My digest:

 

"

Big Data technology is steadily pushing beyond the Internet industry and scientific research into seemingly foreign fields like the social sciences and the humanities. The new tools of discovery provide a fresh look at culture, much as the microscope gave us a closer look at the subtleties of life and the telescope opened the way to faraway galaxies.

 

“Traditionally, literary history was done by studying a relative handful of texts. What this technology does is let you see the big picture — the context in which a writer worked — on a scale we’ve never seen before.”

 

Some of those tools are commonly described in terms familiar to an Internet software engineer — algorithms that use machine learning and network analysis techniques. For instance, mathematical models are tailored to identify word patterns and thematic elements in written text. The number and strength of links among novels determine influence, much the way Google ranks Web sites.
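
[Curator’s note: the analogy with Google’s ranking can be made concrete: treat novels as nodes, detected influence links as directed edges, and compute PageRank. The little graph below is invented purely for illustration:]

```python
import networkx as nx

influence = nx.DiGraph()
influence.add_edges_from([
    ("Novel B", "Novel A"),   # an edge X -> Y means "X was influenced by Y"
    ("Novel C", "Novel A"),
    ("Novel D", "Novel B"),
    ("Novel D", "Novel C"),
])

scores = nx.pagerank(influence)
for title, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{title}: {score:.3f}")    # "Novel A" should come out as most influential
```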

 

It is this ability to collect, measure and analyze data for meaningful insights that is the promise of Big Data technology. In the humanities and social sciences, the flood of new data comes from many sources including books scanned into digital form, Web sites, blog posts and social network communications.

 

Data-centric specialties are growing fast, giving rise to a new vocabulary. In political science, this quantitative analysis is called political methodology. In history, there is cliometrics, which applies econometrics to history. In literature, stylometry is the study of an author’s writing style, and these days it leans heavily on computing and statistical analysis. Culturomics is the umbrella term used to describe rigorous quantitative inquiries in the social sciences and humanities.

 

“Some call it computer science and some call it statistics, but the essence is that these algorithmic methods are increasingly part of every discipline now.”

 

Cultural data analysts often adapt biological analogies to describe their work. For example: “Computing and Visualizing the 19th-Century Literary Genome.”

 

Such biological metaphors seem apt, because much of the research is a quantitative examination of words. Just as genes are the fundamental building blocks of biology, words are the raw material of ideas.

 

“What is critical and distinctive to human evolution is ideas, and how they evolve.”

 

Some projects mine the virtual book depository known as Google Books and track the use of words over time, compare related words and even graph them. Google cooperated and built the software for making graphs open to the public. The initial version of Google’s cultural exploration site began at the end of 2010, based on more than five million books, dating from 1500. By now, Google has scanned 20 million books, and the site is used 50 times a minute. For example, type in “women” in comparison to “men,” and you see that for centuries the number of references to men dwarfed those for women. The crossover came in 1985, with women ahead ever since.

 

Researchers tapped the Google Books data to find how quickly the past fades from books. For instance, references to “1880,” which peaked in that year, fell to half by 1912, a lag of 32 years. By contrast, “1973” declined to half its peak by 1983, only 10 years later. “We are forgetting our past faster with each passing year.”

 

Other research approached collective memory from a very different perspective, focusing on what makes spoken lines in movies memorable. Sentences that endure in the public mind are evolutionary success stories, cf. “the fitness of language and the fitness of organisms.” As a yardstick, the researchers used the “memorable quotes” selected from the popular Internet Movie Database, or IMDb, and the number of times that a particular movie line appears on the Web. Then they compared the memorable lines to the complete scripts of the movies in which they appeared — about 1,000 movies. To train their statistical algorithms on common sentence structure, word order and most widely used words, they fed their computers a huge archive of articles from news wires. The memorable lines consisted of surprising words embedded in sentences of ordinary structure. “We can think of memorable quotes as consisting of unusual word choices built on a scaffolding of common part-of-speech patterns.”
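
[Curator’s note: a rough sketch of the “unusual words on a common scaffolding” measure. The Brown corpus stands in for the news-wire archive mentioned above, and the scoring below is a drastic simplification of the actual statistical models:]

```python
# Requires: nltk.download('brown'), nltk.download('punkt'),
#           nltk.download('averaged_perceptron_tagger')
import math
from collections import Counter
import nltk
from nltk.corpus import brown

# Background statistics from a reference corpus (a small sample, for speed).
sample_sents = brown.sents()[:2000]
word_counts = Counter(w.lower() for sent in sample_sents for w in sent)
total_words = sum(word_counts.values())
pos_trigram_counts = Counter(
    trigram
    for sent in sample_sents
    for trigram in nltk.ngrams([tag for _, tag in nltk.pos_tag(sent)], 3)
)
total_trigrams = sum(pos_trigram_counts.values())

def memorability_score(sentence):
    """High lexical surprisal + common part-of-speech pattern -> higher score."""
    tokens = nltk.word_tokenize(sentence)
    tags = [tag for _, tag in nltk.pos_tag(tokens)]
    lexical_surprisal = sum(
        -math.log((word_counts[w.lower()] + 1) / total_words) for w in tokens
    ) / len(tokens)
    pattern_commonness = sum(
        math.log((pos_trigram_counts[tri] + 1) / total_trigrams)
        for tri in nltk.ngrams(tags, 3)
    ) / max(1, len(tags) - 2)
    return lexical_surprisal + pattern_commonness    # crude combination of the two signals

# Compare the two scores (higher = more "memorable" under this crude heuristic).
print(memorability_score("May the Force be with you."))
print(memorability_score("I will see you at the office tomorrow."))
```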

 

Quantitative tools in the humanities and the social sciences, as in other fields, are most powerful when they are controlled by an intelligent human. Experts with deep knowledge of a subject are needed to ask the right questions and to recognize the shortcomings of statistical models.

 

“You’ll always need both. But we’re at a moment now when there is much greater acceptance of these methods than in the past. There will come a time when this kind of analysis is just part of the tool kit in the humanities, as in every other discipline.”

See on www.nytimes.com