IDEAS FOR I545/N564/N364 INDEPENDENT PROJECTS - Don Byrd, rev. 29 Jan. 2008

Here are some ideas for semester projects. I'd be happy to consider anything else you think is relevant to the course; in fact, it's better in some ways if students make their own projects up. However, I'll expect you to convince me your project can demonstrate your mastery of enough of what the course is about. Please feel free to discuss any of this with me at any time. "DAB" below refers to me.

For general background information, see my Information Sources for Music Informatics Students:
  http://www.informatics.indiana.edu/donbyrd/Teach/GeneralInformationSources.HTML
and my Music IR and Music Informatics Bibliography:
  http://www.informatics.indiana.edu/donbyrd/DonMusicIRBibliography.HTML

Requiring Programming
---------------------

- Write a simple program to compute the similarity between two melodies, rhythm patterns, chord progressions, or even complete pieces, represented in some symbolic form (such a program could be used as the basis for a music-retrieval program). With what types of music would you expect your program to work well? With what types would you expect it to work at least somewhat? Test it against a well-thought-out collection of test files, and report on the results. [topic: symbolic retrieval]

!- Attempt to improve NightingaleSearch's music searching and evaluate the results in some way, preferably by MIREX or standard TREC (Cranfield model) methods. Caveat: doing this almost certainly requires porting NightingaleSearch to a modern development environment, a serious undertaking in itself. [topic: symbolic retrieval]

!- NightingaleSearch's music IR feature is painfully slow; the obvious reason is that it does exhaustive sequential searching. Make it run much faster, presumably by adding indexing. Caveat: doing this almost certainly requires porting NightingaleSearch to a modern development environment, a serious undertaking in itself.
[topic: symbolic retrieval]

- Make significant improvements to an existing program that analyzes audio in some way. Most such programs are probably too difficult for you to do anything with, given the constraints of time and your limited knowledge, but there are lots of simple ones around. DAB's R program DescribeAudioSegments (one of the "R Program Examples" on my website) is an unusually simple -- a better word might be "crude" -- example of such a program. In what way(s) is your version an improvement? Test yours against the original with a well-thought-out collection of test files, and report on the results.

- Make significant improvements to an existing plugin for the Sonic Visualiser (www.sonicvisualiser.org), or write a new one. Do a user study to determine whether your changes really are improvements with either subjective or objective metrics.

- Write a program that takes music in a symbolic form (MIDI files or something equivalent to them) and creates some kind of model of the music. The obvious models are probabilistic, specifically Markov or Hidden Markov models. Write another program that uses models created by the first program to "compose" music. Evaluate the compositions and comment on how they might be improved.

- Build a database (e.g., from MIDI files, or from an existing collection like CCARH's) of at least 500 music "documents"; build a suitable set of queries; and/or investigate how the choice of search parameters affects the results, as evaluated by MIREX or TREC methods. [topic: symbolic retrieval]

- Same as above, but using one of the MIREX tasks, with M2K/D2K or any other software.

Could Be Done With Or Without Programming
-----------------------------------------

!- Use NightingaleSearch to study something appropriate to some collection of music, e.g., to confirm or refute accepted wisdom about that music.
The 24 Preludes and 24 Fugues of Bach's Well-Tempered Clavier and a relative handful of other pieces already exist in NightingaleSearch format; otherwise, there are utilities to convert music from some other forms (e.g., MuseData) to its format, though they may not work very well. Caveat: doing this may require porting NightingaleSearch to a modern development environment, a serious undertaking in itself. [topic: music analysis]

!- Same as above, but using the Humdrum Toolkit, and there's far more music already available in a format Humdrum can use. On the other hand, Humdrum is much, much less user-friendly than NightingaleSearch: among other things, it doesn't have any kind of music-notation input or display, though programs exist to convert its kern format to a format a program like Lilypond can display. [topic: music analysis]

- Adapt Steve Larson's "theory of musical forces" (see, e.g., "Musical Forces and Melodic Expectations: Comparing Computer Models and Experimental Results"; abstract at http://caliber.ucpress.net/doi/abs/10.1525/mp.2004.21.4.457) to recognize similarity between melodies or even complete polyphonic pieces of music. Devise a way to test your version of the theory; test it with a well-thought-out collection of music -- it might be possible to do this manually with a small collection -- and report on the results. [topic: music analysis]

- It's obvious that automated Schenkerian analysis, even going just two or three levels down from the surface, would be incredibly valuable for music IR; however, a general, style-independent solution is probably not possible in the foreseeable future. But how about a solution for a very limited range of styles -- e.g., only Anglo-American folksongs or 12-bar blues? Cf. "controllers" in David Cope's EMI system; also cf. Alan Marsden's ISMIR 2007 paper, etc.
Study the problem and report on what you find in as scholarly a way as possible; preferably write a program and test it, but at least say just what your method _would_ do for some real music. [topic: music analysis, music perception, cognitive science]

- Investigate clustering musical documents on whatever basis; this could be very useful for visualization, recommender or improvisation systems, etc. Cf. several papers from ISMIR and elsewhere, and techniques like MDS, PCA, Kohonen maps, and spring embedding.

- Investigate converting between representations of different types (e.g., notation to MIDI); if possible, write a program to implement your ideas, and test it on a well-thought-out collection of music. [topic: music representation]

- Investigate user-interface issues in music searching, either content-based or bibliographic, by designing a user interface and testing it on a number of volunteers. The testing should be for both subjective and objective factors, and should conform to the scholarly method. [topic: HCI]

- Follow up on the ISMIR 2000 Mozart Variations survey: do it more scientifically, or at least investigate how that could be done, preferably by designing a valid experiment. [topic: relevance judgments]

- DAB's Extremes of CMN list (on my website) is interesting, but _distributions_ for some collection of music and one or more of the features (e.g., written pitch or duration, or just number of augmentation dots!) showing how often various values occur in a significant body of music would be much more revealing; such distributions could be useful in statistical authorship studies, for example. Compute distributions for some of the items in the list and investigate how the resulting information could be useful. For a music collection, you could use the CCARH database (http://www.ccarh.org/), with kern data (http://kern.humdrum.net/) accessed via the Humdrum toolkit, or with MusicXML data (available from me) accessed via a program of your own.
In any case, the programming part of this is relatively easy.

- The "V2V" offshoot of the Variations2 project was an attempt to combine content-based and metadata-based searching. Investigate further how the two forms of searching could be combined from any standpoint: user interface, ranking, etc.

!- Investigate the "Mickey Mouse Club theme" problem: to what extent is a music-searching program likely to find matches in inner voices that are of little or no interest because they're completely inaudible? (The answer may well depend on whether the program knows about the voices and does not look for matches that cross voices: see the "disastrous loss of precision" idea below.)

!- Investigate what it would take to identify 12-bar blues in a collection of, say, MIDI files or Humdrum/kern files, and try out your technique.

- Propose a new task for MIREX. Why is this task significant? How could entries be evaluated? One candidate that Stephen Downie, the director of MIREX, is interested in is OMR. [topic: evaluation]

- Investigate a basis for ranking music documents in search results. With music, as with text, this is normally done by similarity between the results and the query, and justified via "relevance". But are these the best concepts for ranking music? Present solid evidence one way or the other, perhaps via a user study.

- Do something with the new "games with a purpose" for creating metadata for music (Luis von Ahn's "human computation" idea; cf. the video of his fascinating Google Tech Talk). For example, compare the existing ones, or design a new one.

Probably Not Involving Programming
----------------------------------

!- Investigate a basis for ranking music documents in search results. With music as with text, this is normally done by similarity, and justified via "relevance". But are these the best concepts for ranking music?

!- The "music as different as possible" problem.
The idea is to find six or so pieces of music each of which is considered by listeners to be as different as possible from all the others; the hard part is to find objective evidence for the choices! Once you've chosen them, either justify or refute your claim that each is as different as possible from the others on a basis that's as objective as possible, almost certainly via a survey of listeners. Better, use such a measure to find the pieces in the first place. In either case, "as objective as possible" is not likely to be very objective: discuss the inherent limitations of objectivity here. This would be a step towards defining a musical style space. [topic: music classification]

- Study perception of musical-instrument timbre. For example, there's evidence that people can identify instruments from note _attacks_ more reliably than from note "steady states" (sustains); can you find strong evidence one way or the other? Of course it depends on the definition of "attack". Another question: if you "morph" notes from one instrument into notes from another, at what point (in terms of the physical characteristics of the sound) do people switch from hearing one to the other? And how well do people agree? You might expect players of an instrument to be more likely to think intermediate sounds are their instrument -- or more likely to think they're _not_ their instrument. Do a user study.

!- Investigate abstract spaces relevant to music, e.g., timbre or musical style. To my knowledge, published work on timbre space has been around since the 1970's, but little has been done in many years. Far less has been done on musical style spaces, which are a far more difficult problem -- but making progress from almost nothing shouldn't be that hard! [topic: music classification]

- Study how MIREX works. Compare it to similar undertakings in other domains (TREC for text IR, the standard speech-recognition and question-answering tests, etc.). Say in detail how MIREX could be improved.
Can you give a convincing argument that your changes would improve MIREX? This seems like a difficult question of experimental methodology. One way might be to show what would have happened in a previous MIREX with your changes, but I'm skeptical that could be done in a convincing way. [topic: evaluation]

!- List and discuss several of what you consider the most important unsolved problems of music IR. (I'll be glad to tell you what I think some of these problems are, but you're welcome to choose your own.)

!- In Sept. 2005, a former director of engineering for All Music Guide said, in so many words, that programs that do automatic genre classification from audio are probably finding _something_, and something useful, but it may not be genres as people understand them. Investigate and report on the accuracy of this statement. [topic: music classification]

- There is very little agreement among existing style-genre classifications: the number of categories varies wildly, some are "flat" lists and some are hierarchic, and some confuse styles and forms. Compare at least three existing classifications, and comment on which seems most practical for computer implementation and why. [topic: music classification]

!- Study existing sets of relevance judgments for music; and/or create a new set. [topic: relevance judgments]

- The "Amen Break" is a famous break beat that apparently has been sampled and used over and over again in many contexts since its creation; there's an interesting video about it on YouTube (www.youtube.com/watch?v=5SaFTm2bcac), and an article in Wikipedia. However, I'm not convinced that all the supposed usages involve samples of the original, even modified samples. Investigate a substantial number of alleged uses and report on what you found out and how you did it.
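
Illustrative sketch for the melody-similarity idea (the first item under "Requiring Programming"). This is only one possible starting point, not a recommended method. It assumes each melody is given as a list of MIDI note numbers, ignores rhythm entirely, and applies plain edit distance to successive pitch intervals so that transposed copies of a melody compare as identical. The function names (intervals, edit_distance, melody_similarity) are made up for this example.

```python
def intervals(pitches):
    """Successive pitch differences in semitones (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Levenshtein distance between two sequences, with unit costs."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def melody_similarity(p, q):
    """Similarity in [0, 1]; 1.0 means identical interval sequences."""
    a, b = intervals(p), intervals(q)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: melodies with no intervals count as equal
    return 1.0 - edit_distance(a, b) / longest
```

For example, melody_similarity([60, 62, 64, 65, 67], [65, 67, 69, 70, 72]) gives 1.0, because the second melody is the first transposed up a fourth, while changing one note, as in [60, 62, 64, 66, 67], alters two intervals and gives 0.5. A real project would still have to decide how to weight rhythm, how to handle polyphony, and how to evaluate the measure against a test collection.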