Notes on interesting papers, to remind myself why I found them interesting, and to help find them again.
Primarily "blue-skies" research reading, not papers read in the course of writing one of my own, so not necessarily representative of my research.
2018-07-02 – On embeddings as alternative paradigm for relational learning
2018-06-07 – The abstract organism: Towards a prehistory for a-life art
2018-05-30 – Measuring the intrinsic dimension of objective landscapes
2018-05-24 – A trans-Canada computer communications network
2018-05-08 – Vagueness, rationality and undecidability: A theory of why there is vagueness
2018-05-02 – Practical reinforcement learning in continuous spaces
2018-04-28 – Six mathematical gems from the history of distance geometry
2018-04-10 – Theorizing affordances: From request to refuse
2018-04-05 – How to do research at the MIT AI Lab
2018-03-30 – Extremely randomized trees
2018-03-28 – Perspectives of approximate dynamic programming
2018-03-27 – Datafun: A functional Datalog
2018-03-26 – Finding alternative musical scales
2018-02-26 – Seagull: A bird's-eye view of the evolution of technical games research
2018-02-14 – Propp's morphology of the folk tale as a grammar for generation
2017-11-16 – Wordless games: Gameplay as narrative technique
2017-10-27 – On the notion of interestingness in automated mathematical discovery
2017-09-12 – The ordinal nature of emotions
2017-08-26 – Grimes' fairy tales: A 1960s story generator
2017-08-25 – Some natural solutions to the p-value communication problem—and why they won't work
2017-07-27 – Twenty things to do with a computer
2017-05-30 – Propositional & analogical generation of coordinated verbal, visual & musical texts
Abstract:
Many real-world domains can be expressed as graphs and, more generally, as multi-relational knowledge graphs. Though reasoning and learning with knowledge graphs has traditionally been addressed by symbolic approaches, recent methods in (deep) representation learning has shown promising results for specialized tasks such as knowledge base completion. These approaches abandon the traditional symbolic paradigm by replacing symbols with vectors in Euclidean space. With few exceptions, symbolic and distributional approaches are explored in different communities and little is known about their respective strengths and weaknesses. In this work, we compare representation learning and relational learning on various relational classification and clustering tasks and analyse the complexity of the rules used implicitly by these approaches. Preliminary results reveal possible indicators that could help in choosing one approach over the other for particular knowledge graphs.
Abstract:
The author examines historical precedents for contemporary art practice using artificial life, in particular in the work of Paul Klee and Kasimir Malevich. Similarities are identified between artificial life and the philosophical tradition of organicism; specific examples from Klee and Malevich indicate that those artists were engaged in a form of creative organicist thought that imagined the realization of living structures in artificial media.
Abstract:
Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.
Excerpt:
In the sector of service to users, there is one Major Project which would dramatically improve the quality and quantity of computer services available to all Canadians and would greatly assist Canadian industry. It is the organization of a nationwide system of computer communications networks. To achieve this goal a positive, vigorous Canadian policy is urgently needed. A "laissez-faire" attitude will eventually result in the supply of most computing and information services via spur lines from U.S. computer commu nications networks. Such an outcome is completely unacceptable on economic and social grounds. [...] The system of networks should not be allowed to practise "cream-skimming" by concentrating exclusively on the densely populated, highly profitable regions of Canada. It must link all important centres in Canada in order to bring computing and information services to the greatest possible number of Canadians.
Excerpt:
Vagueness is not undecidability, but undecidability does enter into an explanation of why there is vagueness. My theory, called the Undecidability Theory of Vagueness, explains vagueness largely as a result of the fact that we are computationally bound. Vagueness is not due to any particularly human weakness, but due to a weakness that any computationally bound agent possesses; even HAL from 2001: A Space Odyssey will probably experience vagueness. Furthermore, I will argue that vagueness is good for you. I will do so by showing that if you were a computationally bound, rational alien agent given the task of figuring out what our natural language predicates mean, you would very probably end up with vagueness. That is, unless two highly plausible hypotheses are false, it would be rational for you—i.e., in your best interest—to choose concepts in such a way that they are vague. Given that you are computationally bound, avoiding vagueness brings in greater costs than accepting it.
Abstract:
Dynamic control tasks are good candidates for the application of reinforcement learning techniques. However, many of these tasks inherently have continuous state or action variables. This can cause problems for traditional reinforcement learning algorithms which assume discrete states and actions. In this paper, we introduce an algorithm that safely approximates the value function for continuous state control tasks, and that learns quickly from a small amount of data. We give experimental results using this algorithm to learn policies for both a simulated task and also for a real robot, operating in an unaltered environment. The algorithm works well in a traditional learning setting, and demonstrates extremely good learning when bootstrapped with a small amount of human-provided data.
Abstract:
We report the detection of a 51-joule (320 +/- 90 EeV) cosmic ray by the Fly's Eye air shower detector in Utah. This is substantially greater than the energy of any previously reported cosmic ray. A Greisen-Zatsepin-Kuz'min cutoff of the energy spectrum (due to pion photoproduction energy losses) should occur below this energy unless the highest energy cosmic rays have traveled less than about 30 Mpc. The error box for the arrival direction in galactic coordinates is centered on b=9.6 deg, l=163.4 deg. The particle cascade reached a maximum size near a depth of 815 g/cm^2 in the atmosphere, a depth which does not uniquely identify the type of primary particle.
Abstract:
This is a partial account of the fascinating history of distance geometry. We make no claim to completeness, but we do promise a dazzling display of beautiful, elementary mathematics. We prove Heron's formula, Cauchy's theorem on the rigidity of polyhedra, Cayley's generalization of Heron's formula to higher dimensions, Menger's characterization of abstract semimetric spaces, a result of Gödel on metric spaces on the sphere, and Schoenberg's equivalence of distance and positive semidefinite matrices, which is at the basis of multidimensional scaling.
Abstract:
As a concept, affordance is integral to scholarly analysis across multiple fields—including media studies, science and technology studies, communication studies, ecological psychology, and design studies among others. Critics, however, rightly point to the following shortcomings: definitional confusion, a false binary in which artifacts either afford or do not, and failure to account for diverse subject-artifact relations. Addressing these critiques, this article demarcates the mechanisms of affordance—as artifacts request, demand, allow, encourage, discourage, and refuse—which take shape through interrelated conditions: perception, dexterity, and cultural and institutional legitimacy. Together, the mechanisms and conditions constitute a dynamic and structurally situated model that addresses how artifacts afford, for whom and under what circumstances.
Abstract:
This document presumptuously purports to explain how to do research. We give heuristics that may be useful in picking up the specific skills needed for research (reading, writing, programming) and for understanding and enjoying the process itself (methodology, topic and advisor selection, and emotional factors).
Abstract:
This paper proposes a new tree-based ensemble method for supervised classification and regression problems. It essentially consists of randomizing strongly both attribute and cut-point choice while splitting a tree node. In the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample. The strength of the randomization can be tuned to problem specifics by the appropriate choice of a parameter. We evaluate the robustness of the default choice of this parameter, and we also provide insight on how to adjust it in particular situations. Besides accuracy, the main strength of the resulting algorithm is computational efficiency. A bias/variance analysis of the Extra-Trees algorithm is also provided as well as a geometrical and a kernel characterization of the models induced.
Abstract:
Approximate dynamic programming has evolved, initially independently, within operations research, computer science and the engineering controls community, all searching for practical tools for solving sequential stochastic optimization problems. More so than other communities, operations research continued to develop the theory behind the basic model introduced by Bellman with discrete states and actions, even while authors as early as Bellman himself recognized its limits due to the "curse of dimensionality" inherent in discrete state spaces. In response to these limitations, subcommunities in computer science, control theory and operations research have developed a variety of methods for solving different classes of stochastic, dynamic optimization problems, creating the appearance of a jungle of competing approaches. In this article, we show that there is actually a common theme to these strategies, and underpinning the entire field remains the fundamental algorithmic strategies of value and policy iteration that were first introduced in the 1950's and 60's.
Abstract:
Datalog may be considered either an unusually powerful query language or a carefully limited logic programming language. Datalog is declarative, expressive, and optimizable, and has been applied successfully in a wide variety of problem domains. However, most use-cases require extending Datalog in an application-specific manner. In this paper we define Datafun, an analogue of Datalog supporting higher-order functional programming. The key idea is to track monotonicity with types.
Abstract:
We search for alternative musical scales that share the main advantages of classical scales: pitch frequencies that bear simple ratios to each other, and multiple keys based on an underlying chromatic scale with tempered tuning. We conduct the search by formulating a constraint satisfaction problem that is well suited for solution by constraint programming. We find that certain 11-note scales on a 19-note chromatic stand out as superior to all others. These scales enjoy harmonic and structural possibilities that go significantly beyond what is available in classical scales and therefore provide a possible medium for innovative musical composition.
Abstract:
Within Entertainment Computing, games research has grown to be its own area, with numerous publication venues dedicated to it. As this area evolves, it is fruitful to examine its overall development—which subcommunities and research interests were present from the start, which have come and gone, and which are currently active—to better understand the research community as a whole and where it may proceed. In this paper, we present a data-driven analysis and interactive visualization tool to shed light on how technical domains within the games research field have evolved from 2000 to 2013, based on publication data from over 8000 articles collected from 48 games research venues, including Entertainment Computing, FDG, AIIDE, and DiGRA. The approach we present is descriptive. We first used data mining algorithms to group related papers into clusters of similar research topics and evolve these clusters over time. We then designed an interactive visualization system, named Seagull, comprised of Sankey diagrams that allow us to interactively visualize and examine the transition and coalescing of different clusters across time. We present our descriptive analysis in this paper and also contribute the visualization interface to allow other researchers to examine the data and develop their own analysis.
Abstract:
The semi-formal analysis of Russian folk tales carried out by Vladimir Propp has often been used as theoretical background for the automated generation of stories. Its rigour and its exhaustive description of the constituent elements of Russian folk tales, and the enumeration of the patterns they follow, have acted as inspiration for several story generation systems, both sequential and interactive. Yet most of these efforts have attempted to generalize Propp's account to types of stories beyond the corpus that it arose from. In the process, a number of the valuable intuitions present in the original work are lost. The present paper revisits Propp's morphology to build a system that generates instances of Russian folk tales. Propp's view of the folk tale as a rigid sequence of character functions is phrased as a grammar that can be employed as a plot driver. Unification is used to incrementally build a conceptual representation of discourse by adding to an ongoing draft story actions that instantiate the character functions. Story actions are defined by pre and post conditions on the state of the plot to account for the causal relations crucial to narrative. The potential of the resulting system for providing a generic story generation system is discussed and possible lines of future work are discussed.
Abstract:
In this paper, we look at how gameplay can be used to tell stories in a game without the help of words. Through close readings of three wordless games with a strong narrative focus, Journey, Brothers: A Tale of Two Sons and A Bird Story, we explore how gameplay within wordless games can help to convey a narrative. We have identified four techniques by which gameplay is used for storytelling: gameplay as enacting narrative, manipulating player controls for narrative effect, gameplay for exploring narrative setting, and gameplay as time progression. We discuss these techniques in relation to existing concepts of player experience, and suggest ways gameplay can help to circumvent issues of ambiguity in wordless narrative in games.
Abstract:
We survey five mathematical discovery programs by looking in detail at the discovery processes they illustrate and the success they had. We focus on how they estimate the interestingness of concepts and conjectures and extract some common notions about interestingness in automated mathematical discovery. We detail how empirical evidence is used to give plausibility to conjectures, and the different ways in which a result can be thought of as novel. We also look at the ways in which the programs assess how surprising and complex a conjecture statement is, and the different ways in which the applicability of a concept or conjecture is used. Finally, we note how a user can set tasks for the program to achieve and how this affects the calculation of interestingness. We conclude with some hints on the use of interestingness measures for future developers of discovery programs in mathematics.
Abstract:
We examined the sequence of decision problems that are encountered in the game of Tetris and found that most of the problems are easy in the following sense: One can choose well among the available actions without knowing an evaluation function that scores well in the game. This is a consequence of three conditions that are prevalent in the game: simple dominance, cumulative dominance, and noncompensation. These conditions can be exploited to develop faster and more effective learning algorithms. In addition, they allow certain types of domain knowledge to be incorporated with ease into a learning algorithm. Among the sequential decision problems we encounter, it is unlikely that Tetris is unique or rare in having these properties.
Abstract:
Representing computationally everyday emotional states is a challenging task and, arguably, one of the most fundamental for affective computing. Standard practice in emotion annotation is to ask humans to assign an absolute value of intensity to each emotional behavior they observe. Psychological theories and evidence from multiple disciplines including neuroscience, economics and artificial intelligence, however, suggest that the task of assigning reference-based (relative) values to subjective notions is better aligned with the underlying representations than assigning absolute values. Evidence also shows that we use reference points, or else anchors, against which we evaluate values such as the emotional state of a stimulus; suggesting again that ordinal labels are a more suitable way to represent emotions. This paper draws together the theoretical reasons to favor relative over absolute labels for representing and annotating emotion, reviewing the literature across several disciplines. We go on to discuss good and bad practices of treating ordinal and other forms of annotation data, and make the case for preference learning methods as the appropriate approach for treating ordinal labels. We finally discuss the advantages of relative annotation with respect to both reliability and validity through a number of case studies in affective computing, and address common objections to the use of ordinal data. Overall, the thesis that emotions are by nature relative is supported by both theoretical arguments and evidence, and opens new horizons for the way emotions are viewed, represented and analyzed computationally.
Abstract:
We provide the first extensive account of an unknown story generator that was developed by linguist Joseph E. Grimes in the early 1960s. A pioneering system, it was the first to take a grammar-based approach and the first to operationalize Propp’s famous model. This is the opening paper in a series that will aim to reformulate the prevailing history of story generation in light of new findings we have made pertaining to several forgotten early projects. Our study here has been made possible by personal communication with the system's creator, Grimes, and excavation of three obscure contemporaneous sources. While the accepted knowledge in our field is that the earliest story generator was Sheldon Klein’s automatic novel writer, first reported in 1971, we show that Grimes’s system and two others preceded it. In doing this, we reveal a new earliest known system. With this paper, and follow-ups to it that are in progress, we aim to provide a new account of the area of story generation that lends our community insight as to where it came from and where it should go next. We hope others will join us in this mission.
Excerpt:
It is well known that even experienced scientists routinely misinterpret p-values in all sorts of ways, including confusion of statistical and practical significance, treating non-rejection as acceptance of the null hypothesis, and interpreting the p-value as some sort of replication probability or as the posterior probability that the null hypothesis is true. [...] At this point it would be natural for statisticians to think that this is a problem of education and communication. If we could just add a few more paragraphs to the relevant sections of our textbooks, and persuade applied practitioners to consult more with statisticians, then all would be well, or so goes this logic. [...] The problem does not seem just to be technical misunderstandings; rather, statistical analysis is being asked to do something that it simply can’t do, to bring out a signal from any data, no matter how noisy. We suspect that, to make progress in pedagogy, statisticians will have to give up some of the claims we have implicitly been making about the effectiveness of our methods.
Excerpt:
When people talk about computers in education they do not all have the same image in mind. Some think of using the computer to program the kid; others think of using the kid to program the computer. [...] In the real world computers are used in many different ways. Some are programmed to fly airplanes; not to *tell* a human pilot what to do, but to pull levers with their own electro-mechanical effectors... [...] Why then should computers in schools be confined to computing the sum of the squares of the first twenty odd numbers and similar so-called "problem-solving" uses? Why not use them to produce some action? [...] But our purpose here is not to complain of what other people have not done, but to tell of some of the exciting things you can do with the computer you have now or with the one you will be incited to get by the pages that follow.
Excerpt:
[L]ooking back to the first encounters of computers and design, one discovers a rich legacy of speculation on the implications of this merging. Inquiry into the early computational era (1965-1975) can therefore expose (part of) the cultural and historical origins of the popular but loosely defined term "computational design." [...] The intense impulse to situate the new entity of the computer in the traditional, empirical processes of design lead to assignments of anthropomorphic roles to the machine, such as the "clerk," the "partner," the "accountant," and others. These different "occupations" were eloquent metaphors denoting different approaches to the ways that the innate characteristics of the computer could be reconciled with the elusive characteristics of design, as well as to the new relationship of the machine as a design actor with the designer-author. The main body of this paper places these metaphors in conversation, thus revealing different models of computation, as well as different processes of design. The purpose of this short survey is to bring forth computational "role models" which survive until today, assert them as historical and cultural artifacts, and present their conceptual counterpoints, re-opening them for discussion.
Excerpt:
We are developing a model for human cognitive processing which assumes that a major component of the rules for calculating behavior are resident outside the individual, in the inherited, collective phenomena that anthropologists call 'culture'. Our model contains rules of behavior encoded in propositional structures such as frames or scripts, plus a method for calculating behavior by analogy. Propositional rules are transformed into situation state descriptions that are related by transformation operators which are also valid for transforming situations by analogy, and for planning by analogy. [...] We use the simulation model as a generator of operas, complete with textual, pictorial and musical output, all derived from a common semantic source. The output is in the medium of a videotape recording, and is generally entitled, Revolt in Flatland.
Abstract:
We argue that an evaluation of system behavior at the level of the music is required to usefully address the fundamental problems of music genre recognition (MGR), and indeed other tasks of music information retrieval, such as autotagging. A recent review of works in MGR since 1995 shows that most (82%) measure the capacity of a system to recognize genre by its classification accuracy. After reviewing evaluation in MGR, we show that neither classification accuracy, nor recall and precision, nor confusion tables, necessarily reflect the capacity of a system to recognize genre in musical signals. Hence, such figures of merit cannot be used to reliably rank, promote or discount the genre recognition performance of MGR systems if genre recognition (rather than identification by irrelevant confounding factors) is the objective. This motivates the development of a richer experimental toolbox for evaluating any system designed to intelligently extract information from music signals.