Thesis Christ: From Sight to Meaning
Casey Harris’s thesis programs computers to go beyond just reading code, featuring the game Celeste
If you’ve ever watched someone play a video game and been bummed that your computer couldn’t “see” it as well, then this thesis is for you. For his thesis, computer science major Casey Harris decided to program a computer to “watch” the game Celeste, a 2-dimensional platformer about a woman trying to climb a mountain. For this game, the computer would have “no inherent knowledge of the code going on behind the scenes, the actual processes of the game. Just video output,” as Harris says. While generally fairly easy for a human, this process is incredibly complicated for a computer. “A computer can only see a matrix telling it how the dots are laid out, and maybe what color each dot is. From that, it needs to determine meaningful information,” Harris states. “There’s no built-in sense of connectedness or scale, it’s just numbers. The challenge is what those numbers ‘mean’ visually, and why that’s important or not important to the work we’re trying to do.” This means that, instead of having a complete image to work with, Harris has a vast number of pixels, each carrying information, and he is the one who needs to write the program that assigns meaning to each and every one for the computer. Correctly. Otherwise, the computer won’t know even the basics of what it’s “seeing.” Given the differences in color, movement, attributes, etc., within the game, this ends up being an extremely difficult enterprise.
The first step in having a computer “watch” Celeste was to acquire video of the game. “The main body I've been using is the trailer for the game. It's my demonstration video because it has some nice features. The game's resolution, by default, is very small and the trailer upscales to a value that's easier to work with.” After that, Harris had to do a lot of trial and error with computer vision techniques to see which would work for the game. One example of a computer vision technique is object detection by reference, which compares what is on the screen in each frame of the video with a preset reference image of an object. In Celeste, as in many other video games, this works well for objects like the avatar or power-ups because they have consistent appearances that don’t really change throughout the game. On the other hand, it works terribly for things like the platforms or the backgrounds because there’s too much variation between levels. Harris says, “They're so diverse that the number of reference images you would need to get a match on every version of it would be non-feasible because you'd be checking dozens or maybe even hundreds of reference images for every single calculation.” This is where he needed trial and error to tackle the more visually complicated aspects of the game. Once that is done, the computer should recognize the game and be able to “watch” it.
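To make object detection by reference concrete, here is a minimal sketch in Python with NumPy. It is not Harris's code: the function name find_sprite, the toy 8-by-10 "frame," and the 2-by-2 "sprite" are all made up for illustration, and the scoring rule (sum of squared pixel differences) is just one common choice. The idea is simply to slide the reference image over every position in the frame and keep the spot where the pixels differ least.

```python
import numpy as np


def find_sprite(frame, sprite):
    """Slide `sprite` over `frame` and return (row, col, score) of the best fit.

    The score is the sum of squared pixel differences, so 0.0 means a
    pixel-perfect match. Both inputs are 2-D grayscale arrays.
    (Hypothetical helper for illustration only.)
    """
    frame_h, frame_w = frame.shape
    sprite_h, sprite_w = sprite.shape
    best = (0, 0, float("inf"))
    for r in range(frame_h - sprite_h + 1):
        for c in range(frame_w - sprite_w + 1):
            window = frame[r:r + sprite_h, c:c + sprite_w].astype(float)
            score = float(np.sum((window - sprite) ** 2))
            if score < best[2]:
                best = (r, c, score)
    return best


# A toy "frame": an empty background with one copy of the sprite pasted in.
frame = np.zeros((8, 10), dtype=np.uint8)
sprite = np.array([[9, 1],
                   [1, 9]], dtype=np.uint8)
frame[3:5, 6:8] = sprite

row, col, score = find_sprite(frame, sprite)
print(f"best match at row {row}, column {col} (difference score {score})")
```

Every extra reference image means another full pass over the frame, which is why the approach suits a handful of consistent sprites far better than dozens or hundreds of platform variations.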
Despite having only two major steps, the thesis is deceptively challenging. Many of the principles Harris has used in his thesis were self-taught, as he has never taken a course on computer vision at Reed. Not only that, but Harris kept running into issues while working on his thesis. “Aspect ratio and getting the images for object detection was the first hurdle that I can really remember stopping me hard,” Harris recalls. He initially tried to use screenshots of objects from the trailer for his object detection, but there were too many issues with those images, such as accidentally capturing part of the background along with the object. Eventually he had to go into the game’s files to find the sprites (small graphics designed to be part of a larger scene) to use as references, which took him some time. And, of course, this is only one of the many techniques he had to use in order for the computer to visualize the screen.
It also didn’t help that problems created by the pandemic affected his ability to focus on his work. “I really enjoy physical workspaces, and I was looking forward to getting a designated physical workspace for my thesis,” says Harris. “And then I didn't get one. So, I've been trying to work around that for the entire year and it's been marginally successful, but it's definitely harder than it would have otherwise been.” Even with these setbacks, Harris’s work is continuing at a good pace and could lead to some extremely interesting research on the nature of computer visualization and actions.
The natural extension of Harris’s thesis is to turn the computer into a gamer. Once the computer can “see” what the game is doing, it should be a fairly straightforward process to actually have it play the game, according to Harris. “Once we understand where all the platforms are and where all the spikes are, we can take a guess at where we started and where the goal of the level is. The next step is handing the machine an interface to allow it to input commands and move the character and letting it loose and seeing if it figures [the game] out.” There are two ways this could be done. One is manually writing the algorithms that tell it how to play. The other is using machine learning to teach it how to play the game through trial and error. Either way, it could provide exciting results by showing how the computer interprets the parameters given to it. For example, in Celeste, the programmers took a shortcut where only the tips of the spikes trigger a fail state in the game. This means that if you get the angle exactly right, you can use the spikes like platforms. “It would be interesting to see if a computer could do this, to figure out that there are little quirks to the game,” Harris wonders. However, these questions aren’t all just fun and games. “This project is predicated off the analogy of automated driving. This kind of decision making by an autonomous vehicle where it has to both plan a route, but also react to other drivers on the road, weather conditions, things like that,” Harris states. So, while this specific project may be about video games, its principles are based in critically important research.
Algorithms like these are being used more and more in potentially literal life-or-death situations, such as self-driving cars, security cameras scanning for violent behavior, or even the medical industry. The research that can be built off of Harris’s work allows computer scientists to test these algorithms for edge cases or undesirable behavior and weed out the issues they find. This is a crucial process before the features using these algorithms can be released because, as Harris states, “You do not want a one in a million edge case where your car hits somebody on the road, because if even 50,000 cars are driving for a year, a one in a million chance is not that unlikely.” The car needs to consistently read lines on a road, even if they are worn or wobbly, understand and react to obstructions in the road, recognize different weather conditions and change its driving accordingly, and perform many other necessary tasks perfectly, all before it can go out on the road. That’s not to say that we can’t trust these algorithms. Despite the difficulties, they present a lot of opportunities humans wouldn’t have access to otherwise. Computer programs can do things with more consistency and precision than a human can, and can reach places humans never could, both metaphorically and literally. If done properly, computer programs such as these are a great resource to have. These issues just illustrate how important the research computer scientists do is if all of us are to have programs that can make our lives easier.
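Harris's point about one-in-a-million odds can be checked with a few lines of Python. The sketch below is only a back-of-the-envelope illustration: the 50,000 cars and the one-in-a-million chance come from his quote, but treating that chance as "per trip," assuming 1,000 trips per car per year, and assuming every trip is independent are all made-up simplifications, as is the helper name chance_of_at_least_one.

```python
def chance_of_at_least_one(p_per_event, events):
    """Probability that an event with chance `p_per_event` happens at least
    once in `events` independent tries (hypothetical helper)."""
    return 1.0 - (1.0 - p_per_event) ** events


P_EDGE_CASE = 1e-6              # "one in a million," from the quote
CARS = 50_000                   # from the quote
TRIPS_PER_CAR_PER_YEAR = 1_000  # assumption, not from the article

# Case 1: every car gets exactly one chance to hit the edge case.
prob_single = chance_of_at_least_one(P_EDGE_CASE, CARS)

# Case 2: every car gets one chance per trip, all year long.
events = CARS * TRIPS_PER_CAR_PER_YEAR
expected_failures = P_EDGE_CASE * events
prob_year = chance_of_at_least_one(P_EDGE_CASE, events)

print(f"one chance per car: {prob_single:.1%} chance of at least one failure")
print(f"one chance per trip: about {expected_failures:.0f} failures expected")
print(f"one chance per trip: {prob_year:.1%} chance of at least one failure")
```

Under these assumptions, a failure somewhere in the fleet goes from unlikely to all but certain once each car faces the edge case on every trip rather than once, which is the intuition behind testing for rare cases before release.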