Professor Insik Shin of the KAIST School of Computing and his research team have developed UbiTap, a novel sound-based touch localization technology that enables a new form of ubiquitous interaction by making it possible to use surrounding objects, such as furniture or mirrors, as touch input tools. For example, using only the built-in microphones of smartphones or tablets, users can turn nearby tables or walls into virtual keyboards and write lengthy e-mails much more conveniently. Family members can also create a virtual chessboard on their dining table and enjoy a board game after dinner. In addition, traditional smart devices (such as smart TVs and smart mirrors) can be used in a smarter way through the microphones already present in them. If children can play a game of catching germs in their mouths while brushing their teeth in front of a smart mirror, it may help them develop proper brushing habits. (See the figure below.)

Figure 1. Examples of using touch input technology: using only a smartphone, you can use surrounding objects as a touch screen anytime and anywhere.

The most important requirement for a sound-based touch input method is to identify the location of touch inputs very precisely (within about 1 cm of error). However, meeting this requirement is challenging, mainly because the technology can be used in diverse and dynamically changing environments. Users may use different objects (such as desks, walls, or mirrors) as touch input tools, and the surrounding environment (such as the location of nearby objects or the ambient noise level) can vary. These environmental changes can affect the characteristics of touch sounds. To address this challenge, the research team focused on analyzing the fundamental properties of touch sounds, especially how they are transmitted through solid surfaces.
On solid surfaces, sound experiences a dispersion phenomenon that makes different frequency components travel at different speeds. Based on this phenomenon, the team observed that 1) the time difference of arrival (TDoA) between frequency components increases in proportion to the sound transmission distance and 2) this linear relationship is not affected by variations in the surrounding environment. Building on these observations, the research team proposed a novel sound-based touch input technology that 1) records touch sounds transmitted through solid surfaces, 2) conducts a simple calibration process to identify the relationship between TDoA and sound transmission distance, and 3) finally achieves accurate touch input localization. In accuracy measurements of the proposed system, the average localization error was lower than about 0.4 cm on a 17-inch touch screen. In particular, the measurement error stayed below 1 cm on a variety of objects such as wooden desks, glass mirrors, and acrylic boards, even when the position of nearby objects and the noise level changed dynamically. Experiments with real-world users also showed positive responses on all measurement factors, including user experience and accuracy. UbiTap is expected to encourage the emergence of creative and useful applications that provide a rich user experience. In recognition of this contribution, this work was presented at ACM SenSys, a top-tier conference in the field of mobile computing and sensing, in November this year, and it received a best paper runner-up award.

Demonstration video (http://cps.kaist.ac.kr/research/ubitap/ubitap_demo.mp4)
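The calibration step described above, which identifies the linear relationship between TDoA and transmission distance, can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the team's implementation: the function names `fit_line` and `estimate_distance` and all calibration values are hypothetical, and turning distances into a 2D touch position is not shown.

```python
from statistics import mean

def fit_line(distances_cm, tdoas_ms):
    """Least-squares fit of tdoa = slope * distance + intercept."""
    mx, my = mean(distances_cm), mean(tdoas_ms)
    sxx = sum((x - mx) ** 2 for x in distances_cm)
    sxy = sum((x - mx) * (y - my) for x, y in zip(distances_cm, tdoas_ms))
    slope = sxy / sxx
    return slope, my - slope * mx

def estimate_distance(tdoa_ms, slope, intercept):
    """Invert the fitted line to turn a measured TDoA into a distance."""
    return (tdoa_ms - intercept) / slope

# Hypothetical calibration taps at known distances (made-up, exactly linear values).
calib_distances = [5.0, 10.0, 20.0, 30.0]   # cm
calib_tdoas = [0.10, 0.20, 0.40, 0.60]      # ms between two frequency components

slope, intercept = fit_line(calib_distances, calib_tdoas)
d = estimate_distance(0.30, slope, intercept)  # distance for a new tap
print(f"estimated distance: {d:.1f} cm")
```

Because the relationship is linear, a handful of calibration taps on a new surface is enough to fit both parameters in this sketch.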
From cooking recipes to home improvement videos to software tutorials, people are increasingly turning to online instructions as guides for learning new skills or accomplishing unfamiliar tasks in their everyday lives. However, the number of instructional materials available even for a single task easily reaches the thousands, and the diversity and scale of these instructions introduce new challenges for users of the current software interfaces for authoring, sharing, and consuming these naturally crowdsourced instructions. It is difficult to find contextually useful information because the interfaces are not designed for effectively navigating and following the instructions, and they do not support the comparison and analysis needed to make sense of the various instructions. For cooking professionals such as chefs, cooking journalists, and culinary students, it is important to understand not only the culinary characteristics of the ingredients but also the diverse cooking processes. For example, categorizing different approaches to cooking a dish and identifying usage patterns of particular ingredients and cooking methods are crucial tasks for them.

RecipeScape is an analytics dashboard for analyzing and mining hundreds of instructions for a single dish. The interface was designed to solve a critical problem for cooking professionals: the need to understand not only the composition of ingredients but, more importantly, the diverse cooking processes. RecipeScape is powered by a novel computational pipeline that collects and annotates hundreds of recipes for a single dish, computes structural and semantic similarities among them, and visualizes the results.
Cooking professionals and culinary students found that RecipeScape 1) empowers users with stronger analytical capabilities by enabling at-scale exploration of instructions, 2) supports user-centered queries like “what are recipes with more decorations?”, and 3) supports creativity by allowing comparative analysis like “where would my recipe stand against the rest?”. Moreover, the three visualization components allow users to reason about how the recipes are grouped together and to provide their own interpretations and explanations in human language, suggesting that users with appropriate tools can interpret clustering algorithms. We believe our computational pipeline and interactive visualization techniques can be extended to other media, such as tutorial videos, and are highly applicable to different types of sequential tasks, such as software workflows, manufacturing, and customer service manuals. This work was presented at CHI 2018 in Montreal as “RecipeScape: An Interactive Tool for Analyzing Cooking Instructions at Scale”. For more detail, please visit our project website, https://recipescape.kixlab.org/
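To give a rough feel for what a structural similarity between two recipes could look like, the sketch below compares recipes as sequences of cooking actions using a plain edit distance. This is an assumption made for illustration only: the article does not specify the similarity measure used in the RecipeScape pipeline, and the recipes and action names here are invented.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

# Hypothetical recipes reduced to their sequence of cooking actions.
recipe_a = ["chop", "fry", "boil", "garnish"]
recipe_b = ["chop", "boil", "garnish"]
recipe_c = ["mix", "bake"]

print(edit_distance(recipe_a, recipe_b))  # small distance: similar process
print(edit_distance(recipe_a, recipe_c))  # large distance: different process
```

Pairwise distances of this kind are the sort of input a clustering or layout step could use to group recipes that follow a similar process.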
Advancement of preventive and predictive medicine in Healthcare: Uncertainty-aware Attention mechanism (UA) for reliable and accurate prediction.

Artificial intelligence technology in healthcare uses algorithms and software to approximate human cognition and intuition in the analysis of complex medical data. Most recently developed AI-based software programs in healthcare are based on deep learning and, in terms of functionality, are called Computer-Aided Diagnosis (CAD): diagnostic decision-support software that assists clinicians in their medical practice and related work. Applications of CAD range from detecting a tumor in CT (CAT) scan images to diagnosing breast cancer or diabetes from preventive medical check-up records. This interdisciplinary technology, which combines elements of artificial intelligence with preventive and predictive medicine, has opened a new era for high-quality healthcare services and the improvement of human welfare. Professor Sungju Hwang and his research team have developed an effective computer-aided diagnosis system, called “Uncertainty-aware attention mechanism”, and this work was presented as a paper, “Uncertainty-aware attention mechanism for reliable and accurate prediction”, at NeurIPS 2018, one of the most prominent conferences in the field of machine learning and artificial intelligence. They introduced the notion of input-dependent uncertainty into the attention mechanism, such that it generates attention for each feature with varying degrees of noise based on the given input and learns larger variance on instances it is uncertain about.

Figure 1: UA diagnosis system overview

The key to this technology lies in an uncertainty-aware attention mechanism that uses variational inference. The attention mechanism is an effective tool for leading deep learning models to focus on relevant features, and it allows easy interpretation of the model via the generated attention allocations.
However, attention mechanisms are still limited as a means of implementing safe deep learning models for safety-critical tasks, since attention strengths are commonly generated from a model that is trained in a weakly supervised manner and thus could be incorrectly allocated. A reliable model should prevent itself from making critical mistakes and know its own limitations: when it is safe to make predictions and when it is not. This motivated the team to develop an uncertainty-aware attention mechanism that allows the model to output its uncertainty on each feature and leverage it when making final predictions. They modeled the attention weights as a Gaussian distribution with input-dependent noise. The model then generates attentions with small variance when it is confident about the contribution of the given features, and allocates noisy attentions with large variance to uncertain features. They formulated this novel uncertainty-aware attention (UA) model under the Bayesian framework and solved it with variational inference.

Figure 2: Feature contribution analysis of vital signs for patient mortality

This analysis of the learned attentions showed that their model generates attentions that comply with clinicians’ interpretation with high reliability, and provides richer interpretation via the learned variance. Through qualitative and quantitative experiments in a clinical environment, conducted with the help of clinicians, they showed that the UA diagnostic system can provide reliable clinical event alerts and accurate interpretation analysis for predicted clinical events.
His research team, clinicians, and co-working colleagues envision the potential of UA diagnostic system being actively applied to various clinical situations, starting from delivery of valuable clinical event alert (mortality, heart arrest, or stroke) as a predictive analytic tool in Intensive Care Unit (ICU), to real-time monitoring and analysis preventive system for effective assessment and early treatment as a clinical decision supporting tool.
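The idea of Gaussian attention with input-dependent noise, described above, can be illustrated with a small sampling sketch. It is a toy under stated assumptions rather than the UA model itself: the linear parameterization of the mean and noise level, the weights `w_mu` and `w_sigma`, and the input values are all hypothetical, and no training or variational inference is performed.

```python
import math
import random
import statistics

def softplus(z):
    """Smooth function that maps any real number to a positive value."""
    return math.log1p(math.exp(z))

def sample_attention(x, w_mu, w_sigma, rng):
    """Draw one attention value per feature from N(mu_i(x), sigma_i(x)^2)."""
    attention = []
    for x_i, wm_i, ws_i in zip(x, w_mu, w_sigma):
        mu = wm_i * x_i               # input-dependent mean
        sigma = softplus(ws_i * x_i)  # input-dependent, always positive noise level
        attention.append(mu + sigma * rng.gauss(0.0, 1.0))
    return attention

rng = random.Random(0)
x = [1.0, 1.0]
w_mu = [0.5, 0.5]
w_sigma = [-5.0, 2.0]  # hypothetical: feature 0 is "confident", feature 1 is "uncertain"

draws = [sample_attention(x, w_mu, w_sigma, rng) for _ in range(2000)]
confident_sd = statistics.pstdev([d[0] for d in draws])
uncertain_sd = statistics.pstdev([d[1] for d in draws])
print(f"spread of attention: confident={confident_sd:.3f}, uncertain={uncertain_sd:.3f}")
```

In this sketch both features have the same mean attention, but repeated sampling exposes a much wider spread for the uncertain feature, which is the kind of signal a downstream prediction could take into account.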
Prof. Min H. Kim’s team of the KAIST School of Computing has developed a new method that replicates physical objects for the augmented and virtual reality (AR/VR) just using a single smartphone, without any need for additional, and oftentimes expensive, supporting hardware. To faithfully reproduce a real-world object in the VR/AR environment, we need to replicate the 3D geometry and appearance of the object. Traditionally, this has been either done manually by 3D artists, which is a labor-intensive task, or by using specialized, expensive hardware. Those setups might include a 3D laser scanner or multiple cameras, or a lighting dome with more than a hundred light sources. In contrast, this new technique only needs a single camera with a built-in flash, to produce high-quality outputs. "Many traditional methods using a single camera can capture only the 3D geometry of objects, but not the complex reflectance of real-world objects, given by the SVBRDF," notes Kim. SVBRDF, which stands for spatially-varying bidirectional reflectance distribution functions, is key in obtaining an object's real-world shape and appearance. "Using only 3D geometry cannot reproduce the realistic appearance of the object in the AR/VR environment. Our technique can capture high-quality 3D geometry as well as its material appearance so that the objects can be realistically rendered in any virtual environment. It is also straightforward, cheaper and efficient, and reproduces realistic 3D objects by just taking photos from a single camera with a built-in flash.” The research team demonstrated their framework in a series of examples in their paper, "Practical SVBRDF Acquisition of 3D Objects with Unstructured Flash Photography." The novel algorithm, which does not require any input geometry of the target object, successfully captured the geometry and appearance of 3D objects with basic, flash photography and reproduced consistent results. 
Examples showcased in the work included a diverse set of objects spanning a wide range of geometries and materials, including metal, wood, plastic, ceramic, resin and paper, and comprising complex shapes such as a finely detailed mini-statue of Nefertiti. Prof. Kim and his collaborators, Diego Gutierrez, professor of computer science at Universidad de Zaragoza in Spain, and KAIST PhD students Giljoo Nam and Joo Ho Lee, published the new study in ACM Transactions on Graphics (TOG) and presented it at ACM SIGGRAPH Asia 2018 in Tokyo. In future work, they plan to further simplify the capture process or to extend the method to larger scenes.
New technique can guide systematic testing of machine learning components

Prof. Shin Yoo's team of the KAIST School of Computing has developed a new guideline that enables systematic testing of machine learning modules that use Deep Neural Networks (DNNs). DNN-based machine learning is increasingly being adopted in safety-critical systems, such as autonomous driving cars and medical diagnosis systems. This new technique, called SADL (Surprise Adequacy for Deep Learning), provides a much-needed guideline for testing the robustness of the machine learning modules used within these safety-critical systems. The team behind SADL includes Jinhan Kim, a Ph.D. candidate at KAIST, and Robert Feldt, Professor of Software Engineering at Chalmers University, Sweden. Deep Neural Networks are often lauded as the new paradigm of programming, but their internal structures are fundamentally different from those of traditional programs. Traditional programs are written by human developers, using structural elements such as conditional branching and loops. In contrast, DNNs do not have such internal structures: there is no branching based on predicates, for example. Instead, every decision made by a DNN is based on numerical computations, which happen linearly, and their outcomes. A problem arises when we try to test DNN-based systems using traditional testing techniques. Since we cannot try all possible test inputs, of which there are infinitely many, we tend to rely on the internal structure of programs as a surrogate adequacy guideline. For instance, if during testing we have executed 80% of all branches in a given program under test, this is regarded as better testing than another attempt that has executed only 60% of the branches. While executing a branch is not guaranteed to reveal any faults hidden under that branch, having executed 80% of them with some input is at least better than having executed only 60%.
The lack of internal structures in the decision process of DNNs means that there is nothing that can guide the testing, essentially forcing us to blindly sample from all test inputs out there. This is an even worse problem than that of traditional software because some of the inputs to DNNs are real-world sensory data, such as images and spoken words. The variations in these input domains are limitless, which makes input sampling for testing all the more difficult. Testing of DNNs is still a very new area in Software Engineering, with the first generation of techniques appearing within the last couple of years. Instead of counting the percentage of executed branches, these techniques counted the percentage of neurons in the DNN under test whose activation values are above a certain threshold. It is thought that the more neurons activate above the threshold, the more diverse the inputs are, and the better the testing is. "Think of testing DNNs as setting exam questions to (machine) learners," says Prof. Yoo. "To do good testing, you need good exam questions that will highlight where the learners are having difficulties. The current focus on neuron activation values is a bit like taking brain MRI scans while the learners are taking the exam in order to evaluate how good the exam questions are. We think it is too coarse-grained, especially given that we still do not fully understand why DNNs are so effective at learning. For example, these techniques do not support comparison between two test inputs in terms of their relative difficulty to the machine learner." SADL separates itself from the crowd by focusing instead on how "surprising" a given input is when compared to the training data. "SADL allows us to measure how different the exam questions are from the textbook, which is the training data for machine learners," says Prof. Yoo.
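The two ways of judging test inputs contrasted above can be sketched in a few lines. The first function counts neurons activated above a threshold, as the first-generation techniques do. The second scores an input by its distance from the training data and lets a tester keep only inputs within a chosen range of surprise. This is a simplified illustration under stated assumptions: the nearest-neighbour distance is a stand-in and not SADL's actual definition of surprise, and the activation values and function names are invented.

```python
import math

def neuron_coverage(activation_traces, threshold):
    """Fraction of neurons whose activation exceeds threshold for at least one input."""
    n_neurons = len(activation_traces[0])
    covered = [any(trace[i] > threshold for trace in activation_traces)
               for i in range(n_neurons)]
    return sum(covered) / n_neurons

def surprise(trace, training_traces):
    """Simplified surprise: Euclidean distance to the closest training activation trace."""
    return min(math.dist(trace, t) for t in training_traces)

def select_by_surprise(candidates, training_traces, low, high):
    """Keep candidate inputs whose surprise lies within [low, high]."""
    return [c for c in candidates if low <= surprise(c, training_traces) <= high]

# Hypothetical activation traces (one tuple of neuron activations per input).
coverage = neuron_coverage([(0.9, 0.1, 0.0), (0.2, 0.8, 0.0)], threshold=0.5)

training = [(0.0, 0.0), (1.0, 0.0)]
candidates = [(0.0, 0.1), (0.0, 1.0), (0.0, 5.0)]  # near, moderately far, very far
selected = select_by_surprise(candidates, training, low=0.5, high=2.0)
print(coverage, selected)
```

Unlike the coverage figure, the surprise score is defined per input, so two test inputs can be compared with each other directly.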
"A good exam question should be sufficiently different from what is already in the textbook, but not so different as to be completely irrelevant to the topic. SADL allows testers to set a range of relevant level of surprise, and systematically sample test inputs from this range." The team will present their new technique at the 41st ACM/IEEE International Conference on Software Engineering (ICSE), held from 25 to 31 May 2019 in Montreal, Canada. The annual conference presents the latest research and innovation from the field of software engineering.
Fighting falsehoods online is considered one of the grand challenges of the 21st century. Many news stories that are shared online lack verification of sources and veracity and thus pose significant threats to the society. When readers decide on which news stories to consume, news headlines play an important role in making the first impressions, and thereby affect the viral potential of news stories on social media platforms. Correct news headlines are even more important today, because in digital environments under information overload, people are less likely to read or click on the whole content but just consume headlines. As a result, it is common to observe link-based sharing, where people circulate news headlines without necessarily having read the full news story. Furthermore, an initial impression gained from the news headline is persistent such that its stance is known to remain even after reading the whole news content. Therefore, if a news headline does not correctly represent the news story—or is “incongruent”—it could mislead readers into advocating overrated or false information, which then becomes hard to revoke. This type of misinformation is widely known as clickbaits. The Data Science Lab (https://ds.kaist.ac.kr/, led by Prof. Meeyoung Cha) tackled this challenging problem. The team utilized deep learning approaches, which showed remarkable performance gains over traditional machine learning benchmarks. The team participated in a nation-wide AI challenge for detecting several types of online misinformation including news clickbaits. By building recurrent neural networks trained on a million-scale dataset gathered from domestic news portal services, the team discovered the potential of deep learning for the crucial problem and won the first place among college teams. The research team found that simple recurrent neural networks are unable to learn the complex textual relationship between the news headline and the full news content. 
The length of a news article can reach thousands of words, which makes it hard for simple recurrent models to represent so many words in a fixed-size vector. To overcome this limitation, the team proposed a hierarchical architecture that can efficiently learn textual representations from the word level up to the paragraph level of a news article. When trained on a million-scale dataset, the models achieved a stellar accuracy of 0.977. This result was published in a paper, “Detecting Incongruity Between News Headline and Body Text via a Deep Hierarchical Encoder”, at the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19). The team is currently developing a lightweight web interface that helps everyday news readers avoid clickbait. The team is collaborating with AI experts in the field (Prof. Kyomin Jung at Seoul National University and Prof. Kunwoo Park at Qatar Computing Research Institute) to tackle real-world problems and mitigate emerging threats of misinformation and fake news, towards building an environment for clean journalism and a green web.
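The structure of a hierarchical encoder, which summarizes words into paragraph vectors and paragraph vectors into one article vector, can be sketched as follows. This is a toy under loud assumptions: simple averaging stands in for the recurrent encoders, `embed_word` is a hypothetical placeholder for learned word embeddings, and the example text is invented. It shows only the two-level structure, not the team's model.

```python
def embed_word(word, dim=4):
    """Hypothetical stand-in for a learned embedding: deterministic character features."""
    vec = [0.0] * dim
    for i, ch in enumerate(word.lower()):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

def mean_vec(vectors):
    """Element-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def encode_paragraph(paragraph):
    """Word level: summarize the words of one paragraph into one vector."""
    return mean_vec([embed_word(w) for w in paragraph.split()])

def encode_body(body):
    """Paragraph level: summarize the paragraph vectors into one article vector."""
    paragraphs = [p for p in body.split("\n\n") if p.strip()]
    return mean_vec([encode_paragraph(p) for p in paragraphs])

short_body = "Prices rise."
long_body = "Prices rose sharply this week.\n\nAnalysts expect the trend to continue next month."
print(len(encode_body(short_body)), len(encode_body(long_body)))
```

Each encoder only ever sees a short sequence (the words of one paragraph, or the list of paragraph vectors), while the final vector keeps the same fixed size regardless of article length.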
A research team in the KAIST School of Computing, led by Professor Junehwa Song and Professor Sung-Ju Lee, in collaboration with IBM Research, has introduced Zaturi, a system that enables parents to create an audio book for their babies by utilizing micro spare time. Micro spare time refers to tiny fragments of time with low cognitive load that frequently occur in our daily lives, such as waiting for an elevator, walking to a different building, or waiting for public transportation. The team showed that putting together micro spare time helps a working parent build a tangible symbol for conveying thoughts to a beloved baby and develop a feeling of parental achievement without compromising regular working hours. Zaturi, developed as a smartphone application, continuously senses the user’s activity and detects micro spare time in real time. When a moment of micro spare time is detected, Zaturi fetches a new micro recording subtask that fits in the micro spare time, helping the user instantly and seamlessly continue the voice recording. Zaturi provides a situation-friendly interface to mitigate the social awkwardness of theatrical reading in public spaces. Zaturi is unique in terms of (i) building a gift for one’s child even while at work, without compromising existing working hours, by discovering and carefully arranging spare time that one would otherwise be unaware of and that would likely be wasted; (ii) not only making use of the parent’s micro spare time in a piecewise manner, but also pursuing incremental creation of a tangible outcome that the child can perceive and enjoy as a whole; and (iii) proposing a widely applicable service to the general population of working parents who commute daily, not only those who are separated from their children by work over a long distance or across different time zones. The team implemented a fully working Zaturi system and deployed it to real users.
Findings from the user study include the level of satisfaction of using Zaturi, the impact on work productivity, the effectiveness of intervention, and, most importantly, parents’ perceptions and self-awareness of micro spare time, which has implications for other creative ways of utilizing micro spare time to increase participation in parenting. In February of 2017, this work was published and presented at the 20th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW 2017), one of the most prominent conferences in HCI and Social Computing.
Professor Insik Shin and his research team from the Cyber Physical System (CPS) Lab introduced Mobile Plus, a novel mobile platform that allows multiple smart devices to easily share their various functionalities without modifying applications (apps). The recent popularization of smart devices has led users to own a large number of devices, and the types of functionalities provided by these devices have become increasingly diverse, including photography, shopping, and healthcare. In such an environment, it is possible to create a variety of user experiences (UXs) if several devices share their functionalities. For example, it is likely to be insecure to carry out a payment on a public device (e.g., a smart TV) when purchasing goods via a shopping app. However, we can shop safely on a public device if the payment is processed on the user’s personal device (e.g., a smartphone). In a similar way, we can also enjoy sensor-based games on a smart TV by borrowing sensors from a smartphone. Existing solutions for enabling these exciting opportunities fall into two categories: app-level approaches that employ app cooperation and system-level approaches that design cross-device platforms. However, the former places a burden on developers, who must build specific apps for functionality sharing, and cannot support unmodified legacy apps. The latter, on the other hand, enables unmodified apps to use the functionalities of other devices, but cannot support functionality sharing regardless of functionality type. Prof. Shin’s team presented Mobile Plus, a new mobile platform that provides a virtualized environment that allows apps on different devices to be executed as if they were on the same device, which enables unmodified apps to share a wide range of functionalities across devices. To this end, Mobile Plus extends the existing remote procedure call (RPC) mechanism to multi-device environments.
The RPC mechanism is an inter-process communication technique that allows an app to invoke a method of another app as if that method is a local method; most mobile platforms utilize this mechanism, so that apps can share not only system functionalities but also app functionalities within a single device. By extending the RPC mechanism, the research team has made it possible for unmodified apps to share functionalities from another device as if the same device were providing them, regardless of functionality types. The team developed Mobile Plus using two smart devices, a Nexus 6 smartphone and a Nexus 10 tablet, and demonstrated more than 20 cases of multi-device usage employing legacy apps commercialized in the Google app market. In particular, they showed that it is possible to share app functionalities such as Facebook login, Google play payment, and Adobe PDF viewer, as well as system functionalities like camera, sensor, and clipboard. Mobile Plus, because it can accelerate development of creative and useful applications to provide exciting user experiences, is expected to bring new possibilities for the smart device market. In recognition of this contribution, Prof. Shin’s team published and presented this work at MobiSys’17, a top-tier mobile computing conference, in June 2017.
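The core idea of RPC, calling a method that lives elsewhere as if it were local, can be sketched with a small proxy object. This is a conceptual illustration under stated assumptions and not how Mobile Plus is implemented: the classes `CameraService`, `RemoteDevice`, and `RemoteProxy` are hypothetical, and a direct function call stands in for the transport between devices.

```python
import json

class CameraService:
    """Hypothetical functionality living on the 'remote' device."""
    def take_photo(self, resolution):
        return f"photo@{resolution}"

class RemoteDevice:
    """Receives serialized requests and dispatches them to local services."""
    def __init__(self, services):
        self._services = services

    def handle(self, message):
        request = json.loads(message)
        service = self._services[request["service"]]
        result = getattr(service, request["method"])(*request["args"])
        return json.dumps({"result": result})

class RemoteProxy:
    """Looks like a local object, but forwards every method call to another device."""
    def __init__(self, device, service_name):
        self._device = device
        self._service = service_name

    def __getattr__(self, method):
        # Only reached for names not found on the proxy itself, i.e. remote methods.
        def call(*args):
            message = json.dumps(
                {"service": self._service, "method": method, "args": args})
            reply = self._device.handle(message)  # stand-in for a network transport
            return json.loads(reply)["result"]
        return call

tablet = RemoteDevice({"camera": CameraService()})
camera = RemoteProxy(tablet, "camera")  # used on the "smartphone" side
photo = camera.take_photo("1080p")      # reads like a local call
print(photo)
```

The caller's code does not change whether the camera is local or on another device, which is the property that lets unmodified apps benefit when the forwarding is done at the platform level.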
Virtual memory has been one of the essential components of modern computer systems. Even though physical memory technologies have improved tremendously over the past decades, without proper support from virtual memory, abundant memory resources are not accessible to user applications. However, with ever-increasing memory capacity and growing demands from big data applications, address translation, a critical step in virtual memory support, has become a severe performance bottleneck. Although there have been many improvements to address translation for virtual memory support, a common requirement of prior techniques is that the operating system must be able to consistently provide contiguous memory chunks suitable for each technique. However, recent studies show that such contiguous chunk allocation is not always possible, and that it can even degrade the performance of multi-socket NUMA systems. In addition to the common NUMA architectures, emerging memory architectures, such as 3D stacked DRAMs, network-connected hybrid memory cubes (HMC), and non-volatile memory (NVM), can further increase the non-uniformity in memory. Such increasing memory heterogeneity requires fine-grained memory mapping to place frequently accessed pages in fast near memory, complicating the allocation of large contiguous memory chunks. To address the virtual memory support problem for future heterogeneous memory, Professor Jaehyuk Huh of the School of Computing and his research team developed a new translation architecture, called hybrid coalescing. With cooperation between the operating system and the hardware architecture, the proposed translation technique can adapt to various memory allocation scenarios, which may change dynamically. Using this flexible translation capability, hybrid coalescing can exploit memory allocation contiguity as much as possible, even if the contiguity states are diverse.
The study presents a new and efficient way of storing complex mapping information between the flat virtual space and heterogeneous physical memory space. This work was presented at the 44th International Symposium on Computer Architecture (ISCA) in June of 2017, one of the top conferences in computer architecture. The research team, consisting of Chang Hyun Park, Taekyung Heo, Jungi Jeong, and Professor Jaehyuk Huh, expanded virtual memory support for future hybrid systems with DRAM and nonvolatile memory, and for security enhancements in hardware-based trusted execution environments.
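Why contiguity matters for address translation can be shown with a small software sketch: when consecutive virtual pages map to consecutive physical pages, many page-table entries collapse into a single range entry. This is only an illustration of coalescing in general, under stated assumptions. It is not the hybrid coalescing hardware design, and the page numbers are invented.

```python
def coalesce(page_table):
    """Merge runs where virtual and physical page numbers both advance by one.

    Returns a list of (start_vpn, start_ppn, length) range entries.
    """
    ranges = []
    for vpn in sorted(page_table):
        ppn = page_table[vpn]
        if ranges:
            start_vpn, start_ppn, length = ranges[-1]
            if vpn == start_vpn + length and ppn == start_ppn + length:
                ranges[-1] = (start_vpn, start_ppn, length + 1)  # extend the run
                continue
        ranges.append((vpn, ppn, 1))  # start a new run
    return ranges

def translate(vpn, ranges):
    """Translate a virtual page number using the coalesced range entries."""
    for start_vpn, start_ppn, length in ranges:
        if start_vpn <= vpn < start_vpn + length:
            return start_ppn + (vpn - start_vpn)
    raise KeyError(vpn)

# Hypothetical mapping: two contiguous runs and one isolated page.
page_table = {0: 100, 1: 101, 2: 102, 3: 200, 4: 201, 7: 50}
ranges = coalesce(page_table)
print(ranges)
print(translate(4, ranges))
```

With long runs a few entries cover many pages, while a fragmented mapping degenerates to one entry per page, which is why a scheme that adapts to differing contiguity is attractive.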
Compact Hyperspectral Imaging at Low Cost

Novel method enables compact, single-shot hyperspectral imaging using just a prism

With hyperspectral imaging, photographers can obtain finely detailed images, capturing the spectrum for each pixel in an image of a scene. This technology has wide reach and is being applied in fields such as military combat, astronomy, agriculture, biomedical imaging and geoscience. Scientists, for instance, rely on hyperspectral imaging to observe and analyze materials for mining and geology, or for various applications in the medical field. However, hyperspectral imaging systems are expensive (ranging from $25,000 to $100,000) and require complex specialized hardware to operate. A team of computer scientists from KAIST, South Korea, and Universidad de Zaragoza, Spain, has devised a low-cost way to achieve accurate hyperspectral imaging, eliminating the need for expensive equipment and complex coding. This novel, compact single-shot hyperspectral imaging method captures images using a conventional DSLR camera equipped with just an ordinary refractive prism placed in front of the lens. The new, user-friendly method was tested on a variety of natural scenes, and the results, according to the researchers, compared well with current state-of-the-art hyperspectral imaging systems, achieving quality images without compromising accuracy. The team will present their new method at SIGGRAPH Asia 2017 in Bangkok, 27 November to 30 November. This annual conference and exhibition showcases the world’s leading professionals, academics and creative minds at the forefront of computer graphics and interactive techniques. “These hyperspectral imaging systems are generally built for specific purposes such as aerial remote sensing, or military applications, and as such they are not affordable nor practical for ordinary users,” said Min H. Kim, associate professor of computer science at KAIST and a lead author of the study.
“Our system requires no advanced skills, and we are able to obtain hyperspectral images at virtually full resolution while making hyperspectral imaging practical.” Kim’s collaborators include Diego Gutierrez, associate professor at Universidad de Zaragoza; Seung-Hwan Baek, computer science PhD student at KAIST; and Incheol Kim, researcher at KAIST in Min H. Kim’s lab. A hyperspectral image can be described as a three-dimensional cube. The imaging technique involves capturing the spectrum for each pixel in an image; as a result, the digital images produce detailed characterizations of the scene or object. Since the researchers’ new setup operates without the typical coded aperture mask and professional setup with large optical components, available spectral cues are limited. To this end, the researchers developed an image formulation model that predicts the perspective projection of dispersion (splitting light into a spectrum), yielding the dispersion direction and magnitude of each wavelength at every pixel. Their technique also comprises a novel calibration method to estimate the spatially varying dispersion of the prism. It enables users to capture spectral information without requiring a large system setup with various optical components. Lastly, their reconstruction algorithm estimates the full spectral information of a scene from sparse information, addressing edge restoration of the scene being captured, gradient estimation and the spectral resolution of the image. In the study, the researchers compare the predictions of their dispersion model with those of professional optics simulation software. They place a prism in front of a 50 mm lens of a digital camera, and capture a point at a distance of 700 mm. Dispersion at every pixel is accurately predicted by their method, producing comparable results to professional physical simulation of light transport. 
In future work, the team plans to address the system’s current sensitivity to noise as well as performance limitations due to lighting and surfaces without edges of a scene or object.
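The effect of a prism on a captured image can be illustrated with a deliberately simplified forward model: each spectral channel of the hyperspectral cube is shifted by a wavelength-dependent number of pixels, and the sensor records the sum. The sketch assumes a 1D scene and a constant shift per channel, whereas the article describes dispersion that varies spatially and is predicted at every pixel. The shift values and scene are invented, and the reconstruction step is not shown.

```python
def disperse(cube, shifts):
    """Simplified forward model of a prism in front of the lens.

    cube[c][x] is the intensity of spectral channel c at pixel x;
    shifts[c] is the (assumed constant) pixel shift applied to channel c.
    """
    width = len(cube[0])
    sensor = [0.0] * width
    for channel, shift in zip(cube, shifts):
        for x, value in enumerate(channel):
            target = x + shift
            if 0 <= target < width:  # light shifted off the sensor is lost
                sensor[target] += value
    return sensor

# Hypothetical scene: a single bright point containing three wavelengths.
cube = [
    [0, 0, 1, 0, 0, 0],  # short wavelength
    [0, 0, 1, 0, 0, 0],  # middle wavelength
    [0, 0, 1, 0, 0, 0],  # long wavelength
]
shifts = [0, 1, 2]  # made-up dispersion, in pixels, per channel

sensor = disperse(cube, shifts)
print(sensor)  # the point is smeared into a short streak
```

Recovering the cube from such a smeared image is the inverse problem, and the smear carries spectral cues mainly where intensity changes, which is consistent with the article's remark that surfaces without edges are a current limitation.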