Last updated on Jan 11, 2021.
My research involves two related areas: Modeling & Simulation (M&S) and Data Science. If we describe this relationship as a Venn diagram, as shown below, my primary research efforts focus on areas 1 and 2. On the M&S-focused side (area 1), I tackle challenges related to core M&S topics, including verification and validation, conceptual modeling, and M&S tools. My second main focus is using Data Science for M&S (area 2). In particular, I design and use data-driven simulations, conduct simulation output analytics, and use emerging machine learning techniques in different steps of the M&S process. Although more limited, my research focused solely on data science (area 3) involves creating and using data science techniques (e.g., machine learning) to solve problems in different domains. Cybersecurity and urban science are the main application domains for my research.
Here is a list of research projects in which I am involved as a participant, mentor, or lead; they are highlighted according to the color scheme above. Click on a title to see the details.
Data-Driven Modeling of Agents
Data-driven mobility models for COVID-19 simulation
Disease spread is heavily influenced by human mobility. In this work, we captured human mobility in a data-driven manner based on Latent Dirichlet Allocation (LDA) fed by SafeGraph mobility data and simulated the spread of COVID-19. In our novel approach, LDA treats POIs as "words" and agent home census block groups (CBGs) as "documents" to extract "topics" of POIs that frequently appear together in CBG visits. These topics allow us to simulate agent mobility based on the LDA topic distribution of their home CBG. We compared the LDA-based mobility model with competing approaches, including a naive mobility model that assumes visits to POIs are random. This is the first study in a series of data-driven COVID-19 models we have been developing since the summer of 2020.
We have recently witnessed the proliferation of large-scale behavioral data that can be used to empirically develop agent-based models (ABMs). Despite this opportunity, the literature has neglected to offer a structured agent-based modeling approach to produce agents or their parts directly from data. In this paper, we present initial steps towards an agent-based modeling approach that focuses on individual-level data to generate agent behavioral rules and initialize agent attribute values. We present a structured way to integrate Big Data and machine learning techniques at the individual agent level. We also describe a conceptual use case study of an urban mobility simulation driven by millions of geo-tagged Twitter social media messages. We believe our approach will advance the state of the art in developing empirical ABMs and conducting their validation. Further work is needed to assess data suitability, to compare with other approaches, to standardize data collection, and to serve all these features in near-real time.
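As a rough illustration of the last step, the sketch below draws POI visits for an agent from the topic mixture of its home CBG. It assumes the topic model has already been fitted; the dictionaries `topic_poi` and `cbg_topic`, the POI and CBG names, and all probabilities are hypothetical placeholders, not values from our SafeGraph-based model.

```python
import random

# Hypothetical output of a fitted topic model (illustrative numbers only).
# topic_poi[t] maps POI ids to P(poi | topic t).
topic_poi = {
    0: {"grocery": 0.6, "pharmacy": 0.3, "gym": 0.1},
    1: {"restaurant": 0.5, "bar": 0.4, "gym": 0.1},
}
# cbg_topic[c] maps topic ids to P(topic | home CBG c).
cbg_topic = {
    "cbg_A": {0: 0.8, 1: 0.2},
    "cbg_B": {0: 0.1, 1: 0.9},
}


def sample_visit(home_cbg, rng):
    """Draw one POI visit: pick a topic from the CBG mixture, then a POI from that topic."""
    topics = list(cbg_topic[home_cbg])
    topic = rng.choices(topics, weights=[cbg_topic[home_cbg][t] for t in topics])[0]
    pois = list(topic_poi[topic])
    return rng.choices(pois, weights=[topic_poi[topic][p] for p in pois])[0]


def simulate_visits(home_cbg, n_visits, seed=0):
    """Simulate a sequence of POI visits for an agent living in home_cbg."""
    rng = random.Random(seed)
    return [sample_visit(home_cbg, rng) for _ in range(n_visits)]


if __name__ == "__main__":
    print(simulate_visits("cbg_A", 5))
```

A naive baseline of the kind mentioned above would instead pick a POI uniformly at random, ignoring the home CBG.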
SpringSim 2018 Paper
Application: The Spread of Wi-Fi Router Malware
This study revisits a Wi-Fi malware spread model by Hu et al. [2009, PNAS, 106(5)] with current Wi-Fi router data from WiGLE.net and a refined data selection method. We examine the temporality and scale of the malware spread after applying these two updates. Despite a WPA adoption rate of ≈88%, we see a rapid malware spread occurring within a week and infecting ≈34% of all insecure routers (≈5.4% of all) after two weeks. This result is significantly higher than the original study's projection. The difference stems from the increased use of Wi-Fi routers, which produces a more tightly connected graph. We argue that this projected risk can increase further when newly introduced vulnerabilities and connected devices are considered. Ultimately, a thorough consideration is needed to assess cybersecurity risks in the Wi-Fi ecosystem and to evaluate interventions to stop epidemics.
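To make the role of graph connectivity concrete, here is a deliberately simplified sketch: routers within radio range of each other are linked, and malware hops between linked insecure routers in discrete time steps. This is a plain susceptible-infected process written for illustration, not the Hu et al. model or our implementation; the function names, the `radius` threshold, and the per-step infection probability `beta` are all assumptions.

```python
import math
import random


def build_proximity_graph(positions, radius):
    """Link every pair of routers whose distance is at most `radius`."""
    n = len(positions)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def simulate_spread(neighbors, insecure, seed_router, beta, steps, rng):
    """Discrete-time SI process in which only insecure routers can be infected.

    Returns the final infected set and the infected count after each step.
    """
    infected = {seed_router}
    history = [len(infected)]
    for _ in range(steps):
        newly = set()
        for i in infected:
            for j in neighbors[i]:
                if j in insecure and j not in infected and rng.random() < beta:
                    newly.add(j)
        infected |= newly
        history.append(len(infected))
    return infected, history


if __name__ == "__main__":
    # Four routers in a line, one unit apart; all insecure; certain transmission.
    chain = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    graph = build_proximity_graph(chain, radius=1.0)
    _, history = simulate_spread(graph, {0, 1, 2, 3}, 0, 1.0, 3, random.Random(0))
    print(history)  # [1, 2, 3, 4]
```

In this toy setting, a single secured router in the middle of the chain is enough to stop the spread, whereas a denser graph offers alternative paths around secured routers.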
Communications and Networking Simulation Symposium
How can social media data be used in agent-based simulations?
This study briefly reviews current research efforts on the use of social media data to provide empirical grounding for agent-based simulations. Three examples of how data from social media can be used in agent-based modeling are presented: 1) using large data set processing and sentiment analysis to identify preferences of a population (initialization of an agent population), 2) using agents with machine learning capabilities to learn mobility patterns from individuals in a population (initialization of individual agents in a population), and 3) identifying preferences and communication patterns based on graph analysis (agent relation). Current research indicates that these techniques show promise for creating smart agents to complement those based on complex rule-based behavior, especially using a simulation's what-if capabilities.
SpringSim 2014 Paper
Social Media Analytics
Ever wondered how tourists feel during their attraction visits?
This study proposes a sentiment-based approach to investigate the temporal and spatiotemporal effects on tourists’ emotions when visiting a city’s tourist destinations. Our approach consists of four steps: data collection and preprocessing from social media; visitor origin identification; visit sentiment identification; and temporal and spatiotemporal analysis. The temporal and spatiotemporal dimensions include day of the year, season of the year, day of the week, location sentiment progression, enjoyment measure, and multi-location sentiment progression. We apply this approach to the city of Chicago using over eight million tweets. Results show that seasonal weather, as well as special days and activities like concerts, impact tourists’ emotions. In addition, our analysis suggests that tourists experience greater levels of enjoyment in places such as observatories rather than zoos. Finally, we find that local and international visitors tend to convey negative sentiment when visiting more than one attraction in a day, whereas the opposite holds for out-of-state visitors. Below you will see some interesting results we gathered.
PLOS ONE Article
Predicting People’s Home Location From Sparse Footprints
This study develops a machine learning classifier that determines Twitter users' home location at 100-meter resolution. Our results suggest up to 0.87 overall accuracy in predicting home location for the City of Chicago. We explore the influence of the time span of data collection and the location-sharing habits of a user. Classifier accuracy varies with the data collection time span, but spans longer than one month do not significantly increase prediction accuracy. An individual's home location can be ascertained with as few as 0.6 to 1.4 tweets/day or 75 to 225 tweets with an accuracy of over 0.8. Our results shed light on how home location information can be predicted with high accuracy and how long data needs to be collected. On the flip side, our results imply potential privacy issues for publicly available social media data.
SBP-BRIMS 2018 Paper
Human Mobility Analysis
The following image shows the temporal distribution of tweeting (inter-tweet time) for different Twitter user groups, defined by number of tweets.
As can be seen on the log-log scale, the data for all groups follow a log-normal-like distribution for periods of up to 24 hours.
Beyond that, the groups have differently shaped tails, resembling power-law distributions with different exponents.
While the log-normal-looking part of the graph has a very similar shape across groups, the tails show that users who post more frequently have shorter inter-tweet times.
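A distribution like this can be derived from raw timestamps roughly as follows. The sketch assumes each user's tweets are available as Unix timestamps in seconds; the function names and the choice of four bins per decade are illustrative assumptions, not the exact procedure behind the figure.

```python
import math
from collections import Counter


def inter_tweet_hours(timestamps):
    """Gaps between consecutive tweets of one user, in hours (input in seconds)."""
    ordered = sorted(timestamps)
    # Zero-length gaps are dropped because they cannot be placed on a log scale.
    return [(b - a) / 3600.0 for a, b in zip(ordered, ordered[1:]) if b > a]


def log_binned_counts(gaps, bins_per_decade=4):
    """Histogram over logarithmically spaced bins, suitable for a log-log plot.

    Keys are the lower edges of the bins (in hours), values are counts.
    """
    counts = Counter()
    for gap in gaps:
        counts[math.floor(math.log10(gap) * bins_per_decade)] += 1
    return {10 ** (k / bins_per_decade): c for k, c in sorted(counts.items())}


if __name__ == "__main__":
    gaps = inter_tweet_hours([0, 3600, 7200, 43200])
    print(log_binned_counts(gaps))
```

Computing the histogram separately for each user group (for example, grouped by total tweet count) gives one curve per group.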
Individuals tend to visit places that they have previously visited, such as home or work locations. Moreover, these visits are periodic (see Gonzalez, Hidalgo, and Barabasi).
The following image visualizes the periodic visiting behavior of Twitter users from Washington, DC.
The blue dotted line shows the probability of visiting the same location again after a given number of hours, also known as the first-passage time of a place.
Even with this voluntarily shared Twitter data, the periodic visiting behavior is clearly present.
Periodicity appears at 24-hour intervals.
The red line shows what the probability distribution would be if individuals visited places randomly.
In other words, this graph shows that we are not random at all, at least when it comes to mobility.
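A return-time distribution of this kind can be estimated along the following lines. This is a simplified sketch: it assumes each observation is an `(hour, location)` pair with integer hours, and it counts the time between consecutive observations at the same place, which is only an approximation of a first-passage time. The function names are hypothetical.

```python
from collections import Counter


def return_time_counts(visits):
    """Count hours elapsed until a user is next observed at the same location.

    visits: list of (hour, location) pairs for one user.
    """
    last_seen = {}
    counts = Counter()
    for hour, location in sorted(visits):
        if location in last_seen and hour > last_seen[location]:
            counts[hour - last_seen[location]] += 1
        last_seen[location] = hour
    return counts


def return_probability(counts):
    """Normalize return-time counts into an empirical probability distribution."""
    total = sum(counts.values())
    return {dt: c / total for dt, c in sorted(counts.items())}


if __name__ == "__main__":
    visits = [(0, "home"), (9, "work"), (24, "home"), (33, "work"), (48, "home")]
    print(return_probability(return_time_counts(visits)))
```

Summing the counts over all users before normalizing yields a population-level curve; a random-visit baseline can be obtained by shuffling the location labels and repeating the computation.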
Zipf's law, in general terms, states that the frequency of a quantity is inversely proportional to its rank.
Applied to Twitter data, the following graph shows that Zipf's law holds in geo-located Twitter data for Washington, DC regardless of the number of unique locations a person visits.
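In symbols, if a person's locations are ranked by visit count, Zipf's law says the visit frequency of the rank-r location behaves like f(r) ∝ r^(-s) with s close to 1. The sketch below computes the rank-frequency pairs and a crude estimate of s from a least-squares fit in log-log space. The function names are hypothetical, and a least-squares slope is only a rough estimator chosen for brevity, not the method used for the graph.

```python
import math
from collections import Counter


def rank_frequency(locations):
    """Return [(rank, frequency)], with rank 1 for the most visited location."""
    counts = Counter(locations)
    total = sum(counts.values())
    ranked = counts.most_common()
    return [(rank, count / total) for rank, (_, count) in enumerate(ranked, start=1)]


def zipf_exponent(rank_freq):
    """Estimate s in f(r) ~ r**(-s) by a least-squares line in log-log space."""
    if len(rank_freq) < 2:
        raise ValueError("need at least two distinct locations")
    xs = [math.log(rank) for rank, _ in rank_freq]
    ys = [math.log(freq) for _, freq in rank_freq]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope


if __name__ == "__main__":
    # Visit counts 12, 6, 4, 3 are exactly proportional to 1/rank.
    visits = ["home"] * 12 + ["work"] * 6 + ["gym"] * 4 + ["cafe"] * 3
    print(zipf_exponent(rank_frequency(visits)))
```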
This model relies on Twitter data to understand people's attraction-visit mobility.
Attraction visits are extracted from a person's tweet location and the venues nearby.
Venues are gathered from Google's Places API by scanning map locations that cover the entire Washington, DC area.
A person's proximity to an attraction is the main factor in determining whether that attraction was visited.
Below, you can see a network of attractions built from individuals' same-day visits.
Link weight indicates the frequency of hops between places, while node intensity indicates the number of visits.
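Such a network can be assembled from extracted visits roughly as shown below. The sketch assumes visits are already available as `(user, day, time, attraction)` records and treats links as undirected; both the record layout and the undirected choice are assumptions made for illustration, and the attraction names are placeholders.

```python
from collections import Counter, defaultdict


def build_attraction_network(visits):
    """Build node and link weights from same-day attraction visits.

    visits: iterable of (user, day, time, attraction) records.
    Returns (node_visits, edge_weights), where node_visits counts visits per
    attraction and edge_weights counts hops between consecutive attractions
    visited by the same person on the same day.
    """
    node_visits = Counter()
    by_user_day = defaultdict(list)
    for user, day, time, attraction in visits:
        node_visits[attraction] += 1
        by_user_day[(user, day)].append((time, attraction))

    edge_weights = Counter()
    for stops in by_user_day.values():
        stops.sort()  # order each person's day chronologically
        for (_, a), (_, b) in zip(stops, stops[1:]):
            if a != b:
                edge_weights[tuple(sorted((a, b)))] += 1
    return node_visits, edge_weights


if __name__ == "__main__":
    records = [
        ("u1", "d1", 10, "Zoo"),
        ("u1", "d1", 14, "Capitol"),
        ("u2", "d1", 9, "Capitol"),
        ("u2", "d1", 12, "Zoo"),
        ("u2", "d2", 9, "Zoo"),
    ]
    nodes, edges = build_attraction_network(records)
    print(nodes, edges)
```

Keeping the pairs in visit order instead of sorting them would give a directed network, which may be preferable when the direction of travel between attractions matters.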
Simulation Data Analytics
A Patterns of Life Simulation to Generate Large Mobility Datasets
Urban life is a complex phenomenon affected by human preferences, human behavior, and urban geography, among other factors. Agent-based models allow us to study urban life from a bottom-up perspective by capturing individuals, their actions, and interactions. In this study, we report our development of an agent-based model that simulates the patterns of urban life including daily commutes and recreational activities. We base our model on well-known theories of human behavior. We show that our model re-creates stylized facts about movement patterns and social network degree distributions. Such a model opens the door to study urban phenomena such as housing market fluctuations.
Enhancing Verification and Validation through Recent Data Science Practices
Verification and Validation (V&V) is one of the main processes in simulation development and is essential for increasing the credibility of simulations.
Due to the extensive time requirement and the lack of common V&V practices, simulation projects often conduct ad-hoc V&V checks using informal methods.
In this study, we propose a novel Verification and Validation platform that can handle large scale simulation output data and allows conducting tests on such data.
The platform relies on a seamless integration of web technologies, data management, discovery & analysis techniques pertaining to V&V, and cloud computing.
A proof-of-concept implementation that automatically makes simulation results available for V&V tests is under development.
We believe that this data platform will be an indispensable tool for novice to expert modelers in evaluating and conveying the credibility of their simulations.
Simulation Output Visualization for Enhancing Verification and Validation
Verification and validation (V&V) techniques commonly require modelers to collect and statistically analyze large amounts of data which require specific methods for ordering, filtering, or converting data points. Modelers need simple, intuitive, and efficient techniques for gaining insight into unexpected behaviors to help in determining if these behaviors are errors or if they are artifacts resulting from the model's specifications. We present an approach to begin addressing this need by applying heat maps and spatial plots to visually observe unexpected behaviors within agent-based models. Our approach requires the modeler to specify hypotheses about expected model behavior. Agent level outputs of interest are then used to create graphical displays to visually test the hypotheses. Visual identification of unexpected behaviors can direct focus for additional V&V efforts and inform the selection process of follow-on V&V techniques. We apply our approach to a model of obesity.
Swarmfest 2017 Paper
Simulation of Cybersecurity
- Current Status and Future Challenges
- Assessing the Impact of Cyberloafing on Cyber Risk
- Towards Modeling Factors that Enable an Attacker
- A Characterization of Cybersecurity Simulation Scenarios
Past Research Projects
CLOUDES is a cloud-based discrete-event simulation development tool that runs entirely in the browser on the front end and on cloud-based infrastructure on the back end.
I designed the initial software architecture in 2013. A master’s student from the Computer Science Department at ODU helped build the initial interface.
Later, Anthony M. Barraco took the lead on development and made significant improvements to the project.
This project is active and led by Dr. Jose J. Padilla. I still contribute to different parts of the project.
Dr. Saikou Y. Diallo and Chris J. Lynch are the other members of the team. You can test the tool at cloudes.me.
CLOUDES' first Wintersim paper
M&S Cube is a smartphone and tablet app that serves as a gentle introduction to the emerging field of modeling and simulation.
I developed the first version of the iPad app in 2012 and ported it to the iPhone platform in 2013.
Other contributors are Anthony M. Barraco, who developed the second version of the iPad app and the Android version, and Anitam, who helped port the app to the iPhone platform.
The project was led by Dr. Jose J. Padilla and Dr. Saikou Y. Diallo. You can download the app using the links below.