I am interested in modeling human behavior using statistical and simulation modeling and machine learning techniques. Specifically, I focus on human mobility, its impact on people and everyday life (e.g., traffic, disease spread), and human behavior in cybersecurity. I use publicly accessible data sources (e.g., Twitter, WIGLE.net, US Census) to tackle these problems. On the computational side, I develop frameworks that use mobility data to support activities ranging from data collection and modeling to visualizing results. You can see a list of projects and tools that I've developed or see my publications here. The list below requires some updates.
Projects & Tools
1. Human Mobility Simulation & Analysis Framework (HMSAF)
Human mobility is a significant factor in understanding, problem solving, and decision-making in a wide range of areas, including healthcare, transportation, urban planning, and public safety. The objective of this project is to develop a human mobility simulation framework that facilitates 1) creating an individual agent mobility model using individual-level location footprints collected from the real world (i.e., Twitter), 2) extending the base model according to theories of human behavior and the research focus, 3) providing experimentation capabilities such as what-if analysis, and 4) conducting analysis on mobility data. Overall, the goal is to answer research questions or assist decision-makers.
Here are some screenshots from models developed using HMSAF.
A Simulation of a Flu-Like Disease Spread
This model simulates how a hypothetical flu-like disease would spread given the real footprints of people. The area shown in the map is Washington, DC, and the location traces of individuals are gathered from Twitter. While human mobility in this model is data-driven using tweets, the dynamics of the disease spread are modeled using the well-known SEIR (Susceptible, Exposed, Infectious, and Recovered) model. This approach facilitates the combination of data and theory in the same model. The model parameters are: 10 initially infected people, a 300-meter potential contact distance, and a 10% chance of infecting contacted people. The graph seen in the model is comparable to those reported in the literature.
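The contact rule described above can be sketched roughly as follows. This is a minimal illustration, not the HMSAF implementation: the three parameters (10 initially infected, 300 m contact distance, 10% infection chance) come from the text, while the fixed exposed and infectious durations, the planar meter coordinates, and the random positions standing in for tweet footprints are all assumptions made for the example.

```python
import math
import random
from collections import Counter

CONTACT_DISTANCE_M = 300.0  # from the text
INFECTION_PROB = 0.10       # from the text
INITIAL_INFECTED = 10       # from the text
EXPOSED_STEPS = 2           # assumption: steps spent in E before becoming I
INFECTIOUS_STEPS = 5        # assumption: steps spent in I before becoming R


def step(states, timers, positions, rng):
    """Advance every agent by one time step and return new states and timers."""
    new_states, new_timers = list(states), list(timers)
    infectious = [i for i, s in enumerate(states) if s == "I"]
    for i, state in enumerate(states):
        if state == "S":
            # Each infectious agent within range gets one chance to infect.
            for j in infectious:
                close = math.dist(positions[i], positions[j]) <= CONTACT_DISTANCE_M
                if close and rng.random() < INFECTION_PROB:
                    new_states[i], new_timers[i] = "E", EXPOSED_STEPS
                    break
        elif state in ("E", "I"):
            new_timers[i] -= 1
            if new_timers[i] <= 0:
                if state == "E":
                    new_states[i], new_timers[i] = "I", INFECTIOUS_STEPS
                else:
                    new_states[i] = "R"
    return new_states, new_timers


def run_simulation(n_agents=500, n_steps=30, area_m=5000.0, seed=0):
    """Return a list of per-step state counts (Counter of S/E/I/R)."""
    rng = random.Random(seed)
    n_susceptible = n_agents - INITIAL_INFECTED
    states = ["I"] * INITIAL_INFECTED + ["S"] * n_susceptible
    timers = [INFECTIOUS_STEPS] * INITIAL_INFECTED + [0] * n_susceptible
    history = [Counter(states)]
    for _ in range(n_steps):
        # Assumption: random positions replace the real location footprints.
        positions = [(rng.uniform(0, area_m), rng.uniform(0, area_m))
                     for _ in range(n_agents)]
        states, timers = step(states, timers, positions, rng)
        history.append(Counter(states))
    return history
```

Calling `run_simulation()` yields one `Counter` per step, which can be plotted as S, E, I, and R curves. In a data-driven version, the random positions would be replaced by each agent's observed locations at that time step.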
Attraction Mobility Network
This model relies on Twitter data to understand how people move between attractions. Attraction visits are extracted from a person's tweet location and the nearby venues. The venues are gathered from Google's Places API by scanning map locations that cover the whole Washington, DC area. A person's proximity to an attraction is the main factor in determining whether that attraction is visited. Below, you can see a network of attractions built from the same-day visits of individuals. Link weight indicates the frequency of hops between places, while the intensity of a node indicates its number of visits.
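One way to picture the extraction and network construction described above is the sketch below. It is only an illustration under stated assumptions: the 100-meter visit radius, the tuple layouts, the choice of directed links, and the collapsing of consecutive repeat visits are hypothetical and not taken from the original model. Venue records are assumed to be already downloaded, so no Places API call is shown.

```python
import math
from collections import Counter, defaultdict

EARTH_RADIUS_M = 6371000.0
VISIT_RADIUS_M = 100.0  # assumption: a tweet this close to a venue is a visit


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_venue(lat, lon, venues):
    """Return the name of the closest venue within the radius, else None."""
    name, vlat, vlon = min(venues, key=lambda v: haversine_m(lat, lon, v[1], v[2]))
    return name if haversine_m(lat, lon, vlat, vlon) <= VISIT_RADIUS_M else None


def build_network(tweets, venues):
    """tweets: (user, day, timestamp, lat, lon); venues: (name, lat, lon).

    Returns (node_visits, edge_weights) as Counters; edges are directed hops
    between consecutive same-day visits of one user.
    """
    visits = defaultdict(list)
    for user, day, _, lat, lon in sorted(tweets, key=lambda t: t[2]):
        venue = nearest_venue(lat, lon, venues)
        if venue is not None:
            visits[(user, day)].append(venue)
    node_visits, edge_weights = Counter(), Counter()
    for sequence in visits.values():
        # Collapse consecutive repeats so staying put is not counted as a hop.
        hops = [v for k, v in enumerate(sequence) if k == 0 or v != sequence[k - 1]]
        node_visits.update(hops)
        edge_weights.update(zip(hops, hops[1:]))
    return node_visits, edge_weights
```

The two returned counters map directly onto the visual encoding in the figure: `node_visits` drives node intensity and `edge_weights` drives link weight.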
Analyses of Geo-Located Twitter Data
The following images visualize some analyses conducted on geo-located Twitter data. A paper explaining these visuals in detail is forthcoming.
Twitter user distribution with respect to the US population. While the overall user distribution correlates with the actual population (correlation coefficient > .90), the states of CA, FL, DC, and NY are overrepresented and the states of ID, MO, SD, and WY are underrepresented.
The following image shows the distribution of time between tweets for different Twitter user groups based on number of tweets. As can be seen on the log-log scale, the data for all groups follow a log-normal-like distribution for periods of up to 24 hours. Beyond that, the groups tend to have differently shaped tails, resembling power-law distributions with different exponents. While the log-normal-looking side of the graph has very similar shapes across groups, the tails show that users who post more frequently have shorter inter-tweet times.
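A distribution of this kind is usually inspected with logarithmically spaced bins. The sketch below shows one possible way to compute inter-tweet times and a log-binned density; the input layout (user, Unix timestamp in seconds) and the choice of four bins per decade are assumptions for illustration, not details of the original analysis.

```python
import math
from collections import defaultdict


def inter_tweet_hours(tweets):
    """tweets: iterable of (user, unix_seconds). Returns {user: [gap_hours]}."""
    by_user = defaultdict(list)
    for user, timestamp in tweets:
        by_user[user].append(timestamp)
    gaps = {}
    for user, stamps in by_user.items():
        stamps.sort()
        # Zero-length gaps are dropped because log bins need positive values.
        gaps[user] = [(b - a) / 3600.0 for a, b in zip(stamps, stamps[1:]) if b > a]
    return gaps


def log_binned_density(values, bins_per_decade=4):
    """Return (bin_center, density) pairs using logarithmically spaced bins.

    All values must be positive. Density is count / (total * bin_width), so
    points can be compared directly on a log-log plot.
    """
    counts = defaultdict(int)
    for value in values:
        counts[math.floor(math.log10(value) * bins_per_decade)] += 1
    total = len(values)
    density = []
    for k in sorted(counts):
        low = 10 ** (k / bins_per_decade)
        high = 10 ** ((k + 1) / bins_per_decade)
        density.append((math.sqrt(low * high), counts[k] / (total * (high - low))))
    return density
```

To reproduce a per-group comparison, users could be bucketed by their tweet counts and `log_binned_density` applied to the pooled gaps of each bucket.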
Twitter postings by hour. The clear pattern is that people tend to tweet more after 7pm and less after 1am. Tweeting frequency is almost constant from noon until evening.
Individuals tend to visit places that they have previously visited, such as home or work locations. Moreover, these visits are periodic (see: Gonzalez, Hidalgo, and Barabasi). The following image visualizes the periodic visiting behavior of Twitter users from Washington, DC. The blue dotted line shows the probability of visiting the same location again after a given number of hours, also known as the first passage time of a place. Even with this voluntarily shared Twitter data, the periodic visiting behavior is clearly present. The periodicity appears at 24-hour intervals. The red line shows what the probability distribution would be if individuals visited places randomly. In other words, this graph shows that we are not random at all, at least when it comes to mobility.
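The return-time measurement described above could be computed along these lines. This is a sketch under assumptions: visits are given as (user, hour, place) with integer hours, and a return is counted only when the user was observed somewhere else in between. The original analysis may define the first passage time differently.

```python
from collections import Counter


def first_return_times(visits):
    """visits: iterable of (user, hour, place). Returns a list of gaps in hours.

    A gap is the time between two consecutive observations of a user at the
    same place, counted only if the user was seen elsewhere in between.
    """
    last_seen = {}       # (user, place) -> hour of latest observation there
    previous_place = {}  # user -> place of the latest observation
    gaps = []
    for user, hour, place in sorted(visits, key=lambda v: (v[0], v[1])):
        key = (user, place)
        if key in last_seen and previous_place.get(user) != place:
            gaps.append(hour - last_seen[key])
        last_seen[key] = hour
        previous_place[user] = place
    return gaps


def return_probability(gaps):
    """Normalize the gap counts into an empirical probability per hour value."""
    counts = Counter(gaps)
    total = sum(counts.values())
    return {hours: count / total for hours, count in sorted(counts.items())}
```

For a user who alternates between home and work on a daily schedule, every gap is 24 hours, which is the kind of peak the blue line shows at 24-hour intervals.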
Zipf's law, in general terms, states that the frequency of a quantity is inversely proportional to its rank. Applied to Twitter data, the following graph shows that Zipf's law holds in geo-located Twitter data for Washington, DC regardless of the number of unique locations a person visits.
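A rank-frequency check of this kind can be sketched as below. The input (a list of place identifiers visited by one person) and the least-squares fit on the log-log values are assumptions for illustration; the original analysis may estimate the relationship differently. An exponent near 1 corresponds to frequency being inversely proportional to rank.

```python
import math
from collections import Counter


def rank_frequency(places):
    """Return (rank, relative_frequency) pairs, most visited place first."""
    counts = Counter(places)
    total = sum(counts.values())
    ordered = sorted(counts.values(), reverse=True)
    return [(rank, count / total) for rank, count in enumerate(ordered, start=1)]


def zipf_exponent(rank_freq):
    """Estimate s in frequency ~ rank**(-s) by least squares on log-log values.

    Needs at least two distinct ranks.
    """
    xs = [math.log(rank) for rank, _ in rank_freq]
    ys = [math.log(freq) for _, freq in rank_freq]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

Grouping people by their number of unique locations and running `zipf_exponent` per group would show whether the relationship holds across groups.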
2. Cloudes
Cloudes is a cloud-based discrete-event simulation development tool that runs entirely in the browser on the front end and on cloud-based infrastructure on the back end. I designed the initial software architecture in 2013. A master's student from the Computer Science Department at ODU helped build the initial interface. Later, Anthony M. Barraco took the lead on development and made significant improvements to the project. The project is active and led by Dr. Jose J. Padilla. I am still contributing to different parts of the project. Dr. Saikou Y. Diallo and Chris J. Lynch are other members of the team. You can test the tool at cloudes.me.
3. M&S Cube
M&S Cube is a smartphone and tablet app that serves as a gentle introduction to the emerging field of modeling and simulation. I developed the first version of the iPad app in 2012 and ported the app to the iPhone platform in 2013. Other contributors are Anthony M. Barraco, who developed the second version of the iPad app and the Android version, and Anitam, who helped port the app to the iPhone platform. The project was led by Dr. Jose J. Padilla and Dr. Saikou Y. Diallo. You can download the app using the links below.
4. Some web-based simulations and tools