
A recent UF study exposes the caveats of using big data in multimodal transportation planning

By Ines Aviles-Spadoni, M.S., M.A., Research/Communications Coordinator, UFTI

Image: A student with a backpack rides an electric scooter, with a sketched city in the background. Image source: Adobe Stock

Meet Sophia. Sophia is a fictitious character created to help illustrate a typical commuter and transit rider. She lives in a large city. Her morning commute includes riding an e-scooter to the nearest subway station, stopping to buy a cup of coffee and catching the subway to work.  

That ride on her e-scooter from her home to the transit station is what researchers call a first-mile/last-mile (FM/LM) trip. In many studies, however, the way these trips are counted does not match how people actually travel. That’s because the prevailing approach relies on buffer-based methods, which draw a zone around a transit station to estimate where trips begin or end. Any trip that starts or ends inside the zone is counted as connecting to transit, and every trip that falls entirely outside the zone is not. To put it plainly, if Sophia’s favorite coffee shop happens to be outside the zone, her trip will not be included.
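
To see how a buffer rule can drop a trip like Sophia’s, consider a minimal Python sketch. It is an illustration only, not the method used in the study: the 100-meter radius, the coordinates, and the names `haversine_m` and `is_transit_connecting` are all assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_transit_connecting(trip, stations, buffer_m=100.0):
    """Buffer rule: count a trip if its start or end lies within buffer_m of any station.

    The 100 m default is a hypothetical radius chosen for this sketch.
    """
    for s_lat, s_lon in stations:
        for lat, lon in (trip["start"], trip["end"]):
            if haversine_m(lat, lon, s_lat, s_lon) <= buffer_m:
                return True
    return False


# Hypothetical coordinates: one station and two e-scooter trips from the same home.
stations = [(38.9000, -77.0300)]
direct_trip = {"start": (38.9100, -77.0300), "end": (38.9004, -77.0300)}  # ends about 45 m from the station
coffee_trip = {"start": (38.9100, -77.0300), "end": (38.9015, -77.0300)}  # ends at a coffee shop about 170 m away

print(is_transit_connecting(direct_trip, stations))
print(is_transit_connecting(coffee_trip, stations))
```

Under these assumed numbers, the trip that ends at the coffee shop is left out even though the rider goes on to catch the subway. Widening the buffer would capture it, but a wider buffer would also sweep in nearby trips that have nothing to do with transit.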

A study led by Xiang ‘Jacob’ Yan, Ph.D., assistant professor in the UF Department of Civil Engineering, and his colleagues used data on user-reported, transit-connecting e-scooter trips to build a ground-truth model. Because these trips were reported directly by the riders, they have higher validity than trips inferred with the buffer-based method.

“To develop effective strategies for addressing the ‘first mile–last mile’ challenge in public transportation, we need reliable data on the first- and last-mile trips that people actually take,” Yan said. “However, these data are difficult to collect, so researchers often rely on estimates derived from other available datasets. Our work shows that existing methods for estimating first- and last-mile trips are problematic and can introduce significant biases into research findings.”

The image shows that at more than 50% of the metro stations, trips inferred with a spatial buffer closely match reported trips. In the city center, overestimation (green points) and underestimation (blue points) are interspersed. Underestimation is observed predominantly on the city’s outskirts, where individuals may need to travel a considerable distance, often beyond the predetermined buffer area, to access a shared e-scooter. (Image/source: https://doi.org/10.1016/j.trd.2025.104977)
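
The station-by-station comparison described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the study’s procedure: the function name `classify_station`, the 20% tolerance for what counts as a “close” estimate, and the trip counts are all invented for the example.

```python
def classify_station(inferred, reported, tolerance=0.2):
    """Label a station by comparing buffer-inferred trips with rider-reported trips.

    tolerance is a hypothetical cutoff: relative errors within +/- 20% count as "close".
    """
    if reported == 0:
        # No reported trips: any inferred trip is an overestimate.
        return "close" if inferred == 0 else "overestimate"
    relative_error = (inferred - reported) / reported
    if relative_error > tolerance:
        return "overestimate"
    if relative_error < -tolerance:
        return "underestimate"
    return "close"


# Invented counts per station: (buffer-inferred trips, rider-reported trips).
station_counts = {
    "center_a": (105, 100),
    "center_b": (150, 100),
    "outskirts_c": (40, 100),
}

labels = {name: classify_station(inferred, reported)
          for name, (inferred, reported) in station_counts.items()}
print(labels)
```

With these made-up counts, the outlying station is flagged as an underestimate, the pattern the caption describes for areas where riders travel beyond the buffer to reach a scooter.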

Yan and his research team found that FM/LM transit-connecting e-scooter trips were influenced primarily by the physical surroundings rather than by rider demographics. More connecting trips occurred in areas with more transit stops, closer to downtown, on better-connected streets, and in university areas. Residential neighborhoods and areas with more intersections saw fewer connecting trips. The rider-reported data also revealed that neighborhoods with fewer e-scooters and less safe streets saw lower e-scooter-to-transit use.

“Although our study focuses on transit and shared micromobility, its implications extend to other transportation subfields and many areas of research,” Yan said. “The broader lesson here is that while large datasets can be incredibly valuable for scientific advancement, researchers must be cautious about the assumptions they make and the potential biases these assumptions can introduce.”

Overall, the findings suggest that the current buffer method can lead to unrealistic e-scooter-to-transit travel patterns. If thousands of trips are counted this way, the results can be skewed, leading city planners and policymakers to make decisions based on incomplete data.

As Sophia leaves work and heads to the subway station, that last-mile trip may fall outside of the zone. But through Yan’s self-reported rider data methodology, trips such as hers will no longer be unaccounted for, providing city planners and policymakers with a better tool to design FM/LM routes and connections for users like her.