"The Future of the Construction Industry Opened Up by 'Physical AI' — A Conversation between Zen Intelligence and ZVC"

by Taku Uchimaru

Digitization of physical space and reconstruction through AI—

Zen Intelligence Inc., a company leading the transformation of the construction industry, deploys "zenshot," a product built on proprietary 3D Vision technology and construction-specialized foundation models, and has brought new ways of working to over 100 companies to date.

In conjunction with this fundraising round, we held a conversation between Zen Intelligence and Z Venture Capital. We explore the company's challenge of envisioning an AI-native industry, starting from on-site data.


Zen Intelligence Inc.

A Physical AI startup developing Spatial Intelligence that perceives, reasons, and acts within three-dimensional physical spaces and their temporal changes. Since its founding, the company has consistently worked on developing and providing AI and robotics technologies based on on-site data, and currently deploys "zenshot," a construction AI product centered on 3D Vision and foundation models for the construction industry. Through Physical AI, the company aims for "Re-Industrialization," reconstructing the very nature of industries into an AI-native form.



【Interview Participants】

Zen Intelligence Inc. CEO Hiroki Nozaki

Majored in information engineering at Keio University and graduate school. Selected for the Exploratory IT Human Resources Project. Researched and developed soft robots that move over uneven terrain while deforming, with multiple papers accepted and presented at international robotics conferences including IEEE IROS. After graduation, worked at Arthur D. Little Japan supporting the formulation of new business strategies and mid-to-long-term strategies for manufacturing companies. Founded Zen Intelligence Inc. (formerly: SoftRoid). 5th dan in kendo. Worked as an apprentice site supervisor at a construction company for several months, where he gained the inspiration for a service that solves on-site challenges through AI and hardware technology.

Zen Intelligence Inc. CTO Taketo Yoshida

Majored in intelligent machinery and informatics at the University of Tokyo and graduate school. Conducted research on robot control using deep reinforcement learning and received the Best Paper Award at IEEE AIKE. After graduation, worked at DeepX Inc., a University of Tokyo-originated AI startup, engaging in construction machinery automation projects and developing algorithms and simulators. Also worked on open-source development for deep reinforcement learning. Co-founded Zen Intelligence Inc. (formerly: SoftRoid), performing full-stack development of AI/Web/data pipelines while deepening understanding of on-site operations through site support and sales activities.

Z Venture Capital Principal Taku Uchimaru

Taku joined Z Venture Capital in May 2022. Prior to Z Venture Capital, Taku supported business growth, new business creation, and business revitalization for companies ranging from startups to large enterprises at Industrial Growth Platform, Inc. (IGPI). Taku started his career at General Electric in the Corporate Planning Division. Taku received his Master’s degree in Informatics from Kyoto University.



Founding Origins: The Challenge of Making the Real World Intelligent


Uchimaru: Thank you for this dialogue opportunity in connection with our investment in Zen Intelligence. Actually, my dialogue with Nozaki has also been published in Forbes Japan, so I'd be happy if you could read that as well.

Let's get right to it—could you tell us what motivated you both to take on this challenge in this domain?


Nozaki: I had a vague interest in the convergence of information space and physical space, or the "integration of cyberspace and real space" as it was called at the time, and that led me through IoT and hardware to robotics research.

When I started research on soft robotics, I came across a paper that had a profound impact on me. Its opening stated 'intelligence requires a body,' and this concept remains at my core today. With the desire to solve real-world and real-society challenges using robotics, I explored various possibilities and ultimately decided to tackle the challenges of the construction industry, a major sector.


Yoshida: I was very interested in making the real world intelligent. During my graduate school days, Google's AI "AlphaGo" was gaining attention. It used a method called reinforcement learning, which is also used in current LLMs, and defeated professional Go players through repeated self-play. Around the same time, research was advancing on controlling robots with reinforcement learning, and after much trial and error, humanoid and quadruped robots were able to walk.

Looking at the web domain as well, people with machine learning backgrounds were working on advertising optimization at companies like Google and Meta. But somehow, I personally couldn't find much interest in that.

On the other hand, I felt that making the real world intelligent was truly a blue ocean, an unexplored territory. Japan faces many societal challenges earlier than other countries, and with its high level of technology in manufacturing and construction, I thought we could apply AI to these areas and expand globally. That was the beginning of my career and the trigger for founding the company.


Uchimaru: You mentioned creating a new market—could you tell us why you chose the construction industry, and if you have any episodes from when you started the company?


Nozaki: While interviewing various industries, I had the opportunity to speak with people in the construction industry, and during those interviews, all I heard were challenges. Seeing this situation, I strongly felt, "Ah, there are significant challenges in this domain." After that, I actually entered the field and experienced those challenges firsthand.


Yoshida: I still remember this. We managed to arrange a call with a site supervisor from a major general contractor, who said, "Let's do it from 10:30," which turned out to mean 10:30 at night. Despite the late hour, they were clearly joining the Zoom call from the site office. At that time, construction SaaS companies were going public or becoming near-unicorns, and people were saying that "construction industry challenges are largely being solved." Yet seeing on-site personnel still working from the site office at 10:30 PM, I strongly felt that on-site challenges were not being solved at all.


Nozaki: I thought it would be difficult to suddenly jump into the industry, so I initially embedded myself as an apprentice site supervisor at a construction site in my hometown of Kagawa for about 3-4 months. I went to the site early in the morning, unlocked the gates, started with morning radio exercises—exactly as you'd imagine.

I worked alongside craftsmen, helped with overtime, and did development work after returning home. Through this, I built not only technical skills but also a commitment to creating something that would actually be used on-site.

Mr. Nozaki entering as an apprentice at a construction site [Left] / Mr. Yoshida doing door-to-door sales at construction sites [Right]


Uchimaru: As for the product, where did you start initially?


Nozaki: Initially, we started by introducing robots to construction sites to collect data. Specifically, we focused on developing spatial AI technology to autonomously operate robots within the three-dimensional space of construction sites. However, as we advanced development, it became clear that robots alone would be difficult to scale as a business, so we pivoted toward a solution that anyone working on-site could easily use. The result was zenshot, which we currently provide.

zenshot is a construction AI product centered on 3D Vision and foundation models that realizes efficiency and automation of construction management operations at construction sites. From 360-degree video data of the site, AI automatically structures the site conditions.

In addition to improving operational efficiency through visualization of site conditions, AI captures temporal changes in schedule, safety, and quality, and extracts insights. Based on these, AI agents that make judgments and take actions realize labor-saving and automation of construction management operations.

(From zenshot – Transforming construction sites with AI and 360-degree cameras)


Uchimaru: When you asked craftsmen to take the photos, how did they react, and how was uptake? Wasn't it difficult?


Nozaki: As we advanced R&D and sales in parallel, we discovered that there were obstacles during on-site photography that robots couldn't overcome (such as materials blocking passages). And as you mentioned, Uchimaru-san, we initially thought, "Craftsmen won't cooperate with photography, so let's automate it with robots."

However, craftsmen were already taking partial photos for reporting purposes to some extent. So when we actually asked craftsmen to cooperate with photography, the majority were cooperative, and unexpectedly many readily agreed to take photos.


Yoshida: The strength of our system is that while sophisticated 3D Computer Vision technology (SLAM and SfM) runs in the background, users simply need to walk around holding a camera—a very simple operation. If complex operations were required, many people wouldn't use it. The key point was enabling people to leverage advanced technology without even realizing it, while the users themselves only need to perform simple tasks.


Nozaki: Previously, complicated operations were required, such as pairing a 360-degree camera with a smartphone and installing an app—difficult even for young people. However, since there were many craftsmen in their 70s who said things like "I only use a flip phone" or "I can't do pairing," we improved it to a simple mechanism where you just press a button and it automatically communicates via Bluetooth to start and end photography.

Data is automatically downloaded and uploaded, and in the background, 3D Computer Vision technology automatically generates a street view. Designing the UX so that anyone who can use a mobile phone can use the product was extremely important, and I think that is why 70-80% of people cooperated.


zenshot's Track Record and the Vision Behind the Company Name Change


Uchimaru: I'd also like to ask about your track record. To what extent has zenshot been adopted currently?

Nozaki: Currently, zenshot has been adopted by over 100 companies cumulatively and is being utilized at thousands of sites.

As for adoption effects, travel time has been reduced by more than 50%, and high-quality management has become possible without site visits. It's also bringing significant changes in terms of work style reform, overturning the conventional wisdom that "you can't understand the situation without going to the site" and realizing remote management. This is making flexible work styles possible that were previously unthinkable in the construction industry, such as shortened work hours for child-rearing.


Yoshida: Site supervisors in the residential sector often manage 10-15 sites simultaneously, and since they also have distant sites, they spend one-third to half of their day traveling by car. Since they can't use computers or smartphones while traveling, this is wasted time from an operational efficiency perspective.

When managing many sites, they can only visit each site once a week, and when they go to a site after a week, construction has progressed significantly and rework becomes necessary. By utilizing zenshot, travel time can be reduced, and since they can check sites daily, mistakes can be prevented early.

In the construction industry, it has become the norm for site supervisors to manage everything alone—a "one-person operation"—resulting in extremely high turnover rates among young workers. Meanwhile, veteran employees possess extensive knowledge and experience, but face challenges in frequently visiting sites due to changes in their life stages.

In this context, digitalization of construction sites enables team-based management, allowing veterans to remotely check and guide newcomers' sites. This not only ensures management quality but also reduces the mental burden on newcomers while maximizing the utilization of veterans' valuable experiential knowledge.


Uchimaru: Thank you. I understand that zenshot has already been adopted by about 100 companies, so I'd like to ask about the team managing this operation. How many employees does the company currently have?


Nozaki: Our current organization consists of approximately 20 full-time employees, so we're still in the founding stage. What's distinctive is that we've gathered talented engineers, primarily two types of specialists.

One type is robotics specialists—members who share our passion for robotics technology and 3D Computer Vision. The other type is AI specialists, including two Kaggle Masters (*Kaggle Master: a title given to users with certain achievements in Kaggle rankings), who joined because they resonated with the value of our proprietary data. They're attracted to our unique data sources, the means to access them, and the future AI products to be built upon them.

Few companies possess both robotics and AI technology while also holding proprietary data, and this is our major strength. In particular, having three-dimensional spatial data is currently our key differentiator.


Uchimaru: Having proprietary data is incredibly powerful, isn't it?


Nozaki: This is definitely a point we want to emphasize.


Uchimaru: There's one more thing I'd like to ask about—I'd like to hear your thoughts on the recent company name change.


Nozaki: We took this opportunity to change our company name and newly establish our Purpose. We changed our name from SoftRoid to Zen Intelligence and are now operating as a company focused on Physical AI.

As background, the spread of AI is currently driving rapid efficiency improvements in white-collar domains such as office work. Beyond the areas traditionally said to be replaceable by IT and AI, the recent rise of generative AI means AI is now taking over work even in sales, consulting, design, and development.

On the other hand, the physical domain remains untouched because the data itself doesn't exist. Even if you try to use generative AI to improve on-site work efficiency, it's difficult because there's no data and the generative AI technology to process it hasn't been established either. We've focused on this data shortage and want to transform industries by leveraging AI and robotics technology, centered on data from physical spaces and physical work.


Uchimaru: So up until now you've primarily focused on acquiring site data, but going forward you want to go beyond that, is that right?


Nozaki: Exactly. Our core technology so far has been Spatial Intelligence. Going forward, we plan to deploy three technologies, adding Operational Intelligence and Physical AI Agent to it.

Through this, we want to realize operational efficiency and automation in physical domain sites that have depended on individual knowledge and skills, using three-dimensional space and its temporal changes as context.

Specifically, we aim to create AI-native construction sites and construction operations through Physical AI, ultimately working toward unmanned construction sites.

Until now, site supervisors would physically walk the site to check and make decisions, but now zenshot has digitalized the site space, enabling remote checking and decision-making.

The next step is for Physical AI agents to patrol and make decisions within this digitalized site space. We aim to realize site management where AI recognizes the site, makes decisions, and gives instructions just like a site supervisor would.

To realize this, we're developing a construction-specialized foundation model, a VLM (Vision-Language Model), with grant support from NEDO.


Uchimaru: In simple terms, it's an AI agent that functions digitally, with a concept to also develop it into a physical AI agent (robot) that operates on actual sites. That's amazing.


Yoshida: Until now, the physical world, physical spaces, and physical operations haven't been digitized and couldn't be included in AI context. zenshot digitizes these, structurally analyzing everything—positions on blueprints, dates, materials being delivered—and makes them available for AI utilization.

This technology provides both a "bird's-eye view" and a "bug's-eye view," allowing you to see the entire site from above while also zooming into detailed parts. This makes automation and support of site supervisor operations possible.


Fundraising Background and Expected Synergies with LY Corporation



Uchimaru: Next, I'd like to touch on why ZVC invested.

Generative AI has greatly benefited white-collar workers and driven their evolution. However, even in this context, Japan still lacks sufficient manpower, and those having the hardest time are blue-collar workers—the so-called essential workers. Why can't they benefit from AI? I believed that AI for these people is absolutely necessary.

I've been thinking about where to invest within this essential worker domain, and the construction industry is an extremely large market in Japan. This large market perfectly aligned with the issues I'd been concerned about. That was the first major point.

And now, while generating revenue, you're also looking toward a big goal. Although there's quite a gap between profitability and technology, the path forward is clearly visible. The destination you're aiming for is truly excellent, and the way you're generating revenue along the way—in other words, the mix of business and technology—was very good. That was the second point.

The third point overlaps with what we've been discussing, but it was that you have a team capable of making this happen. It doesn't work without people who can do both business and technology. It's extremely important that the leaders understand technology; otherwise, I don't think you can take on cutting-edge technical challenges. To summarize, those are the three points.


Nozaki: It makes me very happy to hear you say that.


Uchimaru: We have investment teams in Korea and the US as well, and we often discuss which themes to invest in which regions. To leverage Japan's unique strengths, we believe specialized AI is more important than general-purpose generative AI. We think the most critical area for Japan to specialize in is data, particularly physical data from actual sites. For Zen Intelligence, this means construction site data. We believe a company can stay competitive by focusing on AI specialized in data that is highly local (it exists only in specific places), highly physical, and tied to tangible reality.


Nozaki: When I first spoke with Uchimaru-san, Kay-san (ZVC's Managing Partner) was also present. Until then, we were mainly thinking about SaaS deployment, with plans to develop AI within that framework in the future. Our pitch deck and equity story were based on that thinking, but after our dialogue with Kay-san, we completely renewed our direction.

The shift toward "being a company with AI at its core while also reliably generating revenue through zenshot" was a major turning point for me personally. Being able to have such intensive discussions in a short time was one of the deciding factors in ultimately choosing ZVC. The fact that you deeply understood the essence of our efforts and competitive strategies for Japan and Korea from a global perspective was extremely valuable.


Yoshida: When we were consulting with other VCs, discussions often proceeded along the SaaS formula. They tended to focus on sales strategies and "Rule of 40" type discussions. We ourselves were often boxed into the framework of "SaaS with AI on top," but ZVC started with the Physical AI vision and had substantive discussions about "how do we actually realize this?" which I found very attractive.


Uchimaru: Both Kay and I felt that we need to keep pursuing this Physical AI concept. We shared the hypothesis that startups in Asian regions outside the US, such as Japan and Korea, must compete in this domain to win and, conversely, that they can win in this domain. It was very good that our thinking aligned like this.


Nozaki: From my perspective as well, two other things stood out in that first meeting. First, you had just launched Fund II of approximately 30 billion yen, and I was attracted to your stance of firmly creating synergies through such initiatives.


Uchimaru: You have shown strong interest in NAVER LABS, haven't you?


Nozaki: While we're advancing our own proprietary technology development, NAVER LABS is similarly pursuing their own technology. They have world-class technical capabilities and are achieving extremely outstanding results.

To elaborate on this point, at our core is spatial intelligence, that is, understanding three-dimensional space. More specifically, we're tackling the challenge of how to understand that space together with its changes along the time axis.

Conventionally, various 3D computer vision methods including SfM have been used, but "DUSt3R," released by NAVER LABS Europe and others in December 2023, attracted attention for its approach to reconstructing 3D from image sets with unknown cameras and unknown poses, and was also accepted to CVPR 2024. Including subsequent achievements like MASt3R, the momentum in 3D understanding is accelerating.

We anticipate that 3D Foundation Models will become increasingly important going forward, and we'll tackle challenges while leveraging them. The 3D domain has been pointed out to have a bottleneck in the shortage of large-scale, high-quality data, so data collection and utilization strategy will be key.

In this context, when I first spoke with Uchimaru-san, I had expectations for "the possibility of collaboration with NAVER LABS." They're conducting innovative technology development in the 3D Computer Vision field, and we want to effectively leverage these technologies and implement them in actual products.


Yoshida: Physical AI is widely recognized as the next frontier after LLMs. As AI agents are currently in the spotlight, NVIDIA has also positioned physical AI as a next-generation technology, and the development of humanoid robots is rapidly advancing in both the United States and China. Fei-Fei Li, a leading figure in the computer vision field, has also launched an AI startup specializing in spatial recognition. To realize physical AI, which these industry giants are focusing on, 3D spatial recognition is essential. Accurately understanding human work spaces is extremely important, and NAVER LABS is playing a pioneering role in precisely this domain.

In November 2023, they announced a 3D Foundation Model that, from our perspective, rivals GPT, and have since continued to produce groundbreaking algorithms one after another. Going forward, sharing knowledge and collaborating with them could become a very valuable partnership for us.


The Type of People We Are Looking For



Uchimaru: Finally, please tell us about the talent you're currently seeking.


Yoshida: We're currently a company of about 20 people, still quite small in scale. So we're looking for people who can pioneer this field themselves. A question I often ask in interviews is, "Can you continue in this field for 5 or 10 years?" In the physical domain, a long-term commitment is essential—investing in deep tech R&D in parallel with data collection while simultaneously building the business.

I think the daily work is intense. In such an environment, it's important to have people with the passion to take on challenges with a long-term perspective of 5 or 10 years. There will undoubtedly be difficult and challenging situations ahead, but we're looking for people who won't be discouraged, who will overcome them together with us, and who want to work with this team.


Uchimaru: What about job positions?


Yoshida: We're hiring for all positions, across the board. Given the current market situation where there are few products utilizing AI in the physical domain, sales talent and business development talent who can identify latent needs and make proposals are essential on the business side. On the technical side, we need AI specialists, people who want to tackle highly challenging technical problems, engineers who can develop with hardware knowledge, and web and mobile app developers: talent who can actually deliver valuable products.

Also, product managers are extremely important. We're achieving product-market fit and are short on talent for growth.


Uchimaru: Truly across the board. In the near future, for example next year, how large would you like to scale the organization?


Nozaki: Regarding organizational scale, we're aiming for a team of 100 people within the next year. We currently have 20 full-time employees, but we're planning to expand our structure for the next stage. I'd be happy if people who see this specific number think, "I want to take on this challenge."


Uchimaru: It's good to have concrete numbers! Finally, please give us a summary.


Nozaki: Finally, there's a point I want to emphasize again. Since our founding, we have consistently been a company developing spatial intelligence, and this core focus has never wavered. We started with robot control at construction sites, then expanded to supporting human workers. And what made it accessible to everyone on site is our current product. Going forward, I believe our raison d'être is to solve everything in the physical domain—the physical space and physical work, including robots.

And in Japan, supply capacity shortage has become a serious challenge. Facing problems of labor shortage, aging population, and skill succession, we aim not simply to apply AI to existing technologies, but to fundamentally transform the work itself and reconstruct it as AI-native. With physical AI technology based on spatial intelligence, I strongly want to realize the unmanned operation of construction sites, unmanned site management, and unmanned site work. We are taking on this major challenge of "Re-Industrialization"—reconstructing industry. If there are people who want to take on this challenge together with us, we'd love to work with you!


Uchimaru: Nozaki-san, Yoshida-san, thank you very much for this valuable opportunity!


