Hello, good afternoon. My name is Marc DaCosta, co-founder of Enigma. Enigma is a search and discovery platform for public data. But when I speak of public data, what do I mean? Think about an event like South by Southwest. South by Southwest has ripples across hundreds of public databases. There are containers of swag coming in from China and showing up in U.S. customs databases. There are a lot of high-profile attendees, some of whom come in on private planes and show up in flight plan databases. At night, tens of thousands of people are crowding the bars, causing a spike in Texas alcohol sales tax receipt databases. And the list goes on.

This kind of public data is valuable because it’s a reflection of the real world, and it reveals things that you won’t find by looking at Twitter hashtags or Foursquare check-ins. But there’s a problem here. While this data is public, and while it is out there, it’s not really accessible. Today, public data comes in all kinds of different shapes and forms, everything from restrictive web portals to messy FTP sites to 1970s database formats and everything else in between. Doing something as simple as a search across all of this public data is not even possible. What that means is that, up front, you’re limited to the sources of data that you know about. What’s more, because this data is locked away in silos, it’s impossible to see the relationships and connections across these public data sets.

What public data needs is an infrastructure, and that’s what we’re building at Enigma. At Enigma, we’re building a scalable infrastructure for acquiring, indexing, and searching public data. And now my co-founder Hicham will show you what that looks like.

Hey there.
Let’s just switch computers and jump right into the live demo. Why don’t we search for a company like Google, for instance, in Enigma? All right, so what you’re seeing now is all of the results for Google in Enigma. We’re indexing over a hundred thousand data sets, dozens of billions of records, and Google is showing up in government filings, property records, technology, even immigration. There’s been a lot of debate about skilled work and immigration in tech, so let’s go check out this H-1B visa data set that is put out by the Department of Labor. All right.
So what you’re seeing now is each and every single H-1B visa application in 2012, filtered to Google, and we can actually ask questions of this data. Clicking on the salary column tells us that Google is spending $87 million attracting foreign labor. Now we can do something even deeper, like distribute all of the salary spend across the actual job titles these people are being hired for. Now we see that Google is allocating about $60 million towards software engineers and developers, and it’s only when we scroll all the way down that we see that Google is actually only spending $467,000 recruiting designers. Now, it’s not a lot, and it’s interesting, but it’s even more interesting when we compare these things. So let’s filter this data set to Facebook.
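To make the salary breakdown in the demo concrete, here is a minimal Python sketch of the same two steps: filter the visa applications to one employer, then total the salary column, first overall and then per job title. The rows, the field names (`employer`, `job_title`, `salary`), and the function names are all hypothetical and made up for illustration. They are not Enigma’s schema or API, and the figures are not real Department of Labor records.

```python
from collections import defaultdict

# Hypothetical sample rows for illustration only; not real visa records.
APPLICATIONS = [
    {"employer": "Google", "job_title": "Software Engineer", "salary": 120000},
    {"employer": "Google", "job_title": "Software Engineer", "salary": 130000},
    {"employer": "Google", "job_title": "Designer", "salary": 95000},
    {"employer": "Facebook", "job_title": "Designer", "salary": 105000},
]


def filter_by_employer(rows, employer):
    """Keep only the applications filed by the given employer."""
    return [row for row in rows if row["employer"] == employer]


def total_salary(rows):
    """Sum the salary column across all rows."""
    return sum(row["salary"] for row in rows)


def salary_by_title(rows):
    """Distribute the salary total across job titles, largest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["job_title"]] += row["salary"]
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


google_rows = filter_by_employer(APPLICATIONS, "Google")
print(total_salary(google_rows))
print(salary_by_title(google_rows))
```

Calling the same functions with a different employer name gives the numbers for a side-by-side comparison.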
Let’s see how much Facebook is spending in total: it’s $20 million. Now, that’s three times less than Google. But actually, look on the left: they’re spending much more on designers, $580,000, something like that. Now, it could be why my Facebook profile keeps changing every three months, but you get the idea. What we do at Enigma is really let you find and manipulate the data you’re looking for.

Let’s jump back to those search results. I want to show you that Enigma is also about discovering new data sets you didn’t even know were relevant or existed. You can see, scrolling through all of this, that you can build a rich portrait of a company like Google. Check out FCC licenses (maybe the latest registration for Google Glass is in there), or lobbying records, to see what issues Google is behind and how much money it’s actually committing to them, or private investments, or even map out all of Google’s subsidiaries.

But there’s something that caught our attention at Enigma. We actually saw Google popping up in this Department of Energy data set, so let’s go take a look. Okay, now we’re looking at each and every single electricity purchase contract in the United States, and we’re learning that Google is actually operating like a utility, buying electricity directly for itself as a registered utility. Let’s see who it’s buying electricity from. There are a lot of wind companies in there, and they’re all ranked. We know that Google has been really pushing towards sustainability in its operations.
But now we see it play out in data. We can do things with Enigma like break down how many megawatts were delivered to which facilities and how far these contracts are hedged out into the future.

Let’s stop for a second. I want to show you how we can get to this sort of insight without even being in the web app. What if I were just browsing the Internet and went to the Google green investments web page? We have this browser plug-in. As soon as you activate it, all of the entities on the page just light up: companies, people, locations. In this case, we can just click on Clean Power Finance, which Google invested in, and be routed back to everywhere it hits in Enigma. Let’s go check out government grants for this specific project and see what sort of money was received. And there you have it. That’s just a taste of what we can do with Enigma. So many companies have pioneered how we analyze the world with data, but what we’re trying to disrupt is something much deeper in the stack: fundamental issues of infrastructure and the content itself.
So let’s switch over to the presentation. We have a couple of cool things to announce. We’re launching today: we’re launching this web app as a subscription service, and an API for big data analytics and all of those app developers out there. We’re very excited about that. We also want to announce that The New York Times Company has come in as a strategic investor. We’re really excited to also have other partners who share this vision, like the Gerson Lehrman Group, S&P Capital IQ, and the Harvard Business School. Every day, without the pain of actually having to acquire this data and work with it, we’re learning how they’re actually leveraging it in their work and in their thinking. That’s what’s exciting us the most: really exposing this world that’s been hidden for way too long, and that, going forward, we think will redefine the structured web and also the limits of public knowledge in general. Thank you, and this is our CEO, Jeremy.

This is really cool. Thank you. Bringing transparency to public data should really reduce fraud and waste across the board.
And what’s particularly cool is that you matched your pants to the mic socks today for the presentation, so clearly you planned ahead. Two questions. One: are you guys planning to open a public API to let application developers build a new set of applications on top of this data? Most of it is in a text format, but you can imagine visualizing this data as a better way of finding patterns. And secondly, how are you thinking about pricing this as you take it to market?

The answer to your first question is yes, and the partnership we announced with S&P Capital IQ is actually an example of that. They’re building a highly verticalized product, which we hope gets to market, using our API feed. And we hope that any developers out there watching this who have ideas of what they can do with our data will come request it.

In terms of access, would it be open like the Foursquare and Twitter APIs, or is it going to be through a partnering process with you guys?

Likely through a partnering process, at least at first, and then as we grow, we’ll consider.

And in terms of pricing?
Do you mean for the web app or the API access to the data?

In whatever form you guys might.

Okay. So access to the web application will be a subscription service. We’re happy to grant anyone a free trial, and it’s really customized pricing based on the customer. We’re targeting professional services and academia, so finance, consulting, and academia first with our outbound efforts, but we’re happy to have as many people as possible signing up.

Can you give us a sense for how hard it is to pull this together?
It looks deceptively easy, but I’m guessing it’s not.

So yeah, the data acquisition is really quite a diverse and mixed bag in a lot of ways. I can give you a specific example, which is customs data. I mentioned it in the introduction. The US Department of Customs every day publishes a CD of all of the bills of lading of the containers that came into the country the day before. The only way to get that data is by actually physically receiving delivery of a CD and having a set of processes to then have that loaded onto a computer, parsed, and uploaded into the system.
Of course, and this is something that’s been really quite positive for us, the general trend towards open government initiatives and transparency has put a lot of data out there, which has been really wonderful for us. But it’s quite a mixed bag all the same.

And so how do you maintain your edge over time? Say the next group of smart engineers sitting in the audience says, “That’s a good idea, you guys are going to charge a lot for this, we’ll do this too.”
Assuming that they’re just as capable as you are, what enables you to stay ahead of the race?

So I think that’s a super interesting question. I hope people who take a closer peek into us would see that our edge is in the way we work. One of the first things we did was actually FOIA the General Services Administration for every top-level .gov domain that was operated, and send crawlers out there looking for the meaty files and the meaty databases, for which portals could be reverse engineered, and so on and so forth. So we’re really diving into this domain, not only with the workhorse attitude of “I’m going to grab everything,” but planning it out, seeing what the field is, knowing the data sets, developing rapport with people at the Census. All of these things go into organizing information, and really architecting that, I think, is our most competitive advantage right now as a team and a group of people. I think going forward we have to rely on technology and vision to get us there.

But just to build on that a little bit.
We built a suite of generic parsing tools, all these internal tools, and as time’s gone on we’ve measured that our marginal cost of bringing in a new data set is dramatically lower than when we started. We think there’s a huge learning curve that any of our competitors would have to take on. That being said, once we’re linked in with applications and on the desks of professionals, the switching costs for them are quite significant.

Does the data improve with use in any way? A related question: can the community link certain data sources together, cross-reference and improve them, so that your data will be superior to all others?

Absolutely. We’ve got a tool coming out called the lift tool, where you can take a column of one data set, match it up with another, and have another custom data set. We’re also going to allow our premium customers to load in their own data sets and cross-reference them. So you could take your customer list, or whatever it may be, and filter, say, New York City property tax assessments by that customer list and see which of your customers own apartments in New York, or whatever the use case may be.

How did you come up with the idea? Did it sort of come out of your research, or from work, or so forth?
Yeah, so Hicham and I started the company two years ago, and we were both independently working on different projects that revealed to us the problems of public data availability. Hicham was working as a currency trader, and I was building a suite of interactive cartography applications around climatological modeling. In both of those contexts, we wanted to go out and augment and supplement what we were doing with public domain information, but realized we were really getting so caught up, tripping over ourselves just to find the data that we were looking for and get it into a usable form, that at that point we really saw that there was this huge need in the market and decided to found Enigma.

Have you had any indication that Google itself is trying to get into this?
I mean, their stated vision is not very different from what you guys are trying to do, in a sense.

So, Google does do public data, and I encourage anyone to check it out as a comparison. They have a very limited resource of data in comparison to what we have. I think it’s been somewhat mothballed, but they can comment on that. I don’t know if Marc has anything here.
One big difference in what you’re seeing in the applications is that Google has really committed itself to web pages, unstructured text, and these sort of things, whereas everything that comes into Enigma is actually structured, either by us or by the source. What you can do with that is really a completely different range of things than what you can do with Google. You can be routed to places with Google; with Enigma you can actually interact with and get insight from this raw data. You can sum things up, average them, distribute them, and connect dots across all of these different things that are actually siloed in the real world. So between what you can do with Google and with Enigma, there are infrastructural differences that I think Google, as a company, hasn’t approached directly.

What about Bloomberg?
Bloomberg’s job is to sell trade-related information, right? That’s the core of their business. They know people are trying to move into this space, like you guys.

One of the things that we think really differentiates us (and Bloomberg did have a Bloomberg Government product, which was targeted really at lobbyists) is that we don’t require our users to identify the source. Even when you’re using Bloomberg, and you as a former finance guy know this, you type in Google Equity HDS to get the holders.
You first are identifying what specifically you’re looking for, and then they provide you with that knowledge. One of the things that we thought was a problem was that so many of our customers don’t even know what’s out there and don’t know what to search for. So abstracting in that way, and just saying “show me everything that’s available on Google,” is something that we think is quite powerful, and really a step away from the way that Bloomberg provides information. That being said, they have an amazing business, and it’s something that we aspire to be in the same league as.

How much money have you raised to get to this point?

We’ve raised 1.45 million.

So you mentioned The New York Times as a strategic investor.
We had a history with Union Square, if anybody here is familiar with Indeed, and they were one of the founding investors. But there’s a difference between founding and strategic, I think: founding meaning an early investor, strategic meaning somebody who’s actually contributing something other than their capital. So why did you use the description “strategic” for the Times?

Yeah, I’ll comment on that. I think really it was just that they’re actually a beta user of Enigma; the newsroom is. We provide our service to a lot of news organizations just to get it out there and try to have things cited. They were interested in the company, and we had a financing round open quite recently, actually, and they asked if they could participate, and we were happy to have them on board. Beyond that, there’s nothing too significant to announce.

So this might seem like an unusual question coming from a venture investor.
But do you think the ultimate incarnation of this platform is a for-profit business, or more of a public good, something that is out there and available for the community to continue contributing data to, exposing fraud, waste, and patterns in government data?

I think it’s a wonderful question. If you look into the far future, certainly decades from now, I think all of this public data will be graspable by everyone. The same sort of levels of obscurity and obfuscation that exist now across all of this public data will evaporate, and I think that’s a thing that technology will come in and fix.
That said, I think there are a lot of hurdles to get through between now and that point in the future, but I think it’s something that we can look forward to.

I don’t know if this was discussed, but how do your costs increase over time? Are they going to increase like Google’s in terms of capital expenditures, servers, and whatnot, or are there some advantages there?

Yeah, I think technologically, as you said, from the servers you can expect a very similar, you know, activity.
I do think, and Jeremy can speak to this right after, that we will have a bit more to do in terms of taking the company from a very enterprise company, with sales and these sort of things, all the way down to more consumer products, like people like Siri plugging into Enigma. That’s kind of where we have our first interface with millions of people. So I think that’s an exciting challenge, and in many ways we can anticipate that sort of growth in costs.

Yeah.
Do you guys give any indication of confidence or completeness of the data? If I were an auditor, and I were using this for, say, the example you have with Facebook and Google, I’d be concerned that maybe you’ve just got a subset of the data as far as how much they’re spending. And it sounds like you get a diverse array of data sets, each of which may have gotchas. So do you show me, like, “here’s one thing you’ve got to be aware of,” or do you have a way to, you know, show a smiley face if it looks like you’re good to go?

So yeah, I think that’s a great question, and there are a couple of answers to provide to it. The first is that one thing we’re very focused on and careful about is always linking all of this data back to the source from which we provisioned it. So there is that sort of one-to-one mapping between what you’ll find in Enigma and what’s out there in the world. Part of the way we do that is by organizing all of these sources of data in essentially a graph of where they come from in the world. This wasn’t something you got to see in the demo, but there’s a whole browse piece to Enigma, where you could go down and say, “I would like to see all of the data that you have from New York State, or from Nevada, or California,” or what have you. That would certainly empower one of our users to get a good sense of the map of the data that we have and how their results relate to it. That said, there are also a lot of interesting points of cross-validation in the data.
So you could, for instance, see something in a financial report from FedEx about how big their fleet of airplanes is, and then go and see that cross-referenced in a registry of airplanes with the FAA, and see that these things do start to cross-tabulate as well.

Well, so we’re out of time. Thank you guys very much.
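The cross-validation idea from the last answer, checking a fleet size stated in a financial report against the number of aircraft registered to the same owner, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the company names, tail numbers, counts, field names, and the simple name-normalization rule are hypothetical, and this is not a description of how Enigma itself matches records.

```python
import re

# Corporate suffixes dropped before comparing owner names (an assumed rule).
SUFFIXES = {"inc", "corp", "llc", "co"}

# Hypothetical fleet sizes as they might be stated in financial reports.
REPORTED_FLEET = {"ExampleAir": 3, "Sample Cargo": 2}

# Hypothetical aircraft registry rows; not real registrations.
REGISTRY = [
    {"tail_number": "N100EX", "owner": "EXAMPLEAIR INC."},
    {"tail_number": "N101EX", "owner": "ExampleAir, Inc"},
    {"tail_number": "N102EX", "owner": "exampleair"},
    {"tail_number": "N200SC", "owner": "SAMPLE CARGO LLC"},
]


def normalize(name):
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(word for word in words if word not in SUFFIXES)


def count_registered(registry, company):
    """Count registry rows whose normalized owner matches the company."""
    key = normalize(company)
    return sum(1 for row in registry if normalize(row["owner"]) == key)


def cross_check(reported, registry):
    """Return {company: (reported, registered, counts_agree)}."""
    result = {}
    for company, reported_count in reported.items():
        registered = count_registered(registry, company)
        result[company] = (reported_count, registered, reported_count == registered)
    return result


for company, row in cross_check(REPORTED_FLEET, REGISTRY).items():
    print(company, row)
```

A mismatch between the two counts does not say which source is wrong; it only flags a place where the data sets disagree and a closer look at the original sources is warranted.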