Video Transcript

[Gail] ...in the lower left-hand part of your screen.
Throughout the webinar, you can type
questions and comments in “Chat” and our
speakers will do their best to respond
to as many as they can. We will have a
Q&A period after each presenter speaks.
Also, feel free to share ideas,
suggestions and resources with your
fellow webinar participants in
the chat area. I will hand the microphone
over to Craig now and let him get
started. [Craig] So hi everybody this is Craig. I
want to start with a summary of what
we're hoping to talk about in the next
hour. So first, as Gail already did, we'll
introduce ourselves and we'll let you
know a little bit about our team's
approach to emerging technologies and
development. That approach is based
around asking four questions. So first in
this case, “What are machine learning and
artificial intelligence?” This isn't going
to be a deep technical explanation but
hopefully we'll be able to give you
enough context to let you know what it
is that we're talking about. Next, “What
are some current applications in
development?” We'll talk about a few in
detail but there are many more examples
in our report which you'll hear
more about at the end of the webinar.
What are the implications of machine
learning and artificial intelligence in
development? These tools have the
potential to do a lot of good but we
need to be aware of their potential
downsides as well. Finally, what are the
questions that we should be asking? We
believe that in the next few years,
machine learning and AI will find their
way into a larger number of development
projects, so if you are managing or
funding one of these projects there are
some key things that you can do to make
sure that your project stays on track. So
we'll start with the first question: What
are machine learning and artificial
intelligence?
This is a satellite photo of a
neighborhood in Jakarta, Indonesia, and to
explain a little bit about how machine
learning works,
imagine with me for a moment that you
are working in the Indonesia Mission and
you are developing a new urban strategy.
You would like to know where in the city
there are lots of cars so that you can
get a sense of where there are problems
with congestion and traffic accidents.
If you're trying to map the entire city
you have hundreds, probably even
thousands, of images like this one. So how
are you going to count the cars? One way
would be to just count them. This is
really simple: you just stand there with
a highlighter and you circle all of the
cars. The problem is that it will take
forever and you will probably go crazy
before you finish. The second way to do
it is to get someone else to do it for
you, like an intern or a volunteer. This
is sort of cruel because all of the bad
things that would happen to you are now
going to happen to that person. The
preferred method to do this is to get a
computer to do it for you, and there are
a couple of ways to do this. One is to
come up with some rules for how the
computer should spot cars. Cars are
sort of rectangular; they have
windshields; they tend to be located on a
dark paved background. You can write a
program with all of these rules and then
tell the computer to go out and find the
cars. This sounds like an awful lot of
work and you're going to spend a lot of
time fine-tuning the rules to make sure
that they're exactly right. The last
option is to do the machine learning
approach. First you have to find some
cars yourself, or get an intern to do it,
but then you give those known car
locations to a computer and you let the
computer generate the rules for finding
more cars. So this is another way of
illustrating the difference between the
traditional approach and machine
learning. The traditional approach is
on the left:
you start by coming up with rules, and
getting the rules right depends on your
creativity and your subject matter
knowledge. You will then apply those
rules to imagery and you'll get car
locations. In the machine learning
approach on the right you start with two
different kinds of data:
one of those is imagery
and the other one is a set of known
locations. A computer will then take
those two data sets as input and
generate a set of rules that you can
then use to go out and find more car
locations. You can take those rules and
apply them to as much new imagery data as you want. So this is the same idea but
we're going to put it in more general
terms. In the traditional approach you
start with a model that's based on your
own knowledge and you apply it to data
to get an output. In the machine learning
approach, you start with input data and
some known outputs. With those two things
you can get a model that comes from data
instead of just from your own knowledge
and insight.
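[Editor's note: to make the contrast concrete, here is a minimal, hypothetical Python sketch of the machine learning approach; the features, numbers, and labels are invented for illustration and are not from the project described in the webinar.]

```python
# Hypothetical sketch: a model learned from labeled examples (invented data).
from sklearn.ensemble import RandomForestClassifier

# Input data: simple measurements for image patches
# [width_m, length_m, darkness_of_surrounding_pavement]
X_train = [
    [1.8, 4.5, 0.9],  # a car on dark pavement
    [1.7, 4.2, 0.8],  # another car
    [3.0, 9.0, 0.2],  # a rooftop (not a car)
    [0.5, 0.5, 0.3],  # a bush (not a car)
]
# Known outputs: labels someone assigned by hand (1 = car, 0 = not a car)
y_train = [1, 1, 0, 0]

# The computer generates the "rules" by fitting a model to the labeled data
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Apply the learned rules to as much new data as you want
new_patches = [[1.9, 4.4, 0.85], [2.8, 8.5, 0.1]]
print(model.predict(new_patches))  # expected: the first patch looks like a car
```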
So I've been talking about data a lot, and all machine learning and
AI systems are built on data, and data
can come in a variety of different forms.
That's going to include images, text,
audio and numeric data. Machine learning
algorithms are there to search for
patterns in large data sets and they
then use these patterns to make
predictions about new data. So as sort of
a shorthand you can think of machine
learning as being data-driven
predictions. Artificial intelligence
systems then use these data-driven
predictions to make, do or plan something.
Sometimes AI systems will act directly
based on automated decisions. One example
would be robotics. In other cases AI
systems make suggestions to a human
decision-maker, like with product
recommendations and online shopping.
For shorthand you can think of AI as
being smart automation. So now that we've
set the stage a little we can get to the
next question: What are machine learning
and AI being used for? I'll give a few
examples of different types of machine
learning systems to help you get a sense
of what's being done. So in
classification problems a machine
learning algorithm looks at many
examples of something in order to learn
how that thing should be assigned to
categories. Supervised machine learning,
which is a term you'll hear us use a bit
here,
requires training data that has been
labeled by a human curator. The example I
gave at the beginning of finding cars in
a satellite image would be an example of
supervised machine learning. As another
example, you might hire a person to
classify thousands of images as being
either dogs or fried chicken and train
an algorithm to distinguish between dogs
and fried chicken in the future. So this
is a somewhat more serious
example of how machine learning can
quickly filter and sort vast amounts of
information. It's a system that is
designed to collect reports of damage or offers for help after a natural disaster.
The algorithm’s job is to direct incoming
tweets to the people who will be able to
use the information. This required human
curators to manually label tweets
according to the type of content that
they contain. The algorithm then learns
to associate certain words, phrases, or
patterns with different resource needs.
This allows it to prioritize information
for human decision-makers. So this figure
here, there's a lot going on here, but
this comes from an algorithm that
analyzes social media posts and
classifies them by their content. Human
curators had labeled snippets of
text as expressing joy, sadness, disgust, anger, surprise, or fear. The computer then learns to associate certain words, phrases, or patterns with these categories.
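[Editor's note: a hypothetical sketch of how this kind of supervised text classification is commonly wired up with scikit-learn; the snippets and labels below are invented and are not the curators' actual training data.]

```python
# Hypothetical sketch of supervised text classification (invented data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "so happy today, what a blessing",
    "this makes me furious",
    "i can't stop crying, we lost everything",
    "what a wonderful surprise",
]
labels = ["joy", "anger", "sadness", "joy"]  # assigned by human curators

# Turn words into numeric features, then learn word-to-label associations
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(texts, labels)

print(classifier.predict(["feeling blessed and happy"]))  # likely ['joy']
```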
So to make this plot here, I collected tweets that mentioned former
South African President Jacob Zuma
during February 2018, which was of course
a big month for South African politics. I
picked out a few examples of specific
tweets to show how algorithms like this
sometimes do very well. So for example
the tweet at the top comes from someone
who goes by Ms. Bucks and was
identified as expressing joy. She says
here “Zuma resigns, it rains, ancestors are
happy #blessings” So I think the computer got it
about right. She sounds pretty joyful to
me. The one on the--let's see the one on
the left is coming from someone who goes
by Katiego Maseng and says, “Can Zuma
stop reading his long speech and just
resign?” And has a frowning face. The
algorithm labeled this one as
expressing anger and that also seems
about right to me. The one just above
that was coming from somebody who goes
by Lazy Aiiz and it was tagged as
expressing sadness. And this person wrote,
“Zuma broke up with us on Valentine’s Day.”
And then there's some crying emoji and
it says “We'll meet somewhere” and finally
a breaking heart. Now I can't know for
sure, but I don't think that Mr. or Miss
Lazy Aiiz was really expressing
sadness. This was probably sarcasm and
computers are still really bad at
sarcasm. So systems like this usually
work all right but they still have their
flaws. So in the last three examples the
goal was to automate decisions of some
kind using a computer. Sometimes we have
a different goal: we want the computer to
find patterns that will be useful for
human choices. Researchers in Colombia
have worked with farmers to develop
machine learning models that can predict
yields based on things like weather and
farming techniques. Their goal isn't really to predict yields, though; they use
their model to make recommendations
about how farmers can improve their
yields, especially under changing climate
conditions. This is a great example of a partnership where farmers
benefit from a machine learning model
even without using it directly. One
example of using machine learning to
make predictions is the Harambee Youth
Employment Accelerator.
Harambee is a company in South Africa
that helps young people who are at risk
for unemployment get their first jobs.
Part of their approach is to collect
data on each applicant’s skills and
personality and match them with a job
where they're most likely to succeed.
Harambee is exploring machine learning
as a tool to get faster, more precise
matches for a larger number of
applicants. So you'll see at the bottom
of the slide there's a link to a video.
We won't show you that now because
there's some bandwidth limitations but
the slides will be available after the
webinar and you can take a look at the
video then. In addition to analyzing data
we can also use AI systems to generate
content.
One example is chatbots, which can be designed to analyze text inputs and
respond to them with authentic-sounding
text so that it seems like you're
actually having a conversation with the
computer. So several startup companies
have launched AI-driven mental health
apps that allow users to talk through
depression, anxiety or fear with a
computer rather than with a therapist.
One group’s website says that their goal
is to deliver affordable, on-demand and
quality mental wellness for everyone
using psychological artificial
intelligence. In some ways this is a
really great application. Mental health
services are often lacking in developing
countries or in humanitarian crises and
this could be a way of providing care to
more people than have ever had access
before. Some trial deployments have found
that talking to a computer instead of a
person removes some of the stigma that
exists in some cultures around seeking
mental health assistance. It also raises
some really troubling questions: So could
this chatbot, for example, be collecting
personal information about vulnerable
people? Or if someone confesses that
they're planning on hurting themselves
or someone else, do the system’s operators
have a responsibility to inform the
authorities? We'll probably be seeing
more examples like this going forward
and will need to be prepared for some
uncertainty and ambiguity. I'm not going
to go into a ton of detail on these
applications. You can find more
information about all of these in our
new report on machine learning, AI and
development, and we’ll share the link for
that later in the webinar. I do want to
say a little about a couple of them,
though. So this one here is called Grillo, which is Spanish for "cricket," and it's the name of a
USAID-funded company that is building an
earthquake early warning system in
Mexico. They're deploying a large network of inexpensive home-based sensors and using
machine learning algorithms to integrate
the noisy data that's coming from all of
these sensors. And their goal is to be
able to warn people up to two minutes
before an earthquake strikes, which is
enough time to take shelter and really
reduce your chances of being hurt in an
earthquake.
The next example--so that box isn't quite in the right place--is algorithmic credit
scoring. A lot of people in developing
countries don't have a formal credit
history that would enable them to access
credit and companies like Branch, Tala
and EFL use alternative sources of data,
things like social media or mobile phone
usage, to predict how likely people are
to repay their loans. Even if you've
never had a bank account before your
mobile phone metadata might show that
you commute to work every day and call
your mother every Sunday afternoon and
those kinds of things correlate really
strongly with loan repayment. So before I
hand the microphone over to Aubra for the
second half of the presentation, we want
to pause a little bit for questions.
There are a couple that my co-
facilitators have already bounced over
to me. One of these is coming from Mahesh
who asks, “Are machine learning and AI
used synonymously?” So that is that is a
more complicated question than it might
seem. There is a lot of fuzziness and a
lot of uncertainty in the way that those
terms are currently being used. I
think the best way to think about it is
to think of artificial intelligence as
being a field of study, like chemistry or
biology, that is focused on how to make
computer systems that act in a way that
seems intelligent. And being intelligent
requires a lot of different things, but
one of those is the ability
to learn. And that's what machine
learning does: machine learning is about
teaching computers to learn from what's
happened in the past, to look at examples
of things, and make predictions about
what's going to happen in the future based on that. There are other aspects of
intelligence that are things like memory or attention or creativity and getting a
computer to do those things falls under
the broader umbrella of artificial
intelligence.
So in general machine learning is part
of artificial intelligence, but
artificial intelligence involves a lot
more than just machine learning. We also
have a question from Brian Banks, “What
types of support can USAID provide to
groups that are interested in piloting a
machine learning approach to development
challenges?” So the first type of support
that's available is what we're beginning to do now with this webinar and with this report that just went public this morning, actually--you'll be seeing the link for it later--where we're trying to provide some guidance on what
kinds of questions you can ask and how
you can engage with your partners to
make sure that your machine learning
projects are actually serving
development goals. And Aubra is going to
talk a lot more about that. I think what type of technical assistance
support we're able to provide going
forward is a much bigger question that's
going to evolve with the whole USAID
transformation process but hopefully
that's something that we'll be able to
do more of in the future than we have
thus far.
Follow-up question from Brian: What
is USAID doing to support development of
large machine-readable data sets that
can be used as a foundation for machine
learning even from within the USAID
portfolio? So you might be familiar with the Development Data Library and with the mandate that USAID has that
datasets that are generated through USAID
projects should be made machine readable
and open-source unless there's a
compelling reason not to do so. A lot of
the datasets in the Development Data Library aren't really
large enough or detailed enough for
machine learning work, but a few of them
are. I can think of things like the
demographic and health surveys for
example that are a really great resource.
We also have, in the GeoCenter in the Global Development Lab, a partnership with NGA--the National Geospatial-Intelligence Agency--to provide our Missions with access to satellite imagery data that can be very useful in machine learning
contexts. So there are a few things
underway, but you may have to be a
little bit more adventurous and go
outside Agency resources to find the
best data sets. Here's a question from James Verdun--nope, these questions are coming in so fast that I couldn't read that one. James Verdun: Early warning
systems for famine are cited as a
real-world application. In what countries
has this been done? So my understanding is that these famine early warning applications have leveraged data that were generated by the Famine Early Warning Systems Network, or FEWSNET. I don't know off the top of my head all the countries where FEWSNET works. I know they've been involved in some of the recent famines across the Sahel and in the Horn of Africa, for example, but I can't give you a lot
more detail than that. This is a question
from Joshua Machleder: How
do we account for bad signal data that
may lead to faulty artificial
intelligence? So this is
something that Aubra is going to talk a
little bit more about, but you have
to be very skeptical sometimes and you
have to test things really thoroughly to
make sure that they're going to work the way you think they are. And
if you have sensors that are putting bad
data into a model, you are going to get
bad predictions out of your model, and
there aren't a lot of ways to fix
that. The best way to test whether
this is actually happening is to make
sure that once you've built your model
you check the model’s predictions against
what's happening in the real world. And so after the model development phase, you'll often have this
period of what people sometimes call
online testing where you're checking to
see whether your model is performing the
way that you think it is before you put
a lot of trust in what it's doing. But
model testing and accuracy is a whole big, complicated subject that I'd be happy to talk about at greater length.
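[Editor's note: a minimal, hypothetical sketch of the kind of check described above, comparing a model's predictions against what was later observed in the real world; the numbers are invented placeholders.]

```python
# Hypothetical sketch: online testing as a simple comparison against reality.
from sklearn.metrics import accuracy_score

predicted = [1, 0, 1, 1, 0, 1, 0, 0]  # what the model predicted last month
observed  = [1, 0, 0, 1, 0, 1, 1, 0]  # what actually happened, recorded afterwards

print("agreement with reality:", accuracy_score(observed, predicted))  # 0.75 here
```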
So, okay, I'm getting a--how much more time do
I have for questions?
Eight minutes! Wow, okay, cool. So this is a
question from Shawnee; she mentioned Grillo: “It's interesting
that this is famine and earthquakes--
those are vastly different. The causal
indicators for famine would be, I think,
much more complicated and
subjective than earthquakes.” No, you're
probably right.
I think the biggest difference
between those two applications is
probably speed: you can see a famine coming several weeks out or even longer, whereas with an earthquake you're going to have at most a couple of minutes' worth of data to prepare for it. The one
interesting point that this brings up
though is that often the learning
algorithms themselves, the things that
are sort of the gears in the middle of
the AI machine, are fairly general
purpose, and so you can apply them to
different types of data to create models.
And there are a fairly small number of
algorithms that are used for this.
Sometimes people will make some pretty
serious mistakes by applying
algorithms to things without thinking
very carefully about the type of data
that they're getting, but often learning
is learning, and to some extent these
things are generalizable. We've got a
question, we don't have a name on this
one: Did you see an example of how we can apply machine learning and AI in education in the area of development?
What immediately comes to mind is that I've seen presentations from some of these online learning platforms like Coursera or edX or some of these
others where they have people taking
courses in an online environment and as
a result they're able to generate a ton
of data about their students. And so they
know which videos people watch more than
once, they know which quiz questions
people tend to get wrong, and things like
that, and they can use that to fine-tune
the presentation of material in future
iterations. I think in terms of sort of
traditional classroom presentation of
education, that can be really challenging because we often don't have very much
data about student behaviors or about
teacher behaviors or about what's going
on. And any artificial intelligence or
machine learning application is going to
depend a lot on having large volumes of
detailed data about the thing that
you're interested in. I'm sure there's a
lot more that can be said about machine
learning in education but that's what
immediately comes to mind there. Okay so
I'm getting chat messages that we should
move on and wrap up. I will note that
we've seen some questions come in
regarding ethics and this is where Aubra
is going to be focusing so I'll turn
things over to her now.
[Aubra] All right, hi everyone, can you hear me okay? Okay, I'm getting nods from my colleagues, that's great.
So, as Craig mentioned, we’re now going to move on to a slightly different focus of the webinar.
And so we’re going to be shifting gears to consider the third question on our list,
looking at some of the implications of machine learning in development.
So far we’ve walked through a lot of the positive potential, talking about how machine learning can help us discover new relationships
or better target our interventions in a more directed and data-driven way. But there are a lot of
ways that things can go wrong.
When we begin to lean more heavily on
machine learning and AI in our programming, we need to be aware of how
machine learning can actually lead to
unfair exclusion or unfair targeting of
people or how it can undermine
accountability. I'll go through a few
examples of what I mean by that now.
So this is an image of a grad student at MIT, Joy Buolamwini, who discovered that the
majority of commercially available
facial recognition software had trouble
recognizing dark-skinned faces. Sometimes
it failed altogether
unless she donned the white mask shown in the photo, at which point it had no trouble recognizing the face. The
software clearly wasn't explicitly
programmed to do this. It's an algorithm
that has been trained on a stock set of
faces that all happened to be white. The
result is an algorithm that performs
poorly, or not at all, for people who
don't fit the mold that corresponds to
the training dataset. With these tools
we can unintentionally create an
unfair exclusion of particular groups.
Through use of non-representative
training data we can introduce bias into
our work. For example, if a facial recognition algorithm has been trained only on light-skinned faces, then it will be less accurate at recognizing faces with darker skin. Possible solutions to this problem exist, and they include making sure that our training data are representative of the entire population on which the algorithm will be used, or increasing diversity on the team that develops the machine learning models so that more perspectives can help shape how the algorithm is tested.
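[Editor's note: one simple, hypothetical way to test for the problem described above is to break a model's accuracy out by group before deployment; the records below are invented.]

```python
# Hypothetical sketch: measuring model accuracy separately for each group (invented data).
from collections import defaultdict

records = [
    # (group, true_label, predicted_label)
    ("lighter-skinned", "face", "face"),
    ("lighter-skinned", "face", "face"),
    ("darker-skinned",  "face", "no face"),
    ("darker-skinned",  "face", "face"),
]

correct = defaultdict(int)
total = defaultdict(int)
for group, truth, prediction in records:
    total[group] += 1
    correct[group] += int(truth == prediction)

for group in total:
    print(group, "accuracy:", correct[group] / total[group])
# A large gap between groups is a red flag that the training data
# or the model needs attention before the tool is used.
```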
that’s based on Google Translate. The sentences that you see here are written in
Turkish. Now I don't speak Turkish, but
I've been told that the language doesn't
distinguish between male and female pronouns. Everybody is just “O” rather
than being “he” or “she.” When you translate
these sentences into English,
Google Translate doesn't use
gender-neutral pronouns anymore.
It makes the choice about whether the
sentences refer to a male or a female.
The choice that it makes matches
traditional gender stereotypes, so men
are doctors and engineers, and women are
nurses and cooks. Again this isn't
because someone at Google sat down and
decided to write a sexist translation
algorithm. It's because Google Translate
was trained on human-generated text on
the Internet. The algorithm learns
language from humans, and we use language
in a way that has these gender biases. This
is a really hard problem to fix, because
it's not clear what Google's
responsibility really is. Should we look
to them to build in some correction to
even out this bias? Or should we
celebrate their ability to mimic human
language so well that they even get our
biases right? There's really no simple
answer in situations like this. Bias can come from patterns in the training data that reflect existing inequality. Even if we don't tell it to, an algorithm will successfully learn to discriminate from
our example. As we think about how some
of these biases can play out in our work,
it's important to remember that machine
learning systems always
evaluate new data based on what they've
seen in the past. This means that they
often struggle with situations unlike
those they’ve seen before. Take for example
the reality that many employers are
beginning to use algorithms to screen
job applications. These algorithms are
trained on previous applications and
information about which applicants were
successful employees in the past. You can
imagine that if for any reason a
hiring process has historically been
biased against women or any other
minority or excluded class, then the
related applicant selection algorithm
will learn to reproduce the bias of past
hiring managers. In general we need to
question how the training data we have
access to might encode social inequities
that we should be striving to change.
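[Editor's note: a minimal, hypothetical sketch of one way to surface that inherited bias, by comparing a screening model's selection rates across groups; the applicant data is invented.]

```python
# Hypothetical sketch: comparing selection rates across groups (invented data).
applicants = [
    # (group, model_recommends_interview)
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", False), ("group_b", True), ("group_b", False), ("group_b", False),
]

counts = {}
for group, selected in applicants:
    shortlisted, total = counts.get(group, (0, 0))
    counts[group] = (shortlisted + int(selected), total + 1)

rates = {group: shortlisted / total for group, (shortlisted, total) in counts.items()}
print(rates)  # e.g. {'group_a': 0.75, 'group_b': 0.25}
# A large gap like this suggests the model may have learned the bias
# of past hiring decisions rather than anything about job performance.
```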
As another example, take the task of predicting crime. A police department
in a major U.S. city used a machine
learning algorithm to predict where
crimes were likely to occur so that they
could send police officers to the right
neighborhoods. Now to build a supervised
machine learning model we need to be able to measure the thing we want to
predict. Sometimes this is difficult,
dangerous, expensive or even impossible. When this is the case we often rely on
proxies. It is very difficult to measure
crime because many criminals are not
caught. Instead the police department in
this particular case chose to use arrests
as a proxy for crime. A proxy is
different from the quantity we care
about, but we can use it as a substitute.
This map shows the distribution of
drug-related arrests in the city.
Now while the map on the left shows the
density of drug arrests, the map on the
right shows an estimate of where drugs
were actually used, based on more neutral
public health data. Clearly the geography
of drug arrests is very different from
the geography of actual drug use. This is
because arrests are not a perfect
proxy for crime. Arrests only happen when you
have crime and police officers in the
same place. Neighborhoods with a heavy
police presence will have a higher
arrest rate, even if their crime rate is
similar to elsewhere. In the U.S. this is
often true of poor and minority
neighborhoods. An algorithm using drug arrests as a proxy for drug crime would
ultimately dispatch even more officers
to these heavily policed neighborhoods,
leading to an even higher arrest rate
and ultimately a runaway feedback loop.
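[Editor's note: the toy simulation below, with invented numbers, illustrates the runaway feedback loop just described: two neighborhoods with identical underlying crime rates, where patrols keep being sent wherever arrests were highest.]

```python
# Hypothetical toy simulation of a proxy-driven feedback loop (invented numbers).
true_crime_rate = {"A": 0.10, "B": 0.10}  # identical underlying crime
patrols = {"A": 10, "B": 30}              # B starts out more heavily policed

for week in range(5):
    # Arrests only happen where crime and officers are in the same place
    arrests = {n: true_crime_rate[n] * patrols[n] for n in patrols}
    # The "predictive" system sends extra patrols wherever arrests were highest
    hotspot = max(arrests, key=arrests.get)
    patrols[hotspot] += 5
    print(f"week {week}: arrests={arrests}, patrols={patrols}")
# Neighborhood B keeps accumulating patrols and arrests, even though its
# true crime rate never differed from A's.
```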
In short, using the wrong proxies in algorithm development can ultimately cause us to predict the wrong thing, which in this example leads to unfair targeting that can reinforce inequity. Another danger of machine learning systems is that they can make it
harder to understand how a decision was
made. This can be frustrating for people
affected by a decision, especially if they
believe that mistakes were made. It can
also lead the owners of automated
systems to feel less accountable for
what is happening. This opacity comes
from three major sources. The first is
proprietary interests. Sometimes companies
or governments have a legitimate need to
keep things secret. Companies may want to protect trade secrets.
Regulators may be worried that if they reveal the details of an algorithm, it will be easier for people to game the system.
The second is technical illiteracy. Machine learning algorithms
are really complicated and not everyone
has the ability or interest to really
understand what's going on. The third
reason is perhaps the most interesting one
and it has to do with how machine
learning algorithms work. To see what we
mean by this I'll need to walk you
through this figure on the right, which is a schematic of how one might predict an airfare price. The diagram that you
see on the right depicts a
deep learning algorithm. You have four inputs--origin airport, destination airport, departure date, and airline--that are
being used to predict the price. The
interesting thing is what happens in
between. You have five layers of
intermediate steps and each layer
contains hidden variables that are
calculated from the previous layer.
So the algorithm combines the four inputs
to get the four variables in the
first layer. It then combines those to
get the five variables in the second
layer and so on. The black lines represent parameters that control how variables in the different layers are combined. In this system we have 22 variables and 105 connections.
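[Editor's note: the sketch below is not a reproduction of the network in the slide; it is a small, hypothetical Keras model with the same four inputs and a few layers of hidden variables, with layer sizes, feature encoding, and prices invented for illustration.]

```python
# Hypothetical sketch of a small deep learning model for airfare prediction.
import numpy as np
from tensorflow import keras

# Four inputs (origin airport, destination airport, departure date, airline),
# here already encoded as plain numbers for simplicity.
X = np.array([
    [12, 87, 140, 3],   # origin id, destination id, day of year, airline id
    [45,  2, 200, 1],
], dtype="float32")
y = np.array([320.0, 180.0], dtype="float32")  # observed ticket prices

model = keras.Sequential([
    keras.layers.Input(shape=(4,)),
    keras.layers.Dense(4, activation="relu"),  # first layer of hidden variables
    keras.layers.Dense(5, activation="relu"),  # second layer, and so on
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(1),                     # the predicted price
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)

# Each line between layers is a learned parameter; even in this toy model
# there are hundreds of them, which is part of what makes interpretation hard.
print(model.predict(np.array([[12, 87, 141, 3]], dtype="float32"), verbose=0))
```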
Real systems can be much bigger, with more variables and more layers. Hopefully you can see how this
makes it difficult to interpret how input variables
ultimately combine to generate a price
prediction. There are a lot of possible inputs. Even with only four variables, there are hundreds of possible airports and dozens of airlines involved.
There are also a lot of parameters. With
105 connecting lines in
this example, it's impossible to really
interpret each one. Finally, many machine learning algorithms involve pre-processing steps. In this case, we might use the departure date to create variables for the day of the week, the season, and proximity to holidays. Pre-processing adds more complexity and can make models more challenging to interpret.
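[Editor's note: a small, hypothetical pandas sketch of the pre-processing step described above, deriving day-of-week, season, and holiday-proximity variables from a departure date; the dates and holiday list are invented.]

```python
# Hypothetical sketch of date pre-processing with pandas (invented dates).
import pandas as pd

flights = pd.DataFrame({"departure_date": ["2018-12-24", "2018-07-03", "2018-03-15"]})
flights["departure_date"] = pd.to_datetime(flights["departure_date"])

flights["day_of_week"] = flights["departure_date"].dt.dayofweek   # 0 = Monday
flights["season"] = flights["departure_date"].dt.month % 12 // 3  # 0=winter ... 3=fall

# Days to the nearest holiday, using an invented holiday list
holidays = pd.to_datetime(["2018-12-25", "2018-07-04"])
flights["days_to_holiday"] = flights["departure_date"].apply(
    lambda d: min(abs((h - d).days) for h in holidays)
)
print(flights)
```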
The bottom line is that you need to think about who
might be interested in interpreting the
outputs of your model. Sometimes no one
will care; in other cases your model
might be making decisions that affect
individual people and they have a right
to understand how those decisions are
made. Simpler models might be less
accurate, but sometimes being able to
understand the output is worth the
sacrifice. Okay, the last risk we'll flag is premature automation. So this point is just to reinforce that machine learning systems can be impressive, but they don't always work as advertised. These examples
come from a system that
analyzes photographs and puts
captions on them. It's actually really
impressive and usually its outputs are
pretty good. Sometimes though it
completely misses the point. The picture
on the left does have a person riding a
horse, the one in the middle does contain
an airplane, and the one on the right has
some people and a beach. In each case
though, the computer is correctly
labeling objects but failing to
understand what makes those objects
important in the scene. That kind of
context sensitivity requires a lot of
common sense and humans are much better
at this than computers. So on to the last
question. Given all of these implications,
what questions should we be asking when
we encounter this emerging technology in
our work? Here, rather than going through the full list of questions to consider outlined in our report that Craig mentioned, we wanted to highlight what we
see is the most important question: How
can we engage with AI and machine
learning in a way that amplifies the
good and minimizes the bad? Unfortunately
because of the highly technical nature
of this topic, it's common for development
professionals to want to step back and
take a hands-off approach to the
development and application of machine
learning tools, to leave the "technical" work to the "technical" folks. But our main message here is that
we development professionals actually
have a role to play and we have a unique
responsibility to play that role.
It's really up to us to ensure that, as these tools are brought to bear on development challenges, there is someone in the mix who can advocate for the development problem, push for leveraging local expertise, speak up for context, and work collaboratively with model developers to critically assess tools with end users in mind. The
first action recommendation is that we
advocate for our specific development
problem, that we not let the silver
bullet of machine learning outshine the
target it's meant for. Put another way,
it's important to root not so much for the technology as for the problem you're trying to solve. This will
require that we consider things like the
suitability of machine learning for the
problem at hand, asking questions like
"Is my problem actually a good fit for
machine learn learning?" For example, do I
need to predict something that is
objective and easily measured? Do we have access to the data that is needed to build a model that addresses the problem? Or, given the proxies that we'll be using, how well aligned are they with the actual features of my development problem that I need to model? Also
importantly, how is my problem currently
being addressed? What's the status quo
and how will the introduction of machine
learning change that status quo for
better or worse? What new dependencies might I be introducing with a shift over to machine learning-backed tools?
The second recommendation is to
leverage local expertise as much as possible. If you can't rely on local
machine learning experts to develop the
tools themselves, you can at least ensure
that developers are consulting local
subject-matter and context experts as
data are collected, as models are built,
and as tools are integrated into practice.
It also means considering how we can
build in feedback processes that
incorporate local testing or local validation of models. Third, it's important
to speak up for context. Development
practitioners and local partners may be
more likely to be steeped in the context
of a development challenge than the
people designing the machine learning
tools. Bring that perspective into the
mix. Consider whether the proxies that
are being used in the model are
appropriate or neutral proxies given the
local context. It could be that using information about taxed income is a solid proxy for household wealth, or it could be that in your context many people earn income informally and therefore wouldn't be well represented by that proxy. If
there are minority versus majority power
dynamics in your context, you should be
sensitive to how that might manifest in
the data your algorithms feed on. Who is and is not represented? How can you ensure fairness of predictions across different groups? Issues or choices that may seem straightforward or inconsequential to the algorithm developer can almost always benefit from having a dose of local context thrown into the discussion.
Lastly, machine learning is in some ways no different than other technologies. We must constantly be considering how the
use of these tools will affect the end
users. We should be proactive about
questioning how a
machine learning model performs for
different groups. Does it fail more often
for some populations than for others? If it fails, what are the consequences? Who is harmed? How will those affected by the machine learning tool be able to feed back into the tool's development? We
all have a responsibility to ensure that
these questions are considered as we
begin to embrace the potential
offered by machine learning. This is all
included in our recently published
report "Reflecting the Past, Shaping the
Future: Making AI Work for International
Development". Because they see so much
potential in this space for good, we want
to encourage our colleagues and
development partners to engage on this
topic to make sure we get it right. To
that end, the report explores promising uses of machine learning in development, as well as the fundamental
issues around fairness and equality that
we've just walked through. It also offers
guidance on how we can ensure that there
are appropriate safeguards so that we
can use this technology in a responsible
and equitable way. As we've mentioned
several times now the report is publicly available online today - hot off the press.
The URL is listed in a box to the bottom
right I believe. We also want to flag
that there will be a shorter compendium
to the report which is coming online in
the next few weeks. We're also engaged in
ongoing research with partners at MIT
through the Higher Education Solutions Network,
where a group is working together to develop guidance on how to
address some of the technical trade-offs
that must be made as we work towards
more fair and equitable use of AI in
international development. They will be
doing some dedicated
capacity-building around the topic, which
will ideally be made publicly available
in the form of online training resources and potentially workshops as well. Lastly
we're hosting a training in November, the
14th through the 16th, at our regional
mission in Bangkok. For those of you who
are internal to USAID it should be
available through USAID University.
We'd love to see all of y'all on the class roster for that if you're interested in staying involved in the conversation.
And I believe with that we'll open it up for Q&A. So I see a few coming in. So first: Is there an Agency-wide effort to accommodate these types of technologies in our internal IT infrastructure? What is the role of the CIO's office as we begin to explore and pilot, both in internal systems and in our
programs worldwide? That's a great question and I think it's something
we have had only a small number of conversations with our CIO's office about so far, but I
know that there's a strong interest in
leveraging the potential of these
technologies for internal purposes. I
think they would be better positioned to
speak to what they're working on right
now, but I know that machine learning is an area of active interest from the CIO's perspective.
Second question, from Michael: "My question is how easily can data be used in low-income countries with resource constraints, as most of this data might be paper-based?" That's a really good question. In our report we address this to some extent. A lot of what we
see is that the data sources being turned to in development are those that are already digital. We're limited in our ability to capture information in digital format because a lot of what we do is based on pen and paper, so what happens is you see a lot of the applications turning to the typical sources: call detail records, mobile phone metadata, satellite imagery. There's an increasing push for electronic health records, so to some extent those are being used as well. There is also a good amount of household survey data that's available online. So the trend that you see is that those are the common data sources that get turned to for machine learning in development, as well as social media data, which Craig alluded to earlier. So I think because those are the more readily available data sources in the contexts we're describing, and are already in digital form, we tend to turn to
those before we turn to what might be a richer data set. And there is a consequence to that, in that you're only sampling the population that is engaged digitally. For example, with social media data sources, if you're relying on social media to build your machine learning algorithm, your algorithm is going to be informed by those people who opted into social media platforms, and in a lot of the countries in which we work those types of users have different characteristics than the most impoverished, the most vulnerable, or the most excluded. So as we start to explore these already-digital data sources, we need to be very aware of the limitations of the data sets that we currently have access to in these contexts. So let's see, another
question comes from Mike V.:
Algorithms are still often better than
humans regarding bias. Changing the
behavior of a biased algorithm is fairly
easy compared to changing the
perspectives, behaviors, and stereotypes
of humans. So Mike, that's a great point - that's something that we've debated a lot
internally as we've gone through this
research. I think it's important not to
let perfect be the enemy of the good, and
there are quite a few contexts where
machine learning actually offers a more
neutral perspective than a human would. The concern that seems to arise as we consider the use of machine learning in our work is that machine learning offers a potential for scale that is not possible when the decision-makers are human beings. So for example, if you have a decision-making system that hinges on a biased person, that biased person is likely only going to encounter a certain number of people in their day-to-day decision-making. If you develop a tool that is biased and deploy it at scale, you have introduced more bias than you ever would have with that single biased person, and so that's a unique concern when we consider how the implications of bias in machine learning might propagate. The other thing that's worth considering is that, as humans make mistakes,
we have worked very hard to create institutional accountability in different contexts, and I think it's fair to say that we don't have the same type of institutional accountability in place for machines and algorithmic decision making. So that's something that we see as a pressing need: as we turn to these algorithms for decision making, we shouldn't essentially let them off with a clean slate, or a clean bill of health, whatever the metaphor is. Lastly, I think it's
important to note that with these types of tools there is an implicit trust that comes along with the technology. And so people might be more willing to question a human's judgment than they are to question the prediction spit out by a black-box algorithm. That's just human nature, and I think it's something that causes us to believe that we need to be more willing to engage, to question, and to peer under the hood, if you will, to better understand exactly what goes into decision making in the world of machine learning algorithms. Okay, so
moving on, another question from Rajesh Sharma: Which group is leading the technical implementation of AI and machine learning at USAID? So this is a
great question. I think one of the really exciting things that we discovered over the course of our research over the past year, as we were looking into this topic and carrying out interviews with folks inside and outside of the Agency, is that there is a lot of activity, and it's not confined to one sector or one part of USAID. There is a lot of activity going on with our colleagues in Global Health - we found a
few very interesting projects with the recent Zika Grand
Challenge that was launched a year or
two ago; we've seen a lot of activity
from our Bureau for Food Security
colleagues who are looking into machine
learning to help with monitoring and
evaluation efforts; we've seen a lot come
in through our Development Innovation Ventures awards, and I think it's
just a testament to the promise that
machine learning offers in this space.
There's a lot of excitement around
leveraging these tools across the development spectrum. I don't
think that we are at a point yet as an
Agency where there is one group that is
leading the technical implementation of
this at USAID. I do believe that that might be
the last question that has come in
unless there are others that I'm missing.
Oh! I see there's one from Graham: How does machine learning deal with the multiple, and often minority, languages around the world?
So this is a great question. One of the challenges that machine learning faces is that many of the tools are optimized only for majority languages at this point, because of the field's level of sophistication or maturity, and so there's a big challenge in terms of being able to use, for example, natural language processing techniques with languages that are not majority languages or are not spoken in many contexts. This is actually
something that the group at MIT that I
mentioned is considering diving into as part of their efforts: identifying how to fine-tune existing tools that are built for majority languages to better accommodate different dialects and different minority languages across the globe. It's
definitely an issue that can't be taken
for granted. So at this point I think we have time for one more
question. So Joshua Machleder asks, "Will there be any other training opportunities on machine learning or AI in DC in the near future?" That's a great
question.
We don't have anything scheduled for DC
right now. The only training that we have
in the near future is going to be based
out of RDMA, our regional mission in
Bangkok, and that's scheduled for mid-November. I think we're going to be
considering, once that training is done,
how it could be replicated in other
spots so we'll definitely keep you
posted if there are advances on that front. I think at this point I am supposed to hand this over to Gail.
I think we're done!
[Craig] Hi, this is Craig again. Apparently they are having some mic difficulties over at Aubra and Gail's end, so I'll
wrap up. Thank you so much to everyone
for tuning in, and the really great
questions and comments and sharing of
links and resources that was going on
down in the chat box - I'm always really
happy to see that sort of engagement
happening. We do have a final poll that
you should already be able to see here.
It asks how the content of the presentation matched your needs. It's
good to see that a lot of people
answered that one already.
And so I think, yeah, if that's all, I'd like to thank my co-presenter and encourage everyone to stay tuned for the next digi-know webinar. That's still a bit under development now, but hopefully we'll be hearing more soon about what the next topic and the next set of speakers will be. Thanks.